Felipe Vicentin

Decision Trees & Random Forests

Classification models We are interested in classifying a data set among many classes. Each point in the data set has many attributes. Those can be either discrete or continuous, but the classes can only be discrete. If a continuous class is required, then we should use a regression model. It is also worth noting that the classes have no order relation (we cannot say that class 5 is greater than class 2)....

Estimation methods

Consider a statistical model with unknown parameter $\theta$. We want to develop some methods to find $\theta$. $\DeclareMathOperator*{\argmax}{arg \,max \,} \DeclareMathOperator*{\argmin}{arg \,min \,}$ Rao-Blackwell theorem Tip For any two random variables $X$ and $Y$, $$ \begin{align*} \mathbb{E}[\mathbb{E}[X \mid Y]] &= \mathbb{E}[X] \\ \mathbb{E}[\mathbb{E}[X \mid Y]^2] &\leq \mathbb{E}[\mathbb{E}[X^2 \mid Y]] \\ \end{align*} $$ Theorem: Rao-Blackwell theorem Let $T$ be a sufficient statistic, and let $\delta$ be an unbiased estimator of $\theta$. The estimator $\hat{\theta}$, defined as follows, is unbiased and has a lower quadratic risk than $\delta$....

Image restoration

There are inherit defects when an image is captured. Noise The noise is an error of measurement in each pixel. There are essentially 2 sources: Read noise (uniform); Photonic noise (depends on the luminosity): each photon has a probability $p$ to be measured ($p < 1$ to read only the color we want) in a Poisson distribution, then, $\mathbb{E}[Y] = N p (1 - p)$ and $\mathrm{Var}(Y) = N p (1 - p)$....

Interpolation, geometric transformations and filtering

Interpolation Our goal is to calculate the value of a pixel that is outside the grid of the image. To this end, we will interpolate the image. Depending on our hypothesis, we can arrive to different interpolation methods. Constant by parts: nearest neighbor; Continue: bi-linear; Polynomial of degree 3: bi-cubic; Limited band: Shannon interpolation. Method Initialization Operations per pixel Nearest neighbor $0$ $1$ Bi-linear $0$ $4$ Bi-cubic $4 N ^2$ $16$ Shannon $0$ $N^2$ We will call our discrete image $I_d$ and the continuous image after interpolation $I_c$....

Efficient Estimation

Our goal is to characterize efficient estimators for $\theta$ in terms of mean squared error using the notion of Fisher information. Estimator Let $P_\theta$ be a probability distribution where $\theta \in \Theta \subset \mathbb{R}^d$, $d \in \mathbb{N}$. Definition: Estimator An estimator of $\theta$ is any statistic $\hat{\theta}$ taking values in $\Theta$. Bias We want $\hat{\theta}(X)$ to be close to $\theta$. Since the estimator is a random variable, we can calculate its expectation....

Contrast

The perception of an image’s content changes little when an increasing function is applied to it. However, if the function is non-increasing, then the perception of the content changes completely. We are sensible to local rather than global changes of contrast. From now on, we are denoting $u \colon \Omega \to \mathbb{R}$ our image, where $\vert \Omega \vert = M \times N$ is a discrete rectangular grid. Also, we are assuming that $u$ takes discrete values $y_0 < \dots < y_{n - 1}$ (usually $0$ to $255$)....

Clustering

Our goal in clustering is to group similar data points together. Each group will be called a cluster. Ideally, the intra-cluster distances are minimized and the inter-cluster distances are maximized. Note that this is an unsupervised model, so the following cannot be considered as clustering: Supervised classification; Simple segmentation; Results of a query; Graph partitioning. There are two types of clustering: Partitional clustering: divide data into non-overlapping subsets & each data is in exactly one subset; Hierarchical clustering: A set of nested clusters organized as a hierarchical tree....

Linear Systems

Let $A \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^m$. Our goal is to solve the linear system $A\mathbf{x} = \mathbf{b}$ where $\mathbf{x} \in \mathbb{R}^n$. Info Some facts about matrices: If $A A^\top = A^\top A$, then $A$ is normal; If $A = A^\top$, then $A$ is symmetric; If $A A^\top = A^\top A = I_n$, then $A$ is orthogonal. Norms The following norms are used often: $L_1$ norm: $\Vert \mathbf{x} \Vert_1 = \sum_{i=1}^n |x_i|$; $L_2$ norm: $\Vert \mathbf{x} \Vert_2 = \sqrt{\sum_{i=1}^n x_i^2}$; Frobenius norm: $\Vert A \Vert_F = \sqrt{\mathrm{tr}(A^\top A)}$; $L_1$ norm: $\Vert A \Vert_1 = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|$, which is the maximum absolute column sum of $A$; $L_\infty$ norm: $\Vert A \Vert_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|$, which is the maximum absolute row sum of $A$; $L_2$ norm: $\Vert A \Vert_2 = \rho(A) = \max\{\vert \lambda(A) \vert \}$, which is the spectral radius of $A$, i....

Statistics and Information

Parametric model A statistical model is parametric if the probability distribution of $X$ belongs to some family of distributions indexed by some parameter $\theta$ of finite dimension. Definition: Parametric model A parametric model is a set of probability distributions $\mathcal{P} = \{P_\theta, \theta \in \Theta\}$ with $\Theta \subset \mathbb{R}^d$ for some finite dimension $d$. Our main goal is to use the observations $X_1, \dots, X_n$ to learn the value of $\theta$....

Image aquisition

The pinhole model The first very simple way to acquire images is with the pinhole model. Here, part of the light coming from the object passes through a small aperture $O$ and is projected onto the focal plane. We do this so that each point of the object is represented by a ray. Otherwise, the image would not be formed. The distance $f$ is called the focal length. Let’s suppose we have the following model to describe the pinhole:...