Statistics

Hypothesis Testing

We present different methods for testing data against hypotheses.

Statistical test

We consider the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$). The question is whether or not the data lead us to reject the null hypothesis.

Definition: Null hypothesis

The null hypothesis $H_0$ is the hypothesis considered true in the absence of data (the default choice).

Let $\delta$ denote the decision function used to decide whether to reject the null hypothesis:

$$ \delta(x) = \begin{cases} 0 & \text{do not reject $H_0$} \\ 1 & \text{reject $H_0$ in favor of $H_1$} \end{cases} $$

Definition: Error types

The Type-I error rate is the rate of false positives: $\alpha = \mathbb{P}(\delta(x) = 1 \mid H_0)$....
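As a rough illustration of the decision function $\delta$ and the Type-I error rate, here is a minimal sketch that assumes a two-sided z-test on the mean of Gaussian data with known standard deviation. The test, the sample size, and the names `delta` and `type_one_error_rate` are choices made for this example, not taken from the notes.

```python
import random
from statistics import NormalDist


def delta(x, mu0=0.0, sigma=1.0, alpha=0.05):
    """Decision function: 1 = reject H0 (mean == mu0), 0 = do not reject.

    Assumes a two-sided z-test with known standard deviation sigma.
    """
    n = len(x)
    z = (sum(x) / n - mu0) / (sigma / n ** 0.5)
    threshold = NormalDist().inv_cdf(1 - alpha / 2)
    return 1 if abs(z) > threshold else 0


def type_one_error_rate(n_trials=20000, n=30, alpha=0.05, seed=0):
    """Monte Carlo estimate of P(delta(x) = 1 | H0)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # Data are drawn under H0: mean 0, standard deviation 1.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        rejections += delta(x, alpha=alpha)
    return rejections / n_trials


rate = type_one_error_rate()
print(f"estimated Type-I error rate: {rate:.4f}")
```

Under these assumptions the estimated rate should land near the nominal level $\alpha = 0.05$, up to Monte Carlo noise.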

Bayesian Statistics

Tip: Beta distribution

The Beta distribution is a continuous probability distribution defined on the interval $[0, 1]$. It has two parameters, $\theta = (\alpha, \beta)$ with $\alpha, \beta > 0$. Its density, mean, and variance are:

$$ \begin{align*} p_\theta(x) &= \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)} \\ \mathbb{E}[X] &= \frac{\alpha}{\alpha + \beta} \\ \mathrm{Var}(X) &= \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} \end{align*} $$

where $B(\alpha, \beta)$ is the Beta function, defined as:

$$ B(\alpha, \beta) = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)} $$

Here, we are considering $\theta \in \Theta$ as a random variable....
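The Beta density and its moments from the tip above can be checked numerically. The sketch below uses only the standard library; the parameter values $\alpha = 2$, $\beta = 3$ and the midpoint-rule helper `integrate` are arbitrary choices made for illustration.

```python
import math


def beta_function(a, b):
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x in [0, 1]."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)


def beta_mean(a, b):
    return a / (a + b)


def beta_variance(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))


def integrate(f, n=20000):
    """Midpoint rule on [0, 1] (illustrative helper, not a library call)."""
    h = 1.0 / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h


a, b = 2.0, 3.0  # example parameters chosen for this sketch
total = integrate(lambda x: beta_pdf(x, a, b))
mean_num = integrate(lambda x: x * beta_pdf(x, a, b))
var_num = integrate(lambda x: (x - mean_num) ** 2 * beta_pdf(x, a, b))

print(total, mean_num, beta_mean(a, b), var_num, beta_variance(a, b))
```

The numerical integrals should agree with the closed-form mean and variance up to the integration error.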

Estimation methods

Consider a statistical model with unknown parameter $\theta$. We want to develop methods to estimate $\theta$.

$\DeclareMathOperator*{\argmax}{arg \,max \,} \DeclareMathOperator*{\argmin}{arg \,min \,}$

Rao-Blackwell theorem

Tip

For any two random variables $X$ and $Y$,

$$ \begin{align*} \mathbb{E}[\mathbb{E}[X \mid Y]] &= \mathbb{E}[X] \\ \mathbb{E}[\mathbb{E}[X \mid Y]^2] &\leq \mathbb{E}[\mathbb{E}[X^2 \mid Y]] \end{align*} $$

Theorem: Rao-Blackwell theorem

Let $T$ be a sufficient statistic, and let $\delta$ be an unbiased estimator of $\theta$. The estimator $\hat{\theta}$, defined as follows, is unbiased and has a quadratic risk no greater than that of $\delta$....
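The two relations in the tip above (the tower property and the conditional Jensen inequality) can be verified exactly on a small discrete example. The joint distribution below is hypothetical and chosen only so that the arithmetic is easy to follow; `expect` and `cond_expect` are helper names introduced for this sketch.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y), used only for illustration.
joint = {
    (0, 0): F(1, 4), (2, 0): F(1, 4),
    (1, 1): F(1, 8), (5, 1): F(3, 8),
}


def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


def cond_expect(g, y0):
    """E[g(X) | Y = y0]."""
    p_y = sum(p for (_, y), p in joint.items() if y == y0)
    return sum(p * g(x) for (x, y), p in joint.items() if y == y0) / p_y


# Tower property: E[E[X | Y]] = E[X].
lhs_tower = expect(lambda x, y: cond_expect(lambda u: u, y))
rhs_tower = expect(lambda x, y: x)

# Conditional Jensen: E[E[X | Y]^2] <= E[E[X^2 | Y]].
lhs_jensen = expect(lambda x, y: cond_expect(lambda u: u, y) ** 2)
rhs_jensen = expect(lambda x, y: cond_expect(lambda u: u * u, y))

print(lhs_tower, rhs_tower)    # both sides of the tower property
print(lhs_jensen, rhs_jensen)  # left side should not exceed the right side
```

Working with `Fraction` keeps the comparison exact, so the equality and the inequality hold without floating-point tolerance.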

Efficient Estimation

Our goal is to characterize efficient estimators for $\theta$ in terms of mean squared error using the notion of Fisher information.

Estimator

Let $P_\theta$ be a probability distribution where $\theta \in \Theta \subset \mathbb{R}^d$, $d \in \mathbb{N}$.

Definition: Estimator

An estimator of $\theta$ is any statistic $\hat{\theta}$ taking values in $\Theta$.

Bias

We want $\hat{\theta}(X)$ to be close to $\theta$. Since the estimator is a random variable, we can calculate its expectation....
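To make the idea of an estimator's expectation concrete, the sketch below approximates $\mathbb{E}[\hat{\theta}(X)]$ by simulation. It assumes a Gaussian model $P_\theta = \mathcal{N}(0, \theta)$ with $\theta$ the variance, and uses the variance estimator that divides by $n$, which is known to have expectation $\frac{n-1}{n}\theta$. The model, the estimator, and the function names are choices made for this example.

```python
import random


def biased_variance(x):
    """Estimator of theta = Var(X) that divides by n rather than n - 1."""
    n = len(x)
    m = sum(x) / n
    return sum((xi - m) ** 2 for xi in x) / n


def monte_carlo_expectation(estimator, theta, n=5, n_trials=40000, seed=0):
    """Approximate E[estimator(X)] when X_1..X_n are i.i.d. N(0, theta)."""
    rng = random.Random(seed)
    sd = theta ** 0.5
    total = 0.0
    for _ in range(n_trials):
        x = [rng.gauss(0.0, sd) for _ in range(n)]
        total += estimator(x)
    return total / n_trials


theta = 1.0  # true parameter, fixed for the simulation
expected = monte_carlo_expectation(biased_variance, theta)
bias = expected - theta
print(f"E[theta_hat] ~ {expected:.3f}, bias ~ {bias:.3f}")
```

With $n = 5$ and $\theta = 1$, the simulated expectation should sit near $0.8$, so the gap between the expectation and $\theta$ is clearly negative.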

Statistics and Information

Parametric model

A statistical model is parametric if the probability distribution of $X$ belongs to some family of distributions indexed by some parameter $\theta$ of finite dimension.

Definition: Parametric model

A parametric model is a set of probability distributions $\mathcal{P} = \{P_\theta, \theta \in \Theta\}$ with $\Theta \subset \mathbb{R}^d$ for some finite dimension $d$.

Our main goal is to use the observations $X_1, \dots, X_n$ to learn the value of $\theta$....
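A parametric model can be pictured as a family of sampling distributions indexed by $\theta$. The sketch below assumes the Bernoulli family with $\Theta = [0, 1]$ (so $d = 1$) and uses the sample frequency as one simple way to learn $\theta$ from observations. The class `BernoulliModel` and the function `estimate_theta` are hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BernoulliModel:
    """One member P_theta of the family {Bernoulli(theta), theta in [0, 1]}."""
    theta: float

    def sample(self, n, rng):
        """Draw n i.i.d. observations X_1, ..., X_n from P_theta."""
        return [1 if rng.random() < self.theta else 0 for _ in range(n)]


def estimate_theta(observations):
    """Sample frequency of ones, used here as a simple estimate of theta."""
    return sum(observations) / len(observations)


rng = random.Random(0)
true_model = BernoulliModel(theta=0.3)  # unknown in practice, fixed here
x = true_model.sample(10000, rng)
theta_hat = estimate_theta(x)
print(f"theta_hat = {theta_hat:.3f}")
```

With a large sample, the estimate should be close to the value of $\theta$ used to generate the data.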