$$ \newcommand{\st}{\text{ s.t. }} \newcommand{\and}{\text{ and }} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\argmax}{arg\,max} \newcommand{\R}{\mathbb{R}} \newcommand{\N}{\mathbb{N}} \newcommand{\O}{\mathcal{O}} \newcommand{\dist}{\text{dist}} \newcommand{\vec}[1]{\mathbf{#1}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\d}{\mathrm{d}} \newcommand{\L}{\mathcal{L}} \newcommand{\Tr}{\mathrm{\mathbf{Tr}}} \newcommand{\E}{\mathbb{E}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\indep}{\perp \!\!\! \perp} \newcommand{\KL}[2]{\mathrm{KL}(#1 \parallel #2)} \newcommand{\W}{\mathbf{W}} % Wasserstein distance \newcommand{\SW}{\mathbf{SW}} % Sliced-Wasserstein distance $$

Hypothesis Testing

We present different methods to test data against hypotheses.

Statistical test We consider the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$), and we must decide whether or not to reject the null hypothesis.

Definition: Null hypothesis The null hypothesis $H_0$ is the one considered true in the absence of data (the default choice). Let $\delta$ denote the decision function used to reject or not the null hypothesis: $$ \delta(x) = \begin{cases} 0 & \text{do not reject $H_0$} \\ 1 & \text{reject $H_0$ in favor of $H_1$} \end{cases} $$

Definition: Error types The Type-I error rate is the rate of false positives: $\alpha = \mathbb{P}(\delta(x) = 1 \mid H_0)$. The Type-II error rate is the rate of false negatives: $\beta = \mathbb{P}(\delta(x) = 0 \mid H_1)$.

Example If the question is “Is there a danger?”, the null hypothesis is the absence of any danger. A Type-I error corresponds to a false alarm, while a Type-II error corresponds to failing to detect the danger.

Info: Neyman-Pearson principle To perform a hypothesis test, we first set $\alpha$ (the test level) and then minimize $\beta$ as much as possible. The power of the test is $1 - \beta$.

Definition: $p$-value The $p$-value of a sample is the probability, under the null hypothesis, of observing a result at least as extreme as the one obtained.

Example Consider a test of level $\alpha = 5\%$. The null hypothesis is rejected whenever the observed sample has a $p$-value below $\alpha$. If the $p$-value is $1\%$, a result at least as extreme as the observed one has only a $1\%$ probability of occurring under the null hypothesis, so the null hypothesis is rejected with high confidence.

Parametric model In parametric models, the hypotheses form a subset of the parameters: ...
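As a minimal illustration of the $p$-value and test level, here is a hypothetical one-sided coin test: under $H_0$ the coin is fair, and the $p$-value is the probability, under $H_0$, of seeing at least as many heads as were observed (all numbers here are made up for the example).

```python
import math

# Hypothetical data: k = 60 heads observed in n = 100 flips.
# H0: p = 0.5 (fair coin) vs H1: p > 0.5.
n, k = 100, 60
alpha = 0.05  # test level: the Type-I error rate we accept

# p-value: probability under H0 of a result at least as extreme,
# i.e. P(X >= k) for X ~ Binomial(n, 1/2), computed exactly.
p_value = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n

reject = p_value < alpha  # p-value ~ 0.028 < 0.05, so H0 is rejected
```

The decision function $\delta$ is simply the comparison `p_value < alpha`: fixing $\alpha$ first is the Neyman-Pearson principle described above.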

October 15, 2024 · 7 min

Bayesian Statistics

Tip: Beta distribution The Beta distribution is a continuous probability distribution defined on the interval $[0, 1]$. It has two parameters: $\theta = (\alpha, \beta)$. Then, we have: $$ \begin{align*} p_\theta(x) &= \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)} \\ \mathbb{E}[X] &= \frac{\alpha}{\alpha + \beta} \\ \mathrm{Var}(X) &= \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}, \end{align*} $$ where $B(\alpha, \beta)$ is the Beta function, defined as: ...
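The closed-form mean and variance above can be checked numerically; the sketch below draws Beta samples with numpy and compares the empirical moments against the formulas (the shape values $\alpha = 2$, $\beta = 5$ are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 5.0  # shape parameters (alpha, beta), chosen arbitrarily

# Closed-form moments from the formulas above
mean_theory = a / (a + b)                                # E[X]
var_theory = a * b / ((a + b) ** 2 * (a + b + 1))        # Var(X)

# Monte Carlo check against the theoretical values
samples = rng.beta(a, b, size=200_000)
```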

October 7, 2024 · 4 min

Estimation methods

Consider a statistical model with unknown parameter $\theta$. We want to develop methods to find $\theta$.

Rao-Blackwell theorem

Tip For any two random variables $X$ and $Y$, $$ \begin{align*} \mathbb{E}[\mathbb{E}[X \mid Y]] &= \mathbb{E}[X] \\ \mathbb{E}[\mathbb{E}[X \mid Y]^2] &\leq \mathbb{E}[\mathbb{E}[X^2 \mid Y]] \end{align*} $$

Theorem: Rao-Blackwell theorem Let $T$ be a sufficient statistic, and let $\delta$ be an unbiased estimator of $\theta$. The estimator $\hat{\theta}$, defined as follows, is unbiased and has a quadratic risk no greater than that of $\delta$. ...
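A classical instance of Rao-Blackwellization, sketched here under the standard Bernoulli example (not taken from the truncated post): for i.i.d. $X_1, \dots, X_n \sim \text{Bernoulli}(p)$, the single observation $\delta = X_1$ is unbiased for $p$, $T = \sum_i X_i$ is sufficient, and $\mathbb{E}[X_1 \mid T] = T/n$ is the sample mean, which keeps the unbiasedness while reducing the variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, trials = 10, 0.3, 100_000
X = rng.binomial(1, p, size=(trials, n))  # i.i.d. Bernoulli(p) samples

delta = X[:, 0]            # crude unbiased estimator: first observation only
theta_rb = X.mean(axis=1)  # E[X1 | T] = T/n, the Rao-Blackwellized estimator

# Both are unbiased, but conditioning on the sufficient statistic
# shrinks the variance from p(1-p) down to p(1-p)/n.
```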

October 1, 2024 · 7 min

Efficient Estimation

Our goal is to characterize efficient estimators of $\theta$ in terms of mean squared error, using the notion of Fisher information.

Estimator Let $P_\theta$ be a probability distribution where $\theta \in \Theta \subset \mathbb{R}^d$, $d \in \mathbb{N}$.

Definition: Estimator An estimator of $\theta$ is any statistic $\hat{\theta}$ taking values in $\Theta$.

Bias We want $\hat{\theta}(X)$ to be close to $\theta$. Since the estimator is a random variable, we can compute its expectation. ...
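As a small numerical sketch of efficiency (using the standard Bernoulli example, which is an assumption here, not the post's own): the Fisher information of a single Bernoulli$(p)$ observation is $I(p) = 1/(p(1-p))$, so the Cramér-Rao lower bound for $n$ observations is $p(1-p)/n$, and the sample mean attains it.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, trials = 50, 0.4, 100_000

# MLE for Bernoulli(p): the sample mean of n observations
p_hat = rng.binomial(n, p, size=trials) / n

# Cramér-Rao lower bound: 1 / (n * I(p)) = p(1-p)/n
crlb = p * (1 - p) / n
emp_var = p_hat.var()  # matches the bound: the sample mean is efficient
```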

September 24, 2024 · 11 min

Statistics and Information

Parametric model A statistical model is parametric if the probability distribution of $X$ belongs to some family of distributions indexed by a parameter $\theta$ of finite dimension.

Definition: Parametric model A parametric model is a set of probability distributions $\mathcal{P} = \{P_\theta, \theta \in \Theta\}$ with $\Theta \subset \mathbb{R}^d$ for some finite dimension $d$.

Our main goal is to use the observations $X_1, \dots, X_n$ to learn the value of $\theta$. Note that this is possible only if each probability distribution $P_\theta \in \mathcal{P}$ is defined by a unique parameter $\theta$ (identifiability). ...
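To illustrate why each distribution must correspond to a unique parameter, here is a hypothetical non-identifiable parameterization (my own example, not from the post): take $\theta = (a, b)$ and $P_\theta = \mathcal{N}(a + b, 1)$. Then $(1, 2)$ and $(2, 1)$ index the same distribution, so no amount of data can recover $\theta$ itself, only the sum $a + b$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two distinct parameters theta = (a, b) with the same a + b = 3:
x1 = rng.normal(1 + 2, 1, size=50_000)  # theta = (1, 2)
x2 = rng.normal(2 + 1, 1, size=50_000)  # theta = (2, 1)

# The two samples are statistically indistinguishable: only a + b
# is learnable, so this parameterization is not identifiable.
```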

September 17, 2024 · 18 min