$$ \newcommand{\st}{\text{ s.t. }} \newcommand{\and}{\text{ and }} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\liminf}{lim\,inf} \DeclareMathOperator*{\limsup}{lim\,sup} \DeclareMathOperator*{\dom}{dom} \DeclareMathOperator*{\epi}{epi} \newcommand{\<}{\langle} \newcommand{\>}{\rangle} \newcommand{\R}{\mathbb{R}} \newcommand{\N}{\mathbb{N}} \newcommand{\O}{\mathcal{O}} \newcommand{\dist}{\text{dist}} \newcommand{\vec}[1]{\mathbf{#1}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\d}{\mathrm{d}} \newcommand{\L}{\mathcal{L}} \newcommand{\H}{\mathcal{H}} \newcommand{\Tr}{\mathrm{\mathbf{Tr}}} \newcommand{\E}{\mathbb{E}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\indep}{\perp \!\!\! \perp} \newcommand{\KL}[2]{\mathrm{KL}(#1 \parallel #2)} \newcommand{\W}{\mathbf{W}} % Wasserstein distance \newcommand{\SW}{\mathbf{SW}} % Sliced-Wasserstein distance $$

Rule Mining

Goal: Identify items that are bought together by sufficiently many customers. We will do so by finding dependencies among items from data collected. Our input is composed of a table of items and a table of baskets. Each basket is a list of items purchased together. Then, we use the baskets to find dependencies of the form $\{x, y, z\} \rightarrow \{v, w\}$. Definition: Support The support of a set of items $I$ is the number of baskets that contain all the items in $I$. Given a support threshold, we can classify itemsets as frequent of not. ...

November 6, 2024 · 2 min

Non-Linear Programming

Tip: Vectorial product rule Let $F, G \colon \mathbb{R}^n \to \mathbb{R}^n$. Then, $$ \nabla (F(X)^\top G(X)) = \nabla F(X)^\top G(X) + \nabla G(X)^\top F(X) $$ $$ \DeclareMathOperator*{\argmax}{arg \,max \,} \DeclareMathOperator*{\argmin}{arg \,min \,} $$Without constraints We are interested in minimizing an arbitrary function $f \colon \mathbb{R}^n \to \mathbb{R}$. To do so, we will use the gradient descent method. Let $\mathbf{x}^{(0)} \in \mathbb{R}^n$ be a random starting point. Then, for $k \in \mathbb{N}$, ...

November 5, 2024 · 2 min

Linear Programming: Simplex method

Simplex We want to solve the following problem: $$ \max{z} = \sum_{j = 1}^n c_j x_j $$ where $$ \begin{cases} \sum_{j = 1}^n a_{ij} x_j &\leq b_i, \qquad i \in \{1, \dots, m\} \\ x_j &\geq 0, \qquad j \in \{1, \dots, n\} \end{cases} $$Then, we find a tuple $x^\ast = (x_1^\ast, \dots, x_n^\ast)$ that is the solution of the optimization problem. To this end, we build dictionaries such that ...

October 30, 2024 · 4 min

Mathematical Morphology

In mathematical morphology, the basic structure of an image is a complete lattice. Definition: Complete lattice A complete lattice is a set $K$ equipped with an order relation $\leq$ that satisfies: Reflexivity: $\forall x \in K$, $x \leq x$; Antisymmetry: $\forall x, y \in K$, $x \leq y$ and $y \leq x$ implies $x = y$; Transitivity: $\forall x, y, z \in K$, $x \leq y$ and $y \leq z$ implies $x \leq z$; $\forall x, y \in K$, the supremum and infimum of $x$ and $y$ exist and are denoted $x \lor y$ and $x \land y$ respectively. We will focus on the boolean lattice and the function lattice. ...

October 21, 2024 · 5 min

Hypothesis Testing

We present different methods to test data against hypotheses. Statistical test We consider the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$). We are interested in rejecting or not the null hypothesis. Definition: Null hypothesis The null hypothesis $H_0$ is that considered true in the absence of data (default choice). Here, let $\delta$ denote the decision function used to reject or not the null hypothesis. $$ \delta(x) = \begin{cases} 0 & \text{do not reject $H_0$} \\ 1 & \text{reject $H_0$ in favor of $H_1$} \end{cases} $$ Definition: Error types The Type-I error rate is the rate of false positives: $\alpha = \mathbb{P}(\delta(x) = 1 \mid H_0)$. The Type-II error rate is the rate of false negatives: $\beta = \mathbb{P}(\delta(x) = 0 \mid H_1)$. Example If the question is “Is there a danger?”, the null hypothesis is the absence of any danger. A type-I error corresponds to a false alarm, while a type-II error rate corresponds to the non-detection of the danger. Info: Neyman-Pearson principle In order to do a hypothesis test, we first set $\alpha$ (test level) and, then, we try to minimize $\beta$ as much as possible. The power of the test is $1 - \beta$. Definition: $p$-value The $p$-value of a sample is the probability of observing a given value under the null hypothesis. Example Consider a test of level $\alpha = 5\%$. The null hypothesis is rejected whenever the observed sample has a $p$-value below $\alpha$. If the $p$-value is $1\%$, there is a $1\%$ (or less) probability of having observed this sample under the null hypothesis. So, the null hypothesis is rejected with high confidence. Parametric model In parametric models, the hypotheses form a subset of the parameters: ...

October 15, 2024 · 7 min