The Beta distribution is a continuous probability distribution defined on the interval $[0, 1]$. It has two positive shape parameters, $\theta = (\alpha, \beta)$, and its density, mean, and variance are:
$$ \begin{align*} p_\theta(x) &= \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)} \\ \mathbb{E}[X] &= \frac{\alpha}{\alpha + \beta} \\ \mathrm{Var}(X) &= \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} \end{align*} $$where $B(\alpha, \beta)$ is the Beta function, defined as:
$$ B(\alpha, \beta) = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)} $$In the Bayesian setting below, the parameter $\theta \in \Theta$ is itself treated as a random variable. $\DeclareMathOperator*{\argmax}{arg \,max \,} \DeclareMathOperator*{\argmin}{arg \,min \,}$
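As a quick numerical sanity check of the mean and variance formulas above, here is a minimal sketch assuming NumPy/SciPy are available; the shape parameters are arbitrary illustrative values.

```python
# Numerical check of the Beta mean/variance formulas (illustrative parameters).
from scipy.stats import beta

a, b = 2.0, 5.0          # arbitrary shape parameters
dist = beta(a, b)

print(dist.mean(), a / (a + b))                           # both ~0.2857
print(dist.var(),  a * b / ((a + b) ** 2 * (a + b + 1)))  # both ~0.0255
```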
Prior and posterior distributions
For the Bernoulli model with a uniform prior, we get:
$$ \pi(\theta) = \mathbb{1}_{[0, 1]}(\theta) $$and
$$ X \mid \theta \sim \mathcal{B}(\theta). $$We can compute the distribution of $\theta$ after observing $x$ using Bayes's theorem:
$$ p(\theta \mid x) = \frac{p(x \mid \theta) \pi(\theta)}{p(x)}. $$Here, $p(x)$ does not depend on $\theta$, so it acts as a normalizing constant. The resulting distribution is the posterior of $\theta$.
The posterior $p(\theta \mid x)$ is the distribution of $\theta$ after observation; it depends on the prior:
$$ p(\theta \mid x) \propto \pi(\theta) p(x \mid \theta) $$For the Bernoulli model with a uniform prior, we have $p(x \mid \theta) = \theta^x (1 - \theta)^{1 - x}$ and $\pi(\theta) = \mathbb{1}_{[0, 1]}(\theta)$, so that:
$$ \begin{align*} p(\theta \mid X = 0) &= \mathbb{1}_{[0, 1]}(\theta) (1 -\theta), \\ p(\theta \mid X = 1) &= \mathbb{1}_{[0, 1]}(\theta) \theta. \end{align*} $$Since the posterior is also a distribution, it must obey the usual property of distributions:
$$ \int_{-\infty}^{\infty} p(\theta \mid x) \, \mathrm{d}\theta = 1. $$Normalizing accordingly (for instance, $\int_0^1 (1 - \theta) \, \mathrm{d}\theta = \tfrac{1}{2}$, so the normalizing constant is $2$), we deduce that
$$ \begin{align*} p(\theta \mid X = 0) &= 2 (1 - \theta) \mathbb{1}_{[0, 1]}(\theta), \\ p(\theta \mid X = 1) &= 2 \theta \mathbb{1}_{[0, 1]}(\theta). \end{align*} $$These are $\mathrm{Beta}(1, 2)$ and $\mathrm{Beta}(2, 1)$ distributions, respectively.
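As a small sketch (assuming NumPy and SciPy; the grid resolution is an arbitrary choice), the same posteriors can be recovered numerically by multiplying the prior by the likelihood on a grid, normalizing, and comparing with the $\mathrm{Beta}(1, 2)$ and $\mathrm{Beta}(2, 1)$ densities.

```python
# Recover p(theta | X = x) numerically from prior * likelihood, then compare
# with the corresponding Beta density; the difference should be ~0 up to grid error.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import beta

theta = np.linspace(0.0, 1.0, 1001)
prior = np.ones_like(theta)                    # uniform prior on [0, 1]

for x, (a, b) in [(0, (1, 2)), (1, (2, 1))]:
    likelihood = theta ** x * (1 - theta) ** (1 - x)
    unnormalized = prior * likelihood
    posterior = unnormalized / trapezoid(unnormalized, theta)   # enforce integral = 1
    print(x, np.max(np.abs(posterior - beta(a, b).pdf(theta))))
```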
Bayes metrics
The Bayes bias and the Bayes quadratic risk are the bias and the quadratic risk averaged according to the prior distribution; the expectations below are taken over both $\theta \sim \pi(\theta)$ and the observation $x$:
$$ \begin{align*} b &= \mathbb{E}[\hat{\theta} - \theta] \\ R &= \mathbb{E}[(\hat{\theta} - \theta)^2] \end{align*} $$
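As an illustration, the Bayes quadratic risk can be approximated by Monte Carlo. The sketch below (assuming NumPy; the sample size is arbitrary) uses the estimator $\hat{\theta}(0) = \tfrac{1}{3}$, $\hat{\theta}(1) = \tfrac{2}{3}$ derived in the next section; direct integration gives the exact value $\tfrac{1}{18} \approx 0.056$.

```python
# Monte Carlo estimate of the Bayes quadratic risk for the Bernoulli model
# with a uniform prior and a single observation.
import numpy as np

rng = np.random.default_rng(0)
n_sim = 200_000

theta = rng.uniform(0.0, 1.0, size=n_sim)    # theta drawn from the prior
x = rng.binomial(1, theta)                    # X | theta ~ Bernoulli(theta)
theta_hat = np.where(x == 1, 2 / 3, 1 / 3)    # Bayes estimator (see next section)

print(np.mean((theta_hat - theta) ** 2))      # ~ 1/18
```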
Bayes estimator
Given the posterior distribution $p(\theta \mid x)$, a natural choice of estimator for $\theta$ is its expected value. This is called the Bayes estimator.
Consider the Bernoulli model with a uniform prior. We already know that
$$ \begin{align*} p(\theta \mid X = 0) &= 2 (1 - \theta) \mathbb{1}_{[0, 1]}(\theta), \\ p(\theta \mid X = 1) &= 2 \theta \mathbb{1}_{[0, 1]}(\theta). \end{align*} $$Then,
$$ \begin{align*} \hat{\theta}(0) &= \mathbb{E}[\theta \mid X = 0] \\ &= \int_{-\infty}^{\infty} \theta \times 2 (1 - \theta) \mathbb{1}_{[0, 1]}(\theta) \, \mathrm{d} \theta \\ &= \int_0^1 2 \theta - 2 \theta^2 \, \mathrm{d} \theta \\ &= \frac{1}{3}. \end{align*} $$Similarly, $\hat{\theta}(1) = \frac{2}{3}$.
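The same values can be read off directly from the posterior distributions identified above, for example with scipy.stats (a one-line check, not part of the derivation):

```python
# Posterior means for the Bernoulli model with a uniform prior and one observation.
from scipy.stats import beta

print(beta(1, 2).mean())   # 1/3 = E[theta | X = 0]
print(beta(2, 1).mean())   # 2/3 = E[theta | X = 1]
```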
Maximum a posteriori
Another natural choice for the estimator is the mode of the posterior distribution. This is called the Maximum A Posteriori (MAP) estimator.
Consider the Bernoulli model with a uniform prior. We already know that
$$ \begin{align*} p(\theta \mid X = 0) &= 2 (1 - \theta) \mathbb{1}_{[0, 1]}(\theta), \\ p(\theta \mid X = 1) &= 2 \theta \mathbb{1}_{[0, 1]}(\theta). \end{align*} $$Then,
$$ \begin{align*} \hat{\theta}(0) &= \argmax_{\theta \in [0, 1]}{2 (1 - \theta) \mathbb{1}_{[0, 1]}(\theta)} \\ &= 0 \end{align*} $$Similarly, $\hat{\theta}(1) = 1$. Note that in this example the MAP estimator coincides with the maximum likelihood estimator (MLE).
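A small grid search (a sketch assuming NumPy; the grid resolution is arbitrary) confirms where the two posterior densities peak:

```python
# Locate the mode of each posterior on a grid over [0, 1].
import numpy as np

theta = np.linspace(0.0, 1.0, 1001)

post_x0 = 2 * (1 - theta)     # p(theta | X = 0)
post_x1 = 2 * theta           # p(theta | X = 1)

print(theta[np.argmax(post_x0)])  # 0.0 -> MAP estimate given X = 0
print(theta[np.argmax(post_x1)])  # 1.0 -> MAP estimate given X = 1
```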
Conjugate prior
A prior is said to be conjugate for a model when the posterior belongs to the same family of distributions as the prior. Conjugacy makes the posterior much easier to compute: updating the prior reduces to updating the parameters of the family.
Consider the Bernoulli model with $n$ observations. The Beta distribution is a conjugate prior. Let $s = \sum_{i = 1}^n x_i$ and assume the prior is a Beta distribution with parameters $(a, b)$:
$$ \pi(\theta) \propto \theta^{a - 1} (1 - \theta)^{b - 1}. $$Since $p(x \mid \theta) = \theta^s (1 - \theta)^{n - s}$, we have the following:
$$ \begin{align*} p(\theta \mid x) &\propto \pi(\theta) \times p(x \mid \theta) \\ &\propto \theta^{a - 1} (1 - \theta)^{b - 1} \times \theta^s (1 - \theta)^{n - s} \\ &\propto \theta^{a + s - 1} (1 - \theta)^{b + n - s - 1} \end{align*} $$which is the kernel of a $\mathrm{Beta}(a + s, b + n - s)$ density. Therefore, the posterior is again a Beta distribution, which confirms that the Beta prior is conjugate for the Bernoulli model.
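To illustrate the update in practice, here is a minimal sketch assuming NumPy and SciPy; `theta_true`, `a`, `b`, and `n` are arbitrary illustrative choices, not values from the text.

```python
# Conjugate update: Beta(a, b) prior + n Bernoulli observations
# -> Beta(a + s, b + n - s) posterior.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
theta_true, a, b, n = 0.3, 2.0, 2.0, 100

x = rng.binomial(1, theta_true, size=n)
s = x.sum()

posterior = beta(a + s, b + n - s)
print(posterior.mean())          # (a + s) / (a + b + n), close to theta_true
print(posterior.interval(0.95))  # a 95% credible interval for theta
```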