Bayesian Inference

Bayes' theorem provides the mathematical basis for solving estimation problems. Suppose a discretised system S can be described by a vector $\bf x$ containing the unknown parameters one would like to estimate; $\bf x$ will be referred to as the parameter vector. Let us assume that a number of measurements of the system S are available, represented by a vector $\bf d$. Using Bayes' rule, one can write

{\rm p} \left( {\bf x} \vert {\bf d} \right) = \frac{{\rm p} \left( {\bf d} \vert {\bf x} \right) {\rm p} \left( {\bf x} \right)}{\displaystyle\int {\rm p} \left( {\bf d} \vert {\bf x} \right) {\rm p} \left( {\bf x} \right) {\rm d}{\bf x}}
where
• ${\rm p} \left( {\bf x} \vert {\bf d} \right)$ is the posterior probability density function of $\bf x$. It corresponds to the probability density of the parameter of interest, namely $\bf x$, given the measurements $\bf d$. ${\rm p} \left( {\bf x} \vert {\bf d} \right)$ therefore combines the information provided by the measurements with the prior knowledge of $\bf x$.
• ${\rm p} \left( {\bf d} \vert {\bf x} \right)$ is the likelihood function. It is derived directly from the measurement equation that relates the measurements to the state vector and the noise process that corrupts them. In most scenarios, this function can be evaluated without much conceptual or computational difficulty.
• ${\rm p} \left( {\bf x} \right)$ is the prior probability density function of $\bf x$. It represents the knowledge of the state vector prior to the assimilation of the measurements.
The main difficulty in implementing Bayes' rule in practice is the evaluation of the normalizing integral (i.e. the denominator in the above equation). In the general case, this integral cannot be obtained analytically, and its numerical approximation becomes increasingly expensive as the dimension of $\bf x$ grows. Fortunately, Markov Chain Monte Carlo (MCMC) methods provide samples from complex distributions while circumventing the need for high-dimensional integration. MCMC methods generate each new sample from the previous sample(s), thereby constructing a Markov chain whose stationary distribution is the target distribution. They are rooted in the Metropolis algorithm, which was initially employed by physicists to compute complex integrals by rewriting them as expectations under suitable distributions, from which MCMC provides samples.
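As an illustration, the Metropolis algorithm can be sketched in a few lines. The sampler below is a generic random-walk Metropolis implementation; it works only with the logarithm of the unnormalised target density, so the normalizing integral discussed above is never evaluated. The step size, sample count, and the standard-normal example target are illustrative choices, not prescribed by the text.

```python
import numpy as np

def metropolis(log_target, x0, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis sampler.

    log_target: log of the (unnormalised) target density; the
    unknown normalizing constant cancels in the acceptance ratio.
    """
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    logp = log_target(x)
    chain = np.empty((n_samples, x.size))
    for i in range(n_samples):
        prop = x + step * rng.standard_normal(x.size)  # symmetric proposal
        logp_prop = log_target(prop)
        # accept with probability min(1, target(prop) / target(x))
        if np.log(rng.random()) < logp_prop - logp:
            x, logp = prop, logp_prop
        chain[i] = x  # on rejection, the current sample is repeated
    return chain

# Example: sample a standard normal from its log density (up to a constant)
samples = metropolis(lambda x: -0.5 * np.sum(x**2), x0=[0.0], n_samples=20000)
```

Because only the ratio of target densities enters the accept/reject step, the sampler can be applied directly to a posterior known only up to its normalizing constant.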

Application: Signal Identification

In this example, Bayesian inference will be used for identification of a noise-contaminated sinusoidal signal. More specifically, the problem involves identifying the parameters $a_1, a_2, a_3$ for

y\left( t\right) = a_1 {\rm cos} \left( a_3 t\right) + a_2 {\rm sin} \left( a_3 t\right)

The noisy measurements $y_k$, $k = 1, \ldots, N$ are given by

y_k = a_1 {\rm cos} \left( a_3 t_k\right) + a_2 {\rm sin} \left( a_3 t_k\right) + n_k = \hat{y}_k + n_k

where $n_k$ are independent and identically distributed (IID) Gaussian random variables with zero mean and variance $\gamma$, modelling Gaussian white noise.
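For concreteness, such a noisy record can be simulated directly from the measurement equation above. The true parameter values, noise variance, and time grid used here are purely illustrative assumptions, not values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative (assumed) true parameters: amplitudes a1, a2 and frequency a3
a1, a2, a3 = 1.0, 0.5, 2.0
gamma = 0.05                      # assumed noise variance
N = 100
t = np.linspace(0.0, 10.0, N)     # measurement times t_k

y_hat = a1 * np.cos(a3 * t) + a2 * np.sin(a3 * t)      # noise-free signal
y = y_hat + np.sqrt(gamma) * rng.standard_normal(N)    # y_k = y_hat_k + n_k
```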

According to Bayes' theorem, the posterior pdf of $a_1, a_2, a_3$ is proportional to the product of likelihood function and prior pdf:

{\rm p} (a_1, a_2, a_3 \vert y_1, \ldots, y_N) \ \propto \ {\rm p}(y_1, \ldots, y_N \vert a_1, a_2, a_3) {\rm p}(a_1, a_2, a_3)

where, since the noise terms are independent, the likelihood pdf factorises as
{\rm p} (y_1, \ldots, y_N \vert a_1, a_2, a_3) \propto \prod_{i=1}^{N} {\rm p} (y_i \vert a_1, a_2, a_3)
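Since each $n_k$ is zero-mean Gaussian with variance $\gamma$, each factor takes the explicit form (writing $\hat{y}_i = a_1 {\rm cos}\left( a_3 t_i \right) + a_2 {\rm sin}\left( a_3 t_i \right)$)

{\rm p} (y_i \vert a_1, a_2, a_3) = \frac{1}{\sqrt{2 \pi \gamma}} \exp \left( - \frac{\left( y_i - \hat{y}_i \right)^2}{2 \gamma} \right)

so that the log-likelihood reduces, up to an additive constant, to a sum of squared residuals:

\log {\rm p} (y_1, \ldots, y_N \vert a_1, a_2, a_3) = - \frac{1}{2 \gamma} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 + {\rm const}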

For this example, as the data is sufficiently dense in time, a flat prior will be used, i.e. ${\rm p}(a_1, a_2, a_3) \propto 1$, so that the posterior pdf is proportional to the likelihood function. In certain cases, the likelihood function needs to be complemented by an informative prior to provide a meaningful estimate of the parameters; this depends on how much information regarding the unknown parameters one can extract from the measurements. Here, it will be assumed that the available observations contain enough information to provide meaningful estimates.

Possible realizations of the true signal are obtained from MCMC simulation of the posterior pdf of $a_1, a_2, a_3$. These, along with the marginal posterior distributions of the parameters, are shown below using 100,000 MCMC samples. The black curve represents the true signal, for which only noisy observations are available (shown using red circles). One can observe the unimodality of the marginal pdfs, indicating the well-posedness of the problem.
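The procedure described above can be sketched end to end: simulate noisy data, define the log-posterior (flat prior, so just the Gaussian log-likelihood), and run a random-walk Metropolis chain over $(a_1, a_2, a_3)$. All numerical values below (true parameters, noise variance, proposal step size, starting point) are illustrative assumptions rather than the settings used for the figures in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed true parameters and synthetic measurements (illustrative values)
a_true = np.array([1.0, 0.5, 2.0])
gamma = 0.05
N = 100
t = np.linspace(0.0, 10.0, N)
y = (a_true[0] * np.cos(a_true[2] * t)
     + a_true[1] * np.sin(a_true[2] * t)
     + np.sqrt(gamma) * rng.standard_normal(N))

def log_post(a):
    """Log-posterior: flat prior, hence the Gaussian log-likelihood up to a constant."""
    y_hat = a[0] * np.cos(a[2] * t) + a[1] * np.sin(a[2] * t)
    return -0.5 / gamma * np.sum((y - y_hat) ** 2)

# Random-walk Metropolis over (a1, a2, a3)
n_samples, step = 100_000, 0.01
chain = np.empty((n_samples, 3))
a = np.array([0.9, 0.4, 1.95])       # rough starting guess
lp = log_post(a)
for i in range(n_samples):
    prop = a + step * rng.standard_normal(3)
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:   # Metropolis acceptance
        a, lp = prop, lp_prop
    chain[i] = a

post_mean = chain[n_samples // 2:].mean(axis=0)   # discard first half as burn-in
```

Histograms of the retained columns of `chain` approximate the marginal posterior pdfs of $a_1, a_2, a_3$; unimodal, concentrated histograms correspond to the well-posedness noted above.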