# Active Learning using uncertainties in the Posterior Predictive Distribution with Bayesian Linear Ridge Regression in Python

The following problems appeared as a project in the edX course ColumbiaX: CSMM.102x Machine Learning. The problem description below is taken from the course project itself.

### INSTRUCTIONS

This assignment has two parts that we need to implement.

#### PART 1

In this part we need to implement the ℓ2-regularized least squares linear regression algorithm (ridge regression). The objective function takes the following form:

$$w_{RR} = \arg\min_{w} \; \lVert y - Xw \rVert^2 + \lambda \lVert w \rVert^2$$

The task will be to implement a function that takes in data y and X and outputs w_RR for an arbitrary value of λ.

Since ridge regression has a closed-form solution, the weights can be computed in a single step:

$$w_{RR} = (\lambda I + X^T X)^{-1} X^T y$$

We need to exclude the intercept from the ℓ2 penalty, either by scaling the data appropriately or by explicitly setting the corresponding diagonal entry of the identity matrix to zero.
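The closed-form ridge solution described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's reference implementation: the function name `ridge_weights`, the `intercept_col` parameter, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def ridge_weights(X, y, lam, intercept_col=None):
    """Closed-form ridge solution w_RR = (lam * P + X^T X)^{-1} X^T y.

    P is the identity matrix, except that the diagonal entry for
    `intercept_col` (hypothetical parameter) is zeroed when given,
    so that the intercept is not penalized.
    """
    d = X.shape[1]
    penalty = np.eye(d)
    if intercept_col is not None:
        penalty[intercept_col, intercept_col] = 0.0
    # Solve the linear system rather than forming an explicit inverse.
    return np.linalg.solve(lam * penalty + X.T @ X, X.T @ y)

# Synthetic data (NOT the course data set): 4 features plus a column of ones.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(50, 4)), np.ones((50, 1))])
w_true = np.array([1.0, -2.0, 0.5, 3.0, 4.0])
y = X @ w_true + 0.1 * rng.normal(size=50)

w_rr = ridge_weights(X, y, lam=2.0, intercept_col=4)
print(w_rr)
```

Using `np.linalg.solve` avoids computing the matrix inverse explicitly, which is generally the more numerically stable choice. With `lam=0` the function reduces to ordinary least squares.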

The following table shows a few rows of the training data along with the intercept term as the last column.

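|   | 0     | 1     | 2       | 3     | 4 |
|---|-------|-------|---------|-------|---|
| 0 | 8.34  | 40.77 | 1010.84 | 90.01 | 1 |
| 1 | 23.64 | 58.49 | 1011.40 | 74.20 | 1 |
| 2 | 29.74 | 56.90 | 1007.15 | 41.91 | 1 |
| 3 | 19.07 | 49.69 | 1007.22 | 76.79 | 1 |
| 4 | 11.80 | 40.66 | 1017.13 | 97.20 | 1 |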

The following figures show the Ridge coefficient paths with different λ values.
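A coefficient path of this kind can be computed by solving the ridge problem over a grid of λ values. The sketch below is only illustrative: `ridge_path`, the λ grid, and the synthetic data are assumptions for the example, and for simplicity every coefficient is penalized (there is no intercept column here).

```python
import numpy as np

def ridge_path(X, y, lambdas):
    """Return an array of shape (len(lambdas), d): one ridge solution per lambda."""
    d = X.shape[1]
    gram, xty = X.T @ X, X.T @ y
    return np.array(
        [np.linalg.solve(lam * np.eye(d) + gram, xty) for lam in lambdas]
    )

# Synthetic data (NOT the course data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + 0.1 * rng.normal(size=40)

lambdas = np.logspace(-2, 4, 25)   # hypothetical grid of penalty strengths
path = ridge_path(X, y, lambdas)
norms = np.linalg.norm(path, axis=1)
print(norms[0], norms[-1])         # the weight norm shrinks as lambda grows
```

Plotting each column of `path` against `lambdas` on a logarithmic axis gives one curve per coefficient, with all curves shrinking toward zero as λ increases.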

#### PART 2

The next task is to implement the active learning procedure. For this problem, we are given an arbitrary setting of λ and σ^2 and asked to provide the first 10 locations to measure from a set D = {x}, given a set of measured pairs (y, X). We need to be careful about the sequential evolution of the sets D and (y, X).

The following theory from Bayesian linear regression is used to compute the uncertainty of each prediction under the posterior distribution. For active learning, we repeatedly select the data point from the new dataset for which the model's prediction is most uncertain, and then apply the Bayesian sequential posterior update, as shown in the following figures:
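The selection loop can be sketched as follows, under the standard Bayesian linear regression assumptions of a prior w ~ N(0, λ⁻¹I) and a likelihood y ~ N(Xw, σ²I). In that setting the posterior covariance is Σ = (λI + σ⁻²XᵀX)⁻¹, the predictive variance at a candidate x₀ is σ² + x₀ᵀΣx₀, and measuring x₀ updates the precision by adding σ⁻²x₀x₀ᵀ. The function name `active_learning_order`, its parameters, and the synthetic data are assumptions for illustration, not the course's reference solution.

```python
import numpy as np

def active_learning_order(X, X_pool, lam, sigma2, n_select=10):
    """Return 0-based row indices of X_pool in the order they would be measured."""
    d = X.shape[1]
    # Posterior precision of w given the already measured inputs X.
    precision = lam * np.eye(d) + (X.T @ X) / sigma2
    remaining = list(range(X_pool.shape[0]))
    chosen = []
    for _ in range(min(n_select, len(remaining))):
        cov = np.linalg.inv(precision)
        cand = X_pool[remaining]
        # Predictive variance sigma2 + x0^T Sigma x0 for every remaining candidate.
        var = sigma2 + np.einsum("ij,jk,ik->i", cand, cov, cand)
        # Pick the most uncertain candidate and remove it from the pool.
        best = remaining.pop(int(np.argmax(var)))
        chosen.append(best)
        # Sequential posterior update: add the new point to the precision matrix.
        x0 = X_pool[best]
        precision += np.outer(x0, x0) / sigma2
    return chosen

# Synthetic data (NOT the course data set).
rng = np.random.default_rng(1)
X_measured = rng.normal(size=(30, 3))
X_pool = rng.normal(size=(20, 3))
order = active_learning_order(X_measured, X_pool, lam=1.0, sigma2=2.0)
print(order)
```

Note that the covariance update depends only on the chosen input x₀ and not on its response, so the selection order can be computed before any new measurement is taken. The returned indices are 0-based; an assignment that expects 1-based indices would need them shifted by one.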

The following animation shows the first 10 data points chosen using active learning with different λ and σ^2 parameters:

λ = 1, σ^2 = 2

λ = 2, σ^2 = 3

λ = 10, σ^2 = 5