The following problems appeared as a project in the *edX course ColumbiaX: CSMM.102x Machine Learning*. The following problem description is taken from the course project itself.

### INSTRUCTIONS

This assignment has two parts that we need to implement.

#### PART 1

In this part we need to implement the **ℓ2-regularized least squares linear regression** algorithm (**ridge regression**). The *objective function* takes the following form:

$$\mathcal{L}(w) = \lambda \|w\|^2 + \|y - Xw\|^2$$

The task is to implement a function that takes data **y** and **X** and outputs **w_RR** for an arbitrary value of **λ**.

Since a closed-form solution for *ridge regression* exists (shown in the following figure), the *weights* can be computed in a single step:

$$w_{RR} = \left(\lambda I + X^T X\right)^{-1} X^T y$$

We need to be sure to **exclude** the **intercept** from the **ℓ2** penalty, either by scaling the data appropriately or by explicitly setting the corresponding diagonal entry of the identity matrix to zero.
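A minimal NumPy sketch of the closed-form solution, using the second option above (zeroing the intercept's entry in the penalty matrix). It assumes the intercept is the last column of **X**, as in the training-data table; the function name `ridge_regression` is hypothetical. It solves the linear system instead of forming an explicit inverse, which is numerically more stable.

```python
import numpy as np


def ridge_regression(X, y, lam):
    """Closed-form ridge regression: solve (lam * I' + X^T X) w = X^T y.

    I' is the identity matrix with the intercept entry set to zero, so the
    intercept (assumed to be the LAST column of X) is not penalized.
    """
    d = X.shape[1]
    penalty = lam * np.eye(d)
    penalty[-1, -1] = 0.0  # exclude the intercept from the l2 penalty
    # np.linalg.solve avoids computing (lam * I' + X^T X)^{-1} explicitly.
    return np.linalg.solve(penalty + X.T @ X, X.T @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: 4 features plus a trailing column of ones for the intercept.
    X = np.hstack([rng.normal(size=(50, 4)), np.ones((50, 1))])
    y = X @ np.array([1.0, -2.0, 0.5, 3.0, 10.0]) + rng.normal(scale=0.1, size=50)
    print(ridge_regression(X, y, lam=2.0))
```

With `lam=0` this reduces to ordinary least squares; as `lam` grows, the feature weights shrink toward zero while the unpenalized intercept approaches the mean of **y**.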

The following table shows a few rows of the training data along with the intercept term as the last column.

| | 0 | 1 | 2 | 3 | 4 (intercept) |
|---|---|---|---|---|---|
| 0 | 8.34 | 40.77 | 1010.84 | 90.01 | 1 |
| 1 | 23.64 | 58.49 | 1011.40 | 74.20 | 1 |
| 2 | 29.74 | 56.90 | 1007.15 | 41.91 | 1 |
| 3 | 19.07 | 49.69 | 1007.22 | 76.79 | 1 |
| 4 | 11.80 | 40.66 | 1017.13 | 97.20 | 1 |
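The feature columns sit on very different scales (column 2 is around 1000, column 0 around 10), and the ℓ2 penalty is not scale-invariant. So a common preprocessing step is to standardize the features before appending the intercept column. The sketch below uses the five rows from the table; the helper names `standardize` and `add_intercept` are hypothetical. It is an illustration only, since the original project may not have standardized its data.

```python
import numpy as np

# Raw feature values from the five rows of the training-data table (columns 0-3).
features = np.array([
    [8.34, 40.77, 1010.84, 90.01],
    [23.64, 58.49, 1011.40, 74.20],
    [29.74, 56.90, 1007.15, 41.91],
    [19.07, 49.69, 1007.22, 76.79],
    [11.80, 40.66, 1017.13, 97.20],
])


def standardize(F):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    return (F - F.mean(axis=0)) / F.std(axis=0)


def add_intercept(F):
    """Append a column of ones as the last column, matching the table layout."""
    return np.hstack([F, np.ones((F.shape[0], 1))])


X = add_intercept(standardize(features))
print(X)
```

In practice, the mean and standard deviation should be computed on the training set only and then reused to transform the test data.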

The following figures show the **ridge coefficient paths** for different values of **λ**.
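Coefficient paths like these can be computed by re-solving the closed-form system over a grid of **λ** values. This is a minimal NumPy sketch. It assumes the intercept is the last column of **X** and leaves it unpenalized, and `ridge_path` is a hypothetical name. Only the penalty term depends on **λ**, so XᵀX and Xᵀy are computed once.

```python
import numpy as np


def ridge_path(X, y, lambdas):
    """Ridge solutions for each value in `lambdas`, shape (len(lambdas), d).

    Assumes the intercept is the last column of X and leaves it unpenalized.
    """
    d = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y  # computed once; only the penalty depends on lambda
    mask = np.ones(d)
    mask[-1] = 0.0  # zero penalty on the intercept
    return np.array([np.linalg.solve(lam * np.diag(mask) + XtX, Xty) for lam in lambdas])
```

Plotting each column of the returned array against **λ** (typically on a log scale, e.g. `np.logspace(-2, 4, 50)`) gives one path per coefficient. The norm of the penalized coefficients is non-increasing as **λ** grows.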

#### PART 2

The next task is to implement the **active learning procedure**. For this problem, we are given an arbitrary setting of λ and σ² and asked to provide the first 10 locations to be measured from a set D = {x}, given a set of measured pairs **(y, X)**. We need to be careful about how the sets **D** and **(y, X)** evolve sequentially: each selected point is removed from **D** and, once measured, added to **(y, X)**.

We use the following results from **Bayesian linear regression**. The **posterior distribution** gives the **uncertainty** of the **prediction** at each candidate point. **Active learning** picks the point in the new dataset for which the model is most uncertain, and the **Bayesian sequential posterior update** then incorporates that point, as shown in the following figures:
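A sketch of the standard results, assuming a Gaussian prior w ~ N(0, λ⁻¹I) and Gaussian noise with variance σ². Given measured pairs **(y, X)**, the posterior over **w** and the predictive distribution at a candidate x₀ are:

$$\Sigma = \left(\lambda I + \sigma^{-2} X^T X\right)^{-1}, \qquad \mu = \left(\lambda \sigma^2 I + X^T X\right)^{-1} X^T y$$

$$y_0 \mid x_0, y, X \sim \mathcal{N}\left(x_0^T \mu,\; \sigma^2 + x_0^T \Sigma x_0\right)$$

After measuring (x₀, y₀), the sequential update adds one rank-one term to the posterior precision:

$$\Sigma_{\text{new}}^{-1} = \Sigma^{-1} + \sigma^{-2} x_0 x_0^T, \qquad \mu_{\text{new}} = \left(\lambda \sigma^2 I + X^T X + x_0 x_0^T\right)^{-1}\left(X^T y + x_0 y_0\right)$$

The posterior mean μ is the ridge solution from Part 1 with penalty λσ². The predictive variance σ² + x₀ᵀΣx₀ is the quantity that active learning maximizes over the candidates in D.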

The following animations show the **first 10 data points** chosen using **active learning** with different **λ** and **σ²** parameters:

*λ = 1, σ² = 2*

*λ = 2, σ² = 3*

*λ = 10, σ² = 5*
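The selection loop can be sketched in NumPy as follows. It assumes a prior w ~ N(0, λ⁻¹I), noise variance σ², and that D is given as a matrix whose rows are the candidate vectors. `active_learning` is a hypothetical name, and it returns 0-based row indices into D. Because the posterior covariance does not depend on the measured responses, **y** is not needed to choose the points; it only enters the posterior mean.

```python
import numpy as np


def active_learning(X, D, lam, sigma2, k=10):
    """Greedily pick k rows of D with the largest predictive variance.

    X: measured inputs, shape (n, d). D: candidate inputs, shape (m, d).
    Returns 0-based row indices into D, in selection order.
    """
    d = X.shape[1]
    # Posterior precision (inverse covariance) from the already-measured inputs.
    precision = lam * np.eye(d) + X.T @ X / sigma2
    remaining = list(range(D.shape[0]))
    chosen = []
    for _ in range(k):
        Sigma = np.linalg.inv(precision)
        cand = D[remaining]
        # Predictive variance sigma2 + x^T Sigma x for every remaining candidate.
        var = sigma2 + np.einsum("ij,jk,ik->i", cand, Sigma, cand)
        best = remaining[int(np.argmax(var))]
        chosen.append(best)
        # Sequential update: measuring x0 adds x0 x0^T / sigma2 to the precision,
        # and x0 moves from the candidate set D to the measured set.
        x0 = D[best]
        precision += np.outer(x0, x0) / sigma2
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))   # synthetic measured inputs
    D = rng.normal(size=(100, 3))  # synthetic candidate set
    for lam, sigma2 in [(1, 2), (2, 3), (10, 5)]:
        print(lam, sigma2, active_learning(X, D, lam, sigma2))
```

The usage loop runs the three (λ, σ²) settings listed above on synthetic data. Inverting a d×d matrix per step is cheap for small d. For larger problems, the rank-one update could instead be applied to Σ directly with the Sherman-Morrison formula.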