This problem appeared as an assignment in the ** coursera course Machine Learning – Regression**, part of

**by the**

*Machine Learning specialization***University of Washington.**The following description of the problem is taken directly from the assignment.

The goal of this assignment is to implement an *LASSO Solver* using *coordinate descent*. The following are the sub-tasks:

- First the features are needed to be
*normalized*. - The
*coordinate descent*for*LASSO*needs to be implemented (with the*subgradient*of the*L1 penalty*). - The effects of
*L1 penalty*are going to be explored.

The following dataset (few rows and columns are shown in the below table) is from **house sales** in *King County*, the region where the city of Seattle, WA is located. This is going to be our input datatset for training the LASSO model, with the *response variable* as **price**.

## Normalize features

In the house dataset, features vary wildly in their relative magnitude: `sqft_living`

is very large overall compared to `bedrooms`

, for instance. As a result, weight for `sqft_living`

would be much smaller than weight for `bedrooms`

. This is problematic because “small” weights are dropped first as `l1_penalty`

goes up.

To give equal considerations for all features, we need to **normalize features** by dividing each feature by its *2-norm* so that the transformed feature has norm 1.

## Implementing Coordinate Descent with normalized features

**, we shall have**

*feature_j***):**

*z_j=1*## Effect of L1 penalty

Let us consider a simple model with 2 features:

simple_features = ['sqft_living', 'bedrooms']

## Single Coordinate Descent Step

Using the formula above, let’s implement *coordinate descent* that *minimizes* the *cost function* over a *single feature i*. Note that the intercept (weight 0) is not regularized. The function should accept feature matrix, output, current weights, *l1 penalty*, and index of feature to optimize over. The function should return new weight for feature i.

## Cyclical coordinate descent

Now that we have a function that optimizes the cost function over a single coordinate, let us implement cyclical coordinate descent where we optimize coordinates 0, 1, …, (k-1) in order and repeat.

When do we know to stop? Each time we scan all the coordinates (features) once, we measure the change in weight for each coordinate. If no coordinate changes by more than a specified threshold, we stop.

For each iteration:

- As we loop over features in order and perform coordinate descent, measure how much each coordinate changes.
- After the loop, if the maximum change across all coordinates is falls below the tolerance, stop. Otherwise, go back to step 1.

Return weights

**IMPORTANT: when computing a new weight for coordinate i, make sure to incorporate the new weights for coordinates 0, 1, …, i-1. One good way is to update the weights variable in-place. **

The following figure shows the **coefficient path** for **LASSO** for different values of the **L1-penalty **using two simple features

## Evaluating LASSO fit with more features

Let us consider the following set of features.

```
all_features = ['bedrooms',
'bathrooms',
'sqft_living',
'sqft_lot',
'floors',
'waterfront',
'view',
'condition',
'grade',
'sqft_above',
'sqft_basement',
'yr_built',
'yr_renovated']
```

The following figure shows the **coefficient path** for **LASSO** for different values of the **L1-penalty **using all the above features: