In this article, *low-rank matrix factorization* is used to predict the *unrated movie ratings* for the users of the *MovieLens (100k)* dataset (given that each user has rated some of the movies). The data was collected through the MovieLens web site (*movielens.umn.edu*) during the seven-month period from September 19th, 1997 through April 22nd, 1998. The dataset contains about 100,000 ratings (1-5) from 943 users on 1664 movies.

- Each cell R{i,j} of the *movie rating matrix* contains the *rating* given to the i-th movie by the j-th user (if any, otherwise *NA*), so the rows represent the movies and the columns represent the users.
- The intuition behind using *matrix factorization* is that there must be some *latent features* (far fewer than the number of users or the number of movies) that determine how a user rates a movie. We have a set of n_m movies, a set of n_u users, and a rating matrix R of size n_m×n_u that contains all the ratings the users have assigned to the movies. We want to discover n *latent features* (n=10 latent features are used here). Our task is then to find two matrices θ (an n_m×n matrix) and X (an n_u×n matrix) such that their product θX^T approximates R, i.e., such that the error over the user/movie pairs where the correct ratings are known is minimized.
- This gives a couple of optimization problems that can be solved in two different ways:
- Solve with the *Alternating Least Squares* method (appears as an assignment in the *BerkeleyX edX Course on Big Data Analysis with Spark*): the *ALS* algorithm first fills the users matrix with random values and then optimizes the values of the movies matrix so that the error is minimized. Next, it holds the movies matrix constant and optimizes the values of the users matrix. This alternation between which matrix to optimize is the reason for the "alternating" in the name. Given a fixed set of user factors (i.e., values in the users matrix), we use the known ratings to find the best values for the movie factors using the optimization written at the bottom of the figure. Then we "alternate" and pick the best user factors given fixed movie factors. The next figure shows the outline of this algorithm.
- Solve with the *Simultaneous Optimization* method (from the *Coursera ML Course by Stanford Prof. Andrew Ng*): combine the two optimization problems and solve them simultaneously. Both methods are shown in the next figure.
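The ALS updates described above can be sketched as follows: a minimal NumPy implementation assuming a small dense rating matrix with `np.nan` marking unrated entries (the function and variable names are my own, not taken from the course assignment):

```python
import numpy as np

def als(R, n_features=10, lam=10.0, n_iters=20, seed=0):
    """Alternating Least Squares on a rating matrix R (movies x users),
    with np.nan marking unrated entries. Returns Theta (movies x n_features)
    and X (users x n_features) so that Theta @ X.T approximates R on the
    known entries. Follows the article's convention: Theta for movies,
    X for users."""
    n_m, n_u = R.shape
    known = ~np.isnan(R)                      # mask of rated entries
    rng = np.random.default_rng(seed)
    Theta = rng.standard_normal((n_m, n_features))
    X = rng.standard_normal((n_u, n_features))
    I = lam * np.eye(n_features)              # ridge regularizer
    for _ in range(n_iters):
        # Hold X (user factors) fixed: solve a ridge regression per movie
        for i in range(n_m):
            idx = known[i, :]                 # users who rated movie i
            A = X[idx].T @ X[idx] + I
            b = X[idx].T @ R[i, idx]
            Theta[i] = np.linalg.solve(A, b)
        # Hold Theta (movie factors) fixed: solve a ridge regression per user
        for j in range(n_u):
            idx = known[:, j]                 # movies rated by user j
            A = Theta[idx].T @ Theta[idx] + I
            b = Theta[idx].T @ R[idx, j]
            X[j] = np.linalg.solve(A, b)
    return Theta, X
```

The reason the alternation helps: holding one factor matrix fixed makes the cost quadratic in the other, so each row can be solved in closed form with `np.linalg.solve` instead of requiring an iterative optimizer.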

- The next figures show the results from both algorithms: the *Conjugate Gradient* method (with 100 iterations) is used in both cases, and both methods were called iteratively *20* times with regularization parameter λ=10. The *RMSE* between the *original rating matrix* and the *estimated rating matrix* is then computed for both cases. The following figures show that the *Simultaneous Optimization* method achieved a lower *RMSE* value immediately, while *ALS* took some iterations to reach the same *RMSE* value.
- The next figures show how the *RMSE* changed with the *regularization parameter* λ for both methods. As expected, with lower values of λ the *RMSE* is much lower (due to possible *overfitting*).
- The next figures show the *original rating matrix* and the *estimated rating matrix* obtained with *ALS matrix factorization* (the rating matrix is shown as the *adjacency matrix* of a *neighborhood graph*).
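The simultaneous formulation can be sketched with SciPy's conjugate-gradient optimizer: a minimal version of the joint regularized cost and its gradient, again with `np.nan` marking unrated entries (the names and the use of `scipy.optimize.minimize` are my choices here, not necessarily the course's own code):

```python
import numpy as np
from scipy.optimize import minimize

def factorize_joint(R, n_features=10, lam=10.0, maxiter=100, seed=0):
    """Jointly optimize movie factors Theta and user factors X by minimizing
    the regularized squared error over the known entries of R with
    conjugate gradient. A sketch under the same conventions as the article:
    R is movies x users, np.nan marks unrated entries."""
    n_m, n_u = R.shape
    known = ~np.isnan(R)
    Rz = np.where(known, R, 0.0)              # zero-fill unknowns for vectorized math

    def unpack(p):
        Theta = p[:n_m * n_features].reshape(n_m, n_features)
        X = p[n_m * n_features:].reshape(n_u, n_features)
        return Theta, X

    def cost_grad(p):
        Theta, X = unpack(p)
        E = (Theta @ X.T - Rz) * known        # error only on known entries
        J = 0.5 * np.sum(E ** 2) + 0.5 * lam * (np.sum(Theta ** 2) + np.sum(X ** 2))
        g_Theta = E @ X + lam * Theta         # gradient w.r.t. movie factors
        g_X = E.T @ Theta + lam * X           # gradient w.r.t. user factors
        return J, np.concatenate([g_Theta.ravel(), g_X.ravel()])

    rng = np.random.default_rng(seed)
    p0 = 0.1 * rng.standard_normal(n_m * n_features + n_u * n_features)
    res = minimize(cost_grad, p0, jac=True, method='CG',
                   options={'maxiter': maxiter})
    return unpack(res.x)
```

Because the cost and gradient are fully vectorized over the rating matrix, a generic CG routine like this should remain practical even at the scale of the full 1664×943 MovieLens matrix.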
