This semester I’m taking the Ecommerce System Structure, and we have come to learn about personalization. In Ecommerce, the form of personalization is to recommend items to the user.
Recommendation Metrics
 Prediction accuracy
 Coverage
 Novelty
 Diversity
 Privacy
 Scalability
Recommendation Strategies
 Popularity
 Association role
 Content based
 Collaborative filtering
 Hybrid
In the corresponding experiment, our assignment is to read the paper ItemBased Collaborative Filtering Recommendation Algorithms, a famous paper in recommender system. And I’d like to take a note of some key point, with some specific examples and implementation codes.
Overview of Collaborative Filtering Based Recommender Systems
Figure 1 shows the schematic diagram of the collaborative filtering process. CF algorithms represent the entire m×n useritem data as a ratings matrix, A. Here ‘m’ is the number of users and ‘n’ is the number of items. Each entry in A represents the preference score (ratings) of the ith user on the jth item. Each individual ratings is within a numerical scale(e.g., from 1 to 5, the higher the better) and it can as well be 0 or NULL indicating that the user has not yet rated that item.
Prediction is a numerical value.
Recommendation is a list of N items.
The image may be a bit abstract. Here’s my specific example with simplified toy examples. The rating scales from 1 to 5, where 1 indicates ‘dislike the movie very much’, and 5 indicates ‘like the movie very much’. If the rating is blank, then the user has not seen(voted) the movie yet. I’d refer to this table later.
Harry Potter  Pirates of the Caribbean  Titanic  Avatar  Transformers  

Alice  3  5  4  1  
Bob  3  4  4  1  
Carol  4  3  3  1  
Dave  4  4  4  3  1 
Eve  5  4  5  3 
Memorybased and Modelbased algorithms
Memorybased CF Algorithms
Memorybased algorithms utilize the entire useritem database to generate a prediction. These systems employ statistical techniques to find a set of users, known as neighbors, that have a history of agreeing with the target user. The techniques, also known as nearestneighbor or userbased collaborative filtering, are more popular and widely used in practice.
Modelbased CF Algorithms
Modelbased collaborative filtering algorithms provide item recommendation by first developing a model of user ratings. Algorithms in this category take a probabilistic approach and envision the collaborative filtering process as computing the expected value of a user prediction, give his/her ratings on other items.
Challenges of Userbased CF Algorithms
 Sparsity. Large Ecommerce systems like Amazon and Taobao have large item sets, but users can only by a small fraction of the total item set.
 Scalability. With millions of users and items, a typical webbased recommender system running existing algorithms will suffer serious scalability problems.
Itembased Collaborative Filtering Algorithm
The itembased approach looks into the set of items the target user has rated and computes how similar they are to the target item i and then selects k most similar items. At the same time their corresponding similarities are also computed. Once the most similar items are found, the prediction is then computed by taking a weighted average of the target user’s ratings on these similar items.
Item Similarity Computation
Cosinebased Similarity
Two items are thought of as two vectors in the m dimensional userspace. The similarity between them is measured by computing the cosine of the angle between these two vectors.
Take the previous example: Each column can be viewed as a vector. Therefore, the item Harry Potter has a vector of and the item Pirates of the Caribbean has a vector of . Note that if the rating is treated as 0 if blank. And the result should be
Correlationbased Similarity
Similarity between two items i and j is measured by computing the Pearsonr correlation. To make the correlation computation accurate, first isolate the corated cases(i.e., cases where the users rated both i and j). Let the set of users who both rated i and j are denoted by U.
Take the example again: , . The set of users who both rated Harry Potter and The Avengers is . Then the similarity is


Adjusted Cosine Similarity
There’re tough users and easy users. Tough users tend to rate a relatively low score and maybe he has an average rate of 2.5, while easy users tend to have an average rate of 4.0. When computing the similarity, we have to consider the difference between users, and this is what adjusted cosine similarity does. Subtracting each valid rating by the user’s average rating to diminish the difference between users, and that would produce a more accurate result.


Prediction Computation
Weighted Sum
This method computes the prediction on an item i for a user u by computing the sum of the ratings given by the user on the items similar to i. Each ratings is weighted by the corresponding similarity between items i and j.
For this to work best, should be a value in the range 1 to 1. Our ratings are in the range 1 to 5. So we will need to convert our ratings to the 1 and 1 scale. Here’s two formulas:(reference from IT533 ItemBased Recommender Systems (with Cosine Similarity and Python Code))
Normalization
Denormalization
Example of predicting Alice’s rating on Harry Potter:
If we use the prediction formula directly:
This is not an accurate prediction. Since it’s intuitive that Harry Potter is similar to Titanic(Dave rated 4 both and Eve rated 5 both), so Alice’s rate on Harry Potter should be similar to her rate on Titanic. So let’s have a look on normalization result:
Normalize Alice’s rating: . Using these normalized ratings, prediction would be:
This is Alice normalized rating on Harry Potter. And to denormalize it:
And it’s an more accurate prediction.




Regression
In practice, the similarities computed using cosine or correlation measures may be misleading in the sense that two rating vectors may be distant(in Euclidean sense) yet may have very high similarity. The basic idea is to use the same formula as the weighted sum technique, but instead of using the similar item N’s “raw” ratings values ‘s, this model uses their approximated values based on a linear regression model.
The respective vectors of the target item i and the similar item N are denoted by and .
About linear regression, maybe it’s a good choice to refer to the machine learning article, in which I briefly introduced linear regression.
Experimental Evaluation
x: determines what percentage of data is used as training and test sets.
ML: the data set with 943 rows and 1682 columns.
sparsity level:
MAE: mean absolute error.