This semester I’m taking E-commerce System Structure, and we have come to the topic of personalization. In e-commerce, personalization usually takes the form of recommending items to the user.
Recommendation Metrics
- Prediction accuracy
- Coverage
- Novelty
- Diversity
- Privacy
- Scalability
Recommendation Strategies
- Popularity
- Association rules
- Content based
- Collaborative filtering
- Hybrid
In the corresponding experiment, our assignment is to read the paper Item-Based Collaborative Filtering Recommendation Algorithms, a famous paper in recommender systems. I’d like to take note of some key points, with specific examples and implementation code.
Overview of Collaborative Filtering Based Recommender Systems
Figure 1 shows the schematic diagram of the collaborative filtering process. CF algorithms represent the entire m×n user-item data as a ratings matrix, A. Here m is the number of users and n is the number of items. Each entry in A represents the preference score (rating) of the i-th user on the j-th item. Each individual rating is within a numerical scale (e.g., from 1 to 5, the higher the better), and it can also be 0 or NULL, indicating that the user has not yet rated that item.
Prediction is a numerical value, expressing the predicted likeliness of an item for the active user.
Recommendation is a list of N items that the active user will like the most (also known as top-N recommendation).
The figure may be a bit abstract, so here’s a specific, simplified toy example. Ratings scale from 1 to 5, where 1 indicates ‘dislike the movie very much’ and 5 indicates ‘like the movie very much’. If a rating is blank, the user has not seen (rated) that movie yet. I’ll refer to this table later.
| | Harry Potter | Pirates of the Caribbean | Titanic | Avatar | Transformers |
|---|---|---|---|---|---|
| Alice | | 3 | 5 | 4 | 1 |
| Bob | 3 | | 4 | 4 | 1 |
| Carol | 4 | 3 | | 3 | 1 |
| Dave | 4 | 4 | 4 | 3 | 1 |
| Eve | 5 | 4 | 5 | 3 | |
Memory-based and Model-based algorithms
Memory-based CF Algorithms
Memory-based algorithms utilize the entire user-item database to generate a prediction. These systems employ statistical techniques to find a set of users, known as neighbors, who have a history of agreeing with the target user. These techniques, also known as nearest-neighbor or user-based collaborative filtering, are more popular and widely used in practice.
Model-based CF Algorithms
Model-based collaborative filtering algorithms provide item recommendations by first developing a model of user ratings. Algorithms in this category take a probabilistic approach and envision the collaborative filtering process as computing the expected value of a user prediction, given his/her ratings on other items.
Challenges of User-based CF Algorithms
- Sparsity. Large e-commerce systems like Amazon and Taobao have huge item sets, but each user can only buy a small fraction of the total item set.
- Scalability. With millions of users and items, a typical web-based recommender system running existing algorithms will suffer serious scalability problems.
Item-based Collaborative Filtering Algorithm
The item-based approach looks into the set of items the target user has rated, computes how similar they are to the target item i, and then selects the k most similar items. At the same time, their corresponding similarities are also computed. Once the most similar items are found, the prediction is computed by taking a weighted average of the target user’s ratings on these similar items.
Item Similarity Computation
Cosine-based Similarity
Two items are thought of as two vectors in the m-dimensional user space. The similarity between them is measured by computing the cosine of the angle between these two vectors.

Take the previous example: each column can be viewed as a vector, with a blank rating treated as 0. Then the item Harry Potter has the vector (0, 3, 4, 4, 5) and the item Pirates of the Caribbean has the vector (3, 0, 3, 4, 4), and the result is

sim(Harry Potter, Pirates of the Caribbean) = 48 / (√66 × √50) ≈ 0.84
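The paper doesn’t ship code, so here is a minimal NumPy sketch of the computation above; the matrix layout, the blank-as-zero rule, and the helper name `cosine_sim` are my own:

```python
import numpy as np

# Toy ratings from the table above; rows are Alice..Eve, columns are
# Harry Potter, Pirates of the Caribbean, Titanic, Avatar, Transformers.
# np.nan marks a blank (unrated) cell.
R = np.array([
    [np.nan, 3,      5,      4, 1],       # Alice
    [3,      np.nan, 4,      4, 1],       # Bob
    [4,      3,      np.nan, 3, 1],       # Carol
    [4,      4,      4,      3, 1],       # Dave
    [5,      4,      5,      3, np.nan],  # Eve
])

def cosine_sim(R, i, j):
    """Cosine of the angle between item columns i and j, blanks treated as 0."""
    a, b = np.nan_to_num(R[:, i]), np.nan_to_num(R[:, j])
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(round(cosine_sim(R, 0, 1), 2))  # Harry Potter vs Pirates -> 0.84
```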
Correlation-based Similarity
Similarity between two items i and j is measured by computing the Pearson-r correlation. To make the correlation computation accurate, we first isolate the co-rated cases (i.e., cases where the users rated both i and j). Let the set of users who rated both i and j be denoted by U.
Take the example again with Harry Potter and Pirates of the Caribbean: the set of users who rated both is U = {Carol, Dave, Eve}, giving ratings (4, 4, 5) on Harry Potter and (3, 4, 4) on Pirates of the Caribbean. The item average ratings over U are 13/3 and 11/3, and the similarity works out to sim = 0.5.
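A sketch of the co-rated Pearson computation, again on the toy matrix (the helper name and layout are mine, not from the paper):

```python
import numpy as np

# Same toy matrix as before (rows Alice..Eve; columns Harry Potter,
# Pirates of the Caribbean, Titanic, Avatar, Transformers; np.nan = blank).
R = np.array([
    [np.nan, 3,      5,      4, 1],
    [3,      np.nan, 4,      4, 1],
    [4,      3,      np.nan, 3, 1],
    [4,      4,      4,      3, 1],
    [5,      4,      5,      3, np.nan],
])

def pearson_sim(R, i, j):
    """Pearson-r correlation between items i and j over co-rated users only."""
    mask = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    a = R[mask, i] - R[mask, i].mean()  # deviations from the item average over U
    b = R[mask, j] - R[mask, j].mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom > 0 else 0.0

print(round(pearson_sim(R, 0, 1), 4))  # Harry Potter vs Pirates -> 0.5
```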
Adjusted Cosine Similarity
There are tough raters and easy raters. A tough rater tends to give relatively low scores and may have an average rating of 2.5, while an easy rater may have an average rating of 4.0. When computing similarity, we have to account for this difference between users, and that is what adjusted cosine similarity does: subtract the user’s average rating from each valid rating to diminish the difference between users, which produces a more accurate result.
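A sketch of adjusted cosine similarity on the toy matrix: subtract each user’s mean rating, then take the cosine over the co-rated users (the helper name is mine):

```python
import numpy as np

# Same toy matrix (rows Alice..Eve; columns Harry Potter, Pirates of the
# Caribbean, Titanic, Avatar, Transformers; np.nan = blank).
R = np.array([
    [np.nan, 3,      5,      4, 1],
    [3,      np.nan, 4,      4, 1],
    [4,      3,      np.nan, 3, 1],
    [4,      4,      4,      3, 1],
    [5,      4,      5,      3, np.nan],
])

def adjusted_cosine_sim(R, i, j):
    """Cosine over co-rated users, after subtracting each user's mean rating."""
    user_mean = np.nanmean(R, axis=1)  # per-user average over rated items
    mask = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    a = R[mask, i] - user_mean[mask]
    b = R[mask, j] - user_mean[mask]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom > 0 else 0.0

print(round(adjusted_cosine_sim(R, 0, 1), 3))  # Harry Potter vs Pirates -> 0.526
```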
Prediction Computation
Weighted Sum
This method computes the prediction on an item i for a user u by computing the sum of the ratings given by the user on the items similar to i, where each rating is weighted by the corresponding similarity sim(i, j) between items i and j:

P(u, i) = Σ_j (sim(i, j) × R(u, j)) / Σ_j |sim(i, j)|

For this to work best, the rating R(u, j) should be a value in the range −1 to 1, but our ratings are in the range 1 to 5, so we will need to convert them to the [−1, 1] scale. Here are two formulas (referenced from IT-533 Item-Based Recommender Systems (with Cosine Similarity and Python Code)):
Normalization

NR = (2 × (R − Min) − (Max − Min)) / (Max − Min)

Denormalization

R = 0.5 × ((NR + 1) × (Max − Min)) + Min
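These two formulas can be written as a pair of tiny helper functions (a 1–5 rating range is assumed as the default; the function names are mine):

```python
def normalize(r, r_min=1, r_max=5):
    """Map a rating in [r_min, r_max] onto the [-1, 1] scale."""
    return (2 * (r - r_min) - (r_max - r_min)) / (r_max - r_min)

def denormalize(nr, r_min=1, r_max=5):
    """Map a normalized rating in [-1, 1] back onto [r_min, r_max]."""
    return 0.5 * ((nr + 1) * (r_max - r_min)) + r_min

print(normalize(5))      # -> 1.0
print(normalize(3))      # -> 0.0
print(denormalize(0.5))  # -> 4.0
```

Note that the two functions are exact inverses of each other, so a rating survives a round trip unchanged.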
Example of predicting Alice’s rating on Harry Potter. Using the adjusted cosine similarities computed from the table above, the similarities to Harry Potter are approximately: Pirates of the Caribbean 0.526, Titanic 0.739, Avatar −0.289, Transformers −0.771.

If we use the prediction formula directly on the raw ratings:

P(Alice, Harry Potter) = (0.526×3 + 0.739×5 + (−0.289)×4 + (−0.771)×1) / (0.526 + 0.739 + 0.289 + 0.771) ≈ 1.44

This is not an accurate prediction. Intuitively, Harry Potter is similar to Titanic (Dave rated both 4 and Eve rated both 5), so Alice’s rating on Harry Potter should be close to her rating on Titanic, which is 5. So let’s look at the normalization result.

Normalize Alice’s ratings: Pirates 3 → 0, Titanic 5 → 1, Avatar 4 → 0.5, Transformers 1 → −1. Using these normalized ratings, the prediction becomes:

P(Alice, Harry Potter) = (0.526×0 + 0.739×1 + (−0.289)×0.5 + (−0.771)×(−1)) / (0.526 + 0.739 + 0.289 + 0.771) ≈ 0.587

This is Alice’s normalized rating on Harry Potter. Denormalizing it:

R = 0.5 × ((0.587 + 1) × (5 − 1)) + 1 ≈ 4.17

which is a much more accurate prediction.
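Putting the pieces together, here is a sketch of the weighted-sum prediction on the toy matrix, using adjusted cosine similarity and the optional [−1, 1] normalization (all helper names are mine; the paper itself doesn’t prescribe an implementation):

```python
import numpy as np

# Toy matrix (rows Alice..Eve; columns Harry Potter, Pirates of the
# Caribbean, Titanic, Avatar, Transformers; np.nan = blank).
R = np.array([
    [np.nan, 3,      5,      4, 1],
    [3,      np.nan, 4,      4, 1],
    [4,      3,      np.nan, 3, 1],
    [4,      4,      4,      3, 1],
    [5,      4,      5,      3, np.nan],
])

def adjusted_cosine_sim(R, i, j):
    """Cosine over co-rated users, after subtracting each user's mean rating."""
    user_mean = np.nanmean(R, axis=1)
    mask = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    a = R[mask, i] - user_mean[mask]
    b = R[mask, j] - user_mean[mask]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom > 0 else 0.0

def predict(R, u, i, use_normalization=False, r_min=1, r_max=5):
    """Weighted-sum prediction of user u's rating on item i."""
    num = den = 0.0
    for j in range(R.shape[1]):
        if j == i or np.isnan(R[u, j]):
            continue  # skip the target item and items u has not rated
        s = adjusted_cosine_sim(R, i, j)
        r = R[u, j]
        if use_normalization:  # map the rating onto [-1, 1]
            r = (2 * (r - r_min) - (r_max - r_min)) / (r_max - r_min)
        num += s * r
        den += abs(s)
    p = num / den
    if use_normalization:  # map the prediction back onto [r_min, r_max]
        p = 0.5 * ((p + 1) * (r_max - r_min)) + r_min
    return p

# Predict Alice's (user 0) rating on Harry Potter (item 0):
print(round(predict(R, 0, 0), 2))                          # raw ratings -> 1.44
print(round(predict(R, 0, 0, use_normalization=True), 2))  # normalized -> 4.17
```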
Regression
In practice, the similarities computed using cosine or correlation measures can be misleading in the sense that two rating vectors may be distant (in the Euclidean sense) yet have very high similarity. The basic idea is to use the same formula as the weighted sum technique, but instead of using the similar item N’s “raw” rating values, this model uses their approximated values based on a linear regression model.

The respective rating vectors of the target item i and the similar item N are denoted by R_i and R_N, and the regression model is R′_N = α·R_i + β + ε, where α and β are the model parameters and ε is the error of the regression model.
For linear regression itself, it may be a good idea to refer to the machine learning article, in which I briefly introduced it.
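A minimal sketch of the regression step, assuming the toy matrix above: fit R_N ≈ α·R_i + β over the users who rated both items with `np.polyfit` (the helper name is mine):

```python
import numpy as np

# Same toy matrix (rows Alice..Eve; columns Harry Potter, Pirates of the
# Caribbean, Titanic, Avatar, Transformers; np.nan = blank).
R = np.array([
    [np.nan, 3,      5,      4, 1],
    [3,      np.nan, 4,      4, 1],
    [4,      3,      np.nan, 3, 1],
    [4,      4,      4,      3, 1],
    [5,      4,      5,      3, np.nan],
])

def fit_regression(R, i, j):
    """Least-squares fit of R[:, j] ~ alpha * R[:, i] + beta over co-rated users."""
    mask = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    alpha, beta = np.polyfit(R[mask, i], R[mask, j], deg=1)
    return alpha, beta

# Approximate Titanic's ratings (item 2) from Harry Potter's (item 0):
alpha, beta = fit_regression(R, 0, 2)
print(round(alpha, 2), round(beta, 2))  # -> 0.5 2.33
```

The fitted values α·R_i + β would then replace the raw ratings in the weighted-sum formula.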
Experimental Evaluation
- x: determines what percentage of the data is used as the training set; the remaining (1 − x) is the test set.
- ML: the MovieLens data set, with 943 rows (users) and 1682 columns (movies), i.e., 100,000 ratings.
- sparsity level: 1 − (nonzero entries / total entries); for the ML data set this is 1 − 100,000/(943 × 1682) ≈ 0.9369.
- MAE: mean absolute error, the average absolute deviation between the predicted ratings and the user’s true ratings: MAE = (Σ |p_i − q_i|) / N.
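MAE is a one-liner with NumPy (a sketch; the function name and the sample numbers are mine):

```python
import numpy as np

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.abs(predicted - actual).mean())

print(round(mae([4.17, 3.0], [5, 3]), 3))  # -> 0.415
```

Lower MAE means more accurate predictions, which is why the paper reports it across different x values and neighborhood sizes.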