Item-Based Collaborative Filtering Recommendation Algorithms: Reading Report

This semester I’m taking the E-commerce System Structure course, and we have come to the topic of personalization. In e-commerce, personalization usually takes the form of recommending items to the user.

Recommendation Metrics

  • Prediction accuracy
  • Coverage
  • Novelty
  • Diversity
  • Privacy
  • Scalability

Recommendation Strategies

  • Popularity
  • Association rule
  • Content based
  • Collaborative filtering
  • Hybrid

In the corresponding experiment, our assignment is to read the paper Item-Based Collaborative Filtering Recommendation Algorithms, a famous paper in the recommender systems field. I’d like to note some of its key points here, together with specific examples and implementation code.

Overview of Collaborative Filtering Based Recommender Systems

[Figure 1: the collaborative filtering process]

Figure 1 shows the schematic diagram of the collaborative filtering process. CF algorithms represent the entire m×n user-item data as a ratings matrix A, where m is the number of users and n is the number of items. Each entry $a_{i,j}$ in A represents the preference score (rating) of the i-th user on the j-th item. Each individual rating is within a numerical scale (e.g., from 1 to 5, the higher the better), and it can also be 0 or NULL, indicating that the user has not yet rated that item.
  • Prediction: a numerical value $P_{a,j}$, expressing the predicted score of item j for the active user a.
  • Recommendation: a list of N items that the active user will like the most (Top-N recommendation).

The figure may be a bit abstract, so here is a simplified toy example. Ratings scale from 1 to 5, where 1 means ‘dislike the movie very much’ and 5 means ‘like the movie very much’. If a rating is blank, the user has not seen (rated) that movie yet. I’ll refer to this table later.

        Harry Potter   Pirates of the Caribbean   Titanic   Avatar   Transformers
Alice        -                    3                  5         4          1
Bob          -                    3                  4         4          1
Carol        4                    3                  -         3          1
Dave         4                    4                  4         3          1
Eve          5                    4                  5         -          3
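
For the code samples later in this post, I’ll assume the table is stored as a plain `double[][]`, with 0.0 standing for a blank (unrated) entry; this matches the `uiRating` convention used in the implementations below. A minimal sketch of my own:

// Toy ratings matrix: rows = users (Alice..Eve), columns = items
// (Harry Potter, Pirates of the Caribbean, Titanic, Avatar, Transformers).
// 0.0 marks a blank (unrated) entry.
double[][] uiRating = {
    {0, 3, 5, 4, 1}, // Alice
    {0, 3, 4, 4, 1}, // Bob
    {4, 3, 0, 3, 1}, // Carol
    {4, 4, 4, 3, 1}, // Dave
    {5, 4, 5, 0, 3}  // Eve
};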

Memory-based and Model-based algorithms

Memory-based CF Algorithms
Memory-based algorithms utilize the entire user-item database to generate a prediction. These systems employ statistical techniques to find a set of users, known as neighbors, who have a history of agreeing with the target user. These techniques, also known as nearest-neighbor or user-based collaborative filtering, are popular and widely used in practice.

Model-based CF Algorithms
Model-based collaborative filtering algorithms provide item recommendations by first developing a model of user ratings. Algorithms in this category take a probabilistic approach and envision the collaborative filtering process as computing the expected value of a user prediction, given his/her ratings on other items.

Challenges of User-based CF Algorithms

  • Sparsity. Large e-commerce systems like Amazon and Taobao have large item sets, but each user can only buy a small fraction of the total item set, so the ratings matrix is extremely sparse.
  • Scalability. With millions of users and items, a typical web-based recommender system running existing algorithms will suffer serious scalability problems.

Item-based Collaborative Filtering Algorithm

The item-based approach looks into the set of items the target user has rated, computes how similar they are to the target item i, and selects the k most similar items. At the same time, their corresponding similarities are also recorded. Once the most similar items are found, the prediction is computed by taking a weighted average of the target user’s ratings on these similar items.
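
The implementation snippets later in this post average over all rated items for convenience, so as a complement, here is a small sketch of my own (not code from the paper) of selecting the k most similar items from a precomputed item-item similarity matrix like the ones computed in the next section:

// A sketch of picking the k most similar items to a target item,
// given a precomputed item-item similarity matrix.
public int[] kMostSimilar(double[][] sim, int itemID, int k) {
    Integer[] candidates = new Integer[sim.length];
    for (int i = 0; i < sim.length; i++) {
        candidates[i] = i;
    }
    // sort item indices by similarity to itemID, descending
    java.util.Arrays.sort(candidates,
        (a, b) -> Double.compare(sim[b][itemID], sim[a][itemID]));
    int[] topK = new int[Math.min(k, sim.length - 1)];
    int idx = 0;
    for (int i = 0; i < candidates.length && idx < topK.length; i++) {
        if (candidates[i] != itemID) { // skip the target item itself
            topK[idx++] = candidates[i];
        }
    }
    return topK;
}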

Item Similarity Computation

Cosine-based Similarity

Two items are thought of as two vectors in the m dimensional user-space. The similarity between them is measured by computing the cosine of the angle between these two vectors.

Take the previous example: each column can be viewed as a vector, treating blank ratings as 0. The item Harry Potter has the vector $\vec i = (0, 0, 4, 4, 5)$ and the item Pirates of the Caribbean has the vector $\vec j = (3, 3, 3, 4, 4)$. The result is

$$sim(i, j) = \cos(\vec i, \vec j) = \frac{\vec i \cdot \vec j}{\|\vec i\|_2 \cdot \|\vec j\|_2} = \frac{4 \cdot 3 + 4 \cdot 4 + 5 \cdot 4}{\sqrt{57} \cdot \sqrt{59}} \approx 0.83$$
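
As a quick illustration (a sketch of my own, not code from the paper), the plain cosine similarity between two item columns could be computed like this, treating blanks as 0:

/**
 * A minimal sketch of cosine-based similarity between item columns
 * i and j of the ratings matrix, treating blank ratings as 0.
 */
public double cosineSimilarity(double[][] uiRating, int i, int j) {
    double dot = 0, normI = 0, normJ = 0;
    for (int u = 0; u < uiRating.length; u++) { // iterate over users
        dot += uiRating[u][i] * uiRating[u][j];
        normI += uiRating[u][i] * uiRating[u][i];
        normJ += uiRating[u][j] * uiRating[u][j];
    }
    double den = Math.sqrt(normI) * Math.sqrt(normJ);
    return (den == 0) ? 0 : dot / den; // guard against all-zero columns
}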

Correlation-based Similarity

Similarity between two items i and j is measured by computing the Pearson-r correlation. To make the correlation computation accurate, we first isolate the co-rated cases (i.e., cases where the users rated both i and j). Let the set of users who rated both i and j be denoted by U. Then

$$sim(i, j) = \frac{\sum_{u \in U} (R_{u,i} - \bar R_i)(R_{u,j} - \bar R_j)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar R_i)^2} \sqrt{\sum_{u \in U} (R_{u,j} - \bar R_j)^2}}$$

where $R_{u,i}$ is the rating of user u on item i, and $\bar R_i$ is the average rating of the i-th item.

Take the example again: $\bar R_{Harry Potter} = (4 + 4 + 5)/3 \approx 4.33$, $\bar R_{Avatar} = (4 + 4 + 3 + 3)/4 = 3.5$. The set of users who rated both Harry Potter and Avatar is U = {Carol, Dave}. Then the similarity is

$$sim = \frac{(4 - 4.33)(3 - 3.5) + (4 - 4.33)(3 - 3.5)}{\sqrt{(4 - 4.33)^2 + (4 - 4.33)^2} \cdot \sqrt{(3 - 3.5)^2 + (3 - 3.5)^2}} = 1$$

(Note that with only two co-rated users, the Pearson correlation degenerates to ±1, which hints at why similarities over sparse overlaps can be misleading.)

/**
 * Implementation of Correlation-based Similarity
 * reference: Wang Menghan's implementation
 *
 * @author Hael Chan
 * @version 1.0
 * @time 2017-11-05 23:51:09
 */
public double[][] calSimilarity(double[][] uiRating) {
    /* uiRating is a u*i matrix,
     * where u is the number of users,
     * and i is the number of items
     */
    int userNum = uiRating.length;    // the number of rows of uiRating
    int itemNum = uiRating[0].length; // the number of columns of uiRating
    double[][] simResult = new double[itemNum][itemNum]; // a new matrix for similarity among items
    double[] average = new double[itemNum];
    int[] nonZero = new int[itemNum];
    for (int c = 0; c < itemNum; c++) {     // for every item, iterate over every user's rating,
        for (int r = 0; r < userNum; r++) { // so the nested loop may look a bit nonintuitive
            if (uiRating[r][c] > 0) {       // if uiRating[r][c] == 0, just ignore it
                average[c] += uiRating[r][c];
                nonZero[c]++;
            }
        }
    }
    // Calculate every item's average rating
    for (int i = 0; i < itemNum; i++) {
        if (nonZero[i] > 0) { // guard against items that nobody has rated
            average[i] /= nonZero[i];
        }
    }
    // Calculate the similarity
    for (int i = 0; i < itemNum; i++) {
        for (int j = i + 1; j < itemNum; j++) { // j starts at i + 1 to improve efficiency and skip the (i == j) case
            double num = 0, den = 0;
            double den1 = 0, den2 = 0;
            for (int k = 0; k < userNum; k++) {
                // only co-rated cases count: user k must have rated both i and j
                if (uiRating[k][i] > 0 && uiRating[k][j] > 0) {
                    double diff1 = uiRating[k][i] - average[i]; // R_{u,i} - \overline{R}_i (in LaTeX syntax)
                    double diff2 = uiRating[k][j] - average[j]; // R_{u,j} - \overline{R}_j (in LaTeX syntax)
                    num += diff1 * diff2;
                    den1 += diff1 * diff1;
                    den2 += diff2 * diff2;
                }
            }
            den = Math.sqrt(den1) * Math.sqrt(den2);
            simResult[i][j] = (den == 0) ? 0 : num / den; // remember that den may be zero
            simResult[j][i] = simResult[i][j];
        }
    }
    return simResult;
}

Adjusted Cosine Similarity

There are tough raters and easy raters: a tough user tends to give relatively low scores and may have an average rating of 2.5, while an easy user may have an average rating of 4.0. When computing the similarity, we have to account for this difference between users, and that is what adjusted cosine similarity does: subtract the user’s average rating from each valid rating to diminish the difference between users, which produces a more accurate result.

$$sim(i, j) = \frac{\sum_{u \in U} (R_{u,i} - \bar R_u)(R_{u,j} - \bar R_u)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar R_u)^2} \sqrt{\sum_{u \in U} (R_{u,j} - \bar R_u)^2}}$$

where $\bar R_u$ is the average of the u-th user’s ratings.

/**
 * Implementation of Adjusted Cosine Similarity
 *
 * @author Hael Chan
 * @version 1.0
 * @time 2017-11-07 09:49:51
 */
public double[][] calSimilarity(double[][] uiRating) {
    /*
     * Basically this method is the same as the
     * implementation of Correlation-based Similarity,
     * except that the computation of the average differs:
     * the correlation-based version averages each column
     * (item) vector, while the adjusted cosine averages
     * each row (user) vector.
     */
    int userNum = uiRating.length;
    int itemNum = uiRating[0].length;
    double[][] similarityResult = new double[itemNum][itemNum];
    double[] average = new double[userNum];
    int[] nonZero = new int[userNum];
    for (int r = 0; r < userNum; r++) {
        for (int c = 0; c < itemNum; c++) {
            if (uiRating[r][c] > 0) {
                average[r] += uiRating[r][c];
                nonZero[r]++;
            }
        }
    }
    // Calculate every user's average rating instead of every item's
    for (int i = 0; i < userNum; i++) {
        if (nonZero[i] > 0) { // guard against users with no ratings
            average[i] /= nonZero[i];
        }
    }
    for (int i = 0; i < itemNum; i++) {
        for (int j = i + 1; j < itemNum; j++) {
            double num = 0, den = 0;
            double den1 = 0, den2 = 0;
            for (int k = 0; k < userNum; k++) {
                // only calculate the co-rated cases,
                // i.e., user k rated both i and j
                if (uiRating[k][i] > 0 && uiRating[k][j] > 0) {
                    double diff1 = uiRating[k][i] - average[k]; // subtract the user's average rating
                    double diff2 = uiRating[k][j] - average[k];
                    num += diff1 * diff2;
                    den1 += diff1 * diff1;
                    den2 += diff2 * diff2;
                }
            }
            den = Math.sqrt(den1) * Math.sqrt(den2);
            similarityResult[i][j] = (den == 0) ? 0 : num / den;
            similarityResult[j][i] = similarityResult[i][j];
        }
    }
    return similarityResult;
}

Prediction Computation

Weighted Sum

This method computes the prediction on an item i for a user u by computing the sum of the ratings given by the user on the items similar to i, where each rating is weighted by the corresponding similarity $s_{i,j}$ between items i and j:

$$P_{u,i} = \frac{\sum_{\text{all similar items } N} (s_{i,N} \cdot R_{u,N})}{\sum_{\text{all similar items } N} |s_{i,N}|}$$

For this to work best, each rating $R_{u,N}$ should be a value in the range $-1$ to $1$, while our ratings are in the range 1 to 5, so we need to convert the ratings to the $[-1, 1]$ scale first. Here are the two formulas (reference: IT-533 Item-Based Recommender Systems (with Cosine Similarity and Python Code)):

Normalization

$$NR_{u,i} = \frac{2(R_{u,i} - Min_R) - (Max_R - Min_R)}{Max_R - Min_R}$$

Denormalization

$$R_{u,i} = \frac{1}{2}\big((NR_{u,i} + 1)(Max_R - Min_R)\big) + Min_R$$
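
A minimal Java sketch of these two conversions (minR and maxR are the rating bounds, 1 and 5 in our example):

// Map a rating from [minR, maxR] to [-1, 1].
public double normalize(double rating, double minR, double maxR) {
    return (2 * (rating - minR) - (maxR - minR)) / (maxR - minR);
}

// Map a normalized rating from [-1, 1] back to [minR, maxR].
public double denormalize(double nr, double minR, double maxR) {
    return 0.5 * ((nr + 1) * (maxR - minR)) + minR;
}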
Example of predicting Alice’s rating on Harry Potter, using the adjusted cosine similarities computed from the table above: $s_{HP,Pirates} \approx 0.53$, $s_{HP,Titanic} = 1.0$, $s_{HP,Avatar} \approx 0.32$, $s_{HP,Transformers} \approx -0.95$.

If we use the prediction formula directly on the raw ratings:

$$P_{Alice,HP} = \frac{0.53 \cdot 3 + 1.0 \cdot 5 + 0.32 \cdot 4 + (-0.95) \cdot 1}{0.53 + 1.0 + 0.32 + 0.95} \approx 2.47$$

This is not an accurate prediction: intuitively Harry Potter is similar to Titanic (Dave rated both 4 and Eve rated both 5), so Alice’s rating on Harry Potter should be similar to her rating on Titanic. So let’s look at the normalization result.

Normalize Alice’s ratings: $NR_{Pirates} = 0$, $NR_{Titanic} = 1$, $NR_{Avatar} = 0.5$, $NR_{Transformers} = -1$. Using these normalized ratings, the prediction would be:

$$NP_{Alice,HP} = \frac{0.53 \cdot 0 + 1.0 \cdot 1 + 0.32 \cdot 0.5 + (-0.95) \cdot (-1)}{0.53 + 1.0 + 0.32 + 0.95} \approx 0.75$$

This is Alice’s normalized rating on Harry Potter. To denormalize it:

$$P_{Alice,HP} = \frac{1}{2}\big((0.75 + 1)(5 - 1)\big) + 1 = 4.5$$

And that is a much more accurate prediction.

/**
 * originalPrediction
 * @author Hael Chan
 * @version 1.0
 * @time 2017-11-08 13:30:23
 */
public double originalPrediction(int userID, int itemID) {
    // Compute the prediction on an item i for a user u
    // by computing the weighted sum of the ratings given by the user
    // on the items similar to i.
    // uiRating and similarityResult are instance fields of the class.
    double den = 0.0, num = 0.0;
    for (int i = 0; i < uiRating[0].length; i++) { // go through every item
        // Here every item the user has rated, except the target item itself,
        // is included in the weighted sum (for convenience).
        // For better performance, only the k most similar items should be considered.
        if (i != itemID && uiRating[userID][i] > 0) {
            den += Math.abs(similarityResult[i][itemID]);
            num += similarityResult[i][itemID] * uiRating[userID][i];
        }
    }
    double predictScore = 0.0;
    if (den != 0) {
        predictScore = num / den;
    }
    return predictScore;
}
/**
 * normalizedPrediction
 * @author Hael Chan
 * @version 1.0
 * @time 2017-11-18 13:28:57
 */
public double normalizedPrediction(int userID, int itemID) {
    // Use normalization and denormalization to
    // make a more accurate prediction.
    int itemNum = uiRating[0].length;
    // maxR and minR are predefined constants:
    // final static int maxR = 5;
    // final static int minR = 1;
    // Create a vector storing the user's normalized ratings.
    double[] normalizedRate = new double[itemNum];
    for (int i = 0; i < itemNum; i++) {
        if (uiRating[userID][i] != 0) {
            normalizedRate[i] = (2 * (uiRating[userID][i] - minR) - (maxR - minR)) / (maxR - minR);
        }
    }
    double den = 0.0, num = 0.0, normalizedScore = 0.0, denormalizedScore = 0.0;
    for (int i = 0; i < itemNum; i++) {
        if (i != itemID && uiRating[userID][i] != 0) {
            den += Math.abs(similarityResult[i][itemID]);
            num += similarityResult[i][itemID] * normalizedRate[i];
        }
    }
    if (den > 0) {
        normalizedScore = num / den;
        denormalizedScore = 0.5 * ((normalizedScore + 1) * (maxR - minR)) + minR;
    }
    return denormalizedScore;
}
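
To tie the snippets together, here is a hypothetical usage sketch. The class name ItemCF and its public fields are my own assumptions; the original snippets don’t show how they are wrapped, and only one of the two calSimilarity versions can live in a class at a time (the adjusted cosine version is assumed here):

public static void main(String[] args) {
    ItemCF cf = new ItemCF(); // hypothetical class holding the methods above
    cf.uiRating = new double[][] {
        {0, 3, 5, 4, 1}, // Alice
        {0, 3, 4, 4, 1}, // Bob
        {4, 3, 0, 3, 1}, // Carol
        {4, 4, 4, 3, 1}, // Dave
        {5, 4, 5, 0, 3}  // Eve
    };
    cf.similarityResult = cf.calSimilarity(cf.uiRating); // adjusted cosine version
    // Alice is row 0, Harry Potter is column 0
    System.out.println(cf.originalPrediction(0, 0));   // roughly 2.5 -- not accurate
    System.out.println(cf.normalizedPrediction(0, 0)); // roughly 4.5 -- close to Dave's 4 and Eve's 5
}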

Regression

In practice, the similarities computed using cosine or correlation measures may be misleading, in the sense that two rating vectors may be distant (in the Euclidean sense) yet have very high similarity. The basic idea is to use the same formula as the weighted sum technique, but instead of using the similar item N’s “raw” rating values $R_{u,N}$, this model uses their approximated values $R'_{u,N}$ based on a linear regression model.

The respective rating vectors of the target item i and the similar item N are denoted by $\vec R_i$ and $\vec R_N$. The linear regression model is

$$\bar R'_N = \alpha \bar R_i + \beta + \epsilon$$

where the regression parameters $\alpha$ and $\beta$ are determined by going over both rating vectors, and $\epsilon$ is the error of the regression model.
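
The paper doesn’t give code for this, but a minimal sketch of my own, fitting $\alpha$ and $\beta$ by ordinary least squares over the users who rated both items, might look like:

/**
 * A sketch (not from the paper) of fitting R'_N = alpha * R_i + beta
 * by ordinary least squares over users who rated both items i and n.
 * Returns {alpha, beta}.
 */
public double[] fitRegression(double[][] uiRating, int i, int n) {
    double sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
    int count = 0;
    for (int u = 0; u < uiRating.length; u++) {
        if (uiRating[u][i] > 0 && uiRating[u][n] > 0) { // co-rated cases only
            double x = uiRating[u][i], y = uiRating[u][n];
            sumX += x; sumY += y;
            sumXX += x * x; sumXY += x * y;
            count++;
        }
    }
    double den = count * sumXX - sumX * sumX;
    if (count < 2 || den == 0) {
        return new double[] {1, 0}; // degenerate: fall back to the raw ratings
    }
    double alpha = (count * sumXY - sumX * sumY) / den;
    double beta = (sumY - alpha * sumX) / count;
    return new double[] {alpha, beta};
}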

For more about linear regression, it may be worth referring to my machine learning article, in which I briefly introduced linear regression.

Experimental Evaluation

Some notation used in the experiments:

  • x: determines the train/test split; x% of the data set is used as the training set and (1 − x)% as the test set.
  • ML: the MovieLens data set, converted into a user-item matrix with 943 rows (users) and 1682 columns (movies).
  • Sparsity level: $1 - \frac{\text{nonzero entries}}{\text{total entries}}$; for the ML data set this is $1 - \frac{100000}{943 \times 1682} = 0.9369$.
  • MAE (mean absolute error): $MAE = \frac{\sum_{i=1}^{N} |p_i - q_i|}{N}$, the average absolute deviation over the N prediction-rating pairs; the lower the MAE, the more accurate the prediction.
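
As a one-method sketch of the MAE formula above, given parallel arrays of predictions and true ratings:

// MAE = (sum over i of |p_i - q_i|) / N
public double meanAbsoluteError(double[] predicted, double[] actual) {
    double sum = 0;
    for (int i = 0; i < predicted.length; i++) {
        sum += Math.abs(predicted[i] - actual[i]);
    }
    return sum / predicted.length;
}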

Reference: