Item Recommendation using Jaccard Coefficient

Have you ever wondered about the Suggestions For You section on your social media feed, e-commerce site, or even your banking app? Where do these recommendations come from? What is the logic behind them? You won’t get the real answer here!

In this article I’ll break down one simple approach for deciding which items can be suggested to a user, based on the similarity between users. This method is the Jaccard Coefficient.

The Jaccard Coefficient measures the similarity of two sets as the size of their intersection relative to the size of their union. Mathematically, it is written as:

J(A, B) = |A ∩ B| / |A ∪ B|

As an illustration, suppose we have two sets, A and B. If their intersection contains 2 elements and their union contains 7 distinct elements in total, then the Jaccard Coefficient of these two sets is 2/7.

Thus, the Jaccard Coefficient always lies in the range 0–1: 0 means the two sets share nothing, 1 means they are identical, and a higher value indicates higher similarity between the two sets.
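The definition above can be sketched in a few lines of Python using built-in sets. This is a minimal illustration rather than a reference implementation: the function name `jaccard` and the example sets are assumptions made here, and returning 0.0 for two empty sets is a convention chosen for this sketch (the ratio is otherwise undefined in that case).

```python
def jaccard(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B| for two sets."""
    union = a | b
    if not union:
        # Both sets are empty; treating that as 0.0 is an assumption of this sketch.
        return 0.0
    return len(a & b) / len(union)


# Made-up sets that share 2 elements out of 7 distinct elements overall.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6, 7}

print(jaccard(A, B))  # 2/7, roughly 0.2857
```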

Implementation Example

Let’s take a look at an example implementation using a dummy dataset of Stranger Things characters and their favorite movies.

With a simple calculation, we can get the Jaccard Coefficient for each pair of users.
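One way to carry out this pairwise calculation is sketched below. The `favorites` dictionary is invented purely for illustration (placeholder titles, not the dataset used in this article), and the helper name `jaccard` is likewise an assumption.

```python
from itertools import combinations


def jaccard(a, b):
    """Intersection size over union size; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical favorites with placeholder titles (not the article's dataset).
favorites = {
    "Dustin": {"Title A", "Title B", "Title C"},
    "Mike": {"Title A", "Title B", "Title D"},
    "Lucas": {"Title C", "Title E"},
}

# Score every unordered pair of users exactly once.
pairwise = {
    (u, v): jaccard(favorites[u], favorites[v])
    for u, v in combinations(favorites, 2)
}

# Print the pairs from most to least similar.
for (u, v), score in sorted(pairwise.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{u} - {v}: {score:.2f}")
```

Because the coefficient is symmetric, scoring each unordered pair once is enough; a full user-by-user table can be filled in by mirroring the values.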

Let’s pick one user at random and look at Dustin’s data. It shows that Dustin’s highest Jaccard Coefficient is with Mike, followed by Eleven, Max, Will, and Lucas.

We can interpret this as Dustin being most similar to Mike. Hence, we can consider recommending that Dustin watch Peaky Blinders, a favorite of Mike’s that is not yet among Dustin’s current favorites. Likewise, we can recommend that Mike watch Stranger Things.
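The reasoning above (find the most similar user, then suggest what they like that the target user does not already have) could be sketched as follows. The data is hypothetical: only the two titles named above come from the article, while "Title A", "Title B", and "Title C" are placeholders, and the function names `jaccard` and `recommend` are assumptions of this sketch.

```python
def jaccard(a, b):
    """Intersection size over union size; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, favorites):
    """Return the most similar other user and their items that `user` lacks."""
    others = [u for u in favorites if u != user]
    # On ties, max() keeps the first candidate in dictionary order.
    nearest = max(others, key=lambda u: jaccard(favorites[user], favorites[u]))
    return nearest, sorted(favorites[nearest] - favorites[user])


# Hypothetical favorites; placeholder titles are not from the article's dataset.
favorites = {
    "Dustin": {"Stranger Things", "Title A", "Title B"},
    "Mike": {"Peaky Blinders", "Title A", "Title B"},
    "Lucas": {"Title B", "Title C"},
}

print(recommend("Dustin", favorites))  # ('Mike', ['Peaky Blinders'])
print(recommend("Mike", favorites))    # ('Dustin', ['Stranger Things'])
```

Using only the single nearest user keeps the sketch close to the example in the text; a fuller version might pool suggestions from several similar users.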

Things to Consider

Since this approach relies on user similarity, it may suffer from popularity bias: non-popular items (in this case, non-popular movies) are less likely to appear in recommendations.

Summary

In summary, the Jaccard Coefficient is a simple method that can be used not only for movie suggestions, as in our case, but also for other personalized recommendations (e.g. content, ads, products, songs) by applying the concept of similarity. Furthermore, we can run experiments to find the optimal threshold for the coefficient and combine this value with other, more complex methods.
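As a starting point for such a threshold experiment, the sketch below keeps only the users whose coefficient with the target user reaches a chosen cutoff. The cutoff of 0.3, the function name `similar_users`, and the placeholder data are all assumptions for illustration; a suitable threshold would have to be found empirically.

```python
def jaccard(a, b):
    """Intersection size over union size; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_users(user, favorites, threshold):
    """Map each other user to their score, keeping scores >= threshold."""
    scores = {
        other: jaccard(favorites[user], items)
        for other, items in favorites.items()
        if other != user
    }
    return {other: s for other, s in scores.items() if s >= threshold}


# Hypothetical favorites with placeholder titles (not the article's dataset).
favorites = {
    "Dustin": {"Title A", "Title B", "Title C"},
    "Mike": {"Title A", "Title B", "Title D"},
    "Lucas": {"Title C", "Title E"},
}

# 0.3 is an arbitrary cutoff chosen only for this example.
print(similar_users("Dustin", favorites, threshold=0.3))  # {'Mike': 0.5}
```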