From the course: Recommendation Systems: A Practical Hands-On Introduction
Data in recommendation systems - Python Tutorial
Data in recommendation systems
- Let's start our conversation on recommendation systems. The first thing we're going to talk about is data. Did you know that the data in recommendation systems is quite different from the data in other machine learning solutions? In this video, we're going to talk about the different data types available in reco and how to think about them.

In reco, we generally have three types of data. First, we have the user. The user is the customer who interacted with the company and to whom the system recommends products. Second, we have the item, which is a product that can be recommended to a user. And the third type is the interaction, or feedback. It is the kind of interaction between a user and an item. There are multiple types depending on the business. For example, in e-commerce, we have interactions like view, click, or buy. In media, we have click, start viewing, view 50%, view 100%, et cetera.

Not all interactions are created equal; some interactions are more informative than others. Let's say you are on a video site and a movie has a rating of 4.5. Also, maybe there is a review saying that the movie is fantastic. This kind of feedback is called explicit. Explicit feedback is the most informative and richest interaction. It's when the user explicitly gives feedback about an item. Unfortunately, explicit feedback is typically limited. It requires a higher effort by the user, which makes it inconvenient for them. Also, it can be biased towards certain users. Some people love to write reviews while others, like me, hate it. And in cases where the feedback is written, it is not straightforward to compare.

Now, let's say you click on the movie and watch it until the end. That is implicit feedback. It is only indirect evidence of interest, but we can still get a lot of information from it. You did watch the video after all, right? Implicit feedback is abundant, easy to collect, not disruptive to users, and typically less biased than explicit feedback.
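To make the data types described so far a bit more concrete, here is a minimal Python sketch of an interaction record that can hold either explicit or implicit feedback. Everything in it is an assumption made for illustration: the `Interaction` class, the `split_feedback` and `implicit_score` helpers, the sample IDs, and especially the numeric weights, which only express the idea that some interactions are more informative than others.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights: a stronger signal gets a larger number.
# The values are illustrative only, not taken from the course.
IMPLICIT_WEIGHTS = {"view": 1.0, "click": 2.0, "buy": 5.0}


@dataclass(frozen=True)
class Interaction:
    """One piece of feedback between a user and an item."""
    user_id: str
    item_id: str
    event: str                      # e.g. "view", "click", "buy", "rating"
    rating: Optional[float] = None  # filled in only for explicit feedback

    @property
    def is_explicit(self) -> bool:
        # Explicit feedback: the user deliberately gave a rating.
        return self.rating is not None


def split_feedback(interactions):
    """Separate explicit feedback from implicit feedback."""
    explicit = [i for i in interactions if i.is_explicit]
    implicit = [i for i in interactions if not i.is_explicit]
    return explicit, implicit


def implicit_score(interaction):
    """Map an implicit event to its (assumed) weight; unknown events get 0."""
    return IMPLICIT_WEIGHTS.get(interaction.event, 0.0)


# Made-up sample data: two users, three items.
interactions = [
    Interaction("u1", "movie_a", "rating", rating=4.5),
    Interaction("u1", "movie_b", "click"),
    Interaction("u2", "movie_a", "view"),
    Interaction("u2", "movie_c", "buy"),
]

explicit, implicit = split_feedback(interactions)
print(len(explicit), "explicit,", len(implicit), "implicit")
```

In this toy data only one of the four records is explicit, which mirrors the point that explicit feedback tends to be scarce while implicit feedback is abundant.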
On the negative side, implicit feedback is not easy to interpret. But the biggest limitation of implicit feedback is negative feedback. Consider this. You are on a video site, and of all the movies that appear on the homepage, you select one. Does that mean that you dislike the other options shown on the homepage? Well, that is difficult to answer, right? Maybe you dislike some of the movies. Maybe you like them, but you don't want to watch them now. In AI, this is what we call a partially observable system. We can't access all the information of that system. In particular, in implicit feedback, we don't know the negative feedback. There are advanced techniques to treat the problem of negative feedback, but we are not going to cover them during this course.

Finally, there is another type of data that is very useful for recommendation systems. We call it user and item features. User features are details that describe the user, such as age, location, gender, address, et cetera. Item features are details of the item, such as price, brand, description, product type, et cetera. User and item features can be incorporated into some of the machine learning algorithms that we're going to use in reco.

In summary, in reco, the datasets we build are different from those in other machine learning areas. The main components are user, item, and interactions. The interactions are divided into explicit feedback, which is more informative but less common, and implicit feedback, which is more accessible but not that easy to interpret. And finally, we can leverage user and item features to enrich the recommendation algorithm.
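As a rough illustration of the last two ideas, the sketch below joins user and item features onto user-item pairs and labels each pair from implicit feedback. The feature tables, the `watched` set, and the `label` and `training_row` helpers are all hypothetical. The one design choice worth noticing is that a pair with no recorded interaction is labeled `None` (unknown) rather than `0`, because with implicit feedback we cannot tell whether the user disliked the item or simply did not pick it.

```python
# Hypothetical feature tables keyed by user and item IDs.
user_features = {
    "u1": {"age": 34, "location": "Madrid"},
    "u2": {"age": 27, "location": "Lima"},
}
item_features = {
    "movie_a": {"price": 3.99, "brand": "Studio X"},
    "movie_b": {"price": 4.99, "brand": "Studio Y"},
}

# Implicit feedback: we only record what the user actually watched.
watched = {("u1", "movie_a"), ("u2", "movie_b")}


def label(user_id, item_id):
    """Return 1 for an observed interaction, None when nothing was observed.

    None means "unknown", not "disliked": the system is only
    partially observable.
    """
    return 1 if (user_id, item_id) in watched else None


def training_row(user_id, item_id):
    """Build one row that combines IDs, user features, item features, and label."""
    row = {"user_id": user_id, "item_id": item_id}
    row.update({f"user_{k}": v for k, v in user_features[user_id].items()})
    row.update({f"item_{k}": v for k, v in item_features[item_id].items()})
    row["label"] = label(user_id, item_id)
    return row


# One row per user-item pair, in a stable (sorted) order.
rows = [
    training_row(u, i)
    for u in sorted(user_features)
    for i in sorted(item_features)
]
unknown = [r for r in rows if r["label"] is None]
print(len(rows), "pairs,", len(unknown), "with unknown feedback")
```

Treating the unknown pairs as negatives would be one possible strategy, but it is an assumption about the user that the data itself does not support.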