Today it’s sunny in Copenhagen, so it would be a great day to start collecting data for an ice cream vendor recommender system. What you need is a lot of ratings of different ice cream shops, and maybe some data on which types of ice cream they sell, whether they have seats, and so on.
Then you can spend the days when it’s too hot to be outside reading about how to handle geodata and how to implement recommender systems.
You can implement a content-based recommender that uses the vendor data to suggest similar places to people planning a trip, and use a “learning to rank” algorithm to combine ratings and position into recommendations for users on the move.
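To make the content-based part a bit more concrete, here is a minimal sketch of how similar shops could be found from vendor data. The shop names, the four features and the choice of cosine similarity are all my assumptions for illustration; a real system would have richer data and probably a library doing the maths.

```python
from math import sqrt

# Hypothetical vendor data: 1 means the shop has the feature, 0 means it does not.
# Feature order (assumed): [gelato, soft ice, seats, vegan options]
vendors = {
    "Shop A": [1, 0, 1, 1],
    "Shop B": [1, 0, 1, 0],
    "Shop C": [0, 1, 0, 0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_vendors(name, vendors, top_n=2):
    """Return the top_n other vendors, most similar first, as (name, score) pairs."""
    target = vendors[name]
    scores = [
        (other, cosine_similarity(target, features))
        for other, features in vendors.items()
        if other != name
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Somebody who liked Shop A would be pointed to Shop B before Shop C.
recommendations = similar_vendors("Shop A", vendors)
```

The sketch only covers the "similar places" part. Combining ratings and position for users on the move would need a ranking model on top, which is not shown here.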
Incidentally, today is a good day to start on such a project because Manning has a deal of the day on “Geoprocessing with Python”, which means you can get it at half price today, and tomorrow you will find the same deal for my book “Practical Recommender Systems”. Go and have an ice cream and think about it.
And if you do it, please let me know how it went!
(A deleted scene from the Practical Recommender Systems book)
People use a computer in different ways. When I need to copy-paste something I always use <ctrl>-c and <ctrl>-v, while my wife insists (how irritating!) on always using the mouse right-click menu. I do not know if such habits can be translated into implicit ratings, unless they say something about whether you are a geek or not. Just for fun, we could pursue the thought a bit. Let’s pretend that we are looking at films. If we have The Matrix, the geek movie of all time, who would you recommend it to if you only knew how the user performs copy-paste? I would answer the ones using the <ctrl> way. My wife is helping me a lot with this (book), so I should be careful about guessing what kind of films a person who copy-pastes with the right-click menu likes, but I would put her way in between the geek way and the way of those who copy-paste using the menu at the top of the window.
Asking around, I tried to find members of each group: the ones that use the window menu to copy-paste, the ones that right-click, and finally the ones that use only the keyboard. Then I asked them to point out which of three movies they liked most. This was the result of my little survey:
- <ctrl>-way = The Matrix
- Right-click-way = Life of Walter Mitty
- Window menu-way = You’ve Got Mail
You can think of it as shown in the figure.
To put this into practice I would need to record a series of copy-paste events from the user, at which point the system could recommend the movie that fits this user’s behaviour.
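Just to show how little it would take, here is a toy sketch of that idea. The event names (`keyboard`, `right_click`, `window_menu`) and the rule of picking the most frequent style are my own assumptions; the mapping to movies is simply the survey result above.

```python
from collections import Counter

# Survey result from above: copy-paste style -> preferred movie.
# The style names are hypothetical labels for recorded events.
MOVIE_BY_STYLE = {
    "keyboard": "The Matrix",
    "right_click": "Life of Walter Mitty",
    "window_menu": "You've Got Mail",
}


def recommend_from_events(events):
    """Return the movie for the user's most frequent copy-paste style, or None."""
    # Ignore any event that is not one of the known copy-paste styles.
    counts = Counter(event for event in events if event in MOVIE_BY_STYLE)
    if not counts:
        return None
    style, _ = counts.most_common(1)[0]
    return MOVIE_BY_STYLE[style]


# A user who mostly uses the keyboard shortcuts.
events = ["keyboard", "keyboard", "right_click", "keyboard"]
movie = recommend_from_events(events)
```

In a real system the counts would more likely be one signal among many feeding the implicit ratings, not the whole decision.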
I hope that this has never been implemented in practice, but if you think about it, when you have to choose between recommending “The Matrix” and “You’ve Got Mail”, the copy-paste behaviour could maybe help give the system a better understanding of what to recommend.
The conclusion is that even if a piece of evidence is not an obvious telltale of a user’s taste, it might still contribute to making the implicit ratings more precise.
For those of you who read the introduction to my book but haven’t bought it (yet :)), there is a small excerpt of the first chapter over on DZone.
It is part of the description of what evidence is for a recommender system.
Front page of Practical Recommender Systems
For a computer scientist like me, the world of IT is such an exciting place! Since I started at university, I have seen the creation of companies like Amazon and Google, and later Netflix. They were for sure lucky to be in the right place at the right time. But it was ingenuity that has kept them in the market. What they did is a long story, but what I find interesting is that they have taken large quantities of content and made it accessible to the masses.
One of the advantages of being an internet business is that you are not limited by physical walls like traditional shops, so your list of products can be close to never-ending. If a physical store were truly so vast, customers would struggle to find anything and simply get lost. They would probably go to the shop next door, which has fewer products, and buy things that are not exactly what they wanted but are easily accessible.
Offering lots of content does not ensure success, not even if you have precisely what your users want. Often 20% of your content will produce 80% of your business. If you can match the remaining 80% of the content with your users, you will have more happy users and more business. The problem of activating that last 80% of the content is called the long tail problem.
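If you want to check how close your own catalogue is to the 80/20 rule of thumb, a small sketch like the following would do. The sales numbers are made up for illustration and the function name is hypothetical.

```python
def head_share(sales_per_item, head_fraction=0.2):
    """Share of total sales produced by the best-selling head_fraction of items."""
    ranked = sorted(sales_per_item, reverse=True)
    # Always count at least one item as the head, even for tiny catalogues.
    head_size = max(1, round(len(ranked) * head_fraction))
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:head_size]) / total


# Made-up sales counts for ten items; the two best sellers dominate.
sales = [500, 300, 40, 35, 30, 25, 25, 20, 15, 10]
share = head_share(sales)  # fraction of sales coming from the top 20% of items
```

Everything outside that head is the long tail, which is the content a recommender system is supposed to help your users find.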
One way to make the content more accessible to your users is to add a recommender system to your site. It can attempt to predict what your customers want and serve it to them.
Implementing recommender systems is an intriguing task. The actual algorithms, like collaborative or content-based filtering, are just a small part of it. If you do not feed the algorithm with the right data, it will not produce anything worth looking at. Using user ratings alone will often not produce the results that users want. Context is also often worth thinking about. And when it is all implemented and running, how do you know that it is working, and how do you measure improvements?
I never found a book answering these questions; I found lots of good books explaining how to implement the algorithms mentioned above, but never a book that described everything around them as well. So I started working on one. It just came out in an early release at Manning.
Go and have a look, the first chapter is free!