I am just back from Como, Italy, where I attended RecSys 2017. It was great to meet in person all the people whose work I have been reading and whose presentations I have YouTubed. More importantly, it was great to talk about my work and get it validated by people working on similar problems. I am in the legal content business now, and I have to admit that I didn’t find anybody with the same type of content as ours, but it shares some of the challenges that, for example, news sites battle with. A large part of my reason for going to a conference like RecSys is also to polish up my reasons for doing what I do. Meeting so many geeks and discussing all the nerdy details is a great way to recharge my motivation.
RecSys is an academic conference where aspiring researchers come to show off their research on improving accuracy in predicting ratings on a narrow list of datasets. But it also has an industry track, where companies come to show how they do recommender systems in practice. I think both parts are great; I only wish that the research were a little more geared towards running recommenders online, and that the industry talks were a bit more technical. But it is a huge strength that both industry and academia are present, and I hope that will continue.
At lunch, a group of us would meet under the trees of the beautiful Villa Erba, where we had lots of interesting talks about how we could make the research part more relevant for businesses, most of us being professionals. And we all agreed (stop me if I am wrong :)) that it is a shame that all research is evaluated offline using metrics that most businesses (and researchers) actually agree say nothing about the quality of a recommender system. The result is that researchers are left with the task of optimizing machine learning algorithms on datasets that are meant to be used for recommender systems, rather than actually building recommender systems.
The metrics themselves are also a point to discuss: precision and recall at K are not good measures of a recommendation, but they are the best that they/we have come up with. But if that is how we are going to measure the quality of research and use it as a benchmark, we should also agree on how to split the data into training and test sets, and on how to sort each user’s ratings in the test set. I saw several good talks at the conference about creating a framework for doing this, but it seems to get forgotten again as soon as the question round is finished.
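To make the point about agreeing on a protocol concrete, here is a minimal sketch in Python of one possible convention: hold out each user's most recent ratings as the test set, then compute precision and recall at K against them. The function names, the `(user, item, rating, timestamp)` tuple layout, and the 20% hold-out are my own assumptions for illustration, not a standard that the conference or any framework has settled on.

```python
from collections import defaultdict


def split_per_user(ratings, test_fraction=0.2):
    """Hold out each user's most recent ratings as test data.

    `ratings` is a list of (user, item, rating, timestamp) tuples.
    This tuple layout and the temporal hold-out are assumptions.
    """
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    train, test = [], []
    for rows in by_user.values():
        rows.sort(key=lambda r: r[3])  # oldest first
        # Users with a single rating stay entirely in the training set.
        n_test = max(1, int(len(rows) * test_fraction)) if len(rows) > 1 else 0
        cut = len(rows) - n_test
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall at K for one user.

    `recommended` is a ranked list of items, `relevant` is the set of
    items the user actually interacted with in the test set.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Even in a sketch this small, the choices that papers rarely spell out are visible: temporal versus random hold-out, what counts as "relevant", and whether precision is divided by K or by the number of items actually recommended. Two papers that differ on any of these are not reporting comparable numbers.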
Next year’s conference has a challenge sponsored by Spotify, which is great, and I look forward to playing with the data. I wish they would also make a live service available for researchers to try out their algorithms online, thereby also focusing research on many of the practical issues faced by a recommender systems engineer.
No matter what, I hope to be there again at RecSys 2018.