Don't randomize training data for recommender systems

Posted by kimfalk

When you train machine learning models, you often shuffle the data so that the model doesn't learn unintended patterns from the order of the training examples. The same concern applies to recommender systems. However, if you shuffle a user's data, the recommender may end up knowing what the user consumed after the moment the recommendations are supposed to be made. That is a data leak, and it is a far more serious problem than any ordering effect.

We train and evaluate recommender systems so that they can make customers happier and improve their lives. To prepare a recommender for that task as well as possible, it is essential to train it in conditions that mimic the environment it will operate in, and that is not a world where it knows the future. At least not yet.

You should always treat events as time-sensitive. When training a recommender, split the data at a specific timestamp and train only on the events recorded before it, to simulate what the recommender could do at that point in time. Then use the remaining data (logged after the timestamp) to evaluate the predictions, i.e., to see how many of the content items users actually went on to consume the recommender manages to predict.
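The split-then-evaluate procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the book: the `(user, item, timestamp)` event format, the function names and the popularity "recommender" are hypothetical stand-ins, and precision@k is just one possible way of counting how many of the later-consumed items were predicted.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical interaction log format: (user_id, item_id, timestamp).
Event = tuple[str, str, datetime]


def temporal_split(events: list[Event], cutoff: datetime) -> tuple[list[Event], list[Event]]:
    """Split at one global timestamp instead of shuffling.

    Events before `cutoff` are used for training; events at or after it are
    held out, so nothing from the "future" leaks into the model.
    """
    train = [e for e in events if e[2] < cutoff]
    test = [e for e in events if e[2] >= cutoff]
    return train, test


def recommend_popular(counts: Counter, seen: set[str], k: int) -> list[str]:
    """Hypothetical placeholder recommender: top-k popular items the user hasn't seen."""
    return [item for item, _ in counts.most_common() if item not in seen][:k]


def precision_at_k(train: list[Event], test: list[Event], k: int = 2) -> float:
    """Average share of each test user's top-k recommendations consumed after the cutoff."""
    counts = Counter(item for _, item, _ in train)
    seen: dict[str, set[str]] = defaultdict(set)
    for user, item, _ in train:
        seen[user].add(item)
    future: dict[str, set[str]] = defaultdict(set)
    for user, item, _ in test:
        future[user].add(item)
    if not future:
        return 0.0
    scores = []
    for user, items in future.items():
        recs = recommend_popular(counts, seen[user], k)
        scores.append(len(set(recs) & items) / k)
    return sum(scores) / len(scores)


# Usage with a tiny made-up log.
events = [
    ("ann", "a", datetime(2023, 1, 1)),
    ("bob", "a", datetime(2023, 1, 2)),
    ("bob", "b", datetime(2023, 1, 3)),
    ("ann", "b", datetime(2023, 2, 1)),  # logged after the cutoff
    ("cat", "a", datetime(2023, 2, 2)),  # logged after the cutoff
]
cutoff = datetime(2023, 1, 15)
train, test = temporal_split(events, cutoff)
print(precision_at_k(train, test))  # 0.5
```

Because the split depends only on timestamps, shuffling `events` first would make no difference here; shuffling and then slicing by position, on the other hand, would let post-cutoff events slip into `train`.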

I needed to get it off my chest (again).

RecSys Summer School 2023

Posted by kimfalk

Thank you to everyone who planned, ran, and joined the RecSys Summer School #RSSS2023. It was an intense week, not only because of the many great talks but also because of all the discussions and different perspectives shared during the breaks and social events. I love the deep immersion into the RecSys topic, and it is hard not to feel a bit empty afterwards, but luckily RecSys 2023 (https://lnkd.in/eiDAJP2U) is just around the corner (in time, not location).

I am still digesting the talks and content. I was happy that the week reflected how extensively evaluation is discussed in recommender systems circles. Evaluating recommenders is hard: close to impossible offline and very difficult in online scenarios. There are many reasons for that, such as data bias and the fickle nature of users. But one thing that was highlighted by many (including myself (wink wink)) was that an evaluation is not a number. An evaluation answers a question, and metrics provide the evidence for that answer. Lien Michiels gave my favourite talk of the week, on offline evaluation (even if she did have many slides with bullet points).

A recommender system is never a single-objective component, and I enjoyed Robin Burke's talk on the intricacies of multistakeholder recommenders. There are always many parties who expect to be considered in a recommender system. Besides the stakeholders, there are also social constraints, such as fairness, that should be taken into account. Christine Bauer gave a very interesting talk on this subject.

Recommenders should also educate users about what is in the catalogue. Most importantly, the recommender should not confuse users or push them away from their current use case. These and other subjects were discussed in the context of e-commerce in an excellent talk by Humberto Corona 🎧, who shared his experiences with recommenders in that domain.

Conversational recommenders are set to become a big thing with new chatbots like ChatGPT, and Cataldo Musto gave an excellent introduction to them. How to combine the power of these bots with good recommendations is still an open research question.

Knowledge-based recommendations cover every approach that starts from the content metadata and uses it to find similarities and create recommendations. Pasquale Lops and Marco de Gemmis gave us a good introduction. They explained why we should treat content data as the primary source and behavioural data as side information, not the other way around.
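To make the "start from the metadata" idea concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not what the speakers presented: the catalogue and its tags are made up, and Jaccard similarity over tag sets is just one simple choice of similarity measure. Behavioural data, used as side information, could later re-rank these candidates, but that part is not shown.

```python
# Hypothetical catalogue: each item is described only by its metadata tags.
catalogue: dict[str, set[str]] = {
    "alien": {"sci-fi", "horror", "space"},
    "interstellar": {"sci-fi", "drama", "space"},
    "the_shining": {"horror", "drama"},
    "toy_story": {"animation", "family"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Tag overlap |A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(item: str, k: int = 2) -> list[tuple[str, float]]:
    """Rank the other catalogue items by metadata similarity to `item`."""
    target = catalogue[item]
    scored = [
        (other, jaccard(target, tags))
        for other, tags in catalogue.items()
        if other != item
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


print(similar_items("alien"))  # [('interstellar', 0.5), ('the_shining', 0.25)]
```

Since the similarities come purely from content, a brand-new item with tags but no interactions can be recommended straight away, which is one practical argument for putting content first.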

These are just a few of them; all the talks were fascinating!

Thank you again to Alan Said, Toine Bogers and Maria Maistro for organising it and for inviting me to talk!


The book is out!

Posted by kimfalk

The printed book is out!

Four years in the making: nothing by George R. R. Martin's standards, but still a long time. I was happy to see that last week, even before its full release, it appeared among the top 10 best sellers in Manning's early releases.
Thanks to all who bought it, supported it, reviewed it and waited so long for the final version!


I am delighted that it is finished, and I hope you will enjoy it. Please feel free to comment on it, review it or discuss it with me. I also give talks if there is an audience that would like to hear about recommenders.

For now, happy days! Can’t wait to hold a paper copy later this week.

The printed book (and the ebook) are available here and will be for sale on all good webshops in the near future.

Get the book here