Let’s talk about sex, Mr. ChatBot

The accepted abstract for my lightning talk at the AltRecSys workshop at RecSys 2024 in Bari.

Almost any news source these days, whether a regular newspaper, LinkedIn, or similar, will tell you that ChatGPT and LLMs in general will solve any problem, steal your job, and do it better. We see one blog post after another showing how to create cold-start recommenders with an LLM, without any user data, that outperform SOTA algorithms. The LLM just “knows.”

However, this is only true for domains and topics that are often discussed on the internet and that fall within the ethics and rules of the country where the model was created, or indeed of the company that created it. As soon as you move away from those, you start having problems.

One example where this is especially true is the booming market for sex toys. Many of the product features, and the words and phrases used to describe sex toys, fall outside company guidelines and often end up on stop-word lists. As a result, it is tough to use the embeddings created by the LLMs: they will simply ignore essential features of the product, or produce embeddings that are useless because the semantics of the words are misunderstood.

Facing these problems, many customers looking for sex toys get lost among the many different choices. Since many product types are taboo, it is difficult for customers to learn about them, and for the stores to personalise experiences and create recommendations. Furthermore, from an industry practitioner’s point of view, this sets unrealistic expectations of what can be done with these technologies, as the expectation is that you can create great recommendations as long as you use an LLM.

Most LLMs are trained on publicly available data, and the articles that do talk about sex toys often present a distorted worldview and push biases that most system engineers want to remove. This hints at a broader problem, since many use cases encounter similar issues: LLMs can only learn from what they are given, so decisions about, and the availability of, training data have a significant impact on where LLMs can be applied.

This becomes even worse in minor languages. Everything that works in English becomes complicated when moved to a small language like one of the Scandinavian ones. The models often confuse the meaning of words, because many of these languages have similar words with slightly different meanings, and the training data is often mixed between the languages due to poor language classification.

Sex toys and Scandinavian languages might both be extreme examples, but they illustrate the problems and biases that are introduced into our systems.

Does the solution have to contain machine learning?

Does a solution have to contain machine learning to be good? And to tap into the voice of the many, do we need an LLM?

In many cases, the answer might be no, certainly not as the first solution. If you consider recommender systems or reranking models, simply reordering items by recency can significantly improve the experience.
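
As a minimal sketch of such a baseline (the item fields here are made up for illustration):

```python
from datetime import datetime, timezone

# Toy catalogue; in a real system, these would come from your database.
items = [
    {"id": "a", "published": datetime(2024, 9, 1, tzinfo=timezone.utc)},
    {"id": "b", "published": datetime(2024, 9, 20, tzinfo=timezone.utc)},
    {"id": "c", "published": datetime(2024, 8, 15, tzinfo=timezone.utc)},
]

# The simplest possible "recommender": newest first.
recommendations = sorted(items, key=lambda item: item["published"], reverse=True)
print([item["id"] for item in recommendations])  # ['b', 'a', 'c']
```

It is not personalised, but it is cheap, explainable, and a surprisingly strong baseline to beat.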

But of course, don’t stop there; there are many more things to try that might bring further improvements. However, one of the most significant issues in recommender development is that it is tough to evaluate a system without testing it on users. If you have a simple idea that improves your KPI, it will likely earn you a lot of money while you are still battling with the more complex algorithms.

Having something simple in production also enables you to start evaluating and monitoring, and ensures you have set up the data collection needed before adding more complexity to the system. A simple solution also provides evidence and data that could enable your (machine learning) system to become even better. If nothing else, it provides a benchmark to compare the much more complex solution against.

Don’t randomize training data for recommender systems

When you train machine learning models, you often have to randomize the data so that the model doesn’t learn unintended patterns from the order of the training data. In recommender systems, the same concern applies. Still, if you randomize a user’s data such that the recommender system knows what a user has consumed after the time the recommendations should happen, you introduce a data leak, which is a much more significant concern.

Recommender systems are trained and evaluated to make customers happy and give them better lives. To prepare recommenders for that task as well as possible, it is essential to train them to mimic the environments they will perform in, and that is not a reality where they know things from the future. At least not yet.

You should always treat events as time-sensitive. When training a recommender, split the data at a specific timestamp to simulate what the recommender would know at that point, namely the data recorded before it. Then, use the remaining data (logged after the timestamp) to evaluate the predictions, i.e. see how many of the consumed items the recommender can predict.
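
A minimal sketch of such a temporal split, assuming a pandas DataFrame of logged interactions with user_id, item_id, and timestamp columns (the file name and the 80/20 cut are arbitrary choices for illustration):

```python
import pandas as pd

# Hypothetical interaction log with user_id, item_id, and timestamp columns.
events = pd.read_csv("interactions.csv", parse_dates=["timestamp"])

# Choose a cut-off; here the last 20% of the timeline is held out.
split_time = events["timestamp"].quantile(0.8)

train = events[events["timestamp"] <= split_time]  # what the recommender may know
test = events[events["timestamp"] > split_time]    # the "future" it must predict

# Train only on `train`, then evaluate by checking how many of the items
# in `test` the recommender actually predicts for each user.
```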

I needed to get it off my chest (again).

What is Retrieval-Augmented Generation (RAG)?

A RAG framework contains an LLM paired with a knowledge base.

A RAG process takes a query and assesses whether it relates to subjects covered by the paired knowledge base. If it does, it searches the knowledge base for information related to the user’s question. Any relevant context found is then passed to the LLM along with the original query, and an answer is produced.
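
As a minimal sketch of that flow, with TF-IDF retrieval standing in for embedding-based search and call_llm as a placeholder for whatever LLM client you use:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy knowledge base; in practice, your documents or product catalogue.
docs = [
    "Our return policy allows returns within 30 days of purchase.",
    "Shipping to Scandinavia takes three to five business days.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)

def retrieve(query, k=1, threshold=0.1):
    """Return the k most similar documents, or nothing if none are relevant."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [docs[i] for i in top if scores[i] >= threshold]

def answer(query):
    context = retrieve(query)
    if not context:  # query is unrelated to the knowledge base
        return call_llm(query)  # call_llm is a placeholder for your LLM client
    prompt = ("Answer the question using only this context:\n"
              + "\n".join(context)
              + "\n\nQuestion: " + query)
    return call_llm(prompt)
```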

This helps with two things: first, it reduces the risk of hallucinations, and second, it reduces the chance that the LLM will leak sensitive data, as that data can be left out of the training data.

The knowledge base can also be a recommender system, which allows the LLM to extract context and feed it into the recommender which, in return, delivers crisp recommendations. (This idea is investigated in the RecSys ’23 article “Retrieval-augmented Recommender System: Enhancing Recommender Systems with Large Language Models”: https://lnkd.in/dHvK8SNJ.)

A book conversation

Hi ChatGPT, thank you for recommending my book to me. I am delighted you wrote so positively about it.

What is also worth knowing, before I praise it for being the oracle it must be to recommend my book, is that I actually tried regenerating the text four times, and when my book still didn’t appear, I told it so.

Then I reposted the question, and it finally produced the version I sought, the one recommending my book.

You usually never see how much work went into making it say the things it is quoted as saying. Remember, the text is only ever as credible as its sources and writers.

The book is out!

The printed book is out!

Four years in the making, nothing by the standards of George R. R. Martin, but still a loong time. I was happy to see that last week, before the full release, it appeared among the ten most sold of Manning’s early releases.
Thanks to all who bought it, supported it, reviewed it and waited so long for the final version!


I am delighted it is completed, and I hope you will enjoy it. Please feel free to comment, review, or discuss it with me. I also give talks if there is an audience that would like to hear about recommenders.

For now, happy days! Can’t wait to hold a paper copy later this week.

The printed book (and the ebook) are available here and will be for sale on all good webshops in the near future.

Get the book here

Introducing Practical Recommender Systems

Front page of Practical Recommender Systems

For a computer scientist like me, the world of IT is such an exciting place! Since I started at university, I have seen the creation of companies like Amazon and Google, and later Netflix. They were surely lucky to be in the right place at the right time, but it is ingenuity that has kept them in the market. What they did is a long story, but what I find interesting is that they have taken large quantities of content and made it accessible to the masses.

One of the advantages of being an internet business is that you are not limited by physical walls like traditional shops, and your list of products can be close to never-ending. If a physical store were truly so vast, customers would struggle to find anything and simply get lost. They would probably go to the shop next door, which has fewer products, and buy things that are not exactly what they wanted but are easily accessible.

Offering lots of content does not ensure success, not even if you have precisely what your users want. Often, 20% of your content will produce 80% of your business; if you can match the remaining 80% of the content with your users, you will have happier users and more business. The problem of activating that last 80% of the content is called the long-tail problem.
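
If you want to check how pronounced your own long tail is, here is a quick sketch (assuming a pandas DataFrame of purchases with an item_id column; the file name is hypothetical):

```python
import pandas as pd

# Hypothetical sales log, one row per purchase, with an item_id column.
sales = pd.read_csv("sales.csv")

# Purchases per item, most popular first.
counts = sales["item_id"].value_counts()

# Share of all purchases generated by the top 20% of items.
top_n = max(1, int(len(counts) * 0.2))
head_share = counts.iloc[:top_n].sum() / counts.sum()
print(f"Top 20% of items account for {head_share:.0%} of purchases")
```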

A way to enhance the accessibility of the content is to add a recommender system to your site, which can attempt to predict what your customers want and serve it to them.

Implementing recommender systems is an intriguing task. The actual algorithms, like collaborative or content-based filtering, are just a small part of it. If you do not feed the algorithm the right data, it will not produce anything worth looking at. Using user ratings will often not produce the results that users want, and the context is often also worth thinking about. And when it is all implemented and running, how do you know that it is working? How do you measure improvements?
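
To give one concrete answer to that last question, a common offline metric is precision@k: the fraction of the top-k recommendations a user actually went on to consume. A minimal sketch:

```python
def precision_at_k(recommended, consumed, k=10):
    """Fraction of the top-k recommendations the user actually consumed."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in consumed)
    return hits / k

# Hypothetical example: the user consumed 2 of the top-5 recommendations.
print(precision_at_k(["a", "b", "c", "d", "e"], {"b", "e"}, k=5))  # 0.4
```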

I never found a book answering these questions. I found lots of good books explaining how to implement the algorithms mentioned above, but never one that also described everything around them. So I started working on one, and it has just come out as an early release at Manning.

Go and have a look; the first chapter is free!

Manning.com/falk.