Let’s talk about sex, Mr ChatBot

The accepted abstract for my lightning talk at the AltRecSys workshop at RecSys 2024 in Bari. (The image was not part of the submission.)

Almost any news source these days, whether a regular newspaper, LinkedIn, or similar, will tell you that ChatGPT and LLMs in general will solve any problem, steal your job, and do it better. We see one blog post after another showing how to build cold-start recommenders with an LLM, using no user data at all, that supposedly outperform SOTA algorithms. The LLM just “knows.”
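
For context, the recipe in those posts usually amounts to little more than prompting a chat model with the catalogue and a short description of the new user, then treating its answer as the recommendation list. A minimal sketch of that pattern, assuming the OpenAI Python client; the model name, catalogue, and user description are illustrative, not taken from any particular post:

```python
# Sketch of the "cold-start recommendations straight from the LLM" pattern.
# Assumes the openai Python client and an OPENAI_API_KEY in the environment;
# the model name, catalogue, and user description are illustrative.
from openai import OpenAI

client = OpenAI()

catalogue = [
    "Noise-cancelling headphones",
    "Trail running shoes",
    "Espresso grinder",
]

prompt = (
    "A new user says they enjoy long-distance running and podcasts. "
    "Rank these products for them, best first, with a one-line reason each:\n- "
    + "\n- ".join(catalogue)
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```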

However, this is only true for domains and topics that are discussed often on the internet, and that fall within the ethics and rules of the country where the model was created, or indeed of the company itself. As soon as you move away from those, you start having problems.

One example where this is especially true is the booming market for sex toys. Many of the product features, and the words and phrases used to describe them, fall outside company guidelines and often end up on the stop-word list. As a result, it is tough to use the embeddings created by the LLMs: the models simply ignore essential features of the product, or produce embeddings that are useless because they misunderstand the semantics of the words.
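
One way to see this is to probe an off-the-shelf embedding model with product descriptions that differ only in features that matter in this domain, and check whether the embeddings can tell them apart. A minimal sketch, assuming the sentence-transformers library; the model name and descriptions are illustrative:

```python
# Probe whether a general-purpose embedding model captures domain-specific
# product features. Model name and descriptions are illustrative assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")

descriptions = [
    "Body-safe silicone vibrator, app-controlled patterns, fully waterproof",
    "Body-safe silicone vibrator, single constant speed, not waterproof",
    "Stainless steel kitchen whisk with an ergonomic handle",
]

embeddings = model.encode(descriptions)
similarities = cosine_similarity(embeddings)

# If the first two descriptions are nearly indistinguishable while both sit far
# from the third, the embedding is blind to the feature differences customers
# actually care about in this domain.
print(similarities.round(3))
```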

Faced with these problems, many customers looking for sex toys get lost among the many different choices. Since many product types are taboo, it is difficult for customers to learn about them, and for the stores to personalise experiences and create recommendations. Furthermore, from an industry practitioner’s point of view, this also sets unrealistic expectations of what can be done with these technologies, as the expectation is that you can create great recommendations as long as you use an LLM.

Most LLMs are trained on publicly available data, and the articles that do talk about sex toys often present a distorted worldview and push biases that most system engineers want to remove. This hints at a broader problem, since many use cases run into similar issues: LLMs can only learn from what they are given, so decisions about data, and its availability, have a significant impact on the applicability of LLMs.

This becomes even worse in minor languages. Everything that works in English becomes complicated when moved to a small language like one of the Scandinavian languages. The models often confuse the meanings of words, because many of these languages share similar words with slightly different meanings, and the training data is often mixed between the languages due to poor language classification.
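
The mixing is easy to reproduce with the standard language-identification tools typically used to filter training corpora. A small sketch, assuming the langdetect package; the example sentences are illustrative:

```python
# Short, similar-looking Scandinavian sentences are easy to misclassify, which
# is one way mixed-language text ends up in a training corpus.
# Assumes the langdetect package; the example sentences are illustrative.
from langdetect import DetectorFactory, detect

DetectorFactory.seed = 0  # langdetect is stochastic; fix the seed for repeatability

sentences = [
    "Jeg elsker at løbe i skoven om morgenen.",      # Danish
    "Jeg elsker å løpe i skogen om morgenen.",       # Norwegian (Bokmål)
    "Jag älskar att springa i skogen på morgonen.",  # Swedish
    "Han er rolig.",  # "rolig" means calm in Danish/Norwegian but funny in Swedish
]

for sentence in sentences:
    print(detect(sentence), "<-", sentence)
```

False friends like “rolig” are exactly the similar-but-different words mentioned above: the same surface form, a different meaning, and a classifier that struggles to tell the source languages apart.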

Sex toys and Scandinavian languages may both be extreme examples, but they illustrate the problems and biases that are introduced into our systems.

Using LLMs doesn’t always help readability.

As an experienced reviewer of recommender systems articles, I have had the privilege of evaluating submissions for numerous large conferences, primarily on the industry track but also on the research track.

A rarely discussed barrier to getting your article accepted at one of these conferences is its readability.

While numerous tools are available to assist writers, it’s important to exercise caution. Before the introduction of LLMs, one of the biggest offenders was Google Translate, which has “helped” non-English speakers translate their text; unfortunately, many of those translations don’t actually mean the same thing as the original. With the introduction of LLMs, many English-speaking authors are now hurting readability with these tools as well.

An LLM is a great tool for making your language sound richer and more colorful, which is fine if you are writing a novel or another creative piece of content. In a scientific article, however, the best approach is to simplify as much as possible: to convey your research, make it easy for the reader.

If you do use an LLM or any other tool to help you write, please do the reviewers and future readers a favor: before submitting, make sure that you, and others, understand what is actually written.