Using LLMs doesn’t always help readability.


As an experienced reviewer of recommender systems articles, I have had the privilege of evaluating submissions for numerous large conferences, primarily on the industry track but also on the research track.

A rarely discussed barrier to getting your article accepted at one of these conferences is its readability.

While numerous tools are available to assist writers, it’s important to exercise caution. Before the introduction of LLMs, one of the biggest offenders was Google Translate, which has “helped” non-English speakers translate their text; unfortunately, many of those translations don’t actually mean the same thing as the original. With the introduction of LLMs, many English-speaking authors are now hurting readability with tools as well.

An LLM is a great tool for making your language sound richer and more colorful, which is great if you are writing a novel or another creative piece of content. In a scientific article, however, the best approach is to simplify as much as possible: the easier you make it for the reader, the better your research comes across.

If you do use an LLM or any other tool to help you write, please do the reviewers and future readers a favor: make sure that you, and others, understand what is written before you submit.

LLMs are great but are not making recommender systems obsolete (yet)

LLMs are great and can do mind-boggling things with their language comprehension capabilities. Their generative abilities make them seem like oracles, but be careful: they are not.

Stuffing an LLM into a recommender system does not solve all problems. In fact, at this point it might create quite a few more than it solves.

That’s not to say that LLMs don’t have a place in the world of RecSys, but they are another component rather than a replacement altogether. The idea that they will make behavioral data obsolete seems a bit naive to me. LLMs can enhance recommender systems by leveraging their language comprehension to personalize recommendations, but while they excel at understanding language, they do not capture all the complexities of user behavior and context.

Does the solution have to contain machine learning?

Does a solution have to contain machine learning to be good or to tap into the voice of the many? Do we need an LLM?

In many cases, the answer might be no, and certainly not as the first solution. Consider recommender systems or reranking models: simply reordering items according to recency can significantly improve the experience.
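As a minimal sketch of what such a recency rerank could look like (the item fields and timestamps here are made up for the example):

```python
from datetime import datetime

# Hypothetical candidate items from a catalogue or an upstream candidate generator.
candidates = [
    {"item_id": "a", "published": datetime(2023, 1, 5)},
    {"item_id": "b", "published": datetime(2023, 6, 1)},
    {"item_id": "c", "published": datetime(2023, 3, 12)},
]

# Rerank by recency: newest items first.
reranked = sorted(candidates, key=lambda item: item["published"], reverse=True)
print([item["item_id"] for item in reranked])  # ['b', 'c', 'a']
```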

But of course, don’t stop there; there are many more possible improvements to try. However, one of the most significant issues in recommender development is that it is tough to evaluate a system without testing it on users. If you have a simple idea that improves your KPI, it can earn you a lot of money while you battle with more complex algorithms.

Having something simple in production also enables you to start evaluating and monitoring, and to ensure the data collection you need is in place before adding more complexity to the system. A simple solution not only provides evidence and data that could enable your system (or machine learning model) to become even better; if nothing else, it provides a benchmark to compare the much more complex solution against.

What is Retrieval Augmented Generation (RAG)?

A RAG framework contains an LLM paired with a knowledge base.

A RAG process takes a query and assesses whether it relates to subjects covered by the paired knowledge base. If it does, the process searches the knowledge base to extract information related to the user’s question. Any relevant context found in the knowledge base is then passed to the LLM along with the original query, and an answer is produced.

This helps with two things: firstly, it reduces the risk of hallucinations, and secondly, it reduces the chances that the LLM will leak sensitive data, because that data can be kept out of the training set.
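To make the flow concrete, here is a minimal sketch in Python. The helpers (embed, knowledge_base.search and llm_complete) are hypothetical placeholders for whatever embedding model, vector store and LLM client you use; the point is only the retrieve-then-generate flow described above.

```python
def rag_answer(query, embed, knowledge_base, llm_complete, top_k=3, threshold=0.7):
    """Answer a query with retrieval-augmented generation (hypothetical helpers)."""
    # 1. Check whether the query relates to anything in the knowledge base.
    query_vector = embed(query)
    hits = knowledge_base.search(query_vector, top_k=top_k)  # assumed to return (text, score) pairs
    relevant = [text for text, score in hits if score >= threshold]

    # 2. Nothing relevant found: fall back to a plain LLM answer.
    if not relevant:
        return llm_complete(query)

    # 3. Pass the retrieved context to the LLM along with the original query.
    context = "\n".join(relevant)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
    return llm_complete(prompt)
```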

The knowledge base can also be a recommender system, which allows the LLM to extract context and feed it into the recommender, which in return delivers crisp recommendations. (This idea is investigated in the RecSys ’23 article “Retrieval-augmented Recommender System: Enhancing Recommender Systems with Large Language Models”, https://lnkd.in/dHvK8SNJ.)

Token names can change the output of your LLM when creating dialogue summaries

I ran into an interesting but unexpected effect when I was playing around with generating dialogue summaries with the FLAN-T5 LLM. 

If you are new to LLM dialogue summarization, have a look at this post: Dialogue summaries with LLMs.

Changing the token names changes the output. In the first version of the dialogue, the speakers were labeled #kim# and #person1#.

To my knowledge, dialogue members are usually labeled #person1#, #person2#, etc. I was lazy, so I used #kim# for one of them, and I discovered that the model assumed me to be a woman (Kim is a female name in many places) and generated the following summary:

Kim is unsure of her dream job.

I looked at it and thought, okay, I will change the name to see if a more common male name would make it write “he” instead of “she”. So I changed Kim to Lars, expecting it to return the same summary as above, only with Lars and “he”. Instead, it returned:

#Lars#: I don’t think a dream job exists, or that there is one out there.

Now that it’s a male name, we are no longer unsure. Lastly, I tried #person2# instead of a specific name and got the following:

#Person1#: I don’t think a dream job exists, or that there is one out there

This is the same as the one with Lars, only the subject has changed; now it’s person1 who doesn’t think a dream job is out there. I’m not sure I have investigated it enough to call it bias, but I will be sure to keep my subject tokens to #person1#-style names in the future.

The dialogue was about defining your dream job, but the exact content is unimportant here.
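If you want to try reproducing the effect, here is a rough sketch using the Hugging Face transformers library with google/flan-t5-base; the prompt and the short dialogue below are simplified stand-ins, not the original ones.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"  # any FLAN-T5 size will do for the experiment
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def summarize(dialogue: str) -> str:
    # FLAN-T5 is instruction-tuned, so a plain instruction prompt is enough.
    prompt = f"Summarize the following conversation.\n\n{dialogue}\n\nSummary:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=60)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Swap the speaker token and compare the summaries.
template = "{speaker}: I am not sure what my dream job is.\n#person1#: Maybe it does not exist."
for speaker in ["#kim#", "#lars#", "#person2#"]:
    print(speaker, "->", summarize(template.format(speaker=speaker)))
```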

The problem described here is probably not caused by the transformers inside the LLM in particular. I would have thought that the tokenizer would reserve tokens for words of the form #<something>#. On the other hand, the model should be able to understand them if the data used to train it also contained words of the form #<something>#.