Explaining the Basics of Retrieval Augmented Generation

There can now be no doubt: we have entered a new technological epoch.

In the same way there will always be a clear dividing line between our pre- and post-internet lives, the emergence of AI will similarly mark a key milestone in our technological evolution.

The arrival of large language models (LLMs) like the ones behind ChatGPT means we now have not only a vast collection of human knowledge to explore but also lightning-quick programs to help us sift through it, using it to answer questions in ways that make sense.

But LLMs are not without their limitations. Their training data might be vast, giving them a huge ‘general knowledge’, but their understanding of specifics is often limited. They can’t provide answers based on information they’ve never had access to.

This can quickly turn an application that looks brilliant into one that’s essentially useless – and even dangerous when the AI starts hallucinating answers to questions it has no real basis for answering.

Enter RAG, or retrieval augmented generation.

Where did RAG come from?

The term RAG was first coined in 2020, in the paper ‘Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks’, written by an international team of researchers including some of the brightest minds working at Facebook (now Meta). They were exploring how LLMs’ excellent general question-answering abilities could be adapted to knowledge-intensive tasks by pairing them with a retrieval component.

Since then, RAG has elevated the potential of AI systems to new heights.

It improves the accuracy of answers by connecting models to the most up-to-date data sources – essential in areas like law and medicine.

It can provide real-time assistance to students, as it can include their live actions in its body of knowledge.

And it can allow AI systems to generate relevant and contextually accurate responses in conversations, transforming customer service applications.

What is RAG?

Retrieval augmented generation is an extension of an LLM’s knowledge base.

If you think of the training data set – a vast collection of information that gives the LLM an understanding of how human knowledge and language work – as its general knowledge, then RAG is the textbook for a particular exam. 

That means when you ask the generative AI system a specific question, it has an open book to turn to when looking for an answer.

Without this, the system will either fail to give an answer or, even worse, will try to come up with one that looks right based on the language you’ve used and the information in its training data.

At best this is frustrating. At worst, it is a dangerous proliferation of misinformation when chatbots hallucinate an answer that looks right but is completely made up.  
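To make the ‘open book’ idea concrete, here is a minimal sketch of the retrieval step in plain Python. Everything in it is an illustrative stand-in: the policy snippets are invented, and the toy word-overlap embedding stands in for the embedding model and vector database a real system would use.

```python
from collections import Counter
import math

# Hypothetical 'textbook': documents the base model was never trained on.
documents = [
    "Policy 12.3: refunds are issued within 14 days of a return request.",
    "Policy 4.1: premium subscribers receive priority support responses.",
    "Policy 9.8: accounts inactive for 24 months are archived.",
]

def embed(text):
    # Toy bag-of-words vector; a real system would call an embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    # Similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=1):
    # Rank every document by similarity to the question, then keep the top k.
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query):
    # The 'open book': retrieved passages are pasted in front of the question
    # so the model answers from them instead of guessing.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long do refunds take?"))
```

Everything the model sees at answer time sits right there in the prompt, which is why RAG answers can be checked against their sources.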

Who is RAG for?

In principle, RAG will be a game-changer for almost any AI use case. But its applications are particularly potent in specific contexts.

For one, it suits areas that require a system to quickly search and process large bodies of very recent data. This might help a legal clerk looking for precedents, or a doctor looking for cutting-edge treatments.

For another, it allows for precise personalization. RAG allows the LLM to fetch specific information about the user, meaning any answers it gives will be relevant and accurate for the person asking the question.
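As a small illustration of that personalization step, the sketch below scopes retrieval to the records of the user asking the question; the user_docs store and its contents are invented for the example.

```python
# Illustrative per-user retrieval: each record is keyed by its owner, so only
# that user's documents can ever reach the prompt. All data here is made up.
user_docs = {
    "alice": ["Order #1182 for Alice shipped on 2 March."],
    "bob": ["Bob's subscription renews on the 1st of each month."],
}

def personal_context(user, query):
    # Restrict the candidate pool to the asking user's own records before any
    # ranking happens; a real system would then score these against the query.
    candidates = user_docs.get(user, [])
    return "\n".join(candidates)

print(personal_context("alice", "When did my order ship?"))
```

Keeping the scoping step separate from the ranking step also means one user’s data can never leak into another user’s prompt.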

But ultimately, RAG is a way to transcend the most pronounced limitation of LLMs, namely that their training data is finite.

Before long, generative AI is going to be a feature of our daily tech usage. For many, it already is. It will transform the way we live and work, and is likely to have a profound impact on job prospects across many fields.

RAG is a key component in turning technology that looks great and seems to make science fiction a reality into something that is practically helpful and reliable. It is, in many ways, the key to a future defined by AI.