From Keyword Search to Semantic Similarity - Understanding the Retrieval Techniques For RAG

Category

Blog

Author

Wissen Technology Team

Date

February 17, 2025

Generative AI is becoming more mainstream with every passing day. In a recent survey on AI, McKinsey found that 65% of the organizations it surveyed were already using Generative AI in some form. Enterprises are looking to leverage more powerful LLM ecosystems to drive better and more relevant experiences across their AI-powered initiatives. Early Generative AI deployments, in which outcomes were derived solely from pre-trained data, are no longer considered adequate for business use cases.

For LLMs to deliver maximum value, they must draw on up-to-date information from external sources, such as proprietary document stores or the internet, and use it to improve their responses. Thus, the concept of Retrieval Augmented Generation (RAG) came into existence. In simple terms, a RAG system works like any Generative AI model, except that it first retrieves relevant information from an up-to-date external source and uses it to ground the generated response before presenting it to the user. This gives enterprises the ability to build highly reliable GenAI platforms for various use cases. The possibilities are limitless, which is why studies estimate the global market for RAG to grow at a 44.7% CAGR from 2024 to 2030.

How Does RAG Work? Understanding the Retrieval Techniques

RAG can be broadly broken down into two key components – retrieving information and augmenting generation. In the first step, the AI system finds information relevant to a user query or prompt in external sources such as relevant artifacts, books, or websites. In the second step, the information retrieved in the first step is combined with the knowledge the model already has about the topic from its pre-trained knowledge base. Finally, the augmented answer, which is more reliable, more accurate, and relevant in the current context, is presented to the user.
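The two stages above can be sketched in a few lines of Python. This is purely illustrative: `retrieve` uses naive word overlap as a stand-in for a real retriever, and in practice the augmented prompt would be passed to an LLM for the generation step.

```python
# Illustrative sketch of the two RAG stages: retrieve relevant text for a
# query, then augment the prompt that would be sent to the LLM.

def retrieve(query, documents, top_k=2):
    """Rank documents by naive word overlap with the query (retrieval step)."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(d.lower().split())), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def build_augmented_prompt(query, documents):
    """Prepend retrieved context to the user query (augmentation step)."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "RAG retrieves external documents before generating an answer.",
    "Keyword search matches exact terms in the query.",
    "Bananas are rich in potassium.",
]
prompt = build_augmented_prompt("How does RAG use external documents?", docs)
# `prompt` now contains the most relevant document as grounding context.
```

A production pipeline replaces the toy retriever with one of the techniques discussed below and sends `prompt` to the generation model.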

We have only touched on the very basics of RAG in the introduction. For businesses looking to leverage RAG in their Generative AI initiatives, it is critical to understand how a RAG system retrieves insights from its knowledge base and from external sources in order to respond.

Several retrieval approaches can be used to implement RAG capabilities. Let us explore the top five:

  • Keyword Search

This is the simplest mode of retrieving information from documents or other artifacts the AI model has access to. It directly matches the keywords in the user query against the text in the external resource. The technique is very simple and requires little computing time or few analytical resources. However, it is overly restrictive: in many cases, an external artifact such as a book or a webpage may not contain the exact keywords used in the user query, expressing the same idea with an alternative choice of words instead. Direct keyword matching fails in such contexts, so the outcomes of this technique cannot be relied upon for relevancy; the right information may be missed simply because there is no direct keyword match.
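A minimal sketch of keyword matching, including the failure mode described above: a document phrased with synonyms is missed even though it is relevant. The sample documents are invented for illustration.

```python
# Minimal keyword-match retrieval: return documents that share at least
# one exact word with the query.

def keyword_search(query, documents):
    q_words = set(query.lower().split())
    return [d for d in documents if q_words & set(d.lower().split())]

docs = [
    "refund policy for returned items",
    "how to get your money back on a purchase",  # relevant, but no shared keyword
]
hits = keyword_search("refund request", docs)
# Only the first document is returned; the synonym-phrased second
# document is missed despite being relevant.
```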

  • Semantic Search

In semantic search, the AI model leverages natural language processing to understand the real intent behind a user query. The exact intent, the context in which the outcome will be used, and the minor and major nuances of the query are analyzed and then used to search the available external and internal knowledge bases. This gives the model the added advantage of surfacing relevant insights from documents or external sources even when there is no exact keyword match with the query, which improves the relevancy of the results.
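The idea can be illustrated with a toy example: if words are mapped to vectors so that synonyms land close together, a document with no keyword overlap can still score highly. The hand-made "concept" vectors below are an assumption purely for illustration; real systems use learned embeddings from an NLP model.

```python
import math

# Toy semantic matching: hand-made 2-D "concept" vectors place synonyms
# near each other. Real systems use learned embedding models instead.
CONCEPT_VECTORS = {
    "refund": [1.0, 0.0], "reimbursement": [0.9, 0.1],
    "money": [0.8, 0.2],  "shipping": [0.0, 1.0],
}

def embed(text):
    """Average the concept vectors of known words in the text."""
    vecs = [CONCEPT_VECTORS[w] for w in text.lower().split() if w in CONCEPT_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

query = embed("refund")
doc = embed("reimbursement money")  # no keyword overlap with the query
unrelated = embed("shipping")
# The synonym-phrased document scores far higher than the unrelated one,
# even though keyword search would have missed it entirely.
```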

  • Transformers

In semantic search, the contextual intelligence mostly concerns the query sentence as a whole. With the transformer approach, the AI model goes one step further. Using a powerful neural network architecture for NLP, the model can decipher the relationships and contextual dependencies between words in the user query, and then retrieve passages with matching contextual dependencies from the target document or external sources as well. This allows for far greater accuracy than semantic search. Such powerful models can be deployed in critical use cases like high-level customer support or for powering conversational bots in areas like medical assistance.

  • Vector Space Model (VSM)

This approach works on spatial pattern representation and recognition. User queries as well as reference knowledge documents or research sources are converted into numerical vectors in a shared vector space. Complex analytical processing is then used to match patterns of similarity between the two sets of vectors; a close match indicates a potentially relevant answer for the query within the research source. VSM can be extremely complicated to implement, but it can exhibit very high levels of accuracy at scale, which is useful for complex Generative AI tasks.
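A classic instance of VSM can be sketched with term-frequency vectors and cosine similarity; this is a bare-bones illustration, and production systems layer weighting schemes such as TF-IDF on top.

```python
import math
from collections import Counter

# Classic vector space model sketch: query and documents become
# term-frequency vectors over a shared vocabulary, and relevance is the
# cosine of the angle between query vector and document vector.

def tf_vector(text, vocab):
    counts = Counter(text.lower().split())
    return [counts[t] for t in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

docs = ["the cat sat on the mat", "stock prices rose sharply"]
query = "cat on a mat"
vocab = sorted({w for d in docs + [query] for w in d.lower().split()})

scores = [cosine(tf_vector(query, vocab), tf_vector(d, vocab)) for d in docs]
best = docs[scores.index(max(scores))]
# `best` is the document whose vector points in a direction closest to
# the query vector.
```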

  • Hybrid approach

In this approach, two or more of the retrieval models above are combined at different levels to produce the best possible answers to queries. For example, a response obtained through transformer-based retrieval can be cross-checked with VSM to ensure critical accuracy. Considering multiple dimensions of a query across multiple representations improves retrieval performance and accuracy and ensures a high degree of match between queries and responses.
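One simple form of the hybrid approach is weighted score fusion, sketched below by blending an exact-keyword score with a vector-space cosine score. The 0.5/0.5 weights are an illustrative assumption; real systems tune them or use alternatives such as reciprocal rank fusion.

```python
import math
from collections import Counter

# Hybrid retrieval via weighted score fusion: each document gets a blend
# of a keyword-overlap score and a vector-space cosine score.

def keyword_score(query, doc):
    """Fraction of query words that appear verbatim in the document."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def cosine_score(query, doc, vocab):
    def vec(text):
        c = Counter(text.lower().split())
        return [c[t] for t in vocab]
    a, b = vec(query), vec(doc)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def hybrid_rank(query, docs, w_kw=0.5, w_vec=0.5):
    vocab = sorted({w for d in docs + [query] for w in d.lower().split()})
    scored = [(w_kw * keyword_score(query, d) + w_vec * cosine_score(query, d, vocab), d)
              for d in docs]
    return [d for s, d in sorted(scored, reverse=True)]

docs = ["cat on a mat", "dogs chase cats", "quarterly earnings report"]
ranked = hybrid_rank("cat on the mat", docs)
# The document matching on both signals ranks first.
```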

RAG will be a major game changer for Generative AI initiatives. Industries ranging from retail to healthcare can leverage its capabilities to build powerful user response systems with very high levels of autonomous operation. However, developing a RAG model and then adopting the appropriate retrieval techniques can be a challenging journey that requires a high degree of expertise in AI and ML technologies. This is where a technology partner like Wissen can be a huge asset for your business. Get in touch with us to learn more.