Four Kitchens

Build a better chatbot: How RAG secures better results from AI

5 Min. ReadDigital strategy

As the buzz around generative AI continues, organizations continue to investigate ways to apply the technology in useful, impactful ways. AI’s capabilities are evolving at a rapid pace, and areas where large language models (LLMs) have produced limited results have grown more reliable in the span of a few months.

One of the most promising applications for generative AI is chatbots, which capitalize on the technology’s ability to draw answers from a defined database. But the development process for a custom chatbot doesn’t follow the norms of software development.

With traditional development, you can expect the same results repeatedly from a set of provided inputs. But in AI development, the technology produces different results, and in fact can generate incorrect responses known as “hallucinations.”

However, through a process called retrieval-augmented generation (RAG), you can improve the accuracy of AI’s conclusions. RAG provides valuable guardrails for chatbot development in a way that allows organizations to use AI without diminishing trust among users.

Why AI development differs from conventional software processes

Generative AI development workflows differ from traditional software development in two ways. One, the variable nature of the responses AI creates to the same inputs creates a challenging environment for testing how well your chatbot works. Secondly, AI requires a different approach to how developers write instructions.

When creating software, the code functions like a complex math equation. Your developers write instructions and can easily trace the path a program takes from input to output. Every step can be investigated and debugged individually.

When developing for AI, the decision-making in a tool like ChatGPT effectively takes place in a black box. The technology is based on a deep learning algorithm, or large language model (LLM), and the conclusions drawn by LLMs are challenging to predict. Further complicating matters, the instructions you give AI are written in English, which introduces more ambiguity into the results.

Addressing inconsistent results from an AI project

Given that AI’s output is more varied than the results from traditional development, your AI project requires more experimentation and testing before release. Along with demanding more adversarial testing designed to “break” the chatbot (called “red teaming”), AI development requires a greater emphasis on content.

AI tools for content generation require considerable oversight and refinement through prompt engineering, which makes sure its questions are clear and include enough context. The same principle applies to chatbot development. Plus, creating generative answers from your defined database introduces gaps in content.

For example, if you want to create a chatbot that generates responses from your employee handbook, you need to verify its content and metadata to make sure AI will correctly find the answer. Your help desk may have Q&A sheets they draw from or unique instructions for how to answer questions. You need to ensure the chatbot has access to this information as well.

How RAG improves accuracy for AI

LLMs generate responses by drawing information from their databases, which adds speed to their response time but introduces the potential for errors. RAG is an architecture that mitigates against these variables by prompting the AI tool to ground its answer solely on the information it finds. The model will not fill in any gaps in how it responds with outside information.

RAG supplies AI with source material to cite in the same way a research paper will include footnotes. This allows users to verify any answers they receive, which builds trust in your chatbot. Better still, RAG reduces the possibilities of a chatbot producing the wrong answer in its response.

Stages of development for AI chatbots

One of the challenges with AI is that the technology has evolved so fast it’s hard to create development processes. At Four Kitchens, we’ve created a process for a chatbot project that progresses through the following stages:

Discovery: Is AI right for this project?

Much like traditional software development, AI development work begins by defining the problem as well as the project’s audience and goals. First, you need to make sure that AI is the right tool to resolve the problem. Some potential chat interfaces simply introduce too many variables in the questions and answers for an AI tool to resolve.

Next, you have to identify the content or data the chatbot will use as its source material. Does it support the use case? Building test cases for AI’s application should be among the first exercises to make sure the technology suits your project.

Building test cases involves documenting the questions and answers you want AI to field. Adding citations to your answers will help support the RAG process and provide better results. Realistic user personas for the chatbot will also help refine how AI should be used.

Design and architecture: Building the right bot to serve your users

Development begins by choosing a vendor and AI model for your chatbot. The OpenAI model has so far performed well, but each instance of the platform offers differing skill sets.

This stage also includes initial user experience designs, which should incorporate any warnings or context messages for the chatbot. For example, indicating that results are generated by AI will help your users better understand the scope of its capabilities. These messages help build trust in your users and allow them to set realistic expectations.

Initial buildout: Interface development and prompt engineering

After deploying the model, your team can develop the interface and start prompt engineering to make sure it produces the expected results. Prompt engineering can function at the same time your model is ingesting the data it will use to generate responses.

Two cycles of testing: Internal and red teaming

With the model in place, your internal teams can begin testing the chatbot to find out if it is satisfying your established test cases. In a deviation from conventional software development, you can also start adversarial red team testing.

Red team testing tries to break the AI system by forcing it to answer questions outside of scope, respond to incorrect information, or process bad input. As rounds of testing are completed, you can scale the project by further building out the dataset. Then, your team should run more tests.

Public testing, deployment, and ongoing improvements

Once your internal testing is complete, you can invite a subset of users to try the chatbot before releasing it to the public. After launch, work continues as you watch for malicious actors and bias, which is an ongoing issue for AI models.

Just as with traditional development, iteration is crucial to the success of your project. AI requires consistent oversight and improvement to be successful, especially as ‌technology continues to evolve.

But the most challenging aspect of working with AI may be understanding where to begin. If this is a question you’ve been trying to resolve, we should talk about the next steps for your organization.