The Intersection of AI and Software Development

Written by Yiwen Wu
Jun 17, 2024

Introduction

Aligned with our mission, Toast’s goal is to use AI in ways that enable the restaurant community to delight guests, do what they love, and thrive. We are strong believers in AI’s potential, but we also recognize that this technology presents new considerations and challenges. We intend to use the Toast Technology blog to publish content on AI and AI engineering at Toast. Stay tuned for future deep dives and insights. Now, let’s dive into some of the basics: Building Generative AI Applications.

Building Generative AI Applications

Generative AI (GenAI), equipped with Large Language Models (LLMs) trained on a large body of general human knowledge, has made AI/ML easily accessible to all: AI-backed smart features can be added to existing software by engineers with minimal training in AI or machine learning. With the availability of multimodal models, which can process a mix of text, audio, and visual inputs, generative AI opens up a great many possibilities. However, the path to perfecting smart features is fraught with unexpected difficulties and tempting illusions of progress.

As an ML/AI engineer at Toast working on various data and AI initiatives, I’ve gathered a few key learnings for developers to consider when building generative AI applications:

  • Build a human-friendly UI

  • Employ a range of strategies to enhance accuracy

  • Iteratively refine function

  • Deliver quality software

Build a Human-Friendly UI

Quite often, software applications employ LLMs to interact with users via a chat interface. The user supplies text as a prompt to an LLM, and the LLM extends or completes the text, which the user receives as the LLM’s response.

While a chat interface is simple to start with, it can be a barrier for end users because the choice of words is infinite. It is well documented that an overwhelming number of choices can become a burden, leading to anxiety, paralysis, and ultimately dissatisfaction with the chosen option. Example prompts or structured workflow guides can help users who have never been exposed to LLMs ramp up.

For example, a prompt library with sample prompts organized under different categories lets new users start experimenting from existing prompts and add new ones of their own, democratizing the knowledge gained during experimentation.
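A minimal sketch of what such a library could look like (the categories and prompts below are illustrative, not an actual Toast prompt set):

```python
from dataclasses import dataclass, field

@dataclass
class PromptLibrary:
    """Categorized sample prompts surfaced in the UI as starting points."""
    prompts: dict[str, list[str]] = field(default_factory=dict)

    def add(self, category: str, prompt: str) -> None:
        # Prompts discovered during experimentation are added here, so the
        # knowledge gained by one user becomes available to everyone.
        self.prompts.setdefault(category, []).append(prompt)

    def suggestions(self, category: str) -> list[str]:
        return self.prompts.get(category, [])

library = PromptLibrary()
library.add("Sales insights", "Summarize last week's top-selling menu items.")
library.add("Guest feedback", "What themes appear in this month's negative reviews?")
print(library.suggestions("Sales insights"))
```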

Another approach is a role reversal between the chatbot and the human user. The application asks a set of questions to guide the workflow, and then the user can ask open questions to gain insights into the content under discussion.
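To make the role reversal concrete, here is a hedged sketch: the guiding questions are placeholders, and `ask_llm` stands in for whatever LLM client the application uses. The application asks its questions first, then answers the user’s open-ended questions against the collected context.

```python
GUIDING_QUESTIONS = [
    "Which location are you interested in?",
    "Which date range should we look at?",
]

def guided_session(ask_llm) -> None:
    # The application leads: it asks a fixed set of questions to frame the task.
    answers = {q: input(q + " ") for q in GUIDING_QUESTIONS}
    context = "\n".join(f"{q} {a}" for q, a in answers.items())
    # Then the roles flip: the user asks open questions against that context.
    while (question := input("Your question (blank to quit): ")):
        print(ask_llm(f"Context:\n{context}\n\nQuestion: {question}"))
```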

We need to design software for humans to drive wide adoption.  

Employ a Range of Strategies to Enhance Accuracy

Users are accustomed to modern software applications producing consistent and reliable results. LLMs alone cannot guarantee that, as their predictions are probability-based. Even with high accuracy, users may observe variations in wording, slight conceptual drift, and some arbitrary randomness; our brains are wired to detect anomalies. There are a few ways to minimize surprises.

Retrieval Augmented Generation

LLMs are prone to hallucination. Including not only the question but also its context in the prompt helps the LLM retrieve the relevant information. Imagine the LLM as a large multi-dimensional knowledge base in which one can easily become disoriented with so many directions to explore; the context helps the LLM zoom in on the relevant subspace.

A common and effective approach to provide context is Retrieval Augmented Generation (RAG).  Typically, domain-specific documents are converted to NLP embeddings and stored in a vector database which is optimized for high-dimensional embedding vector search.  When the user issues a query, documents related to the query are retrieved from the embedding database and then added to the original query to form an LLM prompt.  The LLM then generates answers with the hints provided in the augmented prompt.

What’s more, RAG can augment the knowledge of a pre-trained LLM by providing data in context that the model may not have seen during training.

RAG is such a common pattern that managed implementations are typically offered alongside LLM APIs, as seen in services like AWS Bedrock™.
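To make the indexing, retrieval, and augmentation steps concrete, here is a self-contained sketch. The bag-of-words `embed` function stands in for a real embedding model, and the in-memory list stands in for a vector database; only the overall flow is the point.

```python
import math

def embed(text: str) -> dict[str, float]:
    # Stand-in embedding: a bag-of-words vector. A real system would call an
    # embedding model and store the resulting dense vectors instead.
    counts: dict[str, float] = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0.0) + 1.0
    return counts

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Index: convert domain-specific documents to embeddings and store them.
documents = [
    "Happy hour runs from 4pm to 6pm on weekdays.",
    "The kitchen closes 30 minutes before the restaurant.",
]
index = [(doc, embed(doc)) for doc in documents]

def build_prompt(query: str, k: int = 1) -> str:
    # 2. Retrieve: find the stored documents most similar to the query.
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    context = "\n".join(doc for doc, _ in ranked[:k])
    # 3. Augment: prepend the retrieved context to the original question.
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("When does happy hour start?"))
```

The augmented prompt, not the bare question, is what gets sent to the LLM, and that is what keeps the answer anchored to the retrieved documents.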

Agents and Tools

LLMs currently struggle with tasks requiring logical reasoning; they simply locate information highly correlated with the given question, which is not necessarily the correct solution. Sometimes they suffer from availability bias, just like humans do.

For example, LLMs do poorly on math questions. To work around this limitation, instead of asking the LLM to solve a math problem directly, the application employs an agent that asks the LLM to formulate a plan decomposing the problem into multiple steps, then delegates the actual calculation steps to math libraries suited to such problems. The agent layer manages LLMs, tools, a reasoning engine, and a knowledge base to create more powerful and versatile AI systems.
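A stripped-down illustration of this division of labor follows; the planning prompt and the `ask_llm` hook are hypothetical, and the point is simply that arithmetic is handled by a deterministic tool rather than generated by the model.

```python
import ast
import operator

# Deterministic calculator tool: safely evaluates arithmetic expressions,
# so exact math is never left to the language model.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expression: str) -> float:
    def evaluate(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")
    return evaluate(ast.parse(expression, mode="eval").body)

def solve(question: str, ask_llm) -> float:
    # The LLM is asked only to translate the word problem into an expression;
    # the actual computation is delegated to the calculator tool.
    expression = ask_llm(
        "Rewrite this question as a single arithmetic expression "
        f"and return only the expression: {question}"
    )
    return calculator(expression)

# Example with a stubbed model response standing in for a real LLM call:
print(solve("What is 17% of 2,450?", ask_llm=lambda _: "0.17 * 2450"))
```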

Employ Guardrails

To prevent surprises, the format and content of the LLM’s responses should be validated, sometimes with the help of another LLM.

AWS Bedrock and other vendors provide libraries or APIs that allow developers to craft a set of rules to filter out irrelevant or abusive user input and inappropriate model output.

A comprehensive set of test cases needs to be created to verify the efficacy of the guardrail.  
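As a rule-based sketch of both ideas (the deny-list, the JSON response contract, and the test cases are illustrative; vendor guardrail APIs express similar rules declaratively):

```python
import json

BLOCKED_TOPICS = ["politics", "medical advice"]  # illustrative deny-list

def validate_input(user_text: str) -> bool:
    # Filter out user input that is off-topic for the application.
    lowered = user_text.lower()
    return not any(topic in lowered for topic in BLOCKED_TOPICS)

def validate_output(model_text: str) -> bool:
    # Enforce the response contract: here, the model must return valid JSON
    # containing an "answer" field.
    try:
        return "answer" in json.loads(model_text)
    except json.JSONDecodeError:
        return False

# A guardrail is only as good as its test suite; each rule needs cases
# that exercise both sides of the boundary.
assert validate_input("Which menu items sold best last week?")
assert not validate_input("What do you think about politics?")
assert validate_output('{"answer": "Margherita pizza"}')
assert not validate_output("Sure, here is some free-form text.")
```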

Domain Specific LLMs

Not all LLMs are equal, even with the same number of weights and parameters. Depending on its training data, an LLM can be an expert in one area and a total novice in another. The distances and relationships between embeddings of the same words, phrases, or sentences can vary a great deal from one model to another. Early analysis of data distribution and accuracy is important for choosing the right LLM; this is where data science approaches play a crucial role in Generative AI application development.
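One lightweight way to run that early analysis is to probe how each candidate model relates a handful of domain term pairs. In the sketch below, the term pairs are illustrative and each `embed_fn` stands in for a candidate embedding model under evaluation.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Domain term pairs a restaurant-savvy model should place close together
# (or keep apart, in the last case).
PAIRS = [
    ("86'd", "out of stock"),
    ("cover", "guest seated at a table"),
    ("cover", "insurance coverage"),
]

def compare_models(candidates: dict) -> None:
    # candidates maps a model name to an embed_fn(text) -> list[float].
    for name, embed_fn in candidates.items():
        print(name)
        for left, right in PAIRS:
            similarity = cosine(embed_fn(left), embed_fn(right))
            print(f"  {left!r} vs {right!r}: {similarity:.2f}")
```

The model whose similarities line up with domain expectations is likely the better fit for that domain, regardless of how the candidates compare on general benchmarks.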

Iteratively Refine Function

Typically, a software development project starts with requirements that specify how a feature should work. With Generative AI features, the rules of traditional software development don't always apply. To meet your users' needs, consider making user testing a top priority from the outset, then use that feedback to refine and perfect your product through continuous iteration.

Real acceptance lies with the end user. With user feedback, we can optimize our RAG approach with expanded retrieval, updated few-shot examples, or fine-tuned LLMs. Sometimes even the UI needs to be redesigned to improve the user experience. In short, the development effort follows a rapid prototyping process.

Deliver Quality Software

LLMs are general-purpose NLP models that respond to a wide range of topics. When building Generative AI applications, a best practice is to develop a thorough suite of unit and integration tests to support the quality and reliability of your product. A key consideration is to verify text output for semantic equivalence rather than exact wording, to account for the inherent variability of Large Language Models. This helps you validate your software as it undergoes frequent revisions and, ideally, deliver a better user experience.
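A sketch of what a semantic assertion might look like; the `embed` argument is a placeholder for whatever embedding model the project uses, and the 0.85 threshold is an assumption to be tuned per use case.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def assert_semantically_equal(expected: str, actual: str, embed, threshold: float = 0.85) -> None:
    """Pass if the two texts mean the same thing, even if worded differently."""
    similarity = cosine(embed(expected), embed(actual))
    assert similarity >= threshold, (
        f"Semantic similarity {similarity:.2f} below {threshold}: "
        f"{expected!r} vs {actual!r}"
    )

# In a test, `embed` would call the project's embedding model, e.g.:
# assert_semantically_equal(
#     "Happy hour is from 4pm to 6pm.",
#     "Happy hour runs between 4 and 6 in the evening.",
#     embed=my_embedding_client.embed,  # hypothetical client
# )
```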

Conclusion

It has become common to leverage foundational infrastructure (e.g., a unified LLM API, a RAG service, an agent framework) to embed generative AI features across software applications. This is a fast-moving space, and a real opportunity for both professionalism and creativity. There are infinite possibilities for us to improve the user's experience and adopt new forms of interaction.