Navigating the Leader’s Dilemma
One of the most challenging parts of working with any LLM is finding the needle in the haystack. This is particularly true when you are looking for enterprise knowledge that allows you to differentiate while maintaining the privacy and security of your information.
There are many popular foundation models that make excellent conversational tools, trained on vast amounts of textual information found on the internet. For example, ChatGPT can provide reasonably accurate answers to questions about general internet knowledge. It does, however, have some significant limitations. ChatGPT cannot answer questions whose answers are not part of its training data. Additionally, LLMs have several limitations, including hallucinations, infringement of intellectual property rights, accuracy and context-related issues, as well as privacy and security concerns and the emerging nature of their risks.
The adoption of Generative AI poses one of today’s most daunting challenges due to the rapid and multifaceted evolution of foundation models and deployment options. The following variables in the market make it all the more complex:
- Navigating this intricate landscape, technology leaders grapple with a dizzying number of available options, each with its respective advantages and drawbacks.
- It’s paramount that the Proof of Value concentrates on pragmatic problem solving rather than merely focusing on technical feasibility, ensuring the meaningful application of the technology.
- Incorporating a Human-Centric AI Strategy and responsible controls is crucial to creating ethically aligned and effective solutions.
- Furthermore, the use cases for Generative AI span a wide spectrum, ranging from generalized applications such as customer service and content generation, to highly specialized industry applications including drug discovery and material design, each presenting its own set of challenges and opportunities.
Challenges Leaders Face Today with Generative AI Integration
When it comes to investing in Generative AI models, the first and foremost consideration for a leader is connecting to enterprise knowledge securely for long-term sustainability. The next is differentiating their approach to enable a competitive advantage that aligns with their overarching strategy: creating value while safeguarding it against the multitude of issues inherent to Generative AI.
Here are some other factors for consideration: The costs and resources required to fine-tune and maintain a reasonable technology stack are never clear-cut. Additionally, identifying and building the right skills is vital to ensuring the successful integration and optimization of Generative AI within the organization. Leaders are tasked with striking the subtle balance between innovation and practicality, ensuring that the adoption of Generative AI translates into tangible value while upholding the principles of ethical and responsible AI.
A recent Gartner study on methods for deploying Generative AI summarized the following key insights:
- The pace at which generative AI providers are launching new capabilities is overwhelming IT leaders.
- The number of pretrained generative AI models and applications is growing exponentially; however, steering them to align with enterprise use cases and AI governance is challenging.
- Technology innovation leaders don’t fully understand the variety of generative AI deployment approaches, and their pros and cons.
- Consuming models as embedded applications, embedding model APIs and steering them via prompt engineering are currently popular; extending them via a retrieval augmented generation architecture is an emerging approach.
Generative AI Deployment: A Decision Framework
In the rapidly evolving landscape of Generative AI, technology leaders often find themselves navigating through a myriad of options, each with its unique set of advantages and disadvantages. The multitude of available choices makes it imperative for leaders to have a structured ‘Decision Framework’ in place to facilitate informed and insightful decisions that align with the organizational goals and objectives.
A well-documented decision framework is not merely a tool for choice but a comprehensive mechanism for documenting vendors, services, libraries, tools, data, risks, and other market options, including cost and effort. It documents a clear-cut shared responsibility model with ownership of liabilities, ensuring that organizational leaders are well-equipped with the necessary insights and knowledge. This framework becomes an essential conduit for informed decision-making, providing a detailed analysis of cost, effort, sustainability, and valuable market insights, thus allowing leaders to strategically plan and integrate Generative AI into the enterprise AI roadmap.
Moreover, acknowledging and documenting the potential downside risks of each approach, along with the associated costs to safeguard Gen AI investments, are equally crucial. This ensures that leaders are not only creating value but also protecting it, fortifying the organization’s AI endeavors against unforeseen challenges and vulnerabilities. In essence, a robust decision framework is indispensable for technology leaders to navigate the complexities and uncertainties of Generative AI adoption, allowing them to harness its potential responsibly and effectively. Our Gartner-adapted Decision Framework helps companies with a comparative analysis of choices and the pros and cons of the variables listed above. The summarized version includes the following approaches.
A simpler version of this decision tree:
We will primarily concentrate on the widely-recognized Approach #3 – Extending Generative AI through Data Retrieval from the Adoption Decision Framework.
Introduction to Retrieval Augmented Generation (RAG)
This approach, known as Retrieval Augmented Generation (RAG), empowers enterprises by enhancing foundation models with enterprise or external data retrieval, optimizing accuracy and quality for domain-specific tasks. This method, negating the need for custom models, allows for the integration of pertinent information from private databases, enriching model responses with relevant and analogous insights.
diagram source: Prompt Engineering | Medium
In contrast to fine-tuning, this approach will not require training the model. Rather, it involves the creation of an embedding or a vector database that captures all of the available knowledge. Using this database, relevant data is then gathered and fed into the large language model as part of the prompt.
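The pipeline just described — index knowledge into a vector store, retrieve the most relevant chunks, then feed them into the prompt — can be sketched in a few lines. This is a minimal illustration using a toy bag-of-words "embedding" and an in-memory list in place of a real embedding model and vector database; the knowledge snippets are invented for the example.

```python
import math
import re
from collections import Counter

# Toy stand-in for a real embedding model (a production system would
# call a hosted embedding API or a neural encoder instead).
def embed(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Index: embed every knowledge chunk once and store the vectors.
knowledge_base = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm Eastern, Monday through Friday.",
    "Enterprise customers receive a dedicated account manager.",
]
index = [(chunk, embed(chunk)) for chunk in knowledge_base]

# 2. Retrieve: rank stored chunks by similarity to the user's question.
def retrieve(query, k=2):
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine_similarity(qv, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# 3. Augment: feed the retrieved chunks to the LLM as part of the prompt.
def build_prompt(query):
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is the refund policy?"))
```

The key point is that the foundation model itself is never retrained: only the retrieval index changes as enterprise knowledge changes.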
The Science behind the RAG Method
The RAG approach incorporates high-dimensional vector databases, a relatively recent technology that allows for efficient storage and quick, accurate searches against private data through the creation and collection of vectors.
This method works hand in hand with embedding models, which underpin most foundation models. An embedding model converts raw text into numerical vector embeddings that algorithms can process, a vital step for producing coherent and contextually accurate responses across natural language processing tasks and large language models.
When extensive texts are segmented and stored in vector databases, the semantics and meaning behind words are translated into numeric forms as vectors, facilitating effective comparison with model embeddings. This close interrelation between vector databases and embedding models opens avenues for integrating private data, such as personal information and internal company data, with Large Language Models (LLMs), maintaining the essence and semantic richness of the original textual data.
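The segmentation step mentioned above is usually done with overlapping chunks, so that meaning cut off at a chunk boundary is still captured by its neighbor. Here is a minimal sketch; the chunk size and overlap values are illustrative choices, not prescriptions.

```python
def chunk_text(text, max_words=200, overlap=20):
    """Split a long document into overlapping word-window chunks.

    Each chunk is later embedded and stored in the vector database.
    The overlap preserves context that would otherwise be severed at
    chunk boundaries, keeping each chunk's semantics coherent.
    """
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

# A 450-word document yields three overlapping 200-word windows.
document = " ".join(f"word{i}" for i in range(450))
print(len(chunk_text(document)))  # → 3
```

In practice, chunking is often done by sentence or paragraph boundaries rather than raw word counts, but the trade-off is the same: smaller chunks retrieve more precisely, larger chunks carry more context.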
Leveraging RAG with Microsoft Copilot
Companies seeking to understand how RAG works in conjunction with Microsoft Copilot may find this thought piece helpful. It makes a clear distinction between the use of RAG in conjunction with Microsoft Copilot through the concept of “Grounding”, along with what the concept means and its implications.
What is Grounding?
Grounding is the process of using large language models (LLMs) with information that is use-case specific, relevant, and not available as part of the LLM’s trained knowledge. It is crucial for ensuring the quality, accuracy, and relevance of the generated output. While LLMs come with a vast amount of knowledge already, this knowledge is limited and not tailored to specific use-cases. To obtain accurate and relevant output, we must provide LLMs with the necessary information. In other words, we need to “ground” the models in the context of our specific use-case.
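In prompt terms, grounding amounts to injecting use-case-specific information ahead of the question before calling the model. The sketch below shows one common template for this; the wording of the instructions and the "Contoso" example content are hypothetical, not taken from Microsoft's documentation.

```python
def ground_prompt(question, context_snippets):
    """Assemble a grounded prompt: use-case-specific context the base
    model was never trained on is injected ahead of the question."""
    context = "\n".join(f"- {s}" for s in context_snippets)
    return (
        "You are an assistant. Answer strictly from the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = ground_prompt(
    "When does the Contoso maintenance window start?",
    ["The Contoso maintenance window runs Saturdays 02:00-04:00 UTC."],
)
print(prompt)
```

The instruction to refuse when the context is insufficient is what makes grounding a quality control: it steers the model away from falling back on its (possibly stale or irrelevant) trained knowledge.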
The article outlines how Retrieval Augmented Generation (RAG) serves as the primary grounding technique used alongside Microsoft Copilot integrated with Visual Studio Code. The use case it develops employs an LLM to provide context-aware suggestions based on the document you are working on. This approach is also being implemented in Microsoft 365 and Power Platform, and developers are now being invited to create their own context-aware applications.
The Advantages of the RAG Method
Ability to Differentiate: Connecting private or enterprise knowledge with LLM
It is Reliable: It drives accuracy when the use case depends on factual or up-to-date data. For example, embeddings can be used to create a knowledge base containing private legal information or real-time financial market information.
Definitely Cheaper than Fine-tuning: The RAG method, also known as the knowledge-based method, is cost-effective because it removes the need for fine-tuning, cutting down on expenses and making AI adaptation more economically viable.
It is Faster: The use of vector databases that enable semantic search facilitates several advantages over fine-tuning, such as faster search speeds, reduced computational costs, and the prevention of fact fabrication.
Enables Versatility: The use of embeddings enables diverse use cases, including recommendation engines, search or summarization functionality, and classifications.
The Disadvantages of the RAG Method
Sensitive to Changes: Any change to the embedding model (for example, following an LLM update that affects embeddings) requires re-indexing everything in the vector database.
Privacy Risk: When enterprise knowledge is distributed across embedding models and a vector database, and if these are managed as separate services, it implies a replication of your entire set of data in two different locations. This duplication poses a significant risk as it can potentially expose sensitive information, heightening the vulnerability to unauthorized access or breaches.
Other risks: The quality of embeddings is intrinsically linked to the quality of the input data; inconsistencies or biases in the data can result in compromised embeddings, impacting the performance of subsequent tasks.
Costs related to Managed Services: Vector databases are a relatively new technology. Below is a representative cost model, illustrated with a knowledge base comparable in scale to the PubMed database.
PubMed® comprises more than 36 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full text content from PubMed Central and publisher web sites.
Creating a healthcare Q&A chatbot utilizing the knowledge base of all PubMed abstracts for a healthcare Q&A application is a substantial undertaking. With approximately 36 million abstracts translating to around 100 million chunks, the total tokens would be about 25 billion. The monthly cost for a modest vector database plan and a cheaper embedding model is estimated to be around $7000–8000, excluding storage fees and an initial one-time cost of $125,000 for embedding generation. Any alterations to the embedding model also incur this one-time cost. Additionally, with 100 million queries a month, there is a recurring expense of at least $250,000 for the query embedding service and response generation.
For a relatively small database with a database capacity of about 30 queries per second, it would cost about $3,300 per month for the vector database. There is also a one-time cost of $3,300 for embedding generation, and an additional $3,300 every time the embedding model is changed. If the application receives a lot of queries per month, there will be an additional recurring expense for the query embedding service and response generation.
For context, tokens are pieces of words used for natural language processing. For English text, 1 token is approximately 4 characters or 0.75 words. As a point of reference, the collected works of Shakespeare are about 900,000 words or 1.2M tokens.
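The arithmetic behind these estimates can be written out as a back-of-envelope calculator. Note that the per-1K-token embedding rate below is an assumption back-solved from the article's own figures (so that 25 billion tokens yields the quoted $125,000 one-time cost), not any vendor's published pricing.

```python
# ~0.75 words per token for English text (so ~1.33 tokens per word)
def estimate_tokens_from_words(words):
    return int(words / 0.75)

# Large-scale example: all PubMed abstracts
chunks = 100_000_000              # ~36M abstracts split into ~100M chunks
tokens_per_chunk = 250            # assumed average chunk size
total_tokens = chunks * tokens_per_chunk          # 25 billion tokens

embedding_rate_per_1k = 0.005     # assumed $/1K tokens, back-solved from the article
one_time_embedding_cost = total_tokens / 1000 * embedding_rate_per_1k

print(f"{total_tokens:,} tokens")                       # 25,000,000,000 tokens
print(f"${one_time_embedding_cost:,.0f} one-time cost")  # $125,000 one-time cost

# Sanity check against the Shakespeare reference point:
# ~900,000 words should come out to ~1.2M tokens.
print(f"{estimate_tokens_from_words(900_000):,} tokens")  # 1,200,000 tokens
```

Any change of embedding model re-triggers the one-time cost, which is why the article flags model churn as a recurring budget risk rather than a one-off.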
Between these two ends of the cost spectrum, enterprises can expect RAG application costs to vary by a factor of 10 to 100, depending on scale.
High dimensionality in embeddings results in increased computational complexity and storage needs, complicating tasks like similarity search and clustering in large datasets.
The landscape of Generative AI is rapidly evolving, presenting technology leaders with the challenge of identifying optimal deployment options among a myriad of available ones, each with its inherent pros and cons.
The adoption and effective implementation of these advanced technologies require a comprehensive decision framework that prioritizes problem-solving and aligns with an organization’s broader strategic goals, ensuring sustainable value creation and protection. One such promising approach is the Retrieval Augmented Generation (RAG) method, which utilizes vector databases to efficiently store and retrieve private data, further differentiating the enterprise to drive competitive advantage. This method is instrumental in converting voluminous raw text into algorithm-readable vector embeddings, acting as the backbone for producing coherent and contextually relevant responses in diverse applications. Despite its advantages, it demands meticulous attention to data privacy, quality, biases, and ethical considerations to mitigate potential risks to privacy and intellectual property. Organizations embarking on this journey must be well-equipped to navigate these complexities to harness the full potential of Generative AI in a responsible and secure manner, thereby achieving enhanced enterprise differentiation and competitive advantage.
In today’s dynamic business landscape, aligning risk management with strategic decision-making is crucial.
At Adaptive.AI, we stand as pioneers, guiding companies to navigate confidently across the Generative AI landscape, from pinpointing strategic use cases to actual deployment. We emphasize maintaining a balance between varied adoption approaches, seamlessly integrating with privacy and security controls. It is our commitment to foster both performance and trust, serving as the cornerstone in the responsible and effective advancement of AI technologies.