This blog post examines the operational expenses of the AI-powered chatbot solution introduced in our previous blog posts, Custom AI chatbot using RAG and Deployment of AI-Powered Chatbot. How do you estimate the cost of a RAG chatbot correctly? Start by separating infrastructure, model usage, embedding usage, and expected traffic, then calculate each layer on its own. The solution runs on the AWS cloud and integrates OpenAI’s language model. We will discuss infrastructure costs and analyze how to estimate expenses for AI resources, such as a Language Model as a Service, in a production RAG chatbot.
Cost Structure of a Custom AI Chatbot Solution
Operating expenses for our AI chatbot solution fall into two primary categories: infrastructure and AI resources. These costs will vary depending on your system’s scale and usage frequency and on how much data your RAG chatbot retrieves per request.
Cost of Infrastructure in AWS Cloud
The basic configuration of backend application components we recommend for deployment on AWS is tailored to support up to 10,000 standard chatbot user sessions per day. Below, you will find the specifications of its components and their approximate costs at a relatively high load.
Basic Configuration
If you need support for a higher load, we are ready to help you choose the component configuration that best aligns with your needs and remains cost-effective. AWS and other cloud providers offer significant flexibility for this purpose, which is especially valuable when a chatbot must scale without driving costs too high.
For example, an infrastructure configuration designed to support up to 60,000 users per day is presented below. This setup utilizes higher-performance Amazon EC2 and AWS RDS instances, expands the database deployment across multiple Availability Zones rather than just one as in the basic configuration, and includes increased storage capacity and greater outbound traffic capabilities for a larger RAG chatbot environment.
Higher-Performance Configuration
These are rough estimates, subject to change depending on the amount of resources used. Choosing Neo4j as your main database, with PostgreSQL performing supplementary roles under light load, can notably decrease the operational expenses for both standard and advanced setups. For both scenarios, the expense associated with AWS RDS for PostgreSQL constitutes a significant part of the total budget.
AI Costs: LLM as a Service Calculations
Let’s peel back the layers and explore the core of our AI expenditure, exemplified by our AI Chatbot project leveraging GPT-4 Turbo. In any RAG chatbot, this is typically where the main variable cost appears.
At the heart of our cost analysis is the AI’s token economy. Tokens are the currency of AI’s linguistic capabilities; each token represents a piece of the puzzle in understanding or generating human language. The OpenAI tokenizer effectively illustrates the concept of tokens. The price of a Language Model as a Service (LMaaS) depends on the volume of input tokens it processes and the quantity of output tokens it generates. Generally, there are fewer output tokens than input tokens, but output tokens tend to be costlier. For instance, under the GPT-4 Turbo pricing model, every 1000 input tokens are billed at $0.01, whereas 1000 output tokens carry a price tag of $0.03. For a RAG-based chatbot, these totals grow not only from the user prompt but also from the retrieved context sent to the model.
The number of tokens transmitted is influenced by several factors: the daily user count, the number of requests from each user, and the number of tokens expended per request. The latter factor warrants special attention for a more in-depth examination. Upon receiving a user query, the backend system starts a sequence of interactions with the Large Language Model (LLM) to jointly craft a response. The size of these messages, measured in tokens, determines the LLM’s operational costs. Let’s examine a typical cycle of processing a single user request from this perspective. In this example, we’ve taken the token counts from a real-life case and rounded them for simplicity and clarity. This is also the core cost pattern behind a typical RAG chatbot request.
1. User Question: As an example, let’s take a short question of 10 tokens, such as “Do you have experience with AI or ML projects?”.
2. Backend Instructions to LLM: Upon receiving the user’s question, the backend formulates a query to the LLM consisting of the user’s question and instructions to guide the LLM, ensuring precision and clarity in the task at hand. This request adds 1000 input tokens to our pool.
3. LLM Calls a Specific Tool: The LLM returns a 100-token response to the backend, which includes a call to a specific tool. Think of tools as a set of commands that the backend application can execute upon request from the LLM. In our case, such a command would be a search for portfolio projects relevant to the AI and ML topics the user is asking about. In a RAG flow, this is the step that triggers retrieval from the knowledge base.
4. Backend Retrieves Data: Executing the LLM’s request, the backend retrieves data about the matching portfolio projects from the knowledge base and assembles them into an augmented context for the LLM, 9000 tokens in length. That retrieved context is what makes this a RAG solution rather than a standalone chatbot responding without retrieved business context.
5. LLM Generates Output: Finally, the LLM synthesizes the information and produces a 700-token response, encapsulating the information sought by the user. It should be noted that at this stage, the LLM may return to step 3 if it deems it necessary to request additional context. That extra loop can further increase the cost of a RAG chatbot session.
Let’s now calculate the total number and cost of the input tokens and the output tokens separately, keeping in mind that they are priced differently. We will then add up the two amounts to determine the final cost of processing this user request inside the chatbot workflow.
- Input tokens: 1000 + 9000 = 10,000 tokens = 0.10 USD (at a rate of 0.01 USD per 1000 tokens)
- Output tokens: 100 + 700 = 800 tokens = 0.024 USD (at a rate of 0.03 USD per 1000 tokens)
Total: 0.10 + 0.024 = 0.124 USD
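The arithmetic above can be captured in a small helper. This is a minimal sketch: the token counts are the rounded real-life figures from the example cycle, and the rates are the GPT-4 Turbo prices quoted earlier, which may change over time.

```python
# Per-1000-token GPT-4 Turbo rates quoted above (check current pricing).
INPUT_RATE_PER_1K = 0.01   # USD per 1000 input tokens
OUTPUT_RATE_PER_1K = 0.03  # USD per 1000 output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of processing one user request through the LLM."""
    return (input_tokens / 1000) * INPUT_RATE_PER_1K + \
           (output_tokens / 1000) * OUTPUT_RATE_PER_1K

# Token counts from the cycle above: 1000 + 9000 input, 100 + 700 output.
cost = request_cost(1000 + 9000, 100 + 700)
print(f"{cost:.3f} USD")  # 0.124 USD
```

Swapping in the rates of another model is a one-line change, which makes this a convenient way to compare providers.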
Taking this processing cycle’s token counts as average, let’s use them to estimate the cost of using certain Language Model as a Service offerings from OpenAI and AWS Bedrock. Assuming an average of 5 requests per user session and 10 users per day (totaling 50 requests daily), we find the following cost scenario based on prices* and LLM versions at the time of writing this blog post.
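Scaling the per-request figure to the traffic assumptions above can be sketched as follows; the 30-day month and the per-request cost carried over from the example are illustrative, not measured production figures.

```python
# Illustrative scaling of the per-request cost from the example cycle.
COST_PER_REQUEST = 0.124   # USD, from the GPT-4 Turbo calculation above

users_per_day = 10
requests_per_session = 5

requests_per_day = users_per_day * requests_per_session   # 50 requests
daily_cost = requests_per_day * COST_PER_REQUEST          # 6.2 USD/day
monthly_cost = daily_cost * 30                            # ~186 USD/month

print(f"{daily_cost:.2f} USD/day, {monthly_cost:.0f} USD/month")
```

Running the same numbers against each candidate model's rates produces the kind of comparison table referenced in this section.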
*Prices may change over time. Please check the current pricing for OpenAI and Amazon Bedrock.
In our website’s chatbot, we use GPT-4 Turbo because it surpasses other LLMs in intellectual performance. However, if more cost-effective models can effectively manage tasks, there’s no need to pay extra for a more advanced model. Reach out to us, and we’ll find the LLM that best fits your requirements for your RAG chatbot use case.
AI Costs: Embedding Model as a Service Calculations
Embedding models are significantly less expensive than LLMs and usually represent a minor fraction of the overall budget for AI resources. For those who are not yet familiar, embeddings convert text and images into a unique numerical format, capturing their essence. This transformation lets computer algorithms efficiently identify text elements with similar meanings, regardless of word differences. When processing a search query, we convert it into this numerical representation using an embedding service, searching for the closest semantic matches in our knowledge base, also stored numerically. We incur embedding costs whenever we add or update content in the knowledge base and for every user query. For a RAG chatbot, embeddings are essential because they power the retrieval layer.
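Embedding spend follows the same per-token logic. The sketch below uses a hypothetical rate and hypothetical volumes purely for illustration, since the post does not quote a specific embedding price; check your provider's current pricing before relying on the numbers.

```python
# Hypothetical embedding rate for illustration only -- the post quotes no
# specific embedding price, so treat this constant as a placeholder.
EMBEDDING_RATE_PER_1K = 0.0001  # USD per 1000 tokens (assumed)

def embedding_cost(tokens: int, rate_per_1k: float = EMBEDDING_RATE_PER_1K) -> float:
    """Cost in USD of embedding the given number of tokens."""
    return (tokens / 1000) * rate_per_1k

# Illustrative volumes: a 500,000-token knowledge base ingested once,
# plus 50 daily queries of roughly 10 tokens each.
kb_ingestion_cost = embedding_cost(500_000)
daily_query_cost = embedding_cost(50 * 10)
print(kb_ingestion_cost, daily_query_cost)
```

Even with generous volumes, the result is typically orders of magnitude below the LLM bill, which is why embeddings rarely dominate the AI budget.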
How to estimate RAG chatbot costs more accurately
- Start with real chatbot usage assumptions, including users per day, requests per session, and peak chatbot traffic.
- Measure how much context your RAG workflow retrieves on a typical request, because retrieval size directly affects token cost.
- Separate fixed infrastructure costs from variable RAG usage costs, so you can see what scales with demand.
- Compare several models, because the best LLM for one chatbot may be too expensive for another.
- Include embeddings, storage, and database expenses, since a RAG chatbot depends on more than the model alone.
- Review costs again as your knowledge base grows, because a larger RAG system can change retrieval behavior and chatbot spend.
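The checklist above can be folded into a rough end-to-end estimator. This is a sketch under stated assumptions: every parameter value in the example call is an illustrative placeholder (the example cycle's token counts, the GPT-4 Turbo rates quoted earlier, and a hypothetical infrastructure budget), not a measured figure.

```python
def estimate_monthly_cost(
    users_per_day: float,
    requests_per_session: float,
    input_tokens_per_request: float,
    output_tokens_per_request: float,
    input_rate_per_1k: float,
    output_rate_per_1k: float,
    fixed_infra_monthly: float,
    days: int = 30,
) -> float:
    """Rough monthly total: fixed infrastructure plus variable LLM usage."""
    requests = users_per_day * requests_per_session * days
    per_request = (
        input_tokens_per_request / 1000 * input_rate_per_1k
        + output_tokens_per_request / 1000 * output_rate_per_1k
    )
    return fixed_infra_monthly + requests * per_request

# All values below are illustrative placeholders.
total = estimate_monthly_cost(
    users_per_day=10,
    requests_per_session=5,
    input_tokens_per_request=10_000,
    output_tokens_per_request=800,
    input_rate_per_1k=0.01,
    output_rate_per_1k=0.03,
    fixed_infra_monthly=500.0,  # hypothetical AWS bill
)
print(f"{total:.2f} USD/month")
```

Separating the fixed and variable terms this way makes it easy to see which lever (traffic, retrieval size, model choice, or infrastructure) dominates your spend.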
Unlock the Power of Your Business Knowledge
We help you find the perfect Language Model as a Service (LMaaS) – one that’s both smart and affordable. Our app lets you choose between top-notch performance or maximum cost savings, or even find the sweet spot in between, depending on your workload and the goals of your RAG chatbot.
Let’s use your knowledge base to drive real business results. We’ll work with you to design a custom setup that tackles your specific challenges and goals with the right RAG architecture and user experience.
Ready to see how it works? Let’s kick things off with a personalized demo to answer your questions and show you the power of our solution. Contact us today to schedule a meeting and discuss the right RAG chatbot setup for your business!