This blog post focuses on the architecture and deployment of the AI-powered chatbot solution we described in our earlier blog post, Custom AI Chatbot Development Using RAG. The solution is built for the AWS cloud but relies on OpenAI for its LLM. We’ll dive deep into the CI/CD process and the AWS deployment architecture of the backend application.

AWS Deployment Architecture

Our solution represents a rather typical configuration of a web project deployed to AWS.


Amazon Route 53 directs traffic to Amazon CloudFront, ensuring swift and reliable delivery of static files to our users. CloudFront speeds up the loading of the administrative panel and chatbot interface web pages by caching their content close to the user. The static files for the admin panel and the chatbot client interface are stored in, and served from, two separate Amazon S3 buckets.
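
As an illustration only, here is a minimal AWS CDK (TypeScript) sketch of this static-delivery layer; the bucket names, the `admin/*` path split, and the `example.com` domain are assumptions rather than our actual configuration.

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
import * as origins from 'aws-cdk-lib/aws-cloudfront-origins';
import * as route53 from 'aws-cdk-lib/aws-route53';
import * as targets from 'aws-cdk-lib/aws-route53-targets';

export class StaticFrontendStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Separate buckets for the admin panel and the chatbot client interface.
    const adminBucket = new s3.Bucket(this, 'AdminPanelBucket');
    const chatBucket = new s3.Bucket(this, 'ChatClientBucket');

    // CloudFront caches both sites close to the user; admin/* is routed to the admin bucket.
    const distribution = new cloudfront.Distribution(this, 'SiteDistribution', {
      defaultBehavior: { origin: new origins.S3Origin(chatBucket) },
      additionalBehaviors: {
        'admin/*': { origin: new origins.S3Origin(adminBucket) },
      },
    });

    // Route 53 alias record points the public domain at the CloudFront distribution.
    const zone = route53.HostedZone.fromLookup(this, 'Zone', { domainName: 'example.com' });
    new route53.ARecord(this, 'AliasRecord', {
      zone,
      target: route53.RecordTarget.fromAlias(new targets.CloudFrontTarget(distribution)),
    });
  }
}
```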

The Application Load Balancer distributes incoming traffic across Amazon EC2 instances, enhancing system availability and reliability while protecting the backend from potential attacks and overloads by concealing the direct IP addresses of the servers. This is also crucial for scaling the system, as it allows for load distribution across an increasing number of instances as traffic grows.
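
A hedged CDK sketch of this load-balancing layer might look like the following; the backend port, the `/health` path, and the way instances are registered are assumptions for illustration, not a verbatim excerpt of our stack.

```typescript
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';
import * as elbTargets from 'aws-cdk-lib/aws-elasticloadbalancingv2-targets';
import { Construct } from 'constructs';

// Assumes `vpc` and the backend `instances` are defined elsewhere in the stack.
export function addLoadBalancer(scope: Construct, vpc: ec2.IVpc, instances: ec2.Instance[]) {
  // Internet-facing ALB hides the instances' direct IP addresses.
  const alb = new elbv2.ApplicationLoadBalancer(scope, 'BackendAlb', {
    vpc,
    internetFacing: true,
  });

  const listener = alb.addListener('HttpListener', { port: 80 });

  // Traffic is spread across the registered EC2 instances; unhealthy ones are dropped.
  listener.addTargets('BackendTargets', {
    port: 3000, // NestJS backend port (assumed)
    targets: instances.map((i) => new elbTargets.InstanceTarget(i)),
    healthCheck: { path: '/health' }, // hypothetical health endpoint
  });
}
```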

Our server application, AIChat-Backend, is written in NestJS and runs on an Amazon EC2 t3.large instance. The same instance also hosts a Neo4j database when Neo4j is chosen as the knowledge base, as in our solution. Each component on EC2 operates in its own isolated container, which makes component management easy and flexible and allows a smooth transition to more scalable solutions such as Amazon Elastic Kubernetes Service (EKS) or Amazon Elastic Container Service (ECS) as the load increases.
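
The sketch below shows one possible way to express such an instance in CDK, with the containers started from user data; the image reference, ports, and container names are placeholders, not our production setup.

```typescript
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import { Construct } from 'constructs';

// Assumes `vpc` is defined elsewhere; the ECR image reference is a placeholder.
export function addBackendInstance(scope: Construct, vpc: ec2.IVpc): ec2.Instance {
  const userData = ec2.UserData.forLinux();
  userData.addCommands(
    'yum install -y docker && systemctl enable --now docker',
    // Each component runs in its own isolated container on the same host.
    'docker run -d --name neo4j -p 7687:7687 neo4j:5',
    'docker run -d --name aichat-backend -p 3000:3000 <ecr-repo>/aichat-backend:latest',
  );

  return new ec2.Instance(scope, 'BackendInstance', {
    vpc,
    instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.LARGE),
    machineImage: ec2.MachineImage.latestAmazonLinux2023(),
    userData,
  });
}
```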

The PostgreSQL database in Amazon RDS, equipped with the pgvector extension, is used both for storing data in vector format and for storing various non-vector auxiliary data that supports the backend application’s operation. If Neo4j is used as the knowledge base, PostgreSQL handles only the auxiliary functions.
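
To make the vector-storage role concrete, here is a small TypeScript sketch of a similarity query through TypeORM against a pgvector column; the `document_chunks` table and its columns are hypothetical names used only for illustration.

```typescript
import { DataSource } from 'typeorm';

// Hypothetical table: document_chunks(content text, embedding vector(1536)).
export async function findSimilarChunks(
  db: DataSource,
  queryEmbedding: number[],
  limit = 5,
): Promise<{ content: string }[]> {
  // pgvector's `<=>` operator computes cosine distance; smaller means more similar.
  return db.query(
    `SELECT content
       FROM document_chunks
      ORDER BY embedding <=> $1::vector
      LIMIT $2`,
    [JSON.stringify(queryEmbedding), limit],
  );
}
```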

We use Amazon ElastiCache for Redis as a high-performance store for our chatbot conversations. During periods of high demand, it scales up automatically by adding cache nodes, so sessions continue to be handled efficiently.
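
A simple sketch of how conversation turns can be kept in Redis from the NestJS backend is shown below, using the `ioredis` client; the key naming scheme and the 24-hour TTL are assumptions for illustration.

```typescript
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

// Append a message to a conversation and keep the session for 24 hours (assumed TTL).
export async function appendMessage(sessionId: string, role: 'user' | 'assistant', content: string) {
  const key = `chat:${sessionId}`;
  await redis.rpush(key, JSON.stringify({ role, content, ts: Date.now() }));
  await redis.expire(key, 60 * 60 * 24);
}

// Load the full conversation history for a session.
export async function getConversation(sessionId: string) {
  const items = await redis.lrange(`chat:${sessionId}`, 0, -1);
  return items.map((i) => JSON.parse(i));
}
```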

Amazon ECR (Elastic Container Registry) plays a crucial role in our deployment process, as it provides a secure, scalable, and reliable repository for Docker images of our backend applications. When it comes to handling high-demand applications, we switch to using Amazon EKS or ECS to manage containerized applications at scale. These services offer powerful tools for container orchestration, allowing us to automatically scale applications in response to changing loads, thereby ensuring high availability and performance.

Our chatbot is powered by GPT-4 Turbo as its large language model (LLM). For content vectorization, we use the most recent version of OpenAI’s embedding model.
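
For illustration, the following TypeScript sketch calls the OpenAI API for both steps; the exact model identifiers (`gpt-4-turbo`, `text-embedding-3-large`) and the prompt wording are assumptions rather than our production configuration.

```typescript
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Vectorize a piece of content for the knowledge base.
export async function embed(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-large',
    input: text,
  });
  return res.data[0].embedding;
}

// Answer a question using retrieved context (RAG).
export async function answer(question: string, context: string): Promise<string> {
  const res = await openai.chat.completions.create({
    model: 'gpt-4-turbo',
    messages: [
      { role: 'system', content: `Answer using only this context:\n${context}` },
      { role: 'user', content: question },
    ],
  });
  return res.choices[0].message.content ?? '';
}
```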

Finally, Amazon CloudWatch acts as our observatory, collecting application performance metrics and triggering the alerts we configure for quick response. For example, if our EC2 instance’s resources, such as memory or CPU, are under prolonged heavy usage, CloudWatch alerts us so we can take immediate action.
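
As an example, a CDK definition of one such alarm could look like this; the 80% threshold, the 15-minute window, and the SNS topic are illustrative assumptions.

```typescript
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as actions from 'aws-cdk-lib/aws-cloudwatch-actions';
import * as sns from 'aws-cdk-lib/aws-sns';
import { Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';

// Alert when average CPU on the backend instance stays above 80% for 15 minutes.
export function addCpuAlarm(scope: Construct, instanceId: string) {
  const topic = new sns.Topic(scope, 'OpsAlerts');

  const cpu = new cloudwatch.Metric({
    namespace: 'AWS/EC2',
    metricName: 'CPUUtilization',
    dimensionsMap: { InstanceId: instanceId },
    statistic: 'Average',
    period: Duration.minutes(5),
  });

  new cloudwatch.Alarm(scope, 'HighCpuAlarm', {
    metric: cpu,
    threshold: 80,
    evaluationPeriods: 3,
    comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
  }).addAlarmAction(new actions.SnsAction(topic));
}
```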

CI/CD Pipeline for RAG Solution

CI/CD (continuous integration and continuous delivery) refers to the practice of using automation to integrate code continuously and deliver it to a production environment. Our CI/CD solution for the chatbot is designed for deployment on any cloud provider’s infrastructure, guaranteeing a vendor-independent setup. Our scripts and utilities together form the core intelligence of the pipeline: this central command orchestrates the services used from the chosen cloud provider, much like a processor smoothly controlling peripheral devices from various manufacturers. Let’s examine the process of deploying our backend application on the AWS cloud.


Source Code and Environment Files

The application’s source code and environment files, which contain environment variables, are stored in the GitLab version control system. Environment variables are settings for the container in which the application runs: passwords, database access keys, external service keys, connection settings, and much more, used when the application connects to databases and other external resources. These sensitive details must be stored separately from the source code for security reasons, so they are managed with GitLab’s CI/CD Variables feature. This keeps them out of the application’s build image while still injecting them securely into the CI/CD process. Because the image can be shared across environments or stored in repositories without exposing these details, the risk of a data breach is greatly reduced.
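
At runtime, the NestJS backend reads these variables through its configuration layer rather than from values baked into the image. The sketch below shows a typical pattern with `@nestjs/config`; the variable names are hypothetical and stand in for the real keys kept in GitLab CI/CD Variables.

```typescript
import { Module, Injectable } from '@nestjs/common';
import { ConfigModule, ConfigService } from '@nestjs/config';

// Environment variables are injected into the container at deploy time,
// never baked into the image; the application reads them at startup.
@Module({
  imports: [ConfigModule.forRoot({ isGlobal: true })],
})
export class AppConfigModule {}

@Injectable()
export class DatabaseConfig {
  constructor(private readonly config: ConfigService) {}

  // Hypothetical variable names for illustration only.
  get postgresUrl(): string {
    return this.config.getOrThrow<string>('DATABASE_URL');
  }

  get openAiKey(): string {
    return this.config.getOrThrow<string>('OPENAI_API_KEY');
  }
}
```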

Build

During the build phase, the backend application’s source code is compiled into executable code. This code, together with its dependencies, is packaged into a Docker image using the kaniko utility. The Docker images are tagged according to product versions and uploaded to Amazon ECR, where they are compressed, encrypted, and protected by access controls.
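
As a sketch of the registry side of this step, the ECR repository can be declared with encryption, scan-on-push, and a retention rule, for example via CDK; the repository name and the retention count are assumptions, not our actual settings.

```typescript
import * as ecr from 'aws-cdk-lib/aws-ecr';
import { Construct } from 'constructs';

// The repository encrypts images at rest, scans them on push, and keeps only
// the most recent tagged product versions (retention count is an assumption).
export function addBackendRepository(scope: Construct): ecr.Repository {
  return new ecr.Repository(scope, 'AiChatBackendRepo', {
    repositoryName: 'aichat-backend',
    encryption: ecr.RepositoryEncryption.KMS,
    imageScanOnPush: true,
    lifecycleRules: [{ maxImageCount: 20 }],
  });
}
```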

Managing and Delivering Environment Files

Our custom-developed utilities, Envmake, Envsubst, and Setsubst, provide a secure way to manage the sensitive content of environment files. They automate what used to be an error-prone, complex, manual process of handling sensitive data. These tools automatically strip confidential environment variable values from the application image and insert them into environment files, so the image stored in the image repository contains no confidential values.

After preparing the environment files, we must deliver them safely to the location where the application is set up. Our custom-developed tools, Invoker, Broker, and Puller, assist in this process and handle other CI/CD tasks. The Invoker forwards the files to the Broker, which holds them until the Puller retrieves them. The Puller then pulls the required Docker image from Amazon ECR to EC2 and configures the container with these files.

Deployment

Once the Docker image is pulled to EC2, the container starts and the deployment of the backend application begins. First, the application configures itself using the environment variables. For the PostgreSQL database on Amazon RDS, TypeORM in our backend applies the necessary changes automatically, ensuring the database schema matches the application’s needs. The Neo4j database container is deployed on EC2 alongside each new backend instance, and the Neo4j Desktop Application simplifies version management and database upgrades.
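
One common way to wire this up with TypeORM is to run pending migrations automatically when the application boots, as sketched below; the exact options and variable names are assumptions about the setup rather than a verbatim excerpt.

```typescript
import { DataSource } from 'typeorm';

// The backend applies schema changes on startup so the RDS database always
// matches the application's entities. DATABASE_URL is an illustrative name.
export const AppDataSource = new DataSource({
  type: 'postgres',
  url: process.env.DATABASE_URL,
  entities: [__dirname + '/**/*.entity{.ts,.js}'],
  migrations: [__dirname + '/migrations/*{.ts,.js}'],
  migrationsRun: true, // run pending migrations automatically on boot
  synchronize: false,  // schema changes go through versioned migrations
});
```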

Financial Considerations of LLM-Based Solutions

One might naturally inquire about the expenses associated with utilizing AI resources, including LLMs and vectorization services (the process of creating embeddings). We encourage you to explore our specialized blog post on this topic.

Scale Up Efficiency with a Chatbot in Your Knowledge Sources

Our solution is built to flex. It adapts to your workflow and data, no matter how unique. No vendor lock-in. We use our own tools for a fast and affordable CI/CD that works anywhere. It’s easy to use – minimal setup needed. Our chatbot solution leverages knowledge of your business to drive results. Together, we’ll design the ideal setup to crush your goals and solve your operational challenges.

Let’s talk! We’ll tailor a presentation to your needs, answer your questions, and show you how our solution can take you further. Contact us to schedule a meeting.
