AWS cost optimization is both an exciting opportunity and a challenging necessity for growing businesses. One of our clients, a provider of a niche SaaS solution for 35 partners, approached us with a critical problem: their AWS costs were rising steadily, but this increase was not directly proportional to the addition of new clients. Instead, the cost growth primarily stemmed from resource consumption by their existing clients.
The client needed us to analyze their architecture and cost structure to identify optimization opportunities. After a thorough analysis, we uncovered several inefficiencies and proposed actionable solutions that reduced their AWS spending by 38%. This was achieved by optimizing their resource consumption model and implementing targeted architectural improvements — all without compromising performance.
Read on to discover how we applied proven strategies and best practices to tackle the core challenges at hand.
Challenges of a growing SaaS infrastructure
Our analysis revealed several critical challenges in the client’s existing AWS environment. As their SaaS platform grew, inefficiencies, redundancies, and escalating costs began to undermine the scalability and performance of their operations. These challenges required immediate attention:
- Spiraling costs: The AWS bill had grown to over $800,000 annually due to a reactive approach to resource allocation. Resources were added as needs arose, without optimization, leading to significant waste.
- Operational complexity: Managing multiple environments, accounts, and deployment streams introduced inefficiencies and frequent misconfigurations.
- Deployment bottlenecks: The process of deploying and testing new features became increasingly slow, hindering the client’s ability to respond quickly to partner requests.
- Security risks: The growing complexity of the infrastructure exposed vulnerabilities in access control and compliance measures, increasing the risk of potential breaches.
- Inefficient storage use: Storage costs surged due to the lack of clear strategies for data management, resulting in unnecessary expense.
At the root of these issues was an unstructured and reactive approach to scaling. Resources were added on an as-needed basis without a cohesive, long-term strategy. This lack of planning created inefficiency, redundancy, and skyrocketing expenses, which we were tasked with addressing.
Before diving into the solutions we implemented to address these challenges, let’s first take a closer look at the system architecture we were tasked with optimizing.
Multi-tenant SaaS architecture
The client’s SaaS solution leverages a multi-tenant architecture, enabling efficient management of multiple tenants, each with its own unique configuration. A unified API serves as the central access point for all users across tenants, streamlining access and ensuring scalability. Each user is assigned an ID linked to a specific tenant, which enables resource isolation and targeted access. When a user request is processed through the unified API, the ConnectionManager dynamically identifies the associated tenant based on the user’s ID and allocates the appropriate resources for that tenant.

Each tenant operates within a dedicated AWS account to ensure data isolation and security. While the overall architecture remains consistent across tenants, the flexibility of the system allows for customization to meet the specific needs of each partner. The following diagram illustrates the architecture of a single tenant within this system, highlighting the resource isolation, CI/CD processes, and functional update deployment mechanisms.

Now that we’ve explored the architecture, let’s dive into how we optimized compute resources to cut costs and boost efficiency.
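Before we do, here is a minimal, hypothetical sketch of the tenant-resolution flow described above. The ConnectionManager name comes from the client’s system, but the mapping structure, tenant IDs, and endpoints below are illustrative assumptions, not the actual implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    aws_account_id: str  # each tenant lives in a dedicated AWS account
    api_endpoint: str    # tenant-specific resources behind the unified API

# Illustrative user-to-tenant mapping; in production this would be a datastore.
USER_TENANT_MAP = {
    "user-123": "tenant-a",
    "user-456": "tenant-b",
}

TENANTS = {
    "tenant-a": TenantConfig("tenant-a", "111111111111", "https://a.example.com"),
    "tenant-b": TenantConfig("tenant-b", "222222222222", "https://b.example.com"),
}

class ConnectionManager:
    """Resolves an incoming user ID to its tenant's isolated resources."""

    def resolve(self, user_id: str) -> TenantConfig:
        tenant_id = USER_TENANT_MAP.get(user_id)
        if tenant_id is None:
            raise LookupError(f"No tenant registered for user {user_id!r}")
        return TENANTS[tenant_id]

if __name__ == "__main__":
    config = ConnectionManager().resolve("user-123")
    print(config.aws_account_id, config.api_endpoint)
```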
Maximizing compute efficiency in AWS
The optimization journey began with a deep dive into resource utilization to eliminate inefficiencies and continued with leveraging AWS’s cost-saving mechanisms for compute instances. We grounded these efforts in two key strategies: we right-sized infrastructure to eliminate waste by aligning resources with workload needs, and leveraged Reserved and Spot Instances to achieve significant cost savings through optimized resource commitments and flexible application design.
Right-sizing infrastructure to match actual needs
One of the first problems we tackled was overprovisioning. Many of the client’s instances and container tasks were oversized, consuming far more resources than necessary. Through a detailed analysis of their workloads, we implemented a right-sizing strategy:
Adjusting instance sizes: By reviewing CPU, memory, and storage utilization, we identified opportunities for downsizing. For example, a database instance running at 15% utilization was resized to a smaller instance type. Building on this approach, we also tuned the CPU and memory limits of ECS tasks to match actual workload requirements: tasks previously configured with 4 vCPU and 8 GB of memory were reduced to 2 vCPU and 4 GB, eliminating unnecessary resource usage and ultimately saving hundreds of thousands of dollars annually (see the sketch below).
Autoscaling for dynamic demand: To handle traffic spikes efficiently, we introduced autoscaling groups for EC2. These groups automatically increased capacity during peak periods and scaled down during idle times, ensuring optimal resource usage and minimizing waste. For ECS, we implemented Service Auto Scaling and Task Placement Strategies to dynamically manage the number of running tasks. During non-peak hours, the number of ECS tasks was scaled down to the necessary minimum, reducing costs while maintaining performance for lower traffic levels.
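A sketch of what both changes can look like in practice, assuming an EC2-backed ECS service managed with boto3. The cluster, service, and image names are placeholders, and the capacities and thresholds are illustrative choices rather than the client’s actual values:

```python
import boto3

REGION = "us-east-1"         # illustrative region
CLUSTER = "example-cluster"  # placeholder names throughout
SERVICE = "example-api"

ecs = boto3.client("ecs", region_name=REGION)
autoscaling = boto3.client("application-autoscaling", region_name=REGION)

# 1. Right-size: register a new task definition revision, halving the task
#    from 4 vCPU / 8 GB to 2 vCPU / 4 GB based on observed utilization.
task_def = ecs.register_task_definition(
    family=SERVICE,
    requiresCompatibilities=["EC2"],
    cpu="2048",     # 2 vCPU, expressed in CPU units
    memory="4096",  # 4 GB, expressed in MiB
    containerDefinitions=[{
        "name": "app",
        "image": "example/app:latest",  # placeholder image
        "essential": True,
        "portMappings": [{"containerPort": 8080}],
    }],
)

# Point the running service at the smaller revision.
ecs.update_service(
    cluster=CLUSTER,
    service=SERVICE,
    taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
)

# 2. Scale to demand: register the service as a scalable target and attach
#    a target-tracking policy so the task count follows actual load.
resource_id = f"service/{CLUSTER}/{SERVICE}"
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,   # off-peak floor
    MaxCapacity=20,  # headroom for traffic spikes
)
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,  # keep average CPU near 60%
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```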
Right-sizing isn’t about cutting blindly — it’s about deeply understanding workload patterns and balancing peak capacity with efficient idle usage. The goal is to make the infrastructure invisible — always working perfectly while avoiding unnecessary costs.
Saving up to 90% with reserved and spot instances
With right-sizing in place, we turned our attention to compute costs. We leveraged a combination of Reserved Instances and Spot Instances to maximize savings:
Reserved Instances: For predictable workloads, we helped the client commit to long-term capacity, cutting costs by up to 66%. This required precise planning, using historical data to forecast capacity needs.
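One way to ground that planning in historical usage is AWS Cost Explorer’s reservation recommendations. A minimal sketch; the lookback window, term, and payment option are illustrative choices, not necessarily what the client committed to:

```python
import boto3

ce = boto3.client("ce", region_name="us-east-1")  # Cost Explorer endpoint

# Ask for EC2 Reserved Instance recommendations based on 60 days of usage.
resp = ce.get_reservation_purchase_recommendation(
    Service="Amazon Elastic Compute Cloud - Compute",
    LookbackPeriodInDays="SIXTY_DAYS",
    TermInYears="THREE_YEARS",
    PaymentOption="PARTIAL_UPFRONT",
)

for rec in resp.get("Recommendations", []):
    for detail in rec.get("RecommendationDetails", []):
        print(
            detail["InstanceDetails"]["EC2InstanceDetails"]["InstanceType"],
            "x", detail["RecommendedNumberOfInstancesToPurchase"],
            "est. monthly savings:", detail["EstimatedMonthlySavingsAmount"],
        )
```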
Spot Instances: For stateless applications that could tolerate interruptions, we transitioned to Spot Instances, reducing costs by up to 90% compared to on-demand pricing. To make this work:
- ECS tasks on Spot Instances: For ECS clusters running on EC2 (using the EC2 Launch Type), we replaced On-Demand instances with Spot Instances as the underlying infrastructure. This approach maintained service availability through proper horizontal scaling and fault-tolerant design.
- Horizontal scaling: We distributed workloads across multiple Spot Instances, ensuring operations continued uninterrupted even if one instance was terminated.
Spot Instances require a deep understanding of application behavior. It’s crucial to adapt applications to handle interruptions, and this often means rethinking their design entirely. In the context of ECS, this meant ensuring that tasks were stateless and capable of being rescheduled seamlessly on new instances.
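A minimal sketch of one way to supply that Spot capacity: an Auto Scaling group with a mixed-instances policy backing the ECS cluster’s container instances. The group name, launch template, subnets, instance types, and On-Demand/Spot split are all illustrative assumptions:

```python
import boto3

asg = boto3.client("autoscaling", region_name="us-east-1")

# Back the ECS container-instance fleet mostly with Spot, keeping a small
# On-Demand base for stability. Names and ratios are placeholders.
asg.create_auto_scaling_group(
    AutoScalingGroupName="example-ecs-spot-asg",
    MinSize=2,
    MaxSize=20,
    VPCZoneIdentifier="subnet-aaa,subnet-bbb",  # placeholder subnets across AZs
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "example-ecs-node",  # placeholder template
                "Version": "$Latest",
            },
            # Diversifying instance types lowers the odds of the whole fleet
            # losing Spot capacity at once.
            "Overrides": [
                {"InstanceType": "m5.large"},
                {"InstanceType": "m5a.large"},
                {"InstanceType": "m4.large"},
            ],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 1,
            "OnDemandPercentageAboveBaseCapacity": 0,  # everything above base on Spot
            "SpotAllocationStrategy": "capacity-optimized",
        },
    },
)
```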
Impacted resources:
- Elastic Compute Cloud (EC2): $30,500 → $15,125
- Elastic Container Service (ECS): $24,000 → $17,000
Net monthly reduction: −$22,375
Optimizing storage by managing hot and cold data
Storage costs are often overlooked in cloud optimization, but they can account for a significant portion of the bill. For our client, we implemented a tiered storage strategy:
Categorizing data: Frequently accessed “hot” data was stored on high-performance (and higher-cost) tiers, while rarely accessed “cold” data was moved to cheaper options like Amazon S3 Glacier (see the lifecycle sketch after this list).
Caching layers: To reduce reliance on storage IOPS (Input/Output Operations per Second), we added caching mechanisms. For example, we introduced Redis as a caching layer for database queries, which significantly cut storage-related costs.
Backup optimization: A revised backup strategy reduced duplication and ensured that only necessary data was retained.
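As an illustration of the hot/cold tiering above, an S3 lifecycle policy can move objects to cheaper storage classes as they cool. A minimal sketch; the bucket, prefix, and day thresholds are placeholders:

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Transition aging objects to cheaper tiers and expire stale noncurrent
# versions. All names and thresholds below are illustrative.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-tenant-data",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-cold-data",
                "Status": "Enabled",
                "Filter": {"Prefix": "exports/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # warm
                    {"Days": 90, "StorageClass": "GLACIER"},      # cold
                ],
                "NoncurrentVersionExpiration": {"NoncurrentDays": 180},
            }
        ]
    },
)
```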
Impacted resources:
- Elastic Block Store (EBS): $5,250 → $2,875
- S3: $100 → $400
- ElastiCache: $875 → $1,725
Net monthly reduction: −$1,225
AWS offers a variety of storage options, each with different performance levels and costs. The key to cost efficiency is matching data to the appropriate storage type — don’t keep cold data on expensive SSDs.
Optimizing data transfer to cut costs
Data transfer costs can significantly impact cloud expenses, especially in multi-tenant architectures. By implementing focused optimizations, we reduced our client’s data transfer costs from $2,750 to $875 per month — a 68% reduction.
Minimizing inter-region and inter-AZ traffic: We reviewed the architecture to reduce cross-region and cross-AZ traffic, which incurs additional charges:
- We co-located resources like EC2 and S3 by deploying them in the same region and availability zone wherever possible. This approach minimized data transfer costs associated with cross-region and cross-AZ traffic.
- We applied selective Multi-AZ replication only for critical data, ensuring reliability while minimizing unnecessary replication and its associated costs.
Reducing repetitive data transfers with caching: We used ElastiCache for database queries by caching frequently accessed queries with Redis, which significantly reduced traffic to the underlying databases hosted on EC2 and minimized redundant data transfers.
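A minimal cache-aside sketch of that pattern; the ElastiCache endpoint, key scheme, TTL, and database helper are illustrative placeholders:

```python
import json
import redis  # redis-py client

# Placeholder ElastiCache Redis endpoint.
r = redis.Redis(host="example-cache.abc123.use1.cache.amazonaws.com", port=6379)

def query_database(report_id: str) -> dict:
    # Stand-in for the real query against the EC2-hosted database.
    return {"id": report_id, "rows": []}

def get_report(report_id: str) -> dict:
    """Cache-aside: serve repeated reads from Redis, not the database."""
    key = f"report:{report_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit: no trip to the database

    report = query_database(report_id)     # cache miss: one trip to the database
    r.setex(key, 300, json.dumps(report))  # keep the result warm for 5 minutes
    return report
```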
Impacted resources:
- Data Transfer: $2,750 → $875
Net monthly reduction: −$1,875
Data transfer optimization is often overlooked but is a critical component of managing cloud costs effectively. With the right practices, businesses can ensure that their data moves efficiently and cost-effectively, unlocking significant savings.
Automation and observability to control costs
We made automation and monitoring central to the client’s cost strategy:
Automation: We rebuilt the deployment pipelines using a CI/CD approach, transforming both the speed and efficiency of the deployment process. Bootstrap deployment times dropped from 50 minutes to 2 minutes, a 25x improvement, while regular deployments were accelerated from 18 minutes to just 30 seconds, a 36x improvement. These gains came from unifying and simplifying the CI/CD scripts, reducing their count from 12 to 1 and integrating them seamlessly with the API deployment processes.
Observability: We implemented real-time dashboards and alerts, giving stakeholders clear visibility into costs, resource usage, and performance to enable proactive decision-making.
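As a small example of the alerting side, a CloudWatch billing alarm can flag when month-to-date spend crosses a threshold. The threshold and SNS topic below are placeholders, and billing metrics require billing alerts to be enabled (they are published only in us-east-1):

```python
import boto3

# Billing metrics are published in us-east-1.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="monthly-spend-over-threshold",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,              # evaluate every 6 hours
    EvaluationPeriods=1,
    Threshold=45000.0,         # placeholder monthly budget line
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:cost-alerts"],
)
```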
Automation impacts not just infrastructure costs but the entire software development lifecycle, improving efficiency, reducing errors, and freeing teams to focus on delivering value.
Real results in four months
By implementing these strategies, we delivered transformative results for the client:
- Cost per environment reduced by 50%.
- Bootstrap deployment speeds improved 25x (from 50 minutes to 2 minutes).
- Regular deployment speeds improved 36x (from 18 minutes to just 30 seconds).
- Post-deployment troubleshooting time reduced 8x (from 2 hours to 15 minutes).
- Security scores improved to 90% compliance with AWS best practices.
- AWS environment setup accelerated 3.5x, enabling faster scaling.
Despite adding four new partners (and their associated tenants) to the client’s infrastructure during the four-month project, we achieved a 38% reduction in monthly AWS costs, as shown in the table below.

This reduction highlights the impact of optimizing resource consumption and implementing architectural improvements, even as the client’s operations continued to grow. While the costs of certain resources increased slightly, this was a strategic trade-off that enabled significant cost savings in other areas.
Cost optimization is about more than saving money
Cloud cost optimization isn’t just about cutting costs — it’s about building infrastructure that scales efficiently, performs reliably, and supports your business goals. For our client, optimization unlocked agility, reduced operational overhead, and laid the foundation for sustainable growth.
Cost optimization is about spending smarter, not less. When done right, every dollar spent on the cloud drives measurable value for the business.
Rely on professionals to cut AWS costs
If your cloud infrastructure feels like it’s running you, rather than the other way around, ABCloudz is here to help. Contact us today to start building a cost-efficient, high-performing AWS environment tailored to your business needs.