Custom data migration solution for a complex Oracle to PostgreSQL migration project using AWS DMS

21 Nov 2022 Alexander Gora

In this post, we’d like to share our experience delivering a sophisticated database migration solution on top of AWS DMS for one of our large enterprise customers. The client is a large technology corporation that provides software solutions and services for telecom giants and mid-size companies. They ran numerous Oracle databases that stored billing and other financial data. In an effort to cut operational infrastructure costs, the company started migrating their Oracle systems to Amazon RDS for PostgreSQL in the AWS cloud. The goal of this migration was to take advantage of AWS, which brings improved data control capabilities and enables their end customers to cut database maintenance costs.

The ABCloudz team was a perfect technology partner for implementing the client’s migration plans. We have two decades of database migration experience thanks to the legacy of DB Best, where we built custom solutions, migration products, and practices. We had also built a good relationship with the customer on multiple earlier projects. Based on that track record of successful migrations, the customer approached the ABCloudz team to help orchestrate and execute a complex data migration from an on-premises Oracle database to PostgreSQL hosted on AWS.

Read on to learn how the ABCloudz team created an efficient solution for one of the most complex database migrations from Oracle to PostgreSQL.

Data transformation challenge

As part of the data migration from Oracle to PostgreSQL, the customer required significant database schema and data transformation, primarily due to table structure differences between Oracle and PostgreSQL. The customer required some target database tables to be merged and wanted to expand some of them with additional columns.

AWS DMS provided basic capabilities for such transformations, but it could not merge two interconnected Oracle tables into a single PostgreSQL table. At that time, DMS did not offer the functionality required for on-the-fly data calculation using complex SELECT statements.
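
To illustrate the gap, here is a minimal sketch of the kind of merge-and-transform extraction the project needed. This is not the client’s actual code: the table and column names (invoices, invoice_line_items, invoices_merged) and the connection strings are hypothetical placeholders, and the real utility covered many more cases.

    # Hypothetical example: merge two related Oracle tables into one
    # PostgreSQL table, with values calculated on the fly. All names here
    # are placeholders, not the client's schema.
    import oracledb   # pip install oracledb
    import psycopg2   # pip install psycopg2-binary

    MERGE_QUERY = """
        SELECT i.invoice_id,
               i.customer_id,
               i.created_at,
               SUM(li.amount)         AS total_amount,  -- calculated column
               COUNT(li.line_item_id) AS item_count     -- extra target column
        FROM invoices i
        JOIN invoice_line_items li ON li.invoice_id = i.invoice_id
        GROUP BY i.invoice_id, i.customer_id, i.created_at
    """

    def merge_and_load(ora_dsn, pg_dsn, batch_size=10_000):
        with oracledb.connect(ora_dsn) as ora, psycopg2.connect(pg_dsn) as pg:
            ora_cur, pg_cur = ora.cursor(), pg.cursor()
            ora_cur.execute(MERGE_QUERY)
            while True:
                rows = ora_cur.fetchmany(batch_size)
                if not rows:
                    break
                pg_cur.executemany(
                    """INSERT INTO invoices_merged
                           (invoice_id, customer_id, created_at,
                            total_amount, item_count)
                       VALUES (%s, %s, %s, %s, %s)""",
                    rows,
                )
            pg.commit()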

Finally, the client needed to migrate the data within a very limited downtime window, so they wanted data validation to be as fast as possible. However, AWS DMS could not validate all the data within that window.

That’s why we had to come up with a custom solution that transforms the client’s tables and can migrate terabytes of data from Oracle to PostgreSQL during a system downtime of no more than a few hours.

Creative solution

Our database migration team started the project with a thorough analysis of the customer’s infrastructure. We determined that some data sets from the client’s Oracle database could be migrated to RDS for PostgreSQL without any transformation, while other data sets required substantial transformations and Change Data Capture (CDC) tracking.

Two phases of the solution

The solution consists of two parts. The first part covers the initial data load. It starts with creating a secure EBS snapshot of the existing Oracle database; the resulting clone of the original database runs on Amazon EC2. After that, an initial data load utility built with Python transfers all the data from this copy to RDS for PostgreSQL according to the requirements of the target database.
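
As a rough illustration, here is a simplified, single-table version of what such an initial load utility might look like. The real utility handled many tables plus the required transformations; the connection strings and the table argument below are placeholders.

    # Simplified sketch of the initial load: stream a table from the Oracle
    # clone on EC2 into RDS for PostgreSQL via COPY, which is far faster
    # than row-by-row INSERTs for bulk data.
    import csv
    import io

    import oracledb
    import psycopg2

    def copy_table(ora_dsn, pg_dsn, table, columns, batch_size=50_000):
        with oracledb.connect(ora_dsn) as ora, psycopg2.connect(pg_dsn) as pg:
            ora_cur, pg_cur = ora.cursor(), pg.cursor()
            ora_cur.execute(f"SELECT {', '.join(columns)} FROM {table}")
            while True:
                rows = ora_cur.fetchmany(batch_size)
                if not rows:
                    break
                # Serialize the batch as CSV in memory; a production tool
                # would also handle NULLs and type formatting explicitly.
                buf = io.StringIO()
                csv.writer(buf).writerows(rows)
                buf.seek(0)
                pg_cur.copy_expert(
                    f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH CSV",
                    buf,
                )
            pg.commit()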

The second part of the solution handles ongoing data changes: it captures them with CDC, transforms them on the fly, and applies them to the target PostgreSQL database. The process works as follows:

  1. An EBS snapshot of the Oracle database is created.
  2. AWS DMS runs CDC replication-only tasks and starts monitoring the changes made to the source Oracle database after the snapshot was taken. It processes these changes according to the configuration file for the PostgreSQL target.
  3. The DMS replication-only task captures data changes and saves CDC records to an S3 bucket that serves as a staging area. The records are delivered to the bucket chunk by chunk and stored in CSV format.
  4. A custom Python utility picks up new data files from the S3 bucket and imports them into the target PostgreSQL database, applying the additional data and schema transformations. Instead of applying the data row by row, we arranged a chunk-by-chunk approach, which improves performance by up to 70 times. The problem with the row-by-row approach is that a single primary key may receive multiple updates, each of which has to be processed individually. With the chunk-by-chunk approach, we load large chunks of unprocessed data, and the utility applies only the latest update for each primary key. As a result, for some files the utility performs a single operation for the latest update instead of processing up to 100 updates (see the sketch after this list).
  5. A custom data validation tool compares the data for all tables between Oracle and PostgreSQL, using the same data transformations as the Python utility (a sketch of this idea also follows the list).
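
Below is a minimal sketch of the “latest update wins” idea from step 4. It assumes the DMS CDC CSV files carry the operation type (I/U/D) in the first column; the bucket, table, and column names (billing_events, event_id, and so on) are hypothetical placeholders, not the client’s schema.

    # Sketch of the chunk-by-chunk apply: collapse a CDC chunk so that only
    # the most recent operation per primary key hits the target database.
    import csv
    import io

    import boto3
    import psycopg2

    PK_INDEX = 1  # primary key position in the CSV, right after the Op column

    def apply_chunk(bucket, key, pg_dsn):
        s3 = boto3.client("s3")
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode()

        # Later rows in the file overwrite earlier ones, so 100 updates to
        # one row become a single statement on the target.
        latest = {}
        for row in csv.reader(io.StringIO(body)):
            op, pk = row[0], row[PK_INDEX]
            latest[pk] = (op, row[1:])

        with psycopg2.connect(pg_dsn) as pg:
            cur = pg.cursor()
            for pk, (op, values) in latest.items():
                if op == "D":
                    cur.execute(
                        "DELETE FROM billing_events WHERE event_id = %s", (pk,)
                    )
                else:  # I or U: upsert the final state of the row
                    cur.execute(
                        """INSERT INTO billing_events (event_id, amount, status)
                           VALUES (%s, %s, %s)
                           ON CONFLICT (event_id) DO UPDATE
                           SET amount = EXCLUDED.amount,
                               status = EXCLUDED.status""",
                        values,
                    )
            pg.commit()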

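And here is a sketch of the validation idea from step 5: instead of comparing tables row by row, run the same transformation query on both sides and compare a row count plus an order-independent checksum per table. The queries and connection details are placeholders, and a production tool would also normalize driver-specific value types (dates, decimals) before hashing.

    # Sketch of fast table-level validation via count + checksum.
    import hashlib

    import oracledb
    import psycopg2

    def table_fingerprint(cursor, query):
        """Return (row_count, checksum); XOR makes it order-independent."""
        cursor.execute(query)
        count, digest = 0, 0
        for row in cursor:
            count += 1
            row_hash = hashlib.md5(repr(row).encode()).digest()[:8]
            digest ^= int.from_bytes(row_hash, "big")
        return count, digest

    def tables_match(ora_dsn, pg_dsn, ora_query, pg_query):
        # ora_query applies the same transformations as the load utility,
        # so both sides should produce identical result sets.
        with oracledb.connect(ora_dsn) as ora, psycopg2.connect(pg_dsn) as pg:
            return (table_fingerprint(ora.cursor(), ora_query)
                    == table_fingerprint(pg.cursor(), pg_query))
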
Both parts of the solution are shown in the diagram below:

Future use of this solution

As a result of our effort, the client received:

  • A custom solution that performs a complex database migration with transformations within the client’s system downtime window.
  • A custom practice that enables them to conduct database migrations independently. The client has already completed three successful migrations with our solution.
  • A custom Python utility that not only migrates data but also transforms data and schemas according to the new requirements, a capability unavailable in AWS DMS or any other off-the-shelf tool.
  • A chunk-by-chunk approach to applying CDC data to the target PostgreSQL database, which is up to 70 times faster than the traditional row-by-row approach.
  • Multiple operating modes: full load migration, delta load migration (partial migration with transformations), and data validation.
  • Transition to AWS, which enables the client to cut their database maintenance costs by at least 30%.

The solution worked so well that the customer reused it in multiple migrations with very limited modifications. We believe it stands as a showcase of the potential uses of AWS DMS.

Take advantage of our database migration solutions

The ABCloudz team of specialists has outstanding expertise in database migration and uses a broad range of custom practices to carry out the most sophisticated database transitions.

We have a rich portfolio of successful database migrations involving PostgreSQL.

For example, we helped a German electronics company migrate their existing database from on-premises PostgreSQL to cloud-based PostgreSQL. The ABCloudz team dealt with heavy data loads and data flow disruptions caused by a crash of the customer’s system that had occurred just before our cooperation began.

After this, the customer realized they needed to expand the capacity of the target database to keep up with massive data flows. Eventually, our team of database specialists stabilized the client’s data flow by adding an extra worker node to their infrastructure. This made the migrated infrastructure more consistent and efficient.

This and many other examples show that we are ready to tailor our custom solutions to the needs of any client, regardless of the industry. Contact us if you want to embrace the potential of database migration and take maximum advantage of AWS and other database technologies.

Ready to start the conversation?