Stop the stress. Learn the secret to moving live OLTP databases of any size with zero downtime and no data loss

As a data engineer, you know the feeling: the pit in your stomach when you’re told to “just move the database.” A live OLTP (Online Transaction Processing) database, the heartbeat of your business, is a ticking time bomb during migration. One wrong move and you’re facing a catastrophic outage, angry users, and a very bad day.

But what if I told you there’s a proven playbook to migrate a critical, constantly-changing data source to a new home in the cloud without a single hiccup? This isn’t about moving files; it’s about orchestrating a seamless, real-time transition from one live system to another.

This guide will serve as your ultimate playbook, breaking down the scenarios, cutting-edge tools, and insider best practices for live OLTP migrations, from a tiny database to a multi-terabyte distributed monster.

The Core Principle: The CDC Dance (Your Secret Weapon)

The secret to a zero-downtime migration is Change Data Capture (CDC). It’s a two-phase process that turns a risky leap into a graceful dance:

  1. Full Load (or “Backfill”): A one-time bulk transfer of all existing data from the source to the target.
  2. CDC (or “Replication”): A continuous, real-time stream of all new changes (inserts, updates, deletes) from the source to the target. This phase runs concurrently with the full load and continues until the final cutover.

This powerful combination allows your application to keep writing to the source database while the new database catches up. Once the target is in sync, you simply update your application’s connection string to point to the new cloud database, and, just like that, you’re done.
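
Here’s what that final flip can look like in practice. This is a minimal Python sketch of the “wait until in sync, then cut over” logic; the helper functions (get_replication_lag_seconds, pause_application_writes, switch_connection_string) are hypothetical placeholders for whatever your CDC tool and deployment platform actually expose.

```python
import time

CUTOVER_LAG_THRESHOLD_SECONDS = 5   # assumed acceptable lag before attempting cutover
POLL_INTERVAL_SECONDS = 30

def get_replication_lag_seconds() -> float:
    """Hypothetical helper: ask your CDC tool (DMS, Datastream, Debezium, ...)
    how far the target is behind the source."""
    raise NotImplementedError

def pause_application_writes() -> None:
    """Hypothetical helper: put the app in maintenance mode or pause its writers."""
    raise NotImplementedError

def switch_connection_string(new_dsn: str) -> None:
    """Hypothetical helper: update config/secrets so the app points at the target."""
    raise NotImplementedError

def cutover(new_dsn: str) -> None:
    # Phase 2 (CDC) keeps streaming in the background; we only flip once lag is tiny.
    while get_replication_lag_seconds() > CUTOVER_LAG_THRESHOLD_SECONDS:
        time.sleep(POLL_INTERVAL_SECONDS)

    pause_application_writes()                # a brief write freeze, not a full outage
    while get_replication_lag_seconds() > 0:  # drain the last in-flight changes
        time.sleep(1)

    switch_connection_string(new_dsn)         # the "update the connection string" moment

if __name__ == "__main__":
    cutover("postgresql://app@new-cloud-host:5432/appdb")  # placeholder target DSN
```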

Scenario 1: The Small & Mighty – Less Than 100 GB

For smaller databases, the migration is less about pure bandwidth and more about precision and execution. You can do this with minimal fuss.

Source: Single MySQL or PostgreSQL instance (<100 GB)

Target: Google Cloud SQL, AWS RDS, Azure Database for PostgreSQL/MySQL

Solution: Your Cloud Provider’s Native Migration Service

The major cloud providers have built incredible, guided tools for this exact scenario. They are your first and best bet for a reliable migration.

  • Google Cloud Database Migration Service (DMS):
    • How it works: For homogenous migrations (e.g., MySQL to Cloud SQL for MySQL), Google’s DMS is a no-brainer. It performs a backfill and then uses MySQL’s binary log (binlog) or PostgreSQL’s logical replication (via a publication and subscription model) to stream changes.
    • Cost: Homogenous migrations to Cloud SQL for MySQL or PostgreSQL are offered at no additional charge for the service itself. You only pay for the underlying compute and network resources used.
    • Pro-Tip: Make sure your source database has binary logging (MySQL) or logical replication (PostgreSQL) enabled and configured correctly before you even start. This is a critical prerequisite (a minimal pre-flight check is sketched after this list).
    • Security: DMS supports private connectivity, ensuring your data is not exposed to the public internet during the transfer.
  • AWS Database Migration Service (DMS):
    • How it works: AWS DMS is a managed service that helps you migrate databases to AWS quickly and securely. It supports both homogenous and heterogeneous migrations and uses the same CDC mechanisms (binlog and logical replication).
    • Cost: You pay hourly for the replication instance that runs the migration, plus its storage and any data transfer charges; data transfer out of AWS is tiered, so the per-GB rate drops as volume increases.
    • Pro-Tip: Right-size your replication instance! For a <100 GB database, a smaller instance will suffice and save you money.
  • Azure Database Migration Service:
    • How it works: This service provides a streamlined migration path to Azure’s managed database services. It supports online (CDC-based) migrations for minimal downtime.
    • Pro-Tip: Utilize the dedicated migration tools and ensure you meet all prerequisites, such as firewall configurations and user permissions.
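
Whichever service you pick, it pays to script the prerequisite check from the Google Cloud pro-tip above rather than eyeballing it. Here’s a minimal sketch, assuming mysql-connector-python and psycopg2 are installed and the hosts and credentials (all placeholders) can read server settings:

```python
import mysql.connector
import psycopg2

def check_mysql_binlog(host: str, user: str, password: str) -> None:
    """Verify binary logging is on and row-based, as CDC-based migrations generally require."""
    conn = mysql.connector.connect(host=host, user=user, password=password)
    cur = conn.cursor()
    cur.execute("SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format')")
    settings = dict(cur.fetchall())
    conn.close()
    assert settings.get("log_bin") == "ON", "Enable binary logging on the source"
    assert settings.get("binlog_format") == "ROW", "Set binlog_format=ROW for CDC"

def check_postgres_logical_replication(host: str, user: str, password: str, dbname: str) -> None:
    """Verify wal_level is 'logical' so logical replication can stream changes."""
    conn = psycopg2.connect(host=host, user=user, password=password, dbname=dbname)
    cur = conn.cursor()
    cur.execute("SHOW wal_level")
    (wal_level,) = cur.fetchone()
    conn.close()
    assert wal_level == "logical", "Set wal_level=logical (requires a restart)"

if __name__ == "__main__":
    check_mysql_binlog("source-mysql.example.com", "migration_user", "***")
    check_postgres_logical_replication("source-pg.example.com", "migration_user", "***", "appdb")
    print("Source prerequisites look good.")
```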

Scenario 2: The Small MongoDB Migration (<100 GB)

Source: MongoDB replica set (<100 GB)

Target: Azure Cosmos DB for MongoDB vCore, MongoDB Atlas on GCP/AWS

Solution: Cloud Provider’s Live Migration Service or Native Tools

  • Azure Cosmos DB for MongoDB (vCore):
    • Solution: Use the Azure Cosmos DB Migration tool, which supports both online (live) and offline migrations. The online migration uses MongoDB change streams to replicate real-time changes (a minimal change-stream sketch follows this list).
    • Benefit: This offers a user-friendly interface and can be deployed privately within your virtual network for enhanced security.
  • MongoDB Atlas Live Migration:
    • Solution: This is a fantastic tool for moving from a self-managed MongoDB replica set to a fully managed Atlas cluster. It’s a wizard-driven process that handles the full load and then continuously tails the oplog (operation log) of the source primary.
    • Pro-Tip: Your source must be a replica set, as the oplog is crucial for this seamless process.
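
Both options lean on the same primitive: reading the source’s real-time stream of changes. As a rough illustration of what’s happening under the hood, here’s a minimal Python sketch using pymongo change streams against a replica set; the connection string and collection names are placeholders, and the real migration tools add resume tokens, batching, and retry logic on top of this.

```python
from pymongo import MongoClient

# Placeholder URI; change streams require a replica set (or sharded cluster).
client = MongoClient("mongodb://source-primary.example.com:27017/?replicaSet=rs0")
source = client["appdb"]["orders"]

# Watch inserts, updates, replaces, and deletes as they happen on the source.
with source.watch(full_document="updateLookup") as stream:
    for change in stream:
        op = change["operationType"]
        if op in ("insert", "update", "replace"):
            doc = change["fullDocument"]
            # A real pipeline would upsert doc into the target cluster here.
            print(f"{op}: {doc['_id']}")
        elif op == "delete":
            print(f"delete: {change['documentKey']['_id']}")
```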

Scenario 3: The Mid-Sized Migration (100 GB to 200 GB)

At this scale, you’ll still rely on the same CDC principles, but with a sharper focus on performance, monitoring, and robust pipelines.

Solutions: Cloud Provider DMS + Advanced CDC or Third-Party Tools

For this size, the native cloud provider services are still incredibly capable. You might need to scale up your resources, but the process is largely the same.

  • AWS DMS and Google Cloud DMS can handle these sizes efficiently. You may need to provision a larger replication instance to handle the initial backfill and ongoing change stream without bottlenecking.
  • Third-Party Tools: This is where you might start looking at tools like Fivetran, Airbyte, or Qlik Replicate. These managed services offer a wider range of connectors and are often easier to set up and manage than a self-hosted CDC solution.

The Hidden Cost: Bandwidth & Egress

  • Bandwidth Cost (Egress): This is a critical factor. Cloud providers charge for data leaving their network (egress). If you’re migrating from on-prem to cloud, you’ll pay outbound bandwidth fees to your current hosting provider; if you’re migrating between cloud regions or providers, you’ll incur inter-region or egress charges (a back-of-the-envelope estimator is sketched after this list).
    • Example (AWS): Data transfer out from EC2 to the internet can be around $0.09 per GB for the first 10 TB.
    • Example (Google Cloud): Database Migration Service for heterogeneous migrations charges per GB processed, with tiered pricing.
  • Strategies to Crush the Cost:
    • Compression: Compress your data before transfer to reduce the volume.
    • Private Interconnect: Use a dedicated network connection like AWS Direct Connect or Google Cloud Interconnect to bypass the public internet and get a lower, flat data transfer rate.
    • Schedule Transfer: If you can, schedule the backfill during off-peak hours to avoid competing with production traffic.
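
To make the numbers tangible, here’s a minimal back-of-the-envelope estimator in Python. The $0.09/GB rate is the illustrative AWS first-tier figure from above, and the compression ratio and link speed are pure assumptions; plug in your provider’s current pricing and your real network before budgeting anything.

```python
def estimate_backfill(db_size_gb: float,
                      egress_usd_per_gb: float = 0.09,  # illustrative first-tier rate
                      compression_ratio: float = 0.5,   # assumes ~50% shrink after compression
                      link_mbps: float = 1000.0) -> None:
    transferred_gb = db_size_gb * compression_ratio
    cost_usd = transferred_gb * egress_usd_per_gb

    # 1 GB is roughly 8,000 megabits; divide by link speed, then convert to hours.
    transfer_hours = (transferred_gb * 8000) / link_mbps / 3600

    print(f"{db_size_gb:,.0f} GB source -> {transferred_gb:,.0f} GB on the wire")
    print(f"Estimated egress cost: ${cost_usd:,.2f}")
    print(f"Estimated transfer time at {link_mbps:,.0f} Mbps: {transfer_hours:,.1f} hours")

if __name__ == "__main__":
    estimate_backfill(200)      # Scenario 3 upper bound
    estimate_backfill(10_000)   # a 10 TB enterprise backfill
```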

Scenario 4: The Enterprise-Scale Apocalypse (1 TB, 10 TB, 200 TB+)

This is where the rubber meets the road. Simple tools won’t cut it. You need a robust, scalable, and highly configurable CDC solution.

Solutions: Open-Source CDC (Debezium), Cloud-Provider Specific Solutions, or Enterprise-Grade Third-Party Tools

1. The Open-Source Titan: Debezium

  • What it is: Debezium is a suite of distributed connectors built on Apache Kafka and Kafka Connect. It reads the transaction log (binlog, WAL) of your source database and publishes row-level changes as events to Kafka topics.
  • Your Pipeline: Source DB -> Debezium Connector -> Kafka -> Kafka Connect Sink Connector -> Target DB
  • Pros: Scalability (inherits Kafka’s power), Flexibility (full control over the pipeline), Open Source (free to use).
  • Cons: Requires significant engineering effort to set up and maintain a production-grade cluster.
  • Use Case: Use Debezium for complex, heterogeneous migrations or when you need a decoupled, event-driven architecture (a minimal connector registration is sketched below).
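
To give a flavor of what “setting up the pipeline” means, registering a Debezium source connector boils down to posting a JSON config to the Kafka Connect REST API. The sketch below uses Python’s requests with a MySQL source; the hostnames, topic prefix, and table list are placeholders, and config keys vary slightly between Debezium versions, so treat it as a starting point rather than a drop-in config.

```python
import json
import requests

# Kafka Connect REST endpoint (placeholder host).
CONNECT_URL = "http://kafka-connect.example.com:8083/connectors"

connector = {
    "name": "appdb-mysql-source",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "source-mysql.example.com",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "***",
        "database.server.id": "184054",      # unique replica ID for binlog reads
        "topic.prefix": "migration",         # Kafka topics become migration.<db>.<table>
        "table.include.list": "appdb.orders,appdb.customers",
        "snapshot.mode": "initial",          # full load first, then stream the binlog
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.appdb",
    },
}

resp = requests.post(CONNECT_URL, data=json.dumps(connector),
                     headers={"Content-Type": "application/json"})
resp.raise_for_status()
print("Connector registered:", resp.json()["name"])
```

A sink connector (a JDBC sink, for example) then consumes those topics and applies the changes to the target database, completing the pipeline sketched above.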

2. The Cloud Giants’ Solutions:

  • AWS DMS: For large-scale migrations, AWS DMS is a strong contender. It’s a managed service, so you don’t worry about infrastructure, and it can scale its replication instances to handle terabytes of data (see the boto3 sketch after this list).
  • Google Cloud Datastream (A Must-Know):
    • What it is: Datastream is a serverless, managed CDC service. It streams data changes in near real-time from a source database to a destination.
    • How it works: You set up a connection profile and a stream, and Datastream automatically performs the backfill and then continuously reads the source’s transaction logs.
    • Pricing: Based on gigabytes processed. The first 500 GB of backfill per month is free.
    • Why you need it: It’s perfect for streaming from a self-hosted database to a managed cloud database or to a data warehouse like BigQuery. It’s serverless, so you don’t manage any infrastructure.
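
If AWS DMS is your pick at this scale, the whole full-load-plus-CDC setup can be scripted with boto3. The sketch below assumes the source and target endpoints already exist; the ARNs, instance class, and table mappings are placeholders, so treat it as an outline of the API calls rather than a production script.

```python
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# A larger instance class helps the initial multi-terabyte backfill; right-size it afterwards.
instance = dms.create_replication_instance(
    ReplicationInstanceIdentifier="oltp-migration",
    ReplicationInstanceClass="dms.c5.4xlarge",   # placeholder sizing
    AllocatedStorage=500,
    MultiAZ=True,
)
dms.get_waiter("replication_instance_available").wait(
    Filters=[{"Name": "replication-instance-id", "Values": ["oltp-migration"]}]
)

table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-app-schema",
        "object-locator": {"schema-name": "appdb", "table-name": "%"},
        "rule-action": "include",
    }]
}

task = dms.create_replication_task(
    ReplicationTaskIdentifier="oltp-full-load-and-cdc",
    SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",   # placeholder
    TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",   # placeholder
    ReplicationInstanceArn=instance["ReplicationInstance"]["ReplicationInstanceArn"],
    MigrationType="full-load-and-cdc",                     # backfill, then stream changes
    TableMappings=json.dumps(table_mappings),
)
print("Task ARN:", task["ReplicationTask"]["ReplicationTaskArn"])
```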

3. The Enterprise-Grade Heavyweights:

  • Qlik Replicate (formerly Attunity Replicate): An industry leader. It’s a premium tool with a GUI, robust monitoring, and high-performance replication for heterogeneous databases.
  • Striim: A real-time data streaming and integration platform that can capture changes and stream them to various targets.

Live Migration for Other Key Databases

Microsoft SQL Server

  • Native Replication: Use Transactional Replication, which captures transactions from the database transaction log. Set up a “Publisher” (source), a “Distributor” (replication manager), and a “Subscriber” (target).
  • Cloud Solutions:
    • Azure SQL Database: The Azure SQL Migration extension for Azure Data Studio is a powerful wizard that simplifies online migrations with minimal downtime.
    • Google Cloud SQL for SQL Server: Google Cloud’s DMS supports SQL Server migrations by exporting full database backups and transaction log files to a Cloud Storage bucket, which DMS automatically applies to the target.

Oracle

  • Native Replication: Oracle GoldenGate, the enterprise-grade replication product. It’s the gold standard for Oracle migrations and real-time CDC.
  • Cloud Solutions:
    • AWS DMS: Fully supports Oracle as a source, leveraging CDC.
    • Google Cloud DMS: Supports heterogeneous migrations from Oracle, for example to Cloud SQL for PostgreSQL.
    • Oracle Cloud Infrastructure (OCI): Oracle’s own cloud offers a seamless migration using its Database Migration service.

SAP (with SAP HANA or ASE)

  • The SAP Caveat: Migrating SAP is a unique beast. You have to navigate complex licensing and specific technical requirements.
  • SAP HANA System Replication: This is SAP’s native replication technology for HANA databases. It’s a powerful tool for disaster recovery and can be leveraged for live migrations.
  • Specialized Tools: For SAP ASE, you can use SQL Server Migration Assistant (SSMA) for SAP ASE to migrate to Azure SQL Database.
  • Specific Requirements: For high-performance SAP HANA workloads on Google Cloud, you might need to use Bare Metal Solution for licensing compliance; keep in mind that live migration isn’t supported for these instances during maintenance events. This is a non-negotiable architectural requirement to plan around.

The Ultimate Safety Net: Backups During Migration

While CDC minimizes downtime, backups are your ultimate insurance policy against data corruption or human error. You should absolutely be taking backups before, during, and after the migration.

Best Practices for Backups during a Live Migration:

  • Take a Full Backup Before You Start: Before you even initiate the full load, take a complete, verified backup of your source database. This is your known-good starting point.
  • Continue Your Regular Backup Schedule: Do not stop your existing backup schedule (e.g., nightly full backups, hourly transaction log backups). These are independent of the migration process and are your continuous recovery points.
  • Don’t Rely on the Full Load as a Backup: The initial backfill of the migration is a copy for a specific purpose; it’s not a transactional, point-in-time backup. It’s better to rely on your verified, traditional backups for recovery purposes.
  • Verify Backups: After taking a backup, restore it to a staging environment to ensure it is valid and can be used for recovery (a minimal verification sketch follows this list). A corrupt backup is worse than no backup at all.
  • Store Backups in a Different Location: Always store your backup files on a separate physical location or device from the database files. In the cloud, this means a separate storage bucket or region.
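
Verification is easy to automate, too. Here’s a minimal sketch for a PostgreSQL backup, assuming pg_restore and psql are on the PATH and a disposable staging database is available; the connection string, dump path, and the row-count check are all placeholders for whatever invariants matter in your schema.

```python
import subprocess

STAGING_DSN = "postgresql://verify@staging-host:5432/restore_check"  # placeholder
BACKUP_FILE = "/backups/appdb-nightly.dump"                          # placeholder path

def run(cmd: list[str]) -> str:
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()

# 1. Restore the dump into a throwaway staging database.
run(["pg_restore", "--clean", "--if-exists", "--no-owner",
     f"--dbname={STAGING_DSN}", BACKUP_FILE])

# 2. Run a cheap sanity check; swap in whatever checks prove the backup is usable.
row_count = run(["psql", STAGING_DSN, "-t", "-c", "SELECT count(*) FROM orders"])
assert int(row_count) > 0, "Restored backup looks empty -- investigate before trusting it"

print(f"Backup restored and sanity-checked: {row_count} rows in orders")
```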

Migrating Within the Same Cloud vs. Cross-Cloud

  • Within the Same Cloud: This is the easiest. The network latency is minimal, and you can use private IP addresses and dedicated network connections to ensure security and speed. Cloud providers’ native migration services are optimized for this.
  • Cross-Cloud: Migrating from AWS to GCP, or Azure to GCP, introduces network egress costs and requires careful security planning.
    • Cost: You’ll be hit with egress charges from the source cloud provider.
    • Security: You must set up secure tunnels (VPNs, private links) between the cloud providers to prevent data from traversing the public internet.
    • Tooling: Cloud provider DMS services can often reach sources and targets hosted outside their own cloud as long as secure network connectivity is in place, or you’ll rely on third-party tools like Qlik Replicate or self-managed solutions like Debezium.

Final Thoughts: Don’t Panic, Plan.

Live OLTP migration is a marathon, not a sprint. The key is to run a pilot migration on a dev or staging environment. Test your rollback plan. And always, always communicate with your stakeholders. A seamless migration is a testament to meticulous planning, deep technical knowledge, and the power of continuous replication.
