Migrating On-Prem Apps to AWS: A Real Architecture Walkthrough

· 9 min read · AWS · Architecture

Most migration guides stop at “lift and shift vs. re-architect.” This one skips the theory and focuses on the decisions to make before, during, and after moving workloads from an on-premises data centre to AWS.


1. Pre-Migration Assessment Checklist

Before touching a single EC2 instance, build a clear inventory of what is being moved and why. Skipping this phase is the single most common reason migrations stall mid-flight.

Application Discovery

Infrastructure & Performance Baseline

Data & Compliance

Readiness Score

After the checklist, assign each application a migration readiness score across four axes:

| Axis | Questions to answer |
| --- | --- |
| Complexity | Number of dependencies, custom kernel modules, legacy protocols (IPX, NetBIOS) |
| Risk | Revenue impact of downtime, regulatory exposure |
| Effort | Code changes needed, team familiarity with AWS |
| Business value | Cost savings, performance gains, agility unlocked |

Low complexity + high business value apps go in Phase 1. High risk + high effort apps go last, or get a parallel-run strategy.
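The scoring-to-phase mapping can be made mechanical. A minimal sketch, assuming illustrative 1–5 scores per axis and thresholds that are our own (they are not part of any AWS framework):

```python
from dataclasses import dataclass

@dataclass
class App:
    name: str
    complexity: int      # 1 = simple, 5 = deeply entangled
    risk: int            # 1 = low revenue/regulatory impact of downtime
    effort: int          # 1 = no code changes needed
    business_value: int  # 5 = large savings or agility gains

def migration_phase(app: App) -> int:
    """Assign a migration phase: low complexity + high value goes first,
    high risk + high effort goes last (or gets a parallel run)."""
    if app.complexity <= 2 and app.business_value >= 4:
        return 1
    if app.risk >= 4 and app.effort >= 4:
        return 3
    return 2

wiki = App("internal-wiki", complexity=1, risk=1, effort=1, business_value=4)
billing = App("billing-core", complexity=5, risk=5, effort=5, business_value=5)
print(migration_phase(wiki))     # → 1
print(migration_phase(billing))  # → 3
```

The point is not the exact thresholds but that the scoring is written down and applied uniformly, so phase assignments can be defended when stakeholders push to move their app first.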


2. Migration Phases

Trying to migrate everything at once is how projects fail. Phasing builds confidence and institutional knowledge before touching critical systems.

Phase 0 - Foundation (Weeks 1–4)

This phase has no application migrations. It is purely infrastructure setup.

Phase 1 - Low-Risk Workloads (Weeks 5–10)

Target: stateless, non-customer-facing, or non-critical applications.

Good candidates: internal tools, batch processing jobs, dev/staging environments, monitoring agents, log shippers, internal wikis.

The goal of this phase is operational learning - the team is figuring out CloudWatch alarms, IAM policies, security group rules, and deployment patterns. Mistakes here have a low blast radius.

Pattern: Rehost (lift-and-shift) using AWS MGN (Application Migration Service). MGN installs an agent on the source server, continuously replicates its disks to a staging area in AWS, and launches a test instance before cutover. Cutover time is typically under 30 minutes.

Phase 2 - Tier-2 Production (Weeks 11–20)

Target: production workloads that are important but not the most revenue-critical.

Good candidates: internal APIs, secondary databases, background workers, reporting services.

At this phase, start introducing managed services where it makes sense:

Important: Do not re-architect and migrate at the same time. If swapping MySQL for Aurora, migrate first (rehost), stabilise, then re-platform in a subsequent sprint. Combining both changes makes rollback nearly impossible.

Phase 3 - Tier-1 / Mission-Critical (Weeks 21–30+)

Target: the systems that would wake up the CTO at 3am if they went down.

These require a parallel-run or blue/green strategy:

  1. Replicate the database to AWS using AWS DMS (Database Migration Service) with ongoing replication enabled.
  2. Stand up the application in AWS, pointed at the replicated database.
  3. Route a small percentage of traffic to the AWS environment (Route 53 weighted routing or ALB canary rules).
  4. Monitor error rates, latency, and database lag for at least one full business cycle (usually a week).
  5. Shift 100% of traffic, then stop replication, then decommission the on-prem instance.
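Steps 3 and 5 above come down to adjusting Route 53 record weights. A sketch of the ChangeBatch payload for weighted CNAME records, using placeholder hostnames and zone IDs - in real use the result is passed to boto3's `route53` `change_resource_record_sets`:

```python
def weighted_change_batch(name, on_prem_target, aws_target, aws_weight, ttl=60):
    """Build a Route 53 ChangeBatch sending `aws_weight`% of traffic to AWS.

    Both targets are CNAME values (e.g., an on-prem load balancer hostname
    and an ALB DNS name). Rollback = call again with aws_weight=0.
    A low TTL keeps the weight shift fast to take effect."""
    def upsert(set_id, weight, value):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "CNAME",
                "SetIdentifier": set_id,
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": value}],
            },
        }
    return {"Changes": [
        upsert("on-prem", 100 - aws_weight, on_prem_target),
        upsert("aws", aws_weight, aws_target),
    ]}

# Start the canary at 5% to AWS:
batch = weighted_change_batch(
    "app.example.com.", "lb.dc1.example.com.",
    "my-alb-123.eu-west-1.elb.amazonaws.com.", aws_weight=5)
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="Z...placeholder...", ChangeBatch=batch)
```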

Rollback is straightforward as long as replication is running - flip the DNS weight back. Once replication stops, rollback becomes a restore-from-backup operation.

Phase 4 - Decommission & Optimise

After all workloads are in AWS, decommission on-prem hardware on a defined schedule. Do not keep on-prem “just in case” indefinitely - it creates an implicit expectation of rollback that the team will be tempted to invoke for bad reasons.

Once decommissioned, revisit right-sizing with real AWS Cost Explorer data, move suitable workloads to Savings Plans or Reserved Instances, and evaluate which services warrant re-architecting (Lambda, containers, serverless databases).


3. Networking Pitfalls

Networking is where most migrations accumulate unplanned work. These are the issues that come up repeatedly.

Overlapping CIDR Blocks

The most preventable problem. If the on-prem network uses 10.0.0.0/8 and engineers create a VPC with 10.0.0.0/16, connectivity breaks: VPC peering refuses to establish between overlapping CIDRs, and with Transit Gateway the overlapping routes mean traffic can silently reach the wrong destination.

Fix: Audit all RFC 1918 ranges in use before creating a single VPC. Reserve a dedicated non-overlapping range for AWS (e.g., 172.16.0.0/12 if on-prem owns all of 10.0.0.0/8). Plan for future VPCs - a /16 per VPC, carved from the reserved supernet, is a reasonable starting point.
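The audit itself is a few lines with the standard library's `ipaddress` module - a quick check to run over the full inventory of on-prem ranges and every proposed VPC CIDR before anything is created:

```python
import ipaddress

def find_overlaps(on_prem_cidrs, proposed_vpc_cidrs):
    """Return every (on-prem, VPC) CIDR pair that overlaps."""
    on_prem = [ipaddress.ip_network(c) for c in on_prem_cidrs]
    vpcs = [ipaddress.ip_network(c) for c in proposed_vpc_cidrs]
    return [(str(a), str(b)) for a in on_prem for b in vpcs if a.overlaps(b)]

# 10.0.0.0/16 collides with the on-prem /8; 172.16.0.0/16 is safe.
print(find_overlaps(["10.0.0.0/8"], ["10.0.0.0/16", "172.16.0.0/16"]))
# → [('10.0.0.0/8', '10.0.0.0/16')]
```

An empty result means the proposed VPC plan is safe against the audited on-prem ranges.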

Security Groups Are Not Firewalls

On-premises teams are used to perimeter firewalls with stateful rules, IP allowlists, and explicit deny rules. Security groups are stateful but allow-only - there is no explicit deny rule. Network ACLs add stateless deny capability but are easy to misconfigure because they are evaluated in order by rule number.

The common mistake: migrating a firewall ruleset literally into security groups, including rules like “deny all from 0.0.0.0/0.” That rule does nothing in a security group - the default deny is implicit and cannot be overridden by adding an explicit deny entry.

Fix: Model security groups around application tiers (web, app, db) and allow only the specific ports each tier needs from the tier that calls it. Reference security group IDs instead of IPs wherever possible - this survives instance replacement and Auto Scaling.
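As a sketch of the SG-to-SG referencing pattern, here is the parameter shape for boto3's EC2 `authorize_security_group_ingress` - the security group IDs are placeholders:

```python
def tier_ingress(target_sg, source_sg, port, protocol="tcp"):
    """Parameters for ec2.authorize_security_group_ingress: allow one
    tier's security group to reach another on a single port, referenced
    by SG ID rather than by IP range. SG IDs here are placeholders."""
    return {
        "GroupId": target_sg,
        "IpPermissions": [{
            "IpProtocol": protocol,
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": source_sg}],
        }],
    }

# web -> app on 8080, app -> db on 3306; no CIDR ranges anywhere,
# so the rules survive instance replacement and Auto Scaling.
rules = [
    tier_ingress("sg-app000000", "sg-web000000", 8080),
    tier_ingress("sg-db0000000", "sg-app000000", 3306),
]
# for r in rules:
#     boto3.client("ec2").authorize_security_group_ingress(**r)
```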

DNS Resolution Between On-Prem and VPC

Applications that resolve hostnames using on-prem DNS servers will break when migrated to a VPC, because the VPC’s Route 53 Resolver handles VPC-internal hostnames (the *.ec2.internal / *.compute.internal names and private hosted zones), and the on-prem DNS has no knowledge of them. The reverse is also true: on-prem DNS cannot resolve Route 53 private hosted zone records.

Fix: Configure Route 53 Resolver inbound and outbound endpoints. Outbound endpoints forward queries for on-prem domains to on-prem DNS. Inbound endpoints let on-prem DNS forward queries for AWS private zones to Route 53. This bidirectional setup is required before any split-DNS app can function correctly after migration.
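The outbound half of that setup is a forwarding rule per on-prem domain. A sketch of the parameter shape for boto3's `route53resolver` `create_resolver_rule`, with placeholder domain, IPs, and endpoint ID:

```python
def forward_rule(domain, on_prem_dns_ips, outbound_endpoint_id):
    """Parameters for route53resolver.create_resolver_rule: forward
    queries for an on-prem domain to on-prem DNS servers via an
    outbound endpoint. All identifiers here are placeholders."""
    return {
        "CreatorRequestId": f"fwd-{domain}",
        "Name": f"forward-{domain}",
        "RuleType": "FORWARD",
        "DomainName": domain,
        "TargetIps": [{"Ip": ip, "Port": 53} for ip in on_prem_dns_ips],
        "ResolverEndpointId": outbound_endpoint_id,
    }

rule = forward_rule("corp.example.com", ["10.1.0.10", "10.1.0.11"],
                    "rslvr-out-0000000000example")
# boto3.client("route53resolver").create_resolver_rule(**rule)
# The rule must then be associated with each VPC that needs it.
```

The inbound direction needs no rule on the AWS side - on-prem DNS is simply configured to forward the AWS private zone names to the inbound endpoint IPs.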

NAT Gateway Costs That Surprise Teams

On-premises, east-west traffic between application tiers is free. In AWS, traffic that routes out through a NAT Gateway (e.g., a private subnet instance calling an S3 bucket via the public endpoint) incurs NAT Gateway data processing charges in addition to the data transfer charge. At scale this adds up quickly and is completely avoidable.

Fix: Use VPC endpoints (Gateway endpoints for S3 and DynamoDB are free; Interface endpoints for other services have an hourly charge but eliminate NAT costs at volume). Audit the NAT Gateway bytes-processed CloudWatch metrics (BytesOutToDestination, BytesInFromSource) after Phase 1 to catch unexpected traffic patterns before they become a surprise on the bill.
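To make "adds up quickly" concrete, a back-of-the-envelope estimate - both rates below are illustrative us-east-1-style figures, not authoritative pricing; check current AWS pricing for your region:

```python
def monthly_nat_cost(gb_per_month, price_per_gb=0.045,
                     hourly=0.045, hours=730):
    """Rough NAT Gateway monthly cost: data processing plus the
    per-hour charge. Both rates are illustrative assumptions."""
    return gb_per_month * price_per_gb + hourly * hours

# 10 TiB/month of S3 traffic routed through NAT that a free S3
# gateway endpoint would have carried at no charge:
print(round(monthly_nat_cost(10_240), 2))  # → 493.65
```

At that volume, switching the S3 traffic to a gateway endpoint removes the entire data-processing portion of the bill.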


Closing Thoughts

The technical work of a migration is generally straightforward. The hard parts are the organisational ones: getting accurate dependency data, aligning on a cutover window with multiple stakeholders, and resisting the pressure to migrate everything at once to “get it done.”

Phases exist to create checkpoints. Each completed phase gives real operational experience in AWS, a reduced on-prem footprint, and a smaller blast radius for the next one. Move methodically, instrument everything from day one, and treat the decommission date as a hard deadline - not a goal that shifts indefinitely to the right.
