|
Data Migration Methodology
Author: Vivek Salunke
Company: IT Elite Systems Pvt. Ltd.
What is data migration?
Simply put data migration is the process of moving
of data from one data source to another data source.
Generally the term is used to refer to migration of
data from legacy database application to an RDBMS. It
primarily involves rearrangement of database entities
such as physical data, schemas, view, stored procedure
and triggers.
Why migrate data?
The trend is more and more companies are moving away
from their legacy database to commonly available and
popular RDBMS system. In many of the re-engineering
projects, it becomes necessary to move away from the
existing database system to the database systems, which
provide for technological advances in the software industry.
This paper attempts to
- Identify the
complexities involved in data migration.
- Identify
the phases in the data migration project
- Define a conceptual
road map for efficient data migration
Complexities (top)
Data migration is done between two disparate database
systems and is the reason involves incompatible data
structures, inconsistent and inaccurate data. Insufficient
documentation of the existing database system complicates
the issues further.
Data Structure incompatibilities
The major need for any re-engineering project induces
data structure changes in the target database. Moreover
proprietary data structure between the source and the
target RDBMS system introduces additional complexity.
For instance a data type blob in Informix is handled
differently in Oracle.
In accurate / In-consistent data
If the source database is defined without foreign
key constraints then making the data "clean" in the
target database becomes a big task in itself. At the
same time we have to make sure that live data is not
lost because of in-accuracies in the source database.
In-adequate documentation
This is mostly true in old legacy systems where the
systems are not documented enough for the database migration
team to do justice to the data migration task. The lack
of documentation induces re-work after the proto-type
stage of migration is over. Before we can even think
about moving the first element from one system to another,
we need to completely understand both environments.
This understanding cannot stop at the data level. Full
understanding of the entire architecture, in the As-Is
and To-Be worlds, is required for a complete and accurate
migration
Size of the source Database
The size of the source database also plays a major
in the data migration plan. The resultant road map for
data migration should take into consideration the size
to calculate the time and temporary space required for
migration.
Database downtime constraints
Many of the mission critical applications of the enterprises
require that the application is up and running 24x7.
In such scenario it is very important that the migration
effort is time bound.
Data Migration
Phases (top)
A logical process flow comprising distinct phases
for key activities such as strategy, analysis, design,
development, testing and implementation, is essential
to efficient and trouble-free data migration. The ideal
data migration project plan should be broken down into
phases that mirror the overall project development phases.
These phases are: -
| Phase |
Objectives |
| Strategy |
Determine whether or not the objectives
documented in the pre-strategy phase are achievable. |
| Analysis |
Perform a complete examination in
order to make a checklist of the legacy data elements
that we know need migrating. |
| Design |
Map the legacy data elements to the
columns. |
| Development |
Develop data migration code or write
scripts using the specific data migration tool chosen
during the analysis phase. |
| Test/Implementation |
Deals with the semantics of the scripting
language used in the transformation effort, specifically
the logical and physical errors. Identify and resolve
logical errors. |
Strategy Phase
Data migration should be treated as a project in itself
and not a small part of the application re-engineering
project. The project milestones and the timelines for
the migration projects should be outlined in the plan.
The activities should include tasks such as database
layout, design, development, testing, piloting the application,
bench marking, documentation, and actual migration.
Following activities should be carried out during
strategy phase:
- Determine the source and the
target system by identifying all sources of the
incoming data.
- Determine the timeframe we can expect for the duration
of the data migration. This is especially important
if you are migrating a large 24x7 system that cannot
afford any significant downtime.
Analysis Phase
After the analysis phase you must resolve the following:
- Determine the required schema
changes including which elements are missing in the
current schema that will be needed to support the new
application functionality.
- Define the required data
scrubbing to be performed.
- Perform topology analysis and
obtain performance estimates. You will need to know
precisely how long will the data migration effort need
to run. This will add to the required downtime.
- Decide what tool/language the data migration utilities
will be performed in
Design Phase
At this point we are in a position to generate the
physical database design
Development
Phase (top)
During the development phase, the actual data migration
code is developed, or the scripts are written if a specific
data migration tool was chosen during the analysis phase.
Generally the following issues are addressed:
- Handling data irregularities
and exceptions during transformations (such as
incompatible data types). Ideally, you should have
your tool create a data irregularities and integrity
violations report since the source data will need to
be corrected by the end users or the legacy system
administrators.
- Maximizing code efficiency through increasing the
degree of parallelism and performing tuning. Again
remember that the timeframe for the migration can
be quite tight.
Test/Implementation Phase
The Test phase is where we identify and resolve logical
errors. The first step is to execute the mapping. Even
if the mapping is completed successfully, we must still
identify:
- The number records we expect
this script to create.
- Whether or not the correct
number of records got created, and if not, why?
- Was the data loaded into the
correct fields?
- Was the data correctly formatted?
The ultimate deliverable of the implementation phase
is the completion of the data migration effort. However,
you must remember that the data migration might not
be a one-time effort: you may end up doing cyclical
iterations that can spread over a period when the legacy
data would be continually fed into the new system.
Conclusion
Data Migration is a necessary evil, but not an impossible
one to conquer. Data migration can make or break a project.
The key is to prepare for it very early on, and monitor
it carefully throughout the process. Project timelines
tend to become more rigid as time passes, so it really
makes sense to meet migration head on. Devoted teams
with a clearly defined project plan, from project inception,
and armed with automated tools where applicable, is
indeed the formula to success.
page
top
|