Core IT
• Complete Reengineering Methodology
• Data Migration Methodology
• Content Management Systems
Engineering Services
• CAD Detailing Methodology
Case Studies

Business Model
Service Delivery Model
 FAQ's
 Subscribe
 Corporate Brochure

You are here: home / white papers / Core IT / data migration methodology

Data Migration Methodology

Author: Vivek Salunke
Company: IT Elite Systems Pvt. Ltd.

What is data migration?

Simply put data migration is the process of moving of data from one data source to another data source. Generally the term is used to refer to migration of data from legacy database application to an RDBMS. It primarily involves rearrangement of database entities such as physical data, schemas, view, stored procedure and triggers.

Why migrate data?

The trend is more and more companies are moving away from their legacy database to commonly available and popular RDBMS system. In many of the re-engineering projects, it becomes necessary to move away from the existing database system to the database systems, which provide for technological advances in the software industry.

This paper attempts to

  1. Identify the complexities involved in data migration.
  2. Identify the phases in the data migration project
  3. Define a conceptual road map for efficient data migration

Complexities                                                                                                        (top)

Data migration is done between two disparate database systems and is the reason involves incompatible data structures, inconsistent and inaccurate data. Insufficient documentation of the existing database system complicates the issues further.

Data Structure incompatibilities

The major need for any re-engineering project induces data structure changes in the target database. Moreover proprietary data structure between the source and the target RDBMS system introduces additional complexity. For instance a data type blob in Informix is handled differently in Oracle.

In accurate / In-consistent data

If the source database is defined without foreign key constraints then making the data "clean" in the target database becomes a big task in itself. At the same time we have to make sure that live data is not lost because of in-accuracies in the source database.

In-adequate documentation

This is mostly true in old legacy systems where the systems are not documented enough for the database migration team to do justice to the data migration task. The lack of documentation induces re-work after the proto-type stage of migration is over. Before we can even think about moving the first element from one system to another, we need to completely understand both environments. This understanding cannot stop at the data level. Full understanding of the entire architecture, in the As-Is and To-Be worlds, is required for a complete and accurate migration

Size of the source Database

The size of the source database also plays a major in the data migration plan. The resultant road map for data migration should take into consideration the size to calculate the time and temporary space required for migration.

Database downtime constraints

Many of the mission critical applications of the enterprises require that the application is up and running 24x7. In such scenario it is very important that the migration effort is time bound.

Data Migration Phases                                                                                         (top)

A logical process flow comprising distinct phases for key activities such as strategy, analysis, design, development, testing and implementation, is essential to efficient and trouble-free data migration. The ideal data migration project plan should be broken down into phases that mirror the overall project development phases. These phases are: -

Phase Objectives
Strategy Determine whether or not the objectives documented in the pre-strategy phase are achievable.
Analysis Perform a complete examination in order to make a checklist of the legacy data elements that we know need migrating.
Design Map the legacy data elements to the columns.
Development Develop data migration code or write scripts using the specific data migration tool chosen during the analysis phase.
Test/Implementation Deals with the semantics of the scripting language used in the transformation effort, specifically the logical and physical errors. Identify and resolve logical errors.

Strategy Phase

Data migration should be treated as a project in itself and not a small part of the application re-engineering project. The project milestones and the timelines for the migration projects should be outlined in the plan. The activities should include tasks such as database layout, design, development, testing, piloting the application, bench marking, documentation, and actual migration.

Following activities should be carried out during strategy phase:

  • Determine the source and the target system by identifying all sources of the incoming data.
  • Determine the timeframe we can expect for the duration of the data migration. This is especially important if you are migrating a large 24x7 system that cannot afford any significant downtime.

Analysis Phase

After the analysis phase you must resolve the following:

  • Determine the required schema changes including which elements are missing in the current schema that will be needed to support the new application functionality.
  • Define the required data scrubbing to be performed.
  • Perform topology analysis and obtain performance estimates. You will need to know precisely how long will the data migration effort need to run. This will add to the required downtime.
  • Decide what tool/language the data migration utilities will be performed in

Design Phase

At this point we are in a position to generate the physical database design

Development Phase                                                                                              (top)

During the development phase, the actual data migration code is developed, or the scripts are written if a specific data migration tool was chosen during the analysis phase.

Generally the following issues are addressed:

  1. Handling data irregularities and exceptions during transformations (such as incompatible data types). Ideally, you should have your tool create a data irregularities and integrity violations report since the source data will need to be corrected by the end users or the legacy system administrators.
  2. Maximizing code efficiency through increasing the degree of parallelism and performing tuning. Again remember that the timeframe for the migration can be quite tight.

Test/Implementation Phase

The Test phase is where we identify and resolve logical errors. The first step is to execute the mapping. Even if the mapping is completed successfully, we must still identify:

  • The number records we expect this script to create.
  • Whether or not the correct number of records got created, and if not, why?
  • Was the data loaded into the correct fields?
  • Was the data correctly formatted?

The ultimate deliverable of the implementation phase is the completion of the data migration effort. However, you must remember that the data migration might not be a one-time effort: you may end up doing cyclical iterations that can spread over a period when the legacy data would be continually fed into the new system.

Conclusion

Data Migration is a necessary evil, but not an impossible one to conquer. Data migration can make or break a project. The key is to prepare for it very early on, and monitor it carefully throughout the process. Project timelines tend to become more rigid as time passes, so it really makes sense to meet migration head on. Devoted teams with a clearly defined project plan, from project inception, and armed with automated tools where applicable, is indeed the formula to success.

                                                                                                                          page topˆ

  © 2003-2005 IT Elite Systems. All rights reserved. Terms & Use | Contact | Sitemap