Data migration

From Useful Data
Jump to: navigation, search
Definition
The process of transferring data from one system to another to seamlessly support business processes.
Principles
Data migration is of strategic importance. The applications, processes and services are all there to transform and manage the data. The data is the business.
Complexity is more significant than volume. Scale of volume is almost immaterial.
Reason
transfer to a new computer system
merger of business units
Process
Determine and define the source data structure(s)
Determine and define the target data structure(s) (which should be easy)
Map source data columns to target data columns
Determine if data cleansing is required
Determine whether manual or automated transfer is appropriate
Determine a test
Determine the time that will be required
Plan when the migration and test will occur, and a back-out plan
Tools
Data Migration Project Checklist
Methods
ETL / APIs / transfer disks / export & import / file type conversion / OCR / digitising /
Best Practice
Be realistic - can it be done in the time available?
Data migration activity needs to start with the programme to prevent delays.
A data quality framework is the first requirement.
Regarding data cleansing, be sure about what the acceptance criteria on data quality are.
A data migration testing strategy is required.
A data migration go-live strategy is required.
Good Footings
Planning
Data Architecture - get the list of Legacy Data Stores under change control.
Does everyone on the programme know where to go to see a full list of the data sources?
Is it easy to see which data sources in the plethora that are available to you contain which of your Conceptual Entities?
Is it easy to find out which data sources will be expected to have matching Conceptual Entities for the migration (e.g. can you see which application within billing, has to have a matching customer reference to which application with sales)?
Do you have an agreed Legacy Data Store Model against which each candidate legacy data store can be measured for consistency?
Are you performing formal Landscape Analysis or are you going for the "Poke And Hope" approach (i.e. throw stuff at the target and see what fails)?
Data Quality Rules - get the list of data issues under change control and get them into your Data Quality Rules process.
In panic mode the number of issues can escalate rapidly but what we need at this stage is focus and prioritisation. Most projects only track the really big issues on an issue register leaving hundreds of smaller issues to go unnoticed. These issues can rapidly eat up project hours. Get each data issue logged as a data quality rule so that the relevant Technical Systems Expert and Business Domain Expert are closely involved and working together for a solution.
Key Data Stakeholders - these people are essential and are often lacking on a failing migration.
These people can help you prioritise and resolve the data issues and you must tactfully convince them to take responsibility for the data in their area. This is not easy if they have been used to being consulted but not having any responsibility to the programme for delivery.
Delivery - focus on consulting, not fixing.
In panic mode there is often the temptation to fix every issue that arises. Stop. Accept that you have missed important data gaps and begin to educate sponsors and stakeholders that you will be delivering the project iteratively from this point on using a prioritised process.
Do the do-able – your analysis will tell you where you are going wrong but at this point you can’t hope to fix everything.
It is just not technically or politically feasible. Concentrate on iterative increments. Accept that knowing who the Key Data Stakeholders are and getting them to accept their responsibilities are two different things. Using the Data Quality Rules process will bring them into the project. You will not at this point in time be allowed to perform a formal SRP creation process, but be alert to Reasons To Say No as they come out in conversation. Never let one slide by. Always ask what it would take to close it out. Record it. Build up trust.
Activities
Prioritisation - start prioritising faults and create realistic estimates of how long these will take to resolve.
Data quality prioritisation is also critical, we always say "No enterprise wants, needs or will pay for perfect quality data" however in the rush to recover the project the temptation to load data will be great. If your data is sub-standard, how will it be corrected in a later iteration? How will it affect the load process? Have these risks been communicated to the Data Stakeholders? Your project may not necessarily need perfect quality data but you do need to understand and communicate the risks.
Change control - tighten up change control and configuration management procedures.
The fire-fighting mentality of failing projects has to be transformed into a stakeholder-driven release plan with clearly defined deliverables that the business can plan for.
Finishing
Post-migration house-keeping duties - in a well-managed migration there are a number of activities that should have been completed as a matter of course.
System Retirement Policies - the best solution is to implement a system retirement policy from the outset and get this logged on the programme issue log so it has full attention.
If you have no system retirement policies in place then you will certainly need to exercise some political sensitivity as system owners may be antagonistic if they've not been involved.
Key Business Data Areas - chances are you've now got a lack of knowledge on exactly where your consistency problems exist between data stores.
It's unlikely you will be given sufficient time to completely re-plan and do this correctly but certainly look to address consistency issues across your legacy data areas in subsequent iterations.
Reality Check - it's unlikely you will have sufficient time to analyse the accuracy of your migrated data compared to business reality. However, that doesn't mean it should be forgotten.
Use the legacy data stores and business experts you found in the stabilisation phase to help you corroborate the data. Ensure the data stakeholders are responsible for prioritising remedial work post implementation and start to transition your data quality rules framework (set up in the previous phases) over to the enterprise for ongoing data quality management
Related Topics
Data Cleansing
Data validation
ETL
Data archiving
Automated migration
Phases
data discovery
data profiling
extraction
cleansing
transformation
validation
load
verification
Attributes
The data migration phases (design, extraction, cleansing, load, verification) for applications of moderate to high complexity are commonly repeated in several cycles before the new system is deployed.
Different data sets warrant different approaches; it is unlikely one tool will do all the job.
Categories
Storage - move the data from one medium to another
Database - from one database vendor to another database vendor or from version to version. Can be straightforward.
Application - from one vendor's product to another. Typically, or ideally, done using each application's API to get the data out and in.
Business process - can be straightforward when internal, or horrid when caused by mergers.
Questions
Do we migrate legacy data?
If so, should we bring all the data or just part of it?
If just part of the data, how to we determine which to bring? E.g. by last update date, or certificate date or application date or something else?
If we choose to select by date, what should the cut-off be? That is, how current must data be to be migrated?
What do we do with data we do not want to migrate?
Do we need a data quality audit first?
Should we bring the data over to the new system?
Are there any columns we will lose on migration?
Will it be cheaper to do some migration manually?
What is the risk of not migrating a data item?
What is the cost of migrating a data item?
What is the use of migrating a data item?
Do we archive, delete or emulate?
Do we only migrate data that supports processes?
Tests on source data
Data Integrity
Data Completeness
Null Values
Default Values
Business Rules
Date Formats
Risks
data loss
source data may be in undocumented office automation files
source data may be in user-developed or undocumented linked Excel spreadsheets or Access databases
a need for downtime
the temptation for a Big Bang transfer
security - is data cleansing (for example) going to be done off-shore? How will security of the data be guaranteed?
data migration projects have a high rate of over-run and failure
Risk Assessment
look at:
* Data Architecture
* Business Engagement
* Programme Governance
* Policies
* Migration Delivery
* System Retirement policies
* Key Data Stakeholder Analysis
* Data Quality Rules
See also
data extraction
data loading
data loading verification
metadata
testing
Resources
* Data Migration Pro
* BCS "Johny's Data Migration Blog"