Data migration
- Definition
- The process of transferring data from one system to another to seamlessly support business processes.
- Principles
- Data migration is of strategic importance. The applications, processes and services are all there to transform and manage the data. The data is the business.
- Complexity is more significant than volume. Scale of volume is almost immaterial.
- Reason
- transfer to a new computer system
- merger of business units
- Process
- Determine and define the source data structure(s)
- Determine and define the target data structure(s) (which should be easy)
- Map source data columns to target data columns (a mapping sketch follows this list)
- Determine if data cleansing is required
- Determine whether manual or automated transfer is appropriate
- Define the tests that will verify the migrated data
- Determine the time that will be required
- Plan when the migration and test will occur, and a back-out plan
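- As an illustration of the column-mapping step, a minimal sketch in Python; the source and target column names and the transforms are hypothetical, and in practice the mapping is usually agreed in a signed-off specification:

```python
# Hypothetical source-to-target column mapping with a per-column
# transform; all names here are illustrative, not prescriptive.
from datetime import datetime

COLUMN_MAP = {
    "CUST_NO":   ("customer_id",   str.strip),
    "CUST_NAME": ("full_name",     str.title),
    "DOB":       ("date_of_birth", lambda v: datetime.strptime(v, "%d/%m/%Y").date()),
}

def map_row(source_row: dict) -> dict:
    """Apply the column map to one source record."""
    return {tgt: fn(source_row[src]) for src, (tgt, fn) in COLUMN_MAP.items()}

print(map_row({"CUST_NO": " 1042 ", "CUST_NAME": "ada lovelace", "DOB": "10/12/1815"}))
```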
- Methods
- ETL / APIs / transfer disks / export & import / file type conversion / OCR / digitising
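- To make "export & import" concrete, a bare-bones sketch using only Python's standard library: one table is dumped to CSV from an SQLite database and loaded into another. Table and column names are placeholders, and a real export would also need type handling, escaping and batching:

```python
import csv
import sqlite3

def export_table(db_path: str, table: str, csv_path: str) -> None:
    """Dump one table to a CSV file, header row first."""
    with sqlite3.connect(db_path) as conn, open(csv_path, "w", newline="") as f:
        cursor = conn.execute(f"SELECT * FROM {table}")
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(cursor)

def import_table(db_path: str, table: str, csv_path: str) -> None:
    """Load a CSV file (with header row) into an existing table."""
    with sqlite3.connect(db_path) as conn, open(csv_path, newline="") as f:
        reader = csv.reader(f)
        columns = next(reader)
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
            reader,
        )
```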
- Best Practice
- Be realistic - can it be done in the time available?
- Data migration activity needs to start as soon as the programme does, to prevent delays later.
- A data quality framework is the first requirement.
- Regarding data cleansing, be sure about what the acceptance criteria on data quality are.
- A data migration testing strategy is required.
- A data migration go-live strategy is required.
- Good Footings
- Planning
- Data Architecture - get the list of Legacy Data Stores under change control.
- Does everyone on the programme know where to go to see a full list of the data sources?
- Is it easy to see which of the many data sources available to you contain which of your Conceptual Entities?
- Is it easy to find out which data sources will be expected to have matching Conceptual Entities for the migration (e.g. can you see which application within billing has to have a matching customer reference to which application within sales)?
- Do you have an agreed Legacy Data Store Model against which each candidate legacy data store can be measured for consistency?
- Are you performing formal Landscape Analysis or are you going for the "Poke And Hope" approach (i.e. throw stuff at the target and see what fails)?
- Data Quality Rules - get the list of data issues under change control and get them into your Data Quality Rules process.
- In panic mode the number of issues can escalate rapidly, but what we need at this stage is focus and prioritisation. Most projects only track the really big issues on an issue register, leaving hundreds of smaller issues to go unnoticed; these can rapidly eat up project hours. Get each data issue logged as a data quality rule so that the relevant Technical Systems Expert and Business Domain Expert are closely involved and working together on a solution (a minimal rule-log sketch follows).
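- A minimal sketch of such a rule log, kept deliberately simple; the fields and example values are illustrative assumptions, and a real project would hold this under change control in a shared tool:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DataQualityRule:
    rule_id: str
    description: str
    legacy_data_store: str
    technical_expert: str      # the Technical Systems Expert responsible
    business_expert: str       # the Business Domain Expert responsible
    priority: int              # 1 = must be resolved before load
    status: str = "open"
    raised_on: date = field(default_factory=date.today)

rules = [
    DataQualityRule("DQR-001", "Customer references do not match between billing and sales",
                    "BILLING_DB", "A. Khan", "B. Jones", priority=1),
]

# Work the open rules in priority order rather than panic order.
for rule in sorted((r for r in rules if r.status == "open"), key=lambda r: r.priority):
    print(rule.rule_id, rule.description)
```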
- Key Data Stakeholders - these people are essential and are often lacking on a failing migration.
- These people can help you prioritise and resolve the data issues and you must tactfully convince them to take responsibility for the data in their area. This is not easy if they have been used to being consulted but not having any responsibility to the programme for delivery.
- Delivery - focus on consulting, not fixing.
- In panic mode there is often the temptation to fix every issue that arises. Stop. Accept that you have missed important data gaps and begin to educate sponsors and stakeholders that you will be delivering the project iteratively from this point on using a prioritised process.
- Do the do-able – your analysis will tell you where you are going wrong but at this point you can’t hope to fix everything.
- It is just not technically or politically feasible. Concentrate on iterative increments. Accept that knowing who the Key Data Stakeholders are and getting them to accept their responsibilities are two different things; using the Data Quality Rules process will bring them into the project. You will not, at this point, be allowed to perform a formal System Retirement Plan (SRP) creation process, but be alert to Reasons To Say No as they come out in conversation. Never let one slide by. Always ask what it would take to close it out. Record it. Build up trust.
- Activities
- Prioritisation - start prioritising faults and create realistic estimates of how long these will take to resolve.
- Data quality prioritisation is also critical. We always say "No enterprise wants, needs or will pay for perfect quality data", yet in the rush to recover the project the temptation to load data regardless will be great. If your data is sub-standard, how will it be corrected in a later iteration? How will it affect the load process? Have these risks been communicated to the Data Stakeholders? Your project may not need perfect quality data, but you do need to understand and communicate the risks (a prioritisation sketch follows).
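- One possible scoring sketch: weight each fault's business impact against its estimated fix effort so that cheap, high-value fixes rise to the top of the iteration plan. The fields and scores are illustrative assumptions agreed with the Data Stakeholders:

```python
faults = [
    {"id": "F-07", "impact": 8, "effort_hours": 16},
    {"id": "F-12", "impact": 9, "effort_hours": 40},
    {"id": "F-31", "impact": 4, "effort_hours": 2},
]

# Highest impact per hour of effort first.
for fault in sorted(faults, key=lambda f: f["impact"] / f["effort_hours"], reverse=True):
    print(fault["id"], round(fault["impact"] / fault["effort_hours"], 2))
```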
- Change control - tighten up change control and configuration management procedures.
- The fire-fighting mentality of failing projects has to be transformed into a stakeholder-driven release plan with clearly defined deliverables that the business can plan for.
- Finishing
- Post-migration housekeeping duties - in a well-managed migration there are a number of activities that should have been completed as a matter of course.
- System Retirement Policies - the best solution is to implement a system retirement policy from the outset and get this logged on the programme issue log so it has full attention.
- If you have no system retirement policies in place then you will certainly need to exercise some political sensitivity as system owners may be antagonistic if they've not been involved.
- Key Business Data Areas - chances are you now lack knowledge of exactly where the consistency problems lie between your data stores.
- It's unlikely you will be given sufficient time to completely re-plan and do this correctly but certainly look to address consistency issues across your legacy data areas in subsequent iterations.
- Reality Check - it's unlikely you will have sufficient time to analyse the accuracy of your migrated data compared to business reality. However, that doesn't mean it should be forgotten.
- Use the legacy data stores and business experts you found in the stabilisation phase to help you corroborate the data (a reconciliation sketch follows). Ensure the data stakeholders are responsible for prioritising remedial work post-implementation, and start to transition your data quality rules framework (set up in the previous phases) over to the enterprise for ongoing data quality management.
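- As one corroboration technique, a minimal reconciliation sketch comparing row counts between a legacy store and the target, assuming both are reachable as SQLite databases; the paths and table names are placeholders, and real reconciliation would also compare checksums or sampled records:

```python
import sqlite3

TABLES = ["customers", "accounts", "invoices"]  # hypothetical migrated tables

def row_counts(db_path: str) -> dict:
    with sqlite3.connect(db_path) as conn:
        return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
                for t in TABLES}

source, target = row_counts("legacy.db"), row_counts("new.db")
for table in TABLES:
    status = "OK" if source[table] == target[table] else "MISMATCH"
    print(f"{table}: source={source[table]} target={target[table]} {status}")
```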
- Related Topics
- Data Cleansing
- Data validation
- ETL
- Data archiving
- Automated migration
- Phases
- data discovery
- data profiling
- extraction
- cleansing
- transformation
- validation
- load
- verification
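- A minimal sketch of the later phases chained as a pipeline (discovery and profiling are investigative and happen before any code runs); every stage here is a stub to be replaced with real logic for a given data set:

```python
def extract(source):    return list(source)                      # pull raw records
def cleanse(rows):      return [r for r in rows if r.get("id")]  # drop unusable rows
def transform(rows):    return [{**r, "name": r["name"].title()} for r in rows]
def validate(rows):     return [r for r in rows if r["name"]]    # enforce business rules
def load(rows, target): target.extend(rows)
def verify(target, rows): assert len(target) == len(rows)        # did everything load?

target_store = []
raw = extract([{"id": 1, "name": "ada lovelace"}, {"name": "orphan row"}])
good = validate(transform(cleanse(raw)))
load(good, target_store)
verify(target_store, good)
```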
- Attributes
- The data migration phases (design, extraction, cleansing, load, verification) for applications of moderate to high complexity are commonly repeated in several cycles before the new system is deployed.
- Different data sets warrant different approaches; it is unlikely that one tool will do the whole job.
- Categories
- Storage - move the data from one medium to another
- Database - from one database vendor to another database vendor or from version to version. Can be straightforward.
- Application - from one vendor's product to another. Typically, or ideally, done using each application's API to get the data out and in (see the sketch after this list).
- Business process - can be straightforward when internal, or horrid when caused by mergers.
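- A minimal sketch of the application category: read records out of one product's API and post them into the other's. Both endpoints are hypothetical, and a real migration would also need authentication, paging and rate-limit handling:

```python
import requests

SOURCE_URL = "https://old-app.example.com/api/customers"  # hypothetical endpoint
TARGET_URL = "https://new-app.example.com/api/customers"  # hypothetical endpoint

def migrate_customers() -> None:
    for record in requests.get(SOURCE_URL, timeout=30).json():
        payload = {"id": record["id"], "name": record["name"]}  # map fields as needed
        response = requests.post(TARGET_URL, json=payload, timeout=30)
        response.raise_for_status()  # fail fast rather than silently drop records
```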
- Questions
- Do we migrate legacy data?
- If so, should we bring all the data or just part of it?
- If just part of the data, how do we determine which to bring? E.g. by last update date, or certificate date or application date, or something else?
- If we choose to select by date, what should the cut-off be? That is, how current must data be to be migrated? (A cut-off sketch follows this list.)
- What do we do with data we do not want to migrate?
- Do we need a data quality audit first?
- Should we bring the data over to the new system?
- Are there any columns we will lose on migration?
- Will it be cheaper to do some migration manually?
- What is the risk of not migrating a data item?
- What is the cost of migrating a data item?
- What is the use of migrating a data item?
- Do we archive, delete or emulate?
- Do we only migrate data that supports processes?
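- If selection by a date cut-off is chosen, the filter itself is simple; this sketch assumes a two-year currency policy and a last-update field, both of which are illustrative:

```python
from datetime import date, timedelta

CUT_OFF = date.today() - timedelta(days=2 * 365)  # hypothetical two-year policy

records = [
    {"id": 1, "last_updated": date(2024, 6, 1)},
    {"id": 2, "last_updated": date(2015, 3, 9)},
]

to_migrate = [r for r in records if r["last_updated"] >= CUT_OFF]
to_archive = [r for r in records if r["last_updated"] < CUT_OFF]  # archive, delete or emulate?
```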
- Tests on source data
- Data Integrity
- Data Completeness
- Null Values
- Default Values
- Business Rules
- Date Formats
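- A minimal sketch of row-level versions of these tests; the field names, default marker and business rule are illustrative assumptions. Completeness and integrity checks, such as row counts and foreign-key matching, run over whole data sets rather than single rows:

```python
from datetime import datetime

def check_row(row: dict) -> list:
    """Return a list of test failures for one source record."""
    failures = []
    if not row.get("customer_id"):                       # null values
        failures.append("customer_id is null")
    if row.get("country") == "XX":                       # suspicious default value
        failures.append("country left at its default")
    if row.get("credit_limit", 0) < 0:                   # business rule
        failures.append("negative credit limit")
    try:                                                 # date format
        datetime.strptime(row.get("start_date", ""), "%Y-%m-%d")
    except ValueError:
        failures.append("bad start_date format")
    return failures

print(check_row({"customer_id": "", "country": "XX",
                 "credit_limit": -5, "start_date": "31/01/20"}))
```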
- Risks
- data loss
- source data may be in undocumented office automation files
- source data may be in user-developed or undocumented linked Excel spreadsheets or Access databases (a discovery sketch follows this list)
- a need for downtime
- the temptation for a Big Bang transfer
- security - is data cleansing (for example) going to be done off-shore? How will security of the data be guaranteed?
- data migration projects have a high rate of over-run and failure
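- For the spreadsheet and Access risk above, a minimal discovery sketch: walk a shared drive and inventory candidate user-developed data stores so they can be put on the Legacy Data Store list. The root path is a placeholder:

```python
import os

SUSPECT_EXTENSIONS = {".xls", ".xlsx", ".mdb", ".accdb"}

def find_shadow_data_stores(root: str) -> list:
    """List office-automation files that may hold undocumented source data."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in SUSPECT_EXTENSIONS:
                hits.append(os.path.join(dirpath, name))
    return hits

for path in find_shadow_data_stores("/shared/departments"):  # placeholder path
    print(path)
```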
- Risk Assessment
- look at:
- Data Architecture
- Business Engagement
- Programme Governance
- Policies
- Migration Delivery
- System Retirement policies
- Key Data Stakeholder Analysis
- Data Quality Rules
- See also
- data extraction
- data loading
- data loading verification
- metadata
- testing
- Resources
- Data Migration Pro
- BCS "Johny's Data Migration Blog"