Having been involved in two data migration projects, one big and one small, this is the approach that has worked for me using Scrum:

  • Balance depth (how much data you migrate per customer) and width (how many customers you migrate).
  • Don’t go too deep, as not all data is equally important.
  • DEPTH is harder than WIDTH.
  • If you cannot measure the quality of the data you have migrated at the end of the sprint, you have gone too deep and/or too wide (do not trade off quality for quantity): this is key!

Let’s assume that, for a project, you have tens of thousands of “customers” to move from a data source made up of multiple DBs, where business rules have evolved and not all data is compliant with the rules added later (example: passwords now require special characters). For each customer, loads of data need to be migrated. You have split the population into small populations.

Sprint 1: Simplicity – Successfully migrate a tiny population using very little data – Go small.

  • Select a small population and migrate a minimum amount of data, reducing the size of the population if needed. Example: name, first name and date of birth.
  • Put automated tests in place (“count”, etc.).
  • Put data quality reporting in place, with CLEAR information on why some “customers” were not migrated.
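The sprint 1 steps can be sketched roughly as follows. This is a minimal illustration, not a real migration tool: the record layout (dictionaries with an `id` key), the `migrate` and `check_counts` helpers and the wording of the rejection reasons are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field

# Sprint 1 scope: the minimum amount of data per customer (hypothetical field names).
REQUIRED_FIELDS = ("name", "first_name", "date_of_birth")


@dataclass
class MigrationReport:
    migrated: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (customer_id, reason) pairs

    def reasons(self):
        """Count rejections per reason, for the data quality report."""
        return Counter(reason for _, reason in self.rejected)


def migrate(customers):
    """Copy only the sprint 1 fields; record a clear reason for every rejection."""
    report = MigrationReport()
    for customer in customers:
        missing = [f for f in REQUIRED_FIELDS if not customer.get(f)]
        if missing:
            report.rejected.append((customer["id"], "missing " + ", ".join(missing)))
            continue
        report.migrated.append({f: customer[f] for f in ("id",) + REQUIRED_FIELDS})
    return report


def check_counts(source_count, report):
    """The "count" test: every source record is either migrated or rejected."""
    assert len(report.migrated) + len(report.rejected) == source_count
```

Running `check_counts(len(source), migrate(source))` after each run gives a cheap automated check that no customer silently disappeared, and `report.reasons()` gives the summary of why some were not migrated.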

Of course, there isn’t much value yet, because you may not be able to use such data in your product. Still, keep the amount of data to be migrated small: don’t look for depth or width but for quality, especially in the reporting and testing. The migration should target an environment as close as possible to the production environment.

Sprint 2 – Go slowly for depth and faster for width

  • Based on sprint 1, for the chosen tiny population, add more data to be migrated: DEPTH. For instance, add addresses. DEPTH is difficult, so go slowly.
  • Based on sprint 1, because you have automated your testing and your reporting, increase the size of your population and continue migrating the name, first name and date of birth only: WIDTH. Don’t be too cautious: automation allows you to go a bit faster (if not, review your automation).
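One way to picture the two sprint 2 tracks is as two plans derived from the sprint 1 plan. The sketch below is only an illustration: the `Track` class, the population size of 50 and the width factor of 10 are hypothetical values, not figures from the projects described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    name: str
    population_size: int  # how many customers: WIDTH
    fields: tuple         # how much data per customer: DEPTH


BASE_FIELDS = ("name", "first_name", "date_of_birth")


def sprint2_tracks(sprint1, width_factor=10, extra_fields=("address",)):
    """Derive the two sprint 2 tracks from the sprint 1 plan.

    DEPTH: same tiny population, a few more fields.
    WIDTH: larger population, same minimal fields.
    """
    deep = Track("depth", sprint1.population_size, sprint1.fields + tuple(extra_fields))
    wide = Track("width", sprint1.population_size * width_factor, sprint1.fields)
    return deep, wide


sprint1 = Track("sprint 1", 50, BASE_FIELDS)
deep, wide = sprint2_tracks(sprint1)
```

Keeping the two tracks separate means a problem with the new address data never blocks the wider run, and a data quality surprise in the wider population never blocks the deeper one.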

Why? DEPTH: because you need to have some “customers” migrated to test your new application. WIDTH: because it lets you check data quality, performance, memory usage, etc. on a larger population, and because it should be easier than going for DEPTH.

Sprint 3: Iterate

Keep working on both DEPTH and WIDTH, but do not add more data to the WIDTH population until you have fully covered a first small population that you would be happy to migrate.
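This rule can be expressed as a simple gate checked at the end of each sprint. The function below is a sketch under assumptions: the name `ready_to_deepen_width` and the 95% success threshold are hypothetical, and "happy to migrate" is reduced here to field coverage plus a measured success rate.

```python
def ready_to_deepen_width(deep_fields, target_fields, migrated, attempted,
                          min_success_rate=0.95):
    """Return True when the small DEPTH population is good enough to widen.

    deep_fields:   fields already migrated for the small population
    target_fields: fields the product needs for a usable customer
    migrated / attempted: counts taken from the data quality report
    """
    covered = set(target_fields) <= set(deep_fields)
    success_rate = migrated / attempted if attempted else 0.0
    return covered and success_rate >= min_success_rate
```

If the gate returns False, the next sprint stays on the small population rather than pushing more fields to the large one.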

Conclusion:

In my experience, a migration can become a nightmare when the quality of the migrated data is not under control: you go for depth with a big population and struggle to measure anything; worse, you are flooded with poor data. Baby steps are critical here: quality should not be traded off.

 

Data Migration Example
