Another and the final one in the series describing my experience of data migration process from source database to target. In the last post (Data Migration : Extraction), i detailed about the data extraction process design. This post will detail steps involved to load those extracts. Approach is a 2 step one:
- Load the core extracts to a milestone database.
- Move the data from milestone database to target database.
Why this approach : During data migration; you may come across a situations :
- The extracted column formats (eg. date/time) cannot be mapped natively to the target database.
- Security transformations that are database or product dependent.
- Custom transformations as per business that are database/product dependent.
There can be more use cases than what is specified above, hence to handle these; we should load the data in a staging/milestone database in target environment(in my case Teradata). Wherever, i had to apply any form of transformation; i used to load the content in a varchar field. This ensured that all my extracts are loaded without fail to the milestone database.
Once the data has been loaded in the milestone database; we will write views on the milestone database tables(wherever applicable) which will perform the transformations that are required before data can be migrated to the target table. This will cover all sorts of transformations as it gives a complete control in developers hand, which is a bliss. 🙂 . Few instance of transformations have been listed above.
There are various options available to load your extracts to Teradata (milestone database):
- SQL Assistant : Leverage this option if the number of tables are very less (<30) and complexity is not too high either.
- BTEQ : If size of tables is relatively small; this is a good option.
- TPT : If the size of tables both in terms of rows and columns is high; TPT is perfect as its a full suite that has options in abundance to handle your load process.
- Fast Load : This is another technique; but this was not applicable in my case.
Again, as with our previous posts, will not get into the technical details because our focus is more on the design and approach; technical can always be handled (Did i just speak like a typical business partner or a technically challenged technology manager 🙂 🙂 ) . You can search the web for the techniques mentioned above; there are good tutorials on each. However, i would like to share some of my learning which can be helpful in your implementation:
- Do checks like count, integrity,etc after each step to pause if there is a fault in any migration step.
- Keep a tab on error codes especially with TPT; make sure you do the correct error handling of various codes to ensure either complete data is loaded or none. Partial data handling has been a mess since its inception. 🙂
- When dealing with TPT; ensure the correct mode is selected as data. “insert” mode is extremely slow when compare with “update”.
- Clean up of logs/resources/locks must be done after every migration step irrespective of the status. For example, mloads , worker tables must be cleared post script execution to prevent any issues in future re-runs.
- TPT work on base tables and not views. Choosing a transformed view to load extract in milestone table will not work.
- Handle unclosed quotes while preparing extracts as TPT 14+ errors out on it.
One of the thoughts that will certainly come to your mind is to write a generic script which can work on a set of inputs like the source details and target details. The script will itself take care of migrating the data and every developer need not write the migration scripts specific to their objects. The generic code build is certainly advisable considering the number of objects that are involved during migration and the script may be leveraged in future as well. Web search on this will give considerable results to give head start on writing a generic BTEQ or TPT script.
Once you extracts have been loaded into the milestone database; all you need to do is to insert data from milestone to target via a simple insert-select query. Do remember that if any transformations need to be applied; then write those in a view above the milestone table and select from it while moving data from milestone to target.
That’s it. Data journey completed.
Feel free to reach out to us via this blog for any suggestions/queries.