Continuing from the initial post, Teradata: Data Migration Overview, I will detail how to create the extract from your source table, which will then be ported to the target system. This is the second in a series of posts on data migration.
By this time, you must have analyzed what needs to be ported. You must also have analyzed the data demographics (described in the initial post). That will give you leads for this step as well as the ones that follow.
Extract creation can be broadly categorized under the following:
- Copy from the source DB GUI (tool):
  - Select the data in the GUI and copy it into a file.
  - Suitable only if the number of tables is small (10-50).
- Migration tool:
  - This could be a tool provided by a third party or by the target database company.
  - It can take up the task of migrating the entire data set from source to target.
  - Suitable for generalized cases; it may hit a dead end for specific migration requirements.
  - Cost is another factor, as these tools/services are licensed.
  - Furthermore, you will always feel that you could have coded better than the tool suite. 🙂
- Program your migration:
  - This is the approach I followed and will talk about in detail.
  - Here, you code the task of preparing the extract from the database yourself.
  - It suits all requirements, as the code base is controlled by you.
Coding Extract Generation Process:
Once you’ve finalized the approach to extract preparation, a considerable amount of effort lies in the days or weeks ahead.
Start by making a list of distinct use cases. By use case, I mean a table falling under a specific category. I categorized by the following:
- Does the table have date/datetime/time columns?
- Does it have large text/blob/clob columns?
- Does its data contain special characters such as carriage returns?
- Is there any possibility of unclosed quotes in the data?
We need the above because a POC on these distinct use cases lets you conclude the final approach for your end-to-end implementation of the data migration. This step gives you a success:failure ratio, and it helps you list exceptions that need special handling.
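As a sketch of one such POC check (assuming pipe-delimited extract files; the file name, delimiter, and function name below are illustrative, not from the original post), a small shell function can flag lines with embedded carriage returns or an unbalanced number of single quotes:

```shell
#!/bin/sh
# Flag suspicious lines in a pipe-delimited extract file:
#  - lines containing embedded carriage-return characters
#  - lines with an odd (unbalanced) number of single quotes
# Usage: check_extract <extract_file>
check_extract() {
    file="$1"
    # grep for a literal CR character; count matching lines
    cr_lines=$(grep -c "$(printf '\r')" "$file")
    # splitting on ' gives an even field count when the quote count is odd
    bad_quotes=$(awk -F"'" 'NF % 2 == 0 { n++ } END { print n+0 }' "$file")
    echo "carriage_return_lines=$cr_lines unbalanced_quote_lines=$bad_quotes"
}
```

Running this over the sample extracts from each use-case category gives you a quick feel for how much special handling the real run will need.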
Your extract preparation step is linked to the design of the loading process, i.e. the process in which you load the extract into the target environment. This linkage between extract preparation and loading is unavoidable and is inherent to the differing architectures of the various database products.
Database products offer various extract loading utilities, which are natively terminal commands. I will leave this here, as there is a separate post that covers the loading strategy. The point to note is that you must ensure end-to-end loading for all the distinct use cases identified. There might be instances where you feel some tweaking needs to be done in the extract generation step, e.g. date formatting, special characters, business conversions/transformations, etc.
I am not going to talk much about the code base and will only list some code snippets. I was extracting data from Sybase ASE & Sybase IQ and hence am sharing code snippets for the same.
When dealing with Sybase ASE, the bcp (Bulk Copy) command is provided for extract handling, both import and export.
```shell
bcp <db_name>.<table_name> out <out_file_path> -c -t <column_delimiter> -r <row_delimiter> -S <ASE_server_name> -U <ASE_UID> -P <ASE_PWD> > <redirect_log_file>

bcp person.person_dtls out /home/allzhere/person_dtls.dat -c -t '|' -r '\n' -S DEV_SERVER -U abc_user -P password > /home/allzhere/person_dtls_extract.log
```
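To run the same bcp export over many tables, a small driver loop helps. The sketch below only builds and prints each command (a dry run); the table-list file and output directory are illustrative, and the server name and credentials are the same placeholders used in the example above. Drop the `printf`-only behavior and execute `$cmd` through `eval` or a wrapper once you are happy with the generated commands.

```shell
#!/bin/sh
# Dry-run driver: emit one bcp export command per table listed in a file.
# Usage: export_tables <db_name> <table_list_file> <output_dir>
export_tables() {
    db="$1"; list="$2"; outdir="$3"
    while read -r tbl; do
        # build the bcp command for this table (printed, not executed)
        cmd="bcp $db.$tbl out $outdir/$tbl.dat -c -t '|' -r '\\n' -S DEV_SERVER -U abc_user -P password"
        printf '%s\n' "$cmd"
    done < "$list"
}
```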
For Sybase IQ, temporary options can be set prior to SQL execution to prepare the extract.
```sql
set temporary option Temp_Extract_Name1 = '/home/allzhere/extract/person.dat';
set temporary option Temp_Extract_Column_Delimiter = '|';
set temporary option Temp_Extract_Null_As_Empty = 'ON';
set temporary option Date_Format = 'YYYY-MM-DD';
set temporary option Timestamp_Format = 'YYYY-MM-DD HH:NN:SS.SSSSSS';
select * from person_dtls;
```
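Since that option block is identical per table apart from the table name and output path, it is handy to generate it from a script. The function below is a sketch that only emits the SQL text (mirroring the snippet above); you would pipe its output into your IQ client to actually run it. The function name and arguments are illustrative.

```shell
#!/bin/sh
# Emit the Sybase IQ temporary-option SQL that extracts one table to a file.
# The function only builds the SQL text; feed it to your IQ client to execute.
# Usage: iq_extract_sql <table_name> <output_file_path>
iq_extract_sql() {
    tbl="$1"; outfile="$2"
    cat <<EOF
set temporary option Temp_Extract_Name1 = '$outfile';
set temporary option Temp_Extract_Column_Delimiter = '|';
set temporary option Temp_Extract_Null_As_Empty = 'ON';
set temporary option Date_Format = 'YYYY-MM-DD';
set temporary option Timestamp_Format = 'YYYY-MM-DD HH:NN:SS.SSSSSS';
select * from $tbl;
EOF
}
```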
During the extraction phase, certain transformations must be taken care of, e.g. carriage returns:
```sql
-- Sybase IQ
replace(class_desc, char(10), '')
-- Sybase ASE
str_replace(convert(varchar(16384), class_desc), char(10), ' ')
```
Balancing checks should be in place, i.e. match the row count in the table against the row count in the extract to be sure you have not missed anything. There will always be some exceptions you have not coded for, which you realize during migration or after.
For example, when I was preparing an extract for a “description” field of type “text” in Sybase, I was able to fetch only the first 16000 characters, as that was the maximum length permitted for varchar. The conversion to varchar was required because I had to replace the special characters.
That’s it for the extraction phase. Next will be the step where we load those extracts into our target environment.