Seamlessly Migrate Data from Teradata to AWS Using Glue
Chapter 1: Introduction to Data Migration
Migrating data from Teradata to Amazon Web Services (AWS) can be accomplished through various methods. In this guide, we will focus on utilizing AWS Glue, a fully managed Extract, Transform, Load (ETL) service, for your data migration needs.
Section 1.1: Setting Up an AWS Glue Connection
To initiate the migration process, you need to establish a connection within AWS Glue that links to your Teradata database. Here’s how to do it:
- Access the AWS Management Console and navigate to AWS Glue.
- Select "Connections" from the menu on the left.
- Click the "Add connection" button.
- Enter the connection details: the JDBC connection URL (for Teradata, of the form jdbc:teradata://<host>/DATABASE=<database>) along with your Teradata database username and password.
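The same connection can also be created programmatically. The following is a minimal sketch using the boto3 Glue client's create_connection call; the connection name, host, database, and credentials are hypothetical placeholders, and a real setup will usually also need VPC settings (subnet and security groups) that are not shown here.

```python
def build_connection_input(name, host, database, user, password):
    """Build the ConnectionInput payload for a Glue JDBC connection."""
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            # Same URL shape as the one entered in the console
            "JDBC_CONNECTION_URL": f"jdbc:teradata://{host}/DATABASE={database}",
            "USERNAME": user,
            "PASSWORD": password,
        },
    }


def create_teradata_connection(connection_input):
    """Register the connection with AWS Glue (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the payload builder works without it

    glue = boto3.client("glue")
    glue.create_connection(ConnectionInput=connection_input)


if __name__ == "__main__":
    payload = build_connection_input(
        name="teradata-source",        # hypothetical connection name
        host="teradata.example.com",   # hypothetical host
        database="sales",              # hypothetical database
        user="etl_user",               # hypothetical credentials
        password="change-me",
    )
    print(payload["ConnectionProperties"]["JDBC_CONNECTION_URL"])
```

Keeping the payload builder separate from the API call makes it easy to review the connection details before anything is sent to AWS.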
Section 1.2: Creating an AWS Glue Job
Next, you will set up an AWS Glue job that utilizes the connection established in the previous step to transfer your data from Teradata to AWS.
- In the AWS Glue console, click on "Jobs" in the left-hand menu.
- Click the "Add job" button.
- Name your job and choose the "Spark" ETL engine.
- Under "Data source," select "JDBC" as the source type and choose the connection created earlier.
- In the "Data targets" section, select the AWS service and the target table where you want to store the data. For instance, you could opt for Amazon S3 and create a new bucket for your data.
- Under "Job parameters," add any key-value settings the script needs, such as the source table name or a SQL query that fetches specific data from your Teradata database. The script reads these values with getResolvedOptions.
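The console steps above can be approximated in code. This sketch assumes the boto3 Glue client's create_job call; the job name, IAM role ARN, script location, and parameter values are hypothetical and would need to match your own account. The "glueetl" command name selects the Spark ETL engine.

```python
def build_job_definition(job_name, role_arn, script_location,
                         connection_name, job_parameters):
    """Build keyword arguments for the Glue create_job API call."""
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        # Attach the Glue connection created earlier
        "Connections": {"Connections": [connection_name]},
        # Job parameters are passed as "--key": "value" pairs
        "DefaultArguments": {f"--{k}": v for k, v in job_parameters.items()},
    }


def create_job(job_definition):
    """Create the job in AWS Glue (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without it

    return boto3.client("glue").create_job(**job_definition)


if __name__ == "__main__":
    definition = build_job_definition(
        job_name="teradata-to-s3",                               # hypothetical
        role_arn="arn:aws:iam::123456789012:role/GlueJobRole",   # hypothetical
        script_location="s3://my-scripts/teradata_to_s3.py",     # hypothetical
        connection_name="teradata-source",                       # hypothetical
        job_parameters={"source_table": "orders", "target_bucket": "my-bucket"},
    )
    print(definition["DefaultArguments"])
```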
Section 1.3: Executing the AWS Glue Job
After creating your AWS Glue job, you can proceed to run it to initiate the data transfer from Teradata to AWS. Here’s how:
- In the AWS Glue console, select your job from the list.
- Click on the "Run job" button.
- Review the job settings and click "Run job" again to commence the process.
- Monitor the progress of the job within the AWS Glue console. The duration may vary based on the data size and job complexity.
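Running and monitoring can also be scripted. The sketch below assumes a boto3 Glue client is passed in and uses its start_job_run and get_job_run calls; the polling interval and the job name in the usage note are hypothetical choices.

```python
import time

# Job run states after which the run will not change any further
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def run_and_wait(glue, job_name, arguments=None, poll_seconds=30):
    """Start a Glue job run and poll until it reaches a terminal state.

    `glue` is expected to behave like boto3.client("glue").
    Returns a (run_id, final_state) tuple.
    """
    response = glue.start_job_run(JobName=job_name, Arguments=arguments or {})
    run_id = response["JobRunId"]
    while True:
        run = glue.get_job_run(JobName=job_name, RunId=run_id)
        state = run["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return run_id, state
        time.sleep(poll_seconds)
```

With real credentials this might be called as run_and_wait(boto3.client("glue"), "teradata-to-s3"), where the job name is a placeholder. Passing the client in as an argument keeps the function easy to test with a stub.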
Example Code
Below is a sample code snippet that you might implement in your AWS Glue job for transferring data from Teradata to Amazon S3:
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Create a Glue context and get its Spark session
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# Retrieve job parameters; every key read below must be listed here
args = getResolvedOptions(sys.argv, [
    'JOB_NAME', 'source_host', 'source_db', 'source_user', 'source_password',
    'source_table', 'target_bucket', 'target_path'
])

# Extract data from Teradata using a JDBC connection
jdbcUrl = "jdbc:teradata://{0}/DATABASE={1},USER={2},PASSWORD={3}".format(
    args['source_host'], args['source_db'], args['source_user'], args['source_password'])
source_df = spark.read.format("jdbc").options(
    url=jdbcUrl,
    # Teradata JDBC driver class; the driver JAR must be available to the job
    driver="com.teradata.jdbc.TeraDriver",
    dbtable=args['source_table']
).load()

# Convert the DataFrame to a DynamicFrame for processing
source_dynamic_frame = DynamicFrame.fromDF(
    dataframe=source_df,
    glue_ctx=glueContext,
    name="source_dynamic"
)

# Transform data as necessary (here: rename columns, keep their types)
transformed_dynamic_frame = ApplyMapping.apply(
    frame=source_dynamic_frame,
    mappings=[
        ("column1", "string", "target_column1", "string"),
        ("column2", "int", "target_column2", "int")
    ]
)

# Write data to Amazon S3 in Parquet format
s3_target_path = "s3://{0}/{1}".format(args['target_bucket'], args['target_path'])
glueContext.write_dynamic_frame.from_options(
    frame=transformed_dynamic_frame,
    connection_type="s3",
    connection_options={
        "path": s3_target_path
    },
    format="parquet"
)
This code performs the following tasks:
- Initializes a Glue context and retrieves job parameters.
- Utilizes a JDBC connection to extract data from Teradata and create a Spark DataFrame.
- Converts the DataFrame into a DynamicFrame for further processing.
- Applies transformations as required (in this example, mapping source columns to target ones).
- Writes the data to Amazon S3 in Parquet format.
Feel free to customize this code to fit your specific requirements, such as altering the target data format, adding further transformations, or using a different AWS service for the target.
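One such customization, mentioned earlier, is fetching only specific data with a SQL query instead of copying a whole table. The sketch below builds Spark JDBC reader options that wrap a query as a derived table in the dbtable option; the helper name, alias, and example query are hypothetical, and the driver class assumes the standard Teradata JDBC driver.

```python
def jdbc_read_options(jdbc_url, query, alias="src"):
    """Return Spark JDBC options that read the result of `query`.

    The query is wrapped in parentheses and given an alias so that
    Spark can treat it like a table name.
    """
    cleaned = query.strip().rstrip(";")  # a trailing semicolon would break the wrapper
    return {
        "url": jdbc_url,
        "driver": "com.teradata.jdbc.TeraDriver",
        "dbtable": f"({cleaned}) AS {alias}",
    }


if __name__ == "__main__":
    opts = jdbc_read_options(
        "jdbc:teradata://teradata.example.com/DATABASE=sales",  # hypothetical URL
        "SELECT id, amount FROM orders WHERE region = 'EU';",   # hypothetical query
    )
    print(opts["dbtable"])
```

Inside a Glue job, the resulting dictionary could be passed to a Spark JDBC reader, for example spark.read.format("jdbc").options(**opts).load(), where spark is the job's Spark session. Filtering in the query means only the selected rows leave Teradata, which can shorten the transfer.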
Chapter 2: Video Resources
For a more visual approach to understanding this process, check out the following videos:
Learn how to move data from Teradata to Amazon Redshift using ETL scripts in AWS Glue.
Discover how to automate the migration of Teradata Data Warehouse to Amazon Redshift Cloud Data Warehouse.