Bridging the Culinary Gap
data:image/s3,"s3://crabby-images/886a8/886a8153b30911820f30ed7c8459330cf276df04" alt="Bridging the Culinary Gap"
A Delicious Data Journey with ETL
February 11, 2025data:image/s3,"s3://crabby-images/8ce59/8ce59feb33646a86f5c611e9a92a3b157ee23e2e" alt="Jessica Michelle Rudd, PhD, MPH"
data:image/s3,"s3://crabby-images/886a8/886a8153b30911820f30ed7c8459330cf276df04" alt="Bridging the Culinary Gap"
A Delicious Data Journey with ETL
Welcome, fellow data chefs! We're about to embark on a culinary adventure to construct a tasty ETL pipeline that will excite your customers with a feast of data-driven insights. But this isn't just a single recipe; it's a multi-course meal! Over the next few weeks, I’ll guide you through each step, equipping you with the tools and techniques to ensure the right data arrives at the right place and right on time, just like a perfectly coordinated dinner service.
The ETL Recipe: A Three-Course Meal for Your Data
An ETL (Extract, Transform, Load) pipeline is our recipe for success. It's a multi-course meal that takes raw data from various sources (our pantry), transforms it into a usable format (our kitchen prep), and loads it into a target system (our dining table) for analysis and reporting.
To prepare this data feast, we'll be using three essential kitchen tools from Google Cloud:
- Dataform: Our cookbook and kitchen for crafting precise data transformation recipes (covered here in Part 1).
- Composer/Airflow: Our sous chef for orchestrating the cooking process, ensuring each dish is prepared in the right order and at the perfect temperature (covered in Part 2).
- Dataplex: Our pantry organizer, keeping our ingredients fresh, labeled, and easily accessible (covered in Part 3).
- In part 4, we’ll build a full E2E pipeline meal service.
Let's fire up the stoves and get cooking!
Dataform: Your Recipe for Data Deliciousness
Now that we've set the table for our ETL feast, it's time to open our cookbook: Dataform. Imagine it as your guide to crafting the perfect recipes that turn raw ingredients (your data) into savory dishes (refined, insightful datasets). No more messy countertops or scattered notes – Dataform brings order and precision to your data kitchen.
Why Choose Dataform as Your Head Chef?
Think of the common kitchen nightmares: burnt dishes, missing ingredients, and chaotic recipes scribbled on napkins. Dataform helps you avoid these culinary disasters, ensuring your data transformations are a recipe for success.
Here's how Dataform elevates your cooking game:
- Modular Recipes: Break down complex data transformations into manageable recipes. Just like a master chef separates the mise en place, you can create reusable modules for different data dishes. Need to chop vegetables for multiple recipes? Dataform lets you create a "chop veggies" module and reuse it as needed, keeping your kitchen efficient and organized.
- Version Control and Collaboration: Ever wished you could undo a culinary misstep? Dataform's version control is like a "rewind" button for your recipes. Track changes, collaborate with fellow chefs, and revert to previous versions effortlessly. No more worries about over-salting the data or accidentally deleting a crucial step.
- Taste Testing and Quality Control: Dataform's built-in testing is like a taste test for your data recipes. Ensure your transformations are perfectly seasoned and free of errors. This proactive approach prevents serving up "spoiled" data to your stakeholders.
Clear Instructions and Kitchen Notes: Dataform encourages clear documentation within your recipes. Just like a well-written cookbook, your transformations become easy to understand and follow. Imagine having detailed instructions and helpful notes for each recipe, making it a breeze for anyone to replicate your data delicacies.
Dataform in Action: A Taste of Transformation
Let's see Dataform in action with a simple recipe.
Setting Up Your Kitchen (Creating a Dataform Repository)
Every master chef needs a well-stocked kitchen. In Dataform, this starts with creating a repository to store your recipes and ingredients.
- Create a Repository: In the Dataform UI, click "Create Repository" and give it a descriptive name, like "sales_analytics." This repository will be the central hub for all your Dataform projects related to sales data.
- Connect to Version Control: To keep your recipes organized and collaborate with other chefs, connect your repository to a version control system like Github. This allows you to track changes, share your work, and easily revert to previous versions if needed. Think of it as having a backup recipe book in case you accidentally spill something on your main one.
Setting Up Your Workspace
Before we start cooking, we need a clean and organized workspace. In Dataform, a workspace is like a dedicated countertop where you can experiment with your recipes without messing up the main kitchen.
- Navigate to your existing project repository.
- Click on your repository.
- Click Create development workspace.
- In the Create development workspace window, do the following:
- In the Workspace ID field, enter a descriptive feature branch name -Workspace Ids can only include numbers, letters, hyphens or underscores.
- Click Create development workspace.
- Click on your new workspace.
- Click Initialize workspace.
data:image/s3,"s3://crabby-images/513b0/513b01ea4674e5bdbf1061f8769346106656878d" alt=""
Crafting a Recipe (Creating a Table)
Now that your workspace is prepped, let's craft a recipe for our data transformation. In Dataform, this means defining a new table using a SQLX file.
- Create a New File: In the "Definitions" directory of your workspace, click the "+" dropdown and choose "Create File." Name your file something descriptive, like "sales_summary.sqlx."
- Write Your Recipe: In the new file, you'll see it's divided into two main parts:
- The config Block: This is where you provide metadata about your recipe, like the type of dish you're making (table or view), the schema (where it will be served), the name of the dish, and a description of each ingredient (column). It's like the header of your recipe card, providing essential information at a glance.
- The SQL Query: This is the heart of your recipe, containing the actual SQL code that transforms your raw ingredients into a finished dish. It's like the step-by-step instructions on your recipe card, guiding you through the cooking process.
- Here's an example of what your sales_summary.sqlx recipe might look like:
This recipe defines a new table called sales_summary that will store our calculated sales metrics.
Taste Testing and Refining (Testing and Executing Code)
Just like a chef tastes their creations before serving them, you need to test your Dataform recipes to ensure they produce the desired results. Dataform provides two ways to do this: "Run" and "Start Execution."
- "Run" - A Quick Taste Test: Think of "Run" as a quick taste test. It executes your SQL query without making any permanent changes to your data. This is useful for checking if your logic is correct and the results are as expected. It's like dipping a spoon into your soup to check the seasoning before serving it.
- "Start Execution" - Serving the Final Dish: "Start Execution" is like plating your dish and serving it to your guests. It not only runs your SQL query but also creates or updates the table or view defined in your config block. This is the option you'll use when you're ready to make your data transformation official.
data:image/s3,"s3://crabby-images/a149d/a149dfd198e680b300822ac21f8d11b435a3d1d7" alt=""
How to Test and Execute:
- Edit Your SQLX File: Open your sales_summary.sqlx file in the Dataform UI.
- Test with "Run": Click the "Run" button to execute your query and preview the results. This won't affect any existing tables or views.
- Execute with "Start Execution": Once you're satisfied with the results, click "Start Execution" —> Actions —> sales_summary to create the sales_summary table.
data:image/s3,"s3://crabby-images/ff46d/ff46dd215df8787f6253a3d3e04b058eebdafa06" alt="some-file-6c9f1f4f-eb0c-4485-903b-d82506274cbf"
- Confirm Success: Check the "Executions" tab to confirm that your code ran successfully and the table was created as expected.
data:image/s3,"s3://crabby-images/69864/69864d2a73f524c55b35fe4ef7be8ec2f367c0aa" alt="some-file-a1ee8c1e-d0be-4562-8375-2cea548e3f29"
By using both "Run" and "Start Execution," you can thoroughly test your Dataform recipes and ensure they produce delicious, high-quality data dishes.
Sharing Your Recipe (Version Control)
Once you're happy with your recipe, it's time to share it with the world! Dataform integrates with Git for seamless version control.
- Commit Your Changes: Click "Commit # changes" and add a descriptive commit message, like "Added new table for sales summary."
data:image/s3,"s3://crabby-images/db4ba/db4ba0a02fc8aee3d1bbb2eae24b1e401b024a3d" alt=""
data:image/s3,"s3://crabby-images/f5348/f5348b82b4b62bc41eb8f14cfaac0d0d50897fab" alt="some-file-7a00b2da-70f7-426c-82cd-c1ce98bb6a39"
- Push to Remote: Click "Push to Remote Branch" to push your changes to your Git repository.
data:image/s3,"s3://crabby-images/ed3c8/ed3c8d0a93f45a987cca43144e87df8e7b30cc72" alt=""
- Create a Pull Request: Open your Git repository and create a pull request to merge your changes into the main branch.
By following these steps, you can use Dataform to transform your raw data into insightful masterpieces. It's like having a sous chef, a recipe book, and a taste tester all in one!
Preparing the Ingredients for Our ETL Feast
This exploration of Dataform sets the table for our end-to-end ETL feast. In the upcoming posts, we'll introduce the other chefs in our kitchen: Composer/Airflow for orchestrating the cooking process and Dataplex for organizing our pantry and keeping our ingredients fresh. Stay tuned!
Helpful Resources
🍬 Sweet & Sour Candy (this week’s good, bad, or weird of the tech world)
🤢 ChatGPT Search can be tricked into misleading users, new research reveals - “ChatGPT Search, an AI-powered search engine that went live this month, can be fooled into generating completely misleading summaries, U.K. newspaper The Guardian has found.”
😀 AI helps ID paint chemistry of Berlin Wall murals - Italian scientists developed a neural network that, combined with handheld devices, can accurately analyze the complex chemical composition of paints used in street art, like the Berlin Wall murals. This breakthrough enables more effective conservation and restoration of such art, despite its often undocumented and ephemeral nature.
🍫 One last bite
__________________________________________________________
Imposter syndrome is a feeling, not a fact. ~Yours Truly
__________________________________________________________
This post is public, so feel free to share it.
Originally posted at: