Ironman's Lab

Ingesting and Flattening XML Files in Databricks

A Schema-Driven Approach with Bad Record Handling

Posted on August 17, 2024

It feels good to be back in the Databricks environment! After my last adventure tackling JSON in BigQuery, I’m excited to share a solution that’s much more straightforward thanks to the powerful tools Databricks offers. Today, we’re diving into how to ingest XML files, flatten them, and handle bad records... [Read More]

Tags: ETL Nested XML Databricks

Flattening Nested JSON with BigQuery UDF and dbt Macros

A workaround with limited ETL tooling

Posted on May 2, 2024

Dealing with nested JSON data can feel like you’re trying to solve a puzzle where the pieces keep changing shapes. It’s tricky, especially when you don’t have access to heavy-duty tools like PySpark in Databricks that are built to tackle these kinds of jobs with ease. [Read More]

Tags: ETL Nested JSON BigQuery UDF dbt

Databricks Job Orchestration - Reuse Cluster and Multi-Process Jobs

Parallel Running Jobs as ADF Foreach Loop

Posted on February 6, 2022

In the last paragraph of my previous post, ETL Becomes So Easy with Databricks and Delta Lake, I left an open question about the benefits and issues of Databricks job orchestration in ADF. In this post, I will show how we solve it. [Read More]

Tags: Databricks Delta Lake PySpark ADF ETL Multi-processing

ETL Becomes So Easy with Databricks and Delta Lake

Dimension Table generation SCD Type 1 and 2

Posted on November 27, 2021

Back in the old days with SSIS, building a data warehouse meant setting up a separate component in the package for each dimension table. [Read More]

Tags: Databricks Delta Lake PySpark ADF ETL Dimension

Visualize Complex Hierarchy and Slice with Any Node

Data Modelling and Custom Visual

Posted on August 11, 2021

Recently I was heavily involved in a Power BI deployment and a United Analytics Platform investigation, and did a lot of documentation work at the architecture, general-practice, and guidance level. It was a good chance to review my knowledge, but nothing was very new to me. Luckily I got an interesting... [Read More]

Tags: PowerBI Hierarchy Data Model Node