It feels good to be back in the Databricks environment! After my last adventure tackling JSON in BigQuery, I’m excited to share a solution that’s much more straightforward thanks to the powerful tools Databricks offers. Today, we’re diving into how to ingest XML files, flatten them, and handle bad records...
[Read More]
Flattening Nested JSON with BigQuery UDF and dbt Macros
A workaround with limited ETL tooling
Dealing with nested JSON data can feel like you’re trying to solve a puzzle where the pieces keep changing shapes. It’s tricky, especially when you don’t have access to heavy-duty tools like PySpark in Databricks that are built to tackle these kinds of jobs with ease.
[Read More]
Databricks Job Orchestration - Reuse Cluster and Multi-Process Jobs
Parallel Running Jobs as ADF Foreach Loop
In the last paragraph of my previous post, ETL Becomes So Easy with Databricks and Delta Lake, I left a question about the benefits and issues of Databricks job orchestration in ADF. In this post, I explain how we solved it.
[Read More]
ETL Becomes So Easy with Databricks and Delta Lake
Dimension Table generation SCD Type 1 and 2
Back in the old days of SSIS, building a data warehouse meant setting up a separate component in the package for each dimension table.
[Read More]
Visualize Complex Hierarchy and Slice with Any Node
Data Modelling and Custom Visual
I was heavily involved in Power BI deployment and a United Analytics Platform investigation recently, and did a lot of documentation work at the architecture, general practice, and guidance level. It was a good chance to review my knowledge, but nothing was very new to me. Luckily I got an interesting...
[Read More]