Star vs. Snowflake

This is the final installment of our schema series. We’ve seen the Star Express and the Snowflake District. Now, it’s time for the ultimate showdown. How does Conductor Mickey decide which tracks to lay down for the “Invisible City”? In the previous posts, we explored two very different ways to organize a subway station: The Star: Fast, flat, and simple. The Snowflake: Detailed, organized, and space-saving. But in the real world of Data Engineering, you can’t always have both. You have to choose your priority: Speed or Storage? ...

February 9, 2026

The Snowflake Schema: The 'Detailed' District

In our Invisible City, as the population grows and the subway map expands, Conductor Mickey sometimes finds that a simple Star Hub isn’t enough. When the “Station” information becomes a giant, messy pile of repeating words, it’s time to move to the Snowflake Schema. If the Star Schema is the “Express Line,” the Snowflake is the “Highly Detailed District.” What is a Snowflake Schema? In the previous post, we saw how the Star Schema keeps things simple. But what happens when your data starts repeating itself? What if you have 10 million rows, and you’ve written “United States” and “North America” 10 million times? ...

February 9, 2026

The Star Schema—The 'Express' Hub

In our digital subway city, some stations are “Hubs.” They sit right in the middle, and every train line passes through them. In data engineering, if you don’t organize your stations, your subway maps will look like a plate of spaghetti. To keep things clean, we use Dimensional Modeling. The most famous layout for speed is the Star Schema. What is a Star Schema? The Star Schema is designed for one thing: Speed. It organizes your data into two distinct types of citizens: Facts and Dimensions. ...

February 9, 2026

The Medallion Architecture Journey

In our Invisible City, the subway doesn’t just move people from point A to point B. It moves them through a refinement process. When data first arrives at the station, it’s usually messy, tired, and covered in “digital mud.” We can’t let them go straight to the Mayor’s office (the Business Dashboard) looking like that! To fix this, Conductor Mickey uses the Medallion Architecture. It’s a three-stage filtration system that turns “Raw Junk” into “Pure Gold.” ...

February 8, 2026

Time Travel in Azure SQL Database

In our Invisible City subway system, we’ve talked about the “Time Machine” features in Delta Lake for big data. But what if your main terminal is built on a traditional, reliable foundation like Azure SQL Database? Usually, a database has short-term memory. If a passenger changes their destination from “Station A” to “Station B,” the database overwrites the old entry. Station A is gone forever. But sometimes, Conductor Mickey needs to know exactly what the city looked like last Tuesday at 2:00 PM. To do that without a magic wand, he uses Temporal Tables. ...

February 8, 2026

Understanding Delta Lake and Delta Tables

In our Invisible City subway system, we’ve already upgraded our shipping containers from old wooden crates (CSV) to futuristic, high-tech pods (Parquet). But a depot full of pods is still just a pile of boxes. If a train crashes while unloading, or if two conductors try to move the same crate at once, you get a “Data Crash.” To solve this, Conductor Mickey uses a special system: Delta Lake. But wait—is “Delta Lake” the same as a “Delta Table”? Let’s clear up the confusion with a trip to the Smart Depot. ...

February 8, 2026

Why Parquet is the Future of Data Shipping

In our Invisible City, if the Data Lake is the storage yard, then Parquet is the high-tech, space-saving shipping container we use to pack the goods. Most people are used to CSV files—they are like old-fashioned wooden crates. They work, but they are heavy, take up too much space, and are slow to move. Parquet is the “Flat-Pack” futuristic alternative. The Vertical Train: Why Data Engineers Love Parquet Imagine a subway train full of passengers. Each passenger has a Name, an Age, and a Destination. ...

February 8, 2026

The Depot, the Terminal, and the Modern Hub: Where Does the Data Live?

In our Invisible City, moving data is half the battle. The other half is putting it somewhere. Depending on who you ask—the engineers, the analysts, or the city planners—they’ll give you different answers on where the data should live. Let’s walk through the three main types of “storage facilities” in our city infrastructure, with Conductor Mickey as our guide. 1. The Data Lake: The Massive Storage Yard The Analogy: Imagine a giant, sprawling industrial yard at the edge of the city. It’s cheap land where we throw everything. Old train seats, crates of unsorted tickets, literal piles of raw iron, and even some lost luggage. ...

February 7, 2026

Why I Build Data Subways

If you walk through a bustling city like New York, Tokyo, or London, you see the lights, the shops, and the skyscrapers. That is the Front-end. If you look at the people making big decisions in boardrooms based on colorful charts, that is Data Analytics. But beneath their feet, there is a second city. A city of tunnels, high-voltage cables, and steel tracks. Thousands of people are moving through the dark at 60 miles per hour so that the “visible” city can function. ...

February 7, 2026