Why Slowly Changing Dimensions (SCD) Matter
Dimension data—like customer addresses—often changes gradually over time. In many business scenarios, you need both historical and current information in a data warehouse to perform accurate analytics and meet operational needs. Slowly Changing Dimensions (SCD) provide various methods to handle these changes while balancing requirements for simplicity, storage, and historical tracking.
Real-World Use Case: Customer Address Changes
Consider a retail business that:
1. Requires historical reporting of a customer’s address at the time of each purchase.
2. Needs the current address for shipping and customer support.
An SCD strategy helps manage these dual requirements by determining how changes to dimension data are recorded over time—whether you overwrite data, keep a full history, or store only limited historical snapshots.
The Three Core SCD Types
1. SCD Type 1: Overwrite
Definition: Overwrites old dimension data with new data; no history is preserved.
Use Case: When only the latest value matters (e.g., a change in email address where historical versions are not relevant).
Note: While there are other types of SCD (e.g., Type 0, 4, and 6), this article will focus on the most commonly used types: 1, 2, and 3.
1. No Historical Columns: Only columns for the current state exist (e.g., email, address).
2. Overwrite Mechanism: If John moves or changes email, the old data is simply replaced.
2. SCD Type 2: Historical Tracking
Definition: Preserves all historical records by adding a new row whenever a change occurs.
Use Case: Ideal when you must track an entity’s data changes over time (e.g., a customer’s address history for compliance or detailed analytics).
1. customer_sk (Surrogate Key): The primary key for each versioned record.
2. customer_id (Natural Key): Identifies the entity in the source system.
3. Dates & Flags: start_date, end_date, and current_flag track which version is current and when each version was valid.
Type 2 Implementation in Databricks:
Recommended by LinkedIn
3. SCD Type 3: Limited History
Definition: Stores limited history in the same row, typically adding columns like previous_address.
Use Case: When you only need to track the immediate past and the current dimension value (e.g., last known address before the current one).
1. current_address: Overwrites with the newest address.
2. prev_address: Retains the most recent prior address. No additional changes are stored beyond that.
Choosing the Right SCD Strategy
• SCD Type 1 (Overwrite)
• Pros: Simpler design, minimal storage
• Cons: Loses all history
• Ideal For: Cases where past values do not impact analysis or compliance
• SCD Type 2 (Full History)
• Pros: Maintains complete historical records
• Cons: More complex, requires more storage
• Ideal For: When historical accuracy is critical (regulatory or deeper time-based analytics)
• SCD Type 3 (Limited History)
• Pros: Captures the most recent change without the complexity of Type 2
• Cons: Only stores a limited snapshot (often just one previous value)
• Ideal For: Situations requiring minimal but still relevant historical context
Extended Considerations
• SCD Type 0: Data remains static (no changes permitted).
• Hybrid Strategies: You can combine types (e.g., Type 2 + Type 3) for more nuanced scenarios.
• Performance & Storage: Type 2 can grow large quickly; plan for indexing, partitioning, or archiving.
• Surrogate Keys: Commonly used in Type 2 to differentiate multiple records belonging to the same business entity.
• ETL/ELT Frameworks: Tools like Databricks, dbt, or Snowflake Stored Procedures can automate or simplify SCD implementations.
Conclusion
Slowly Changing Dimensions are indispensable for accurately managing evolving dimension data. By choosing and implementing the appropriate SCD type—1, 2, or 3—you can tailor your data warehouse to either overwrite data, maintain full historical snapshots, or store a limited history of dimension changes. This flexibility ensures that your data remains reliable and relevant, aligning perfectly with your organization’s analytical and operational requirements.
Lead Fullstack Engineer | Typescript Software Engineer | Nestjs | Nodejs | Reactjs | AWS
2moExcellent overview of SCD types and considerations! This is a helpful guide for anyone designing or maintaining a data warehouse.
Senior Backend NodeJS | Speaker at Google GDG for AI | Kotlin | MVVM | Fastify | Coroutines | Koin | SOLID | Unit Tests | Instrumented Tests | GraphQL | Knex.js
2moGreat post! Thanks for sharing!
Back End Engineer | Software Engineer | TypeScript | NodeJS | ReactJS | AWS | MERN | GraphQL | Jenkins | Docker
2moThanks for sharing 🚀
Analytics Engineer | Data Engineer | Data Analyst | Business Data Analyst
2moGreat point! 🌟 Handling Slowly Changing Dimensions (SCD) is a fundamental part of maintaining both historical and current data in a data warehouse, especially when you need to track changes in data that evolve over time, like customer addresses.
Senior Software Engineer | Backend-Focused Fullstack Developer | .NET | C# | Angular | TypeScript | JavaScript | Azure | SQL Server
2moGreat content Miguel Angelo, thanks for sharing 🚀