Self reminder: When designing high-level model pipelines and dbt model structure, also think about how models will eventually be deprecated.
While we always strive to build models that resolve our stakeholders’ pain points, or to deliver A*-grade business analysis, it is very common to forget to ask: “when can we drop this?” or “what would indicate that we don’t need this anymore?”.
At the time of writing, I can think of a few ways a model’s life can end:
- Static tables
    - Refers to models that are no longer updated, yet still necessary for pipelines (e.g. unioning in old data, left joining to pick up extra attribution, etc.)
    - We can drop these tables once the pipeline no longer references them.
    - Example: the upstream system migrated to a different service, but the legacy data was not migrated along with it. Hence, as the downstream “data aggregator”, we need to keep track of both sources (see the SQL sketch after this list).
- Archive and KIV
    - Keep the script as a backup, and do not drop the physical table yet. We usually do this when making big changes to models and want to be able to revert quickly.
    - Finding deleted files (or the commit that deleted them) in git history is not straightforward, which is why we keep the script around instead of deleting it outright.
    - In the dbt context, “keep as backup” means moving the model script out of the designated model path (the directory configured under `model-paths` in `dbt_project.yml`), so dbt no longer treats that SQL/Python script as a model.
- Total removal
- Just delete the model and drop the table.
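
To make the static-table case concrete, here is a minimal sketch of the “union old data” pattern; all model and column names here are made up for illustration:

```sql
-- models/staging/stg_orders_combined.sql (hypothetical model)

-- Live source, still refreshed by the new service
select order_id, customer_id, ordered_at
from {{ ref('stg_orders_new_service') }}

union all

-- Frozen legacy table: never updated, but still needed for full history
select order_id, customer_id, ordered_at
from {{ ref('stg_orders_legacy') }}
```

Once the full history is no longer needed downstream, `stg_orders_legacy` (and eventually this combined model) becomes a candidate for dropping.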
With these scenarios structured nicely, we can write macros to drop the leftover tables programmatically. 🥳
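
As a rough sketch of what such a macro could look like (the macro name and the schema/table values are hypothetical; `adapter.get_relation`, `adapter.drop_relation`, and `dbt run-operation` are standard dbt features):

```sql
-- macros/drop_deprecated_relation.sql (hypothetical macro)
{% macro drop_deprecated_relation(schema_name, table_name) %}
    {# Look up the relation in the target database; returns none if it does not exist #}
    {% set relation = adapter.get_relation(
        database=target.database,
        schema=schema_name,
        identifier=table_name
    ) %}

    {% if relation is not none %}
        {% do log("Dropping " ~ relation, info=True) %}
        {% do adapter.drop_relation(relation) %}
    {% else %}
        {% do log("Nothing to drop: " ~ schema_name ~ "." ~ table_name, info=True) %}
    {% endif %}
{% endmacro %}
```

It can then be invoked ad hoc, e.g. `dbt run-operation drop_deprecated_relation --args '{schema_name: analytics_archive, table_name: stg_orders_legacy}'`.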