sluofoss

Data Engineering Code Design Patterns

Nov 10, 2024 · 1 min read

data-engineering
design-patterns
idempotence
determinism
idempotent
atomic
singleton
pyspark
factory

source from reddit discussion

https://www.reddit.com/r/dataengineering/comments/wdzyhr/code_design_patterns_for_data_pipelines/
Idempotence and determinism: make sure the pipeline produces the same results for a given execution date every time it is triggered.
Idempotence: you don't get duplicates; you get exactly the same set of data no matter how many times you run it.
Determinism: you can predict the transformation output.
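A minimal sketch of both properties, assuming a hypothetical in-memory table partitioned by execution date (`Table`, `transform` and `run_pipeline` are illustrative names, not from the thread). The load overwrites the whole partition for the execution date instead of appending, so reruns leave the same data behind, and the transform avoids clocks, random IDs and unstable ordering so its output is predictable from its input.

```python
from datetime import date

# Hypothetical stand-in for a date-partitioned table:
# maps an execution date to the rows written for that date.
Table = dict[date, list[dict]]


def transform(raw_rows: list[dict]) -> list[dict]:
    """Deterministic transform: the output depends only on the input rows.

    No wall-clock timestamps, no random IDs, and an explicit sort order,
    so the same input always yields the same output.
    """
    cleaned = [
        {"user_id": r["user_id"], "amount": round(r["amount"], 2)}
        for r in raw_rows
        if r["amount"] > 0
    ]
    return sorted(cleaned, key=lambda r: (r["user_id"], r["amount"]))


def run_pipeline(table: Table, execution_date: date, raw_rows: list[dict]) -> None:
    """Idempotent load: replace the whole partition for execution_date.

    Appending would duplicate rows on every rerun; overwriting means running
    the job once or ten times for the same date leaves identical data.
    """
    table[execution_date] = transform(raw_rows)
```

In Spark terms the same idea usually shows up as overwriting a single date partition rather than appending to the table.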
Writing self-repairing and self-backfilling pipelines: they break idempotence but, in my experience, are simpler to operate.
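One way to read "self-backfilling" is sketched below, under the assumption that the pipeline tracks which dates already have output (`done`) and takes a hypothetical per-date `process` callable. Each run fills every gap up to today instead of only processing its own execution date. What a run does therefore depends on existing state rather than on its execution date alone, which is one plausible reading of why such jobs "break idempotence", while a single rerun repairs any holes.

```python
from datetime import date, timedelta
from typing import Callable


def missing_dates(done: set[date], start: date, end: date) -> list[date]:
    """Every date in [start, end] that has no output yet, oldest first."""
    all_days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [d for d in all_days if d not in done]


def self_backfilling_run(
    done: set[date],
    start: date,
    today: date,
    process: Callable[[date], None],  # hypothetical per-date job
) -> list[date]:
    """Process all missing dates up to today, not just today's partition."""
    todo = missing_dates(done, start, today)
    for d in todo:
        process(d)
        done.add(d)  # record success so the next run skips this date
    return todo
```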
Idempotent, atomic, and deterministic.
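For the "atomic" part, a common file-level technique is write-then-rename. The sketch below (the function name `atomic_write_json` is hypothetical) writes to a temporary file in the target directory and swaps it in with `os.replace`, which the Python docs describe as atomic on POSIX when source and destination are on the same filesystem. Readers then see either the old output or the complete new one, never a half-written file.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, records: list[dict]) -> None:
    """Write records so readers see either the old file or the full new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(records, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit disk before the swap
        os.replace(tmp_path, path)  # the atomic step
    except BaseException:
        # On any failure, discard the partial temp file and keep the old output.
        os.unlink(tmp_path)
        raise
```

Table formats and warehouses offer the same guarantee at a larger scale through transactional commits; this is only the single-file version of the idea.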
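The singleton pattern is already provided by PySpark (`SparkSession.builder.getOrCreate()` returns the existing session if there is one).
But the factory pattern works like a charm and reduces effort.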
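A small sketch of how a factory can reduce effort in a pipeline: the source format comes from config, and a factory function hands back the matching reader so the calling code never hard-codes which one to use. This is a simple registry-based factory, not the full Factory Method pattern with subclasses, and the reader functions and names are hypothetical.

```python
import csv
import io
import json
from typing import Callable

Reader = Callable[[str], list[dict]]


def read_csv(text: str) -> list[dict]:
    """Parse CSV text with a header row into dict rows (values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def read_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines text, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Registry of known formats; supporting a new source means adding one entry.
_READERS: dict[str, Reader] = {
    "csv": read_csv,
    "jsonl": read_jsonl,
}


def make_reader(fmt: str) -> Reader:
    """Factory: return the reader for a format name taken from config."""
    try:
        return _READERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
```

Usage: `rows = make_reader(config["format"])(raw_text)`, where `config` is whatever job configuration the pipeline already has.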

General OOP design patterns

https://refactoring.guru/design-patterns/catalog
Here we list the three major categories of design patterns:
- creational
  - factory method
  - abstract factory
  - builder
  - prototype
  - singleton
- structural
  - adapter
  - bridge
  - composite
  - decorator
  - facade
  - flyweight
  - proxy
- behavioural
  - chain of responsibility
  - command
  - iterator
  - mediator
  - memento
  - observer
  - state
  - strategy
  - template method
  - visitor

speculations

Creational patterns seem to sit more on the architectural side of things for data engineering?
Structural patterns should be helpful in developing the individual components of a pipeline.
Behavioural patterns should be helpful in tying the different building blocks of a pipeline together.
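To make the speculation concrete, here is a sketch with one structural and one behavioural pattern, using hypothetical class names throughout. An adapter (structural) wraps a source with an awkward interface so it fits the component interface the pipeline expects; a template method (behavioural) fixes the extract → transform → load order in a base class while subclasses fill in the individual steps.

```python
from abc import ABC, abstractmethod


class LegacyApiClient:
    """Hypothetical third-party client whose interface we cannot change."""

    def fetch_all_records(self) -> list[tuple[int, str]]:
        return [(1, "alice"), (2, "bob")]


class RowSource(ABC):
    """Interface the pipeline expects: rows as dicts."""

    @abstractmethod
    def rows(self) -> list[dict]: ...


class LegacyApiAdapter(RowSource):
    """Structural (adapter): makes LegacyApiClient look like a RowSource."""

    def __init__(self, client: LegacyApiClient) -> None:
        self._client = client

    def rows(self) -> list[dict]:
        return [{"id": i, "name": n} for i, n in self._client.fetch_all_records()]


class Job(ABC):
    """Behavioural (template method): run() fixes the order of the steps."""

    def run(self) -> list[dict]:
        data = self.extract()
        data = self.transform(data)
        self.load(data)
        return data

    @abstractmethod
    def extract(self) -> list[dict]: ...

    def transform(self, data: list[dict]) -> list[dict]:
        return data  # default hook: pass rows through unchanged

    @abstractmethod
    def load(self, data: list[dict]) -> None: ...


class UppercaseNamesJob(Job):
    """Concrete job: only the steps differ, the order comes from Job.run()."""

    def __init__(self, source: RowSource) -> None:
        self.source = source
        self.loaded: list[dict] = []  # stand-in for a real sink

    def extract(self) -> list[dict]:
        return self.source.rows()

    def transform(self, data: list[dict]) -> list[dict]:
        return [{**r, "name": r["name"].upper()} for r in data]

    def load(self, data: list[dict]) -> None:
        self.loaded.extend(data)
```

Usage: `UppercaseNamesJob(LegacyApiAdapter(LegacyApiClient())).run()`. The adapter keeps the component swappable, while the base class keeps every job's control flow the same.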

Created with Quartz v4.4.0 © 2025