Hi Tristan! Good piece! I have a question for you. At the company I work for, we have, in a way, already done something like what you suggest: our analysts build their own pipelines on top of the ingested raw data. The problems are:

  • Data silos: each team creates its own mart. This becomes a problem when teams discuss the same metric but get different numbers, because each one processes the data in a slightly different way.
  • Repeated effort: there are similar pipelines running separately.
  • Cost: despite our efforts to share best practices, the number of analyst-built pipelines grew quickly and became hard to control.

Do you have any suggestions around this?

We are thinking of building a centralized pipeline and data model. A centralized data model would maintain a single definition of each metric, so there would be no repeated effort, and costs would be much more manageable. However, that means having dedicated data engineers working in “the old way”.

