Start Data Engineering (@startdataeng)'s Twitter Profile
Start Data Engineering

@startdataeng

I write about data engineering | SQL | Python | Distributed systems. Get my free data engineering course at https://t.co/sZTEcV0Q9W

ID:1249353981106798595

Link: https://www.startdataengineering.com/
Joined: 12-04-2020 15:11:13

1.4K Tweets

7.5K Followers

30 Following

Start Data Engineering(@startdataeng) 's Twitter Profile Photo

An orchestration tool that I've been impressed with is Dagster. Easy setup, powerful features, and great docs.

Use 👇🏽 to play around with a pipeline on Dagster:

startdataengineering.com/post/data-engi…


Starting a data project is a lot of work! It can be overwhelming to define the problem, set up systems, and then code!

Use this DE project as a blueprint to build your own:

startdataengineering.com/post/data-engi…

#data #dataengineering #Database #DataAnalytics #dataviz #Python #datapipeline

Accidentally deleting data (or dropping a partition), either through a pipeline bug or manual error, is common in data warehouses.

Ensure data is retrievable with Time travel. Here is an example with Apache Iceberg:
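A minimal sketch of what this could look like, assuming Spark SQL (3.3 or later) with an Iceberg catalog already configured. The helpers only build the SQL strings. Their names, the catalog `local`, and the table `db.orders` are hypothetical, chosen for illustration.

```python
from typing import Optional


def snapshots_query(table: str) -> str:
    """List the table's snapshots via Iceberg's `snapshots` metadata table."""
    return (
        f"SELECT committed_at, snapshot_id, operation "
        f"FROM {table}.snapshots ORDER BY committed_at"
    )


def time_travel_query(
    table: str, snapshot_id: Optional[int] = None, as_of: Optional[str] = None
) -> str:
    """Read the table as it was at a snapshot id or at a timestamp."""
    if (snapshot_id is None) == (as_of is None):
        raise ValueError("pass exactly one of snapshot_id or as_of")
    if snapshot_id is not None:
        return f"SELECT * FROM {table} VERSION AS OF {snapshot_id}"
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{as_of}'"


def rollback_call(catalog: str, table: str, snapshot_id: int) -> str:
    """Roll the table back to an earlier snapshot with Iceberg's Spark procedure."""
    return f"CALL {catalog}.system.rollback_to_snapshot('{table}', {snapshot_id})"


# With a SparkSession `spark` (not created here), usage might be:
#   spark.sql(snapshots_query("local.db.orders")).show()
#   spark.sql(time_travel_query("local.db.orders", snapshot_id=42)).show()
#   spark.sql(rollback_call("local", "db.orders", 42))
```

This only works while the older snapshots still exist. Once snapshots are expired, the data they reference can no longer be read back.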

#data #dataengineering #datarecovery #datapipeline #SQL

It can be overwhelming to start learning data engineering. I'd recommend starting with the basics of Python, SQL, and UNIX commands, then building a simple data project and updating your GitHub and LinkedIn. Landing a DE job is 60% learning and 40% marketing. See reply 👇🏽 for helpful links.


Are you researching how to develop efficient data pipelines? You might have heard terms such as functional design, the factory pattern, etc.

Check out this post, which goes over popular code design patterns for data pipelines:

startdataengineering.com/post/code-patt…
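To make the factory pattern concrete, here is a generic sketch. It is not taken from the linked post, and the reader functions and `reader_factory` are hypothetical names. The idea is that pipeline code asks a factory for a reader instead of branching on the source format itself.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterator


def read_csv(text: str) -> Iterator[dict]:
    """Yield one dict per CSV row, keyed by the header line."""
    yield from csv.DictReader(io.StringIO(text))


def read_jsonl(text: str) -> Iterator[dict]:
    """Yield one dict per non-empty JSON line."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


# Registry mapping a source format to the function that reads it.
READERS: Dict[str, Callable[[str], Iterator[dict]]] = {
    "csv": read_csv,
    "jsonl": read_jsonl,
}


def reader_factory(fmt: str) -> Callable[[str], Iterator[dict]]:
    """Return the reader registered for `fmt`."""
    try:
        return READERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
```

Supporting a new source format then means adding one function and one registry entry, without touching the pipeline code that calls `reader_factory`.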


Learning data engineering? Build a pipeline locally.

1. Python to pull data from an API (e.g. Coincap)
2. Load data into a local Postgres container
3. Automate it with cron/task scheduler

Start small, build, improve, & repeat.
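A rough sketch of steps 1 and 2, under some loud assumptions: the CoinCap URL and the `data`/`id`/`symbol`/`priceUsd` field names may not match the current API (check its docs), the table name and DSN are hypothetical, and `psycopg2` must be installed separately for the load step.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

# Assumed endpoint; the current CoinCap API may differ or need an API key.
API_URL = "https://api.coincap.io/v2/assets"


def extract(url: str = API_URL) -> Dict[str, Any]:
    """Step 1: pull the JSON payload from the API."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def transform(payload: Dict[str, Any]) -> List[Tuple[str, str, float]]:
    """Keep (id, symbol, price) per asset; the field names are assumed."""
    return [
        (a["id"], a["symbol"], float(a["priceUsd"]))
        for a in payload.get("data", [])
    ]


def load(rows: List[Tuple[str, str, float]], dsn: str) -> None:
    """Step 2: insert rows into Postgres (needs `pip install psycopg2-binary`)."""
    import psycopg2  # imported lazily so extract/transform work without it

    # The connection context manager commits on success; it does not close.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "CREATE TABLE IF NOT EXISTS asset_price ("
            "id TEXT, symbol TEXT, price_usd NUMERIC, "
            "loaded_at TIMESTAMPTZ DEFAULT now())"
        )
        cur.executemany(
            "INSERT INTO asset_price (id, symbol, price_usd) VALUES (%s, %s, %s)",
            rows,
        )


def run(dsn: str = "postgresql://postgres:postgres@localhost:5432/postgres") -> None:
    """Run one extract, transform, load cycle against a local container."""
    load(transform(extract()), dsn)
```

For step 3, a script that calls `run()` can be scheduled with cron or Task Scheduler. Keeping `transform` free of network and database calls makes it easy to test on a hand-written payload.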

#data #dataengineering #pythonlearning #Python

When the data to process is larger than memory, try streaming it with Python generators before jumping to distributed systems!

#data #dataengineering #Python #pythonlearning #Generator

E.g. stream a file (note () and not []) and get the diff between date cols.
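A minimal sketch of that idea, assuming a CSV file with ISO-formatted `start_date` and `end_date` columns. The column names and function names are hypothetical.

```python
import csv
from datetime import date
from typing import Iterable, Iterator


def date_diffs(lines: Iterable[str]) -> Iterator[int]:
    """Lazily yield (end_date - start_date) in days for each CSV row."""
    rows = csv.DictReader(lines)  # pulls one line at a time from `lines`
    # Parentheses make a generator expression: nothing runs until iterated.
    # Square brackets would build the whole list in memory instead.
    return (
        (date.fromisoformat(r["end_date"]) - date.fromisoformat(r["start_date"])).days
        for r in rows
    )


def total_days(path: str) -> int:
    """Stream a file from disk, consuming one row at a time."""
    with open(path, newline="") as f:
        return sum(date_diffs(f))
```

Because `sum` consumes the generator one value at a time, memory use stays roughly constant regardless of file size.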