Start Data Engineering (@startdataeng) Twitter Tweets • TwiCopy

Start Data Engineering

2 years ago

An orchestration tool that I've been impressed with is Dagster. Easy setup, powerful features, and great docs.

Use 👇🏽 to play around with a pipeline on Dagster

startdataengineering.com/post/data-engi…

#data #data engineering #Python #Database #DataAnalytics

Start Data Engineering

@startdataeng

1 year ago

Starting as a DE? 90% of what you will need
is SQL (OLAP), Python, & distributed systems basics

Don't overcomplicate!

#data
#data engineering
#SQL
#Database
#Python

Start Data Engineering

@startdataeng

2 years ago

Starting a data project is a lot of work! It can be overwhelming to define the problem, set up systems, and then code!

Use this DE project as a blueprint to build your own:

startdataengineering.com/post/data-engi…

#data #data engineering #Database #DataAnalytics
#data viz #Python #data pipeline

Start Data Engineering

@startdataeng

2 years ago

Preparing for SQL interviews?
Filter LeetCode to the hard SQL problems, sort by frequency, and do the first 40

#data
#data engineering
#Software
#SQL

Start Data Engineering

@startdataeng

3 weeks ago

Accidentally deleting data (or dropping a partition), either through a pipeline bug or manual error, is common in data warehouses.

Ensure data is retrievable with Time travel. Here is an example with Apache Iceberg:
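
A minimal sketch of what this could look like through Spark SQL. The catalog name (local), the table (db.orders), and the helper function names are assumptions made for illustration; the SQL forms (the snapshots metadata table, TIMESTAMP AS OF, VERSION AS OF, and the rollback_to_snapshot procedure) are the ones Iceberg documents for Spark. The helpers only build the statements, so you can inspect them before running anything.

```python
# Hypothetical helpers that build Iceberg time-travel statements for Spark SQL.
# Catalog/table names are placeholders, not taken from the original tweet.

def list_snapshots_sql(table: str) -> str:
    """List the snapshots you can travel back to (Iceberg metadata table)."""
    return (
        f"SELECT snapshot_id, committed_at, operation "
        f"FROM {table}.snapshots ORDER BY committed_at"
    )

def read_as_of_timestamp_sql(table: str, ts: str) -> str:
    """Read the table as it was at a point in time."""
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{ts}'"

def read_as_of_snapshot_sql(table: str, snapshot_id: int) -> str:
    """Read the table as of a specific snapshot id."""
    return f"SELECT * FROM {table} VERSION AS OF {snapshot_id}"

def rollback_sql(catalog: str, table: str, snapshot_id: int) -> str:
    """Roll the table back to a snapshot taken before the bad delete."""
    return f"CALL {catalog}.system.rollback_to_snapshot('{table}', {snapshot_id})"

# Usage, assuming an existing SparkSession named `spark` with an Iceberg catalog:
#   spark.sql(list_snapshots_sql("local.db.orders")).show()
#   spark.sql(read_as_of_timestamp_sql("local.db.orders", "2024-01-01 00:00:00")).show()
#   spark.sql(rollback_sql("local", "db.orders", 1234567890))
```

Time travel only reaches snapshots that still exist, so recovery depends on how aggressively old snapshots are expired.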

#data #data engineering #data recovery #data pipeline #SQL

Start Data Engineering

@startdataeng

2 years ago

It can be overwhelming to start learning data engineering. I'd recommend starting with the basics of Python, SQL, and UNIX commands, building a simple data project, and updating your GitHub and LinkedIn. Landing a DE job is 60% learning and 40% marketing. See reply 👇🏽 for helpful links.

Start Data Engineering

@startdataeng

3 weeks ago

Are you researching how to develop efficient data pipelines? You might have heard terms like functional design, factory pattern, etc.

Check out this post, which goes over popular code design patterns for data pipelines:

startdataengineering.com/post/code-patt…
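
To make the two terms concrete, here is a small generic sketch (not taken from the linked post). The source names, sample rows, and function names are all made up for illustration: a factory function picks the extractor for a source, and a pure function does the transform without mutating its input.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Extractor = Callable[[], List[Row]]

# Hypothetical extractors; real ones would read from an API, file, or database.
def extract_orders() -> List[Row]:
    return [{"id": "1", "source": "orders"}]

def extract_users() -> List[Row]:
    return [{"id": "42", "source": "users"}]

_EXTRACTORS: Dict[str, Extractor] = {
    "orders": extract_orders,
    "users": extract_users,
}

def extractor_factory(source: str) -> Extractor:
    """Factory pattern: callers ask for a source by name and get its extractor."""
    try:
        return _EXTRACTORS[source]
    except KeyError:
        raise ValueError(f"Unknown source: {source!r}") from None

def clean(row: Row) -> Row:
    """Functional style: returns a new row and leaves the input untouched."""
    return {**row, "id": int(row["id"])}

def run_pipeline(source: str) -> List[Row]:
    extract = extractor_factory(source)
    return [clean(row) for row in extract()]
```

Adding a new source then means writing one extractor and registering it, while run_pipeline stays unchanged.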

#data #python #data engineering #data pipelines

Start Data Engineering

@startdataeng

3 years ago

Exercise project for anyone starting in data engineering startdataengineering.com/post/data-engi…
#dataengineering #bigdata #ETL #ApacheAirflow #AWS #ApacheSpark

Start Data Engineering

@startdataeng

3 weeks ago

Learning data engineering? Build a pipeline locally.

1. Python to pull data from an API (e.g. Coincap)
2. Load data into a local Postgres container
3. Automate it with cron/task scheduler

Start small, build, improve, & repeat.
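
The three steps above could be sketched roughly as follows. Everything specific here is an assumption: the endpoint URL and response shape (a "data" list with "id", "symbol", and "priceUsd" fields) should be checked against the current Coincap docs, the asset_price table and connection settings are placeholders, and psycopg2 is a third-party driver you would need to install.

```python
import json
import urllib.request
from decimal import Decimal

# Assumed endpoint; verify against the current Coincap documentation.
API_URL = "https://api.coincap.io/v2/assets"

def fetch_assets(url: str = API_URL) -> dict:
    """Step 1: pull JSON from the API."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

def to_rows(payload: dict) -> list:
    """Turn the (assumed) response shape into tuples, skipping missing prices."""
    return [
        (a["id"], a["symbol"], Decimal(a["priceUsd"]))
        for a in payload["data"]
        if a.get("priceUsd") is not None
    ]

def load_rows(conn, rows: list) -> None:
    """Step 2: load rows into Postgres via a psycopg2 connection."""
    with conn.cursor() as cur:
        cur.execute(
            "CREATE TABLE IF NOT EXISTS asset_price ("
            "asset_id TEXT, symbol TEXT, price_usd NUMERIC, "
            "loaded_at TIMESTAMPTZ DEFAULT now())"
        )
        cur.executemany(
            "INSERT INTO asset_price (asset_id, symbol, price_usd) "
            "VALUES (%s, %s, %s)",
            rows,
        )
    conn.commit()

def main() -> None:
    import psycopg2  # third-party; imported here so the rest runs without it

    # Placeholder credentials for a local Postgres container.
    conn = psycopg2.connect(
        host="localhost", port=5432, dbname="postgres",
        user="postgres", password="postgres",
    )
    try:
        load_rows(conn, to_rows(fetch_assets()))
    finally:
        conn.close()
```

For step 3, call main() under an `if __name__ == "__main__":` guard and schedule the script; a crontab entry such as `0 * * * * python3 /path/to/pipeline.py` (hypothetical path) would run it hourly.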

#data #data engineering #pythonlearning #Python

Start Data Engineering

@startdataeng

2 years ago

When data to process is larger than memory, try to stream with python generators, before jumping to distributed systems!

#data #data engineering #Python #pythonlearning #Generator

E.g., stream a file (note the () of a generator expression, not the [] of a list comprehension) and get the diff between date cols
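
A minimal sketch of that idea, assuming a CSV with ISO-formatted start_date and end_date columns (the column names and the file name in the usage comment are made up for illustration). The parentheses create a generator expression, so rows are read and processed one at a time instead of being loaded into a list.

```python
import csv
from datetime import date
from typing import Iterable, Iterator

def date_diffs(lines: Iterable[str]) -> Iterator[int]:
    """Lazily yield the day difference between two date columns, row by row."""
    rows = csv.DictReader(lines)
    # () builds a generator: nothing is read until the caller iterates.
    # Using [] here would build the whole list in memory first.
    return (
        (date.fromisoformat(r["end_date"]) - date.fromisoformat(r["start_date"])).days
        for r in rows
    )

# Usage with a file too large for memory (hypothetical file name):
#   with open("events.csv", newline="") as f:
#       for diff in date_diffs(f):
#           print(diff)
```

The generator must be consumed while the file is still open, which is why the loop sits inside the with block.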
