vutrinh (@_vutrinh)'s Twitter Profile
vutrinh

@_vutrinh

My mom read my articles to support her son. Now, she can design a data architecture and write ETL scripts.

ID: 1638587225146560512

Link: https://vutr.substack.com · Joined: 22-03-2023 17:03:50

48 Tweets

111 Followers

215 Following

AutoMQ: Cost-Effective Auto-Scaling Kafka (@automq_lab)

🎉 Wow. This is truly an epic masterpiece. The article from Vu Trinh (@_vutrinh), with its vivid illustrations, breaks down and explains the technical architecture of AutoMQ in a very clear and understandable way. If you're interested in the cloud-native technical architecture of
vutrinh (@_vutrinh)

🚀🚀 What does the Apache Iceberg reading process look like?

◉ The reader first visits the catalog to retrieve the table's current metadata file location.

◉ After fetching the metadata file, it collects the table’s schema and checks partition schemes to understand the data
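
A minimal sketch of that read path, using the pyiceberg library; the catalog URI and the analytics.events table below are hypothetical placeholders.

```python
# Sketch of the Iceberg read path; catalog URI and table name are hypothetical.
from pyiceberg.catalog import load_catalog

# Step 1: visit the catalog to get the table's current metadata file location.
catalog = load_catalog("default", uri="http://localhost:8181")
table = catalog.load_table("analytics.events")

# Step 2: the fetched metadata exposes the table's schema and partition spec.
print(table.schema())
print(table.spec())

# Step 3: a scan uses the metadata to select data files, then reads them.
rows = table.scan(limit=10).to_arrow()
print(rows.num_rows)
```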
vutrinh (@_vutrinh)

🚀🚀 How does Apache Spark plan the execution for us?

(With the help of the Catalyst Optimizer)

When defining DataFrame transformation logic, it must first go through an optimization process before execution. This involves four key phases:

◉ Analysis: Spark SQL starts by
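
To watch these phases on a real query, PySpark can print every plan Catalyst produces; this small sketch uses made-up data and column names.

```python
# Watching Catalyst work on a tiny DataFrame; the data and column
# names here are made up for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("catalyst-demo").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "tag"])
result = df.filter(F.col("id") > 1).select("tag")

# extended=True prints the parsed, analyzed, and optimized logical plans
# plus the final physical plan, one per Catalyst phase.
result.explain(extended=True)
```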
vutrinh (@_vutrinh)

🤔 My humble observation: large-scale cloud OLAP has increasingly converged toward the lakehouse paradigm.

Below are some insights from my research. Feel free to discuss or share corrections if you find anything off!

📌 In this context:

➝ Internal tables refer to data loaded

vutrinh (@_vutrinh)

🚀🚀 How does Apache Spark execute the applications for us?

A few weeks ago, I wrote an article that gave an overview of Apache Spark. Let’s revisit how Spark handles processing, from user-defined logic to execution by the executors:

◉ Defining the Application: The user defines
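
A rough sketch of that flow in PySpark, assuming a hypothetical input path: transformations accumulate lazily on the driver, and an action kicks off the distributed work.

```python
# A toy Spark application; the input path is a hypothetical placeholder.
from pyspark.sql import SparkSession

# Defining the application: the user builds a SparkSession on the driver.
spark = SparkSession.builder.appName("word-count").getOrCreate()

# Transformations are lazy; nothing is executed yet.
lines = spark.read.text("hdfs:///data/sample.txt")
words = lines.selectExpr("explode(split(value, ' ')) AS word")
counts = words.groupBy("word").count()

# An action triggers a job: the driver schedules stages and tasks,
# and the executors carry out the actual work.
counts.show()
```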
Shivang Agarwal (@shivang_in)

Have you ever wondered how a Parquet dataset is written on disk?

Parquet is a self-describing file format that contains all the information needed by the application that consumes the file.

Parquet organizes data in a hybrid format behind the scenes.
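
One way to see the self-describing part is to write a tiny file and read its footer metadata back with pyarrow; the file name here is a placeholder.

```python
# Writing a tiny Parquet file and reading its footer back with pyarrow;
# the file name is a hypothetical placeholder.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"id": [1, 2, 3], "tag": ["a", "b", "c"]})
pq.write_table(table, "example.parquet")

# The footer metadata is what makes the format self-describing.
meta = pq.ParquetFile("example.parquet").metadata
print(meta.num_row_groups, meta.num_columns)
print(meta.schema)                  # the full schema travels with the file
print(meta.row_group(0).column(0))  # per-column-chunk stats and encodings
```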
vutrinh (@_vutrinh)

🚀🚀 DuckDB is great. It allows us to run analytics SQL on a local laptop with minutes of setup.

Here are some bullet points about its storage, from my self-learning process through DuckDB’s materials and source code:

◉ Two modes: persistent and in-memory; the latter will
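
A quick sketch of those two modes with the duckdb Python API; the database file name is a hypothetical placeholder.

```python
# The two storage modes, sketched with the duckdb Python API;
# the database file name is a hypothetical placeholder.
import duckdb

# In-memory mode: nothing touches disk; data vanishes when the process exits.
mem = duckdb.connect()  # defaults to :memory:
mem.execute("CREATE TABLE t AS SELECT 42 AS answer")
print(mem.execute("SELECT * FROM t").fetchall())

# Persistent mode: the whole database lives in a single file.
disk = duckdb.connect("analytics.duckdb")
disk.execute("CREATE TABLE IF NOT EXISTS t AS SELECT 42 AS answer")
disk.close()
```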

vutrinh (@_vutrinh)

Parquet is not a columnar format.

Indeed, it’s a hybrid format combining the best of the row and column formats.

Parquet groups data into subsets of rows (horizontal partitioning).

In each subset, data for each column is stored close together (vertical partitioning).

A Parquet file is
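
A short pyarrow sketch of both partitioning directions, with a hypothetical file name: row groups can be read independently, and individual columns can be read without touching the rest.

```python
# Making the hybrid layout visible with pyarrow; the file name is a
# hypothetical placeholder.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"id": list(range(6)), "tag": list("abcdef")})
# row_group_size=3 forces two row groups, i.e. two horizontal partitions.
pq.write_table(table, "hybrid.parquet", row_group_size=3)

pf = pq.ParquetFile("hybrid.parquet")
print(pf.metadata.num_row_groups)  # 2

# Vertical partitioning: read just one column's chunks.
tags_only = pf.read(columns=["tag"])

# Horizontal partitioning: read one row group independently of the rest.
first_group = pf.read_row_group(0)
print(tags_only.num_rows, first_group.num_rows)  # 6 and 3
```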