vutrinh (@_vutrinh)'s Twitter Profile
vutrinh

@_vutrinh

My mom read my articles to support her son. Now, she can design a data architecture and write ETL scripts.

ID: 1638587225146560512

Link: https://vutr.substack.com · Joined: 22-03-2023 17:03:50

48 Tweets

111 Followers

215 Following

AutoMQ: Cost-Effective Auto-Scaling Kafka (@automq_lab)

🎉 Wow. This is truly an epic masterpiece. The article from Vu Trinh (@_vutrinh), with its vivid illustrations, breaks down and explains the technical architecture of AutoMQ in a very clear and understandable way. If you're interested in the cloud-native technical architecture of
vutrinh (@_vutrinh)

🚀🚀 What does the Apache Iceberg reading process look like?

◉ The reader first visits the catalog to retrieve the table's current metadata file location.

◉ After fetching the metadata file, it collects the table’s schema and checks partition schemes to understand the data
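
A minimal sketch of that read path, using the pyiceberg library; the catalog URI and the analytics.events table below are hypothetical placeholders.

```python
# Sketch of the Iceberg read path; catalog URI and table name are hypothetical.
from pyiceberg.catalog import load_catalog

# Step 1: visit the catalog to get the table's current metadata file location.
catalog = load_catalog("default", uri="http://localhost:8181")
table = catalog.load_table("analytics.events")

# Step 2: the fetched metadata exposes the table's schema and partition spec.
print(table.schema())
print(table.spec())

# Step 3: a scan uses the metadata to select data files, then reads them.
rows = table.scan(limit=10).to_arrow()
print(rows.num_rows)
```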
vutrinh (@_vutrinh)

🚀🚀 How does Apache Spark plan the execution for us?

(With the help of the Catalyst Optimizer)

When defining DataFrame transformation logic, it must first go through an optimization process before execution. This involves four key phases:

◉ Analysis: Spark SQL starts by
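
To watch these phases on a real query, PySpark can print every plan Catalyst produces; this small sketch uses made-up data and column names.

```python
# Watching Catalyst work on a tiny DataFrame; the data and column
# names here are made up for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("catalyst-demo").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "tag"])
result = df.filter(F.col("id") > 1).select("tag")

# extended=True prints the parsed, analyzed, and optimized logical plans
# plus the final physical plan, one per Catalyst phase.
result.explain(extended=True)
```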
vutrinh (@_vutrinh)

🤔 My humble observation: large-scale cloud OLAP has increasingly converged toward the lakehouse paradigm.

Below are some insights from my research. Feel free to discuss or share corrections if you find anything off!

📌 In this context:

➝ Internal tables refer to data loaded

vutrinh (@_vutrinh)

🚀🚀 How does Apache Spark execute the applications for us?

A few weeks ago, I wrote an article that gave an overview of Apache Spark. Let’s revisit how Spark handles processing, from user-defined logic to execution by the executors:

◉ Defining the Application: The user defines
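
A rough sketch of that flow in PySpark, assuming a hypothetical input path: transformations accumulate lazily on the driver, and an action kicks off the distributed work.

```python
# A toy Spark application; the input path is a hypothetical placeholder.
from pyspark.sql import SparkSession

# Defining the application: the user builds a SparkSession on the driver.
spark = SparkSession.builder.appName("word-count").getOrCreate()

# Transformations are lazy; nothing is executed yet.
lines = spark.read.text("hdfs:///data/sample.txt")
words = lines.selectExpr("explode(split(value, ' ')) AS word")
counts = words.groupBy("word").count()

# An action triggers a job: the driver schedules stages and tasks,
# and the executors carry out the actual work.
counts.show()
```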
Shivang Agarwal (@shivang_in)

Have you ever wondered how a Parquet dataset is written on disk?

Parquet is a self-describing file format that contains all the information needed by the application that consumes the file.

Parquet organizes data in a hybrid format behind the scenes.
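
One way to see the self-describing part is to write a tiny file and read its footer metadata back with pyarrow; the file name here is a placeholder.

```python
# Writing a tiny Parquet file and reading its footer back with pyarrow;
# the file name is a hypothetical placeholder.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"id": [1, 2, 3], "tag": ["a", "b", "c"]})
pq.write_table(table, "example.parquet")

# The footer metadata is what makes the format self-describing.
meta = pq.ParquetFile("example.parquet").metadata
print(meta.num_row_groups, meta.num_columns)
print(meta.schema)                  # the full schema travels with the file
print(meta.row_group(0).column(0))  # per-column-chunk stats and encodings
```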
vutrinh (@_vutrinh)

🚀🚀 DuckDB is great. It allows us to run analytics SQL on a local laptop with minutes of setup.

Here are some bullet points about its storage, from my self-learning process through DuckDB’s materials and source code:

◉ Two modes: persistent and in-memory; the latter will
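
A quick sketch of those two modes with the duckdb Python API; the database file name is a hypothetical placeholder.

```python
# The two storage modes, sketched with the duckdb Python API;
# the database file name is a hypothetical placeholder.
import duckdb

# In-memory mode: nothing touches disk; data vanishes when the process exits.
mem = duckdb.connect()  # defaults to :memory:
mem.execute("CREATE TABLE t AS SELECT 42 AS answer")
print(mem.execute("SELECT * FROM t").fetchall())

# Persistent mode: the whole database lives in a single file.
disk = duckdb.connect("analytics.duckdb")
disk.execute("CREATE TABLE IF NOT EXISTS t AS SELECT 42 AS answer")
disk.close()
```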

vutrinh (@_vutrinh)

Parquet is not a columnar format.

Indeed, it’s a hybrid format combining the best of the row and column formats.

Parquet groups data into subsets of rows (horizontal partitioning).

In each subset, data for each column is stored close together (vertical partitioning).

A Parquet file is
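
A short pyarrow sketch of both partitioning directions, with a hypothetical file name: row groups can be read independently, and individual columns can be read without touching the rest.

```python
# Making the hybrid layout visible with pyarrow; the file name is a
# hypothetical placeholder.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"id": list(range(6)), "tag": list("abcdef")})
# row_group_size=3 forces two row groups, i.e. two horizontal partitions.
pq.write_table(table, "hybrid.parquet", row_group_size=3)

pf = pq.ParquetFile("hybrid.parquet")
print(pf.metadata.num_row_groups)  # 2

# Vertical partitioning: read just one column's chunks.
tags_only = pf.read(columns=["tag"])

# Horizontal partitioning: read one row group independently of the rest.
first_group = pf.read_row_group(0)
print(tags_only.num_rows, first_group.num_rows)  # 6 and 3
```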