Freelance Data(bricks) Engineer | #ApacheSpark #DeltaLake #UnityCatalog #Databricks #ApacheKafka #KafkaStreams | Java Champion | @theASF | #DatabricksMVP
ID: 38913594
https://www.linkedin.com/in/jaceklaskowski/ 09-05-2009 19:41:45
25.25K Tweets
6.6K Followers
853 Following

The books have arrived 🥰📚📦 The pile is exactly how I'm gonna read them (from top to bottom), starting from the one about #ApacheIceberg 🧊🏔️ Thanks O'Reilly Media for these complimentary copies 🙏

Jane Street has started up our tech talk series after a pandemic-driven hiatus. Our first talk is from Charlie Marsh of Ruff fame, talking about how they made uv, their new package manager for Python, so fast! youtu.be/gSKTfG1GXYQ?si…

Blog post from Xiangpeng Hao explaining the different levels of pruning Apache DataFusion applies when reading Parquet files: blog.haoxp.xyz/posts/parquet-… The diagrams in particular are 🧑‍🍳👌

Correction: GlareDB is moving away from DataFusion! Sean Smith's excellent talk discusses problems with building a DBMS using off-the-shelf parts. Like DuckDB, the GlareDB rewrite borrows ideas from TUM Database Group's HyPer system, but it's written in Rust: youtube.com/watch?v=Sor3KZ…

Data Catalogs are getting much-needed attention across #datalakehouse and #datawarehouse as the plot thickens, as they say. We are sharing some of the deep internal research we did to support our multi-catalog sync feature in the Onehouse product in this blog from Kyle Weller.

Vinoth Chandar, one thing I did not expect when doing this research was coming to the unfortunate realization that you might need more than one catalog to cover all the bases for a complete data platform solution...

🎉 We’re proud to announce the Apache Hudi 1.0 release! This release has been the result of a massive community effort, with tons of new code (re)written. I want to thank all 60+ contributors who worked on ~180K lines of change. 🗒️ Release blog: hudi.apache.org/blog/2024/12/1… Hudi
