Mark Lyons
@mcl5tech
product @cloudera | prev product @aws @dremio @verticaunified • #data #analytics #design #tech for 🌍
ID: 907193988
http://www.markclyons.com
Joined 27-10-2012 02:22:36
1.1K Tweets
910 Followers
4.4K Following
Always great to catch up with people who have depth in the data space and can share stories ranging from academic papers to how companies were built. Thanks, Juan Sequeda and Tim Gasper!
Merge-On-Read (MOR) vs. Copy-On-Write (COW) in Apache Iceberg. Both approaches handle row-level deletes & updates to data files in a data lake: COW rewrites the affected data files, while MOR writes separate delete files that are merged in at read time. Let's break them down @IcebergDevs 👇 #DataEngineering #data
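A toy sketch of the trade-off, assuming a "data file" is just an immutable list of rows. `CowTable` and `MorTable` are hypothetical classes, not Iceberg's API; MOR is modeled only with position deletes (Iceberg format v2 also supports equality deletes).

```python
from dataclasses import dataclass, field


@dataclass
class CowTable:
    """Toy copy-on-write table: a list of data files (lists of rows)."""
    data_files: list = field(default_factory=list)
    rewritten_files: int = 0  # how many data files deletes have rewritten

    def delete(self, predicate):
        # COW: any data file holding a matching row is rewritten in full
        # without that row; files with no matches are kept as they are.
        new_files = []
        for f in self.data_files:
            if any(predicate(r) for r in f):
                new_files.append([r for r in f if not predicate(r)])
                self.rewritten_files += 1
            else:
                new_files.append(f)
        self.data_files = new_files

    def scan(self):
        # Reads are cheap: just concatenate the current data files.
        return [r for f in self.data_files for r in f]


@dataclass
class MorTable:
    """Toy merge-on-read table: data files plus position deletes."""
    data_files: list = field(default_factory=list)
    # (file index, row position) pairs, standing in for position delete files
    position_deletes: set = field(default_factory=set)

    def delete(self, predicate):
        # MOR: data files are never touched; only delete records are written.
        for fi, f in enumerate(self.data_files):
            for pos, r in enumerate(f):
                if predicate(r):
                    self.position_deletes.add((fi, pos))

    def scan(self):
        # Reads pay the cost: deletes are merged with the data at query time.
        return [
            r
            for fi, f in enumerate(self.data_files)
            for pos, r in enumerate(f)
            if (fi, pos) not in self.position_deletes
        ]
```

Both tables return the same rows after a delete; the difference is where the work lands. COW pays at write time (file rewrites), MOR pays at read time (merging deletes). Updates follow the same pattern: COW rewrites the file with the new row version, while MOR deletes the old row and writes the new version to a new data file.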
How do you migrate Apache Iceberg tables from one catalog to another? If you're already using a catalog (say, HDFS) and want to switch to a different one (say, AWS Glue), how do you do it? A 🧵 for @IcebergDevs #dataengineering
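Conceptually, an Iceberg catalog tracks a pointer from each table name to that table's current metadata file, so a migration can register the same metadata file in the new catalog instead of copying data. Recent Iceberg releases expose this as a `register_table` operation (for example the Spark procedure `system.register_table`). The in-memory sketch below is hypothetical: `InMemoryCatalog`, `migrate`, and the metadata path are illustrative only.

```python
class InMemoryCatalog:
    """Hypothetical toy catalog: maps a table identifier to the location of
    that table's current metadata file. Data and metadata files live elsewhere."""

    def __init__(self, name):
        self.name = name
        self._tables = {}

    def register_table(self, identifier, metadata_location):
        if identifier in self._tables:
            raise ValueError(f"{identifier} already exists in {self.name}")
        self._tables[identifier] = metadata_location

    def load_metadata_location(self, identifier):
        return self._tables[identifier]

    def drop_table(self, identifier):
        # Forget the pointer only; no data or metadata files are deleted.
        del self._tables[identifier]

    def list_tables(self):
        return sorted(self._tables)


def migrate(source, target, identifiers, drop_from_source=True):
    # Assumes writers are paused: a commit to the source catalog after the
    # pointer is copied would be invisible to the target catalog.
    migrated = []
    for ident in identifiers:
        location = source.load_metadata_location(ident)
        target.register_table(ident, location)  # raises before any drop
        if drop_from_source:
            source.drop_table(ident)
        migrated.append(ident)
    return migrated
```

Only pointers move; the table's files stay where they are, which is why this style of migration is cheap compared with rewriting the table.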
Manage data as code? Git, but for data? That's right! projectnessie is an open-source project that brings Git-like branching to the world of data, specifically to data lake table formats like #ApacheIceberg #dataengineering
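A toy model of the Git-like idea, assuming each commit is an immutable snapshot mapping table names to metadata file locations and each branch is a movable pointer to a commit. `ToyRepo` and its methods are hypothetical and are not Nessie's API; only fast-forward merges are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, eq=False)  # eq=False: commits compare by identity
class Commit:
    tables: dict  # table name -> metadata file location (treated as read-only)
    parent: Optional["Commit"]
    message: str


class ToyRepo:
    def __init__(self):
        self.branches = {"main": Commit({}, None, "initial")}

    def create_branch(self, name, from_branch="main"):
        # A new branch is just another pointer to an existing commit.
        self.branches[name] = self.branches[from_branch]

    def commit(self, branch, table, metadata_location, message):
        head = self.branches[branch]
        tables = dict(head.tables)  # copy, so older commits stay unchanged
        tables[table] = metadata_location
        self.branches[branch] = Commit(tables, head, message)

    def read(self, branch, table):
        return self.branches[branch].tables.get(table)

    def _is_ancestor(self, ancestor, commit):
        while commit is not None:
            if commit is ancestor:
                return True
            commit = commit.parent
        return False

    def merge(self, source, target):
        # Fast-forward only: target must not have diverged from source.
        if not self._is_ancestor(self.branches[target], self.branches[source]):
            raise ValueError("target has diverged; fast-forward not possible")
        self.branches[target] = self.branches[source]
```

The usual workflow is to branch off `main`, run ETL on the branch, validate it in isolation, then merge so every changed table becomes visible on `main` at once.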
The ApacheArrow project has grown along every axis 🚀 More & more tools and libraries in the #dataanalytics space have started using Arrow. In this blog post, we trace the evolution of Apache Arrow from the usage, capability & community angles. dremio.com/blog/apache-ar…
Query planning in Apache Iceberg: planning queries efficiently is critical for fast execution of the queries analysts run 🧑🏻‍💻, especially with large-scale data such as data in data lakes. Read @IcebergDevs 👇 #dataengineering
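A simplified sketch of the two pruning levels Iceberg planning relies on: partition value summaries in the manifest list let the planner skip whole manifests, and per-file column lower/upper bounds inside each manifest let it skip individual data files. The classes, field names, and file paths here are hypothetical; only a closed range predicate on one column is handled.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    lower: dict  # column -> min value in this file
    upper: dict  # column -> max value in this file


@dataclass
class Manifest:
    path: str
    files: list  # DataFile entries tracked by this manifest
    part_lower: dict  # summary: min partition value across its files
    part_upper: dict  # summary: max partition value across its files


def overlaps(lower, upper, column, lo, hi):
    # Without stats for the column we cannot prove a miss, so keep it.
    if column not in lower or column not in upper:
        return True
    return not (upper[column] < lo or lower[column] > hi)


def plan_scan(manifests, column, lo, hi):
    """Return data file paths that might contain rows with lo <= column <= hi."""
    planned = []
    for m in manifests:
        if not overlaps(m.part_lower, m.part_upper, column, lo, hi):
            continue  # skip the whole manifest without reading its entries
        for f in m.files:
            if overlaps(f.lower, f.upper, column, lo, hi):
                planned.append(f.path)
    return planned
```

Because all of these statistics live in metadata, the planner can discard most files before opening any of them, which is what keeps planning fast on large tables.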