Sherin Jacob (@jcsherin)'s Twitter Profile
Sherin Jacob

@jcsherin

Programmer

ID: 3515788533

https://protoship.io/ | Joined 01-09-2015 20:01:16

264 Tweets

226 Followers

153 Following

Andrew Lamb (@andrewlamb1111)

It is a common misconception that Apache Parquet files are restricted to basic statistics. Footer metadata and offset-based addressing permit user-defined index structures today. Latest ApacheDataFusion blog from Qi Zhi, Jigao Luo and myself explains how datafusion.apache.org/blog/2025/07/1…
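The idea in the post above (a footer that records where an extra index lives, so readers can seek straight to it) can be illustrated with a toy layout. This is a minimal sketch, not Parquet itself and not the method from the linked blog: the magic bytes `TOY1`, the JSON footer, and the helper names `write_file` and `read_index` are all hypothetical.

```python
import io
import json
import struct

MAGIC = b"TOY1"  # hypothetical magic bytes for this toy format (not Parquet's)


def write_file(rows, index):
    """Lay out: data | index | footer JSON | 4-byte footer length | magic."""
    buf = io.BytesIO()
    buf.write("\n".join(rows).encode("utf-8"))

    # The index is appended after the data; its position is recorded in the footer.
    index_offset = buf.tell()
    index_bytes = json.dumps(index).encode("utf-8")
    buf.write(index_bytes)

    footer = json.dumps(
        {"index_offset": index_offset, "index_length": len(index_bytes)}
    ).encode("utf-8")
    buf.write(footer)
    buf.write(struct.pack("<I", len(footer)))  # little-endian footer length
    buf.write(MAGIC)
    return buf.getvalue()


def read_index(blob):
    """Locate the index using only the footer, without scanning the data."""
    if blob[-4:] != MAGIC:
        raise ValueError("not a toy file")
    (footer_len,) = struct.unpack("<I", blob[-8:-4])
    footer = json.loads(blob[-8 - footer_len:-8])
    start = footer["index_offset"]
    return json.loads(blob[start:start + footer["index_length"]])
```

A reader that knows nothing about the index can still read the data region, while an index-aware reader follows the footer offset. That is the property the sketch is meant to show.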

Azim Afroozeh (@afroozeh3)

I'm excited to share that our paper (in collaboration with Peter Boncz ) has been accepted at VLDB 2025 in London and will be presented there: The FastLanes File Format In this paper, we introduce the FastLanes file format with Expression Encoding—a new way to define and combine

Maximilian Kuschewski (@maxikuschewski)

Andrew Lamb One improvement regarding benchmaxxing is having thousands of diverse benchmark queries instead of dozens. Plugging the new SQLStorm paper below ;)

Stefan Marr (@smarr)

How can you slow down a program? And perhaps more importantly, why would you? Blog post on our upcoming VMIL Workshop at SPLASH paper. stefan-marr.de/2025/08/how-to… The research was led by Humphrey Burchell.

Peter Boncz (@peterabcz)

Tobias Schmidt (TUM) at VLDB 2025 🇬🇧 presented SQLStorm, which uses LLMs to generate a huge amount of large queries. SQLStorm now has 18K different complex queries and runs on a large real-world dataset (stackoverflow) paper: vldb.org/pvldb/vol18/p4… code: github.com/SQL-Storm/SQLS…

Xuanwo (@onlyxuanwo)

People asked me about how OpenDAL makes money: the answer is it doesn’t. OpenDAL is for public goods, it helps you to access storage services and make money 🫡

Andy Pavlo (@andypavlo.bsky.social) (@andy_pavlo)

The sordid backstory is that there was a collaboration attempt to unify on a single format with CMU, Tsinghua, Meta, CWI, Voltron, Nvidia, and SpiralDB. The plan was to create a consortium and start with Meta's Nimble. But then lawyers got involved and it all fell apart.

Andy Pavlo (@andypavlo.bsky.social) (@andy_pavlo)

So instead of working together, everyone (including us) released their own format: → velox-lib Nimble: github.com/facebookincuba… → CWI DA FastLanes: github.com/cwida/FastLanes → Spiral Vortex: vortex.dev

Andrew Lamb (@andrewlamb1111)

Our new thrift parser in the Rust Apache Parquet implementation is a 🎁 that keeps on giving performance wise 🚀 github.com/apache/arrow-r… We are also working on a blog post that has a deeper explanation

Sherin Jacob (@jcsherin)

New post -- A B+Tree Node Underflows: Merge or Borrow? jacobsherin.com/posts/2025-08-… An interesting engineering trade-off I stumbled upon implementing a concurrent B+Tree from scratch; where production databases diverge from textbook algorithms, and each does it their own way.
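For readers unfamiliar with the trade-off named in the post, here is a minimal sketch of the textbook policy: borrow from a sibling that has spare keys, otherwise merge. It is not the approach from the linked post or from any particular database. Leaves are modeled as plain sorted lists, parent separator updates are omitted, and `rebalance_leaf` and `min_keys` are hypothetical names.

```python
def rebalance_leaf(node, left, right, min_keys):
    """Fix an underflowing leaf in place and report which action was taken.

    node, left, right are sorted lists of keys; a missing sibling is None.
    Updating the parent's separator keys is deliberately left out.
    """
    if len(node) >= min_keys:
        return "ok"

    # Borrow: a sibling holding more than the minimum donates its nearest key.
    if left is not None and len(left) > min_keys:
        node.insert(0, left.pop())
        return "borrow_left"
    if right is not None and len(right) > min_keys:
        node.append(right.pop(0))
        return "borrow_right"

    # Merge: no sibling can spare a key, so fold two nodes into one.
    if left is not None:
        left.extend(node)
        node.clear()
        return "merge_left"
    if right is not None:
        node.extend(right)
        right.clear()
        return "merge_right"

    # No siblings at all: the node is the root, which is allowed to underflow.
    return "root"
```

Borrowing touches two nodes and keeps the tree shape. Merging frees a node but can push the underflow up to the parent, which is why the choice matters under concurrency.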

v (@iavins)

We use asserts all the time in Turso DB and also in the Turso Server. They're in release builds and shipped to production. And yes, they could crash the server. Asserts are my favorites, and I use them whenever possible. Just yesterday I merged a PR that contained asserts and

Angelo 🇵🇷 (@ngeloxyz)

First one is: "Speedrunning the lakehouse" by Jacopo Tagliabue (CTO of Bauplan) He asks: What if we started from scratch? Building a lakehouse infrastructure from scratch. Hilarious, funny, and informative youtube.com/watch?v=dvBRC9…

Wes McKinney (@wesmckinn)

Excited to announce a new side project, a power user terminal UI for your personal finances: moneyflow.dev For years I've used personal finance tools like Mint and now Monarch. The data cleaning can be slow and tedious, so I made this to speed that up!

Andrew Lamb (@andrewlamb1111)

ApacheDataFusion's policy for AI-assisted contributions: AI is great, but AI dumps are not. Maintainers could finish the task faster by using AI directly, and submitters gain little knowledge when acting as a pass-through AI proxy. datafusion.apache.org/contributor-gu…

Xuanwo (@onlyxuanwo)

For everyone interested in data infra, want to get a quick sense of how big data works, how data systems are designed, and what the tradeoffs are, start with this share from Xiangpeng Hao, really nice intro! intro-data-system.xiangpeng.systems

Jasim (@jasim_ab)

Been working on a tiny LLM service to help me write prompts just like regular well-typed application code. Here's a sample use case - map freeform text to an address form:

samlaf (@samlafer)

New paper by Nancy Lynch summarizing her career's influence on the field of distributed computing. arxiv.org/pdf/2502.20468 If you don't know who she is, she's the L in FLP and DLS. Marc Brooker has a good summary article: brooker.co.za/blog/2014/05/1…
