jasmine wang (@jasminechenwang) 's Twitter Profile
jasmine wang

@jasminechenwang

Yoga, Cognitive Science, Open Source technology, Startups, Sunset at the Beach

ID: 1542231877158785024

Joined: 29-06-2022 19:42:18

42 Tweets

36 Followers

40 Following

LanceDB (@lancedb) 's Twitter Profile Photo

Live at #DataAISummit from Databricks, Databricks AI Research, and LanceDB. A joint talk by changhiskhan and Zero Qu. Congrats to both teams on the newly announced storage-optimized vector search. Now we take billion-vector scale to the moon!

LanceDB (@lancedb) 's Twitter Profile Photo

Missed Ethan’s talk at @DataCouncilAI 2025? 🎤 He shares how @RunwayML tackles multimodal data challenges, and how LanceDB helps store, query, and retrieve it all efficiently. 🎥 Watch here: youtube.com/watch?v=6kf58x… Ethan's slides: ethanrosenthal.com/keynotes/data_… #LanceDB

LanceDB (@lancedb) 's Twitter Profile Photo

Today we’re announcing our $30 million Series A. This round is led by Theory VC with support from CRV, Y Combinator, Databricks, Runway, Zero Prime VC, swiftventurecapital, and more. Your belief in a future powered by multimodal data brings us one step closer to that reality.

LanceDB (@lancedb) 's Twitter Profile Photo

We just published a new blog (lancedb.com/blog/multimoda…) on what the Multimodal Lakehouse actually does. The Lakehouse is for those working with a mix of text, images, audio, and structured data - who wish to avoid the pain of

Andriy Mulyar (@andriy_mulyar) 's Twitter Profile Photo

swyx dr. jack morris - built by solid db people and hackable (we have a contributor at nomic to it) - used by top ai companies / labs / products for its nice properties when used in training loops (e.g. midjourney has been using it since 2023) so probably not going anywhere - feels like the right

LanceDB (@lancedb) 's Twitter Profile Photo

🚀 Video from Toronto Machine Learning Society (TMLS):
Character.AI x @LanceDB on building a unified multimodal data lake, a single system for text, audio, video & image retrieval.
changhiskhan, Ryan Vilim

Simpler pipelines, lower infra costs, faster AI dev.

🎥 Watch: youtu.be/8zMeYwR9uQI

#AI #LLM

LanceDB (@lancedb) 's Twitter Profile Photo

The data prep bottleneck for fine-tuning LLMs is a common challenge. Our new integration with Meta's Synthetic Data Kit is here to fix that! It simplifies the entire workflow with a straightforward CLI for

LanceDB (@lancedb) 's Twitter Profile Photo

When building a columnar file reader, it becomes clear that structure is not just an abstract concept. (t.ly/3AyJh) It is the set of rules that determines how every byte of data is stored and accessed on disk. A few months ago,

Apache Spark (@apachespark) 's Twitter Profile Photo

Join us for our webinar on Apache Spark™ and Lance Spark Connector with Jack Ye (LanceDB) on September 25! Learn how the Lance Spark Connector enables Apache Spark™ to work with Lance’s AI-native multimodal storage. ✅ We’ll look at how Spark can handle embeddings, images,

changhiskhan (@changhiskhan) 's Twitter Profile Photo

This is a big milestone for Lance format. The F3 paper (dl.acm.org/doi/10.1145/37…) verified that Lance has THE fastest random access, essential for search, shuffle, and many other AI workloads. But it incorrectly assumed it was because of lack of compression. With 2.1, we show

LanceDB (@lancedb) 's Twitter Profile Photo

1/7 🎨 In a world of infinite scroll, discovering art still feels like searching for a needle in a haystack. With SemanticDotArt, we flipped the question: What if you searched by mood, not just metadata? See how we did this in LanceDB 👇🏽

LanceDB (@lancedb) 's Twitter Profile Photo

1/5 LanceDB 🫶🏻 DuckDB

We’re happy to announce a new Lance extension for DuckDB! You can simply install this extension in DuckDB and point at your Lance datasets from within a DuckDB CLI or a Python script, while getting full SQL capabilities on top of

LanceDB (@lancedb) 's Twitter Profile Photo

1/5 Large multimodal blobs don’t have to break dataset workflows. Images and videos are often treated as external files, separate from metadata and indexes. Once datasets get large, that split makes exploration, curation, and training painful. Lance changes that on the 🤗

LanceDB (@lancedb) 's Twitter Profile Photo

1/6 Here’s a quick example of how to read Hugging Face datasets via LanceDB. Start with opening a LanceDB connection to a dataset on the Hub using the hf:// prefix path.

Julien Chaumond (@julien_c) 's Twitter Profile Photo

in case you missed it LanceDB and HF are partnering up to unlock the next generation of large dataset storage on the Hub 🔥

And it's fire!

- Supports storing embeddings (and their indexes) directly alongside the data
- Vector search / similarity search is built-in
- Large
LanceDB (@lancedb) 's Twitter Profile Photo

1/4 Branching for ML data shouldn’t slow down production. Iceberg branching → shared metadata bottlenecks. Delta shallow clone → isolation, but loses Git-like UX. We want both. Here’s how Lance unifies branching, tagging, and shallow clone for AI workloads 🧵

LanceDB (@lancedb) 's Twitter Profile Photo

1/3 Geospatial support just landed in Lance. And no new storage format work was required. Because Lance is Arrow-native, GeoArrow extension types work out of the box. Geometry columns are preserved end-to-end with zero special casing.

LanceDB (@lancedb) 's Twitter Profile Photo

3/3 Huge thanks to:
🙌 Xin Sun (@bytedance) for driving the R-Tree implementation
🙌 Jay Narale (Uber) for the BKD prototype + benchmarking
🙌 Kyle Barron kylebarron.dev on bsky (#GeoArrow / #GeoDataFusion) for ecosystem guidance
🙌 Tim Saucer (#ApacheDataFusion) for helping ensure a clean

AI่ถ…ๅ…ƒๅŸŸ (@aisuperdomain) 's Twitter Profile Photo

๐Ÿš€ๆƒณไธๅˆฐ่ฟ™ไธชไธบOpenClawๅฎšๅˆถ็š„ๅขžๅผบ็‰ˆLanceDBๆ’ไปถ้กน็›ฎๅ‘ๅธƒๆ‰ไธ‰ๅคฉ๏ผŒ้ƒฝๆœ‰ๅคงไฝฌๅผ€ๅง‹้€’ไบคprไบ†ใ€‚ #OpenClaw github.com/win4r/memory-lโ€ฆ