jasmine wang
@jasminechenwang
Yoga, Cognitive Science, Open Source technology, Startups, Sunset at the Beach
ID: 1542231877158785024
29-06-2022 19:42:18
42 Tweet
36 Followers
40 Following
Live at #DataAISummit from Databricks, Databricks AI Research and LanceDB. A joint talk by changhiskhan and Zero Qu. Congrats to both teams on the newly announced storage-optimized vector search. Now we take billion-vector scale to the moon!
Missed Ethan's talk at @DataCouncilAI 2025? He shares how @RunwayML tackles multimodal data challenges, and how LanceDB helps store, query, and retrieve it all efficiently. Watch here: youtube.com/watch?v=6kf58x… Ethan's slides: ethanrosenthal.com/keynotes/data_… #LanceDB
Today we're announcing our $30 million Series A. This round is led by Theory VC with support from CRV, Y Combinator, Databricks, Runway, Zero Prime VC, swiftventurecapital, and more. Your belief in a future powered by multimodal data brings us one step closer to that reality.
We just published a new blog (lancedb.com/blog/multimoda…) on what the Multimodal Lakehouse actually does. The Lakehouse is for those working with a mix of text, images, audio, and structured data - who wish to avoid the pain of
swyx dr. jack morris - built by solid db people and hackable (we have a contributor at nomic to it) - used by top ai companies / labs / products for its nice properties when used in training loops (e.g. midjourney has been using it since 2023) so probably not going anywhere - feels like the right
Video from Toronto Machine Learning Society (TMLS): Character.AI x @LanceDB on building a unified multimodal data lake, a single system for text, audio, video & image retrieval. changhiskhan Ryan Vilim Simpler pipelines, lower infra costs, faster AI dev. Watch: youtu.be/8zMeYwR9uQI #AI #LLM
The data prep bottleneck for fine-tuning LLMs is a common challenge. Our new integration with Meta's Synthetic Data Kit is here to fix that! It simplifies the entire workflow with a straightforward CLI for
When building a columnar file reader, it becomes clear that structure is not just an abstract concept. (t.ly/3AyJh) It is the set of rules that determines how every byte of data is stored and accessed on disk. A few months ago,
Join us for our webinar on Apache Spark™ and Lance Spark Connector with Jack Ye (LanceDB) on September 25! Learn how the Lance Spark Connector enables Apache Spark™ to work with Lance's AI-native multimodal storage. We'll look at how Spark can handle embeddings, images,
1/5 LanceDB 🫶🏻 DuckDB We're happy to announce a new Lance extension for DuckDB! You can simply install this extension in DuckDB and point at your Lance datasets from within a DuckDB CLI or a Python script, while getting full SQL capabilities on top of
1/6 Here's a quick example of how to read Hugging Face datasets via LanceDB. Start with opening a LanceDB connection to a dataset on the Hub using the hf:// prefix path.
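The thread's own snippet is not included in this capture, so here is a minimal sketch of the first step it describes. It assumes the path follows the hf://datasets/<org>/<dataset> layout used by Hugging Face Hub filesystem paths; the repo name "example-org/example-dataset" and both helper functions are hypothetical, and the exact path LanceDB expects may differ. Only lancedb.connect is a real LanceDB call here.

```python
def hub_dataset_uri(org: str, dataset: str) -> str:
    """Build an hf:// URI for a dataset repo on the Hugging Face Hub.

    Assumption: the hf://datasets/<org>/<dataset> layout, borrowed from
    the convention used for Hugging Face Hub filesystem paths.
    """
    if not org or not dataset:
        raise ValueError("org and dataset must be non-empty")
    return f"hf://datasets/{org}/{dataset}"


def open_hub_connection(org: str, dataset: str):
    """Open a LanceDB connection at the hf:// path.

    Requires the lancedb package and network access; the import is lazy
    so the URI helper above can be used without lancedb installed.
    """
    import lancedb

    return lancedb.connect(hub_dataset_uri(org, dataset))


if __name__ == "__main__":
    # Hypothetical repo name, used only to show the resulting path.
    print(hub_dataset_uri("example-org", "example-dataset"))
```

With a real repo name, open_hub_connection(...) would return a connection object on which tables can be listed and opened, assuming the installed LanceDB version supports hf:// paths as the thread says.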
3/3 Huge thanks to: Xin Sun (@bytedance) for driving the R-Tree implementation; Jay Narale (Uber) for the BKD prototype + benchmarking; Kyle Barron kylebarron.dev on bsky (#GeoArrow / #GeoDataFusion) for ecosystem guidance; Tim Saucer (#ApacheDataFusion) for helping ensure a clean