Xin Zhang | 张鑫 (@xinzhangai) 's Twitter Profile
Xin Zhang | 张鑫

@xinzhangai

NLP | LLM | AI, PhD student, GTE/GME. Battling with my **long-covid** 💪

ID: 1026157229815169024

Link: https://izhx.github.io/
Joined: 05-08-2018 17:25:23

137 Tweets

158 Followers

437 Following

Tengyu Ma (@tengyuma) 's Twitter Profile Photo

We benchmarked 10 optimizers and found that the recent new optimizers still have limited speed up (~10%) over Adam at a "larger" scale (1.2B, 8x data than Chinchilla optimal). I guess that means more research to be done in this area!

Xin Zhang | 张鑫 (@xinzhangai) 's Twitter Profile Photo

Impressive models! 🚀 Congrats!!! We now have a new SOTA encoder backbone. Our mGTE-MLM-base outperformed XLM-R-base in summer 2024 (or January, going by training date); it's a "semi-modern" encoder 😃. Glad to see it stands as one of only two baselines, alongside XLM-R.

Nandan Thakur (@beirmug) 's Twitter Profile Photo

Really excited to share that FreshStack has been accepted at #neurips25 D&B Track (poster)! 🥁🥁 Huge congratulations to all my Databricks Mosaic Research co-authors! Time to see you in San Diego! 🍻

Percy Liang (@percyliang) 's Twitter Profile Photo

-2016 (classic era): focus on data efficiency
2017-2025 (pretraining era): focus on compute efficiency
2026-: focus on data efficiency (again)

The standard Transformer paradigm is optimized for compute efficiency. As we look at data efficiency, we'll see very different design

Femke Plantinga (@femke_plantinga) 's Twitter Profile Photo

Should you fine-tune your embedding model? (Spoiler: probably not 𝘺𝘦𝘵) 𝘉𝘦𝘧𝘰𝘳𝘦 jumping into fine-tuning, ask yourself: is your retrieval pipeline actually failing because of domain-specific knowledge gaps, or could it be something simpler? Here's what to check first:

Xueguang Ma (@xueguang_ma) 's Twitter Profile Photo

Two headaches I had with existing Search APIs: 1) needing an extra crawl API to get the full context; 2) inaccurate filtering over dates, which leads to search-time contamination for time-sensitive data. Perplexity Search API looks like a good replacement.

tomaarsen (@tomaarsen) 's Twitter Profile Photo

dr. jack morris Search will always be necessary. If you wanted to teach a human to be as smart as possible, you'd teach them how to find information and critical thinking, instead of trying to teach them everything about everything.

Thinking Machines (@thinkymachines) 's Twitter Profile Photo

LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.

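To make the comparison concrete, here is a minimal sketch of the LoRA idea: instead of updating a full weight matrix, train a low-rank adapter pair on top of the frozen weight. The dimensions, scaling, and NumPy formulation below are illustrative assumptions, not the post's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 8, 16

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init

def lora_forward(x):
    # Base path plus low-rank adapter path, scaled by alpha / r.
    # With B = 0 at init, the adapter is a no-op: output == W @ x.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W @ x)  # starts identical to the base model

# Trainable parameters: r * (d_in + d_out) for LoRA vs d_in * d_out for full FT.
print(r * (d_in + d_out), "vs", d_in * d_out)
```

Only A and B receive gradients, which is why LoRA can approach full fine-tuning quality at a fraction of the trainable parameter count.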
tomaarsen (@tomaarsen) 's Twitter Profile Photo

We're announcing a new update to MTEB: RTEB It's a new multilingual text embedding retrieval benchmark with private (!) datasets, to ensure that we measure true generalization and avoid (accidental) overfitting. Details in our blogpost below 🧵

Weiwei Sun (@sunweiwei12) 's Twitter Profile Photo

Context engineering is key to building LLM agents. Can we let agents actively manage their own context? We introduce Context-Folding, giving agents the ability to branch and compress their context. Trained with RL on Search and SWE tasks, it beats ReAct using 10× less context.

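The branch-and-compress idea above can be sketched as follows. This is a hypothetical illustration of the mechanism, not the paper's implementation: `summarize` stands in for an LLM call that compresses a finished side-branch into one line of the main context.

```python
def summarize(messages):
    # Placeholder for an LLM summarization call (an assumption, not the paper's API).
    return f"[folded branch: {len(messages)} steps, result={messages[-1]}]"

class FoldingContext:
    def __init__(self):
        self.main = []      # the trajectory the model keeps seeing
        self.branch = None  # an active side-branch, if any

    def append(self, msg):
        (self.branch if self.branch is not None else self.main).append(msg)

    def open_branch(self):
        self.branch = []

    def fold_branch(self):
        # Collapse the finished branch into a single summary entry.
        self.main.append(summarize(self.branch))
        self.branch = None

ctx = FoldingContext()
ctx.append("user: fix the failing test")
ctx.open_branch()
for step in ["grep for test name", "read file", "patch applied"]:
    ctx.append(step)
ctx.fold_branch()
print(ctx.main)  # the main context holds one summary line instead of three raw steps
```

Because folded branches only contribute their summaries, the working context stays small even as the agent explores many sub-tasks.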
Mixedbread (@mixedbreadai) 's Twitter Profile Photo

One More (Small) Thing: Introducing mxbai-colbert-edge-v0 17M and 32M. They are the result of an easily reproducible way to train ColBERT models from scratch. They're strong, too: the 17M variant would rank first on the LongEmbed leaderboard for models under 1B parameters.

clem 🤗 (@clementdelangue) 's Twitter Profile Photo

The main breakthrough of GPT-5 was to route your messages between a couple of different models to give you the best, cheapest & fastest answer possible. This is cool but imagine if you could do this not only for a couple of models but hundreds of them, big and small, fast and

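A toy sketch of what such routing might look like: score each candidate model on expected quality, cost, and latency, and dispatch to the best trade-off. The model names, numbers, and weights below are entirely made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    quality: float  # expected answer quality, 0..1
    cost: float     # $ per 1K tokens
    latency: float  # seconds

def route(models, hard_query, cost_weight=1.0, latency_weight=0.1):
    # Hard queries heavily weight quality; easy ones let cheap/fast models win.
    q_weight = 10.0 if hard_query else 0.5
    def score(m):
        return q_weight * m.quality - cost_weight * m.cost - latency_weight * m.latency
    return max(models, key=score)

models = [
    Model("big-reasoner", quality=0.95, cost=0.60, latency=8.0),
    Model("mid-generalist", quality=0.80, cost=0.10, latency=2.0),
    Model("tiny-fast", quality=0.55, cost=0.01, latency=0.3),
]

print(route(models, hard_query=True).name)   # -> big-reasoner
print(route(models, hard_query=False).name)  # -> tiny-fast
```

A production router would of course estimate quality per query (e.g. with a learned classifier) rather than use fixed scores.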
Pamela Fox (@pamelafox) 's Twitter Profile Photo

At #PyBay25, Guido van Rossum demo'd a Python package for "structured RAG". During ingestion, it uses an LLM to extract structured data (entities/topics/verbs) and stores it in a standard DB, then retrieves by structuring the user query as well. Try it out at: github.com/microsoft/type…

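The ingest-then-structured-query flow described above can be sketched like this. This is a hypothetical illustration of the pattern, not the actual package: `extract` fakes the LLM extraction step with canned tuples, and SQLite plays the role of the standard DB.

```python
import sqlite3

def extract(text):
    # Stand-in for the LLM step; returns (entity, verb, topic) tuples.
    fake = {
        "Guido demoed structured RAG at PyBay": [("Guido", "demoed", "structured RAG")],
        "Who demoed structured RAG?": [(None, "demoed", "structured RAG")],
    }
    return fake[text]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE facts (entity TEXT, verb TEXT, topic TEXT, source TEXT)")

# Ingestion: structure each document and store the tuples in an ordinary table.
doc = "Guido demoed structured RAG at PyBay"
for e, v, t in extract(doc):
    db.execute("INSERT INTO facts VALUES (?, ?, ?, ?)", (e, v, t, doc))

# Retrieval: structure the user query the same way, then filter on the slots.
(_, verb, topic), = extract("Who demoed structured RAG?")
rows = db.execute(
    "SELECT source FROM facts WHERE verb = ? AND topic = ?", (verb, topic)
).fetchall()
print(rows)  # -> [('Guido demoed structured RAG at PyBay',)]
```

The appeal is that retrieval becomes an exact query over extracted structure rather than a fuzzy embedding search.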
Qwen (@alibaba_qwen) 's Twitter Profile Photo

Qwen Deep Research just got a major upgrade. ⚡️ It now creates not only the report, but also a live webpage 🌐 and a podcast 🎙️ - Powered by Qwen3-Coder, Qwen-Image, and Qwen3-TTS. Your insights, now visual and audible. ✨ 👉 chat.qwen.ai/?inputFeature=…

Wenyue Hua (@huawenyue31539) 's Twitter Profile Photo

Hi all, I am hosting a dinner party on 11.5 at EMNLP this year! We've invited a bunch of VCs and startup folks, plus fantastic panelists to talk about embodied AI and LLM agents. All are welcome to attend!!
