Willie Neiswanger (@willieneis) 's Twitter Profile
Willie Neiswanger

@willieneis

Assistant Professor @USC in CS + AI. Previously @Stanford, @SCSatCMU. Machine Learning, Decision Making, AI-for-Science, Generative AI, Uncertainty, ML Systems.

ID: 25907462

Link: https://willieneis.github.io
Joined: 22-03-2009 23:38:31

175 Tweets

1.1K Followers

244 Following

LLM360 (@llm360) 's Twitter Profile Photo

Please welcome K2-65B🏔️, the most performant fully-open LLM released to date. As a blueprint for open-source AGI, we release all model checkpoints, code, logs, and data.

About K2:
🧠 65 billion parameters
🪟 Fully transparent & reproducible
🔓 Apache 2.0
📈 Outperforms Llama 2 70B

Colin White (@crwhite_ml) 's Twitter Profile Photo

🚨Llama 3.1 405B eval just dropped🚨
🥇 in instruction following
🥈 in reasoning
On par with GPT-4o in math and coding
It’s a great day for the open-source community!!
Full evals on the challenging, contamination-free benchmark ➡️ livebench.ai
LLM360 (@llm360) 's Twitter Profile Photo

✨ Check out our revamped repo!

Analysis360: Open Implementations of LLM Analyses

🔗 github.com/LLM360/Analysi…

Featuring tutorials on:
💾 Data memorization
🧠 LLM unlearning
⚖️ AI safety, toxicity, & bias
🔍 Mechanistic interpretability
📊 Evaluation metrics
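The data-memorization tutorial mentioned above concerns checking whether a model reproduces training text verbatim. A minimal illustrative sketch of that idea (the function names and toy "model" here are hypothetical, not Analysis360's API): prompt with a training-text prefix and test whether the greedy continuation reproduces the true suffix.

```python
# Hypothetical sketch of a verbatim-memorization check; `generate`
# stands in for a model's greedy decoding and is illustrative only.
def memorized(generate, prefix, true_suffix):
    """True if the model's continuation of `prefix` starts with `true_suffix`."""
    return generate(prefix).startswith(true_suffix)

# Toy "model" that has memorized exactly one training string:
corpus = {"The quick brown": " fox jumps"}
fake_generate = lambda p: corpus.get(p, "")

print(memorized(fake_generate, "The quick brown", " fox"))  # True
print(memorized(fake_generate, "Some other text", " fox"))  # False
```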
Yisong Yue (@yisongyue) 's Twitter Profile Photo

Quantifying the Value of Information is generally intractable, and prior work uses heuristic approximations that are still quite expensive.

We propose PS-BAX, which extends posterior sampling to the Bayesian Algorithm Execution setting:
arxiv.org/abs/2410.20596
(appearing at
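The core idea above can be sketched in a few lines: posterior sampling for Bayesian Algorithm Execution draws one function sample from the surrogate posterior, runs the base algorithm on that sample, and queries where the algorithm would evaluate. This is a toy numpy sketch under assumed details (RBF GP surrogate, argmax as the base algorithm, in which case the procedure reduces to Thompson sampling), not the authors' implementation.

```python
import numpy as np

def rbf_kernel(a, b, ls=0.5):
    """Squared-exponential kernel on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior_sample(x_train, y_train, x_grid, rng, noise=1e-6):
    """Draw one function sample from a GP posterior on a grid."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_grid)
    Kss = rbf_kernel(x_grid, x_grid)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y_train
    cov = Kss - Ks.T @ Kinv @ Ks
    return rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(x_grid)))

def ps_bax_step(x_train, y_train, x_grid, algorithm, rng):
    """One posterior-sampling BAX step: run the base algorithm on a
    posterior sample and query the point it selects."""
    f_sample = gp_posterior_sample(x_train, y_train, x_grid, rng)
    return algorithm(x_grid, f_sample)

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)
x_train = np.array([0.1, 0.5, 0.9])
y_train = f(x_train)
x_grid = np.linspace(0, 1, 50)

# Base algorithm here: argmax, so PS-BAX reduces to Thompson sampling.
argmax_alg = lambda xs, fs: xs[np.argmax(fs)]
x_next = ps_bax_step(x_train, y_train, x_grid, argmax_alg, rng)
print(0.0 <= x_next <= 1.0)  # True
```

Swapping in a different base algorithm (level-set estimation, shortest path, top-k) changes only the `algorithm` callable; the sampling step stays the same.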
JB (@iamjbdel) 's Twitter Profile Photo

SuperCharged Euclid is on 🤗 Hugging Face

Also, this is the best paper heading I’ve seen in quite some time. The header looks fantastic.

(⚡Llama 3.3) Chat with the paper: huggingface.co/spaces/hugging…
🤗 Model: huggingface.co/euclid-multimo…
🤗 Dataset: huggingface.co/datasets/eucli…
🤗 Paper:
Jiarui Zhang (Jerry) (@jiaruiz58876329) 's Twitter Profile Photo

[1/11] Many recent studies have shown that current multimodal LLMs (MLLMs) struggle with low-level visual perception (LLVP) — the ability to precisely describe the fine-grained/geometric details of an image.

How can we do better?

Introducing Euclid, our first study at improving
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

Tina: Tiny Reasoning Models via LoRA

"the best Tina model achieves a >20% reasoning performance increase and 43.33% Pass@1 accuracy on AIME24, at only $9 USD post-training and evaluation cost (i.e., an estimated 260x cost reduction). Our work reveals the surprising effectiveness
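The Pass@1 figure quoted above is simply the fraction of problems whose first sampled answer is correct. A minimal illustrative computation with toy data (not the paper's results):

```python
# Pass@1 with one sample per problem: fraction of problems whose
# first sampled answer is correct. Toy data, purely illustrative.
correct_first_try = [True, False, True]  # per-problem grading
pass_at_1 = sum(correct_first_try) / len(correct_first_try)
print(round(pass_at_1, 4))  # 0.6667
```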
Shangshang Wang (@upupwang) 's Twitter Profile Photo

😋 Want strong LLM reasoning without breaking the bank? We explored just how cost-effectively RL can enhance reasoning using LoRA!

[1/9] Introducing Tina: A family of tiny reasoning models with strong performance at low cost, providing an accessible testbed for RL reasoning. 🧵
Sebastian Raschka (@rasbt) 's Twitter Profile Photo

Is LoRA (Low Rank Adaptation) relevant in 2025 for reasoning models?

I recently read "Tina: Tiny Reasoning Models via LoRA (arxiv.org/abs/2504.15777)", and it made me pause for a moment: when was the last time I heard someone excitedly talk/write about LoRA?

LoRA (Low-Rank
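For reference, the LoRA mechanism the thread discusses fits in a few lines: freeze the pretrained weight W and learn only a low-rank update BA, so the adapted layer computes W x + (alpha/r) * B A x. A minimal numpy sketch (shapes and the zero-init of B follow the standard recipe; everything else is illustrative):

```python
import numpy as np

# LoRA sketch: frozen weight W plus trainable low-rank update B @ A.
d_out, d_in, r, alpha = 8, 16, 2, 4
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # zero-init: adapter starts as a no-op

def lora_forward(x):
    """Adapted layer: base path plus scaled low-rank correction."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B = 0, the output matches the frozen base layer exactly:
print(np.allclose(lora_forward(x), W @ x))  # True
```

Only A and B are trained, so the trainable-parameter count is r*(d_in + d_out) instead of d_in*d_out, which is where the low post-training cost comes from.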
Deqing Fu (@deqingfu) 's Twitter Profile Photo

Textual steering vectors can improve visual understanding in multimodal LLMs!

You can extract steering vectors via any interpretability toolkit you like -- SAEs, MeanShift, Probes -- and apply them to image or text tokens (or both) of Multimodal LLMs. 
And They Steer!
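The MeanShift variant mentioned above is the simplest to sketch: take the difference of mean hidden states between activations that do and do not express a concept, then add that direction to hidden states at inference. The numbers and strength below are toy assumptions, not the paper's recipe.

```python
import numpy as np

# Mean-shift steering sketch: steering vector = difference of mean
# activations between two concept sets; applied additively at inference.
rng = np.random.default_rng(0)
d = 64
h_concept = rng.normal(loc=1.0, size=(100, d))   # activations with concept
h_baseline = rng.normal(loc=0.0, size=(100, d))  # activations without

steer = h_concept.mean(axis=0) - h_baseline.mean(axis=0)

def apply_steering(hidden, vec, strength=2.0):
    """Add the steering direction to every token's hidden state."""
    return hidden + strength * vec

tokens = rng.normal(size=(5, d))  # stand-in hidden states for 5 tokens
steered = apply_steering(tokens, steer)
print(steered.shape == (5, d))  # True
```

In an MLLM one would apply `apply_steering` to the hidden states of image tokens, text tokens, or both, as the tweet describes; probe- or SAE-derived vectors slot into the same additive interface.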
Shangshang Wang (@upupwang) 's Twitter Profile Photo

Sparse autoencoders (SAEs) can be used to elicit strong reasoning abilities with remarkable efficiency.

Using only 1 hour of training at $2 cost without any reasoning traces, we find a way to train 1.5B models via SAEs to score 43.33% Pass@1 on AIME24 and 90% Pass@1 on AMC23.
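For context, a sparse autoencoder of the kind referenced above maps a model's hidden state into an overcomplete, nonnegative (hence sparse-friendly) feature space and linearly decodes it back. A toy forward pass under standard assumptions (ReLU encoder, linear decoder; sizes are illustrative):

```python
import numpy as np

# Toy sparse-autoencoder forward pass: ReLU encoder into an
# overcomplete latent dictionary, linear decoder back to model space.
rng = np.random.default_rng(0)
d_model, d_sae = 32, 128  # latent is wider than the residual stream

W_enc = rng.normal(size=(d_sae, d_model)) * 0.1
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_model, d_sae)) * 0.1

def sae_forward(h):
    """Return (reconstruction, sparse feature activations)."""
    z = np.maximum(0.0, W_enc @ h + b_enc)  # ReLU keeps features nonnegative
    return W_dec @ z, z

h = rng.normal(size=d_model)
recon, z = sae_forward(h)
print(recon.shape == (32,) and (z >= 0).all())  # True
```

Training minimizes reconstruction error plus a sparsity penalty on `z`; steering-style interventions then amplify or suppress individual feature directions in the decoder.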