Nicholas Roberts (@nick11roberts) 's Twitter Profile
Nicholas Roberts

@nick11roberts

Ph.D. student @WisconsinCS. Working on foundation models and breaking past scaling laws. Previously at CMU @mldcmu, UCSD @ucsd_cse, FCC @fresnocity.

ID: 558161705

Link: http://nick11roberts.science · Joined: 19-04-2012 23:55:34

469 Tweets

1.1K Followers

1.1K Following

Harit Vishwakarma (@harit_v) 's Twitter Profile Photo

Excited to be at ICML’25!!! I'll present papers on improving LLM inference and evaluation and pseudolabeling-based semi-supervised learning.

Come and say hi during these sessions, or chat anytime during the week!

[C1]. Prune 'n Predict: Optimizing LLM Decision-making with
Fred Sala (@fredsala) 's Twitter Profile Photo

Heading to #ICML! I’ll be representing SprocketLab at UW–Madison and Snorkel AI. Reach out if you want to chat about data-centric AI, data development, agents, and foundation models.

Harit Vishwakarma (@harit_v) 's Twitter Profile Photo

Join us today in the morning poster session at #ICML2025.

We will talk about some neat ways for reducing uncertainty and improving LLM accuracy at test-time on multi-choice tasks (e.g., tool selection) using conformal prediction and an additional inference round.

📍 East
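As I read the tweet, the idea is to use a conformal prediction set to prune unlikely answer options and then spend one more inference round on the survivors. Below is a minimal, hedged sketch of that recipe, not the paper's code; `query_llm`, `alpha`, and the calibration setup are hypothetical placeholders.

```python
# Illustrative sketch (not the paper's code): prune multiple-choice options with a
# split-conformal set, then spend one extra inference round on the survivors.
import numpy as np

def conformal_threshold(cal_scores: np.ndarray, alpha: float = 0.1) -> float:
    """Split-conformal quantile from calibration nonconformity scores
    (here: 1 - probability assigned to the true option)."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return float(np.quantile(cal_scores, min(q, 1.0), method="higher"))

def prune_options(option_probs: np.ndarray, qhat: float) -> list[int]:
    """Keep options whose nonconformity (1 - prob) is within the conformal threshold."""
    keep = [i for i, p in enumerate(option_probs) if 1.0 - p <= qhat]
    return keep or [int(np.argmax(option_probs))]  # never return an empty set

# --- usage sketch; `query_llm` is a hypothetical function returning option probabilities ---
# cal_scores = 1.0 - probs_of_true_option_on_calibration_set
# qhat = conformal_threshold(cal_scores, alpha=0.1)
# probs = query_llm(question, options)                            # first inference round
# survivors = prune_options(probs, qhat)
# answer = query_llm(question, [options[i] for i in survivors])   # second, pruned round
```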
Harit Vishwakarma (@harit_v) 's Twitter Profile Photo

Next up this morning at #ICML2025, we will be presenting our work on pseudolabeling-based semi-supervised learning (SSL).

East Exhibition Hall A&B # E-1304, 11 am to 1:30 pm
Paper: openreview.net/pdf?id=w4c5bLk…

Pseudolabeling-based SSL relies on the model’s confidence scores and
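The tweet is truncated here, but pseudolabeling-style SSL generally uses the model's confidence to decide which unlabeled points receive pseudolabels. Below is a generic sketch of one confidence-thresholded round, not the paper's method; the 0.95 threshold and scikit-learn model are assumptions.

```python
# Generic confidence-thresholded pseudolabeling sketch (not the paper's method).
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudolabel_round(model, X_unlabeled: np.ndarray, threshold: float = 0.95):
    """Return (X, y) pairs where the model's max class probability clears the threshold."""
    probs = model.predict_proba(X_unlabeled)
    conf = probs.max(axis=1)
    mask = conf >= threshold          # the confidence score the tweet refers to
    return X_unlabeled[mask], probs[mask].argmax(axis=1)

# usage sketch: train on labeled data, pseudolabel the confident unlabeled points, retrain
# model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
# X_pl, y_pl = pseudolabel_round(model, X_unlab, threshold=0.95)
# model = LogisticRegression(max_iter=1000).fit(
#     np.vstack([X_lab, X_pl]), np.concatenate([y_lab, y_pl]))
```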
Tzu-Heng Huang (@zihengh1) 's Twitter Profile Photo

LLM judges are powerful for automated evaluation but expensive and biased.📣 Meet PAJAMA, a new framework that distills LLM judging logic into a compact, executable form (a new representation), cutting costs from thousands to just cents.🚀 We'll present at ICML PRAL on Friday!
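The tweet doesn't show PAJAMA's actual representation, but the cost argument suggests a pattern like: pay for the expensive LLM judge once to produce an executable checker, then run that checker on every example. A hedged sketch of that general shape only; `generate` and the prompt are hypothetical, and real systems would sandbox generated code.

```python
# Hedged sketch of the general idea as I read the tweet (not PAJAMA itself):
# call an expensive LLM judge once to produce an executable checker, then run that
# checker on every example for roughly the cost of local compute.
from typing import Callable

def distill_judge(generate: Callable[[str], str], criterion: str) -> Callable[[str], bool]:
    """Ask an LLM (hypothetical `generate` function) to emit a small Python checker
    for `criterion`, then compile it into a callable. Real systems would sandbox this."""
    src = generate(
        f"Write a Python function `check(answer: str) -> bool` that tests: {criterion}"
    )
    namespace: dict = {}
    exec(src, namespace)              # assumes trusted / sandboxed generated code
    return namespace["check"]

# usage sketch:
# check_cites_source = distill_judge(generate, "the answer cites at least one source URL")
# verdicts = [check_cites_source(a) for a in answers]   # cents, not thousands of dollars
```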

Fred Sala (@fredsala) 's Twitter Profile Photo

“Rubrics” have become a buzzword in AI, but the concept predates the hype. At Snorkel AI, we’re excited to share a fun primer on what rubric‑based evaluation is—and why it’s critical for today’s generative and agentic models.
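For readers new to the term, rubric-based evaluation scores a response against explicit, named criteria rather than one holistic grade. A toy illustration (not Snorkel's implementation; the criteria here are invented):

```python
# Minimal illustration of rubric-based evaluation: score a response against explicit,
# named criteria instead of a single holistic grade. Criteria are invented examples.
rubric = {
    "follows_instructions": lambda r: r.strip() != "",
    "cites_evidence":       lambda r: "http" in r,
    "within_length_limit":  lambda r: len(r.split()) <= 200,
}

def score_with_rubric(response: str) -> dict:
    per_criterion = {name: int(check(response)) for name, check in rubric.items()}
    per_criterion["total"] = sum(per_criterion.values())
    return per_criterion

# score_with_rubric("See https://example.com for details.")  # per-criterion + total scores
```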

jack morris (@jxmnop) 's Twitter Profile Photo

first i thought scaling laws originated in OpenAI (2020)

then i thought they came from Baidu (2017)

now i am enlightened:
Scaling Laws were first explored at Bell Labs (1993)
Sharon Zhou (@realsharonzhou) 's Twitter Profile Photo

More data ≠ better fine-tuning. Sometimes the 'undertrained' model is more useful because it's still plastic. We need to map the capability vs. adaptability frontier better.

Nicholas Roberts (@nick11roberts) 's Twitter Profile Photo

Just read this new paper from Anthropic’s very own Claude called “A Mathematical Theory of Communication” and my brain is broken 🤯

Nicholas Roberts (@nick11roberts) 's Twitter Profile Photo

Excited for Conference on Language Modeling next week! I'm taking meetings on scaling laws, hybrid LLMs (Transformer↔SSM/Mamba), and agents.

🎓 I'm also graduating and open to chatting about future opportunities. Grab a slot! docs.google.com/forms/d/e/1FAI…

FYI: Tue 4:30–6:30 I’ll be at my poster #COLM2025

Fred Sala (@fredsala) 's Twitter Profile Photo

Super excited to present our new work on hybrid architecture models—getting the best of Transformers and SSMs like Mamba—at #COLM2025! Come chat with Nicholas Roberts at poster session 2 on Tuesday. Thread below! (1)
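The paper itself isn't summarized in this excerpt, but the general hybrid recipe is to interleave attention blocks with SSM-style sequence mixers in some layer pattern. A toy sketch of that idea, not the paper's architecture; `ToySSMBlock` is a GRU stand-in, not a real SSM.

```python
# Toy sketch of the general "hybrid" idea (not the paper's architecture): interleave
# attention blocks with SSM-style sequence mixers according to a layer pattern.
import torch
import torch.nn as nn

class AttnBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

class ToySSMBlock(nn.Module):
    """Stand-in for a Mamba/SSM layer: a cheap recurrent mixer, NOT a real SSM."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mix = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, x):
        out, _ = self.mix(self.norm(x))
        return x + out

class HybridStack(nn.Module):
    def __init__(self, d_model: int, pattern: str = "SSAS"):  # S = SSM-ish, A = attention
        super().__init__()
        self.blocks = nn.ModuleList(
            AttnBlock(d_model) if c == "A" else ToySSMBlock(d_model) for c in pattern
        )

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return x

# x = torch.randn(2, 128, 256); HybridStack(256, "SSAS")(x).shape  # -> (2, 128, 256)
```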

Radical Numerics (@radicalnumerics) 's Twitter Profile Photo

Introducing RND1, the most powerful base diffusion language model (DLM) to date.

RND1 (Radical Numerics Diffusion) is an experimental DLM with 30B params (3B active) and a sparse MoE architecture. We are making it open source, releasing weights, training details, and code to
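"30B params (3B active)" is the signature of sparse mixture-of-experts routing: a router picks a few experts per token, so only a fraction of the weights run. A generic top-k MoE sketch to illustrate the idea, not RND1's implementation; expert count and k are invented.

```python
# Generic top-k sparse-MoE routing sketch to illustrate "30B total / 3B active"
# (not RND1's implementation; expert count and k are made up for the example).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only k of n_experts run per token, so active params << total params.
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

# SparseMoE(256)(torch.randn(16, 256)).shape  # -> (16, 256)
```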

Albert Ge (@albert_ge_95) 's Twitter Profile Photo

🔭 Towards Extending Open dLLMs to 131k Tokens
dLLMs behave differently from AutoRegressive models—they lack attention sinks, making long-context extension tricky.
A few simple tweaks go a long way!!
✍️blog albertge.notion.site/longdllm
💻code github.com/lbertge/longdl…
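The tweet doesn't say which tweaks the blog uses, so purely as an illustration of one common long-context trick, here is RoPE position interpolation: positions are rescaled so 131k tokens fall inside the rotary range the model saw in training. The train/target lengths below are assumptions, not the blog's settings.

```python
# The tweet doesn't spell out the tweaks; as one *illustrative* long-context trick
# (not necessarily what the blog does), here is RoPE position interpolation:
# squeeze 131k positions into the rotary range the model saw during training.
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0,
                train_len: int = 4096, target_len: int = 131072) -> torch.Tensor:
    """Rotary angles with linear position interpolation (positions rescaled by
    train_len / target_len). train_len and target_len are assumptions for the example."""
    scale = train_len / target_len
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float() * scale, inv_freq)   # (seq, dim // 2)

# angles = rope_angles(torch.arange(131072), dim=64)
# cos, sin = angles.cos(), angles.sin()   # applied to query/key pairs as usual
```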
Michael Poli (@michaelpoli6) 's Twitter Profile Photo

We just released the largest open-source diffusion language model (RND1). RND1 is important to me on a personal level: it symbolizes our commitment to open-source exploration of radically different designs for AI at scale — training objectives, architectures, domains. There is

Radical Numerics (@radicalnumerics) 's Twitter Profile Photo

Sliding window attention (SWA) is powering frontier hybrid models for efficiency. Is there something better?

Introducing Phalanx, a faster and better quality drop-in replacement for sliding window attention (SWA).

Phalanx is a new family of hardware and numerics-aware windowed
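Phalanx itself isn't described in detail in this excerpt, so for context here is the baseline it targets: sliding-window attention, where each token attends only to the previous `window` positions via a banded causal mask. A plain, unoptimized sketch.

```python
# For context on what Phalanx replaces: a plain sliding-window-attention sketch,
# where each position attends only to the previous `window` positions (banded mask).
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int = 128):
    """q, k, v: (batch, heads, seq, head_dim). Causal attention restricted to a window."""
    seq = q.size(-2)
    i = torch.arange(seq)[:, None]
    j = torch.arange(seq)[None, :]
    mask = (j <= i) & (j > i - window)                 # causal, within the window
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# out = sliding_window_attention(*[torch.randn(1, 4, 512, 64) for _ in range(3)], window=128)
```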
Fred Sala (@fredsala) 's Twitter Profile Photo

The coolest trend for AI is shifting from conversation to action—less talking and more doing. This is also a great opportunity for evals: we need benchmarks that measure utility, including in an economic sense. terminalbench is my favorite effort of this type!

Harit Vishwakarma (@harit_v) 's Twitter Profile Photo

Introducing SnorkelSpatial: A New Benchmark for Evaluating Spatial Reasoning in LLMs

Spatial reasoning is everywhere, from navigating city maps to understanding molecular interactions. But how well do LLMs handle tasks that require tracking objects moving through space?
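The benchmark's actual item format isn't shown in this excerpt; as a toy illustration of the kind of object-tracking question such a benchmark might pose (all details invented):

```python
# Toy illustration (invented, not the actual SnorkelSpatial format) of the kind of
# object-tracking item such a benchmark might contain: apply moves, ask for the end state.
MOVES = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}

def track_object(start: tuple[int, int], moves: list[str]) -> tuple[int, int]:
    x, y = start
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
    return x, y

moves = ["north", "north", "east", "south"]
gold = track_object((0, 0), moves)   # (1, 1)
prompt = f"An object starts at (0, 0) and moves {', '.join(moves)}. Where is it now?"
# An LLM's answer is then checked against `gold`.
```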