Nicholas Roberts (@nick11roberts) 's Twitter Profile
Nicholas Roberts

@nick11roberts

Ph.D. student @WisconsinCS. Working on foundation models and breaking past scaling laws. Previously at CMU @mldcmu, UCSD @ucsd_cse, FCC @fresnocity.

ID: 558161705

Link: http://nick11roberts.science · Joined: 19-04-2012 23:55:34

469 Tweets

1.1K Followers

1.1K Following

Harit Vishwakarma (@harit_v) 's Twitter Profile Photo

Excited to be at ICML’25!!! I'll present papers on improving LLM inference and evaluation and pseudolabeling-based semi-supervised learning.

Come and say hi during these sessions, or chat anytime during the week!

[C1]. Prune 'n Predict: Optimizing LLM Decision-making with
Fred Sala (@fredsala) 's Twitter Profile Photo

Heading to #ICML! I’ll be representing SprocketLab at UW–Madison and Snorkel AI. Reach out if you want to chat about data-centric AI, data development, agents, and foundation models.

Harit Vishwakarma (@harit_v) 's Twitter Profile Photo

Join us today in the morning poster session at #ICML2025.

We will talk about some neat ways for reducing uncertainty and improving LLM accuracy at test-time on multi-choice tasks (e.g., tool selection) using conformal prediction and an additional inference round.

📍 East
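As I read the tweet, the idea is to use a conformal prediction set to prune unlikely answer options and then spend one more inference round on the survivors. Below is a minimal, hedged sketch of that recipe, not the paper's code; `query_llm`, `alpha`, and the calibration setup are hypothetical placeholders.

```python
# Illustrative sketch (not the paper's code): prune multiple-choice options with a
# split-conformal set, then spend one extra inference round on the survivors.
import numpy as np

def conformal_threshold(cal_scores: np.ndarray, alpha: float = 0.1) -> float:
    """Split-conformal quantile from calibration nonconformity scores
    (here: 1 - probability assigned to the true option)."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return float(np.quantile(cal_scores, min(q, 1.0), method="higher"))

def prune_options(option_probs: np.ndarray, qhat: float) -> list[int]:
    """Keep options whose nonconformity (1 - prob) is within the conformal threshold."""
    keep = [i for i, p in enumerate(option_probs) if 1.0 - p <= qhat]
    return keep or [int(np.argmax(option_probs))]  # never return an empty set

# --- usage sketch; `query_llm` is a hypothetical function returning option probabilities ---
# cal_scores = 1.0 - probs_of_true_option_on_calibration_set
# qhat = conformal_threshold(cal_scores, alpha=0.1)
# probs = query_llm(question, options)                            # first inference round
# survivors = prune_options(probs, qhat)
# answer = query_llm(question, [options[i] for i in survivors])   # second, pruned round
```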
Harit Vishwakarma (@harit_v) 's Twitter Profile Photo

Next up this morning at #ICML2025, we will be presenting our work on pseudolabeling-based semi-supervised learning (SSL).

East Exhibition Hall A&B # E-1304, 11 am to 1:30 pm
Paper: openreview.net/pdf?id=w4c5bLk…

Pseudolabeling-based SSL relies on the model’s confidence scores and
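The tweet is truncated here, but pseudolabeling-style SSL generally uses the model's confidence to decide which unlabeled points receive pseudolabels. Below is a generic sketch of one confidence-thresholded round, not the paper's method; the 0.95 threshold and scikit-learn model are assumptions.

```python
# Generic confidence-thresholded pseudolabeling sketch (not the paper's method).
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudolabel_round(model, X_unlabeled: np.ndarray, threshold: float = 0.95):
    """Return (X, y) pairs where the model's max class probability clears the threshold."""
    probs = model.predict_proba(X_unlabeled)
    conf = probs.max(axis=1)
    mask = conf >= threshold          # the confidence score the tweet refers to
    return X_unlabeled[mask], probs[mask].argmax(axis=1)

# usage sketch: train on labeled data, pseudolabel the confident unlabeled points, retrain
# model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
# X_pl, y_pl = pseudolabel_round(model, X_unlab, threshold=0.95)
# model = LogisticRegression(max_iter=1000).fit(
#     np.vstack([X_lab, X_pl]), np.concatenate([y_lab, y_pl]))
```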
Tzu-Heng Huang (@zihengh1) 's Twitter Profile Photo

LLM judges are powerful for automated evaluation but expensive and biased.📣 Meet PAJAMA, a new framework that distills LLM judging logic into a compact, executable form (a new representation), cutting costs from thousands to just cents.🚀 We'll present at ICML PRAL on Friday!
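The tweet doesn't show PAJAMA's actual representation, but the cost argument suggests a pattern like: pay for the expensive LLM judge once to produce an executable checker, then run that checker on every example. A hedged sketch of that general shape only; `generate` and the prompt are hypothetical, and real systems would sandbox generated code.

```python
# Hedged sketch of the general idea as I read the tweet (not PAJAMA itself):
# call an expensive LLM judge once to produce an executable checker, then run that
# checker on every example for roughly the cost of local compute.
from typing import Callable

def distill_judge(generate: Callable[[str], str], criterion: str) -> Callable[[str], bool]:
    """Ask an LLM (hypothetical `generate` function) to emit a small Python checker
    for `criterion`, then compile it into a callable. Real systems would sandbox this."""
    src = generate(
        f"Write a Python function `check(answer: str) -> bool` that tests: {criterion}"
    )
    namespace: dict = {}
    exec(src, namespace)              # assumes trusted / sandboxed generated code
    return namespace["check"]

# usage sketch:
# check_cites_source = distill_judge(generate, "the answer cites at least one source URL")
# verdicts = [check_cites_source(a) for a in answers]   # cents, not thousands of dollars
```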

Fred Sala (@fredsala) 's Twitter Profile Photo

“Rubrics” have become a buzzword in AI, but the concept predates the hype. At Snorkel AI, we’re excited to share a fun primer on what rubric‑based evaluation is—and why it’s critical for today’s generative and agentic models.
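For readers new to the term, rubric-based evaluation scores a response against explicit, named criteria rather than one holistic grade. A toy illustration (not Snorkel's implementation; the criteria here are invented):

```python
# Minimal illustration of rubric-based evaluation: score a response against explicit,
# named criteria instead of a single holistic grade. Criteria are invented examples.
rubric = {
    "follows_instructions": lambda r: r.strip() != "",
    "cites_evidence":       lambda r: "http" in r,
    "within_length_limit":  lambda r: len(r.split()) <= 200,
}

def score_with_rubric(response: str) -> dict:
    per_criterion = {name: int(check(response)) for name, check in rubric.items()}
    per_criterion["total"] = sum(per_criterion.values())
    return per_criterion

# score_with_rubric("See https://example.com for details.")  # per-criterion + total scores
```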

jack morris (@jxmnop) 's Twitter Profile Photo

first i thought scaling laws originated in OpenAI (2020)

then i thought they came from Baidu (2017)

now i am enlightened:
Scaling Laws were first explored at Bell Labs (1993)
Sharon Zhou (@realsharonzhou) 's Twitter Profile Photo

More data ≠ better fine-tuning. Sometimes the 'undertrained' model is more useful because it's still plastic. We need to map the capability vs. adaptability frontier better.

Nicholas Roberts (@nick11roberts) 's Twitter Profile Photo

Just read this new paper from Anthropic’s very own Claude called “A Mathematical Theory of Communication” and my brain is broken 🤯

Nicholas Roberts (@nick11roberts) 's Twitter Profile Photo

Excited for Conference on Language Modeling next week! I'm taking meetings on scaling laws, hybrid LLMs (Transformer↔SSM/Mamba), and agents.

🎓 I'm also graduating and open to chatting about future opportunities. Grab a slot! docs.google.com/forms/d/e/1FAI…

FYI: Tue 4:30–6:30 I’ll be at my poster #COLM2025

Fred Sala (@fredsala) 's Twitter Profile Photo

Super excited to present our new work on hybrid architecture models—getting the best of Transformers and SSMs like Mamba—at #COLM2025! Come chat with Nicholas Roberts at poster session 2 on Tuesday. Thread below! (1)
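The paper itself isn't summarized in this excerpt, but the general hybrid recipe is to interleave attention blocks with SSM-style sequence mixers in some layer pattern. A toy sketch of that idea, not the paper's architecture; `ToySSMBlock` is a GRU stand-in, not a real SSM.

```python
# Toy sketch of the general "hybrid" idea (not the paper's architecture): interleave
# attention blocks with SSM-style sequence mixers according to a layer pattern.
import torch
import torch.nn as nn

class AttnBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

class ToySSMBlock(nn.Module):
    """Stand-in for a Mamba/SSM layer: a cheap recurrent mixer, NOT a real SSM."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mix = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, x):
        out, _ = self.mix(self.norm(x))
        return x + out

class HybridStack(nn.Module):
    def __init__(self, d_model: int, pattern: str = "SSAS"):  # S = SSM-ish, A = attention
        super().__init__()
        self.blocks = nn.ModuleList(
            AttnBlock(d_model) if c == "A" else ToySSMBlock(d_model) for c in pattern
        )

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return x

# x = torch.randn(2, 128, 256); HybridStack(256, "SSAS")(x).shape  # -> (2, 128, 256)
```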

Radical Numerics (@radicalnumerics) 's Twitter Profile Photo

Introducing RND1, the most powerful base diffusion language model (DLM) to date.

RND1 (Radical Numerics Diffusion) is an experimental DLM with 30B params (3B active) and a sparse MoE architecture. We are making it open source, releasing weights, training details, and code to
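"30B params (3B active)" is the signature of sparse mixture-of-experts routing: a router picks a few experts per token, so only a fraction of the weights run. A generic top-k MoE sketch to illustrate the idea, not RND1's implementation; expert count and k are invented.

```python
# Generic top-k sparse-MoE routing sketch to illustrate "30B total / 3B active"
# (not RND1's implementation; expert count and k are made up for the example).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only k of n_experts run per token, so active params << total params.
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

# SparseMoE(256)(torch.randn(16, 256)).shape  # -> (16, 256)
```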

Albert Ge (@albert_ge_95) 's Twitter Profile Photo

🔭 Towards Extending Open dLLMs to 131k Tokens
dLLMs behave differently from AutoRegressive models—they lack attention sinks, making long-context extension tricky.
A few simple tweaks go a long way!!
✍️blog albertge.notion.site/longdllm
💻code github.com/lbertge/longdl…
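The tweet doesn't say which tweaks the blog uses, so purely as an illustration of one common long-context trick, here is RoPE position interpolation: positions are rescaled so 131k tokens fall inside the rotary range the model saw in training. The train/target lengths below are assumptions, not the blog's settings.

```python
# The tweet doesn't spell out the tweaks; as one *illustrative* long-context trick
# (not necessarily what the blog does), here is RoPE position interpolation:
# squeeze 131k positions into the rotary range the model saw during training.
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0,
                train_len: int = 4096, target_len: int = 131072) -> torch.Tensor:
    """Rotary angles with linear position interpolation (positions rescaled by
    train_len / target_len). train_len and target_len are assumptions for the example."""
    scale = train_len / target_len
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float() * scale, inv_freq)   # (seq, dim // 2)

# angles = rope_angles(torch.arange(131072), dim=64)
# cos, sin = angles.cos(), angles.sin()   # applied to query/key pairs as usual
```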
Michael Poli (@michaelpoli6) 's Twitter Profile Photo

We just released the largest open-source diffusion language model (RND1). RND1 is important to me on a personal level: it symbolizes our commitment to open-source exploration of radically different designs for AI at scale — training objectives, architectures, domains. There is

Radical Numerics (@radicalnumerics) 's Twitter Profile Photo

Sliding window attention (SWA) is powering frontier hybrid models for efficiency. Is there something better?

Introducing Phalanx, a faster and better quality drop-in replacement for sliding window attention (SWA).

Phalanx is a new family of hardware and numerics-aware windowed
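Phalanx itself isn't described in detail in this excerpt, so for context here is the baseline it targets: sliding-window attention, where each token attends only to the previous `window` positions via a banded causal mask. A plain, unoptimized sketch.

```python
# For context on what Phalanx replaces: a plain sliding-window-attention sketch,
# where each position attends only to the previous `window` positions (banded mask).
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int = 128):
    """q, k, v: (batch, heads, seq, head_dim). Causal attention restricted to a window."""
    seq = q.size(-2)
    i = torch.arange(seq)[:, None]
    j = torch.arange(seq)[None, :]
    mask = (j <= i) & (j > i - window)                 # causal, within the window
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# out = sliding_window_attention(*[torch.randn(1, 4, 512, 64) for _ in range(3)], window=128)
```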
Fred Sala (@fredsala) 's Twitter Profile Photo

The coolest trend for AI is shifting from conversation to action—less talking and more doing. This is also a great opportunity for evals: we need benchmarks that measure utility, including in an economic sense. terminalbench is my favorite effort of this type!

Harit Vishwakarma (@harit_v) 's Twitter Profile Photo

Introducing SnorkelSpatial: A New Benchmark for Evaluating Spatial Reasoning in LLMs

Spatial reasoning is everywhere, from navigating city maps to understanding molecular interactions. But how well do LLMs handle tasks that require tracking objects moving through space?
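The benchmark's actual item format isn't shown in this excerpt; as a toy illustration of the kind of object-tracking question such a benchmark might pose (all details invented):

```python
# Toy illustration (invented, not the actual SnorkelSpatial format) of the kind of
# object-tracking item such a benchmark might contain: apply moves, ask for the end state.
MOVES = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}

def track_object(start: tuple[int, int], moves: list[str]) -> tuple[int, int]:
    x, y = start
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
    return x, y

moves = ["north", "north", "east", "south"]
gold = track_object((0, 0), moves)   # (1, 1)
prompt = f"An object starts at (0, 0) and moves {', '.join(moves)}. Where is it now?"
# An LLM's answer is then checked against `gold`.
```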