Yonatan Belinkov (@boknilev)'s Twitter Profile
Yonatan Belinkov

@boknilev

Assistant professor of computer science @TechnionLive. #NLProc

ID: 554869994

Website: http://www.cs.technion.ac.il/~belinkov · Joined: 16-04-2012 04:54:07

2.2K Tweets

4.4K Followers

1.1K Following

XLLM-Reason-Plan (@xllmreasonplan)

⏰ Only 9 days away!
Join us at the Conference on Language Modeling on October 10 for the first workshop on the application of LLM explainability to reasoning and planning.
Featuring:
📑 20 poster presentations
🎤 9 distinguished speakers
View our schedule at tinyurl.com/xllm-workshop.
Itay Itzhak (@itay_itzhak_)

Happening tomorrow! CoLM 2025 spotlight oral at 10:00 + poster at 11:00 🎤🧠 We’ll dive into cognitive biases in LLMs and what finetuning hides. The talk’s good, promise 🙂 See ya tomorrow morning! #CoLM2025

Aaron Mueller (@amuuueller)

I'll be in Montréal this Friday to speak at #COLM2025's INTERPLAY workshop! As a Québecophile, I have many recommendations for those of you in town the whole week: 🧵

AI21 Labs (@ai21labs)

1/5 Releasing Jamba Reasoning 3B under Apache 2.0: a hybrid SSM-Transformer architecture that tops accuracy & speed across record context lengths, e.g. 3-5X faster than Llama 3.2 3B and Qwen3 4B at 32K tokens.
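The tweet names the architecture class ("hybrid SSM-Transformer") without showing it. As a rough illustrative sketch only, here is the general pattern of interleaving attention with a linear state-space recurrence in PyTorch; the layer names, the toy diagonal non-selective scan, and the 3:1 SSM-to-attention ratio below are assumptions, not Jamba's actual design.

```python
# Rough sketch of a hybrid SSM-Transformer block (illustrative only;
# NOT Jamba's actual architecture -- names and ratios are assumptions).
import torch
import torch.nn as nn

class ToySSM(nn.Module):
    """Toy diagonal linear state-space layer: h_t = a * h_{t-1} + b * x_t."""
    def __init__(self, dim):
        super().__init__()
        self.log_a = nn.Parameter(torch.zeros(dim))   # per-channel decay
        self.b = nn.Parameter(torch.ones(dim))
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                              # x: (batch, seq, dim)
        a = torch.sigmoid(self.log_a)                  # keep decay in (0, 1)
        h = torch.zeros_like(x[:, 0])                  # O(1) state per step
        ys = []
        for t in range(x.size(1)):                     # sequential scan
            h = a * h + self.b * x[:, t]
            ys.append(h)
        return self.out(torch.stack(ys, dim=1))

class HybridBlock(nn.Module):
    """One attention layer followed by a few SSM layers, with residuals."""
    def __init__(self, dim, n_heads=4, ssm_per_attn=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ssms = nn.ModuleList(ToySSM(dim) for _ in range(ssm_per_attn))

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm(x + attn_out)                    # residual around attention
        for ssm in self.ssms:
            x = self.norm(x + ssm(x))                  # residual around each SSM
        return x

x = torch.randn(2, 16, 64)                             # (batch, seq, dim)
print(HybridBlock(64)(x).shape)                        # torch.Size([2, 16, 64])
```

A real hybrid would replace the Python loop with a parallel scan and use input-dependent (selective) state updates; the sketch only shows how SSM layers can dominate the stack while attention appears sparsely, which is what keeps long-context inference fast.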
Adi Simhi (@adisimhi)

🤔What happens when LLM agents choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic, or do they prefer to avoid harming humans?
🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs 🚀🧵
AI21 Labs (@ai21labs)

🧠 Jamba Reasoning 3B leads tiny reasoning models (Artificial Analysis).
🥇 #1 on #IFBench (52%) for instruction following
📈 21 on the Artificial Analysis Intelligence Index

👉 Charts by Artificial Analysis: artificialanalysis.ai/models/open-so…
NDIF (@ndif_team)

Ever wished you could explore what's happening inside a 405B parameter model without writing any code? Workbench, our AI interpretability interface, is now live for public beta at workbench.ndif.us!

David Alvarez Melis (@elmelis)

📄 New preprint alert: We study 🪃Boomerang Distillation🪃, a surprising phenomenon that allows generating a family of pre-trained LLMs of intermediate sizes from a single teacher–student pair — 𝐧𝐨 𝐞𝐱𝐭𝐫𝐚 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐫𝐞𝐪𝐮𝐢𝐫𝐞𝐝! 🧵👇
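The tweet doesn't spell out the mechanism, but the claim (a family of intermediate sizes from one teacher-student pair, no training) suggests assembling checkpoints by mixing layers from the two models. The sketch below is a hypothetical reading of that idea; the function, its `layer_map` bookkeeping, and the assumption that each student layer corresponds to a contiguous span of teacher layers are mine, not the paper's.

```python
# Hypothetical sketch: build an intermediate-size model by swapping some
# student layers back to their original teacher layers. A guessed reading
# of the headline claim, not the paper's verified procedure.
import copy

def intermediate_model(teacher_layers, student_layers, layer_map, k):
    """
    teacher_layers, student_layers: lists of transformer blocks.
    layer_map[i]: span of teacher layer indices that student layer i was
                  distilled from (assumed bookkeeping kept from training).
    k: number of student layers (from the bottom) to re-expand.
    """
    blocks = []
    for i, layer in enumerate(student_layers):
        if i < k:
            # Re-insert the original teacher blocks for this position,
            # growing the model back toward the teacher's depth.
            blocks.extend(copy.deepcopy(teacher_layers[j]) for j in layer_map[i])
        else:
            blocks.append(copy.deepcopy(layer))
    return blocks
```

Sweeping k from 0 to the student's depth would then trace out a family of models between the student's size and (roughly) the teacher's, with no gradient updates.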

Niels Rogge (@nielsrogge)

For people thinking that DeepSeek-OCR is the first model to render text as images: the University of Copenhagen already did this in 2023.

The paper is called "Language Modelling with Pixels". They trained a Masked AutoEncoder (MAE) by rendering text as images and masking patches.
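The recipe named here (render text to pixels, split into patches, mask a subset, reconstruct) is straightforward to sketch. Below is a minimal, illustrative version of the input side in Python with Pillow and PyTorch; the renderer, the 16×16 patch size, and the 25% mask ratio are placeholder choices, not necessarily the paper's exact settings.

```python
# Illustrative sketch of a PIXEL-style input pipeline: render text as an
# image, cut it into fixed-size patches, and mask a random subset of them
# (MAE-style). Font, patch size, and mask ratio are placeholder assumptions.
import torch
from PIL import Image, ImageDraw

def render_text(text, width=256, height=16):
    img = Image.new("L", (width, height), color=255)     # white strip
    ImageDraw.Draw(img).text((0, 2), text, fill=0)       # default bitmap font
    pixels = torch.tensor(list(img.getdata()), dtype=torch.float32)
    return pixels.view(height, width) / 255.0

def patchify(img, patch=16):
    h, w = img.shape
    return (img.view(h // patch, patch, w // patch, patch)
               .permute(0, 2, 1, 3)
               .reshape(-1, patch * patch))              # (num_patches, patch^2)

def mask_patches(patches, ratio=0.25):
    n = patches.size(0)
    hidden = torch.randperm(n)[: int(n * ratio)]         # indices to reconstruct
    corrupted = patches.clone()
    corrupted[hidden] = 0.0                              # zero out masked patches
    return corrupted, hidden

patches = patchify(render_text("Language Modelling with Pixels"))
corrupted, hidden = mask_patches(patches)
print(patches.shape)                                     # torch.Size([16, 256])
```

An MAE-style encoder would then see only the visible patches and be trained to reconstruct the hidden ones from their pixel values.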
Peter Hase (@peterbhase)

I would encourage technical AI types to consider working in grantmaking! Schmidt Sciences is hiring for a unique position where you get to continue your own research at the same time. Link: jobs.lever.co/schmidt-entiti…

Johnny Tian-Zheng Wei (@johntzwei)

Announcing 🔭✨Hubble, a suite of open-source LLMs to advance the study of memorization!

Pretrained models up to 8B params, with controlled insertion of texts (e.g., book passages, biographies, test sets, and more!) designed to emulate key memorization risks 🧵
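The "controlled insertion" idea (planting known texts into the pretraining corpus so that later recall can be measured against known exposure) can be sketched in a few lines. Below is a hypothetical version; the suite's real insertion schedule, formats, and bookkeeping may differ.

```python
# Hypothetical sketch of controlled insertion for memorization studies:
# plant each probe text into the pretraining corpus a fixed number of times
# at random positions, recording ground-truth exposure for later testing.
# (Illustrative only; not Hubble's actual pipeline.)
import random

def insert_probes(corpus_docs, probes, duplication=4, seed=0):
    rng = random.Random(seed)
    docs = list(corpus_docs)                  # don't mutate the caller's list
    for text in probes:
        for _ in range(duplication):
            docs.insert(rng.randrange(len(docs) + 1), text)
    exposure = {text: duplication for text in probes}
    return docs, exposure

docs, exposure = insert_probes(["doc one", "doc two"], ["secret passage"])
print(len(docs), exposure)                    # 6 {'secret passage': 4}
```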
Mor Ventura (@mor_ventura95)

Wrapping up:
✅ DeLeaker: inference-time semantic leakage mitigation method
✅ SLIM: first dedicated dataset for semantic leakage
✅ Eval Framework: comparative dedicated evaluation framework
📄 Paper: arxiv.org/abs/2510.15015
🌐 Project Page: venturamor.github.io/DeLeaker/

Yoav Artzi (@yoavartzi)

Cornell University is recruiting for multiple postdoctoral positions in AI as part of two programs: Empire AI Fellows and Foundational AI Fellows. Positions are available in NYC and Ithaca.

Deadline for full consideration is Nov 20, 2025!
academicjobsonline.org/ajo/jobs/30971
Adi Simhi (@adisimhi)

LLMs can hallucinate for different reasons:
❌ They don't know (lack of knowledge)
❌ They "know" but are uncertain
❌ They "know" and are certain
A new extended version of our paper, combining our understanding of hallucination along the knowledge and certainty axes, is out 🧵
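The three cases suggest a two-axis triage: whether the model knows the fact at all, and how certain it is when answering. As a toy illustration (not the paper's method; the probes and the threshold are placeholders), the split could look like this:

```python
# Toy two-axis triage of hallucination types (illustrative; not the paper's
# actual method). `knows` would come from a knowledge probe (e.g., can the
# model ever produce the gold answer under sampling?) and `certainty` from
# a confidence signal (e.g., the top answer's probability). Both are assumed.

def classify_hallucination(knows: bool, certainty: float, threshold: float = 0.7) -> str:
    if not knows:
        return "lack of knowledge"       # the fact isn't in the model at all
    if certainty < threshold:
        return "knows but uncertain"     # knowledge present, confidence low
    return "knows and certain"           # hallucinates despite knowledge and certainty

print(classify_hallucination(knows=True, certainty=0.4))   # knows but uncertain
```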
Yonatan Belinkov (@boknilev)

Q: which of these can be checked by an LLM as well as by an overloaded human reviewer?
Appropriateness
Formatting
Length
Anonymity
Limitations
Responsible Checklist
Potential Violation Justification
Need Ethics Review
Ethics Review Justification
aclrollingreview.org/reviewerguidel…