Cyrus Rashtchian (@cyrusrashtchian)'s Twitter Profile
Cyrus Rashtchian

@cyrusrashtchian

Research Scientist @GoogleAI working on Machine Learning, Theory + Practice, Robustness, and beyond (he/him)

ID: 1210312444221935616

Link: http://www.cyrusrashtchian.com · Joined: 26-12-2019 21:32:38

607 Tweets

1.1K Followers

457 Following

Google AI (@googleai)'s Twitter Profile Photo

At #ICLR2025? Stop by the Google booth today at 12PM to learn about Gecko, an interpretable, automatic metric for evaluating the prompt adherence of images & videos. Given an image/video + description, the model determines what is missing from the prompt. arxiv.org/abs/2404.16820

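To make the mechanism concrete, here is a minimal sketch of this style of question-based adherence scoring. The question generator and VQA scorer are hypothetical stand-in callables, and the real Gecko pipeline differs in its details:

```python
from typing import Callable, List, Tuple

def prompt_adherence_score(
    prompt: str,
    image: object,
    generate_questions: Callable[[str], List[str]],
    answer_question: Callable[[object, str], str],
) -> Tuple[float, List[str]]:
    """Return the fraction of prompt-derived questions the image passes,
    plus the questions it fails (i.e., what is missing from the image)."""
    questions = generate_questions(prompt)
    missing = [q for q in questions if answer_question(image, q) != "yes"]
    score = 1.0 - len(missing) / max(len(questions), 1)
    return score, missing

# Toy usage with stand-in models; a real setup would plug in an LLM
# question generator and a VQA model.
qs = lambda p: [f"Does the image show {w}?" for w in p.split(" and ")]
vqa = lambda img, q: "yes" if "cat" in q else "no"
print(prompt_adherence_score("a cat and a skateboard", None, qs, vqa))
# -> (0.5, ['Does the image show a skateboard?'])
```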
Google AI (@googleai)'s Twitter Profile Photo

Are you trying to decide between pursuing a career in industry or one in academia? Struggling to make a choice? Stop by the #CHI2025 Google booth at 1:10PM JST today, where Orson Xu (Visiting Faculty Researcher) will host an AMA about Industry vs. Academia.

Vercept (@vercept_ai)'s Twitter Profile Photo

Today we're excited to introduce Vy, our AI that sees and acts on your computer. At Vercept, our mission is to reinvent how humans use computers, enabling you to accomplish orders of magnitude more than what you can do today. Vy is a first glimpse at AI that sees and uses your…

Cyrus Rashtchian (@cyrusrashtchian)'s Twitter Profile Photo

As an AC for #ICML2025, I felt like this cycle was particularly difficult for author-reviewer discussions. I prefer the easier-to-use forum format over the new, confusing OpenReview setup. I also prefer the 1-10 scoring -- I feel like the granularity actually helps a lot!

Niloofar (on faculty job market!) (@niloofar_mire)'s Twitter Profile Photo

📣Thrilled to announce I’ll join Carnegie Mellon University (CMU Engineering & Public Policy & Language Technologies Institute | @CarnegieMellon) as an Assistant Professor starting Fall 2026! Until then, I’ll be a Research Scientist at AI at Meta FAIR in SF, working with Kamalika Chaudhuri’s amazing team on privacy, security, and reasoning in LLMs!

Yi Tay (@yitayml)'s Twitter Profile Photo

Sharing a pretty interesting story about how two research projects can end up with very different outcomes despite almost the same type of “work” being done, just because of research motivation and taste. Here, I was on the side that got rekt, so I thought I would share my lessons.

Niklas Muennighoff (@muennighoff)'s Twitter Profile Photo

In 2022, with Yong Zheng-Xin (Yong) & team, we showed that models trained to follow instructions in English can follow instructions in other languages. Our new work below shows that models trained to reason in English can also reason in other languages!

Cyrus Rashtchian (@cyrusrashtchian)'s Twitter Profile Photo

🚨Nice coverage of our Sufficient Context work for RAG systems! 🙌🏼 "This approach makes it possible to determine if an LLM has enough information to answer a query accurately, a critical factor for developers building real-world enterprise applications..." ⬇️

Ben Dickson (@bendee983)'s Twitter Profile Photo

LLMs often struggle to figure out whether they have enough context to answer a question or should abstain. And using RAG can cause further confusion, as irrelevant information can throw the model off. I spoke to Cyrus Rashtchian about "sufficient context," a new technique to figure out whether…
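As a rough sketch of how such a sufficiency check could wrap a RAG pipeline: the `call_llm` helper is a hypothetical stand-in for any text-in/text-out LLM call, and the one-word autorater prompt below is a simplification, not the paper's actual method.

```python
# Hedged sketch: ask an LLM whether the retrieved context alone suffices,
# and abstain if not, instead of risking a hallucinated answer.

SUFFICIENCY_PROMPT = """\
Question: {question}

Context:
{context}

Does the context contain enough information to answer the question?
Reply with exactly one word: SUFFICIENT or INSUFFICIENT."""

def answer_with_abstention(question: str, context: str, call_llm) -> str:
    """Only answer when the retrieved context is judged sufficient."""
    verdict = call_llm(
        SUFFICIENCY_PROMPT.format(question=question, context=context)
    ).strip().upper()
    if verdict.startswith("INSUFFICIENT"):
        return "I don't know."  # abstain rather than guess from bad context
    return call_llm(
        f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    )
```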

elvis (@omarsar0)'s Twitter Profile Photo

New Lens on RAG Systems

RAG systems are more brittle than you think, even when provided sufficient context.

Great work from Google and collaborators.

Good tips for devs included.

Here are my notes:
Tu Vu (@tuvllms)'s Twitter Profile Photo

✨ New paper ✨
🚨 Scaling test-time compute can lead to inverse or flattened scaling!!

We introduce SealQA, a new challenge benchmark w/ questions that trigger conflicting, ambiguous, or unhelpful web search results. Key takeaways:

➡️ Frontier LLMs struggle on Seal-0 (SealQA’s…
Philipp Schmid (@_philschmid)'s Twitter Profile Photo

Here is my 2-hour workshop I just finished at the AI Engineer World's Fair. This is all you need to learn how to use Gemini 2.5! It is beginner friendly, from getting your first API key to multimodality, function calling, and MCP.

🆓 Completely free - runs…
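For anyone starting from zero, that first step might look like the sketch below, which assumes the google-genai Python SDK and an API key in the GEMINI_API_KEY environment variable (SDK details and model names may have changed since):

```python
# Minimal "hello world" call to Gemini 2.5, of the kind the workshop
# starts with. Assumes: pip install google-genai, and GEMINI_API_KEY set.
import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain function calling in one paragraph.",
)
print(response.text)
```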
Ryan Marten (@ryanmart3n)'s Twitter Profile Photo

Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals.

We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data…
Azalia Mirhoseini (@azaliamirh)'s Twitter Profile Photo

Introducing Weaver, a test-time scaling method for verification!

Weaver shrinks the generation-verification gap through a low-overhead weak-to-strong optimization of a mixture of verifiers (e.g., LM judges and reward models). The Weavered mixture can be distilled into a tiny…
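A minimal sketch of the selection step behind a mixture of verifiers: score each candidate generation with several weak verifiers and keep the candidate with the highest weighted score. The verifiers and weights below are toy placeholders; Weaver's contribution is estimating good weights via weak-to-strong optimization rather than hand-setting them as done here.

```python
import numpy as np

def select_best(candidates, verifiers, weights):
    """Pick the candidate whose weighted mixture of verifier scores is highest."""
    # scores[i, j] = verifier j's score for candidate i
    scores = np.array([[v(c) for v in verifiers] for c in candidates])
    combined = scores @ np.asarray(weights, dtype=float)  # weighted mixture
    return candidates[int(np.argmax(combined))]

# Toy usage: two weak "verifiers" scoring three candidate answers. Real
# verifiers would be LM judges or reward models.
length_judge = lambda c: min(len(c) / 20.0, 1.0)
keyword_judge = lambda c: 1.0 if "42" in c else 0.0
print(select_best(["maybe 7", "it is 42", "unsure"],
                  [length_judge, keyword_judge],
                  weights=[0.3, 0.7]))  # -> "it is 42"
```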
Google Research (@googleresearch)'s Twitter Profile Photo

SLED is a decoding strategy that uses all of an LLM’s layers, instead of just the last one, to better align the output with the model’s intrinsic knowledge, enhancing model accuracy without the need for external data or additional fine-tuning. Learn more: goo.gle/3K60Cnz

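A rough sketch of the core intuition, assuming a transformer whose layers share one unembedding matrix: project every layer's hidden state to vocabulary logits and nudge the final layer's logits toward the early layers' consensus. The simple weighted average here is an illustrative approximation, not SLED's actual update rule.

```python
import torch

def fused_logits(hidden_states, unembed, alpha=0.1):
    """hidden_states: one [hidden_dim] tensor per layer (last = final layer).
    unembed: [vocab_size, hidden_dim] shared unembedding matrix."""
    per_layer = torch.stack([h @ unembed.T for h in hidden_states])  # [L, vocab]
    early_avg = per_layer[:-1].mean(dim=0)  # consensus of earlier layers
    final = per_layer[-1]                   # standard last-layer logits
    return (1 - alpha) * final + alpha * early_avg  # nudge toward consensus

# Toy usage with random tensors standing in for a real model's states.
layers = [torch.randn(16) for _ in range(5)]
W = torch.randn(100, 16)
next_token_id = int(torch.argmax(fused_logits(layers, W)))
```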
Cyrus Rashtchian (@cyrusrashtchian)'s Twitter Profile Photo

🚨 New blog post out on how to make LLMs more factual! We put together nice animations to show how our decoding method works under the hood! 🌟 SLED leads to >10% improvements, out of the box, even for newer LLMs like Gemma 3 and GPT-OSS. Code: jayzhang42.github.io/sled_page/

Yossi Matias (@ymatias)'s Twitter Profile Photo

We need to keep pushing for factual accuracy in LLMs. SLED is a new decoding approach from Google Research that uses all of an LLM's internal layers, instead of just the last one, to better align the output with the model's intrinsic knowledge. This enhances accuracy without…