Cyrus Rashtchian (@cyrusrashtchian)'s Twitter Profile
Cyrus Rashtchian

@cyrusrashtchian

Research Scientist @GoogleAI working on Machine Learning, Theory + Practice, Robustness, and beyond (he/him)

ID: 1210312444221935616

Link: http://www.cyrusrashtchian.com · Joined: 26-12-2019 21:32:38

607 Tweets

1.1K Followers

457 Following

Google AI (@googleai)'s Twitter Profile Photo

At #ICLR2025? Stop by the Google booth today at 12PM to learn about Gecko, an interpretable, automatic metric for evaluating the prompt adherence of images & videos. Given an image/video + description, the model determines what is missing from the prompt. arxiv.org/abs/2404.16820

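To make the mechanism concrete, here is a minimal sketch of this style of question-based adherence scoring. The question generator and VQA scorer are hypothetical stand-in callables, and the real Gecko pipeline differs in its details:

```python
from typing import Callable, List, Tuple

def prompt_adherence_score(
    prompt: str,
    image: object,
    generate_questions: Callable[[str], List[str]],
    answer_question: Callable[[object, str], str],
) -> Tuple[float, List[str]]:
    """Return the fraction of prompt-derived questions the image passes,
    plus the questions it fails (i.e., what is missing from the image)."""
    questions = generate_questions(prompt)
    missing = [q for q in questions if answer_question(image, q) != "yes"]
    score = 1.0 - len(missing) / max(len(questions), 1)
    return score, missing

# Toy usage with stand-in models; a real setup would plug in an LLM
# question generator and a VQA model.
qs = lambda p: [f"Does the image show {w}?" for w in p.split(" and ")]
vqa = lambda img, q: "yes" if "cat" in q else "no"
print(prompt_adherence_score("a cat and a skateboard", None, qs, vqa))
# -> (0.5, ['Does the image show a skateboard?'])
```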
Google AI (@googleai)'s Twitter Profile Photo

Are you trying to decide between pursuing a career in industry or one in academia? Struggling to make a choice? Stop by the #CHI2025 Google booth at 1:10PM JST today, where Orson Xu (Visiting Faculty Researcher) will host an AMA about Industry vs. Academia.

Vercept (@vercept_ai)'s Twitter Profile Photo

Today we're excited to introduce Vy, our AI that sees and acts on your computer. At Vercept, our mission is to reinvent how humans use computers, enabling you to accomplish orders of magnitude more than what you can do today. Vy is a first glimpse at AI that sees and uses your…

Cyrus Rashtchian (@cyrusrashtchian)'s Twitter Profile Photo

As an AC for #ICML2025, I felt like this cycle was particularly difficult for author-reviewer discussions. I prefer the easier-to-use forum format over the new, confusing OpenReview setup. I also prefer the 1-10 scoring -- I feel like the granularity actually helps a lot!

Niloofar (on faculty job market!) (@niloofar_mire)'s Twitter Profile Photo

📣Thrilled to announce I’ll join Carnegie Mellon University (CMU Engineering & Public Policy & Language Technologies Institute | @CarnegieMellon) as an Assistant Professor starting Fall 2026! Until then, I’ll be a Research Scientist at AI at Meta FAIR in SF, working with Kamalika Chaudhuri’s amazing team on privacy, security, and reasoning in LLMs!

Yi Tay (@yitayml)'s Twitter Profile Photo

Sharing a pretty interesting story about how two research projects can end up with very different outcomes despite almost the same type of “work” being done, just because of research motivation and taste. Here, I was on the side that got rekt, so I thought I would share my lessons.

Niklas Muennighoff (@muennighoff)'s Twitter Profile Photo

In 2022, with Yong Zheng-Xin (Yong) & team, we showed that models trained to follow instructions in English can follow instructions in other languages. Our new work below shows that models trained to reason in English can also reason in other languages!

Cyrus Rashtchian (@cyrusrashtchian)'s Twitter Profile Photo

🚨Nice coverage of our Sufficient Context work for RAG systems! 🙌🏼 "This approach makes it possible to determine if an LLM has enough information to answer a query accurately, a critical factor for developers building real-world enterprise applications..." ⬇️

Ben Dickson (@bendee983)'s Twitter Profile Photo

LLMs often struggle to figure out whether they have enough context to answer a question or should abstain. And using RAG can cause further confusion, as irrelevant information can throw the model off. I spoke to Cyrus Rashtchian about "sufficient context," a new technique to figure out whether…
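As a rough sketch of how such a sufficiency check could wrap a RAG pipeline: the `call_llm` helper is a hypothetical stand-in for any text-in/text-out LLM call, and the one-word autorater prompt below is a simplification, not the paper's actual method.

```python
# Hedged sketch: ask an LLM whether the retrieved context alone suffices,
# and abstain if not, instead of risking a hallucinated answer.

SUFFICIENCY_PROMPT = """\
Question: {question}

Context:
{context}

Does the context contain enough information to answer the question?
Reply with exactly one word: SUFFICIENT or INSUFFICIENT."""

def answer_with_abstention(question: str, context: str, call_llm) -> str:
    """Only answer when the retrieved context is judged sufficient."""
    verdict = call_llm(
        SUFFICIENCY_PROMPT.format(question=question, context=context)
    ).strip().upper()
    if verdict.startswith("INSUFFICIENT"):
        return "I don't know."  # abstain rather than guess from bad context
    return call_llm(
        f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    )
```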

elvis (@omarsar0)'s Twitter Profile Photo

New Lens on RAG Systems

RAG systems are more brittle than you think, even when provided sufficient context.

Great work from Google and collaborators.

Good tips for devs included.

Here are my notes:
Tu Vu (@tuvllms)'s Twitter Profile Photo

✨ New paper ✨
🚨 Scaling test-time compute can lead to inverse or flattened scaling!!

We introduce SealQA, a new challenge benchmark w/ questions that trigger conflicting, ambiguous, or unhelpful web search results. Key takeaways:

➡️ Frontier LLMs struggle on Seal-0 (SealQA’s…
Philipp Schmid (@_philschmid)'s Twitter Profile Photo

Here is my 2-hour workshop I just finished at the AI Engineer World's Fair. This is all you need to learn how to use Gemini 2.5! It is beginner friendly, from getting your first API key to multimodality, function calling, and MCP.

🆓 Completely free - runs…
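For anyone starting from zero, that first step might look like the sketch below, which assumes the google-genai Python SDK and an API key in the GEMINI_API_KEY environment variable (SDK details and model names may have changed since):

```python
# Minimal "hello world" call to Gemini 2.5, of the kind the workshop
# starts with. Assumes: pip install google-genai, and GEMINI_API_KEY set.
import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain function calling in one paragraph.",
)
print(response.text)
```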
Ryan Marten (@ryanmart3n)'s Twitter Profile Photo

Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals.

We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data…
Azalia Mirhoseini (@azaliamirh)'s Twitter Profile Photo

Introducing Weaver, a test-time scaling method for verification!

Weaver shrinks the generation-verification gap through a low-overhead weak-to-strong optimization of a mixture of verifiers (e.g., LM judges and reward models). The Weavered mixture can be distilled into a tiny…
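A minimal sketch of the selection step behind a mixture of verifiers: score each candidate generation with several weak verifiers and keep the candidate with the highest weighted score. The verifiers and weights below are toy placeholders; Weaver's contribution is estimating good weights via weak-to-strong optimization rather than hand-setting them as done here.

```python
import numpy as np

def select_best(candidates, verifiers, weights):
    """Pick the candidate whose weighted mixture of verifier scores is highest."""
    # scores[i, j] = verifier j's score for candidate i
    scores = np.array([[v(c) for v in verifiers] for c in candidates])
    combined = scores @ np.asarray(weights, dtype=float)  # weighted mixture
    return candidates[int(np.argmax(combined))]

# Toy usage: two weak "verifiers" scoring three candidate answers. Real
# verifiers would be LM judges or reward models.
length_judge = lambda c: min(len(c) / 20.0, 1.0)
keyword_judge = lambda c: 1.0 if "42" in c else 0.0
print(select_best(["maybe 7", "it is 42", "unsure"],
                  [length_judge, keyword_judge],
                  weights=[0.3, 0.7]))  # -> "it is 42"
```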
Google Research (@googleresearch)'s Twitter Profile Photo

SLED is a decoding strategy that uses all of an LLM’s layers, instead of just the last one, to better align the output with the model’s intrinsic knowledge, enhancing model accuracy without the need for external data or additional fine-tuning. Learn more: goo.gle/3K60Cnz

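A rough sketch of the core intuition, assuming a transformer whose layers share one unembedding matrix: project every layer's hidden state to vocabulary logits and nudge the final layer's logits toward the early layers' consensus. The simple weighted average here is an illustrative approximation, not SLED's actual update rule.

```python
import torch

def fused_logits(hidden_states, unembed, alpha=0.1):
    """hidden_states: one [hidden_dim] tensor per layer (last = final layer).
    unembed: [vocab_size, hidden_dim] shared unembedding matrix."""
    per_layer = torch.stack([h @ unembed.T for h in hidden_states])  # [L, vocab]
    early_avg = per_layer[:-1].mean(dim=0)  # consensus of earlier layers
    final = per_layer[-1]                   # standard last-layer logits
    return (1 - alpha) * final + alpha * early_avg  # nudge toward consensus

# Toy usage with random tensors standing in for a real model's states.
layers = [torch.randn(16) for _ in range(5)]
W = torch.randn(100, 16)
next_token_id = int(torch.argmax(fused_logits(layers, W)))
```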
Cyrus Rashtchian (@cyrusrashtchian)'s Twitter Profile Photo

🚨 New blog post out on how to make LLMs more factual! We put together nice animations to show how our decoding method works under the hood! 🌟 SLED leads to >10% improvements, out of the box, even for newer LLMs like Gemma 3 and GPT-OSS. Code: jayzhang42.github.io/sled_page/

Yossi Matias (@ymatias)'s Twitter Profile Photo

We need to keep pushing for factual accuracy in LLMs. SLED is a new decoding approach from Google Research that uses all of an LLM's internal layers, instead of just the last one, to better align the output with the model's intrinsic knowledge. This enhances accuracy without…