Jiayi Geng (@jiayiigeng)'s Twitter Profile
Jiayi Geng

@jiayiigeng

Incoming PhD @LTIatCMU; MS @princeton_nlp & @PrincetonPLI; Undergrad @mcgillu/@McGillScience | Interested in multi-agent communication.

ID: 1558703867197816832

Link: https://jiayigeng.github.io/
Joined: 14-08-2022 06:35:55

24 Tweets

211 Followers

161 Following

Jiahao Qiu (@jiahaoqiu99)

The GAIA game is over, and Alita is the final answer. Alita takes the top spot in GAIA, outperforming OpenAI Deep Research and Manus. Many general-purpose agents rely heavily on large-scale, manually predefined tools and workflows. However, we believe that for general AI

CLS (@chengleisi)

This year, there have been various pieces of evidence that AI agents are starting to be able to conduct scientific research and produce papers end-to-end, at a level where some of these generated papers were already accepted by top-tier conferences/workshops. Intology’s

Yike Wang (@yikewang_)

LLMs are helpful for scientific research — but will they continuously be helpful? Introducing 🔍ScienceMeter: current knowledge update methods enable 86% preservation of prior scientific knowledge, 72% acquisition of new, and 38%+ projection of future (arxiv.org/abs/2505.24302).

Yijia Shao (@echoshao8899)

🚨 70 million US workers are about to face their biggest workplace transition due to AI agents. But nobody asks them what they want. While AI races to automate everything, we took a different approach: auditing what workers want vs. what AI can do across the US workforce.🧵

Anthropic (@anthropicai)

New on the Anthropic Engineering blog: how we built Claude’s research capabilities using multiple agents working in parallel. We share what worked, what didn't, and the engineering challenges along the way. anthropic.com/engineering/bu…

Jiayi Geng (@jiayiigeng)

I'm thrilled to share that I've moved to Pittsburgh and joined NeuLab at CMU as a research intern this summer, advised by Graham Neubig! I'll also start my PhD at the Language Technologies Institute (@LTIatCMU) this fall. Feel free to reach out if you're interested in chatting about multi-agent systems, LLMs for scientific

CLS (@chengleisi)

Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.

Graham Neubig (@gneubig)

What will software development look like in 2026? With coding agents rapidly improving, dev roles may look quite different. My current workflow has changed a lot:
- Work in GitHub, not IDEs
- Agents in parallel
- Write English, not code
- More code review

Thoughts + a video👇

Xiang Yue@ICLR2025🇸🇬 (@xiangyue96)

People are racing to push math reasoning performance in #LLMs—but have we really asked why? The common assumption is that improving math reasoning should transfer to broader capabilities in other domains. But is that actually true? In our study (arxiv.org/pdf/2507.00432), we

Gaurav Ghosal (@gaurav_ghosal)

1/So much of privacy research is designing post-hoc methods to make models mem. free. It’s time we turn that around with architectural changes. Excited to add Memorization Sinks to the transformer architecture this #ICML2025 to isolate memorization during LLM training🧵

Jiayi Geng (@jiayiigeng)

In "Mind Your Step (by Step): Chain‑of‑Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse", we connect human "overthinking" insights to LLM reasoning, offering a new lens on when thinking‑out‑loud backfires. 📄 Read the full paper: arxiv.org/abs/2410.21333

Jiayi Geng (@jiayiigeng)

Check out this cool video (made by Ryan Liu @ ICML, CogSci) for our #icml25 paper, "Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse"🤗

Jiahao Qiu (@jiahaoqiu99)

🚀 Just released: "A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence"! We provide the first comprehensive review of agents capable of self-evolution—highlighting what, when, and how agents evolve, key benchmarks and applications, and future directions
