Jiayi Geng (@jiayiigeng) Twitter Tweets • TwiCopy

Jiayi Geng

@jiayiigeng

Incoming PhD @LTIatCMU; MS @princeton_nlp & @PrincetonPLI; Undergrad @mcgillu/@McGillScience | Interested in multi-agent communication.

ID: 1558703867197816832

Link: https://jiayigeng.github.io/ | Joined: 14-08-2022 06:35:55

24 Tweets

211 Followers

161 Following

The GAIA game is over, and Alita is the final answer. Alita takes the top spot in GAIA, outperforming OpenAI Deep Research and Manus. Many general-purpose agents rely heavily on large-scale, manually predefined tools and workflows. However, we believe that for general AI

Likes: 61 | Replies: 15 | Retweets: 26

CLS

@chengleisi

2 months ago

This year, there have been various pieces of evidence that AI agents are starting to be able to conduct scientific research and produce papers end-to-end, at a level where some of these generated papers were already accepted by top-tier conferences/workshops. Intology’s

Likes: 212 | Replies: 13 | Retweets: 43

Yike Wang

@yikewang_

2 months ago

LLMs are helpful for scientific research — but will they continuously be helpful? Introducing 🔍ScienceMeter: current knowledge update methods enable 86% preservation of prior scientific knowledge, 72% acquisition of new, and 38%+ projection of future (arxiv.org/abs/2505.24302).

Likes: 236 | Replies: 10 | Retweets: 53

Yijia Shao

@echoshao8899

2 months ago

🚨 70 million US workers are about to face their biggest workplace transition due to AI agents. But nobody asks them what they want. While AI races to automate everything, we took a different approach: auditing what workers want vs. what AI can do across the US workforce.🧵

Likes: 280 | Replies: 6 | Retweets: 47

Anthropic

@anthropicai

2 months ago

New on the Anthropic Engineering blog: how we built Claude’s research capabilities using multiple agents working in parallel. We share what worked, what didn't, and the engineering challenges along the way. anthropic.com/engineering/bu…

Likes: 3.3K | Replies: 99 | Retweets: 722

Jiayi Geng

@jiayiigeng

2 months ago

I'm thrilled to share that I've moved to Pittsburgh and joined NeuLab at CMU as a research intern this summer, advised by Graham Neubig! I'll also start my PhD at the Language Technologies Institute | @CarnegieMellon this fall. Feel free to reach out if you're interested in chatting about multi-agent systems, LLMs for scientific

Likes: 369 | Replies: 11 | Retweets: 13

CLS

@chengleisi

a month ago

Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.

Likes: 553 | Replies: 10 | Retweets: 162

Graham Neubig

@gneubig

a month ago

What will software development look like in 2026? With coding agents rapidly improving, dev roles may look quite different. My current workflow has changed a lot:
- Work in github, not IDEs
- Agents in parallel
- Write English, not code
- More code review
Thoughts + a video👇

Likes: 119 | Replies: 3 | Retweets: 16

Xiang Yue@ICLR2025🇸🇬

@xiangyue96

a month ago

People are racing to push math reasoning performance in #LLMs—but have we really asked why? The common assumption is that improving math reasoning should transfer to broader capabilities in other domains. But is that actually true? In our study (arxiv.org/pdf/2507.00432), we

Likes: 604 | Replies: 14 | Retweets: 124

Jiayi Geng

@jiayiigeng

23 days ago

🧐Check out our poster 11 am today @ West-320!

Likes: 13 | Replies: 0 | Retweets: 2

Gaurav Ghosal

@gaurav_ghosal

23 days ago

1/So much of privacy research is designing post-hoc methods to make models mem. free. It’s time we turn that around with architectural changes. Excited to add Memorization Sinks to the transformer architecture this #ICML2025 to isolate memorization during LLM training🧵

Likes: 57 | Replies: 1 | Retweets: 23

Jiayi Geng

@jiayiigeng

23 days ago

In "Mind Your Step (by Step): Chain‑of‑Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse", we connect human "overthinking" insights to LLM reasoning, offering a new lens on when thinking‑out‑loud backfires. 📄 Read the full paper: arxiv.org/abs/2410.21333

Likes: 83 | Replies: 1 | Retweets: 11

Jiayi Geng

@jiayiigeng

16 days ago

Check out this cool video (made by Ryan Liu @ ICML, CogSci) for our #icml25 paper, "Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse"🤗

Likes: 12 | Replies: 0 | Retweets: 0

Jiahao Qiu

@jiahaoqiu99

11 days ago

🚀 Just released: "A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence"! We provide the first comprehensive review of agents capable of self-evolution—highlighting what, when, and how agents evolve, key benchmarks and applications, and future directions

Likes: 154 | Replies: 2 | Retweets: 39