Antonis Antoniades (@anton_iades)'s Twitter Profile
Antonis Antoniades

@anton_iades

CS PhD student @ucsbNLP teaching machines to think like humans and humans to think like machines. Prev. @UCSBPhysics. Guitar/Bouzouki. Cyprus/CA 🧠🤖🎸🌊

ID: 21491925

Link: https://a-antoniades.github.io

Joined: 21-02-2009 15:24:04

1.1K Tweets

524 Followers

953 Following

Antonis Antoniades (@anton_iades)

This paper is great. It shows why normal benchmarks cannot effectively convey the "goodness" of a certain model. I wondered if encouraging certain "words" may boost the performance of these reasoning chains. Language expressions are a proxy for the behaviors they investigate.

Antonis Antoniades (@anton_iades)

My hunch is that although the performance difference between base GPT-4 and 4.5 is (seemingly) small, it may actually lead to a quite significant difference with post-training.

Antonis Antoniades (@anton_iades)

if you see someone running a process on the GPUs for like 7+ days, they either know exactly what they're doing, or have no clue at all... (speaking from experience 😂)

Alfonso Amayuelas (@alfonamayuelas)

📜🚨 Check out our latest work on "Self-Resource Allocation in Multi-Agent LLM Systems", where we explore how LLMs can be used to optimize task allocation in multi-agent systems 🤖 🧵(1/3)

Antonis Antoniades (@anton_iades)

If you're at ICLR you may want to grab the opportunity to talk with my incredible co-authors Kexun Zhang and Yuxi XIE on everything Search + Agent related at our SWE-Search poster, on Thursday at 3:00pm, Hall 3 + Hall 2B #156. iclr.cc/virtual/2025/p…

Alfonso Amayuelas (@alfonamayuelas)

New paper 🚨📜🚀 Introducing “Agents of Change: Self-Evolving LLM Agents for Strategic Planning”! In this work, we show how LLM-powered agents can rewrite their own prompts & code to climb the learning curve in the board game Settlers of Catan 🎲 🧵👇

Antonis Antoniades (@anton_iades)

The thing people who dismiss LLMs don’t get is that even if they’re not the end game, they’ll be key to getting us there.

Antonis Antoniades (@anton_iades)

Great work on searching in complex environments. The authors identified many of the problems we faced in SWE-Search: agent scaffolding, environment reliability, and selecting the correct final solution (in SWE-Search we addressed the last of these using a multi-agent debate verifier).

Antonis Antoniades (@anton_iades)

Elon's "search for the ultimate truth" paradigm for AI is kind of genius tbh. It's also very connected to RLVR training, where only the outcome matters, getting the "true" answer. :)