UCL DARK (@ucl_dark) Twitter Tweets • TwiCopy

Davide Paglieri

7 months ago

Excited to be in Singapore for ICLR 2025! 🇸🇬 📷 We will present BALROG at the poster session on Saturday, 3:00-5:30 PM, Hall 3, #252. Sneak peek at the poster, including the updated leaderboard with some new models, more on them soon 👀 Bartłomiej Cupiał, Ulyana Piterbarg, Tim Rocktäschel

Laura Ruis

@lauraruis

7 months ago

Presenting this today, 3:00-5:30, at poster #208, come say hi 🙋‍♀️

Jeff Clune

@jeffclune

7 months ago

Very excited for this keynote by Tim Rocktäschel! Awesome to see open-endedness go from a niche (😉) area to a keynote at #ICLR ! 🌱🌿🌳🌲🍀🌍✨ 📈 🧬🧪 cc Joel Lehman Kenneth Stanley

Kenneth Stanley

@kenneth0stanley

7 months ago

Awesome to see a keynote on open-endedness at #ICLR - way to go Tim Rocktäschel ! You have the right message at the right time and I appreciate the callout in the abstract. I wish I was there to see this. Open-endedness is the next frontier for AI as the benchmark race loses its allure.

Davide Paglieri

@paglieridavide

7 months ago

🚨 New top scorer on the BALROG LLM leaderboard! DeepSeek R1, evaluated via NVIDIA’s NIM API, takes the lead. Its strong reasoning capabilities help it make solid progress on BALROG’s toughest tasks — the best performance we’ve seen so far! 🧠⚔️

Davide Paglieri

@paglieridavide

7 months ago

Evaluating DeepSeek R1 on BALROG wouldn’t have been possible on our academic budget. Huge thanks to NVIDIA NIM, Karin Sevegnani, and Georg Zitzlsberger for making it happen! We’ve shared more details in this blog post — check it out: developer.nvidia.com/blog/benchmark…

Davide Paglieri

@paglieridavide

7 months ago

👑 BALROG has a new King! 🥊 Gemini-2.5-Pro enters the ring and dominates its competitors, outperforming the second-best model by ~6%.

Davide Paglieri

@paglieridavide

7 months ago

Not only is Gemini-2.5-Pro currently the top performer, it’s also by far the most cost-effective model in its performance category!

UCL DARK

@ucl_dark

7 months ago

Join UCL DARK's Tim Rocktäschel for a can't-miss keynote diving deep into how Open-Endedness, World Models, and Automated Innovation are going to shape the next wave of AI. 🔥 #ICLR2025 🇸🇬 🕑 Starting at 2PM (in 30 minutes) 📍 Hall 1 Apex

hardmaru

@hardmaru

7 months ago

Tim Rocktäschel’s keynote talk at #ICLR2025 about Open-Endedness and AI. “Almost no prerequisite to any major invention was invented with that invention in mind.” “Basically almost everybody in my lab at UCL and at DeepMind have read this book: Why Greatness Cannot Be Planned.”

Katja Hofmann

@katjahofmann

7 months ago

Tim Rocktäschel - congratulations on such an exciting ICLR 2025 keynote! A case for world models as a step to open-ended agent learning, including robotics

Roberta Raileanu

@robertarail

7 months ago

Such an inspiring and thought-provoking keynote talk by Tim Rocktäschel at #ICLR2025 highlighting a promising path towards automating innovation: combining foundation models with open-endedness methods 🔥

Mikayel Samvelyan

@_samvelyan

7 months ago

Incredible keynote by Tim Rocktäschel! Some might fear that open-endedness is a disease for AI safety; I say it might be the remedy. Our 🌈Rainbow Teaming work proved these methods are great at finding & fixing LLM vulnerabilities. Hope the safety folks use more of these powerful methods

Jack Parker-Holder

@jparkerholder

7 months ago

Excited to share our progress in scaling foundation world models with a packed room at ICLR 2025

Mikayel Samvelyan

@_samvelyan

7 months ago

What an amazing and thought-provoking talk by Roberta Raileanu. Despite all the recent progress, we’re still far from AI agents making true scientific breakthroughs — and there’s so much important work ahead.

Ishita Mediratta

@ishitamed

7 months ago

Roberta Raileanu made a strong case today in her talk. No need to worry about AI saturating research — there are still plenty of deep, open problems. This slide captures it well and she shared many interesting directions worth exploring!

Roberta Raileanu

@robertarail

7 months ago

Indeed, plenty of open research problems to work on, despite the hype of AI automating all AI research.

Laura Ruis

@lauraruis

7 months ago

This got accepted to #ICML2025 as a *spotlight paper* (top 2.6%!) 🚀 This is work that Yi Xu did as an MSc student! Surely this will mark the start of an exceptional academic journey

Tim Rocktäschel

@_rockt

7 months ago

Our UCL DARK MSc student Yi Xu managed to get his work accepted as a spotlight paper at ICML Conference 2025 (top 2.6% of submissions) 🚀 What an amazing success, and a testament to the outstanding supervision by Robert Kirk and Laura Ruis.

Davide Paglieri

@paglieridavide

7 months ago

Gemini 2.5 Pro completes Pokémon Blue 🤯🔥 But how does it fare in much harder, more unforgiving games? On NetHack, it barely scratches the surface—just 1.7% progression, as tested in BALROG, our new benchmark for agentic LLMs 🗡️ Check it out: balrogai.com
