UCL DARK (@ucl_dark)'s Twitter Profile
UCL DARK

@ucl_dark

UCL Deciding, Acting, and Reasoning with Knowledge (DARK) Lab at @AI_UCL led by @_rockt, @egrefen, @robertarail, and @jparkerholder.

ID: 1286209302483603457

Link: http://ucldark.com

Joined: 23-07-2020 08:00:16

700 Tweets

3.3K Followers

193 Following

Davide Paglieri (@paglieridavide)

Excited to be in Singapore for ICLR 2025! 🇸🇬 📷 We will present BALROG at the poster session on Saturday, 3:00-5:30 PM, Hall 3, #252. Sneak peek at the poster, including the updated leaderboard with some new models, more on them soon 👀 Bartłomiej Cupiał, Ulyana Piterbarg, Tim Rocktäschel

Jeff Clune (@jeffclune)

Very excited for this keynote by Tim Rocktäschel! Awesome to see open-endedness go from a niche (😉) area to a keynote at #ICLR ! 🌱🌿🌳🌲🍀🌍✨ 📈 🧬🧪 cc Joel Lehman Kenneth Stanley

Kenneth Stanley (@kenneth0stanley)

Awesome to see a keynote on open-endedness at #ICLR - way to go Tim Rocktäschel ! You have the right message at the right time and I appreciate the callout in the abstract. I wish I was there to see this. Open-endedness is the next frontier for AI as the benchmark race loses its allure.

Davide Paglieri (@paglieridavide)

🚨 New top scorer on the BALROG LLM leaderboard! DeepSeek R1, evaluated via NVIDIA’s NIM API, takes the lead. Its strong reasoning capabilities help it make solid progress on BALROG’s toughest tasks — the best performance we’ve seen so far! 🧠⚔️

Davide Paglieri (@paglieridavide)

Evaluating DeepSeek R1 on BALROG wouldn’t have been possible on our academic budget. Huge thanks to NVIDIA NIM, Karin Sevegnani, and Georg Zitzlsberger for making it happen! We’ve shared more details in this blog post — check it out: developer.nvidia.com/blog/benchmark…

Davide Paglieri (@paglieridavide)

👑 BALROG has a new King! 🥊 Gemini-2.5-Pro enters the ring and dominates its competitors, outperforming the second-best model by ~6%.

Davide Paglieri (@paglieridavide)

Not only is Gemini-2.5-Pro currently the top performer, it’s also by far the most cost-effective model in its performance category!

UCL DARK (@ucl_dark)

Join UCL DARK's Tim Rocktäschel for a can't-miss keynote diving deep into how Open-Endedness, World Models, and Automated Innovation are going to shape the next wave of AI. 🔥 #ICLR2025 🇸🇬 🕑 Starting at 2PM (in 30 minutes) 📍 Hall 1 Apex

hardmaru (@hardmaru)

Tim Rocktäschel’s keynote talk at #ICLR2025 about Open-Endedness and AI. “Almost no prerequisite to any major invention was invented with that invention in mind.” “Basically almost everybody in my lab at UCL and at DeepMind have read this book: Why Greatness Cannot Be Planned.”

Roberta Raileanu (@robertarail)

Such an inspiring and thought-provoking keynote talk by Tim Rocktäschel at #ICLR2025 highlighting a promising path towards automating innovation: combining foundation models with open-endedness methods 🔥

Mikayel Samvelyan (@_samvelyan)

Incredible keynote by Tim Rocktäschel! Some might fear that open-endedness is a disease for AI safety; I say it might be the remedy. Our 🌈Rainbow Teaming work proved these methods are great at finding & fixing LLM vulnerabilities. Hope the safety folks use more of these powerful methods

Mikayel Samvelyan (@_samvelyan)

What an amazing and thought-provoking talk by Roberta Raileanu. Despite all the recent progress, we’re still far from AI agents making true scientific breakthroughs — and there’s so much important work ahead.

Ishita Mediratta (@ishitamed)

Roberta Raileanu made a strong case today in her talk. No need to worry about AI saturating research — there are still plenty of deep, open problems. This slide captures it well and she shared many interesting directions worth exploring!

Laura Ruis (@lauraruis)

This got accepted to #ICML2025 as a *spotlight paper* (top 2.6%!) 🚀 --- work that Yi Xu did as an MSc student! Surely this will mark the start of an exceptional academic journey

Tim Rocktäschel (@_rockt)

Our UCL DARK MSc student Yi Xu managed to get his work accepted as a spotlight paper at ICML Conference 2025 (top 2.6% of submissions) 🚀 What an amazing success and a testament to the outstanding supervision by Robert Kirk and Laura Ruis.

Davide Paglieri (@paglieridavide)

Gemini 2.5 Pro completes Pokémon Blue 🤯🔥 But how does it fare in much harder, more unforgiving games? On NetHack, it barely scratches the surface—just 1.7% progression, as tested in BALROG, our new benchmark for agentic LLMs 🗡️ Check it out: balrogai.com