Thibaud Gloaguen (@tibglo)'s Twitter Profile
Thibaud Gloaguen

@tibglo

ID: 1913135951179907072

Joined: 18-04-2025 07:42:20

8 Tweets

7 Followers

45 Following

Jasper Dekoninck (@j_dekoninck) 's Twitter Profile Photo

A new open reasoning model, K2-Think, was recently released boasting scores comparable to GPT-OSS 120B and getting a lot of media attention.

However, their performance relies on flawed evaluation marked by contamination, unfair comparisons, and misrepresentation of results. 🧵
Nikola Jovanović @ ICLR 🇸🇬 (@ni_jovanovic) 's Twitter Profile Photo

New paper: arxiv.org/pdf/2509.15208 All code is in our WMAR repo: github.com/facebookresear… Another fun collab with Pierre Fernandez Tomáš Souček and the rest of the team at FAIR MSL!

kache (@yacinemtb) 's Twitter Profile Photo

when an AI researcher tells you they're "working all day" they're not really working. they just take their laptop to the bathroom to check how their runs are doing at 2 am but it's not actually work. just a severe gambling addiction

Nikola Jovanović @ ICLR 🇸🇬 (@ni_jovanovic) 's Twitter Profile Photo

MathArena Update: Claims about Grok 4 Fast seem to check out, it matches the performance of Grok 4 but is much faster and 20-50x cheaper. Good release!

This holds across final-answer competitions, Apex problems, and Project Euler. 🧵
INSAIT Institute (@insaitinstitute) 's Twitter Profile Photo

🚀 We are delighted to release MamayLMv1.0 - the first open and efficient multimodal LLM for Ukrainian that can handle both text and visual data!

📊 MamayLMv1.0 outperforms up to 5x larger open models on Ukrainian tests, maintains strong English skills and surpasses proprietary
Hanna Yukhymenko (@a_yukh) 's Twitter Profile Photo

🚀Releasing MamayLM v1.0 🇺🇦

MamayLM can now see! 👀 The new v1.0 version adds visual and enhanced long-context capabilities, showcasing even stronger performance in both Ukrainian and English.
Niels Mündler (@nielstron) 's Twitter Profile Photo

Looking into the latest shots fired by Google against OpenAI.

Overall, a nice idea to look a bit closer and clean this dataset up.
Jasper Dekoninck (@j_dekoninck) 's Twitter Profile Photo

Introducing ChessImageBench: a benchmark for chessboard generation that breaks state-of-the-art AI models. We find that models fail to generate accurate chessboards. Moreover, VLMs like GPT-5 can’t outperform a simple baseline when detecting mistakes in the generated boards. 🧵

INSAIT Institute (@insaitinstitute) 's Twitter Profile Photo

🚀⚛️ Major result: we are announcing qblaze – a state-of-the-art quantum simulator, built by researchers at INSAIT and ETH Zurich!

🥇 qblaze sets a record for the largest number factored to date with Shor’s algorithm by a quantum circuits simulator – a 39 bit number (549 755 813
Kazuki Egashira (@kazukiega) 's Twitter Profile Photo

🚨 Be careful when pruning an LLM! 🚨

Even when the model appears benign, it might start behaving maliciously (e.g., jailbroken) once you download and prune it.

Here’s how our attack works 🧵

arxiv.org/abs/2510.07985
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

Pruning can make a normal looking LLM turn harmful only after users prune it.

i.e. pruning itself can trigger hidden backdoors at deployment.

Pruning zeros many small weights to save memory and speed, and vLLM makes that step easy for deployments.

The attack estimates which
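The pruning step these threads describe is ordinary magnitude pruning: zero out the smallest-magnitude weights to save memory and speed. A minimal sketch of that step in plain Python (illustrative only; real deployments would prune tensors with library tooling, and the attack itself involves training, not this function):

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    An attacker who can predict which weights survive this step can hide
    behavior that only activates in the pruned model.
    """
    k = int(len(weights) * sparsity)
    # Indices sorted by absolute value; the first k are the ones to drop.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.02, -0.8]
pw = magnitude_prune(w)  # -> [0.9, 0.0, 0.0, -0.8]
```

The key property for the attack is that this selection is deterministic given the weights, so the attacker knows in advance exactly which weights a downstream user will remove.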
Kangwook Lee (@kangwook_lee) 's Twitter Profile Photo

DLLMs seem promising... but parallel generation is not always possible

Diffusion-based LLMs can generate many tokens at different positions at once, while most autoregressive LLMs generate tokens one by one.

This makes diffusion-based LLMs highly attractive when we need fast
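The contrast in this thread, autoregressive decoding taking one step per token versus diffusion-style decoding filling several masked positions per step, can be sketched with a toy step counter (purely illustrative: known target tokens stand in for model predictions, whereas real DLLMs fill positions via learned denoising):

```python
# Toy step-count contrast between autoregressive and diffusion-style decoding.

def autoregressive_decode(target):
    """One token per step: step count equals sequence length."""
    out = []
    for tok in target:  # each iteration stands for one model forward pass
        out.append(tok)
    return out, len(target)

def parallel_decode(target, tokens_per_step=4):
    """Diffusion-style: unmask several positions in each step."""
    out = ["<mask>"] * len(target)
    steps = 0
    for start in range(0, len(target), tokens_per_step):
        for i in range(start, min(start + tokens_per_step, len(target))):
            out[i] = target[i]
        steps += 1  # one denoising step fills up to tokens_per_step slots
    return out, steps

seq = list("hello world!")                     # 12 tokens
ar_out, ar_steps = autoregressive_decode(seq)  # 12 steps
par_out, par_steps = parallel_decode(seq)      # 3 steps
```

The tweet's caveat is precisely that this speedup only holds when positions really can be filled independently; when later tokens depend on earlier ones, forcing them out in the same step hurts quality.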
Thibaud Gloaguen (@tibglo) 's Twitter Profile Photo

I have created a small website to help explain my latest work on watermarking diffusion models. There is also a satisfying Manim animation for visualization 😌 diffusionlm-watermark.ing