Thibaud Gloaguen (@tibglo)'s Twitter Profile
Thibaud Gloaguen

@tibglo

ID: 1913135951179907072

Joined: 18-04-2025 07:42:20

8 Tweets

7 Followers

45 Following

Jasper Dekoninck (@j_dekoninck) 's Twitter Profile Photo

A new open reasoning model, K2-Think, was recently released boasting scores comparable to GPT-OSS 120B and getting a lot of media attention.

However, their performance relies on flawed evaluation marked by contamination, unfair comparisons, and misrepresentation of results. 🧵
Nikola Jovanović @ ICLR 🇸🇬 (@ni_jovanovic) 's Twitter Profile Photo

New paper: arxiv.org/pdf/2509.15208 All code is in our WMAR repo: github.com/facebookresear… Another fun collab with Pierre Fernandez Tomáš Souček and the rest of the team at FAIR MSL!

kache (@yacinemtb) 's Twitter Profile Photo

when an AI researcher tells you they're "working all day" they're not really working. they just take their laptop to the bathroom to check how their runs are doing at 2 am but it's not actually work. just a severe gambling addiction

Nikola Jovanović @ ICLR 🇸🇬 (@ni_jovanovic) 's Twitter Profile Photo

MathArena Update: Claims about Grok 4 Fast seem to check out, it matches the performance of Grok 4 but is much faster and 20-50x cheaper. Good release!

This holds across final-answer competitions, Apex problems, and Project Euler. 🧵
INSAIT Institute (@insaitinstitute) 's Twitter Profile Photo

🚀 We are delighted to release MamayLMv1.0 - the first open and efficient multimodal LLM for Ukrainian that can handle both text and visual data!

📊 MamayLMv1.0 outperforms up to 5x larger open models on Ukrainian tests, maintains strong English skills and surpasses proprietary
Hanna Yukhymenko (@a_yukh) 's Twitter Profile Photo

🚀Releasing MamayLM v1.0 🇺🇦

MamayLM can now see! 👀 The new v1.0 version adds visual and enhanced long-context capabilities, showcasing even stronger performance in both Ukrainian and English.
Niels Mündler (@nielstron) 's Twitter Profile Photo

Looking into the latest shots fired by Google against OpenAI.

Overall, a nice idea to look a bit closer and clean this dataset up.
Jasper Dekoninck (@j_dekoninck) 's Twitter Profile Photo

Introducing ChessImageBench: a benchmark for chessboard generation that breaks state-of-the-art AI models. We find that models fail to generate accurate chessboards. Moreover, VLMs like GPT-5 can’t outperform a simple baseline when detecting mistakes in the generated boards. 🧵

INSAIT Institute (@insaitinstitute) 's Twitter Profile Photo

🚀⚛️ Major result: we are announcing qblaze – a state-of-the-art quantum simulator, built by researchers at INSAIT and ETH Zurich!

🥇 qblaze sets a record for the largest number factored to date with Shor’s algorithm by a quantum circuits simulator – a 39 bit number (549 755 813
Kazuki Egashira (@kazukiega) 's Twitter Profile Photo

🚨 Be careful when pruning an LLM! 🚨

Even when the model appears benign, it might start behaving maliciously (e.g., jailbroken) once you download and prune it.

Here’s how our attack works 🧵

arxiv.org/abs/2510.07985
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

Pruning can make a normal looking LLM turn harmful only after users prune it.

i.e. pruning itself can trigger hidden backdoors at deployment.

Pruning zeros many small weights to save memory and speed, and vLLM makes that step easy for deployments.

The attack estimates which
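The pruning step these threads describe is ordinary magnitude pruning: zero out the smallest-magnitude weights to save memory and speed. A minimal sketch of that step in plain Python (illustrative only; real deployments would prune tensors with library tooling, and the attack itself involves training, not this function):

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    An attacker who can predict which weights survive this step can hide
    behavior that only activates in the pruned model.
    """
    k = int(len(weights) * sparsity)
    # Indices sorted by absolute value; the first k are the ones to drop.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.02, -0.8]
pw = magnitude_prune(w)  # -> [0.9, 0.0, 0.0, -0.8]
```

The key property for the attack is that this selection is deterministic given the weights, so the attacker knows in advance exactly which weights a downstream user will remove.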
Kangwook Lee (@kangwook_lee) 's Twitter Profile Photo

DLLMs seem promising... but parallel generation is not always possible

Diffusion-based LLMs can generate many tokens at different positions at once, while most autoregressive LLMs generate tokens one by one.

This makes diffusion-based LLMs highly attractive when we need fast
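The contrast in this thread, autoregressive decoding taking one step per token versus diffusion-style decoding filling several masked positions per step, can be sketched with a toy step counter (purely illustrative: known target tokens stand in for model predictions, whereas real DLLMs fill positions via learned denoising):

```python
# Toy step-count contrast between autoregressive and diffusion-style decoding.

def autoregressive_decode(target):
    """One token per step: step count equals sequence length."""
    out = []
    for tok in target:  # each iteration stands for one model forward pass
        out.append(tok)
    return out, len(target)

def parallel_decode(target, tokens_per_step=4):
    """Diffusion-style: unmask several positions in each step."""
    out = ["<mask>"] * len(target)
    steps = 0
    for start in range(0, len(target), tokens_per_step):
        for i in range(start, min(start + tokens_per_step, len(target))):
            out[i] = target[i]
        steps += 1  # one denoising step fills up to tokens_per_step slots
    return out, steps

seq = list("hello world!")                     # 12 tokens
ar_out, ar_steps = autoregressive_decode(seq)  # 12 steps
par_out, par_steps = parallel_decode(seq)      # 3 steps
```

The tweet's caveat is precisely that this speedup only holds when positions really can be filled independently; when later tokens depend on earlier ones, forcing them out in the same step hurts quality.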
Thibaud Gloaguen (@tibglo) 's Twitter Profile Photo

I have created a small website to help explain my latest work on watermarking diffusion models. There is also a satisfying Manim animation for visualization 😌 diffusionlm-watermark.ing