Sangmin Bae (@raymin0223) 's Twitter Profile
Sangmin Bae

@raymin0223

PhD Student @kaist_ai | Prev-Intern @GoogleDeepMind, Kakao | LLM Inference Acceleration, Foundation Model Training, Multimodal Learning

ID: 1454885436896133125

Link: http://www.raymin0223.com | Joined: 31-10-2021 18:58:28

310 Tweets

1.1K Followers

966 Following

Pavlo Molchanov (@pavlomolchanov) 's Twitter Profile Photo

New efficient Hybrid LLMs from @NVIDIA: Nemotron-H! Introducing a family of models combining Mamba-2, Self-Attention & FFNs for 8B, 47B and 56B sizes.

• 3x faster and 1.5x smaller 47B model is on par with Qwen-72B and Llama-70B
• 1.8x faster Hybrid 8B than transformers
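
As a rough illustration of how a hybrid Mamba-2 / self-attention / FFN stack like the one described in the bullets above can be wired, here is a minimal PyTorch sketch that assembles blocks from a pattern string. The pattern, dimensions, and the FFN fallback used when no Mamba-2 implementation is supplied are illustrative assumptions, not Nemotron-H's published configuration.

```python
# Toy hybrid stack: the mixer type per layer follows a pattern string.
# "M" = Mamba-2-style block (supplied externally), "A" = self-attention, "F" = FFN.
import torch
import torch.nn as nn

class FFNBlock(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.LayerNorm(d), nn.Linear(d, 4 * d),
                                 nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        return x + self.net(x)

class AttnBlock(nn.Module):
    def __init__(self, d, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)  # no causal mask; sketch only
        return x + out

class HybridLM(nn.Module):
    def __init__(self, d=512, pattern="MFMFAF", repeats=2, mamba_block=None):
        super().__init__()
        blocks = []
        for _ in range(repeats):
            for kind in pattern:
                if kind == "A":
                    blocks.append(AttnBlock(d))
                elif kind == "M" and mamba_block is not None:
                    blocks.append(mamba_block(d))   # e.g. a Mamba-2 layer, if one is available
                else:
                    blocks.append(FFNBlock(d))      # fallback when no Mamba-2 impl is given
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return x

print(HybridLM()(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```
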
Weijia Shi (@weijiashi2) 's Twitter Profile Photo

Our previous work showed that 𝐜𝐫𝐞𝐚𝐭𝐢𝐧𝐠 𝐯𝐢𝐬𝐮𝐚𝐥 𝐜𝐡𝐚𝐢𝐧‑𝐨𝐟‑𝐭𝐡𝐨𝐮𝐠𝐡𝐭𝐬 𝐯𝐢𝐚 𝐭𝐨𝐨𝐥 𝐮𝐬𝐞 significantly boosts GPT‑4o’s visual reasoning performance. Excited to see this idea incorporated into OpenAI’s o3 and o4‑mini models (openai.com/index/thinking…).

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

Gemini 2.5 Flash just dropped. ⚡ As a hybrid reasoning model, you can control how much it ‘thinks’ depending on your 💰 - making it ideal for tasks like building chat apps, extracting data and more. Try an early version in Google AI Studio → ai.dev

The AI Timeline (@theaitimeline) 's Twitter Profile Photo

🚨This week's top AI/ML research papers:

- BitNet b1.58 2B4T Technical Report
- Reasoning Models Can Be Effective Without Thinking
- ReTool
- Sleep-time Compute
- Nemotron-H
- Kimina-Prover Preview
- CLIMB
- Dynamic Cheatsheet
- How new data permeates LLM knowledge and how to
Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Dynamic Early Exit in Reasoning Models

- Allows LLMs to self-truncate CoT sequences by dynamic early exit
- Reduces the CoT length by ~35% while improving accuracy by 1% - 10%.
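
As a rough sketch of the self-truncation idea in the two bullets above, the loop below probes, every few tokens, how much probability the model places on an end-of-thinking marker and closes the chain of thought once that passes a threshold. The `model.next_token_logits` helper, the probe interval, and the threshold rule are hypothetical stand-ins, not the paper's actual exit criterion.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def generate_with_early_exit(model, prompt_ids, eot_id, max_cot_tokens=2048,
                             probe_every=64, exit_threshold=0.9):
    """Greedy CoT decoding that can self-truncate (hypothetical model API)."""
    ids = list(prompt_ids)
    for step in range(max_cot_tokens):
        logits = np.asarray(model.next_token_logits(ids), dtype=float)  # (vocab,) scores
        next_id = int(np.argmax(logits))
        ids.append(next_id)
        if next_id == eot_id:                       # model ended its own chain of thought
            break
        if (step + 1) % probe_every == 0:
            probs = softmax(logits)
            if probs[eot_id] >= exit_threshold:     # confident enough: exit early
                ids.append(eot_id)                  # self-truncate the CoT
                break
    return ids
```
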
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

Tina: Tiny Reasoning Models via LoRA

"the best Tina model achieves a >20% reasoning performance increase and 43.33% Pass@1 accuracy on AIME24, at only $9 USD post-training and evaluation cost (i.e., an estimated 260x cost reduction). Our work reveals the surprising effectiveness
Sangmin Bae (@raymin0223) 's Twitter Profile Photo

Sharing our poster at ICLR 2025! Please stop by the poster if you're interested 😊

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA

📍 Hall 3 + Hall 2B, Poster #262
🗓️ Saturday, April 26, 10:00 a.m. — 12:30 p.m.
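
Since the title names the core mechanism, here is a hedged toy of the idea behind it: one block's weights are reused at every loop depth, and a small depth-specific low-rank (LoRA-style) delta relaxes the strict parameter tying. The single linear mixer, rank, and sizes are illustrative simplifications, not the architecture in the paper.

```python
# Toy "recursive" stack: the same shared weights are looped over depth,
# with a per-depth low-rank delta relaxing the parameter sharing.
import torch
import torch.nn as nn

class SharedLinearWithLoRA(nn.Module):
    def __init__(self, d, num_loops, rank=8):
        super().__init__()
        self.shared = nn.Linear(d, d)   # tied across all depths
        self.lora_A = nn.ParameterList(nn.Parameter(0.01 * torch.randn(rank, d)) for _ in range(num_loops))
        self.lora_B = nn.ParameterList(nn.Parameter(torch.zeros(d, rank)) for _ in range(num_loops))

    def forward(self, x, depth):
        delta = x @ self.lora_A[depth].T @ self.lora_B[depth].T  # depth-specific low-rank update
        return self.shared(x) + delta

class RecursiveStack(nn.Module):
    def __init__(self, d=256, num_loops=4):
        super().__init__()
        self.block = SharedLinearWithLoRA(d, num_loops)
        self.num_loops = num_loops

    def forward(self, x):
        for depth in range(self.num_loops):   # same block reused; only the LoRA delta varies
            x = x + torch.relu(self.block(x, depth))
        return x

print(RecursiveStack()(torch.randn(2, 10, 256)).shape)  # torch.Size([2, 10, 256])
```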

Reza Bayat (@reza_byt) 's Twitter Profile Photo

Mohammad, my amazing co-author, will present our work today at ICLR (#304 at 3 p.m.). Don't miss it; you can learn far more about fundamental research from him than from any paper. I was really fortunate to work with him and to experience a glimpse of the fundamental research

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

We’ve developed Gemini Diffusion: our state-of-the-art text diffusion model. Instead of predicting text directly, it learns to generate outputs by refining noise, step-by-step. This helps it excel at coding and math, where it can iterate over solutions quickly. #GoogleIO
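
For a concrete sense of what "refining noise, step-by-step" can mean for text, here is a generic masked-diffusion-style decoding loop: start from an all-mask sequence, predict every slot, commit only the most confident ones, and repeat on an unmasking schedule. This is a common pattern for discrete text diffusion, offered purely as a sketch; the `denoiser` callable is hypothetical, and nothing here reflects Gemini Diffusion's actual procedure.

```python
# Generic masked-diffusion text decoding: iteratively unmask the most
# confident positions over a fixed number of refinement steps.
import numpy as np

def masked_diffusion_decode(denoiser, length, mask_id, steps=8):
    seq = np.full(length, mask_id)                       # "noise": everything masked
    for t in range(steps, 0, -1):
        probs = denoiser(seq)                            # (length, vocab) per-slot distributions
        pred = probs.argmax(axis=-1)
        conf = probs.max(axis=-1)
        keep = int(round(length * (steps - t + 1) / steps))  # how many slots to commit this round
        top = np.argsort(-conf)[:keep]                   # most confident positions win
        seq = np.full(length, mask_id)
        seq[top] = pred[top]                             # commit those, re-mask the rest
    return seq
```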

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

dKV-Cache: The Cache for Diffusion Language Models

- Achieves a 2-10× speedup in inference, largely narrowing the gap between ARs and DLMs
- Can be used for diffusion LMs as well
机器之心 JIQIZHIXIN (@synced_global) 's Twitter Profile Photo

🔥 State-Space Meets Diffusion: A New Era for Video World Models 

This paper tackles a core bottleneck in video-based world modeling: long-term memory. While video diffusion models are great at short-term frame prediction, their memory fades fast—especially when modeling long
Ali Behrouz (@behrouz_ali) 's Twitter Profile Photo

What makes attention the critical component for most advances in LLMs and what holds back long-term memory modules (RNNs)? Can we strictly generalize Transformers?

Presenting Atlas (A powerful Titan): a new architecture with long-term in-context memory that learns how to
Zhihan Yang (@zhihanyangzy) 's Twitter Profile Photo

📢Thrilled to share our new paper: Esoteric Language Models (Eso-LMs)

> 🔀Fuses autoregressive (AR) and masked diffusion (MDM) paradigms
> 🚀First to unlock KV caching for MDMs (65x speedup!)
> 🥇Sets new SOTA on generation speed-vs-quality Pareto frontier

How? Dive in👇
Han Guo (@hanguo97) 's Twitter Profile Photo

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
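
To make the "between linear and quadratic" point above concrete, here is a toy numpy version of the structural idea: cover each causal prefix with O(log n) power-of-two (Fenwick-style) buckets, keep one linear-attention summary state per bucket, and let each query read only those summaries. The actual method adds learned per-level weighting and hardware-efficient chunked kernels; this sketch recomputes bucket states for clarity instead of maintaining them incrementally.

```python
import numpy as np

def fenwick_buckets(t):
    """Power-of-two blocks that exactly cover positions [0, t) (Fenwick-style)."""
    blocks, start = [], 0
    while start < t:
        size = 1
        while start + size * 2 <= t and start % (size * 2) == 0:
            size *= 2
        blocks.append((start, start + size))
        start += size
    return blocks

def log_linear_attention(q, k, v):
    """Toy causal variant: position t reads one linear-attention state per bucket.
    q, k, v: (T, d) arrays; each query mixes O(log T) bucket summaries instead of
    attending to every past token individually."""
    T, d = q.shape
    out = np.zeros_like(v)
    for t in range(1, T + 1):
        acc = np.zeros(d)
        for lo, hi in fenwick_buckets(t):
            state = k[lo:hi].T @ v[lo:hi]      # (d, d) summary state of that bucket
            acc += q[t - 1] @ state            # query reads the O(log t) summaries
        out[t - 1] = acc
    return out

print(log_linear_attention(*np.random.randn(3, 16, 8)).shape)  # (16, 8)
```
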
Dongmin Park @ iclr25 (@dongmin_park11) 's Twitter Profile Photo

🚨New Paper Alert

As a game company, KRAFTON AI is actively exploring how to apply LLM agents to video games.

We present Orak—a foundational video gaming benchmark for LLM agents!

Includes Pokémon, StarCraft II, Slay the Spire, Darkest Dungeon, Ace Attorney, and more in🧵
Tri Dao (@tri_dao) 's Twitter Profile Photo

State space models and RNNs compress history into a constant size state, while attn has KV cache scaling linearly in seqlen. We can instead start from RNNs and let the state size grow logarithmically with seqlen. Feels like a sweet spot. Also beautiful connection to classical

Avi Chawla (@_avichawla) 's Twitter Profile Photo

ML researchers just built a new ensemble technique. It even outperforms XGBoost, CatBoost, and LightGBM. Here's a complete breakdown (explained visually):

Sundar Pichai (@sundarpichai) 's Twitter Profile Photo

Gemini 2.5 Pro + 2.5 Flash are now stable and generally available. Plus, get a preview of Gemini 2.5 Flash-Lite, our fastest + most cost-efficient 2.5 model yet. 🔦

Exciting steps as we expand our 2.5 series of hybrid reasoning models that deliver amazing performance at the
DailyPapers (@huggingpapers) 's Twitter Profile Photo

Discrete Diffusion in Large Language and Multimodal Models: A Survey just released on Hugging Face

Get an overview of research in discrete diffusion LLMs and MLLMs, which achieve performance comparable to autoregressive models with up to 10x faster inference!