huskydoge (@huskydogewoof) 's Twitter Profile
huskydoge

@huskydogewoof

Undergraduate in IEEE-CS at SJTU. Actively seeking PhD opportunities for 2025 and summer research internships

ID: 1587981954464419840

Link: https://huskydoge.github.io/ · Joined: 03-11-2022 01:36:38

9 Tweets

50 Followers

446 Following

huskydoge (@huskydogewoof) 's Twitter Profile Photo

My respect for the robotics community has just reached a new level.
- My main quest this year: better understand and optimize model architectures and algorithms, and learn to write kernels.
- Side quest: learn robotics from any robotics researcher I know!

Shengqu Cai (@prime_cai) 's Twitter Profile Photo

I agree with every single point mentioned in this blog. Let me share my own story on this too: I started my research career working on 3D/graphics in 2022-ish, back then it was the peak of NeRFs, then 3D generation (with GANs), then score distillation with T2I DMs, like

Guanya Shi (@guanyashi) 's Twitter Profile Photo

Want to reproduce the cool "wall flip" motion? Check out our previous OmniRetarget project (released in Sep 2025)! omniretarget.github.io

Hao Kang (@gt_haokang) 's Twitter Profile Photo

🔥Modify 2 lines of code and get your agentic serving/rollout up to 3.9x faster losslessly! ⚡️Say hello to ThunderAgent, a fast, simple, and program-aware agentic Inference System. 🥇 We propose a program abstraction to schedule all GPU and CPU resources, the first

huskydoge (@huskydogewoof) 's Twitter Profile Photo

Simplicity never goes out of fashion. Cool work by Xingjian Bai, Guande He, Xun Huang, and others! This work reminds me of some observations from follow-up explorations of the video world model Pandora: arxiv.org/abs/2406.09455. - Pandora uses an (almost) causal Transformer

Nicholas Boffi (@nmboffi) 's Twitter Profile Photo

We just brought flow maps to language modeling for one-step sequence generation 💥 Discrete diffusion is not necessary -- continuous flows over one-hot encodings achieve SoTA performance and ≥8.3× faster generation 🔥 We believe this is a major step forward for discrete

Zhengyang Geng (@zhengyanggeng) 's Twitter Profile Photo

I don't want to say Matrix Decomposition is all you need for attention, but at least it is what you need lol. Take a step back to the 80s/90s. Classic Hebbian learning, Oja's rule, and Sanger's rule, as computational models for synaptic plasticity, already proved that NNs
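The tweet above points to Oja's rule as a classic link between Hebbian plasticity and matrix decomposition. As a minimal sketch (my own illustration, not code from the thread), here is Oja's rule driving a single linear neuron toward the top principal component of its input distribution; the data matrix, learning rate, and variable names are all assumptions for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic zero-mean data with a dominant principal direction.
X = rng.normal(size=(5000, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X -= X.mean(axis=0)

w = rng.normal(size=2)
w /= np.linalg.norm(w)
lr = 1e-3

for x in X:
    y = w @ x                  # neuron output: the Hebbian pre*post term
    w += lr * y * (x - y * w)  # Oja's rule: Hebbian growth plus y^2 decay

# Compare against the true top eigenvector of the covariance matrix.
cov = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(cov)
top = eigvecs[:, -1]
alignment = abs(w @ top) / np.linalg.norm(w)
print(f"alignment with top principal component: {alignment:.3f}")
```

The decay term `y * y * w` is what keeps the weight norm bounded without explicit normalization; Sanger's rule extends the same idea to extract multiple components, which is the sense in which these 80s/90s rules were already doing matrix decomposition.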

Zhengyang Geng (@zhengyanggeng) 's Twitter Profile Photo

Video diffusion Transformers tend to split causal semantics (early layers) and rendering (later layers). Making this separation explicit, semantic generator + renderer, has clear practical benefits. Very nice paper with systematic probing and analysis! If the renderer is an MF

Zhijian Liu (@zhijianliu_) 's Twitter Profile Photo

Reasoning LLMs generate very long chains-of-thought, so even small quantization errors add up. With AWQ, Qwen3-4B drops 71.0 → 68.2 on MMLU-Pro (~4% relative loss). 😬 ParoQuant fixes this! It keeps only the critical rotation pairs and fuses everything into a single kernel.
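The claim above is that long chains-of-thought let small per-step quantization errors compound. As a toy illustration (my own sketch, unrelated to the actual AWQ or ParoQuant implementations), the following applies a symmetric round-to-nearest 4-bit quantizer to a random weight matrix and tracks how the output of the quantized model drifts from the full-precision one over many sequential steps; the matrix size, step count, and scaling are all assumptions for the demo:

```python
import numpy as np

def quantize(w, bits=4):
    # Symmetric uniform round-to-nearest quantizer over the tensor's range.
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)) / np.sqrt(64)  # roughly unit spectral radius
wq = quantize(w, bits=4)

x = rng.normal(size=64)
x_fp, x_q = x.copy(), x.copy()
errors = []
for _ in range(32):                          # 32 sequential "decoding" steps
    x_fp = w @ x_fp                          # full-precision trajectory
    x_q = wq @ x_q                           # quantized trajectory
    errors.append(np.linalg.norm(x_fp - x_q) / np.linalg.norm(x_fp))

print(f"relative error after 1 step:   {errors[0]:.4f}")
print(f"relative error after 32 steps: {errors[-1]:.4f}")
```

The relative error grows with chain length because each step both propagates the accumulated drift and injects fresh quantization noise, which is why long reasoning traces are a stress test for weight quantization.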

Hokin Deng (@denghokin) 's Twitter Profile Photo

#VideoReason We are open-sourcing the entire VBVR stack to speed up the arrival of video reasoning as the next fundamental paradigm of intelligence:
- 150+ synthetic generators
- 1 million training clips
- Cloud-scale data factory
- Unified EvalKit
- 100 rule-based evaluators
-

YixuanEvenXu (@yixuanevenxu) 's Twitter Profile Photo

Recent debates highlight a key issue: how do you actually prove distillation? If you want to claim a model was distilled from your outputs, scientifically and with rigorous statistical guarantees, you should consider Antidistillation Fingerprinting (ADFP). 👇

Ziran Yang (@__zrrr__) 's Twitter Profile Photo

We released Goedel-Prover-V2, a state-of-the-art model for formal theorem proving at launch. Remarkably, it has remained at the top of the open-source formal theorem proving leaderboard for over six months. We have been excited to see so many folks cooking with our models.

Albert Gu (@_albertgu) 's Twitter Profile Photo

okay this plot and discussion has blown up more than expected so let me try to leave some candid thoughts 1. i don't believe that the intent of Mayank's tweet was to claim "Mamba-2 > GDN". the primary intent was to convey that the initialization for Mamba-2 makes a huge

Albert Gu (@_albertgu) 's Twitter Profile Photo

> an example of this is that in hybrid models, sometimes "stronger" linear layers can lead to overall weaker models because it incentivizes the global attention to be "lazy" some people asked about this. i think this is a somewhat folklore result that I don't have a reference

Lianhui Qin (@lianhuiq) 's Twitter Profile Photo

Jixuan and the team demo’d 🦐OpenClaw agents living and operating in our SimWorld, launched in minutes. 🚀🤖 🔥Our mission: make embodied agent frameworks easy for anyone to run, observe, and customize in a realistic virtual world.

Yu Zhang 🐳🙇 (@yzhang_cs) 's Twitter Profile Photo

Behind every successful LLM is a highly organized, intellectually dense lab led by someone who keeps things open and equal. I’ve always valued this kind of environment, especially in the open-source community, and I feel so sad to see this happening now.