huskydoge (@huskydogewoof) 's Twitter Profile
huskydoge

@huskydogewoof

Undergraduate in IEEE-CS at SJTU. Actively seeking PhD opportunities for 2025 and summer research internships

ID: 1587981954464419840

Link: https://huskydoge.github.io/ · Joined: 03-11-2022 01:36:38

9 Tweets

50 Followers

446 Following

huskydoge (@huskydogewoof) 's Twitter Profile Photo

My respect for the robotics community has just reached a new level.
- My main quest this year: better understand and optimize model architectures and algorithms, and learn to write kernels.
- Side quest: learn robotics from any robotics researcher I know!

Shengqu Cai (@prime_cai) 's Twitter Profile Photo

I agree with every single point mentioned in this blog. Let me share my own story on this too: I started my research career working on 3D/graphics in 2022-ish, back then it was the peak of NeRFs, then 3D generation (with GANs), then score distillation with T2I DMs, like

Guanya Shi (@guanyashi) 's Twitter Profile Photo

Want to reproduce the cool "wall flip" motion? Check out our previous OmniRetarget project (released in Sep 2025)! omniretarget.github.io

Hao Kang (@gt_haokang) 's Twitter Profile Photo

🔥Modify 2 lines of code and get your agentic serving/rollout up to 3.9x faster losslessly! ⚡️Say hello to ThunderAgent, a fast, simple, and program-aware agentic Inference System. 🥇 We propose a program abstraction to schedule all GPU and CPU resources, the first

huskydoge (@huskydogewoof) 's Twitter Profile Photo

Simplicity never goes out of fashion. Cool work by Xingjian Bai, Guande He, Xun Huang, and others! This work reminds me of some observations from follow-up explorations of the video world model Pandora: arxiv.org/abs/2406.09455. - Pandora uses an (almost) causal Transformer

Nicholas Boffi (@nmboffi) 's Twitter Profile Photo

We just brought flow maps to language modeling for one-step sequence generation 💥 Discrete diffusion is not necessary -- continuous flows over one-hot encodings achieve SoTA performance and ≥8.3× faster generation 🔥 We believe this is a major step forward for discrete

Zhengyang Geng (@zhengyanggeng) 's Twitter Profile Photo

I don't want to say Matrix Decomposition is all you need for attention, but at least it is what you need lol. Take a step back to the 80s/90s. Classic Hebbian learning, Oja's rule, and Sanger's rule, as computational models for synaptic plasticity, already proved that NNs
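The tweet above points to Oja's rule as a classic link between Hebbian plasticity and matrix decomposition. As a minimal sketch (my own illustration, not code from the thread), here is Oja's rule driving a single linear neuron toward the top principal component of its input distribution; the data matrix, learning rate, and variable names are all assumptions for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic zero-mean data with a dominant principal direction.
X = rng.normal(size=(5000, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X -= X.mean(axis=0)

w = rng.normal(size=2)
w /= np.linalg.norm(w)
lr = 1e-3

for x in X:
    y = w @ x                  # neuron output: the Hebbian pre*post term
    w += lr * y * (x - y * w)  # Oja's rule: Hebbian growth plus y^2 decay

# Compare against the true top eigenvector of the covariance matrix.
cov = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(cov)
top = eigvecs[:, -1]
alignment = abs(w @ top) / np.linalg.norm(w)
print(f"alignment with top principal component: {alignment:.3f}")
```

The decay term `y * y * w` is what keeps the weight norm bounded without explicit normalization; Sanger's rule extends the same idea to extract multiple components, which is the sense in which these 80s/90s rules were already doing matrix decomposition.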

Zhengyang Geng (@zhengyanggeng) 's Twitter Profile Photo

Video diffusion Transformers tend to split causal semantics (early layers) and rendering (later layers). Making this separation explicit, semantic generator + renderer, has clear practical benefits. Very nice paper with systematic probing and analysis! If the renderer is an MF

Zhijian Liu (@zhijianliu_) 's Twitter Profile Photo

Reasoning LLMs generate very long chains-of-thought, so even small quantization errors add up. With AWQ, Qwen3-4B drops 71.0 → 68.2 on MMLU-Pro (~4% relative loss). 😬 ParoQuant fixes this! It keeps only the critical rotation pairs and fuses everything into a single kernel.
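The claim above is that long chains-of-thought let small per-step quantization errors compound. As a toy illustration (my own sketch, unrelated to the actual AWQ or ParoQuant implementations), the following applies a symmetric round-to-nearest 4-bit quantizer to a random weight matrix and tracks how the output of the quantized model drifts from the full-precision one over many sequential steps; the matrix size, step count, and scaling are all assumptions for the demo:

```python
import numpy as np

def quantize(w, bits=4):
    # Symmetric uniform round-to-nearest quantizer over the tensor's range.
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)) / np.sqrt(64)  # roughly unit spectral radius
wq = quantize(w, bits=4)

x = rng.normal(size=64)
x_fp, x_q = x.copy(), x.copy()
errors = []
for _ in range(32):                          # 32 sequential "decoding" steps
    x_fp = w @ x_fp                          # full-precision trajectory
    x_q = wq @ x_q                           # quantized trajectory
    errors.append(np.linalg.norm(x_fp - x_q) / np.linalg.norm(x_fp))

print(f"relative error after 1 step:   {errors[0]:.4f}")
print(f"relative error after 32 steps: {errors[-1]:.4f}")
```

The relative error grows with chain length because each step both propagates the accumulated drift and injects fresh quantization noise, which is why long reasoning traces are a stress test for weight quantization.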

Hokin Deng (@denghokin) 's Twitter Profile Photo

#VideoReason We are open-sourcing the entire VBVR stack to speed up the arrival of video reasoning as the next fundamental paradigm of intelligence:
- 150+ synthetic generators
- 1 million training clips
- Cloud-scale data factory
- Unified EvalKit
- 100 rule-based evaluators
-

YixuanEvenXu (@yixuanevenxu) 's Twitter Profile Photo

Recent debates highlight a key issue: how do you actually prove distillation? If you want to claim a model was distilled from your outputs, scientifically and with rigorous statistical guarantees, you should consider Antidistillation Fingerprinting (ADFP). 👇

Ziran Yang (@__zrrr__) 's Twitter Profile Photo

We released Goedel-Prover-V2, a state-of-the-art model for formal theorem proving at launch. Remarkably, it has remained at the top of the open-source formal theorem proving leaderboard for over six months. We have been excited to see so many folks cooking with our models.

Albert Gu (@_albertgu) 's Twitter Profile Photo

okay this plot and discussion has blown up more than expected so let me try to leave some candid thoughts 1. i don't believe that the intent of Mayank's tweet was to claim "Mamba-2 > GDN". the primary intent was to convey that the initialization for Mamba-2 makes a huge

Albert Gu (@_albertgu) 's Twitter Profile Photo

> an example of this is that in hybrid models, sometimes "stronger" linear layers can lead to overall weaker models because it incentivizes the global attention to be "lazy" some people asked about this. i think this is a somewhat folklore result that I don't have a reference

Lianhui Qin (@lianhuiq) 's Twitter Profile Photo

Jixuan and the team demo’d 🦐OpenClaw agents living and operating in our SimWorld, launched in minutes. 🚀🤖 🔥Our mission: make embodied agent frameworks easy for anyone to run, observe, and customize in a realistic virtual world.

Yu Zhang 🐳🙇 (@yzhang_cs) 's Twitter Profile Photo

Behind every successful LLM is a highly organized, intellectually dense lab led by someone who keeps things open and equal. I’ve always valued this kind of environment, especially in the open-source community, and I feel so sad to see this happening now.