Tsun-Yi Yang 楊存毅 🇹🇼🏳️‍🌈 (@shamangary)'s Twitter Profile
Tsun-Yi Yang 楊存毅 🇹🇼🏳️‍🌈

@shamangary

A proud Taiwanese boy. @RobinAI_UK LLM research engineer. Ex-Meta. PhD in computer vision at National Taiwan University (NTU)

ID: 1537755079

Joined: 22-06-2013 02:05:34

1.1K Tweets

471 Followers

656 Following

Zhenjun Zhao (@zhenjun_zhao)

FastVGGT: Training-Free Acceleration of Visual Geometry Transformer

You Shen, Zhipeng Zhang, Yansong Qu, Liujuan Cao

tl;dr: token merging → VGGT without dense global attention

arxiv.org/abs/2509.02560
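
FastVGGT's exact merging rule is in the paper, but the underlying ToMe-style token-merging step it builds on is simple enough to sketch. A minimal, illustrative PyTorch version (the function name and the plain-average merge are my assumptions, not the authors' code):

```python
import torch
import torch.nn.functional as F

def merge_tokens(x: torch.Tensor, r: int) -> torch.Tensor:
    """Bipartite soft matching: merge the r most redundant tokens into
    their nearest neighbours, shrinking N tokens to N - r."""
    a, b = x[::2], x[1::2]                        # split tokens into two sets
    sim = F.normalize(a, dim=-1) @ F.normalize(b, dim=-1).T  # cosine sim [Na, Nb]
    best_sim, best_idx = sim.max(dim=-1)          # best partner in b for each a-token
    order = best_sim.argsort(descending=True)
    merged, kept = order[:r], order[r:]           # r most similar a-tokens get merged
    b = b.clone()
    for i in merged:                              # fold each merged token into its partner
        b[best_idx[i]] = (b[best_idx[i]] + a[i]) / 2
    return torch.cat([a[kept], b], dim=0)

x = torch.randn(1370, 768)                        # one image's worth of ViT tokens
print(merge_tokens(x, r=512).shape)               # torch.Size([858, 768])
```

Applied between attention blocks, dropping the token count from thousands to hundreds cuts the quadratic attention cost, which is where a training-free speedup of this kind comes from.
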
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxestex)

sorry Meta "superintelligence" lab but Andrew Zhao et al. did this better and you don't cite them. Actually, many have (e.g. Bo Liu (Benjamin Liu)), whether joint maximization or minimax, verifiers or RMs. What kang of a breakthrough? Good job evaluating on Vicuna tho, peak 2023 llamacore

Wenhu Chen (@wenhuchen)

Ever wonder what's really happening when we use RL to teach LLMs to reason? 🤔 The process is full of mysteries.
🤯 What causes those sudden "aha moments" in training?
📏 Why does better reasoning often lead to longer answers ("length-scaling")?
📉 Why does token entropy often…
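
The "token entropy" in question is just the Shannon entropy of the policy's next-token distribution, averaged over generated tokens. A minimal sketch of how it is typically logged during RL training (shapes and names are my own):

```python
import torch
import torch.nn.functional as F

def mean_token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """logits: [batch, seq_len, vocab] from the policy over its own rollouts.
    Returns the average per-token entropy H = -sum_v p_v * log p_v."""
    logp = F.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(dim=-1).mean()

# falling entropy => the policy is getting more deterministic as RL sharpens it
print(mean_token_entropy(torch.randn(2, 16, 32000)))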

alphaXiv (@askalphaxiv)

Why Do Multimodal LLMs (MLLM) Struggle with Spatial Understanding?

This research shows that MLLMs’ spatial struggles aren’t from data scarcity, but from architecture. Spatial ability relies on the vision encoder’s positional cues, so a redesign like prompt targeting is needed.
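
The "positional cues" claim is easy to see in a vanilla ViT front end: self-attention is permutation-invariant, so every bit of spatial layout the encoder has comes from the positional embedding added to the patch tokens. A standard sketch (generic ViT patch embedding, not the paper's architecture):

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Minimal ViT patch embedding: remove the pos_embed sum below and the
    patch tokens become an unordered bag with no spatial information."""
    def __init__(self, img_size=224, patch_size=16, dim=768):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))

    def forward(self, x):                                  # x: [B, 3, H, W]
        tokens = self.proj(x).flatten(2).transpose(1, 2)   # [B, N, dim]
        return tokens + self.pos_embed                     # the only spatial cue

print(PatchEmbed()(torch.randn(1, 3, 224, 224)).shape)     # [1, 196, 768]
```
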
DRONEFORGE (@thedroneforge)

< Image as an IMU: Turning Motion Blur into a Velocity Sensor >

In a new paper, researchers flip the script on motion blur. Instead of a problem to be fixed, they treat it as a rich signal for estimating a camera's instantaneous 6-DoF velocity.

From a single blurred image, their…
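
The core intuition is dimensional: a blur streak is velocity integrated over the exposure. For the special case of a purely rotating camera this collapses to one line (a back-of-envelope sketch, not the paper's 6-DoF estimator):

```python
def angular_speed_from_blur(streak_px: float, focal_px: float, exposure_s: float) -> float:
    """A point streaks roughly focal_px * omega * t pixels during an exposure
    of t seconds, so omega ≈ streak / (focal * t), in rad/s."""
    return streak_px / (focal_px * exposure_s)

# a 12 px streak with a 600 px focal length over a 10 ms exposure -> 2.0 rad/s
print(angular_speed_from_blur(12.0, 600.0, 0.010))
```
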
Zhenjun Zhao (@zhenjun_zhao)

SLAM-Former: Putting SLAM into One Transformer

Yijun Yuan, Zhuoguang Chen, Kenan Li, Weibang Wang, Hang Zhao

tl;dr: the frontend and the backend promote each other within one transformer

arxiv.org/abs/2509.16909
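
A hedged sketch of what the tl;dr could look like as control flow: one shared transformer runs in an incremental "frontend" mode and a periodic global "backend" mode, each conditioning the other. All names and the scheduling policy here are my assumptions, not the paper's code:

```python
def slam_loop(frames, model, window=8, backend_every=4):
    """`model` is one transformer exposed in two modes (assumed API)."""
    keyframes, kf_poses = [], []
    for frame in frames:
        # frontend: track the incoming frame against a local keyframe window
        pose, is_keyframe = model(keyframes[-window:], frame, mode="frontend")
        if is_keyframe:
            keyframes.append(frame)
            kf_poses.append(pose)
            # backend: periodically re-estimate all keyframe poses globally;
            # the refined map then conditions subsequent frontend tracking
            if len(keyframes) % backend_every == 0:
                kf_poses = model(keyframes, mode="backend")
    return kf_poses
```
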
Lucas Beyer (bl16) (@giffmana)

I think this project could be one of those "why have we ever done this differently?!" kind of moments. Instead of doing code training by just predicting the next token in the source file, interleave that with interpreter state, which also has to be predicted! Devil's in the…
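
The recipe is easy to prototype in Python: `sys.settrace` hands you the interpreter's local-variable state at every executed line, so you can emit training text that interleaves source with state. A toy sketch (the serialization format is my invention):

```python
import inspect
import sys
import textwrap

def interleave_with_state(fn, *args):
    """Run fn and interleave each executed source line with the local-variable
    state just before it runs (a 'line' trace event fires pre-execution)."""
    events = []
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            events.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer
    sys.settrace(tracer)
    try:
        fn(*args)
    finally:
        sys.settrace(None)
    src = textwrap.dedent(inspect.getsource(fn)).splitlines()
    first = fn.__code__.co_firstlineno
    out = []
    for lineno, state in events:
        out.append(f"# state: {state}")
        out.append(src[lineno - first])
    return "\n".join(out)

def f(n):
    total = 0
    for i in range(n):
        total += i
    return total

print(interleave_with_state(f, 3))
```

The real project presumably generates this supervision at scale in its data pipeline; the point is just that the signal is nearly free to produce.
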

Kwang Moo Yi (@kwangmoo_yi)

Yang et al., "Dense Semantic Matching with VGGT Prior"

Train a decoding head for semantic segmentation, with sparse GT supervision and cycle consistency → dense non-rigid warping. Using a foundation model trained for "matching" for sure works better than "any" foundation model.
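
The cycle-consistency term is the generic one: a point matched A→B and then B→A should land back where it started. A minimal dense version with nearest-pixel lookup (the paper likely uses differentiable sampling; this is just the idea):

```python
import torch

def cycle_loss(match_ab: torch.Tensor, match_ba: torch.Tensor) -> torch.Tensor:
    """match_ab, match_ba: [H, W, 2] dense correspondences as absolute (x, y)
    target coordinates. Penalize how far the A->B->A round trip drifts."""
    H, W, _ = match_ab.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float()        # identity coords in A
    tgt = match_ab.round().long()                       # nearest pixel in B
    x = tgt[..., 0].clamp(0, W - 1)
    y = tgt[..., 1].clamp(0, H - 1)
    back = match_ba[y, x]                               # where B sends us back in A
    return (back - grid).norm(dim=-1).mean()

H, W = 32, 32
ident = torch.stack(torch.meshgrid(torch.arange(H), torch.arange(W),
                                   indexing="ij")[::-1], dim=-1).float()
print(cycle_loss(ident, ident))                         # identity matches -> 0.0
```
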
martin_casado (@martin_casado)

Total insanity. This is using an adaptive LOD scheme in sparksjs (not merged yet). The entire scene has 16 million splats and this is real-time navigation ... 😱😱
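
The generic trick behind schemes like this: pick each splat's detail level from its projected screen size, so distant splats collapse into a few coarse ones. A hedged sketch of the selection rule only, not SparkJS's actual implementation:

```python
import math

def lod_level(splat_radius: float, dist: float, focal_px: float, max_level: int = 6) -> int:
    """0 = finest level. Sub-pixel splats get the coarsest level; each level
    up roughly halves the screen-space detail a splat is expected to carry."""
    radius_px = focal_px * splat_radius / max(dist, 1e-6)  # projected size in pixels
    if radius_px <= 1.0:
        return max_level
    return max(0, max_level - 1 - int(math.log2(radius_px)))

print(lod_level(0.05, dist=2.0, focal_px=800))    # close-up splat -> fine level
print(lod_level(0.05, dist=200.0, focal_px=800))  # distant splat -> coarsest level
```

Combined with frustum culling, this is what keeps a 16M-splat scene interactive: only a fraction of the splats are ever rasterized at full detail.
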

Wenhu Chen (@wenhuchen)

What’s preventing us from training open-source image editing models like Nano-Banana or Seedream?
The main barrier is the lack of high-quality training data for image editing. Most existing image editing datasets are synthesized using weak reward models or poor quality…
Thomas Fel (@napoolar)

🕳️🐇Into the Rabbit Hull – Part II

Continuing our interpretation of DINOv2, the second part of our study concerns the geometry of concepts and the synthesis of our findings toward a new representational phenomenology: the Minkowski Representation Hypothesis
Min-Hung (Steve) Chen (@cmhungsteven)

Current Vision-Language Models completely struggle with complex 4D dynamics. We fixed that. 🤯 🚨 Introducing 4D-RGPT: distilling perceptual knowledge directly into LLMs for precise space & time reasoning. 🎉 Excited to share that our NVIDIA AI work has been accepted to #CVPR2026!