Delong Chen (陈德龙) (@delong0_0)'s Twitter Profile
Delong Chen (陈德龙)

@delong0_0

Ph.D. student @HKUST, Visiting Researcher at FAIR Paris @AIatMeta.
Working on vision-language world modeling.

ID: 4895781409

Link: https://chendelong.world/ · Joined: 12-02-2016 04:59:21

81 Tweets

217 Followers

424 Following

Delong Chen (陈德龙) (@delong0_0):

I'm attending ICML'25 in Vancouver. Will present: 

1) Subobject-level adaptive image token segmentation (main conference) arxiv.org/abs/2402.14327

2) WorldPrediction benchmark for world modeling and procedural planning (in Assessing World Models workshop) arxiv.org/abs/2506.04363
Pascale Fung (@pascalefung):

Our research on embodied AI agents that can perceive, learn, act, and interact in the virtual and physical worlds. #metaAI #AIAgent #embodied #worldmodel #superintelligence arxiv.org/abs/2506.22355

Federico Baldassarre (@baldassarrefe):

Say hello to DINOv3 🦖🦖🦖

A major release that raises the bar of self-supervised vision foundation models.
With stunning high-resolution dense features, it’s a game-changer for vision tasks!

We scaled model size and training data, but here's what makes it special 👇
Delong Chen (陈德龙) (@delong0_0):

Thanks AK (@_akhaliq) for sharing!

More about our VLWM:
- Non-pixel-generative world model that reasons in abstract semantic space

- Learned from 20k hours of unlabeled egocentric / web procedural videos with 5.7M action steps

- System-2 planning with reasoning by cost-guided plan
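
To make the cost-guided planning idea above concrete, here is a minimal hypothetical sketch (not the released VLWM code): candidate plans are rolled out by a world model in an abstract state space, each rollout is scored by a cost function, and the lowest-cost plan is selected. All names (propose_plans, rollout, cost, plan_with_costs) and the string-based state representation are illustrative placeholders.

```python
# Hypothetical sketch of cost-guided plan selection; every component is a stub,
# not the actual VLWM implementation.
from dataclasses import dataclass


@dataclass
class Plan:
    steps: list[str]  # textual action steps, e.g. "chop the onions"


def propose_plans(context: str, goal: str, n: int = 8) -> list[Plan]:
    """Placeholder: sample n candidate plans (e.g. from a VLM policy)."""
    return [Plan(steps=[f"candidate step {i + 1} toward: {goal}"]) for i in range(n)]


def rollout(context: str, plan: Plan) -> str:
    """Placeholder: a world model predicts the resulting state in a semantic space."""
    return context + " -> " + " -> ".join(plan.steps)


def cost(predicted_state: str, goal: str) -> float:
    """Placeholder: lower cost means the predicted state is closer to the goal."""
    return 0.0 if goal in predicted_state else 1.0


def plan_with_costs(context: str, goal: str) -> Plan:
    """System-2-style selection: simulate every candidate, keep the cheapest one."""
    candidates = propose_plans(context, goal)
    return min(candidates, key=lambda p: cost(rollout(context, p), goal))


if __name__ == "__main__":
    best = plan_with_costs(context="raw ingredients on the counter", goal="a cooked meal")
    print(best.steps)
```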
Randall Balestriero (@randall_balestr):

LeJEPA: a novel pretraining paradigm free of the (many) heuristics we relied on (stop-grad, teacher, ...)

- 60+ arch., up to 2B params
- 10+ datasets
- in-domain training (>DINOv3)
- corr(train loss, test perf) = 95%

Paper: arxiv.org/pdf/2511.08544
Code: github.com/rbalestr-lab/l…

Pascale Fung (@pascalefung):

Introducing VL-JEPA: Vision-Language Joint Embedding Predictive Architecture for streaming, live action recognition, retrieval, VQA, and classification tasks with better performance and higher efficiency than large VLMs.

• VL-JEPA is the first non-generative model that can

Delong Chen (陈德龙) (@delong0_0):

Today we release WorldPrediction, a video-based benchmark for evaluating world modeling and procedural planning capabilities of different AI models (LLMs, VLMs, diffusion world models, etc.). 

WorldPrediction is the first benchmark that emphasizes high-level actions with
Delong Chen (陈德龙) (@delong0_0):

A systematic empirical study of what makes for good JEPA planners 👍🏻 Nice work and congrats to Basile Terver! 2025 has been a great year for JEPA, and there is definitely more to come in 2026!