Yunlong Lin (@ling_yunlong) Twitter Tweets • TwiCopy

Bin Lin

7 months ago

🚀UniWorld: a unified model that skips VAEs and uses semantic features from SigLIP! Using just 1% of BAGEL’s data, it outperforms on image editing and excels in understanding & generation. 🌟Now data, model, training & evaluation script are open-source! github.com/PKU-YuanGroup/…

Likes: 190 · Replies: 4 · Reposts: 33

AK

@_akhaliq

6 months ago

MoVieS Motion-Aware 4D Dynamic View Synthesis in One Second

Likes: 128 · Replies: 3 · Reposts: 15

Yunlong Lin

@ling_yunlong

6 months ago

Amazing!

Likes: 0 · Replies: 0 · Reposts: 0

Yunlong Lin

@ling_yunlong

5 months ago

thanks for sharing!

Likes: 0 · Replies: 0 · Reposts: 0

OpenAI

@openai

5 months ago

ChatGPT can now do work for you using its own computer. Introducing ChatGPT agent—a unified agentic system combining Operator’s action-taking remote browser, deep research’s web synthesis, and ChatGPT’s conversational strengths.

Likes: 13,13K · Replies: 650 · Reposts: 2,2K

orange.ai

@oran_ge

5 months ago

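Woke up this morning and was surprised to find that Qwen3 Coder has been released. Qwen3 Coder is a code model with agent capabilities. It achieves SOTA among open-source models on Agentic Coding, Agentic Browser-Use, and Agentic Tool-Use. Put simply, its coding and agent capabilities are comparable to Claude Sonnet4. The model has only 480B total parameters, with 35B activated.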

Likes: 292 · Replies: 12 · Reposts: 46

Ziwei Liu

@liuziwei7

5 months ago

🧠Video Thinking Test for Reasoning LLMs🧠 *Video Thinking Test* (📽️Video-TT📽️) is a holistic benchmark to assess the advanced reasoning and understanding correctness/robustness between LLMs and humans #ICCV2025 - Project: zhangyuanhan-ai.github.io/video-tt/ - Data: huggingface.co/datasets/lmms-…

Likes: 114 · Replies: 0 · Reposts: 22

Wenhao Chai

@wenhaocha1

5 months ago

Dataset Distillation as Data Compression: A Rate-Utility Perspective arxiv.org/abs/2507.17221 Read this paper tonight, get me some sense: Dataset Distillation ≈ Visual Tokenization? Dataset Distillation: Replace full dataset with few synthetic samples Visual Tokenizer: Replace

Likes: 45 · Replies: 2 · Reposts: 6

Alex Prompter

@alex_prompter

5 months ago

I tested ChatGPT-5 and Gemini 2.5 Pro with the same critical prompts. The results will shock you. ChatGPT-5 vs. Gemini 2.5 Pro (video demos are included)

Likes: 2,2K · Replies: 75 · Reposts: 225

Aadit Sheth

@aaditsh

4 months ago

This guy literally dropped the best visual guide to LLMs you’ll ever see

Likes: 11,11K · Replies: 90 · Reposts: 1,1K

Owen Tian Ye

@tiny85114767

4 months ago

Introducing LucidFlux-14B — caption-free image restoration for the real world. SOTA on 6 metrics, rivaling closed-source. Built on a 12B Flux-DiT with a unified dual-branch design + adaptive temporal/depth fusion; SigLIP preserves semantics without text.

Likes: 5 · Replies: 1 · Reposts: 2

Enze Xie

@xieenze_jr

3 months ago

🚀 SANA-Video: Linear Attention + Constant-Memory KV Cache = Fast Long Videos 💥 Key Features 🌟 🧠 Linear DiT everywhere → O(N) complexity on video-scale tokens 🧰 Constant-memory Block KV cache → store cumulative states only (no growing KV) 🔄 🎯 Temporal Mix-FFN + 3D RoPE

Likes: 125 · Replies: 3 · Reposts: 20

Bin Lin

@linbin46984

3 months ago

🚀 Introducing FlashI2V: The game-changer in Image-to-Video generation! 🔥 Solving conditional image leakage with Latent Shifting & Fourier Guidance. 1.3B parameters, outperforms CogVideoX1.5-5B in speed, quality & generalization. github.com/PKU-YuanGroup/…

Likes: 3 · Replies: 0 · Reposts: 1

Tesla

@tesla

2 months ago

To push self-driving into situations wilder than reality, we built a neural network world simulator that can create entirely synthetic worlds for the Tesla to drive in. Video below is fully generated & not a real video

Likes: 10,10K · Replies: 453 · Reposts: 1,1K

MrNeRF

@janusch_patas

2 months ago

Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models Contributions: • We propose DIFF4SPLAT, a unified diffusion-based model that directly generates deformable 3D Gaussians for controllable 4D scene synthesis. • We construct a large-scale 4D

Likes: 14 · Replies: 1 · Reposts: 6

Kairun Wen

@kairunwen

a month ago

🦋DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling Excited to share that our work in the #NeurIPS2025 ! - A large-scale 4D + instance + semantics + caption dataset with 100K in-the-wild scenes, supporting 4D world modeling by combining classic 3D

Likes: 17 · Replies: 1 · Reposts: 5

Yunlong Lin

@ling_yunlong

22 days ago

Excited to see JarvisArt hit 700🌟! To support the community, we've released the full pipeline: 1️⃣ Multi-machine protocol (Agent ↔️ Lightroom) 2️⃣ Data construction scripts & MMArt-Bench 3️⃣ All training/inference/eval code Check it out & contribute! 🚀 🔗 github.com/LYL1015/Jarvis…

Likes: 3 · Replies: 0 · Reposts: 0

Adobe

@adobe

21 days ago

Edit designs, images, and documents. Adobe Express, Photoshop, and Acrobat are now in ChatGPT. adobe.ly/3XNhgvO

Likes: 441 · Replies: 42 · Reposts: 74

Yunlong Lin

@ling_yunlong

16 days ago

🎨Introducing JarvisEvo: The First Self-Evolving Photo Editing Agent! From tool-user to Creator. Moving beyond "blind" CoT to true visual perception & reflection. Project Page: jarvisevo.vercel.app Arxiv: arxiv.org/pdf/2511.23002 GitHub: github.com/LYL1015/Jarvis…

Likes: 11 · Replies: 0 · Reposts: 4