Yunlong Lin (@ling_yunlong)'s Twitter Profile
Yunlong Lin

@ling_yunlong

XMU | AI agent | Embodied AI | Multimodal learning

ID: 1619343895611199490

Website: https://lyl1015.github.io/

Joined: 28-01-2023 14:37:41

4 Tweets

7 Followers

20 Following

Bin Lin (@linbin46984)

🚀UniWorld: a unified model that skips VAEs and uses semantic features from SigLIP! Using just 1% of BAGEL’s data, it outperforms on image editing and excels in understanding & generation. 🌟Now data, model, training & evaluation script are open-source! github.com/PKU-YuanGroup/…

OpenAI (@openai)

ChatGPT can now do work for you using its own computer. Introducing ChatGPT agent—a unified agentic system combining Operator’s action-taking remote browser, deep research’s web synthesis, and ChatGPT’s conversational strengths.

orange.ai (@oran_ge)

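Woke up this morning and was surprised to find that Qwen3 Coder has been released. Qwen3 Coder is a code model with agent capabilities. It achieves SOTA among open-source models on Agentic Coding, Agentic Browser-Use, and Agentic Tool-Use. Put simply, its coding and agent capabilities are comparable to Claude Sonnet4. The model has only 480B total parameters, with 35B activated.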

Ziwei Liu (@liuziwei7)

🧠Video Thinking Test for Reasoning LLMs🧠 *Video Thinking Test* (📽️Video-TT📽️) is a holistic benchmark to assess the advanced reasoning and understanding correctness/robustness between LLMs and humans #ICCV2025 - Project: zhangyuanhan-ai.github.io/video-tt/ - Data: huggingface.co/datasets/lmms-…

Wenhao Chai (@wenhaocha1)

Dataset Distillation as Data Compression: A Rate-Utility Perspective arxiv.org/abs/2507.17221 Read this paper tonight, get me some sense: Dataset Distillation ≈ Visual Tokenization? Dataset Distillation: Replace full dataset with few synthetic samples Visual Tokenizer: Replace

Alex Prompter (@alex_prompter)

I tested ChatGPT-5 and Gemini 2.5 Pro with the same critical prompts. The results will shock you. ChatGPT-5 vs. Gemini 2.5 Pro (Video demos are included)

Owen Tian Ye (@tiny85114767)

Introducing LucidFlux-14B — caption-free image restoration for the real world. SOTA on 6 metrics, rivaling closed-source. Built on a 12B Flux-DiT with a unified dual-branch design + adaptive temporal/depth fusion; SigLIP preserves semantics without text.

Enze Xie (@xieenze_jr)

🚀 SANA-Video: Linear Attention + Constant-Memory KV Cache = Fast Long Videos 💥 Key Features 🌟 🧠 Linear DiT everywhere → O(N) complexity on video-scale tokens 🧰 Constant-memory Block KV cache → store cumulative states only (no growing KV) 🔄 🎯 Temporal Mix-FFN + 3D RoPE

Bin Lin (@linbin46984)

🚀 Introducing FlashI2V: The game-changer in Image-to-Video generation! 🔥 Solving conditional image leakage with Latent Shifting & Fourier Guidance. 1.3B parameters, outperforms CogVideoX1.5-5B in speed, quality & generalization. github.com/PKU-YuanGroup/…

Tesla (@tesla)

To push self-driving into situations wilder than reality, we built a neural network world simulator that can create entirely synthetic worlds for the Tesla to drive in. Video below is fully generated & not a real video

MrNeRF (@janusch_patas)

Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models Contributions: • We propose DIFF4SPLAT, a unified diffusion-based model that directly generates deformable 3D Gaussians for controllable 4D scene synthesis. • We construct a large-scale 4D

Kairun Wen (@kairunwen)

🦋DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling Excited to share our work at #NeurIPS2025! - A large-scale 4D + instance + semantics + caption dataset with 100K in-the-wild scenes, supporting 4D world modeling by combining classic 3D

Yunlong Lin (@ling_yunlong)

Excited to see JarvisArt hit 700🌟! To support the community, we've released the full pipeline: 1️⃣ Multi-machine protocol (Agent ↔️ Lightroom) 2️⃣ Data construction scripts & MMArt-Bench 3️⃣ All training/inference/eval code Check it out & contribute! 🚀 🔗 github.com/LYL1015/Jarvis…

Yunlong Lin (@ling_yunlong)

🎨Introducing JarvisEvo: The First Self-Evolving Photo Editing Agent! From tool-user to Creator. Moving beyond "blind" CoT to true visual perception & reflection. Project Page: jarvisevo.vercel.app Arxiv: arxiv.org/pdf/2511.23002 GitHub: github.com/LYL1015/Jarvis…
