Tianzhe Chu @ ICLR 2025 (@tianzhec) 's Twitter Profile
Tianzhe Chu @ ICLR 2025

@tianzhec

Now @hkudatascience. Previous @ShanghaiTechUni, visited @UCBerkeley.

ID: 1565621205528326145

Link: http://tianzhechu.com · Joined: 02-09-2022 08:42:45

73 Tweets

230 Followers

171 Following

Kyunghyun Cho (@kchonyc) 's Twitter Profile Photo

it feels like Yann LeCun is going through his decades-old ideas and re-introducing them one at a time 😂 was the optimal alpha = 2/3 and the optimal gamma = 1.7159? 🤣🤣🤣

Anthropic (@anthropicai) 's Twitter Profile Photo

New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.

Peyman Milanfar (@docmilanfar) 's Twitter Profile Photo

Statistical "degrees of freedom" (df) is in general not the same as "the # of parameters." The df for any 1-1 (‘image-to-image’) model ŷ =𝐟(y) : ℝⁿ → ℝⁿ is the trace of its Jacobian: df = div[ 𝐟(y) ] = Trace[ ∇ 𝐟(y) ] 1/n

Statistical "degrees of freedom" (df) is in general not the same as "the # of parameters."

The df for any 1-1 (‘image-to-image’) model

ŷ = 𝐟(y) : ℝⁿ → ℝⁿ

is the trace of its Jacobian:

df = div[𝐟(y)] = Trace[∇𝐟(y)]

1/n
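The distinction above is easy to check numerically. A minimal sketch (the moving-average smoother, the variable names, and the finite-difference estimator are illustrative assumptions, not from the thread): for a linear smoother f(y) = Ay, the degrees of freedom Trace[∇f(y)] reduce to trace(A), which can be far smaller than any naive parameter count.

```python
import numpy as np

# For an image-to-image model f: R^n -> R^n, statistical degrees of freedom
# are df = Trace[∇f(y)], not the number of parameters. Toy example
# (assumption, for illustration): a linear 3-tap moving-average smoother
# f(y) = A y, whose df is exactly trace(A).

rng = np.random.default_rng(0)
n = 50

# Build the smoother: each output averages up to 3 neighboring inputs.
A = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i, i + 1):
        if 0 <= j < n:
            A[i, j] = 1 / 3

def f(y):
    return A @ y

# Estimate df = div[f(y)] = sum_i ∂f_i/∂y_i by finite differences.
y = rng.normal(size=n)
eps = 1e-6
df_est = sum((f(y + eps * e)[i] - f(y)[i]) / eps
             for i, e in enumerate(np.eye(n)))

print(round(df_est, 3), round(np.trace(A), 3))  # prints 16.667 16.667
```

Here A has n² = 2500 entries, yet df = n/3 ≈ 16.7, illustrating the thread's point that df and parameter count are different quantities; for nonlinear denoisers the trace is usually estimated stochastically rather than with n forward passes.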
David Fan (@davidjfan) 's Twitter Profile Photo

Can visual SSL match CLIP on VQA?

Yes! We show with controlled experiments that visual SSL can be competitive even on OCR/Chart VQA, as demonstrated by our new Web-SSL model family (1B-7B params) which is trained purely on web images – without any language supervision.
Peter Tong (@tongpetersb) 's Twitter Profile Photo

Vision models have been smaller than language models; what if we scale them up?

Introducing Web-SSL: A family of billion-scale SSL vision models (up to 7B parameters) trained on billions of images without language supervision, using VQA to evaluate the learned representation.
Peter Tong (@tongpetersb) 's Twitter Profile Photo

We're open-sourcing the training code for MetaMorph! MetaMorph offers a lightweight framework for turning LLMs into unified multimodal models: (multimodal) tokens -> transformers -> diffusion -> pixel! This is our best take on unified modeling as of November 2024, and

Tianzhe Chu @ ICLR 2025 (@tianzhec) 's Twitter Profile Photo

Will be at ICLR 2025!
No paper
No plan
With camera

Venmo me $5 and you can get an edited portrait plus an IG follower.
Tariffed 245% if you pay by Zelle
Druv Pai (@druv_pai) 's Twitter Profile Photo

I'm at ICLR this week! I'll be presenting ToST, a (provably) computationally efficient high-performance deep architecture derived from information theory and convex analysis principles.

📅 Saturday April 26, 10AM-12:30PM
📌 Hall 3 + Hall 2B #145
💡Awarded a Spotlight!

(1/3)
Chun-Hsiao (Daniel) Yeh (@danielyehhh) 's Twitter Profile Photo

❗️❗️ Can MLLMs understand scenes from multiple camera viewpoints — like humans?

🧭 We introduce All-Angles Bench — 2,100+ QA pairs on multi-view scenes.

📊 We evaluate 27 top MLLMs, including Gemini-2.0-Flash, Claude-3.7-Sonnet, and GPT-4o.

🌐 Project: danielchyeh.github.io/All-Angles-Ben…
Kai He (@kai__he) 's Twitter Profile Photo

🚀 Introducing UniRelight, a general-purpose relighting framework powered by video diffusion models. 🌟UniRelight jointly models the distribution of scene intrinsics and illumination, enabling high-quality relighting and intrinsic decomposition from a single image or video.

Two Minute Papers (@twominutepapers) 's Twitter Profile Photo

NVIDIA’s AI watched 150,000 videos… and learned to relight scenes incredibly well! No game engine. No 3D software. And it has an amazing cat demo. 🐱💡
Hold on to your papers! Full video: youtube.com/watch?v=yRk6vG…
Jyo Pari (@jyo_pari) 's Twitter Profile Photo

For agents to improve over time, they can’t afford to forget what they’ve already mastered.

We found that supervised fine-tuning forgets more than RL when training on a new task! 

Want to find out why? 👇
Jasper (@zjasper666) 's Twitter Profile Photo

GAUSS: General Assessment of Underlying Structured Skills in Mathematics

We’re excited to launch GAUSS, a next-generation math AI benchmark built to overcome the limitations of low skill resolution in today’s benchmarks.

What it does
GAUSS profiles LLMs across 12 cognitive
Conference on Parsimony and Learning (CPAL) (@cpalconf) 's Twitter Profile Photo

Calling all parsimony and learning researchers 🚨🚨 The 3rd annual CPAL will be held in Tübingen, Germany, March 23–26, 2026! Check out this year's website for all the details: cpal.cc

Saining Xie (@sainingxie) 's Twitter Profile Photo

three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right.

today, we introduce Representation Autoencoders (RAE).

>> Retire VAEs. Use RAEs. 👇(1/n)