Licheng Liu (@liulicheng10) 's Twitter Profile
Licheng Liu

@liulicheng10

Maths @ Imperial , intern @ NU MLL lab, lichengliu03.github.io, applying for '26 fall phd

views are my own

ID: 1503078169691435008

calendar_today13-03-2022 18:39:15

112 Tweet

61 Takipçi

131 Takip Edilen

Manling Li (@manlingli_) 's Twitter Profile Photo

World Model Reasoning for VLM Agents (NeurIPS 2025, Score 5544) We release VAGEN to teach VLMs to build internal world models via visual state reasoning: - StateEstimation: what is the current state? - TransitionModeling: what is next? MDP → POMDP shift to handle the partial

Dwarkesh Patel (@dwarkesh_sp) 's Twitter Profile Photo

The Andrej Karpathy interview 0:00:00 – AGI is still a decade away 0:30:33 – LLM cognitive deficits 0:40:53 – RL is terrible 0:50:26 – How do humans learn? 1:07:13 – AGI will blend into 2% GDP growth 1:18:24 – ASI 1:33:38 – Evolution of intelligence & culture 1:43:43 - Why self

Manling Li (@manlingli_) 's Twitter Profile Photo

VLAs, VLMs, LLMs, and Vision Foundation Models for Embodied Agents! There are just so many new updates in recent months! We have updated our tutorial, come and join us if you would like to discuss the latest advances! Room: 306B Time: 1pm-5pm Slides: …models-meet-embodied-agents.github.io

VLAs, VLMs, LLMs, and Vision Foundation Models for Embodied Agents!

There are just so many new updates in recent months!

We have updated our tutorial, come and join us if you would like to discuss the latest advances!

Room: 306B
Time: 1pm-5pm
Slides: …models-meet-embodied-agents.github.io
Ziqian Zhong (@fjzzq2002) 's Twitter Profile Photo

New research with Aditi Raghunathan, Nicholas Carlini and Anthropic! We built ImpossibleBench to measure reward hacking in LLM coding agents 🤖, by making benchmark tasks impossible and seeing whether models game tests or follow specs. (1/9)

New research with <a href="/AdtRaghunathan/">Aditi Raghunathan</a>, Nicholas Carlini and Anthropic!

We built ImpossibleBench to measure reward hacking in LLM coding agents 🤖, by making benchmark tasks impossible and seeing whether models game tests or follow specs. (1/9)
GLADIA Research Lab (@gladialab) 's Twitter Profile Photo

LLMs are injective and invertible. In our new paper, we show that different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space. (1/6)

LLMs are injective and invertible.

In our new paper, we show that different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space.

(1/6)
John Yang (@jyangballin) 's Twitter Profile Photo

New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test" But we code to achieve *goals*: maximize revenue, cut costs, win users Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals

Zhiyuan Zeng (@zhiyuanzeng_) 's Twitter Profile Photo

RL is bounded by finite data😣? Introducing RLVE: RL with Adaptive Verifiable Environments We scale RL with data procedurally generated from 400 envs dynamically adapting to the trained model 💡find supervision signals right at the LM capability frontier + scale them 🔗in🧵

RL is bounded by finite data😣?
Introducing RLVE: RL with Adaptive Verifiable Environments

We scale RL with data procedurally generated from 400 envs dynamically adapting to the trained model

💡find supervision signals right at the LM capability frontier + scale them

🔗in🧵
Micah Goldblum (@micahgoldblum) 's Twitter Profile Photo

An LLM-generated paper is in the top 17% of ICLR submissions in terms of average reviewer score, having received two 8's. The paper has tons of BS jargon and hallucinated references. Fortunately, one reviewer actually looked at the paper and gave it a zero. 1/3

Aditya Ramesh (@model_mechanic) 's Twitter Profile Photo

The value of fast iteration in AI is overrated. The best results are obtained by knowing the right things to do and doing each thing with neurotic precision and attention to detail.

Manling Li (@manlingli_) 's Twitter Profile Photo

Spatial intelligence has long been one of the biggest bottleneck for VLMs. Two years ago in Sept 2023, when I just started my postdoc, I still remember vividly how we are excited about GPT-4V and how our “What GPT-4V still can’t do” slides were completely dominated by geometric

Spatial intelligence has long been one of the biggest bottleneck for VLMs. 

Two years ago in Sept 2023, when I just started my postdoc, I still remember vividly how we are excited about GPT-4V and how our “What GPT-4V still can’t do” slides were completely dominated by geometric
Rulin Shao (@rulinshao) 's Twitter Profile Photo

🔥Thrilled to introduce DR Tulu-8B, an open long-form Deep Research model that matches OpenAI DR 💪Yes, just 8B! 🚀 The secret? We present Reinforcement Learning with Evolving Rubrics (RLER) for long-form non-verifiable DR tasks! Our rubrics: - co-evolve with the policy model -

🔥Thrilled to introduce DR Tulu-8B, an open long-form Deep Research model that matches OpenAI DR 💪Yes, just 8B! 🚀

The secret? We present Reinforcement Learning with Evolving Rubrics (RLER) for long-form non-verifiable DR tasks! Our rubrics:
- co-evolve with the policy model
-
Manling Li (@manlingli_) 's Twitter Profile Photo

While discussing spatial intelligence of "VLMs", wanted to share an interesting finding we have in ICML25 paper: We actually opens the black box of why VLMs fail at even the simplest spatial question "where is A to B" - 90% of tokens are visual, yet they get only ~10% of the

While discussing spatial intelligence of "VLMs", wanted to share an interesting finding we have in ICML25 paper:

We actually opens the black box of why VLMs fail at even the simplest spatial question "where is A to B"

- 90% of tokens are visual, yet they get only ~10% of the
Yu Su @#ICLR2025 (@ysu_nlp) 's Twitter Profile Photo

We are hiring OSU NLP Group! - frontier agent/LLM research - hundreds of H100s/A100s + clouds - uncapped LLM APIs Join us if you have bold ideas about the future of AI