Lin Zheng (@linzhengisme) 's Twitter Profile
Lin Zheng

@linzhengisme

Ph.D. student @ HKU

ID: 1311227528963477504

https://lzhengisme.github.io/ · Joined 30-09-2020 08:53:16

100 Tweets

334 Followers

339 Following

Qian Liu (@sivil_taram) 's Twitter Profile Photo

Wrapped up a SWE-Perf website redesign using Qwen3-Coder on AnyCoder (huggingface.co/spaces/akhaliq…). The process was incredibly fast and great!

One question for Qwen devs, though: did you pretrain a secret love for the color purple into the coder's persona? 😉
Fan Zhou✈️ICLR2025 (@fazhou_998) 's Twitter Profile Photo

Qwen3-Coder-Flash (size == 30B-A3B) just landed. Qwen Code (0.0.1-alpha.12) also picked up a few upgrades. Tiny ≠ trivial, and lightweight ≠ light-headed. Will keep iterating and aim for better agentic coding.

Dimitri von Rütte (@dvruette) 's Twitter Profile Photo

I feel like this completely flew under the radar despite being a huge deal for discrete diffusion models:
DreamOn is a 7B dLLM that can do variable-length generation, solving something that has been a huge challenge!

The idea is clever: Let's just randomly insert <|delete|>
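
A minimal sketch of the corruption step the tweet hints at, assuming the trick is to randomly splice <|delete|> placeholders into training sequences so the model learns that those positions should vanish. The token name comes from the tweet; the rate and placement below are illustrative assumptions, not DreamOn's actual recipe.

```python
import random

DELETE = "<|delete|>"  # special token mentioned in the tweet

def insert_delete_tokens(tokens, rate=0.1, seed=None):
    """Randomly splice <|delete|> placeholders into a token sequence.

    The corrupted sequence is longer than the clean target, so a model
    trained to recover the target effectively learns to shrink sequences,
    which is one route to variable-length generation. Rate and placement
    here are illustrative assumptions only.
    """
    rng = random.Random(seed)
    corrupted = []
    for tok in tokens:
        if rng.random() < rate:
            corrupted.append(DELETE)
        corrupted.append(tok)
    return corrupted

clean = ["The", "idea", "is", "clever", "."]
# Prints the clean tokens with <|delete|> spliced in at random positions.
print(insert_delete_tokens(clean, rate=0.3, seed=0))
```
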
Jinjie Ni @ ICLR'25 🇸🇬 (@nijinjie) 's Twitter Profile Photo

Token crisis: solved. ✅

We pre-trained diffusion language models (DLMs) vs. autoregressive (AR) models from scratch — up to 8B params, 480B tokens, 480 epochs.

Findings:
>  DLMs beat AR when tokens are limited, with >3× data potential.
>  A 1B DLM trained on just 1B tokens
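
A quick sanity check on how the headline numbers can fit together, under the assumption (my reading, not stated explicitly above) that 480B is the total token budget, i.e., a smaller unique corpus repeated over many epochs:

```python
# Assumed reading: total tokens processed = unique tokens x epochs.
total_tokens = 480e9   # "480B tokens"
epochs = 480           # "480 epochs"
unique_tokens = total_tokens / epochs
print(f"unique corpus ≈ {unique_tokens / 1e9:.0f}B tokens")  # ≈ 1B
```
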
Yiheng Xu✈️ICLR2025 (@yihengxu_) 's Twitter Profile Photo

Excited to see Qwen3-Coder 480B as the default model for AK’s anycoder — thanks! Gave it a one-shot prompt to build an interactive Win95 desktop, and it just works!

Sansa Gong (@sansa19739319) 's Twitter Profile Photo

1–2 years ago, when I first started training text diffusion models, I had this empirical feeling that they could handle more epochs of training data. It’s great to now see the community sharing experiment logs using the "LM as physics" research approach.🤗

Jiaxin Shi (@thjashin) 's Twitter Profile Photo

To be fair, I’m not saying there is no hope - it’s just that there is no evidence that the crossover point exists in the non-overfitting regime.

Xinyu Yang (@xinyu2ml) 's Twitter Profile Photo

What’s particularly striking is that 1B unique tokens trained for 96 epochs can match the performance of 96B unique tokens trained for a single epoch. At first glance, this seems counterintuitive. However, if we randomly mask tokens during training, a sequence of length L can
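
A hedged illustration of the combinatorial point: each epoch re-masks the same sequence differently, so repeated epochs keep producing fresh training views. The snippet just counts distinct masked views of one short toy sequence over 96 simulated epochs; the 50% mask rate and the toy length are arbitrary choices for illustration.

```python
import random

def random_mask(tokens, rate, rng):
    """Independently replace each token with [MASK] with probability `rate`."""
    return tuple("[MASK]" if rng.random() < rate else tok for tok in tokens)

sequence = ["a", "b", "c", "d", "e", "f", "g", "h"]  # toy sequence, L = 8
rng = random.Random(0)

views = {random_mask(sequence, 0.5, rng) for _ in range(96)}
print(f"{len(views)} distinct masked views across 96 epochs")
# A length-L sequence admits 2**L mask patterns (256 here; astronomically
# many for real documents), so each epoch is mostly new supervision.
```

For real pretraining sequences with thousands of tokens, 2**L dwarfs any realistic epoch count, which is one intuition for why masked diffusion training tolerates heavy data repetition.
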

Wenhao Chai (@wenhaocha1) 's Twitter Profile Photo

GPT-5, think more.

In our latest LiveCodeBench Pro tests for Competitive Programming, GPT-5 Thinking hit a true 0→1 moment on the 2025 Q1 set, the only model to crack the hard split, and this wasn’t even GPT-5 Thinking Pro. Average response length exceeded 100,000 tokens, which is
Tianbao Xie (@tianbaox) 's Twitter Profile Photo

Where are our computer‑use agents (CUA) standing on OSWorld‑Verified? Potentially already ~80%.

We made this analysis, which summarizes the latest OSWorld-Verified submissions with 27 models evaluated over 369 tasks, and conducted a case study on the o3+Jedi-7B approach to
Xinyuan Wang (@xywang626) 's Twitter Profile Photo

We are super excited to release OpenCUA — the first 0-to-1 computer-use agent foundation model framework and open-source SOTA model OpenCUA-32B, matching top proprietary models on OSWorld-Verified, with full infrastructure and data.

🔗 [Paper] arxiv.org/abs/2508.09123 
📌
Tianbao Xie (@tianbaox) 's Twitter Profile Photo

Someone asked me: "MCP is so hot right now. If the human-computer interface completely changes in the future to just a chat box, wouldn't your computer use work become useless?" I said: "Maybe, that's possible in one timeline. But it's also possible things will be different

Tao Yu (@taoyds) 's Twitter Profile Photo

As computer-use agents (CUAs) handle critical digital tasks, open research is key to studying their capabilities and risks.

🚀 After a year, we release OpenCUA: 1) the largest CUA dataset/tool, 2) a training recipe, 3) a ~SOTA model on OSWorld.

Released to drive transparent, safe CUA research!
XLANG NLP Lab (@xlangnlp) 's Twitter Profile Photo

Check out our latest open-source project, OpenCUA, for Computer Use Agents (CUAs)! Find the code, annotation tool, the largest CUA dataset, scalable training recipe, and state-of-the-art model on OSWorld at: opencua.xlang.ai.

Yiheng Xu✈️ICLR2025 (@yihengxu_) 's Twitter Profile Photo

Qwen Code began as a fun side project with Fan Zhou and Binyuan Hui to explore Qwen3 Coder’s terminal-based agentic skills. Right before launch, we decided to share it as a preview with free quota alongside Qwen3 Coder, so the community could test its agent skills in the wild.