jinyang (patrick) li (@jinyang34647007)'s Twitter Profile
jinyang (patrick) li

@jinyang34647007

CS PhD student @HKUniversity. Previously M.S. in @Columbia. Intern at @MSFTResearch, prev. at @AlibabaGroup.
LLM, Interactive Semantic Parsing, text-to-SQL.

ID: 1463518977376673794

Link: https://jinyang-li.me/ · Joined: 24-11-2021 14:45:05

614 Tweets

1.1K Followers

1.1K Following

Fan Zhou ✈️ ICLR2025 (@fazhou_998):

If you're looking to boost your model’s math reasoning ability, don’t miss out on MegaMath —— The Largest Open Math Pre-training in the world!

🧠 Need from-scratch training? Use MegaMath. 
🔁 Need continual pre-training? Use MegaMath. 
🧬 Need high-quality mid-training? Use
𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8):

The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs

New benchmark from Cohere compares training-free sparse attention across:
- 7B → 72B models
- 16K → 128K tokens
- up to 95% sparsity

Findings:
- larger, sparser models outperform smaller dense ones on long
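The training-free setting the benchmark studies can be illustrated with a minimal per-query top-k sparse attention sketch. This is not Cohere's exact method — the function name and the `keep_frac` parameter are illustrative — just one common way to sparsify attention without retraining: keep only the highest-scoring fraction of keys per query (95% sparsity corresponds to `keep_frac=0.05`) and mask the rest before the softmax.

```python
import numpy as np

def topk_sparse_attention(q, k, v, keep_frac=0.05):
    # q: (Lq, d); k, v: (Lk, d). Scaled dot-product scores as in
    # standard attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    n_keep = max(1, int(np.ceil(keep_frac * k.shape[0])))
    # Per-row threshold: the n_keep-th largest score for each query.
    thresh = np.sort(scores, axis=-1)[:, -n_keep][:, None]
    # Mask everything below the threshold, then softmax over survivors.
    masked = np.where(scores >= thresh, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `keep_frac=1.0` every key survives and the function reduces to ordinary dense attention, which makes the sparsity knob easy to sanity-check.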
cohere (@cohere):

Command A, our state-of-the-art generative model, is now the highest-scoring generalist LLM on the Bird Bench leaderboard for SQL!  

It outperforms other systems that rely on extensive scaffolding to tackle these SQL benchmarks, and instead delivers these results out-of-the-box,
chang ma (@ma_chang_nlp):

We are kicking off a series of seminars at HKUNLP. Siyan Zhao will be giving a talk titled "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" at ⏰Friday 5.9 11am HKT (Thursday 5.8 8pm PDT). Link to talk: hku.zoom.us/j/97925412724?…

Zhewei Yao (@yao_zhewei):

🚀 Big news! Our collab w/ Snowflake, UCSD & UMD topped the BIRD leaderboard — beating prior SOTA by 2.8% in Text-to-SQL reasoning! RL was tough, but worth it.
📢 Best model coming soon.
#AI #LLM #TextToSQL #ReinforcementLearning #Snowflake #UCSD #UMD #NLP #BIRDLeaderboard
Junxian He (@junxian_he):

As we all know, large reasoning models overthink heavily, even when answering 1+1. Many approaches have been proposed to mitigate this issue, but it doesn't need to be that complex -- we (and several concurrent works) found that just continuing standard RL training while
Bowen Li (@bowenli2121):

🤔 Have we really made great progress on software engineering tasks?
🚀 Introducing SWE-bench-Live, a live-updatable benchmark for real-world bug fixing.
😺 Even the best combo, OpenHands + Claude 3.7 Sonnet, sees a major performance drop! 
👉 swe-bench-live.github.io

🧵 1/4
Tianbao Xie (@tianbaox):

🚀 OSWorld gets a major upgrade!

OSWorld-Verified: 15 months of community feedback → 300+ fixes (ambiguity, graders…), 50x faster eval through AWS parallelization.
More apples-to-apples comparison for reliable CUA evaluation ✨
👇 xlang.ai/blog/osworld-v…

Binyuan Hui (@huybery):

Qwen3-VL is finally released and open-sourced, available in both Thinking and Instruct versions! This time, we’ve placed special emphasis on strengthening Visual Agent and Visual Coding, which are crucial steps toward building a true Digital Agent 🚀

OpenBMB (@openbmb):

How do you make LLMs both long-context capable and super fast?

Meet InfLLM-V2 from Tsinghua x OpenBMB —
a breakthrough dense-sparse switchable attention system that:
1⃣ Seamlessly adapts from short to long sequences
2⃣ Runs 4× faster than dense attention
3⃣ Keeps 99%+ accuracy
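The "dense-sparse switchable" idea above can be sketched as a simple length-based dispatch. This is a hypothetical illustration, not InfLLM-V2's actual rule, and `switch_len` is an arbitrary threshold chosen for the example: dense attention is exact and cheap on short inputs, so sparsity only needs to kick in once the sequence grows long enough for it to pay off.

```python
def attention_mode(seq_len: int, switch_len: int = 8192) -> str:
    # Dispatch rule: run ordinary dense attention for short sequences
    # and switch to a sparse attention kernel only beyond switch_len,
    # where the quadratic cost of dense attention starts to dominate.
    return "dense" if seq_len <= switch_len else "sparse"
```

For example, a 1K-token prompt would stay on the dense path, while a 128K-token context would take the sparse one.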
Xidulu (@xidulu):

Muon cuts optimizer memory overhead from 3× model_size to approximately 2× model_size; can we further reduce it to 1-1.5× model size?
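The multiples in the question come from a simple accounting identity. A minimal sketch of that arithmetic (the function name is illustrative, and gradients are excluded for simplicity): total resident memory is the weights themselves plus one model-sized buffer per optimizer state tensor.

```python
def optimizer_memory_multiple(states_per_param: int) -> int:
    # Resident memory as a multiple of model size:
    # 1x for the weights, plus one model-sized buffer
    # per optimizer state tensor (gradients excluded).
    return 1 + states_per_param

# Adam-style optimizers keep two moments per parameter -> 3x model size.
adam_like = optimizer_memory_multiple(2)  # 3
# Muon keeps a single momentum buffer -> 2x model size.
muon_like = optimizer_memory_multiple(1)  # 2
```

Getting to the 1-1.5× range the tweet asks about would mean keeping less than one full model-sized state buffer, e.g. via factored or quantized optimizer states.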

jinyang (patrick) li (@jinyang34647007):

Excited to share a major update to BIRD-SQL! Our team, with efforts from global engineers, experts, and students, has completed comprehensive quality control, releasing bird-sql-dev-1106: huggingface.co/datasets/birds… For new submissions using this cleaner split, please indicate this

jinyang (patrick) li (@jinyang34647007):

Tried out Gemini-3-Pro on our BIRD-SQL Verified Dev + Hidden Test sets. It’s the first general model to break 70 🚀, getting surprisingly close to specialized SFT/RL models from Databricks and Snowflake. Really impressive generalist performance 🔍✨

BTW, Gemini-SQL
Alibaba Tongyi_Lab (@labtongyi96898):

XiYan-SQL is an innovative natural language–to–SQL conversion framework designed to address the performance challenges large language models face in SQL generation tasks.
XiYan-SQL just hit #1 across all open BIRD-CRITIC (SWE-SQL) leaderboards — including BIRD-CRITIC-1.0-Open,
Tengfei Wang (@dylantfwang):

🎮Get a first look at Tencent HY World 1.5 (WorldPlay)! 🎮 Our newest world model with real-time interaction and long-term memory. It’s going *open-source* tomorrow.

Qian Liu (@sivil_taram):

Most recent LLM+RL work focuses on clipping for stability, but there's another path → better baselines! Check out our new method, which introduces an 𝗼𝗽𝘁𝗶𝗺𝗮𝗹 𝘁𝗼𝗸𝗲𝗻 𝗯𝗮𝘀𝗲𝗹𝗶𝗻𝗲 for more stable RL training 🔥

Nan HUO (@nanhuo9637):

Excited to share that our work "BIRD-Interact" has been accepted to ICLR 2026! 🎉

In this project, we introduce a dynamic, interactive Text-to-SQL environment, testing whether LLM agents can handle ambiguous, evolving database tasks like a real DBA. 🤖

🚀 Here is our project
Yujia Qin @ICLR2025 (@tsingyoga):

Happy CNY! We are glad to introduce our latest language model, Seed-2.0. We've made great progress (agent, reasoning, vision understanding, etc.) since Seed-1.8, without any distillation.
Right now it's only available in CN, and will soon be ready globally.

seed.bytedance.com/en/seed2