jinyang (patrick) li (@jinyang34647007)'s Twitter Profile
jinyang (patrick) li

@jinyang34647007

CS PhD student @HKUniversity. Previously M.S. in @Columbia. Intern at @MSFTResearch, prev. at @AlibabaGroup.
LLM, Interactive Semantic Parsing, text-to-SQL.

ID: 1463518977376673794

Link: https://jinyang-li.me/ · Joined: 24-11-2021 14:45:05

614 Tweets

1.1K Followers

1.1K Following

Fan Zhou ✈️ ICLR2025 (@fazhou_998):

If you're looking to boost your model’s math reasoning ability, don’t miss out on MegaMath —— The Largest Open Math Pre-training in the world!

🧠 Need from-scratch training? Use MegaMath. 
🔁 Need continual pre-training? Use MegaMath. 
🧬 Need high-quality mid-training? Use
𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8):

The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs

New benchmark from Cohere compares training-free sparse attention across:
- 7B → 72B models
- 16K → 128K tokens
- up to 95% sparsity

Findings:
- larger, sparser models outperform smaller dense ones on long
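The training-free setting the benchmark studies can be illustrated with a minimal per-query top-k sparse attention sketch. This is not Cohere's exact method — the function name and the `keep_frac` parameter are illustrative — just one common way to sparsify attention without retraining: keep only the highest-scoring fraction of keys per query (95% sparsity corresponds to `keep_frac=0.05`) and mask the rest before the softmax.

```python
import numpy as np

def topk_sparse_attention(q, k, v, keep_frac=0.05):
    # q: (Lq, d); k, v: (Lk, d). Scaled dot-product scores as in
    # standard attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    n_keep = max(1, int(np.ceil(keep_frac * k.shape[0])))
    # Per-row threshold: the n_keep-th largest score for each query.
    thresh = np.sort(scores, axis=-1)[:, -n_keep][:, None]
    # Mask everything below the threshold, then softmax over survivors.
    masked = np.where(scores >= thresh, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `keep_frac=1.0` every key survives and the function reduces to ordinary dense attention, which makes the sparsity knob easy to sanity-check.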
cohere (@cohere):

Command A, our state-of-the-art generative model, is now the highest-scoring generalist LLM on the Bird Bench leaderboard for SQL!  

It outperforms other systems that rely on extensive scaffolding to tackle these SQL benchmarks, and instead delivers these results out-of-the-box,
chang ma (@ma_chang_nlp):

We are kicking off a series of seminars at HKUNLP. Siyan Zhao will be giving a talk titled "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" at ⏰Friday 5.9 11am HKT (Thursday 5.8 8pm PDT). Link to talk: hku.zoom.us/j/97925412724?…

Zhewei Yao (@yao_zhewei):

🚀 Big news! Our collab w/ Snowflake, UCSD & UMD topped the BIRD leaderboard — beating prior SOTA by 2.8% in Text-to-SQL reasoning! RL was tough, but worth it.
📢 Best model coming soon.
#AI #LLM #TextToSQL #ReinforcementLearning #Snowflake #UCSD #UMD #NLP #BIRDLeaderboard
Junxian He (@junxian_he):

As we all know, large reasoning models overthink heavily, even when answering 1+1. Many approaches have been proposed to mitigate this issue, but it doesn't need to be that complex -- we (and several concurrent works) found that just continuing standard RL training while
Bowen Li (@bowenli2121):

🤔 Have we really made great progress on software engineering tasks?
🚀 Introducing SWE-bench-Live, a live-updatable benchmark for real-world bug fixing.
😺 Even the best combo, OpenHands + Claude 3.7 Sonnet, sees a major performance drop! 
👉 swe-bench-live.github.io

🧵 1/4
Tianbao Xie (@tianbaox):

🚀 OSWorld gets a major upgrade!

OSWorld-Verified: 15 months of community feedback → 300+ fixes (ambiguity, graders…), 50x faster eval through AWS parallelization.
More apples-to-apples comparison for reliable CUA evaluation ✨
👇 xlang.ai/blog/osworld-v…

Binyuan Hui (@huybery):

Qwen3-VL is finally released and open-sourced, available in both Thinking and Instruct versions! This time, we’ve placed special emphasis on strengthening Visual Agent and Visual Coding, which are crucial steps toward building a true Digital Agent 🚀

OpenBMB (@openbmb):

How do you make LLMs both long-context capable and super fast?

Meet InfLLM-V2 from Tsinghua x OpenBMB —
a breakthrough dense-sparse switchable attention system that:
1⃣ Seamlessly adapts from short to long sequences
2⃣ Runs 4× faster than dense attention
3⃣ Keeps 99%+ accuracy
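The "dense-sparse switchable" idea above can be sketched as a simple length-based dispatch. This is a hypothetical illustration, not InfLLM-V2's actual rule, and `switch_len` is an arbitrary threshold chosen for the example: dense attention is exact and cheap on short inputs, so sparsity only needs to kick in once the sequence grows long enough for it to pay off.

```python
def attention_mode(seq_len: int, switch_len: int = 8192) -> str:
    # Dispatch rule: run ordinary dense attention for short sequences
    # and switch to a sparse attention kernel only beyond switch_len,
    # where the quadratic cost of dense attention starts to dominate.
    return "dense" if seq_len <= switch_len else "sparse"
```

For example, a 1K-token prompt would stay on the dense path, while a 128K-token context would take the sparse one.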
Xidulu (@xidulu):

Muon cuts optimizer memory overhead from 3× model_size to approximately 2× model_size; can we further reduce it to 1-1.5× model size?
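The multiples in the question come from a simple accounting identity. A minimal sketch of that arithmetic (the function name is illustrative, and gradients are excluded for simplicity): total resident memory is the weights themselves plus one model-sized buffer per optimizer state tensor.

```python
def optimizer_memory_multiple(states_per_param: int) -> int:
    # Resident memory as a multiple of model size:
    # 1x for the weights, plus one model-sized buffer
    # per optimizer state tensor (gradients excluded).
    return 1 + states_per_param

# Adam-style optimizers keep two moments per parameter -> 3x model size.
adam_like = optimizer_memory_multiple(2)  # 3
# Muon keeps a single momentum buffer -> 2x model size.
muon_like = optimizer_memory_multiple(1)  # 2
```

Getting to the 1-1.5× range the tweet asks about would mean keeping less than one full model-sized state buffer, e.g. via factored or quantized optimizer states.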

jinyang (patrick) li (@jinyang34647007):

Excited to share a major update to BIRD-SQL! Our team, with efforts from global engineers, experts, and students, has completed comprehensive quality control, releasing bird-sql-dev-1106: huggingface.co/datasets/birds… For new submissions using this cleaner split, please indicate this

jinyang (patrick) li (@jinyang34647007):

Tried out Gemini-3-Pro on our BIRD-SQL Verified Dev + Hidden Test sets. It’s the first general model to break 70 🚀, getting surprisingly close to specialized SFT/RL models from Databricks and Snowflake. Really impressive generalist performance 🔍✨

BTW, Gemini-SQL
Alibaba Tongyi_Lab (@labtongyi96898):

XiYan-SQL is an innovative natural language–to–SQL conversion framework designed to address the performance challenges large language models face in SQL generation tasks.
XiYan-SQL just hit #1 across all open BIRD-CRITIC (SWE-SQL) leaderboards — including BIRD-CRITIC-1.0-Open,
Tengfei Wang (@dylantfwang):

🎮Get a first look at Tencent HY World 1.5 (WorldPlay)! 🎮 Our newest world model with real-time interaction and long-term memory. It’s going *open-source* tomorrow.

Qian Liu (@sivil_taram):

Most recent LLM+RL work focuses on clipping for stability, but there's another path → better baselines! Check out our new method, which introduces an 𝗼𝗽𝘁𝗶𝗺𝗮𝗹 𝘁𝗼𝗸𝗲𝗻 𝗯𝗮𝘀𝗲𝗹𝗶𝗻𝗲 for more stable RL training 🔥

Nan HUO (@nanhuo9637):

Excited to share that our work "BIRD-Interact" has been accepted to ICLR 2026! 🎉

In this project, we introduce a dynamic, interactive Text-to-SQL environment, testing whether LLM agents can handle ambiguous, evolving database tasks like a real DBA. 🤖

🚀 Here is our project
Yujia Qin @ICLR2025 (@tsingyoga):

Happy CNY! We are glad to introduce our latest language model, Seed-2.0. We've made great progress (agent, reasoning, vision understanding, etc.) since Seed-1.8, without any distillation.
Right now it's only available in CN, and will soon be ready globally.

seed.bytedance.com/en/seed2