Nanbeige (@nanbeige)'s Twitter Profile
Nanbeige

@nanbeige

Nanbeige LLM Lab

ID: 1740367494102351872

Link: https://huggingface.co/Nanbeige · Joined: 28-12-2023 13:42:07

45 Tweets

1.1K Followers

202 Following

Nanbeige (@nanbeige)'s Twitter Profile Photo

Thanks for your attention! We updated Nanbeige-16B-Chat (huggingface.co/Nanbeige/Nanbe…) last month; give it a try!

Nanbeige (@nanbeige)'s Twitter Profile Photo

Nanbeige2-8B-Chat (huggingface.co/Nanbeige/Nanbe…) is on the AlpacaEval Leaderboard (tatsu-lab.github.io/alpaca_eval/).

Nanbeige (@nanbeige)'s Twitter Profile Photo

Nanbeige2-8B-Chat (huggingface.co/Nanbeige/Nanbe…) achieved the highest score among models under 10B parameters in FlagEval (flageval.baai.ac.cn/#/trending).

Nanbeige (@nanbeige)'s Twitter Profile Photo

Nanbeige Plus Chat (huggingface.co/spaces/Nanbeig…) achieved a high score on AlpacaEval (tatsu-lab.github.io/alpaca_eval/).

Nanbeige (@nanbeige)'s Twitter Profile Photo

We published our model Nanbeige2-16B-Chat (huggingface.co/Nanbeige/Nanbe…) with an MT-Bench score of 8.6, an AlpacaEval 2.0 LC win rate of 43%, and an AlignBench score of 7.62. A new open-source model with a 1-million-token context window is on the way. Enjoy :-)

Nanbeige (@nanbeige)'s Twitter Profile Photo

Nanbeige2-16B-Chat (huggingface.co/Nanbeige/Nanbe…) is on FlagEval's open-source model leaderboard (flageval.baai.ac.cn/#/leaderboard).

Nanbeige (@nanbeige)'s Twitter Profile Photo

Nanbeige2-16B-Chat (huggingface.co/Nanbeige/Nanbe…) is on the OpenCompass 24-05 open-source model leaderboard (rank.opencompass.org.cn/home).

Nanbeige (@nanbeige)'s Twitter Profile Photo

Nanbeige2-16B-Chat (huggingface.co/Nanbeige/Nanbe…) achieved a high score on the May OpenCompass Leaderboard (Subject Part) compared with other open-source models.

Nanbeige (@nanbeige)'s Twitter Profile Photo

We published our new model Nanbeige4-3B-Thinking-2511 (huggingface.co/Nanbeige/Nanbe…), which achieved state-of-the-art (SOTA) results among models smaller than 32B parameters on Arena-Hard-V2 and BFCL-V4.

Tiezhen WANG (@xianbao_qian)'s Twitter Profile Photo

China’s depth of STEM talent is the ultimate refutation of the "concentration of power."

After Xiaomi, RedNote, Meituan (Chinese DoorDash) and many others, now BOSS Zhipin (a ~$10B mkt cap recruiting app) has also joined the game and open-sourced a small yet powerful model.
Privacy AI - offline models & remote AI client (@best_privacy_ai)'s Twitter Profile Photo

Adina Yakup Nanbeige Tested Nanbeige4-3B-Thinking (Q3_K_S) locally in Privacy AI with on-device tool calling (search_web). Performance on iOS is excellent. At 3B, it’s lightweight enough to serve as a practical daily offline assistant, yet it still handles reasoning and tool use reliably. Congrats to…

ModelScope (@maasai42)'s Twitter Profile Photo

🤖Meet Nanbeige4-3B from Boss Zhipin—a 3B-parameter LLM that outperforms Qwen3-32B on math (AIME), science (GPQA), and tool calling (BFCL-V4), while matching Qwen3-30B-A3B on human preference alignment (Arena-Hard-V2).

How?
✅ 23T tokens of ultra-curated data
✅ Fine-grained WSD
Nanbeige (@nanbeige)'s Twitter Profile Photo

In the Berkeley Function Calling Leaderboard (gorilla.cs.berkeley.edu/leaderboard.ht…), Nanbeige4-3B-Thinking-2511 (huggingface.co/Nanbeige/Nanbe…) ranks 25th overall, placing among the top 10 open-source models and outperforming Qwen3-32B, despite being only a 3B model.
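
For context on what function-calling benchmarks like BFCL actually measure: the model receives JSON tool schemas and must emit well-formed, schema-conformant calls that a harness can parse and validate. A minimal sketch of that round trip in Python; the search_web schema below is illustrative (echoing the tool mentioned in the Privacy AI post), not taken from BFCL or Nanbeige's evaluation:

```python
import json

# An OpenAI-style tool schema, the general shape used by
# function-calling benchmarks. Illustrative, not from BFCL itself.
tool_schema = {
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web for a query string.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

# The model emits a structured call as JSON text...
raw_call = '{"name": "search_web", "arguments": {"query": "Nanbeige4 benchmarks"}}'
call = json.loads(raw_call)

# ...which the harness validates against the schema before executing the tool.
assert call["name"] == tool_schema["function"]["name"]
required = tool_schema["function"]["parameters"]["required"]
missing = [k for k in required if k not in call["arguments"]]
assert not missing, f"missing required arguments: {missing}"
print("valid call:", call["name"], call["arguments"])
```

Scoring on such leaderboards comes down to how often the emitted call parses, names a real tool, and supplies all required arguments with the right types.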

N8 Programs (@n8programs)'s Twitter Profile Photo

Intriguing new model called 'Nanbeige/Nanbeige4.1-3B' released; it appears to be *extremely* SOTA for its size range, so much so that I question whether it's benchmaxxed. But Nanbeige appears to be a small but real lab out of China, so I have faith! Quite exciting; will test.

Privacy AI - offline models & remote AI client (@best_privacy_ai)'s Twitter Profile Photo

Nanbeige Congrats! Now everyone with an iOS device can try the Nanbeige4.1-3B model immediately on their phone. This model excels at tool calling and tends to output many thinking tokens, which requires a large context window. I set a 12K context on my iPhone 16 Pro Max with 8K max output,…
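
For readers who want to reproduce a similar setup on a desktop with llama.cpp rather than the Privacy AI app, a sketch of an equivalent invocation; the GGUF filename is hypothetical, and -c / -n map to the 12K context and 8K max-output settings described above:

```shell
# Hypothetical local run of a Q3_K_S quant with llama.cpp's llama-cli.
# -c 12288: 12K-token context window; -n 8192: cap generation at 8K tokens.
llama-cli -m nanbeige4.1-3b-q3_k_s.gguf -c 12288 -n 8192 \
  -p "Explain what a context window is."
```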

Nanbeige (@nanbeige)'s Twitter Profile Photo

N8 Programs Thank you again for your interest! We hope the model will attract wider attention and be tested by the community to evaluate its performance. The technical report will be released tomorrow—stay tuned! 🌟