Chen Qian (@chenmoneyq)'s Twitter Profile
Chen Qian

@chenmoneyq

AI Engineer @Databricks, maintainer of DSPy. I am passionate about open source and building AI products.

ID: 799385380810211328

Joined: 17-11-2016 22:55:01

46 Tweets

420 Followers

81 Following

Ludwig Schmidt (@lschmidt3)'s Twitter Profile Photo

Very excited to finally release our paper for OpenThoughts! After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.

Lysandre (@lysandrejik)'s Twitter Profile Photo

I have bittersweet news to share. Yesterday we merged a PR deprecating TensorFlow and Flax support in transformers. Going forward, we're focusing all our efforts on PyTorch to remove a lot of the bloating in the transformers library. Expect a simpler toolkit, across the board.

Chen Qian (@chenmoneyq)'s Twitter Profile Photo

Is LLaDA (github.com/ML-GSAI/LLaDA) doing something similar to Gemini Diffusion? Gemini Diffusion is doing pretty solid work on the tasks I tried, so I am wondering when we can have a standard text diffusion model in the open-source community.

Anmol Gulati (@anmol01gulati)'s Twitter Profile Photo

XBench just released an agentic eval leaderboard: xbench.org. It seems to capture both frontier capabilities and real-world jobs and use cases, ticking most of the boxes!

Chen Qian (@chenmoneyq)'s Twitter Profile Photo

We are adding more real-world examples to the DSPy tutorials: dspy.ai/tutorials/real…. Please check them out and let us know what else you want to see and learn! We welcome contributions: create a feature request describing what you want to ship, and we can discuss from there!

Mayee Chen (@mayeechen)'s Twitter Profile Photo

LLMs often generate correct answers but struggle to select them. Weaver tackles this by combining many weak verifiers (reward models, LM judges) into a stronger signal using statistical tools from Weak Supervision—matching o3-mini-level accuracy with much cheaper models! 📊

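The weak-supervision idea behind combining verifiers can be sketched with a simple (and much cruder) stand-in: a naive-Bayes-style weighted vote, where each verifier's vote is weighted by the log-odds of its estimated accuracy. This is an illustrative toy, not Weaver's actual algorithm; the function name and the accuracy estimates are made up for the example.

```python
# Toy sketch (NOT Weaver's method): fuse several weak binary verifiers into
# one stronger score. Each verifier's weight is the log-odds of its estimated
# accuracy, assuming verifiers err independently given the true label.
import math

def combine_verifiers(votes, accuracies):
    """votes: 0/1 judgments, one per verifier (1 = "answer looks correct").
    accuracies: each verifier's estimated accuracy, in (0.5, 1).
    Returns an estimated probability that the answer is correct."""
    score = 0.0
    for vote, acc in zip(votes, accuracies):
        weight = math.log(acc / (1 - acc))        # log-odds weight
        score += weight if vote == 1 else -weight # agreement adds, dissent subtracts
    return 1 / (1 + math.exp(-score))             # sigmoid -> probability

# Two verifiers say "correct", one says "incorrect":
p = combine_verifiers([1, 1, 0], [0.7, 0.65, 0.6])
```

A more accurate verifier gets a larger weight, so the combined score leans toward the verifiers most likely to be right; part of what makes the real weak-supervision machinery interesting is that it estimates those accuracies without labeled data.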
MLflow (@mlflow)'s Twitter Profile Photo

In this clip from Data+AI Summit, Chen Qian talks about the release of DSPy 3, which brings production-ready capabilities, seamless #MLflow integration, streaming and async support, and advanced optimizers like SIMBA. Chen also explains how DSPy 3 streamlines prompt engineering.

Chen Qian (@chenmoneyq)'s Twitter Profile Photo

I do feel myself becoming "lazy" these past few years. It doesn't necessarily mean my productivity is dropping, but I do feel less passionate about diving deep to learn a new framework/language/algorithm, because I can rely on LLMs to coach me.

Chen Qian (@chenmoneyq)'s Twitter Profile Photo

This is an underrated DSPy module in my opinion, and I am happy to see the community discovering it. It's also probably my bad for not providing concrete use cases for it...

alphaXiv (@askalphaxiv)'s Twitter Profile Photo

"How Many Instructions Can LLMs Follow at Once?" In this paper they found that leading LLMs can satisfy only about 68% of 500 concurrent instructions, showing a bias toward earlier instructions.

"How Many Instructions Can LLMs Follow at Once?"

In this paper they found that leading LLMs can satisfy only about 68% of 500 concurrent instructions, showing a bias toward earlier instructions.
Chen Qian (@chenmoneyq)'s Twitter Profile Photo

Cool and inspiring direction! I feel a bit strange about the evaluation part, though. For example, pushing HotPotQA to 27.72% accuracy is not very exciting... In addition, using GPT-2 as the baseline (Table 8) is odd.