Sang Michael Xie (@sangmichaelxie)'s Twitter Profile
Sang Michael Xie

@sangmichaelxie

Research Scientist at Meta GenAI / LLaMA. AI + ML + NLP + data. Prev: CS PhD @StanfordAILab @StanfordNLP @Stanford, @GoogleAI Brain/DeepMind

ID: 1133476937668542465

Link: http://cs.stanford.edu/~eix
Joined: 28-05-2019 20:55:35

365 Tweets

3.3K Followers

727 Following

Helen Qu (@_helenqu)

today, gen AI performance is surprisingly robust to new data/tasks, even beating specialized models! the secret: training on large-scale unlabeled data.

what can we as scientists learn from this?

some thoughts on robustness & the power of the unlabeled data you already have:
Sang Michael Xie (@sangmichaelxie)

Connect Later, our targeted fine-tuning method for robust+accurate models, tops the WILDS leaderboard for iWildCam and Camelyon17 and achieves SoTA on astronomical time-series tasks (3 very different domains)! arxiv.org/abs/2402.03325
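
The recipe, in broad strokes: pretrain on all the unlabeled data you have with generic augmentations, then fine-tune on labeled source data with targeted augmentations designed to bridge the source-to-target shift. Below is a minimal PyTorch sketch of that two-stage shape; the augmentations, objectives, and dimensions are placeholder assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: the paper (arxiv.org/abs/2402.03325) designs
# these per domain; here they are just noise at two different scales.
def generic_augment(x):   # pretraining-time augmentation
    return x + 0.01 * torch.randn_like(x)

def targeted_augment(x):  # designed to mimic the source->target shift
    return x + 0.1 * torch.randn_like(x)

encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))
head = nn.Linear(32, 2)
unlabeled = torch.randn(256, 16)  # pooled source + target inputs
labeled_x, labeled_y = torch.randn(64, 16), torch.randint(0, 2, (64,))

# Stage 1: self-supervised pretraining on all unlabeled data with generic
# augmentations (a trivial contrastive-style objective for illustration).
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for _ in range(10):
    z1 = encoder(generic_augment(unlabeled))
    z2 = encoder(generic_augment(unlabeled))
    loss = -nn.functional.cosine_similarity(z1, z2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: fine-tune on labeled source data with *targeted* augmentations
# that connect source to target under the pretrained representation.
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
for _ in range(10):
    logits = head(encoder(targeted_augment(labeled_x)))
    loss = nn.functional.cross_entropy(logits, labeled_y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```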

Stephan Xie (@stephofx)

belated but I'm excited to start a PhD at the Machine Learning Department at Carnegie Mellon this fall as an NSF fellow!! I'm incredibly grateful to my mentors and advisors Aaron Roth, Kevin He, and Yi Xing for all their guidance and support along the way 😀

Ruoxi Jia (@ruoxijia)

Thrilled to be in Vienna for our ICLR workshop, Navigating and Addressing Data Problems for Foundation Models. Starting Saturday at 8:50 AM, our program features keynote talks, best paper presentations, a poster session, and a panel discussion. Explore the full schedule here!

Cartesia (@cartesia_ai)

Today, we’re excited to release the first step in our mission to build real-time multimodal intelligence for every device: Sonic, a blazing-fast (🚀 135ms model latency), lifelike generative voice model and API.

Read cartesia.ai/blog/sonic and try Sonic at play.cartesia.ai
Guilherme Penedo (@gui_penedo)

We are (finally) releasing the 🍷 FineWeb technical report!

In it, we detail and explain every processing decision we made, and we also introduce our newest dataset: 📚 FineWeb-Edu, a (web-only) subset of FW filtered for highly educational content.

Link: hf.co/spaces/Hugging…
Sang Michael Xie (@sangmichaelxie)

LMs can learn to design novel quantum experiments with only synthetic data. Problem: given quantum states of a few sizes N=0,1,2, output the corresponding experiments that generate those states for N=0,1,2,3,4,… Key to extrapolation: output Python code that generates experiments for any N!
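
A toy sketch of that extrapolation trick, under my own assumptions (the real setup trains the LM on synthetic state/experiment pairs): instead of emitting one experiment per size, the model emits a Python program, which can be checked by execution on the small sizes given in the prompt and then run at sizes never seen in training. All names below are hypothetical.

```python
# All names are hypothetical; the real system trains on synthetic
# (quantum state, experiment) pairs and prompts the LM for a generator.

def lm_sample_program() -> str:
    # Stand-in for sampling from the language model: it returns Python
    # source for a function mapping a size N to an experiment.
    return (
        "def design_experiment(n):\n"
        "    return ['source'] + [f'beamsplitter({i},{i+1})' for i in range(n)]\n"
    )

def reference_experiment(n):  # ground truth, available only for small N
    return ['source'] + [f'beamsplitter({i},{i+1})' for i in range(n)]

# Verify the sampled program by executing it on the sizes in the prompt...
namespace = {}
exec(lm_sample_program(), namespace)
candidate = namespace["design_experiment"]
assert all(candidate(n) == reference_experiment(n) for n in (0, 1, 2))

# ...then extrapolate: the same code runs at sizes never seen in training.
print(candidate(5))
```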

Tri Dao (@tri_dao)

FlashAttention is widely used to accelerate Transformers, already making attention 4-8x faster, but has yet to take advantage of modern GPUs. We’re releasing FlashAttention-3: 1.5-2x faster on FP16, up to 740 TFLOPS on H100 (75% util), and FP8 gets close to 1.2 PFLOPS!
1/
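
For context, a minimal call sketch assuming the flash-attn package's top-level flash_attn_func interface (FlashAttention-3 ships as a separate Hopper-specific build with a similar entry point); it needs a CUDA GPU and half-precision tensors:

```python
import torch
from flash_attn import flash_attn_func  # pip install flash-attn

# FlashAttention expects (batch, seqlen, nheads, headdim) tensors in
# fp16/bf16 on a CUDA device; the FP8 path in FA3 is Hopper-only.
q = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")

out = flash_attn_func(q, k, v, causal=True)  # fused attention, no O(L^2) memory
print(out.shape)  # torch.Size([2, 1024, 16, 64])
```
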
Fahim Tajwar (@fahimtajwar10)

Interacting with the external world and reacting based on outcomes are crucial capabilities of agentic systems, but existing LLMs’ ability to do so is limited.

Introducing Paprika 🌶️, our work on making LLMs general decision makers that can solve new tasks zero-shot.

🧵 1/n
Percy Liang (@percyliang)

What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
Fahim Tajwar (@fahimtajwar10)

RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground truth answers?

Introducing Self-Rewarding Training (SRT), where language models provide their own reward for RL training!

🧵 1/n
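
One plausible instantiation of self-rewarding, sketched here as an assumption rather than the paper's exact recipe: with no ground truth, reward each sampled completion by whether its parsed answer agrees with the majority vote over samples for the same prompt.

```python
from collections import Counter

def extract_answer(completion: str) -> str:
    # Hypothetical parser: take whatever follows the last "Answer:".
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_rewards(completions: list[str]) -> list[float]:
    # No ground truth: reward a sample 1.0 if its answer matches the
    # majority vote across all samples for the same prompt, else 0.0.
    answers = [extract_answer(c) for c in completions]
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]

samples = ["...so Answer: 42", "...thus Answer: 42", "...maybe Answer: 41"]
print(self_rewards(samples))  # [1.0, 1.0, 0.0] -> feed to any RL trainer
```
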
Sukjun (June) Hwang (@sukjun_hwang)

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data.
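
As a toy illustration of dynamic chunking (only an illustration: H-Net learns its boundary module end to end, while the cosine-distance rule below is a stand-in), one can mark a chunk boundary wherever adjacent byte-level states look dissimilar and pool each chunk for the next level of the hierarchy:

```python
import torch
import torch.nn.functional as F

hidden = torch.randn(1, 12, 8)  # (batch, bytes, dim) byte-level states

# Mark a chunk start wherever adjacent states are dissimilar; position 0
# always starts a chunk. (H-Net learns this module; this rule is a toy.)
sim = F.cosine_similarity(hidden[:, 1:], hidden[:, :-1], dim=-1)  # (1, 11)
starts = torch.cat([torch.ones(1, 1, dtype=torch.bool), (1 - sim) / 2 > 0.5], dim=1)

# Pool each discovered chunk into one vector for the next level up.
chunk_ids = starts.long().cumsum(dim=1) - 1  # chunk index per byte, (1, 12)
num_chunks = int(chunk_ids.max()) + 1
chunks = torch.zeros(num_chunks, 8).index_add_(0, chunk_ids[0], hidden[0])
print(starts[0], chunks.shape)  # boundary mask and (num_chunks, 8)
```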

Sang Michael Xie (@sangmichaelxie)

It’s easy to make VLMs do worse - just add an object that co-occurs rarely in training data. It’s a tight correlation (I don’t often see r=0.97+) for CLIP and VLMs built on CLIP.

They also bias towards “yes” over “no” for highly co-occurring concepts, regardless of the prompt.
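
To make the claimed trend concrete, here is a small sketch of the kind of check involved, with made-up illustrative numbers (not the paper's data): correlate an object's caption co-occurrence count with the VLM's accuracy when that object is added to the scene.

```python
import numpy as np

# Made-up illustrative numbers, NOT the paper's data: pretraining caption
# co-occurrence counts for a distractor object vs. VLM accuracy when the
# distractor is added to the image.
cooccurrence = np.array([5200, 1900, 640, 210, 55, 12])
accuracy = np.array([0.94, 0.90, 0.81, 0.72, 0.60, 0.48])

# Check the strength of the trend on log-counts with Pearson's r.
r = np.corrcoef(np.log(cooccurrence), accuracy)[0, 1]
print(f"pearson r = {r:.2f}")
```
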
Karan Goel (@krandiash)

We've raised $100M from Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA. Today we're introducing Sonic-3 - the state-of-the-art model for realtime conversation.

What makes Sonic-3 great:
- Breakthrough naturalness - laughter and full emotional range
- Lightning fast -

Sang Michael Xie (@sangmichaelxie)

personal update: I’ve joined OpenAI recently, working at the intersection of synthetic data and RL. It’s been fun to shift my focus after working on pretraining for a while and take a broader perspective. So many big strides to be taken! I’m grateful for the chance to work with