Simran Kaur (@kaur_simran25)'s Twitter Profile
Simran Kaur

@kaur_simran25

PhD Student @PrincetonCS. Previously @acmi_lab and undergrad @SCSatCMU.

ID: 1529665454595354624

Joined: 26-05-2022 03:27:34

21 Tweets

262 Followers

318 Following

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack):

Excited to announce that my first published paper (!!) will be a spotlight at the #NeurIPS2022 Higher-Order Optimization workshop on Dec 2nd! Huge thanks to my co-authors Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Lipton, paper thread coming soon! order-up-ml.github.io/papers/

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack):

1/n ‼️ Our spotlight (and now BEST POSTER!) work from the Higher Order Optimization workshop at #NeurIPS2022 is now on arxiv! Paper 📖: arxiv.org/abs/2211.15853 w/ Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Lipton

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack):

Our work on understanding the mechanisms behind implicit regularization in SGD was just accepted to #ICLR2023 ‼️ Huge thanks to my collaborators Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Lipton 🙂 Check out the thread below for more info:

Simran Kaur (@kaur_simran25):

Excited to share our latest work: Skill-Mix, a new take on LLM evaluation that tests a model's ability to combine basic language skills! Check out the Skill-Mix demo here: huggingface.co/spaces/dingliy…
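
Not the official Skill-Mix pipeline, but a minimal Python sketch of the evaluation idea described in the tweet: sample k basic skills and a topic, prompt the model to combine them in a short text, then grade the output. The skill/topic lists and the `generate`/`grade` stubs below are illustrative placeholders, not names from the paper.

```python
# Sketch of a Skill-Mix-style evaluation loop (illustrative, not the released code).
import random

SKILLS = ["metaphor", "red herring", "self-serving bias"]  # placeholder skill names
TOPICS = ["sewing", "gardening", "dueling"]                # placeholder topics
k = 2

def build_skillmix_prompt(skills, topic):
    # Ask for a short text that combines all k skills on the given topic.
    return (
        f"Write a short (2-3 sentence) piece of text about {topic} that "
        f"naturally illustrates all of the following skills: {', '.join(skills)}. "
        "Do not name the skills explicitly."
    )

def generate(prompt: str) -> str:
    # Placeholder for a call to the model under evaluation (assumption).
    return "<model output>"

def grade(text: str, skills, topic) -> dict:
    # Placeholder for a grader-model call that checks each skill appears
    # and the text stays on topic (assumption).
    return {s: None for s in skills}

skills = random.sample(SKILLS, k)
topic = random.choice(TOPICS)
prompt = build_skillmix_prompt(skills, topic)
print(prompt)
print(grade(generate(prompt), skills, topic))
```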

Sadhika Malladi (@sadhikamalladi):

Blog post about how to scale training runs to highly distributed settings (i.e., large batch sizes)! Empirical insights from my long-ago work on stochastic differential equations (SDEs). Written to be accessible - give it a shot! cs.princeton.edu/~smalladi/blog…
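
For context, a minimal sketch of the kind of batch-size scaling rule this line of SDE work motivates: scale the SGD learning rate linearly with batch size, and Adam-style learning rates with the square root of the batch-size ratio. Treat the exact prescription (including momentum and epsilon handling) as an assumption and consult the blog post.

```python
import math

def scale_lr(base_lr: float, base_batch: int, new_batch: int, optimizer: str = "sgd") -> float:
    """Rescale the learning rate when moving from base_batch to new_batch.

    Linear scaling for SGD and square-root scaling for Adam are the rules
    suggested by SDE-based analyses; details vary, so see the blog post.
    """
    kappa = new_batch / base_batch
    if optimizer == "sgd":
        return base_lr * kappa             # linear scaling rule
    if optimizer == "adam":
        return base_lr * math.sqrt(kappa)  # square-root scaling rule
    raise ValueError(f"unknown optimizer: {optimizer}")

# Example: going from batch size 256 to 4096
print(scale_lr(0.1, 256, 4096, "sgd"))    # 1.6
print(scale_lr(3e-4, 256, 4096, "adam"))  # ~1.2e-3
```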

Sanjeev Arora (@prfsanjeevarora):

1/ New instruction-following dataset INSTRUCT-SKILLMIX! Supervised fine-tuning (SFT) with just 2K-4K (query, answer) pairs gives small “base LLMs” Mistral v0.2 7B and LLaMA3 8B performance rivalling some frontier models (AlpacaEval 2.0 score). No RL, no expensive human data.

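A rough sketch, not the Instruct-SkillMix recipe itself, of supervised fine-tuning a base causal LM on a small set of (query, answer) pairs with Hugging Face Transformers. The model name, prompt format, and hyperparameters are illustrative assumptions.

```python
# Minimal SFT sketch on a handful of (query, answer) pairs (illustrative).
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

pairs = [{"query": "Explain overfitting briefly.",
          "answer": "Overfitting is when a model fits noise in the training data..."}]  # ~2K-4K pairs in practice

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder base model; swap in the LM you are tuning
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

def to_features(ex):
    # Assumed prompt template; the actual formatting is dataset-specific.
    text = f"### Query:\n{ex['query']}\n\n### Answer:\n{ex['answer']}{tok.eos_token}"
    return tok(text, truncation=True, max_length=1024)

ds = Dataset.from_list(pairs).map(to_features, remove_columns=["query", "answer"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-5),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # standard causal-LM loss
)
trainer.train()
```
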
Xingyu Zhu (@xingyuzhu_):

Kids use open textbooks for homework. Can LLM training benefit from "helpful textbooks" in context with no gradients computed on these tokens? We call this Context-Enhanced Learning – it can exponentially accelerate training while avoiding verbatim memorization of “textbooks”!

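A minimal sketch of one way to realize "no gradients computed on these tokens" (my reading of the idea, not the paper's code): keep the "textbook" in the context but mask it out of the loss, so only the exercise tokens contribute. The model and text below are placeholders.

```python
# Sketch: compute the LM loss only on the exercise, not the in-context textbook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small placeholder model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

textbook = "Chapter 3: To add fractions, rewrite them with a common denominator. ..."
exercise = "Q: 1/2 + 1/3 = ? A: 5/6"

ctx_ids = tok(textbook + "\n", return_tensors="pt").input_ids
tgt_ids = tok(exercise, return_tensors="pt").input_ids

input_ids = torch.cat([ctx_ids, tgt_ids], dim=1)
labels = input_ids.clone()
labels[:, : ctx_ids.shape[1]] = -100  # mask the textbook tokens out of the loss

loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()  # the loss (and its gradient) comes only from the exercise tokens
```
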
Abhishek Panigrahi (@abhishek_034):

🎉Excited to present 2 papers at #ICLR2025 in Singapore!

🧠 Progressive distillation induces an implicit curriculum
📢 Oral: Sat, 4:30–4:42pm @ Garnet 216–218  
🖼️ Poster: Sat, 10:00am–12:30pm (#632)

⚙️ Efficient stagewise pretraining via progressive subnetworks  
🖼️ Poster: