Simran Kaur (@kaur_simran25)'s Twitter Profile
Simran Kaur

@kaur_simran25

PhD Student @PrincetonCS. Previously @acmi_lab and undergrad @SCSatCMU.

ID: 1529665454595354624

Joined: 26-05-2022 03:27:34

21 Tweets

262 Followers

318 Following

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack):

Excited to announce that my first published paper (!!) will be a spotlight at the #NeurIPS2022 Higher-Order Optimization workshop on Dec 2nd! Huge thanks to my co-authors Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Lipton, paper thread coming soon! order-up-ml.github.io/papers/

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack):

1/n ‼️ Our spotlight (and now BEST POSTER!) work from the Higher Order Optimization workshop at #NeurIPS2022 is now on arxiv! Paper 📖: arxiv.org/abs/2211.15853 w/ Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Lipton

Zachary Novack @ICLR2025 🇸🇬 (@zacknovack):

Our work on understanding the mechanisms behind implicit regularization in SGD was just accepted to #ICLR2023 ‼️ Huge thanks to my collaborators Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Lipton 🙂 Check out the thread below for more info:

Simran Kaur (@kaur_simran25):

Excited to share our latest work: Skill-Mix, a new take on LLM evaluation that tests a model's ability to combine basic language skills! Check out the Skill-Mix demo here: huggingface.co/spaces/dingliy…
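
Not the official Skill-Mix pipeline, but a minimal Python sketch of the evaluation idea described in the tweet: sample k basic skills and a topic, prompt the model to combine them in a short text, then grade the output. The skill/topic lists and the `generate`/`grade` stubs below are illustrative placeholders, not names from the paper.

```python
# Sketch of a Skill-Mix-style evaluation loop (illustrative, not the released code).
import random

SKILLS = ["metaphor", "red herring", "self-serving bias"]  # placeholder skill names
TOPICS = ["sewing", "gardening", "dueling"]                # placeholder topics
k = 2

def build_skillmix_prompt(skills, topic):
    # Ask for a short text that combines all k skills on the given topic.
    return (
        f"Write a short (2-3 sentence) piece of text about {topic} that "
        f"naturally illustrates all of the following skills: {', '.join(skills)}. "
        "Do not name the skills explicitly."
    )

def generate(prompt: str) -> str:
    # Placeholder for a call to the model under evaluation (assumption).
    return "<model output>"

def grade(text: str, skills, topic) -> dict:
    # Placeholder for a grader-model call that checks each skill appears
    # and the text stays on topic (assumption).
    return {s: None for s in skills}

skills = random.sample(SKILLS, k)
topic = random.choice(TOPICS)
prompt = build_skillmix_prompt(skills, topic)
print(prompt)
print(grade(generate(prompt), skills, topic))
```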

Sadhika Malladi (@sadhikamalladi):

Blog post about how to scale training runs to highly distributed settings (i.e., large batch sizes)! Empirical insights from my long-ago work on stochastic differential equations (SDEs). Written to be accessible - give it a shot! cs.princeton.edu/~smalladi/blog…
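
For context, a minimal sketch of the kind of batch-size scaling rule this line of SDE work motivates: scale the SGD learning rate linearly with batch size, and Adam-style learning rates with the square root of the batch-size ratio. Treat the exact prescription (including momentum and epsilon handling) as an assumption and consult the blog post.

```python
import math

def scale_lr(base_lr: float, base_batch: int, new_batch: int, optimizer: str = "sgd") -> float:
    """Rescale the learning rate when moving from base_batch to new_batch.

    Linear scaling for SGD and square-root scaling for Adam are the rules
    suggested by SDE-based analyses; details vary, so see the blog post.
    """
    kappa = new_batch / base_batch
    if optimizer == "sgd":
        return base_lr * kappa             # linear scaling rule
    if optimizer == "adam":
        return base_lr * math.sqrt(kappa)  # square-root scaling rule
    raise ValueError(f"unknown optimizer: {optimizer}")

# Example: going from batch size 256 to 4096
print(scale_lr(0.1, 256, 4096, "sgd"))    # 1.6
print(scale_lr(3e-4, 256, 4096, "adam"))  # ~1.2e-3
```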

Sanjeev Arora (@prfsanjeevarora):

1/ New instruction-following dataset INSTRUCT-SKILLMIX! Supervised fine-tuning (SFT) with just 2K-4K (query, answer) pairs gives small “base LLMs” Mistral v0.2 7B and LLaMA3 8B performance rivalling some frontier models (AlpacaEval 2.0 score). No RL, no expensive human data.

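A rough sketch, not the Instruct-SkillMix recipe itself, of supervised fine-tuning a base causal LM on a small set of (query, answer) pairs with Hugging Face Transformers. The model name, prompt format, and hyperparameters are illustrative assumptions.

```python
# Minimal SFT sketch on a handful of (query, answer) pairs (illustrative).
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

pairs = [{"query": "Explain overfitting briefly.",
          "answer": "Overfitting is when a model fits noise in the training data..."}]  # ~2K-4K pairs in practice

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder base model; swap in the LM you are tuning
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

def to_features(ex):
    # Assumed prompt template; the actual formatting is dataset-specific.
    text = f"### Query:\n{ex['query']}\n\n### Answer:\n{ex['answer']}{tok.eos_token}"
    return tok(text, truncation=True, max_length=1024)

ds = Dataset.from_list(pairs).map(to_features, remove_columns=["query", "answer"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-5),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # standard causal-LM loss
)
trainer.train()
```
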
Xingyu Zhu (@xingyuzhu_):

Kids use open textbooks for homework. Can LLM training benefit from "helpful textbooks" in context with no gradients computed on these tokens? We call this Context-Enhanced Learning – it can exponentially accelerate training while avoiding verbatim memorization of “textbooks”!

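A minimal sketch of one way to realize "no gradients computed on these tokens" (my reading of the idea, not the paper's code): keep the "textbook" in the context but mask it out of the loss, so only the exercise tokens contribute. The model and text below are placeholders.

```python
# Sketch: compute the LM loss only on the exercise, not the in-context textbook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small placeholder model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

textbook = "Chapter 3: To add fractions, rewrite them with a common denominator. ..."
exercise = "Q: 1/2 + 1/3 = ? A: 5/6"

ctx_ids = tok(textbook + "\n", return_tensors="pt").input_ids
tgt_ids = tok(exercise, return_tensors="pt").input_ids

input_ids = torch.cat([ctx_ids, tgt_ids], dim=1)
labels = input_ids.clone()
labels[:, : ctx_ids.shape[1]] = -100  # mask the textbook tokens out of the loss

loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()  # the loss (and its gradient) comes only from the exercise tokens
```
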
Abhishek Panigrahi (@abhishek_034):

🎉Excited to present 2 papers at #ICLR2025 in Singapore!

🧠 Progressive distillation induces an implicit curriculum
📢 Oral: Sat, 4:30–4:42pm @ Garnet 216–218  
🖼️ Poster: Sat, 10:00am–12:30pm (#632)

⚙️ Efficient stagewise pretraining via progressive subnetworks  
🖼️ Poster: