Shoaib Ahmed Siddiqui (@shoaibasiddiqui) 's Twitter Profile
Shoaib Ahmed Siddiqui

@shoaibasiddiqui

PhD student @CambridgeMLG | Ex-intern @MSR @NVIDIA @DFKI | Primarily interested in SSL, LLMs, data auditing, and empirical theory of deep learning

ID: 3124646343

Link: http://shoaibahmed.github.io · Joined: 28-03-2015 20:01:10

134 Tweets

690 Followers

4.4K Following

Pavlo Molchanov (@pavlomolchanov) 's Twitter Profile Photo

🚀 Exciting findings from our recent work on depth pruning in LLMs! 1️⃣ MMLU isn't fully indicative; reasoning tasks like GSM8k drop sharply. 2️⃣ Attention layers can be pruned more than MLP layers with less impact. 3️⃣ Output target loss compression (Shapley metric) performs best.
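The layer-importance idea behind depth pruning can be illustrated with a toy sketch: score each layer by how much the model's output shifts when that layer is skipped, then drop the lowest-impact layers. The layer functions and the absolute-difference score below are illustrative stand-ins, not the paper's actual Shapley-based metric.

```python
def run(layers, x, skip=None):
    """Apply each layer in sequence, optionally skipping one index."""
    for i, layer in enumerate(layers):
        if i != skip:
            x = layer(x)
    return x

def layer_importance(layers, x):
    """Score each layer by the output change caused by removing it."""
    baseline = run(layers, x)
    return [abs(run(layers, x, skip=i) - baseline) for i in range(len(layers))]

# Toy "model": three scalar layers; the middle one barely changes the input.
layers = [lambda v: v + 1.0, lambda v: v * 1.001, lambda v: v + 5.0]
scores = layer_importance(layers, x=1.0)

# The lowest-scoring layer is the cheapest candidate to prune.
least_important = min(range(len(scores)), key=scores.__getitem__)
```

In a real LLM the "layers" would be transformer blocks and the score would be measured on a calibration loss over held-out tokens, which is where the tweet's observation comes in: MMLU alone can miss the sharp drops that pruning causes on reasoning tasks like GSM8k.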

Pavlo Molchanov (@pavlomolchanov) 's Twitter Profile Photo

🚀 We've pruned LLaMa3.1 down to 4B parameters, delivering a smaller and more efficient model! Based on our recent paper: arxiv.org/abs/2407.14679

📖 Learn all about it in our blog: developer.nvidia.com/blog/how-to-pr… 
🔗 META's announcement: ai.meta.com/blog/nvidia-ll…
👐 Checkpoints at HF this
Pavlo Molchanov (@pavlomolchanov) 's Twitter Profile Photo

🌟 The best 8B Base model via pruning and distillation!

🚀 Introducing Mistral-NeMo-Minitron-8B-Base model we derived from the recent Mistral-NeMo-12B.
Our recipe: finetune teacher on 100B tokens, prune to 8B params, run teacher-student distillation on <400B tokens.
Result: the
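The distillation step in the recipe above is typically a KL-divergence loss between temperature-softened teacher and student output distributions. Here is a minimal self-contained sketch of that standard loss; the temperature value and scalar logits are illustrative, not the actual Minitron training configuration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T produces a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions, scaled by T^2
    as in standard knowledge distillation (Hinton et al.)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl

# A student matching the teacher incurs zero loss; a divergent one does not.
loss_match = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
loss_diverge = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
```

In practice this loss is computed per token over the vocabulary and often mixed with the ordinary cross-entropy on ground-truth tokens.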
Kamyar Azizzadenesheli (@azizzadenesheli) 's Twitter Profile Photo

AI+Weather/Climate

A thorough study of problem design in AI+Weather/Climate.

As the field is new, there is an urgent need to establish the importance of its design components: what matters, by how much, and at what cost. That is the question our study aims to address.

We study
Cambridge MLG (@cambridgemlg) 's Twitter Profile Photo

✨Applications are now open for PhDs at the Cambridge Machine Learning Group!✨ We're looking for outstanding candidates interested in fundamental ML research and applications to scientific domains! More info: mlg.eng.cam.ac.uk/phd_programme_… 🧵Find more about PIs & focus areas below!

Ali Shahin Shamsabadi (@alishahinshams1) 's Twitter Profile Photo

Intern position at Brave: brave.com/careers/ My team is looking for strong students interested in private, secure, and trustworthy ML. Feel free to email me with the subject line "Brave Internship 2025", highlighting your 3 most significant publications on these topics.

Max Nadeau (@maxnadeau_) 's Twitter Profile Photo

🧵 Announcing Open Philanthropy's Technical AI Safety RFP! We're seeking proposals across 21 research areas to help make AI systems more trustworthy, rule-following, and aligned, even as they become more capable.

David Krueger (@davidskrueger) 's Twitter Profile Photo

POSTER 1) Protecting against simultaneous data poisoning attacks. Neel Alex, Shoaib Ahmed Siddiqui, et al.

We introduce a more realistic setting: training data is poisoned in multiple ways.
Existing methods fail, but our defense based on training dynamics works
arxiv.org/abs/2408.13221