Shruti Singh @ ACL 2024 (@shruti_rsingh) 's Twitter Profile
Shruti Singh @ ACL 2024

@shruti_rsingh

Representation Learning for Scientific Literature | #NLProc | Fulbright fellow @yale | CS Ph.D. Student @iitgn | Past @daiictofficial

ID: 2722176900

Link: http://shruti-singh.github.io | Joined: 10-08-2014 18:26:56

194 Tweets

247 Followers

1.1K Following

Tengyu Ma (@tengyuma) 's Twitter Profile Photo

Adam, a 9-yr old optimizer, is the go-to for training LLMs (eg, GPT-3, OPT, LLAMA).

Introducing Sophia, a new optimizer that is 2x faster than Adam on LLMs. Just a few more lines of code could cut your costs from $2M to $1M (if scaling laws hold).

arxiv.org/abs/2305.14342 🧵⬇️
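The core idea behind the tweet's claim can be sketched roughly: Sophia preconditions the momentum by a cheap diagonal Hessian estimate and clips the resulting ratio, so flat directions cannot produce huge steps. This is a hedged illustration of that clipped update, not the paper's reference implementation; `gamma`, `eps`, and the unit clip bound are illustrative, and the Hessian EMA `h` is assumed to be maintained elsewhere.

```python
# Illustrative sketch of a Sophia-like clipped second-order update.
# m: EMA of gradients; h: EMA of a diagonal Hessian estimate (assumed given).

def sophia_like_step(theta, m, h, lr=1e-4, gamma=0.01, eps=1e-12):
    """One elementwise update: divide momentum by the (scaled) Hessian
    estimate, then clip the ratio to [-1, 1] before taking the step."""
    new_theta = []
    for p, m_i, h_i in zip(theta, m, h):
        ratio = m_i / max(gamma * h_i, eps)
        clipped = max(-1.0, min(1.0, ratio))
        new_theta.append(p - lr * clipped)
    return new_theta
```

Note how the clip makes the step size bounded by `lr` even when the curvature estimate is near zero, which is where plain second-order methods blow up.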
Eden Marco (@edenemarco177) 's Twitter Profile Photo

1/13 🧵💡 Ever wondered how to handle the token limitations of LLMs? Here's one strategy: the "map-reduce" technique implemented in LangChain 🦜🔗 Let's deep-dive! Harrison Chase, your PR is under review again 😎
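The map-reduce pattern the thread refers to can be sketched in a few lines: split the document into chunks that fit the context window, summarize each chunk (map), then summarize the concatenated partial summaries (reduce). This is a generic illustration, not LangChain's actual chain; `summarize` is a stand-in stub for a real LLM call.

```python
# Minimal map-reduce summarization sketch for texts exceeding a context limit.

def chunk(text, max_chars):
    """Split text into fixed-size character chunks (a real system would
    split on tokens and respect sentence boundaries)."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def map_reduce_summarize(text, summarize, max_chars=1000):
    pieces = chunk(text, max_chars)
    partial = [summarize(p) for p in pieces]      # map step
    combined = " ".join(partial)
    if len(combined) > max_chars:                 # recurse if still too long
        return map_reduce_summarize(combined, summarize, max_chars)
    return summarize(combined)                    # reduce step
```

The recursion handles the case where even the combined partial summaries exceed the limit, which is exactly the situation that makes naive single-call summarization fail.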

Gowthami Somepalli (@gowthami_s) 's Twitter Profile Photo

A 5-day event on LLMs. The lineup looks interesting. (All talks are streamed live on YouTube)

simons.berkeley.edu/workshops/larg…

#machinelearning
Yixin Liu (@yixinliu17) 's Twitter Profile Photo

Excited to share our work on "Benchmarking Generation and Evaluation Capabilities of LLMs for Instruction Controllable Summarization"! As LLMs excel in generic summarization, we must explore more complex task settings.🧵
arxiv.org/abs/2311.09184
Equal contribution with Alex Fabbri
NeurIPS Conference (@neuripsconf) 's Twitter Profile Photo

**Test of Time**
Distributed Representations of Words and Phrases and their Compositionality

**Outstanding Main Track Papers**
Privacy Auditing with One (1) Training Run
Are Emergent Abilities of Large Language Models a Mirage?

Mayank (@mayank_iitgn) 's Twitter Profile Photo

#GandhiPedia is getting inaugurated today. A great moment for us to celebrate the technology and the collaborative efforts of a diverse set of experts. A huge shout-out to the IIT Kharagpur🇮🇳, IIT Gandhinagar, and National Council of Science Museums (NCSM) teams.

Cognitive Sciences @ IITGN (@cogsiitgn) 's Twitter Profile Photo

Attention, cognitive science and neuroscience enthusiasts! Deadline extension alert! The application deadline for IITGN’s MSc programme in Cognitive Science has been extended to Jan 24. Don’t miss out on this opportunity and apply now!
Click here: lnkd.in/ddjA2QUK
Nipun Batra (@nipun_batra) 's Twitter Profile Photo

Air pollution is a major problem in India, potentially reducing life expectancy by several years. Come help us! We have a couple of JRF openings in our lab, CSE @ IIT Gandhinagar:
1. Air quality exposure estimation and forecasting: drive.google.com/file/d/16P-Y8D…
2. Active

Pulkit Agrawal (@pulkitology) 's Twitter Profile Photo

Presenting a method for training models from SCRATCH using LoRA: 
💡20x reduction in communication 
💡3x savings in memory

- Find out more: minyoungg.github.io/LTE/ 
- Code available to try out
- Scaling to larger models ongoing
- arxiv.org/pdf/2402.16828…

led by Jacob Huh!
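The memory savings above come from the LoRA parameterization itself, which can be sketched generically: the effective weight is W + (alpha/r)·B·A, where only the small matrices A (r × d_in) and B (d_out × r) are trained and B starts at zero, so the layer initially equals the frozen W. This is a generic LoRA illustration, not the LTE method from the paper; the scaling convention is the commonly used one.

```python
# Generic LoRA weight composition: W_eff = W + (alpha / r) * B @ A.
# Pure-Python matmul so the sketch is self-contained.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_weight(W, A, B, alpha=1.0):
    r = len(A)                # adapter rank = number of rows of A
    scale = alpha / r
    delta = matmul(B, A)      # (d_out x r) @ (r x d_in) -> low-rank update
    return [[w + scale * d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, delta)]
```

With B initialized to zeros the update is exactly zero, which is what lets training start from the pretrained (or here, randomly initialized) base weights without perturbation.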
Leshem Choshen C U @ ICLR 🤖🤗 (@lchoshen) 's Twitter Profile Photo

How to make LMs learn an abstract language representation from learning 2 languages on the same concept?
A shared token helps, sharing output language helps more

arxiv.org/abs/2404.12444

Tianze Hua, Tian Yun,
Brown NLP (no, I won't put it at the top, hype-seekers)
Sedrick Keh (@sedrickkeh2) 's Twitter Profile Photo

📢 Releasing TRI's open-source Mamba-7B trained on 1.2T tokens of RefinedWeb! Mamba-7B is the largest fully recurrent Mamba model trained and is a state-of-the-art recurrent LLM. 🚀🚀🚀 huggingface.co/TRI-ML/mamba-7…

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Meta presents Better & Faster Large Language Models via Multi-token Prediction

- training language models to predict multiple future tokens at once results in higher sample efficiency
- up to 3x faster at inference

arxiv.org/abs/2404.19737
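The training-target change behind multi-token prediction can be sketched simply: instead of one next-token label per position, each position is paired with the next k tokens, which separate heads then predict from a shared trunk. The model architecture is the paper's; this only illustrates the (assumed) target construction.

```python
# Sketch of multi-token prediction targets: each position t gets the
# labels tokens[t+1 .. t+k] rather than just tokens[t+1].

def multi_token_targets(tokens, k):
    """Return (position, [next k tokens]) pairs for every position that
    has k full future tokens available."""
    return [(t, tokens[t + 1: t + 1 + k])
            for t in range(len(tokens) - k)]
```

The speedup claim at inference comes from using the extra heads for self-speculative decoding, which this target layout makes possible.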
Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Meta presents Iterative Reasoning Preference Optimization

Increasing accuracy for Llama-2-70B-Chat: 
- 55.6% -> 81.6% on GSM8K
- 12.5% -> 20.8% on MATH
- 77.8% -> 86.7% on ARC-Challenge

arxiv.org/abs/2404.19733
Yi Tay (@yitayml) 's Twitter Profile Photo

New paper from Reka 🔥 (yes, an actual paper).

This time we're releasing part of our internal evals, which we call Vibe-Eval 😃 This comprises a hard set that is, imo, pretty challenging for frontier models today.

The fun part here is that we constructed it by trying to
Neeldhara 🐦|🐘 (@neeldhara) 's Twitter Profile Photo

📣 Please make note of the #FSTTCS2024 call for papers: three weeks to the abstract submission deadline! The PC chairs have already lined up a phenomenal set of invited talks. 🤩

Do help us spread the word and plan to submit and/or attend. Links in the next tweet.

(1/2)
Kejian Shi (@shi_kejian) 's Twitter Profile Photo

Introducing SciRIFF, a toolkit to enhance LLM instruction-following over scientific literature. 137k expert demonstrations in 5 categories: IE, summarization, QA, entailment, and classification; models up to 70b and code to science-tune your checkpoints included! Read more in 🧵:

Thomas Wolf (@thom_wolf) 's Twitter Profile Photo

It’s Sunday morning and we have some time with our coffee, so let me tell you about our recent surprising journey in synthetic data and small language models.

This post is prompted by the coming release of an instant, in-browser model called SmolLM360 (link at the end)

The
Jimmy Lin (@lintool) 's Twitter Profile Photo

For vector search, practitioners kinda know that for small corpora you shouldn't bother with HNSW indexing — just brute-force it. However, guidance is mostly hand-wavy... until now. I ran some experiments for you on BEIR and wrote it up. arxiv.org/abs/2409.06464 You're welcome.
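The brute-force baseline the tweet recommends for small corpora is just exact top-k by cosine similarity over every vector, with no ANN index at all. A pure-Python sketch (a real system would use numpy or a FAISS flat index):

```python
# Exact (brute-force) top-k nearest-neighbor search by cosine similarity.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def brute_force_topk(query, corpus, k=1):
    """Score every corpus vector against the query and return the
    indices of the k most similar ones."""
    scored = sorted(enumerate(corpus),
                    key=lambda iv: cosine(query, iv[1]),
                    reverse=True)
    return [i for i, _ in scored[:k]]
```

For a corpus of a few hundred thousand vectors this exact scan is often fast enough that the build cost and approximation error of an HNSW index buy you nothing — which is the hand-wavy intuition the write-up quantifies.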

Mayank (@mayank_iitgn) 's Twitter Profile Photo

🌟 Join the Lingo lab at IITGN as a PhD student! 🌟 Dive into cutting-edge NLP research, including multilingual & multimodal LLMs and SLMs. 💻 Access to SOTA computing & GPUs 🔥 Strong funding & industry connections 📈 Proven publication record 🚀 Shape impactful products with us! #NLP #LLM