Shruti Singh @ ACL 2024 (@shruti_rsingh) 's Twitter Profile
Shruti Singh @ ACL 2024

@shruti_rsingh

Representation Learning for Scientific Literature | #NLProc | Fulbright fellow @yale | CS Ph.D. Student @iitgn | Past @daiictofficial

ID: 2722176900

Link: http://shruti-singh.github.io | Joined: 10-08-2014 18:26:56

194 Tweets

247 Followers

1.1K Following

Tengyu Ma (@tengyuma) 's Twitter Profile Photo

Adam, a 9-yr old optimizer, is the go-to for training LLMs (eg, GPT-3, OPT, LLAMA).

Introducing Sophia, a new optimizer that is 2x faster than Adam on LLMs. Just a few more lines of code could cut your costs from $2M to $1M (if scaling laws hold).

arxiv.org/abs/2305.14342 🧵⬇️
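The core idea behind the tweet's claim can be sketched roughly: Sophia preconditions the momentum by a cheap diagonal Hessian estimate and clips the resulting ratio, so flat directions cannot produce huge steps. This is a hedged illustration of that clipped update, not the paper's reference implementation; `gamma`, `eps`, and the unit clip bound are illustrative, and the Hessian EMA `h` is assumed to be maintained elsewhere.

```python
# Illustrative sketch of a Sophia-like clipped second-order update.
# m: EMA of gradients; h: EMA of a diagonal Hessian estimate (assumed given).

def sophia_like_step(theta, m, h, lr=1e-4, gamma=0.01, eps=1e-12):
    """One elementwise update: divide momentum by the (scaled) Hessian
    estimate, then clip the ratio to [-1, 1] before taking the step."""
    new_theta = []
    for p, m_i, h_i in zip(theta, m, h):
        ratio = m_i / max(gamma * h_i, eps)
        clipped = max(-1.0, min(1.0, ratio))
        new_theta.append(p - lr * clipped)
    return new_theta
```

Note how the clip makes the step size bounded by `lr` even when the curvature estimate is near zero, which is where plain second-order methods blow up.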
Eden Marco (@edenemarco177) 's Twitter Profile Photo

1/13 🧵💡 Ever wondered how to handle the token limitations of LLMs? Here's one strategy: the "map-reduce" technique implemented in LangChain 🦜🔗 Let's deep-dive! Harrison Chase, your PR is under review again 😎
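The map-reduce pattern the thread refers to can be sketched in a few lines: split the document into chunks that fit the context window, summarize each chunk (map), then summarize the concatenated partial summaries (reduce). This is a generic illustration, not LangChain's actual chain; `summarize` is a stand-in stub for a real LLM call.

```python
# Minimal map-reduce summarization sketch for texts exceeding a context limit.

def chunk(text, max_chars):
    """Split text into fixed-size character chunks (a real system would
    split on tokens and respect sentence boundaries)."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def map_reduce_summarize(text, summarize, max_chars=1000):
    pieces = chunk(text, max_chars)
    partial = [summarize(p) for p in pieces]      # map step
    combined = " ".join(partial)
    if len(combined) > max_chars:                 # recurse if still too long
        return map_reduce_summarize(combined, summarize, max_chars)
    return summarize(combined)                    # reduce step
```

The recursion handles the case where even the combined partial summaries exceed the limit, which is exactly the situation that makes naive single-call summarization fail.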

Gowthami Somepalli (@gowthami_s) 's Twitter Profile Photo

A 5-day event on LLMs. The lineup looks interesting. (All talks are streamed live on YouTube)

simons.berkeley.edu/workshops/larg…

#machinelearning
Yixin Liu (@yixinliu17) 's Twitter Profile Photo

Excited to share our work on "Benchmarking Generation and Evaluation Capabilities of LLMs for Instruction Controllable Summarization"! As LLMs excel in generic summarization, we must explore more complex task settings.🧵
arxiv.org/abs/2311.09184
Equal contribution with Alex Fabbri
NeurIPS Conference (@neuripsconf) 's Twitter Profile Photo

**Test of Time**
Distributed Representations of Words and Phrases and their Compositionality

**Outstanding Main Track Papers**
Privacy Auditing with One (1) Training Run
Are Emergent Abilities of Large Language Models a Mirage?

Mayank (@mayank_iitgn) 's Twitter Profile Photo

#GandhiPedia is getting inaugurated today. A great moment for us to celebrate the technology and the collaborative efforts of a diverse set of experts. A huge shout-out to the IIT Kharagpur🇮🇳, IIT Gandhinagar, and National Council of Science Museums (NCSM) teams.

Cognitive Sciences @ IITGN (@cogsiitgn) 's Twitter Profile Photo

Attention, cognitive science and neuroscience enthusiasts! Deadline extension alert! The application deadline for IITGN’s MSc programme in Cognitive Science has been extended to Jan 24. Don’t miss out on this opportunity and apply now!
Click here: lnkd.in/ddjA2QUK
Nipun Batra (@nipun_batra) 's Twitter Profile Photo

Air pollution is a major problem in India, potentially reducing life expectancy by several years. Come help us! We have a couple of JRF openings in our lab, CSE @ IIT Gandhinagar:
1. Air quality exposure estimation and forecasting: drive.google.com/file/d/16P-Y8D…
2. Active

Pulkit Agrawal (@pulkitology) 's Twitter Profile Photo

Presenting a method for training models from SCRATCH using LoRA: 
💡20x reduction in communication 
💡3x savings in memory

- Find out more: minyoungg.github.io/LTE/ 
- Code available to try out
- Scaling to larger models ongoing
- arxiv.org/pdf/2402.16828…

led by Jacob Huh!
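The memory savings above come from the LoRA parameterization itself, which can be sketched generically: the effective weight is W + (alpha/r)·B·A, where only the small matrices A (r × d_in) and B (d_out × r) are trained and B starts at zero, so the layer initially equals the frozen W. This is a generic LoRA illustration, not the LTE method from the paper; the scaling convention is the commonly used one.

```python
# Generic LoRA weight composition: W_eff = W + (alpha / r) * B @ A.
# Pure-Python matmul so the sketch is self-contained.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_weight(W, A, B, alpha=1.0):
    r = len(A)                # adapter rank = number of rows of A
    scale = alpha / r
    delta = matmul(B, A)      # (d_out x r) @ (r x d_in) -> low-rank update
    return [[w + scale * d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, delta)]
```

With B initialized to zeros the update is exactly zero, which is what lets training start from the pretrained (or here, randomly initialized) base weights without perturbation.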
Leshem Choshen C U @ ICLR 🤖🤗 (@lchoshen) 's Twitter Profile Photo

How to make LMs learn an abstract language representation from learning 2 languages on the same concept?
A shared token helps, sharing output language helps more

arxiv.org/abs/2404.12444

Tianze Hua, Tian Yun,
Brown NLP (no, I won't put it at the top, hype-seekers)
Sedrick Keh (@sedrickkeh2) 's Twitter Profile Photo

📢 Releasing TRI's open-source Mamba-7B trained on 1.2T tokens of RefinedWeb! Mamba-7B is the largest fully recurrent Mamba model trained and is a state-of-the-art recurrent LLM. 🚀🚀🚀 huggingface.co/TRI-ML/mamba-7…

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Meta presents Better & Faster Large Language Models via Multi-token Prediction

- training language models to predict multiple future tokens at once results in higher sample efficiency
- up to 3x faster at inference

arxiv.org/abs/2404.19737
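The training-target change behind multi-token prediction can be sketched simply: instead of one next-token label per position, each position is paired with the next k tokens, which separate heads then predict from a shared trunk. The model architecture is the paper's; this only illustrates the (assumed) target construction.

```python
# Sketch of multi-token prediction targets: each position t gets the
# labels tokens[t+1 .. t+k] rather than just tokens[t+1].

def multi_token_targets(tokens, k):
    """Return (position, [next k tokens]) pairs for every position that
    has k full future tokens available."""
    return [(t, tokens[t + 1: t + 1 + k])
            for t in range(len(tokens) - k)]
```

The speedup claim at inference comes from using the extra heads for self-speculative decoding, which this target layout makes possible.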
Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

Meta presents Iterative Reasoning Preference Optimization

Increasing accuracy for Llama-2-70B-Chat: 
- 55.6% -> 81.6% on GSM8K
- 12.5% -> 20.8% on MATH
- 77.8% -> 86.7% on ARC-Challenge

arxiv.org/abs/2404.19733
Yi Tay (@yitayml) 's Twitter Profile Photo

New paper from Reka 🔥 (yes, an actual paper).

This time we're releasing part of our internal evals, which we call Vibe-Eval 😃 This comprises a hard set that is, imo, pretty challenging for frontier models today.

The fun part here is that we constructed it by trying to
Neeldhara 🐦|🐘 (@neeldhara) 's Twitter Profile Photo

📣 Please make note of the #FSTTCS2024 call for papers: three weeks to the abstract submission deadline! The PC chairs have already lined up a phenomenal set of invited talks. 🤩

Do help us spread the word and plan to submit and/or attend. Links in the next tweet.

(1/2)
Kejian Shi (@shi_kejian) 's Twitter Profile Photo

Introducing SciRIFF, a toolkit to enhance LLM instruction-following over scientific literature. 137k expert demonstrations in 5 categories: IE, summarization, QA, entailment, and classification; models up to 70b and code to science-tune your checkpoints included! Read more in 🧵:

Thomas Wolf (@thom_wolf) 's Twitter Profile Photo

It’s Sunday morning and we have some time with our coffee, so let me tell you about our recent surprising journey in synthetic data and small language models.

This post is prompted by the coming release of an instant, in-browser model called SmolLM360 (link at the end)

The
Jimmy Lin (@lintool) 's Twitter Profile Photo

For vector search, practitioners kinda know that for small corpora you shouldn't bother with HNSW indexing — just brute-force it. However, guidance is mostly hand-wavy... until now. I ran some experiments for you on BEIR and wrote it up. arxiv.org/abs/2409.06464 You're welcome.
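The brute-force baseline the tweet recommends for small corpora is just exact top-k by cosine similarity over every vector, with no ANN index at all. A pure-Python sketch (a real system would use numpy or a FAISS flat index):

```python
# Exact (brute-force) top-k nearest-neighbor search by cosine similarity.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def brute_force_topk(query, corpus, k=1):
    """Score every corpus vector against the query and return the
    indices of the k most similar ones."""
    scored = sorted(enumerate(corpus),
                    key=lambda iv: cosine(query, iv[1]),
                    reverse=True)
    return [i for i, _ in scored[:k]]
```

For a corpus of a few hundred thousand vectors this exact scan is often fast enough that the build cost and approximation error of an HNSW index buy you nothing — which is the hand-wavy intuition the write-up quantifies.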

Mayank (@mayank_iitgn) 's Twitter Profile Photo

🌟 Join the Lingo lab at IITGN as a PhD student! 🌟 Dive into cutting-edge NLP research, including multilingual & multimodal LLMs and SLMs. 💻 Access to SOTA computing & GPUs 🔥 Strong funding & industry connections 📈 Proven publication record 🚀 Shape impactful products with us! #NLP #LLM