Harpreet Singh (@harpreetmann24) 's Twitter Profile
Harpreet Singh

@harpreetmann24

ID: 1414330830575521794

Joined: 11-07-2021 21:08:48

52 Tweets

4 Followers

761 Following

Vaibhav Adlakha (@vaibhav_adlakha) 's Twitter Profile Photo


We introduce LLM2Vec, a simple approach to transform any decoder-only LLM into a text encoder. We achieve SOTA performance on MTEB in the unsupervised and supervised category (among the models trained only on publicly available data). 🧵1/N

Paper: arxiv.org/abs/2404.05961
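A minimal sketch of the pooling step an approach like this ends in — not the LLM2Vec training recipe itself: per-token hidden states from the decoder (stubbed here as plain lists) are averaged into one sentence embedding, skipping padding positions.

```python
# Hedged sketch: turning per-token hidden states from a decoder LLM into a
# single text embedding via masked mean pooling. The model call is stubbed;
# real use would take the Transformer's last-layer hidden states.

def mean_pool(hidden_states, attention_mask):
    """Average token vectors, ignoring padding positions (mask == 0)."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(hidden_states, attention_mask):
        if m:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / count for t in total]

# Toy example: 3 tokens (the last one is padding), hidden size 2.
states = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(states, mask))  # -> [2.0, 3.0]
```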
Philipp Schmid (@_philschmid) 's Twitter Profile Photo


New open model from Mistral AI! 🧠 Yesterday night, Mistral released Mixtral 8x22B, a 176B MoE, via magnet link. 🔗🤯

What we know so far:
🧮 176B MoE with ~40B active
📜 context length of 65k tokens.
🪨 Base model can be fine-tuned
👀 ~260GB VRAM in fp16, 73GB in int4
📜 Apache
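A hedged back-of-envelope for the "total vs. active" distinction in the bullets above: with top-2 routing over 8 experts, each token only runs 2 expert FFNs plus the shared layers. The 128B/8B split below is an illustrative assumption, not Mistral's published architecture.

```python
# Hedged sketch of MoE active-parameter counting: total parameters sit in
# n_experts expert FFNs plus shared layers (attention, embeddings), but each
# token only touches top_k experts. Numbers are illustrative assumptions.

def active_params(total_expert_params, n_experts, top_k, shared_params):
    """Parameters touched per token = shared layers + top_k of the experts."""
    per_expert = total_expert_params / n_experts
    return shared_params + top_k * per_expert

# Assume 128B of weights live in the 8 experts and 8B are shared.
# With top-2 routing, only ~40B parameters are active per token:
print(active_params(128e9, 8, 2, 8e9) / 1e9)  # -> 40.0
```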
Philipp Schmid (@_philschmid) 's Twitter Profile Photo


We can do it! 🙌 First open LLM outperforms OpenAI GPT-4 (March) on MT-Bench. WizardLM 2 is a fine-tuned and preference-trained Mixtral 8x22B! 🤯

TL;DR;
🧮 Mixtral 8x22B based (141B-A40 MoE)
🔓 Apache 2.0 license
🤖 First > 9.00 on MT-Bench with an open LLM
🧬 Used multi-step
Ai2 (@allen_ai) 's Twitter Profile Photo

Announcing our latest addition to the OLMo family, OLMo 1.7!🎉Our team's efforts to improve data quality, training procedures and model architecture have led to a leap in performance. See how OLMo 1.7 stacks up against its peers and peek into the technical details on the blog:

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Introducing Meta Llama 3: the most capable openly available LLM to date. Today we’re releasing 8B & 70B models that deliver on new capabilities such as improved reasoning and set a new state-of-the-art for models of their sizes. Today's release includes the first two Llama 3

Thomas Wolf (@thom_wolf) 's Twitter Profile Photo

Llama 3 was trained on 15 trillion tokens of public data. But where can you find such datasets and recipes? Here comes the first release of 🍷FineWeb, a high-quality, large-scale filtered web dataset outperforming all current datasets of its scale. We trained 200+ ablation

Sebastian Raschka (@rasbt) 's Twitter Profile Photo

"... do SSMs truly have an advantage (over transformers) in expressive power for state tracking? Surprisingly, the answer is no ... Thus, despite its recurrent formulation, the 'state' in an SSM is an illusion" 🎤✋🔥 arxiv.org/abs/2404.08819

Philipp Schmid (@_philschmid) 's Twitter Profile Photo

Easily fine-tune AI at Meta Llama 3 70B! 🦙 I am excited to share a new guide on how to fine-tune Llama 3 70B with PyTorch FSDP, Q-LoRA, and Flash Attention 2 (SDPA) using Hugging Face tooling, built for consumer-size GPUs (4x 24GB). 🚀 Blog: philschmid.de/fsdp-qlora-lla… The blog covers: 👨‍💻
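Why Q-LoRA makes a 70B model trainable on consumer GPUs comes down to parameter-count arithmetic: instead of updating a full d x k weight matrix, LoRA trains a rank-r update B @ A. The 8192x8192 shape and rank 16 below are illustrative assumptions, not the guide's exact config.

```python
# Hedged sketch of LoRA's parameter savings for a single projection matrix.
# Full fine-tuning updates d * k weights; LoRA trains A (r x k) and B (d x r),
# i.e. r * (d + k) weights, and the base matrix stays frozen (and quantized,
# in Q-LoRA's case).

def lora_trainable_params(d, k, r):
    """Trainable params for one LoRA adapter of rank r on a d x k matrix."""
    return r * (d + k)

d = k = 8192                               # assumed hidden size
full = d * k                               # 67,108,864 params to update fully
lora = lora_trainable_params(d, k, r=16)   # 262,144 params at rank 16
print(full // lora)  # -> 256: a 256x reduction for this one matrix
```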

Sebastian Raschka (@rasbt) 's Twitter Profile Photo


Just learned that the RedPajama-V2 pretraining dataset is actually 30T tokens. 2x the size used for Llama 3 🤯
github.com/togethercomput…
Sebastian Raschka (@rasbt) 's Twitter Profile Photo

Phi-3 has "only" been trained on 5x fewer tokens than Llama 3 (3.3 trillion instead of 15 trillion) Phi-3-mini less has "only" 3.8 billion parameters, less than half the size of Llama 3 8B. Despite being small enough to be deployed on a phone (according to the technical

Extended Brain (@extended_brain) 's Twitter Profile Photo

arxiv.org/abs/2404.16811 "our study presents information-intensive (IN2) training, a purely data-driven solution to overcome lost-in-the-middle."

Eugene Yan (@eugeneyan) 's Twitter Profile Photo


Here's an engaging intro to evals by Midnight Maniac Sri (@sridatta) and Wil Chung (@iamwil). They've clearly put a lot of care and effort into it: the content is well organized, with plenty of illustrations throughout.

Across 60 pages, they explain model vs. system evals, vibe checks and property-based
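One of the ideas mentioned, property-based evals, can be sketched with a stubbed model: instead of comparing outputs to golden answers, assert a property that should hold for any input. The summarizer stub and length property below are hypothetical examples, not taken from the guide.

```python
# Hedged sketch of a property-based eval: no golden answers, just an
# invariant that should hold for every input. The "model" is a stub.

def summarize(text):
    """Stand-in for a model call; returns a prefix as a fake summary."""
    return text[: max(10, len(text) // 2)]

def length_property_holds(text):
    """Property: a summary should never be longer than its source."""
    return len(summarize(text)) <= len(text)

cases = ["short", "a much longer input text that should be compressed"]
print(all(length_property_holds(t) for t in cases))  # -> True
```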
Philipp Schmid (@_philschmid) 's Twitter Profile Photo


The most comprehensive overview of LLM-as-a-Judge! READ IT‼️ "Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)" summarizes and analyzes two dozen papers on different LLM-as-Judge approaches. 🤯

TL;DR;
⚖️ Direct scoring is suitable for objective evaluations, while
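A minimal sketch of what "direct scoring" means in practice: the judge LLM is given a single answer plus a rubric and asked for one score. The rubric wording and the 1-5 scale below are assumptions for illustration, not taken from the surveyed papers.

```python
# Hedged sketch of a direct-scoring judge prompt. In real use the returned
# string would be sent to a judge LLM; here we only build the prompt.

def direct_scoring_prompt(question, answer, criterion="factual accuracy"):
    """Build a rubric-based prompt asking for a single integer score."""
    return (
        f"Rate the answer below for {criterion} on a 1-5 scale.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with only the integer score."
    )

prompt = direct_scoring_prompt("What is 2+2?", "4")
print(prompt)
```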
Philipp Schmid (@_philschmid) 's Twitter Profile Photo

That's big! ⛰️ FRAMES released by Google AI! FRAMES is a comprehensive evaluation dataset designed to test Retrieval-Augmented Generation (RAG) applications on factuality, retrieval accuracy, and reasoning. It includes multi-hop questions that demand sophisticated retrieval and

Eugene Yan (@eugeneyan) 's Twitter Profile Photo


♥️ this writeup from Anthropic for so many reasons:
• Reiterating BM25 + semantic retrieval is standard RAG
• Not just sharing what worked but also what didn't work
• Evals on various data (code, fiction, arXiv) + embeddings
• Breaking down gains from each step

More of
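The "BM25 + semantic" hybrid in the first bullet can be sketched by fusing the two ranked lists. Reciprocal rank fusion (RRF) is one common way to merge them; Anthropic's write-up may combine the retrievers differently, and the document ids below are made up.

```python
# Hedged sketch of hybrid retrieval via reciprocal rank fusion: each
# retriever contributes 1 / (k + rank) per document, and documents ranked
# well by both lists rise to the top.

def rrf(rankings, k=60):
    """Fuse ranked lists of doc ids; higher fused score = better."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]      # lexical ranking (made up)
semantic_hits = ["doc_b", "doc_a", "doc_d"]  # embedding ranking (made up)
print(rrf([bm25_hits, semantic_hits]))  # doc_a and doc_b end up on top
```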
merve (@mervenoyann) 's Twitter Profile Photo

Microsoft released a groundbreaking model that can be used for web automation, with an MIT license 🔥👏 OmniParser is a state-of-the-art UI parsing/understanding model that outperforms GPT-4V in parsing. 👏