Harpreet Singh (@harpreetmann24) 's Twitter Profile
Harpreet Singh

@harpreetmann24

ID: 1414330830575521794

Joined: 11-07-2021 21:08:48

52 Tweets

4 Followers

761 Following

Vaibhav Adlakha (@vaibhav_adlakha) 's Twitter Profile Photo


We introduce LLM2Vec, a simple approach to transform any decoder-only LLM into a text encoder. We achieve SOTA performance on MTEB in the unsupervised and supervised category (among the models trained only on publicly available data). 🧵1/N

Paper: arxiv.org/abs/2404.05961
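A minimal sketch of the pooling step an approach like this ends in — not the LLM2Vec training recipe itself: per-token hidden states from the decoder (stubbed here as plain lists) are averaged into one sentence embedding, skipping padding positions.

```python
# Hedged sketch: turning per-token hidden states from a decoder LLM into a
# single text embedding via masked mean pooling. The model call is stubbed;
# real use would take the Transformer's last-layer hidden states.

def mean_pool(hidden_states, attention_mask):
    """Average token vectors, ignoring padding positions (mask == 0)."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(hidden_states, attention_mask):
        if m:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / count for t in total]

# Toy example: 3 tokens (the last one is padding), hidden size 2.
states = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(states, mask))  # -> [2.0, 3.0]
```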
Philipp Schmid (@_philschmid) 's Twitter Profile Photo


New open model from Mistral AI! 🧠 Yesterday night, Mistral released Mixtral 8x22B, a 176B MoE, via magnet link. 🔗🤯

What we know so far:
🧮 176B MoE with ~40B active
📜 context length of 65k tokens.
🪨 Base model can be fine-tuned
👀 ~260GB VRAM in fp16, 73GB in int4
📜 Apache
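A hedged back-of-envelope for the "total vs. active" distinction in the bullets above: with top-2 routing over 8 experts, each token only runs 2 expert FFNs plus the shared layers. The 128B/8B split below is an illustrative assumption, not Mistral's published architecture.

```python
# Hedged sketch of MoE active-parameter counting: total parameters sit in
# n_experts expert FFNs plus shared layers (attention, embeddings), but each
# token only touches top_k experts. Numbers are illustrative assumptions.

def active_params(total_expert_params, n_experts, top_k, shared_params):
    """Parameters touched per token = shared layers + top_k of the experts."""
    per_expert = total_expert_params / n_experts
    return shared_params + top_k * per_expert

# Assume 128B of weights live in the 8 experts and 8B are shared.
# With top-2 routing, only ~40B parameters are active per token:
print(active_params(128e9, 8, 2, 8e9) / 1e9)  # -> 40.0
```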
Philipp Schmid (@_philschmid) 's Twitter Profile Photo


We can do it! 🙌 First open LLM outperforms OpenAI GPT-4 (March) on MT-Bench. WizardLM 2 is a fine-tuned and preference-trained Mixtral 8x22B! 🤯

TL;DR;
🧮 Mixtral 8x22B based (141B-A40 MoE)
🔓 Apache 2.0 license
🤖 First > 9.00 on MT-Bench with an open LLM
🧬 Used multi-step
Ai2 (@allen_ai) 's Twitter Profile Photo

Announcing our latest addition to the OLMo family, OLMo 1.7!🎉Our team's efforts to improve data quality, training procedures and model architecture have led to a leap in performance. See how OLMo 1.7 stacks up against its peers and peek into the technical details on the blog:

AI at Meta (@aiatmeta) 's Twitter Profile Photo

Introducing Meta Llama 3: the most capable openly available LLM to date. Today we’re releasing 8B & 70B models that deliver on new capabilities such as improved reasoning and set a new state-of-the-art for models of their sizes. Today's release includes the first two Llama 3

Thomas Wolf (@thom_wolf) 's Twitter Profile Photo

Llama 3 was trained on 15 trillion tokens of public data. But where can you find such datasets and recipes? Here comes the first release of 🍷FineWeb, a high-quality, large-scale filtered web dataset outperforming all current datasets of its scale. We trained 200+ ablation

Sebastian Raschka (@rasbt) 's Twitter Profile Photo

"... do SSMs truly have an advantage (over transformers) in expressive power for state tracking? Surprisingly, the answer is no ... Thus, despite its recurrent formulation, the 'state' in an SSM is an illusion" 🎤✋🔥 arxiv.org/abs/2404.08819

Philipp Schmid (@_philschmid) 's Twitter Profile Photo

Easily fine-tune AI at Meta Llama 3 70B! 🦙 I am excited to share a new guide on how to fine-tune Llama 3 70B with PyTorch FSDP, Q-LoRA, and Flash Attention 2 (SDPA) using Hugging Face tooling, built for consumer-size GPUs (4x 24GB). 🚀 Blog: philschmid.de/fsdp-qlora-lla… The blog covers: 👨‍💻
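Why Q-LoRA makes a 70B model trainable on consumer GPUs comes down to parameter-count arithmetic: instead of updating a full d x k weight matrix, LoRA trains a rank-r update B @ A. The 8192x8192 shape and rank 16 below are illustrative assumptions, not the guide's exact config.

```python
# Hedged sketch of LoRA's parameter savings for a single projection matrix.
# Full fine-tuning updates d * k weights; LoRA trains A (r x k) and B (d x r),
# i.e. r * (d + k) weights, and the base matrix stays frozen (and quantized,
# in Q-LoRA's case).

def lora_trainable_params(d, k, r):
    """Trainable params for one LoRA adapter of rank r on a d x k matrix."""
    return r * (d + k)

d = k = 8192                               # assumed hidden size
full = d * k                               # 67,108,864 params to update fully
lora = lora_trainable_params(d, k, r=16)   # 262,144 params at rank 16
print(full // lora)  # -> 256: a 256x reduction for this one matrix
```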

Sebastian Raschka (@rasbt) 's Twitter Profile Photo


Just learned that the RedPajama-V2 pretraining dataset is actually 30T tokens. 2x the size used for Llama 3 🤯
github.com/togethercomput…
Sebastian Raschka (@rasbt) 's Twitter Profile Photo

Phi-3 has "only" been trained on 5x fewer tokens than Llama 3 (3.3 trillion instead of 15 trillion) Phi-3-mini less has "only" 3.8 billion parameters, less than half the size of Llama 3 8B. Despite being small enough to be deployed on a phone (according to the technical

Extended Brain (@extended_brain) 's Twitter Profile Photo

arxiv.org/abs/2404.16811 "our study presents information-intensive (IN2) training, a purely data-driven solution to overcome lost-in-the-middle."

Eugene Yan (@eugeneyan) 's Twitter Profile Photo


Here's an engaging intro to evals by Midnight Maniac Sri (@sridatta) and Wil Chung (@iamwil). They've clearly put a lot of care and effort into it: the content is well organized, with plenty of illustrations throughout.

Across 60 pages, they explain model vs. system evals, vibe checks and property-based
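One of the ideas mentioned, property-based evals, can be sketched with a stubbed model: instead of comparing outputs to golden answers, assert a property that should hold for any input. The summarizer stub and length property below are hypothetical examples, not taken from the guide.

```python
# Hedged sketch of a property-based eval: no golden answers, just an
# invariant that should hold for every input. The "model" is a stub.

def summarize(text):
    """Stand-in for a model call; returns a prefix as a fake summary."""
    return text[: max(10, len(text) // 2)]

def length_property_holds(text):
    """Property: a summary should never be longer than its source."""
    return len(summarize(text)) <= len(text)

cases = ["short", "a much longer input text that should be compressed"]
print(all(length_property_holds(t) for t in cases))  # -> True
```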
Philipp Schmid (@_philschmid) 's Twitter Profile Photo


The most comprehensive overview of LLM-as-a-Judge! READ IT‼️ "Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)" summarizes and analyzes two dozen papers on different LLM-as-Judge approaches. 🤯

TL;DR;
⚖️ Direct scoring is suitable for objective evaluations, while
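A minimal sketch of what "direct scoring" means in practice: the judge LLM is given a single answer plus a rubric and asked for one score. The rubric wording and the 1-5 scale below are assumptions for illustration, not taken from the surveyed papers.

```python
# Hedged sketch of a direct-scoring judge prompt. In real use the returned
# string would be sent to a judge LLM; here we only build the prompt.

def direct_scoring_prompt(question, answer, criterion="factual accuracy"):
    """Build a rubric-based prompt asking for a single integer score."""
    return (
        f"Rate the answer below for {criterion} on a 1-5 scale.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with only the integer score."
    )

prompt = direct_scoring_prompt("What is 2+2?", "4")
print(prompt)
```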
Philipp Schmid (@_philschmid) 's Twitter Profile Photo

That's big! ⛰️ FRAMES released by Google AI! FRAMES is a comprehensive evaluation dataset designed to test Retrieval-Augmented Generation (RAG) applications on factuality, retrieval accuracy, and reasoning. It includes multi-hop questions that demand sophisticated retrieval and

Eugene Yan (@eugeneyan) 's Twitter Profile Photo


♥️ this writeup from Anthropic for so many reasons:
• Reiterating BM25 + semantic retrieval is standard RAG
• Not just sharing what worked but also what didn't work
• Evals on various data (code, fiction, arXiv) + embeddings
• Breaking down gains from each step

More of
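The "BM25 + semantic" hybrid in the first bullet can be sketched by fusing the two ranked lists. Reciprocal rank fusion (RRF) is one common way to merge them; Anthropic's write-up may combine the retrievers differently, and the document ids below are made up.

```python
# Hedged sketch of hybrid retrieval via reciprocal rank fusion: each
# retriever contributes 1 / (k + rank) per document, and documents ranked
# well by both lists rise to the top.

def rrf(rankings, k=60):
    """Fuse ranked lists of doc ids; higher fused score = better."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]      # lexical ranking (made up)
semantic_hits = ["doc_b", "doc_a", "doc_d"]  # embedding ranking (made up)
print(rrf([bm25_hits, semantic_hits]))  # doc_a and doc_b end up on top
```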
merve (@mervenoyann) 's Twitter Profile Photo

Microsoft released a groundbreaking model that can be used for web automation, with an MIT license 🔥👏 OmniParser is a state-of-the-art UI parsing/understanding model that outperforms GPT-4V in parsing. 👏