Yw (@yongchaonlp) 's Twitter Profile
Yw

@yongchaonlp

ID: 1479008899311280130

Joined: 06-01-2022 08:36:37

41 Tweets

19 Followers

81 Following

Omar Khattab (@lateinteraction) 's Twitter Profile Photo

🚨Introducing the 𝗗𝗦𝗣 compiler (v0.1)🚨 Describe complex interactions between retrieval models & LMs at a high level. Let 𝗗𝗦𝗣 compile your program into a much *cheaper* version! e.g., powerful multi-hop search with ada (or T5) instead of davinci🧵 github.com/stanfordnlp/dsp
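
The pattern a DSP program describes, stripped to its core, looks roughly like the sketch below: alternate LM-written search queries with retrieval hops, then read the accumulated passages. The `search` and `generate` stubs are hypothetical stand-ins for a retriever and an LM call, not the DSP API.

```python
from typing import List

def search(query: str, k: int = 3) -> List[str]:
    """Hypothetical retriever stub; a real program would call e.g. ColBERT."""
    return [f"<passage about: {query}>"] * k

def generate(prompt: str) -> str:
    """Hypothetical LM stub; a real program would call ada, davinci, or T5."""
    return "<lm output>"

def multihop_answer(question: str, hops: int = 2) -> str:
    """Multi-hop retrieve-then-read: each hop writes the next search query."""
    context: List[str] = []
    query = question
    for _ in range(hops):
        context.extend(search(query))  # retrieve with the current query
        query = generate(
            "Context:\n" + "\n".join(context)
            + f"\n\nQuestion: {question}\nNext search query:"
        )
    return generate(
        "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"
    )
```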

Stanford NLP Group (@stanfordnlp) 's Twitter Profile Photo

DetectGPT by @_eric_mitchell_, Yoonho Lee, Alexander Khazatsky, Christopher Manning & Chelsea Finn can determine with up to 95% accuracy whether a particular large language model wrote that essay or social media post. hai.stanford.edu/news/human-wri…
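
For context, DetectGPT's core test is curvature-based: machine-generated text tends to sit near a local maximum of the source model's log-probability, so perturbing it drops the score more than perturbing human text does. A minimal sketch (not the authors' code; `log_prob` and `perturb` are stand-in callables, and the paper perturbs with T5 mask-filling):

```python
import statistics
from typing import Callable

def detectgpt_score(text: str,
                    log_prob: Callable[[str], float],
                    perturb: Callable[[str], str],
                    n: int = 20) -> float:
    """Large positive score suggests the candidate model wrote the text."""
    original = log_prob(text)
    perturbed = [log_prob(perturb(text)) for _ in range(n)]
    # Machine text sits near a local max of log p, so the average drop
    # after small rewrites is larger than it is for human text.
    return original - statistics.mean(perturbed)
```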

Thomas Simonini (@thomassimonini) 's Twitter Profile Photo

The 8th unit (part 1) of the Hugging Face Deep Reinforcement Learning course has been published 🥳 You’ll learn the theory behind Proximal Policy Optimization and code it from scratch with PyTorch🔥 Start Learning now 👉huggingface.co/deep-rl-course… 1/2

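The loss at the heart of that unit is PPO's clipped surrogate objective; a minimal PyTorch sketch for orientation (not the course's code):

```python
import torch

def ppo_clip_loss(new_log_probs: torch.Tensor,
                  old_log_probs: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    """PPO's clipped surrogate objective, negated for gradient descent."""
    ratio = torch.exp(new_log_probs - old_log_probs)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Pessimistic bound: elementwise minimum of the two surrogates, averaged.
    return -torch.min(unclipped, clipped).mean()
```
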
AK (@_akhaliq) 's Twitter Profile Photo

Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models abs: arxiv.org/abs/2302.07388

Philipp Schmid (@_philschmid) 's Twitter Profile Photo

🚨Attention #NLP enthusiasts! We just published a new blog post on how to fine-tune FLAN-T5-XXL using DeepSpeed & Hugging Face Transformers! 🚀 👉 philschmid.de/fine-tune-flan… We ran a series of experiments to help you choose the right hardware setup.🤖💻
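
The shape of the setup the post covers, as a minimal sketch (the smaller model size, hyperparameters, tiny stand-in dataset, and `ds_zero3_config.json` path are placeholders; the post's actual code differs):

```python
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

model_id = "google/flan-t5-xl"  # the post targets flan-t5-xxl on multi-GPU
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Tiny stand-in dataset; in practice you tokenize your own corpus.
enc = tokenizer(["summarize: DeepSpeed ZeRO-3 shards optimizer state, "
                 "gradients, and parameters across GPUs."],
                text_target=["ZeRO-3 shards training state."], truncation=True)
train_dataset = Dataset.from_dict(dict(enc))

args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-finetuned",
    per_device_train_batch_size=8,
    learning_rate=1e-4,
    num_train_epochs=3,
    bf16=True,                         # assumes Ampere or newer GPUs
    deepspeed="ds_zero3_config.json",  # placeholder ZeRO-3 config; launch
)                                      # with the `deepspeed` CLI

trainer = Seq2SeqTrainer(model=model, args=args,
                         train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
```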

Loreto Parisi (@loretoparisi) 's Twitter Profile Photo

NLP systems built with Stanford NLP Group's DSP (Demonstrate–Search–Predict framework) can outperform GPT-3.5 by up to 120% 🚄🚗 github.com/stanfordnlp/dsp

Guillaume Lample @ NeurIPS 2024 (@guillaumelample) 's Twitter Profile Photo

Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters. LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B. The weights for all models are open and available at research.facebook.com/publications/l… 1/n

John Nay (@johnjnay) 's Twitter Profile Photo

Improving Science by Supervising LLMs

-Making LLM reasoning states transparent is safer, but can be worse perf
-Decompose task > expose LLM execution traces for human input > iterate

-Beats baselines on science tasks.

Paper arxiv.org/abs/2301.01751
Code github.com/oughtinc/ice
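
A toy sketch of that decompose-then-expose-traces loop (illustrative only; this is not the ICE codebase, and `lm` is a stand-in callable):

```python
from typing import Callable, Dict, List, Tuple

def decompose_and_trace(question: str,
                        subquestions: List[str],
                        lm: Callable[[str], str]) -> Tuple[str, List[Dict]]:
    """Answer subquestions in order, exposing a trace a human can inspect
    and correct between iterations."""
    trace: List[Dict] = []
    notes = ""
    for sq in subquestions:
        ans = lm(f"{notes}\nSubquestion: {sq}\nAnswer:")
        trace.append({"subquestion": sq, "answer": ans})  # human-visible step
        notes += f"\n{sq} -> {ans}"
    final = lm(f"{notes}\nOriginal question: {question}\nFinal answer:")
    return final, trace
```
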
John Nay (@johnjnay) 's Twitter Profile Photo

A Retrieval-Augmented LLM R&D Platform

-End-to-end Q&A toolkit
-Create custom apps w/ trainable retrievers & readers for deployment
-Easy training, inference & eval of SoTA (ColBERT, DPR)

Paper arxiv.org/abs/2301.09715
Models huggingface.co/PrimeQA
Code github.com/primeqa
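
The underlying retriever–reader pattern, sketched without the PrimeQA API (the retriever is a stand-in callable; the reader uses the stock Hugging Face QA pipeline as one concrete example of an extractive reader):

```python
from typing import Callable, List
from transformers import pipeline

# Extractive reader; toolkits like PrimeQA also support generative readers.
reader = pipeline("question-answering", model="deepset/roberta-base-squad2")

def answer(question: str,
           retrieve: Callable[[str, int], List[str]],  # e.g. ColBERT or DPR
           k: int = 5) -> str:
    passages = retrieve(question, k)   # dense retrieval over your corpus
    context = "\n".join(passages)
    return reader(question=question, context=context)["answer"]
```
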
AK (@_akhaliq) 's Twitter Profile Photo

Language Is Not All You Need: Aligning Perception with Language Models

introduces KOSMOS-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot)

abs: arxiv.org/abs/2302.14045
Charly Wargnier (@datachaz) 's Twitter Profile Photo

Check out Google AI's #FlanT5 LLM in action! 🤯

One of the best #opensource LLMs available today, great for text summarization and Q&A tasks!

Thanks to LangChain & multitask prompts, I was able to upload my own PDFs for question-answering! 🔥

🔗 to the notebook below ↓
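
The thread uses LangChain; the same retrieve-then-answer pattern can be sketched without LangChain specifics, as below (`pdf_text` is assumed already extracted, e.g. with pypdf, and the relevance filter is deliberately crude):

```python
from transformers import pipeline

qa_lm = pipeline("text2text-generation", model="google/flan-t5-base")

def answer_from_pdf(pdf_text: str, question: str, chunk_size: int = 1000) -> str:
    chunks = [pdf_text[i:i + chunk_size]
              for i in range(0, len(pdf_text), chunk_size)]
    # Crude relevance filter: rank chunks by word overlap with the question.
    words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(words & set(c.lower().split())))
    context = "\n".join(ranked[:3])
    prompt = (f"Answer the question using the context.\n"
              f"Context: {context}\nQuestion: {question}")
    return qa_lm(prompt, max_new_tokens=64)[0]["generated_text"]
```
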
Yi Tay (@yitayml) 's Twitter Profile Photo

New open source Flan-UL2 20B checkpoints :)
- Truly open source 😎 No forms! 🤭 Apache license 🔥
- Best OS model on MMLU/Big-Bench hard 🤩
- Better than Flan-T5 XXL & competitive to Flan-PaLM 62B.
- Size ceiling of Flan family just got higher!
Blog: yitay.net/blog/flan-ul2-…

clem 🤗 (@clementdelangue) 's Twitter Profile Photo

Now you can interact with 100k+ open-source models - including Stable Diffusion, bioGPT, Flan, Bloom,... - and your own private models, in JS! Let's build AI better together!

Sam Witteveen (@sam_witteveen) 's Twitter Profile Photo

The new Flan-UL2 20B released this week is a big step forward in open sourced models. I put together a Colab rli.to/TA7Y3 walking through loading the Hugging Face version in 8bit for people to play with and try the full 2048 token span. Congrats to Yi Tay and co.
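
Loading it in 8-bit comes down to roughly this (requires bitsandbytes and accelerate; the Colab's exact code may differ):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-ul2",
    load_in_8bit=True,   # int8 weights via bitsandbytes
    device_map="auto",   # spread layers across available GPUs
)

inputs = tokenizer("Answer step by step: what is 12 * 12?",
                   return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0],
                       skip_special_tokens=True))
```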

Helen Toner (@hlntnr) 's Twitter Profile Photo

If you spend much time on AI twitter, you might have seen this tentacle monster hanging around. But what is it, and what does it have to do with ChatGPT? It's kind of a long story. But it's worth it! It even ends with cake 🍰 THREAD:

Jason Wei (@_jasonwei) 's Twitter Profile Photo

I’m hearing chatter of PhD students not knowing what to work on. My take: as LLMs are deployed IRL, the importance of studying how to use them will increase. Some good directions IMO (no training): 1. prompting 2. evals 3. LM interfaces 4. safety 5. understanding LMs 6. emergence

John Nay (@johnjnay) 's Twitter Profile Photo

Automatic Reasoning & Tool-Use of LLMs

-Retrieves examples of reasoning & tool use from task library
-Writes its own program, pauses when tool call is encountered, integrates output

-Much better than few-shot prompting

Paper arxiv.org/abs/2303.09014
Code github.com/bhargaviparanj…
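
The generate-pause-integrate loop described above, as an illustrative sketch (the `[tool(arg)]` marker syntax and the stand-in callables are invented for this example; they are not ART's):

```python
from typing import Callable, Dict

def run_with_tools(prompt: str,
                   lm: Callable[[str, str], str],        # (prompt, stop) -> text
                   tools: Dict[str, Callable[[str], str]],
                   max_calls: int = 5) -> str:
    program = prompt
    for _ in range(max_calls):
        # Generate until the model opens a tool call like "[search(query".
        chunk = lm(program, ")]")          # stop just before the call closes
        program += chunk
        if "[" not in chunk:               # no tool requested: we're done
            return program
        call = chunk.rsplit("[", 1)[1]     # "name(arg"
        name, _, arg = call.partition("(")
        result = tools[name.strip()](arg)  # pause generation, run the tool
        program += f")] -> {result}\n"     # integrate output, then resume
    return program
```
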
John Nay (@johnjnay) 's Twitter Profile Photo

Active LLM Retrieval Augmented Generation

-Iteratively uses a prediction of the upcoming sentence to anticipate future content, which is used as a query to retrieve relevant docs and regenerate the sentence

-On 4 long-form generation tasks: superior / competitive

arxiv.org/abs/2305.06983
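
That loop, as a hedged sketch (the paper triggers retrieval from token probabilities; here `lm` is a stand-in returning a draft sentence plus its minimum token probability, and the threshold is illustrative):

```python
from typing import Callable, List, Tuple

def active_rag(question: str,
               lm: Callable[[str], Tuple[str, float]],  # -> (sentence, min prob)
               retrieve: Callable[[str, int], List[str]],
               max_sentences: int = 10,
               threshold: float = 0.6) -> str:
    answer = ""
    for _ in range(max_sentences):
        draft, confidence = lm(f"{question}\n{answer}")  # tentative next sentence
        if not draft:
            break
        if confidence < threshold:
            # Low confidence: use the draft itself as the retrieval query,
            # then regenerate the sentence grounded in the retrieved docs.
            docs = "\n".join(retrieve(draft, 3))
            draft, _ = lm(f"Docs:\n{docs}\n\n{question}\n{answer}")
        answer += draft + " "
    return answer.strip()
```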