Instruction Workshop, NeurIPS 2023 (@itif_workshop) 's Twitter Profile
Instruction Workshop, NeurIPS 2023

@itif_workshop

The official account of the 1st Workshop on Instruction Tuning and Instruction Following (ITIF), co-located with NeurIPS in December 2023.

ID: 1689312241542430721

Link: https://an-instructive-workshop.github.io/ | Joined: 09-08-2023 16:27:08

162 Tweets

261 Followers

26 Following

Shayne Longpre (@shayneredford) 's Twitter Profile Photo

📢 Check out Anthony Chen's and my invited talk at the USC ISI Natural Language Seminar: 📜 "The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI" youtube.com/watch?v=np9HeJ… Thank you Justin Cho 조현동 for hosting!

SambaNova Systems (@sambanovaai) 's Twitter Profile Photo

🚀🌟🚀Excited to announce Samba-CoE v0.2, which outperforms DBRX by Databricks Mosaic Research and Databricks, Mixtral-8x7B from Mistral AI, and Grok-1 by Grok at a breakneck speed of 330 tokens/s. These breakthrough speeds were achieved without sacrificing precision and only on 8 sockets,

Shayne Longpre (@shayneredford) 's Twitter Profile Photo


📢 Want to automatically generate your bibtex for 1000s of Hugging Face text datasets?

Minh Chien Vu just added this feature + data summaries for:

➡️ huge collections like Flan, P3, Aya...
➡️ popular OpenAI-generated datasets
➡️ ~2.5k+ datasets & growing

🔗:
Shayne Longpre (@shayneredford) 's Twitter Profile Photo

Excited to see our 🍮Flan-PaLM🌴 work finally published in the Journal of Machine Learning Research 2024! Looking back, I see this work as pushing hard on scaling: post-training data, models, prompting, & eval. We brought together the methods and findings of many awesome prior works, scaled them up, and

Jack Jingyu Zhang (@jackjingyuzhang) 's Twitter Profile Photo

Thanks @elvis for sharing our work! 🤔 LLMs often generate fluent but hallucinated text. How can we reliably ✨verify✨ their correctness against trusted sources? We tackle the verifiability goal by aligning LLMs to generate verbatim quotes from their pre-training data 📚.

Shayne Longpre (@shayneredford) 's Twitter Profile Photo

A 🧵 on my favorite, influential works on "Data Measurements" 🚂 Datasets drive AI progress 📚 But... massive datasets remain impenetrable & poorly understood for *years* 🔍 Data forensics uncover their mysteries 1/

Seungone Kim (@seungonekim) 's Twitter Profile Photo


#NLProc
Introducing 🔥Prometheus 2, an open-source LM specialized in evaluating other language models.

✅ Supports both direct assessment & pairwise ranking.
✅ Improved evaluation capabilities compared to its predecessor.
✅ Can assess based on user-defined evaluation criteria.
Shayne Longpre (@shayneredford) 's Twitter Profile Photo


🚨 New #ICML2024 position piece.

The most overlooked risks of AI stem from autonomous weaponry

For 4 reasons:
1⃣ Arms race w/ ⬇️ human oversight
2⃣ Reduces cost of starting conflicts
3⃣ Evades accountability
4⃣ Battlefield errors aren’t considered costly

See our work led by
Seungone Kim (@seungonekim) 's Twitter Profile Photo


🤔How can we systematically assess an LM's proficiency in a specific capability without using summary measures like helpfulness or simple proxy tasks like multiple-choice QA?

Introducing the ✨BiGGen Bench, a benchmark that directly evaluates nine core capabilities of LMs.
Shayne Longpre (@shayneredford) 's Twitter Profile Photo


✨New Preprint ✨ How are shifting norms on the web impacting AI?

We find:

📉 A rapid decline in the consenting data commons (the web)

⚖️ Differing access to data by company, due to crawling restrictions (e.g.🔻26% OpenAI, 🔻13% Anthropic)

⛔️ Robots.txt preference protocols
Nayan Saxena (@saxenanayan) 's Twitter Profile Photo

✨Incredibly proud to share our new paper led by the MIT Media Lab showing a rapid decline in consenting data for AI, asymmetries in data access by company (🔻26% OpenAI, 🔻13% Anthropic), and inefficiencies in robots.txt preference protocols. dataprovenance.org/consent-in-cri…

Minh Chien Vu (@chien_vu1692) 's Twitter Profile Photo

The Data Provenance Initiative led by the MIT Media Lab is releasing a large-scale audit of 1800+ LLM training datasets! We found significant data access asymmetries by company (🔻26% OpenAI, 🔻13% Anthropic). See Shayne Longpre's thread for more ⬇️ x.com/ShayneRedford/…

Daphne Ippolito (@daphneipp) 's Twitter Profile Photo

In the past, I've studied how curation decisions for pre-training data influence what LMs are good and bad at. In our new preprint, we look at how the fabric of the internet (the primary source of most of these datasets) is itself changing, and the effects this might have.

Shayne Longpre (@shayneredford) 's Twitter Profile Photo

Headed to 🛬🇦🇹 Vienna for #ICML2024! Reach out if you'd like to chat or catch up!

Work together w/ collaborators:
- A Safe Harbor for AI Evaluation ⛴️ (arxiv.org/abs/2403.04893) -- Tuesday 10:30 am Oral
- On the Societal Impact of Open Foundation Models (arxiv.org/abs/2403.07918) --

Shayne Longpre (@shayneredford) 's Twitter Profile Photo

📢 AI is increasingly (mis)used in the context of autonomous weaponry. Fantastic to see this covered by Catherine Caruso in Harvard Medical School news. Also see the #ICML2024 Oral led by Riley Simmons-Edler @RyanBadman1 and Kanaka Rajan.

Shayne Longpre (@shayneredford) 's Twitter Profile Photo


Honored for the Data Provenance Initiative to be awarded the Infrastructure Grant Award by Mozilla! 🎉🎉🎉

As part of this grant, we were invited to present at MozFest House Amsterdam, where we gave an early look at trends in the AI data supply chain:

📽️
Shayne Longpre (@shayneredford) 's Twitter Profile Photo

📢 Excited to see our piece, "The Data Provenance Initiative: A large-scale audit of dataset licensing and attribution in AI," now in:

📜 Nature Machine Intelligence ➡️ nature.com/articles/s4225…
🗞️ MIT News ➡️ news.mit.edu/2024/study-lar…

1/