Manan Dey (@manandey)'s Twitter Profile
Manan Dey

@manandey

ID: 1614570612

https://scholar.google.co.in/citations?user=39CsfP8AAAAJ&hl=en
23-07-2013 06:59:06

22 Tweets

110 Followers

1.1K Following

Shanya Sharma (@evolvedeve)

Really happy that our (me and Manan Dey) paper has been accepted at the NeurIPS Conference 2019 Workshop on "AI for Social Good". We'll be discussing the effect of YouTube videos on viewers' mental health. You can read more about our work at thechange.world #AI4Good #NeurIPS

Shanya Sharma (@evolvedeve)

I'll be presenting our poster on assessing viewer's mental health by analysing YouTube videos at AI for Social Good workshop at #NeurIPS2019! Drop by if you’re around! Poster sessions at 9:35-10:30 AM and 3:30-4:15 PM East MR11,12 - with Manan Dey

Shanya Sharma (@evolvedeve)

I'm really happy to share that our work on evaluating gender bias in NLI systems has been accepted at #NeurIPS2020 Workshop on Dataset Curation and Security. Joint work with amazing collaborators Manan Dey and Koustuv Sinha. More details coming soon!

Shanya Sharma (@evolvedeve)

Hi #NeurIPS2020! Manan Dey and I will be presenting our poster on *Evaluating Gender Bias in NLI* at the Workshop on Dataset Curation and Security today (11th Dec) at 2:30 PM EST. Drop by if you're around :) cc: Koustuv Sinha. Gather Town (Poster 19): neurips.gather.town/app/A4yaHmXq3U…

BigScience Research Workshop (@bigsciencew)

First modeling paper out of BigScience is here! T0 shows zero-shot task generalization on English natural language prompts, outperforming GPT-3 on many tasks, while being 16x smaller! Model: huggingface.co/bigscience/T0pp Repo: github.com/bigscience-wor… Paper: arxiv.org/abs/2110.08207

Victor Sanh (@sanhestpasmoi)

We’ve seen crazy interest in T0++ (pronounced "T Zero Plus Plus"), and almost 10’000 queries to the model since we announced it 3 days ago. Probably the most hilariously decisive prediction from the model (courtesy of Philipp Schmid): 1/N

Sabrina J. Mielke (@sjmielke)

Tokenization: the least interesting #NLProc topic? Hell no! We, members of the @BigScienceW tokenization group, are proud to present: ✨Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP✨ arxiv.org/abs/2112.10508 What's in it? [1/10]

BigScience Research Workshop (@bigsciencew)

We are releasing PromptSource, a toolkit for creating, sharing, and using natural language prompts. We used it to create the largest open-source collection of English prompts: 2,000 prompts for 170 datasets! 📄 arxiv.org/abs/2202.01279 💻 github.com/bigscience-wor…

Saulnier Lucile (@lucilesaulnier)

🧐🕵️I am looking for the best possible open source tool to do memory profiling! I would like to know what part of my python code is causing these memory usage spikes that don't necessarily come from the Python interpreter. Looking forward to reading your recommendations! 🤗

Koustuv Sinha (@koustuvsinha)

New paper alert! 🎉 Turns out you can reduce the gender biases in your translation models just by using relevant contexts, purely during inference! Check out this cool work led by Shanya Sharma and Manan Dey! arxiv.org/abs/2205.10762 [1/4]

BigScience Research Workshop (@bigsciencew)

BLOOM is here. The largest open-access multilingual language model ever. Read more about it or get it at bigscience.huggingface.co/blog/bloom hf.co/bigscience/blo…

Shanya Sharma (@evolvedeve)

✨Our work "How sensitive are translation systems to extra contexts? Mitigating gender bias in Neural Machine Translation models through relevant contexts" got accepted at the Findings of EMNLP 2022!✨ Joint work with Manan Dey and our awesome mentor Koustuv Sinha 🎉

BigCode (@bigcodeproject)

Announcing a holiday gift: 🎅SantaCoder - a 1.1B multilingual LM for code that outperforms much larger open-source models on both left-to-right generation and infilling! Demo: hf.co/spaces/bigcode… Paper: hf.co/datasets/bigco… Attribution: hf.co/spaces/bigcode… A🧵:

BigCode (@bigcodeproject)

Introducing: 💫StarCoder StarCoder is a 15B LLM for code with 8k context and trained only on permissive data in 80+ programming languages. It can be prompted to reach 40% pass@1 on HumanEval and act as a Tech Assistant. Try it here: shorturl.at/cYZ06r Release thread🧵

Shayne Longpre (@shayneredford)

✨New Preprint ✨ How are shifting norms on the web impacting AI? We find: 📉 A rapid decline in the consenting data commons (the web) ⚖️ Differing access to data by company, due to crawling restrictions (e.g.🔻26% OpenAI, 🔻13% Anthropic) ⛔️ Robots.txt preference protocols

Shayne Longpre (@shayneredford)

✨New Report✨ Our data ecosystem audit across text, speech, and video (✏️,📢,📽️) finds: 📈 Rising reliance on web, synthetic, and YouTube data. 🛑 80%+ datasets carry hidden restrictions. 🌍 Relative representation in languages and creators has not improved for 10+ yrs.

Caiming Xiong (@caimingxiong)

Testing LLMs' reasoning skills is tough—human evaluations are expensive, data contamination is common, and LLM judges can be biased. We propose StructTest, the first benchmark that checks how well LLMs follow complex instructions and create structured outputs. It uses a

Shayne Longpre (@shayneredford)

Thrilled our global data ecosystem audit was accepted to #ICLR2025! Empirically, we find: 1⃣ Soaring synthetic text data: ~10M tokens (pre-2018) to 100B+ (2024). 2⃣ YouTube is now 70%+ of speech/video data but could block third-party collection. 3⃣ <0.2% of data from
