Manan Dey (@manandey) Twitter Tweets • TwiCopy

Shanya Sharma

6 years ago

Really happy that our paper (by me and Manan Dey) has been accepted at the NeurIPS 2019 Workshop on "AI for Social Good". We'll be discussing the effect of YouTube videos on viewers' mental health. You can read more about our work at thechange.world #AI4Good #NeurIPS

Shanya Sharma

@evolvedeve

6 years ago

I'll be presenting our poster on assessing viewers' mental health by analysing YouTube videos at the AI for Social Good workshop at #NeurIPS2019! Drop by if you’re around! Poster sessions at 9:35-10:30 AM and 3:30-4:15 PM East MR11,12 - with Manan Dey

Shanya Sharma

@evolvedeve

5 years ago

I'm really happy to share that our work on evaluating gender bias in NLI systems has been accepted at #NeurIPS2020 Workshop on Dataset Curation and Security. Joint work with amazing collaborators Manan Dey and Koustuv Sinha. More details coming soon!

Shanya Sharma

@evolvedeve

5 years ago

Hi #NeurIPS2020! Manan Dey and I will be presenting our poster on *Evaluating Gender Bias in NLI* at the Workshop on Dataset Curation and Security today (11th Dec) at 2:30 PM EST. Drop by if you're around :) cc: Koustuv Sinha Gather Town (Poster 19) neurips.gather.town/app/A4yaHmXq3U…

BigScience Research Workshop

@bigsciencew

4 years ago

First modeling paper out of BigScience is here! T0 shows zero-shot task generalization on English natural language prompts, outperforming GPT-3 on many tasks, while being 16x smaller! Model: huggingface.co/bigscience/T0pp Repo: github.com/bigscience-wor… Paper: arxiv.org/abs/2110.08207

Victor Sanh

@sanhestpasmoi

4 years ago

We’ve seen crazy interest in T0++ (pronounced "T Zero Plus Plus"), and almost 10’000 queries to the model since we announced it 3 days ago. Probably the most hilariously decisive prediction from the model (courtesy of Philipp Schmid): 1/N

Sabrina J. Mielke

@sjmielke

4 years ago

Tokenization—the least interesting #NLProc topic? Hell no! We, members of the @BigScienceW tokenization group are proud to present: ✨Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP✨ arxiv.org/abs/2112.10508 What's in it? [1/10]

BigScience Research Workshop

@bigsciencew

4 years ago

We are releasing PromptSource, a toolkit for creating, sharing, and using natural language prompts. We used it to create the largest open-source collection of English prompts: 2,000 prompts for 170 datasets! 📄 arxiv.org/abs/2202.01279 💻 github.com/bigscience-wor…

Saulnier Lucile

@lucilesaulnier

4 years ago

🧐🕵️ I am looking for the best possible open source tool to do memory profiling! I would like to know what part of my Python code is causing these memory usage spikes that don't necessarily come from the Python interpreter. Looking forward to reading your recommendations! 🤗

Koustuv Sinha

@koustuvsinha

4 years ago

New paper alert! 🎉 Turns out you can reduce the gender biases of your translation models just by using relevant contexts, purely during inference! Check out this cool work led by Shanya Sharma and Manan Dey! arxiv.org/abs/2205.10762 [1/4]

BigScience Research Workshop

@bigsciencew

3 years ago

BLOOM is here. The largest open-access multilingual language model ever. Read more about it or get it at bigscience.huggingface.co/blog/bloom hf.co/bigscience/blo…

Shanya Sharma

@evolvedeve

3 years ago

✨Our work "How sensitive are translation systems to extra contexts? Mitigating gender bias in Neural Machine Translation models through relevant contexts" got accepted at the Findings of EMNLP 2022!✨ Joint work with Manan Dey and our awesome mentor Koustuv Sinha 🎉

BigCode

@bigcodeproject

3 years ago

Announcing a holiday gift: 🎅SantaCoder - a 1.1B multilingual LM for code that outperforms much larger open-source models on both left-to-right generation and infilling! Demo: hf.co/spaces/bigcode… Paper: hf.co/datasets/bigco… Attribution: hf.co/spaces/bigcode… A🧵:

BigCode

@bigcodeproject

3 years ago

Introducing: 💫StarCoder StarCoder is a 15B LLM for code with 8k context and trained only on permissive data in 80+ programming languages. It can be prompted to reach 40% pass@1 on HumanEval and act as a Tech Assistant. Try it here: shorturl.at/cYZ06r Release thread🧵

Shayne Longpre

@shayneredford

a year ago

✨New Preprint ✨ How are shifting norms on the web impacting AI? We find: 📉 A rapid decline in the consenting data commons (the web) ⚖️ Differing access to data by company, due to crawling restrictions (e.g.🔻26% OpenAI, 🔻13% Anthropic) ⛔️ Robots.txt preference protocols

Shayne Longpre

@shayneredford

a year ago

✨New Report✨ Our data ecosystem audit across text, speech, and video (✏️,📢,📽️) finds: 📈 Rising reliance on web, synthetic, and YouTube data. 🛑 80%+ datasets carry hidden restrictions. 🌍 Relative representation in languages and creators has not improved for 10+ yrs.

Caiming Xiong

@caimingxiong

8 months ago

Testing LLMs' reasoning skills is tough—human evaluations are expensive, data contamination is common, and LLM judges can be biased. We propose StructTest, the first benchmark that checks how well LLMs follow complex instructions and create structured outputs. It uses a

Shayne Longpre

@shayneredford

8 months ago

Thrilled our global data ecosystem audit was accepted to #ICLR2025! Empirically, we find: 1⃣ Soaring synthetic text data: ~10M tokens (pre-2018) to 100B+ (2024). 2⃣ YouTube is now 70%+ of speech/video data but could block third-party collection. 3⃣ <0.2% of data from
