Athiya Deviyani (@athiyad)'s Twitter Profile
Athiya Deviyani

@athiyad

PhD @LTIatCMU on evaluation and trustworthy ML/NLP + ML @NetflixResearch, prev AI&CS @InfAtEd, @Google, @YouTube, @Apple. views are personal 👩🏻‍💻🇮🇩

ID: 117998279

Link: http://athiyadeviyani.github.io
Joined: 27-02-2010 08:06:16

15,15K Tweet

12,12K Followers

1,1K Following

jack morris (@jxmnop)

take a year out of your life and read these two textbooks cover-to-cover. you will already know more than 90% of people in AI

cephalopod (@macrocephalopod)

The new Pope is from Chicago and has a math degree, which goes to show that you can still make a success of your life even if you don’t pass the first round interview at Jump

Hary Susanto (@hafilova)

Not many horror films can leave this big an impact, to the point that I needed time to recover mentally after watching it. Yes, Bring Her Back is one of the few that can deliver that experience, an experience that does not just make you shudder in terror and discomfort like a typical horror offering
Bardienus Duisterhof (@bduisterhof)

Imagine if robots could fill in the blanks in cluttered scenes. ✨ Enter RaySt3R: a single masked RGB-D image in, complete 3D out. It infers depth, object masks, and confidence for novel views, and merges the predictions into a single point cloud. rayst3r.github.io

Joseph Imperial (@josephimperial_)

NeurIPS D&B track in a nutshell: (1) An LLM-generated benchmark dataset (2) used to test performance of LLMs (3) evaluated via LLM-as-a-judge

Danny To Eun Kim (@teknology.bsky.social) (@teknologyy)

🧵Working with #MCP or building a modular #RAG system, but not sure which rankers to use from your pool? 📊 Rank the Rankers⚡Route smart. This paper shows how. 👨‍🔬 w/ Fernando Diaz 💻 Code: github.com/kimdanny/Starl… Paper: arxiv.org/abs/2506.13743

Peng Qi (@qi2peng2)

Seven years ago, I co-led a paper called 𝗛𝗼𝘁𝗽𝗼𝘁𝗤𝗔 that has motivated and facilitated many #AI #Agents research works since. Today, I'm asking that you stop using HotpotQA blindly for agents research in 2025 and beyond. In my new blog post, I revisit the brief history of

Danny To Eun Kim (@teknology.bsky.social) (@teknologyy)

🤔ChatGPT shows shopping results. Perplexity shows ads. What’s the future of advertisement in conversational search engines? We explore 1) how ads can be seamlessly integrated into LLM-generated responses, and 2) how to detect them. arxiv.org/abs/2507.00509 #CLEF #touche

Vilém Zouhar (@zouharvi)

You have a budget to human-evaluate 100 inputs to your models, but your dataset is 10,000 inputs. Do not just pick 100 randomly!🙅 We can do better. "How to Select Datapoints for Efficient Human Evaluation of NLG Models?" shows how.🕵️ (random is still a devilishly good baseline)
