Arvindh Arun (@arvindh__a)'s Twitter Profile
Arvindh Arun

@arvindh__a

jack of some, trying to be master of one. @ELLISforEurope @MPI_IS PhD student @Uni_stuttgart @EdinburghUni | Foundation Models {for, using} Knowledge Graphs

ID: 1160063338669211648

Link: https://arvindh75.github.io/ · Joined: 10-08-2019 05:40:27

29 Tweets

101 Followers

456 Following

antonio vergari - hiring PhD students (@tetraduzione)'s Twitter Profile Photo

Arvindh Arun Michael Galkin Sumit Kumar Bo Xiong yeah, reporting a single lucky shot can change perception of progress by a lot. I am also always skeptical of drawing conclusions only by looking at improvements over big averages: you can get much larger gains on certain KGs (which ofc can benefit from semantics)

Shashwat Goel (@shashwatgoel7)'s Twitter Profile Photo


There's been a hole at the heart of #LLM evals, and we can now fix it.

📜New paper: Answer Matching Outperforms Multiple Choice for Language Model Evaluations.

❗️We found MCQs can be solved without even knowing the question. Looking at just the choices helps guess the answer
Nikhil Chandak (@nikhilchandak29)'s Twitter Profile Photo


🚨 Ever wondered how much you can ace popular MCQ benchmarks without even looking at the questions? 🤯

Turns out, you can often get significant accuracy just from the choices alone. This is true even on recent benchmarks with 10 choices (like MMLU-Pro) and their vision
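The claim above — that MCQ benchmarks can be partially solved from the choices alone — can be sketched with a toy choices-only baseline. The longest-answer heuristic and the example items below are illustrative assumptions for the sketch, not the paper's actual method or data:

```python
# Hedged sketch of a "choices-only" MCQ baseline.
# Assumption: we exploit a classic MCQ artifact — the gold answer is often
# the longest, most qualified option. A real choices-only probe would use a
# language model scored on the options without the question.

def choices_only_guess(choices):
    # Pick the longest option; no access to the question at all.
    return max(range(len(choices)), key=lambda i: len(choices[i]))

def choices_only_accuracy(items):
    # items: list of (choices, gold_index) pairs.
    correct = sum(choices_only_guess(choices) == gold for choices, gold in items)
    return correct / len(items)

# Hypothetical toy items, just to exercise the baseline.
items = [
    (["Paris", "A city in northern France on the Seine", "Rome", "Berlin"], 1),
    (["Yes", "No", "It depends on the boundary conditions", "42"], 2),
    (["O(n)", "O(n log n) in the average and worst case", "O(1)", "O(n^2)"], 1),
]

print(choices_only_accuracy(items))
```

On real benchmarks the gap over the random baseline (25% for 4 choices, 10% for 10) is what signals that the choices leak information about the answer.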
Akshit (@akshitwt)'s Twitter Profile Photo


i will be at #icml2025 next week to present our paper below (Tue, 15 Jul 11 am)!
i would love to chat with people interested in graph learning, GNNs, LLM evaluations and trustworthy ML.

i am also on the lookout for PhD positions next cycle and would love to chat about such
Arvindh Arun (@arvindh__a)'s Twitter Profile Photo


I will be at #ICML2025 🇨🇦🍁 next week to present our work on unlearning in GNNs (Poster session 1 East, 15 Jul at 1100)
 
Looking forward to chatting with people working on Foundation Models for (knowledge) graphs & LLM interp and evals folks!

🌐: cognac-gnn-unlearning.github.io
Arvindh Arun (@arvindh__a)'s Twitter Profile Photo

Cursor for me is exponentially more helpful when I'm working on something new from scratch, and even more so when experimenting with an unfamiliar codebase. Benchmarking developers maintaining "their own" repo completely misses this dimension!

Arvindh Arun (@arvindh__a)'s Twitter Profile Photo

Been working on LLM evals these past couple of months. Two bitter lessons that I've picked up:

1. There is never enough compute/API creds
2. There is no hiding from "The Bitter Lesson"

Can't wait to share more of our findings (and their implications) in the coming weeks!