Arvindh Arun (@arvindh__a)'s Twitter Profile
Arvindh Arun

@arvindh__a

jack of some, trying to be master of one. @ELLISforEurope @MPI_IS PhD student @Uni_stuttgart @EdinburghUni | Foundation Models {for, using} Knowledge Graphs

ID: 1160063338669211648

Link: https://arvindh75.github.io/ · Joined: 10-08-2019 05:40:27

29 Tweets

101 Followers

456 Following

antonio vergari - hiring PhD students (@tetraduzione)'s Twitter Profile Photo

Arvindh Arun Michael Galkin Sumit Kumar Bo Xiong yeah, reporting a single lucky shot can change perception of progress by a lot. I am also always skeptical of drawing conclusions only by looking at improvements over big averages: you can get much larger gains on certain KGs (which ofc can benefit from semantics)

Shashwat Goel (@shashwatgoel7)'s Twitter Profile Photo


There's been a hole at the heart of #LLM evals, and we can now fix it.

📜New paper: Answer Matching Outperforms Multiple Choice for Language Model Evaluations.

❗️We found MCQs can be solved without even knowing the question. Looking at just the choices helps guess the answer
Nikhil Chandak (@nikhilchandak29)'s Twitter Profile Photo


🚨 Ever wondered how much you can ace popular MCQ benchmarks without even looking at the questions? 🤯

Turns out, you can often get significant accuracy just from the choices alone. This is true even on recent benchmarks with 10 choices (like MMLU-Pro) and their vision
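The claim above — that MCQ benchmarks can be partially solved from the choices alone — can be sketched with a toy choices-only baseline. The longest-answer heuristic and the example items below are illustrative assumptions for the sketch, not the paper's actual method or data:

```python
# Hedged sketch of a "choices-only" MCQ baseline.
# Assumption: we exploit a classic MCQ artifact — the gold answer is often
# the longest, most qualified option. A real choices-only probe would use a
# language model scored on the options without the question.

def choices_only_guess(choices):
    # Pick the longest option; no access to the question at all.
    return max(range(len(choices)), key=lambda i: len(choices[i]))

def choices_only_accuracy(items):
    # items: list of (choices, gold_index) pairs.
    correct = sum(choices_only_guess(choices) == gold for choices, gold in items)
    return correct / len(items)

# Hypothetical toy items, just to exercise the baseline.
items = [
    (["Paris", "A city in northern France on the Seine", "Rome", "Berlin"], 1),
    (["Yes", "No", "It depends on the boundary conditions", "42"], 2),
    (["O(n)", "O(n log n) in the average and worst case", "O(1)", "O(n^2)"], 1),
]

print(choices_only_accuracy(items))
```

On real benchmarks the gap over the random baseline (25% for 4 choices, 10% for 10) is what signals that the choices leak information about the answer.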
Akshit (@akshitwt)'s Twitter Profile Photo


i will be at #icml2025 next week to present our paper below (Tue, 15 Jul 11 am)!
i would love to chat with people interested in graph learning, GNNs, LLM evaluations and trustworthy ML.

i am also on the lookout for PhD positions next cycle and would love to chat about such
Arvindh Arun (@arvindh__a)'s Twitter Profile Photo


I will be at #ICML2025 🇨🇦🍁 next week to present our work on unlearning in GNNs (Poster session 1 East, 15 Jul at 1100)
 
Looking forward to chatting with people working on Foundation Models for (knowledge) graphs & LLM interp and evals folks!

🌐: cognac-gnn-unlearning.github.io
Arvindh Arun (@arvindh__a)'s Twitter Profile Photo

Cursor for me is exponentially more helpful when I'm working on something new from scratch, and even more so when experimenting with an unfamiliar codebase. Benchmarking developers maintaining "their own" repo completely misses this dimension!

Arvindh Arun (@arvindh__a)'s Twitter Profile Photo

Been working on LLM evals these past couple of months. Two bitter lessons that I've picked up:

1. There is never enough compute/API creds
2. There is no hiding from "The Bitter Lesson"

Can't wait to share more of our findings (and their implications) in the coming weeks!