Leshem Choshen @LREC 🤖🤗 (@LChoshen)'s Twitter Profile
Leshem Choshen @LREC 🤖🤗

@LChoshen

🥇 Collaborative LLMs (co-created model merging)
🥈 Opinionatedly sharing #ML & #NLP
🥉 We owe science an alternative hype

@IBMResearch & @MIT_CSAIL

ID:1006797311593377792

Link: https://ktilana.wixsite.com/leshem-choshen
Joined: 13-06-2018 07:15:59

7,5K Tweets

3,6K Followers

610 Following

Leshem Choshen @LREC 🤖🤗 (@LChoshen)

Reading a paper on Arxiv\scholar?
Want to know (or give credit) to the authors?

Created a small script you can add to Tampermonkey to search for the authors
greasyfork.org/es/scripts/494…

P.S. I have no javascript skills, and LMs have very little...

Alex Warstadt (@a_stadt)

How would you test your BabyLM?

We have a new and improved eval pipeline for round 2 of the BabyLM competition, but we're NOT done adding to it.

If you have an idea for a new eval, reach out, open a pull request, or even submit a writeup to our new 'paper track'!

Tom Kocmi (@KocmiTom)

Some metrics are completely useless for evaluating unrelated systems (like LLM vs. NMT). For example, a +2 BLEU gain between unrelated systems is about as good as a coin toss (~55%), while the same gain between related systems (e.g. baseline vs. improved model) agrees with humans about 90% of the time.

Leshem Choshen @LREC 🤖🤗 (@LChoshen)

Are we seeing the second wave?
Linguistic and cultural diversity of LLMs?

Technologies in NLP are always years ahead in English and then slowly catch up elsewhere. Recent events might hint that we are starting a second phase?

From today:
x.com/LChoshen/statu…
x.com/LChoshen/statu…

Leshem Choshen @LREC 🤖🤗 (@LChoshen)

GPT3.5 utterly fails on ~300 Polish medical exams,
But GPT4 passes 75%!
arxiv.org/abs/2405.01589

I wonder, how culturally different Polish exams are from US ones
Sadly, a deep read will tell you the exams are in English...
