Oren Sultan (@oren_sultan)'s Twitter Profile
Oren Sultan

@oren_sultan

AI Researcher @Lightricks, CS PhD Candidate #AI #NLP @HebrewU, advised by @HyadataLab 🇮🇱 | prev. @TU_Muenchen 🇩🇪 @UniMelb 🇦🇺

ID: 1423192726670135300

Link: http://www.orensultan.com | Joined: 05-08-2021 08:02:52

576 Tweets

888 Followers

689 Following

Michael Hassid (@michaelhassid)

The longer a reasoning LLM thinks, the more likely it is to be correct, right?

Apparently not.

Presenting our paper: “Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning”.

Link: arxiv.org/abs/2505.17813

1/n
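
A minimal sketch of the idea, assuming the selection rule is: sample several thinking chains, keep only the m shortest, and majority-vote their answers. The function name and the (reasoning, answer) pair format are illustrative, not the paper's code.

```python
from collections import Counter

def shortest_m_of_k(chains, m=3):
    """Pick an answer by majority vote over the m shortest thinking chains.

    `chains` is a list of (reasoning_text, final_answer) pairs sampled
    independently from the model (the sampling step is not shown here).
    """
    # Sort candidates by reasoning length, shortest first.
    by_length = sorted(chains, key=lambda pair: len(pair[0]))
    # Majority vote over the answers of the m shortest chains only.
    votes = Counter(answer for _, answer in by_length[:m])
    return votes.most_common(1)[0][0]

# Toy usage with hand-written chains (illustrative only).
samples = [
    ("step1 ... step12", "42"),
    ("step1 step2", "41"),
    ("step1", "41"),
    ("step1 ... step30", "42"),
]
print(shortest_m_of_k(samples, m=2))  # -> "41"
```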
Iddo Yosha (@iddoyosha)

🚨 Happy to share our #Interspeech2025 paper: "WhiStress: Enriching Transcriptions with Sentence Stress Detection"!

Sentence stress is a word-level prosodic cue that marks contrast or intent. WhiStress detects it alongside transcription, with no alignment needed.

Paper, code, demo 👇
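
A tiny illustrative sketch of what a transcription enriched with word-level stress labels could look like downstream; the (word, is_stressed) format and the rendering are hypothetical, not the WhiStress interface.

```python
def render_stress(words_with_stress):
    """Render a transcript, marking stressed words with asterisks.

    `words_with_stress` is a list of (word, is_stressed) pairs, i.e. a
    transcript enriched with a binary sentence-stress label per word.
    """
    return " ".join(f"*{w}*" if stressed else w for w, stressed in words_with_stress)

# Hypothetical enriched transcript: stress marks the contrastive word.
print(render_stress([("I", False), ("never", True), ("said", False), ("that", False)]))
```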

Noy Sternlicht (@noysternlicht)

🚨 New paper!

We present CHIMERA, a KB of 28K+ scientific idea recombinations 💡 It captures how researchers blend concepts or take inspiration across fields, enabling:
1. Meta-science
2. Training models to predict new combos

👇 Findings & data: noy-sternlicht.github.io/CHIMERA-Web
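
A hypothetical sketch of how a single recombination record could be represented and queried; the field names and relation types are illustrative assumptions, not the released CHIMERA schema.

```python
from dataclasses import dataclass

@dataclass
class Recombination:
    # Illustrative fields only; not the actual dataset schema.
    source_concept: str       # the idea being extended
    inspiration_concept: str  # the concept or field it borrows from
    relation: str             # e.g. "blend" or "inspiration"
    paper_id: str

kb = [
    Recombination("protein folding", "attention mechanisms", "inspiration", "paper:0001"),
    Recombination("program synthesis", "genetic algorithms", "blend", "paper:0002"),
]

# Example query: which entries draw inspiration from another field?
print([r.paper_id for r in kb if r.relation == "inspiration"])
```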

Noy Sternlicht (@noysternlicht)

🔔 New paper!

We propose a challenging new benchmark for LLM judges: evaluating debate speeches. Are they comparable to humans? Well... it's debatable. 🤔

noy-sternlicht.github.io/Debatable-Inte…

👇 Here are our findings:
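
A rough sketch of one way to ask "are LLM judges comparable to humans?": compare pairwise preferences induced by an LLM judge's scores against those induced by human scores. The scores below are made up and the metric is a generic agreement rate, not the paper's evaluation protocol.

```python
def agreement_rate(llm_scores, human_scores):
    """Fraction of speech pairs where the LLM judge and humans prefer the same speech.

    Both arguments map speech_id -> score; the data used below is invented.
    """
    ids = list(llm_scores)
    pairs = [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]]

    def prefers(scores, a, b):
        # +1 if a is rated higher, -1 if lower, 0 if tied.
        return (scores[a] > scores[b]) - (scores[a] < scores[b])

    agree = sum(prefers(llm_scores, a, b) == prefers(human_scores, a, b) for a, b in pairs)
    return agree / len(pairs)

# Hypothetical scores on four debate speeches (1-5 scale).
llm = {"s1": 4, "s2": 2, "s3": 5, "s4": 3}
human = {"s1": 3, "s2": 2, "s3": 4, "s4": 4}
print(agreement_rate(llm, human))
```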

Or Tal (@or__tal)

Which modeling approach should you choose for text-to-music generation?
We run a head-to-head comparison to figure it out.
Same data, same architecture: AR (autoregressive) vs. FM (flow matching).
👇 If you care about fidelity, speed, control, or editing, see this thread.
🔗huggingface.co/spaces/ortal16…
📄arxiv.org/abs/2506.08570
1/6
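
A toy sketch contrasting the two modeling choices, reading AR as autoregressive token prediction and FM as flow matching; the stand-in model functions are placeholders, and only the shape of the two sampling loops is meant to be informative, not the paper's architectures.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar_next_token_logits(prefix):
    """Stand-in for an autoregressive model over discrete audio tokens (random logits)."""
    return rng.normal(size=256)

def flow_velocity(x, t):
    """Stand-in for a learned flow-matching velocity field v(x, t); here it pulls x toward 0."""
    return -x

def sample_ar(num_tokens=8):
    # Autoregressive: generate one discrete token at a time, conditioning on the prefix.
    tokens = []
    for _ in range(num_tokens):
        logits = ar_next_token_logits(tokens)
        tokens.append(int(np.argmax(logits)))
    return tokens

def sample_fm(steps=10, dim=4):
    # Flow matching: start from noise and integrate the velocity field from t=0 to t=1.
    x = rng.normal(size=dim)
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * flow_velocity(x, i * dt)
    return x

print(sample_ar())  # sequential, token-by-token
print(sample_fm())  # parallel over the signal, iterative over integration steps
```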
Nadav Har-Tuv (@nadavhartuv)

🚨 New paper alert!
PAST: phonetic-acoustic speech tokenizer – just got accepted to Interspeech 2025 🎉
It learns phonetic + acoustic tokens jointly, with no SSL babysitter or external vocoder.

🔗pages.cs.huji.ac.il/adiyoss-lab/PA…
👇 If you’re into speech LMs, keep reading!
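
A hedged sketch of what "phonetic + acoustic tokens jointly" could look like as a training objective: an acoustic reconstruction term plus a phonetic supervision term (CTC here). The module sizes, the loss weighting, and the omission of a quantizer are all simplifying assumptions, not the PAST recipe.

```python
import torch
import torch.nn as nn

# Toy dimensions (assumptions for illustration only).
T, B, N_PHONES, D = 50, 2, 40, 64

encoder = nn.GRU(input_size=1, hidden_size=D)   # waveform frames -> latent sequence
decoder = nn.Linear(D, 1)                        # latents -> reconstructed frames
phone_head = nn.Linear(D, N_PHONES + 1)          # latents -> phoneme logits (+ CTC blank)

wave = torch.randn(T, B, 1)                            # dummy audio frames
phones = torch.randint(1, N_PHONES + 1, (B, 10))       # dummy phoneme targets

latents, _ = encoder(wave)

# Acoustic term: reconstruct the input from the latent sequence.
recon_loss = nn.functional.mse_loss(decoder(latents), wave)

# Phonetic term: CTC supervision so the representation stays predictive of phonemes.
log_probs = phone_head(latents).log_softmax(-1)              # (T, B, N_PHONES + 1)
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 10, dtype=torch.long)
ctc_loss = nn.CTCLoss(blank=0)(log_probs, phones, input_lengths, target_lengths)

loss = recon_loss + 0.5 * ctc_loss   # weighting is arbitrary here
loss.backward()
print(float(loss))
```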
Eliya Habba (@eliyahabba)

Presenting my poster:
🕊️ DOVE - A large-scale multi-dimensional predictions dataset towards meaningful LLM evaluation. Monday 18:00, Vienna, #ACL2025.

Come chat about LLM evaluation, prompt sensitivity, and our collection of 250M model outputs!
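
A small sketch of the kind of analysis a large predictions dataset enables, e.g. accuracy spread across prompt variants as a prompt-sensitivity signal; the record fields below are hypothetical, not DOVE's actual format.

```python
from collections import defaultdict

# Hypothetical records: one model prediction per (question, prompt variant).
records = [
    {"question": "q1", "prompt_variant": "v1", "correct": True},
    {"question": "q1", "prompt_variant": "v2", "correct": False},
    {"question": "q2", "prompt_variant": "v1", "correct": True},
    {"question": "q2", "prompt_variant": "v2", "correct": True},
]

# Accuracy per prompt variant; the spread is one view of prompt sensitivity.
totals, hits = defaultdict(int), defaultdict(int)
for r in records:
    totals[r["prompt_variant"]] += 1
    hits[r["prompt_variant"]] += r["correct"]

acc = {v: hits[v] / totals[v] for v in totals}
print(acc, "spread:", max(acc.values()) - min(acc.values()))
```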
Asaf Yehudai (@asafyehudai)

🚨 Benchmarks tell us which model is better — but not why it fails.

For developers, this means tedious, manual error analysis. We're bridging that gap.

Meet CLEAR: an open-source tool for actionable error analysis of LLMs.

🧵👇
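
A generic sketch of the pattern behind actionable error analysis: label each failure with an error category and aggregate, instead of reporting a single score. The `label_error` stub and the categories are hypothetical, not CLEAR's API.

```python
from collections import Counter

def label_error(question: str, prediction: str, reference: str) -> str:
    """Stub for an error labeler; a real tool would prompt an LLM with a rubric here."""
    return "empty_output" if prediction.strip() == "" else "wrong_answer"

# Hypothetical failing examples: (question, model prediction, reference answer).
failures = [
    ("2 + 2 = ?", "", "4"),
    ("Capital of France?", "Lyon", "Paris"),
]

# Aggregate failures into categories a developer can act on.
report = Counter(label_error(q, p, ref) for q, p, ref in failures)
print(report.most_common())
```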