Roy Bar Haim (@roybarhaim)'s Twitter Profile
Roy Bar Haim

@roybarhaim

Senior Technical Staff Member, NLP at IBM Research AI. Opinions are my own

ID: 1058255708

Link: https://research.ibm.com/people/roy-bar-haim | Joined: 03-01-2013 16:54:36

24 Tweets

42 Followers

36 Following

Roy Bar Haim (@roybarhaim)'s Twitter Profile Photo

NAACL'21 main conference is starting today! Meet our researchers and recruiting team at the IBM Research virtual booth: underline.io/events/122/exp…, and learn more about IBM Research's presence at @NAACLHLT, careers and booth schedule at ibm.biz/naacl2021

Orith Toledo-Ronen (@orithtoledo)'s Twitter Profile Photo

Interested in TARGETED #SentimentAnalysis beyond restaurant reviews? In #NAACL2022 we suggest a robust multi-domain model relying on self-training, with no extra annotation -- arxiv.org/abs/2205.03804 Orith Toledo-Ronen matan orbach Yoav Katz Noam Slonim 🟢 #NLProc #IBMResearch (1/5)

Roy Bar Haim (@roybarhaim)'s Twitter Profile Photo

NAACL 2022 is starting on Sunday! Visit our website ibm.biz/naacl2022 to learn about the exciting NLP work from IBM Research that will be presented at this conference. IBM Research NAACL HLT 2027 #NAACL2022

Avi Sil (@aviaviavi__)'s Twitter Profile Photo

Welcome PrimeQA at #NAACL2022! Replicate the state-of-the-art on multilingual open QA quickly! Here’s a new open-source repo in collaboration with Stanford NLP Group, Hugging Face, Uni Stuttgart, and @NLPIllinois1. Link: github.com/primeqa/primeqa Talk to me or read: research.ibm.com/blog/primeqa-f… 🧵

Eyal Shnarch (@eyalshnarch)'s Twitter Profile Photo

Want to build a text classifier in a few hours?

Even if you don’t have any:
labeled data
#machineLearning knowledge
programming skills

Label Sleuth (ibm.biz/label-sleuth), a new open-source, no-code system for annotations 🧵 IBM Research University of Notre Dame Stanford Human-Computer Interaction Group UT Dallas #NLProc
Arie Cattan (@ariecattan)'s Twitter Profile Photo

Curious to see how we can summarize opinions beyond plain text summaries? Check out our #ACL2023 paper: From Key Points to Key Point Hierarchy: Structured and Expressive Opinion Summarization, with Lilach Eden, Yoav Kantor, and Roy Bar Haim from IBM Research. IBM BIU NLP >>

Argument Mining (@argminingorg)'s Twitter Profile Photo

We are excited to announce that the Argument Mining workshop will take place at #ACL2024 in Bangkok, Thailand. For more info see our website at argmining-org.github.io/2024/

Argument Mining (@argminingorg)'s Twitter Profile Photo

We are happy to announce two shared tasks for ArgMining 2024: 1) Perspective Argument Retrieval organized by Neele Falk and Andreas Waldis. 2) DialAM-2024 organized by Ramon Ruiz-Dolz, John Lawrence, Ella Schad, and Chris Reed.

Argument Mining (@argminingorg)'s Twitter Profile Photo

The call for papers for the 11th Workshop on Argument Mining #argmining_2024 is now out: argmining-org.github.io/2024/index.htm…

Argument Mining (@argminingorg)'s Twitter Profile Photo

ArgMining 2024 ended with a great photo of its wonderful community. Kudos to all of your great ideas, contributions, and help in organizing.

Ariel Gera (@arielgera2)'s Twitter Profile Photo

Say I want to compare system qualities - pick between 2 configurations, or rank a whole bunch of models. I'll use LLM-as-a-judge, right? 🧑🏻‍⚖️ But how do I know the LLM judge is up to the task? Who is a good judge for ranking systems? Enter our new paper!✨🧵 arxiv.org/abs/2412.09569

Asaf Yehudai (@asafyehudai)'s Twitter Profile Photo

New preprint! ✨ Interested in LLM-as-a-Judge? Want to get the best judge for ranking your system? Our new work is just for you: "JuStRank: Benchmarking LLM Judges for System Ranking" 🕺💃 arxiv.org/abs/2412.09569

Asaf Yehudai (@asafyehudai)'s Twitter Profile Photo

Survey on Evaluation of LLM-based Agents 🤖 Our paper is the first to provide a comprehensive overview of LLM-based agent evaluation 📜 Paper: arxiv.org/pdf/2503.16416

Asaf Yehudai (@asafyehudai)'s Twitter Profile Photo

Interested in Agent Evaluation? 🤖 We’re excited to launch our new repo: “Evaluation of LLM-based Agents: A Reading List” 📚 Browse benchmarks, methods, and frameworks from our recent survey. 👉 Explore & Contribute: github.com/Asaf-Yehudai/L… #LLMAgents #AgentEvaluation

Noy Sternlicht (@noysternlicht)'s Twitter Profile Photo

🔔 New Paper! We propose a challenging new benchmark for LLM judges: Evaluating debate speeches. Are they comparable to humans? Well... it’s debatable. 🤔 noy-sternlicht.github.io/Debatable-Inte… 👇 Here are our findings:

elvis (@omarsar0)'s Twitter Profile Photo

Evaluating LLM-based Agents This report has a comprehensive list of methods for evaluating AI Agents. Don't ignore evals. If done right, they are a game-changer. Highly recommend it to AI devs. (bookmark it)

Asaf Yehudai (@asafyehudai)'s Twitter Profile Photo

🚨 Benchmarks tell us which model is better — but not why it fails. For developers, this means tedious, manual error analysis. We're bridging that gap. Meet CLEAR: an open-source tool for actionable error analysis of LLMs. 🧵👇
