Ebtesam (@ebtesamdotpy)'s Twitter Profile
Ebtesam

@ebtesamdotpy

AI tools for SE research | CS PhD @GeorgeMasonU @INSPIREDLabGMU | Prev @MSFTResearch

ID: 1451703423271845889

Link: http://mason.gmu.edu/~ehaque4 | Joined: 23-10-2021 00:23:37

33 Tweets

101 Followers

169 Following

François Chollet (@fchollet)'s Twitter Profile Photo

When I got started with programming, I debugged using printf() statements. Today, I debug with print() statements. The purpose of debugging is to correct your mental model of what your code does, and no tool can do that for you. The best any tool can do is provide visibility
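As a minimal illustration of the print()-style debugging described above (the buggy function and printed values are hypothetical, purely to show how printed output corrects a wrong mental model):

```python
def moving_average(values, window):
    """Intended: average of the last `window` items at each position."""
    averages = []
    for i in range(len(values)):
        chunk = values[i:i + window]   # BUG: slices forward from i, not backward
        print(f"i={i} chunk={chunk}")  # visibility into what the code actually does
        averages.append(sum(chunk) / len(chunk))
    return averages

# The printed chunks reveal the slice looks ahead of i instead of behind it,
# which is exactly the mental-model error the print() statements make visible.
print(moving_average([1, 2, 3, 4, 5], window=2))
```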

MIT CSAIL (@mit_csail)'s Twitter Profile Photo

Happy birthday to Python creator Guido van Rossum. The open source language was named after comedy troupe Monty Python: bit.ly/2B8R7h6

Image v/Midjourney

will depue (in singapore for ICLR) (@willdepue)'s Twitter Profile Photo

I feel like large language model feels a bit reductive when GPT-2 is in the same class as GPT-4. gigantic language models? enormous language models? big ass language models? Nimitz-class language models? better suggestions needed

INSPIRED Lab @ GMU (@inspiredlabgmu)'s Twitter Profile Photo

🚨 Inclusive tech research alert! 🚨 Are you a tech user who identifies as BIPOC (bit.ly/BIPOC_defined)? Or a researcher/practitioner who uses data in your work? Share your experiences in our 20 min. survey → go.gmu.edu/EngagingTheMar… IRBNet #: 1945546-2 #data #tech #trust

Edward Grefenstette (@egrefen)'s Twitter Profile Photo

Instead, evaluation processes should track the diverse notions of extrinsic utility found both in everyday usage of our technology today and in anticipating how people might use technology tomorrow.

Upol Ehsan (@upolehsan)'s Twitter Profile Photo

Is hallucination in LLMs inevitable even with an idealized model architecture and perfect training data?

This work argues YES and offers a formal proof.

Let's dig in ⤵ 🧵1/n
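For a rough sense of the shape such an impossibility argument can take, here is a hedged diagonalization sketch (an illustrative paraphrase, not necessarily the paper's exact statement):

```latex
% Illustrative sketch only; see the thread/paper for the actual formulation.
Let $h_1, h_2, \dots$ enumerate all computable LLMs (maps from prompts to outputs)
and let $s_1, s_2, \dots$ enumerate all prompts. Define a computable ground truth $f$ by
\[
  f(s_i) := \text{any string } y \text{ such that } y \neq h_i(s_i).
\]
Then every model $h_i$ disagrees with $f$ on at least one input, $h_i(s_i) \neq f(s_i)$,
i.e.\ it hallucinates somewhere, no matter how it was architected or trained.
```
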
Ishan (@radshaan)'s Twitter Profile Photo

If you get frequent urges to go deep into a subject, do not ignore them.

Pick a weekend, stop everything else, and give in to the urge.

Fresh insights await at the other end.

Jiaxin Pei (@jiaxin_pei)'s Twitter Profile Photo

It's common to add personas in system prompts, assuming this helps LLMs. However, through analyzing 162 roles x 4 LLMs x 2410 questions, we show that adding a persona mostly makes *no* statistically significant difference compared to the no-persona setting. If there is a difference, it
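For concreteness, a minimal sketch of a persona-vs-no-persona comparison of this kind; the simulated model, question set, and paired t-test below are illustrative placeholders (not the paper's data or exact analysis), assuming scipy is available:

```python
import random
from scipy import stats

random.seed(0)

# Toy stand-ins (hypothetical): the real study covers 2410 questions, 162 roles,
# and 4 LLMs; here we simulate per-question 1/0 correctness directly.
def answer_correctly(question, persona=None):
    """Simulated model: correctness does not actually depend on the persona."""
    return random.random() < 0.7

questions = list(range(200))
persona = "You are a helpful math professor."   # stand-in for one of many roles

no_persona = [int(answer_correctly(q)) for q in questions]
with_persona = [int(answer_correctly(q, persona)) for q in questions]

# Paired test over the same questions: does adding the persona shift accuracy?
t_stat, p_value = stats.ttest_rel(with_persona, no_persona)
print(f"accuracy without persona: {sum(no_persona) / len(no_persona):.2f}")
print(f"accuracy with persona:    {sum(with_persona) / len(with_persona):.2f}")
print(f"paired t-test p-value:    {p_value:.3f}  (large p => no significant difference)")
```
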

Diomidis Spinellis (@coolsweng)'s Twitter Profile Photo

Long overdue, a paper finally exposes the Emperor's New “Threats to Validity” Clothes in empirical software engineering research. Even better, it provides suggestions for improving the state of practice.

Hamel Husain (@hamelhusain)'s Twitter Profile Photo

New post re: Devin (the AI SWE). We couldn't find many reviews of people using it for real tasks, so we went MKBHD mode and put Devin through its paces.

We documented our findings here. Would love to know if others have had a different experience.

answer.ai/posts/2025-01-…

Nabeel S. Qureshi (@nabeelqu)'s Twitter Profile Photo

For the confused, it's actually super easy:
- GPT 4.5 is the new Claude 3.6 (aka 3.5)
- Claude 3.7 is the new o3-mini-high
- Claude Code is the new Cursor
- Grok is the new Perplexity
- o1 pro is the 'smartest', except for o3, which backs Deep Research

Obviously. Keep up.

Sebastian Raschka (@rasbt)'s Twitter Profile Photo

As we all know by now, reasoning models often generate longer responses, which raises compute costs. Now, this new paper (arxiv.org/abs/2504.05185) shows that this behavior comes from the RL training process, not from an actual need for long answers for better accuracy. The RL

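As a toy illustration of the mechanism (a hedged sketch, not the paper's actual training setup): an outcome-only RL reward is indifferent to response length, so nothing in the objective discourages long chains of thought, whereas a small per-token cost changes the incentive:

```python
def outcome_only_reward(answer_correct: bool, num_tokens: int) -> float:
    """Correctness-only reward: length never enters the objective,
    so longer responses are never penalized."""
    return 1.0 if answer_correct else 0.0

def length_penalized_reward(answer_correct: bool, num_tokens: int,
                            penalty_per_token: float = 1e-4) -> float:
    """Illustrative variant: same correctness signal minus a small per-token cost,
    nudging the policy toward the shortest response that stays correct."""
    base = 1.0 if answer_correct else 0.0
    return base - penalty_per_token * num_tokens

# A correct 200-token answer and a correct 2000-token answer look identical
# to the outcome-only reward, but not to the length-penalized one.
print(outcome_only_reward(True, 200), outcome_only_reward(True, 2000))          # 1.0 1.0
print(length_penalized_reward(True, 200), length_penalized_reward(True, 2000))  # 0.98 0.8
```
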
Jia-Bin Huang (@jbhuang0604)'s Twitter Profile Photo

Scrolling the AI news timeline as a researcher feels like a teenager browsing Instagram: "Everyone else has figured everything out!" Reliable home robots imminent, 100× productivity AI agents, insane visual generation ... Exciting, but anxiety-inducing. What am I doing? 😬