Tejas Khot (@tjskhot)'s Twitter Profile
Tejas Khot

@tjskhot

Building defensive AI @AbnormalSec
Prev: Applied Scientist @Amazon, Robotics @CarnegieMellon

ID: 2600172548

Link: https://tejaskhot.github.io/
Joined: 02-07-2014 17:05:34

891 Tweets

355 Followers

1.1K Following

Divyansh Kaushik (@dkaushik96)

🎙️Unleashing my inner researcher + policy + political wonk combo in this convo with daniel bashir for The Gradient. We tackle the mahjong of AI policy - China, innovation/regulation trade-offs, and why researchers becoming bilingual (in tech & policy-speak) is crucial.

anu (@anuatluru)

the purpose of reading a book isn’t to retain information, it’s to refine your worldview just a little bit with each one

Evan Reiser (@evanreiser)

Ben Lang Abnormal Security: >$100M ARR growing >100% y/y (series C). Help use good AI to stop crime, and protect humans from bad AI! careers.abnormalsecurity.com

Evan Reiser (@evanreiser)

I am excited to announce the $250M Series D for Abnormal AI. Led by Wellington Management, plus Greylock Partners, Menlo Ventures, Insight Partners, and CrowdStrike Falcon Fund, this milestone is another step in our mission to protect humans using AI. abnormalsecurity.com/blog/building-…

Yu Su @#ICLR2025 (@ysu_nlp)

🔥2025 is the year of agents, but are we there yet?🤔 🤯 "An Illusion of Progress? Assessing the Current State of Web Agents" –– our new study shows that frontier web agents may be far less competent (up to 59%) than previously reported! Why were benchmark numbers inflated? -

Tejas Khot (@tjskhot)

Gemini 2.5 Pro often responds with several "Options" when you ask subjective questions. I've seen it answer 3 different ways and ask the user to pick whatever suits their needs, all in one shot. Great handling when intent is under-specified.

Sebastian Raschka (@rasbt)

As we all know by now, reasoning models often generate longer responses, which raises compute costs. Now, this new paper (arxiv.org/abs/2504.05185) shows that this behavior comes from the RL training process, not from an actual need for long answers for better accuracy. The RL

Sara Hooker (@sarahookr)

It is critical for scientific integrity that we trust our measure of progress. The lmarena.ai has become the go-to evaluation for AI progress. Our release today demonstrates the difficulty in maintaining fair evaluations on lmarena.ai, despite best intentions.

Kate Olszewska (@olszewskakate)

Gemini 2.5 Pro and Flash are 🚀Generally Available🚀 and with the new 2.5 Flash-Lite capture the quality/price pareto frontier. Read more about them in our Tech Report!
