bench (@zed_hat) 's Twitter Profile
bench

@zed_hat

ID: 2332500696

07-02-2014 22:43:33

40 Tweets

119 Followers

442 Following

Michele Wang (@michelelwang) 's Twitter Profile Photo

our team at openai is hiring technical staff to build frontier evals for finance. If you're passionate about measuring real-world capabilities, have a love/hate relationship with Excel, or are an ex-banker/ex-investor with technical skills, please reach out!

Bloomberg (@business) 's Twitter Profile Photo

Benchmark has made an uncommonly large bet on Exa, an AI startup that's making a search engine for AIs (not humans) bloomberg.com/news/articles/…

Exa (@exaailabs) 's Twitter Profile Photo

We raised $85M in Series B funding at a $700M valuation, led by Benchmark. Exa is a research lab building the search engine for AI.

Tejal Patwardhan (@tejalpatwardhan) 's Twitter Profile Photo

Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.

Michele Wang (@michelelwang) 's Twitter Profile Photo

so excited for GDPval 🚀 our team's first eval measuring frontier models not just on raw intelligence, but on their ability to deliver real professional work across 44 jobs: covering Excel spreadsheets, docs, PDFs, audio and video files, CAD, and more!

Exa (@exaailabs) 's Twitter Profile Photo

Introducing exa-code, a big step towards eliminating LLM code hallucination. We indexed 1B+ docs pages, Github repos, StackOverflow posts, and more. Given a query, exa-code initiates a hybrid search over this data, chunks it, and returns a concatenated, token-efficient string.

QC (@qiaochuyuan) 's Twitter Profile Photo

i didn’t think i’d become one of those “death is what gives meaning to life” people but it’s becoming clearer that human art derives part of its meaning from the way the artist sacrificed part of their finite life to make it. they could have done anything but not everything and

Exa (@exaailabs) 's Twitter Profile Photo

Introducing Exa 2.0. Breakthroughs in our AI research and engineering have enabled us to build both the fastest search API (<350ms) and the highest quality search on the market. Product and technical deep dive below:

Exa (@exaailabs) 's Twitter Profile Photo

Introducing Exa 2.1. We scaled our pre-training and test-time compute by an order of magnitude, unlocking frontier search API performance for both fast and agentic search. Deep dive below:

Michele Wang (@michelelwang) 's Twitter Profile Photo

when we first created this investment banking modeling eval, my PE friends would say how GPT-4o couldn't even convert P&L screenshots into Excel. how far we have come... huge shoutout to the whole team on this amazing model!!! make sure to select 5.2-thinking or 5.2-pro when

Samuel Marks (@saprmarks) 's Twitter Profile Photo

New open-source, agentic tool for building behavioral evals for AIs. Just: 1. Describe the behavior you're interested in (e.g. "Does the AI sycophantically affirm user beliefs?") 2. Refine the evaluation by giving feedback to the agent. 3. Profit.

Exa (@exaailabs) 's Twitter Profile Photo

What does it take to store the web as a database? exa-d is our internal data framework that orchestrates declarative typed dependencies, sparse updates with precise granularity, efficient and parallel execution across scaling compute, and more. exa.ai/blog/exa-d

Flapping Airplanes (@flappyairplanes) 's Twitter Profile Photo

Announcing Flapping Airplanes! We’ve raised $180M from GV, Sequoia, and Index to assemble a new guard in AI: one that imagines a world where models can think at human level without ingesting half the internet.

Exa (@exaailabs) 's Twitter Profile Photo

Introducing Exa Instant: the first sub-200ms search engine. Faster than Google, it's custom built to power realtime AI products like chat and voice.

Bernie Sanders (@sensanders) 's Twitter Profile Photo

Walk into a sandwich shop. It’s regulated for health and safety. But AI, which will transform the world economically and socially, is completely unregulated. That’s insane. We need to make certain that AI works for ALL humanity, not just the billionaires who own it.

Exa (@exaailabs) 's Twitter Profile Photo

Exa powers most of the popular coding agents. We wrote about how we build and evaluate coding-related web search. Blog and open evals: exa.ai/blog/webcode

Bernie Sanders (@sensanders) 's Twitter Profile Photo

Uncontrolled AI poses a severe danger to all of humanity. On Wednesday, I'll be hosting a discussion with leading AI scientists from the US and China about the need for international cooperation against this existential threat. This is an enormously important issue. Join us.
