Ross Taylor (@rosstaylor90) Twitter Tweets • TwiCopy

Ross Taylor

@rosstaylor90

Ship against the dying of the light. @GenReasoning Prev: reasoning lead @MetaAI, LLaMA 2/3, @paperswithcode co-creator, Galactica LLM lead, cofo Atlas ML (acq)

ID: 524807755

Website: http://rossjtaylor.com

Joined: 14-03-2012 22:51:10

2.2K Tweets

8.8K Followers

1.1K Following

Ross Taylor

@rosstaylor90

2 months ago

Really impressive! I’m glad we’re on the verge of a good surfing simulation…. (with eye tracking input soon? :))

Likes: 7 · Replies: 0 · Retweets: 0

We need a version of the Needle for model releases: - Day 1 benchmarks (SWE-Bench Verified, MMLU Pro, HLE) - the model predictably looks good; needle points to “good vibes”. - Day 2 benchmarks (Pelicans, EQBench…and eval bizarro world) - the model underperforms: the needle

Likes: 109 · Replies: 7 · Retweets: 6

Ross Taylor

@rosstaylor90

2 months ago

Give me a good eval, and I shall move the world. - Archimedes (apocryphal)

Likes: 30 · Replies: 0 · Retweets: 1

Ross Taylor

@rosstaylor90

2 months ago

Most takes on RL environments are bad. 1. There are hardly any high-quality RL environments and evals available. Most agentic environments and evals are flawed when you look at the details. It’s a crisis: and no one is talking about it because they’re being hoodwinked by labs

Likes: 701 · Replies: 30 · Retweets: 46

Ross Taylor

@rosstaylor90

a month ago

Never scale before you learn.

Likes: 25 · Replies: 1 · Retweets: 1

Ross Taylor

@rosstaylor90

a month ago

This was an LLM wars subtweet, and is both right and wrong in different ways. It’s wrong in the sense that you learn things at scale that you wouldn’t learn otherwise - so, on the contrary, scaling allows you to learn. You don’t want to overoptimise for lessons at smaller scales

Likes: 28 · Replies: 2 · Retweets: 0

Ross Taylor

@rosstaylor90

a month ago

Quick hiring call. We’re looking for full stack engineers to join our growing team at General Reasoning. We have more work than hands at the moment - a nice problem to have! - and are working with clients on some groundbreaking projects (the most excited I’ve been since my early LLM

Likes: 9 · Replies: 1 · Retweets: 0

Taco Cohen

@tacocohen

a month ago

Last week we found an issue with SWE-Bench, allowing agents to cheat by looking at future commits. Instead of celebrating the SWE-Bench Devs for quickly fixing the issue and being transparent, the HN crowd is dunking on them and drawing wildly inaccurate conclusions about

Likes: 223 · Replies: 7 · Retweets: 22