The Full Stack (@full_stack_dl) Twitter Tweets • TwiCopy

Sergey Karayev

2 years ago

By what year will there be an AI that is more capable than most humans in most domains of digital work (e.g. you can tell it to do anything you currently hire a white collar professional to do, and it does the job better than the median human)?


Sergey Karayev

@sergeykarayev

2 years ago

It feels like something to be you. Do you think it feels like something to be GPT-4?


The Full Stack

@full_stack_dl

2 years ago

This is sadly true! If you want the latest version, come join us in November for our in-person workshop with AI by the Bay scale.bythebay.io/llm-workshop


Sergey Karayev

@sergeykarayev

2 years ago

Is there a service I can use to pipe my GPT-4 calls through, and it automatically finetunes GPT-3.5 (or whatever) on all of them, and lets me know when it's up to par?


Jo Kristian Bergum

@jobergum

2 years ago

Wow - don't miss this!


Sergey Karayev

@sergeykarayev

2 years ago

Solutions from replies:
- OpenPipe looks exactly right: openpipe.ai
- Portkey launching feature soon
- Saurabh Bhatnagar building his own

I currently use Helicone, any plans from them?


The Full Stack

@full_stack_dl

2 years ago

We're live to talk about production AI, LLMs, open source, and more! youtube.com/watch?v=aN3OxH…


AI by the Bay

@scalebythebay

2 years ago

We bring in The Full Stack, a venerable boot camp crew that pioneered technical deep dives into deep learning where people fly in from around the world. 🥞 Their #LLM Bootcamp in the spring was sold out and this is your chance to attend the ➡️ version. 👉 scale.bythebay.io/register


Sergey Karayev

@sergeykarayev

2 years ago

Let's say that a US-based research company has developed an AGI model that was able to use the browser, pass captchas, hire people on Upwork, and lie about its intentions. What should they do after observing this?


Sergey Karayev

@sergeykarayev

2 years ago

Has anyone had good experiences with GPT-powered code generation for complete web app features? As in, you describe what should exist, and GPT actually provides the source of all the necessary files and where they should go. Ideally in the context of Ruby on Rails.


Sergey Karayev

@sergeykarayev

2 years ago

Which set of statements do you agree with?
1. AGI is as much or more of a risk to human flourishing as nuclear weapons
2. I have a good idea for what should be done about that


Sergey Karayev

@sergeykarayev

2 years ago

Has anyone done comprehensive testing of gpt-4-vision-preview? I want to know stuff like the minimum text size it can read, the radius of the smallest circle it can locate in an image, the number of circles it can count, etc. Could be an automated benchmark for other models too


Sergey Karayev

@sergeykarayev

2 years ago

LLM Provider Comparisons

1. Martian (@withmartian)
2. Artificial Analysis (@ArtificialAnlys)
3. Ultravox AI (@FixieAI)


Sergey Karayev

@sergeykarayev

2 years ago

What percentage of your Twitter feed (the stuff you actually read, not just scroll past) do you believe is currently written by AI?


Sergey Karayev

@sergeykarayev

2 years ago

Several agents plus three simple baselines were tested on HumanEval. Agents were mostly worse and always more expensive than the baselines.

The good:
· Evaluating the Pareto frontier
· Strong simple baselines (just repeated calls!)

The bad:
· Clearly saturating the benchmark
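
For readers unfamiliar with the term, here is a minimal sketch of what evaluating a cost/accuracy Pareto frontier can look like. It is not taken from the study being discussed: the `Result` type, the system names, and every number below are made-up placeholders for illustration. A system is on the frontier if no other system is both at least as cheap and at least as accurate.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    name: str
    cost: float      # hypothetical cost per benchmark run (arbitrary units)
    accuracy: float  # fraction of problems solved, 0.0 to 1.0


def pareto_frontier(results: List[Result]) -> List[Result]:
    """Keep only results that no cheaper-or-equal result matches or beats on accuracy."""
    frontier = []
    best_accuracy = float("-inf")
    # Walk from cheapest to most expensive; on cost ties, most accurate first.
    for r in sorted(results, key=lambda r: (r.cost, -r.accuracy)):
        # A result earns its place only by beating everything cheaper.
        if r.accuracy > best_accuracy:
            frontier.append(r)
            best_accuracy = r.accuracy
    return frontier


# Made-up numbers, NOT results from any real evaluation.
results = [
    Result("baseline_single_call", 0.2, 0.80),
    Result("baseline_repeated_calls", 1.0, 0.90),
    Result("agent_a", 5.0, 0.88),
    Result("agent_b", 8.0, 0.93),
]

frontier_names = [r.name for r in pareto_frontier(results)]
print(frontier_names)
```

In this toy data, `agent_a` drops off the frontier because a cheaper baseline is also more accurate, which is the kind of pattern the tweet describes.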


Sergey Karayev

@sergeykarayev

5 months ago

Is Claude Code still the best coding agent on the market? You can now easily find out by launching Claude, Codex, Gemini, and Amp on every ticket in your codebase:


The Full Stack

@full_stack_dl

5 months ago

Would you be interested in a course or workshop on ✨Building Software with AI Agents✨???
