Paras Stefanopoulos (@stefanopopoulos) 's Twitter Profile
Paras Stefanopoulos

@stefanopopoulos

CTO @parsedlabs | Machine learning how to use twitter

ID: 1586101922842308609

Link: http://parsed.com | Joined: 28-10-2022 21:05:54

18 Tweets

32 Followers

59 Following

Paras Stefanopoulos (@stefanopopoulos) 's Twitter Profile Photo

This experiment is kind of useless. How much edge do you think an LLM has on a market? You may say 1% (it’s definitely negative). Even at 1%, you’ll need 10k+ actions and observations to draw any conclusions.
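A back-of-the-envelope check of that sample-size claim (the numbers below are illustrative assumptions, not from the tweet): if the supposed edge is a 51% vs. 50% win rate, a standard two-sided test at ~95% confidence needs on the order of 10,000 independent observations to distinguish it from noise.

    # Rough sample-size sketch (illustrative assumptions, not from the tweet):
    # how many independent actions are needed to detect a 1% edge in win rate
    # (51% vs. 50%) at ~95% confidence?
    z = 1.96                # two-sided 95% z-score
    p = 0.5                 # baseline win rate
    edge = 0.01             # hypothesised edge (51% win rate)
    variance = p * (1 - p)  # worst-case Bernoulli variance, 0.25

    # Require the confidence-interval half-width to be smaller than the edge:
    # z * sqrt(variance / n) < edge  =>  n > z^2 * variance / edge^2
    n = (z ** 2) * variance / (edge ** 2)
    print(round(n))  # ~9604, i.e. roughly the "10k+ actions" in the tweet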

parsed (@parsedlabs) 's Twitter Profile Photo

Introducing some recent research from the team. Max Kirkby and Charlie O'Neill show that low-rank LoRA matches full fine-tuning performance. A post on what happens when theoretical findings meet real-world production tasks.
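For context on what is being compared, a minimal PyTorch-style sketch of a LoRA layer (a generic illustration of low-rank adaptation; the class and hyperparameters here are my own assumptions, not the implementation from the post): the pretrained weight stays frozen and only a small low-rank update is trained.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen linear layer plus a trainable low-rank update (generic sketch)."""
        def __init__(self, d_in, d_out, r=8, alpha=16):
            super().__init__()
            self.base = nn.Linear(d_in, d_out, bias=False)
            self.base.weight.requires_grad_(False)               # freeze pretrained weight
            self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)   # low-rank down-projection
            self.B = nn.Parameter(torch.zeros(d_out, r))         # up-projection, starts at zero
            self.scale = alpha / r

        def forward(self, x):
            # Base output plus the scaled low-rank correction x A^T B^T
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(768, 768, r=8)
    out = layer(torch.randn(2, 768))   # only A and B receive gradients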

parsed (@parsedlabs) 's Twitter Profile Photo

Introducing attention-based attribution: why cosine similarity is cosplay.

Averaging the right transformer layers yields true attribution from attention, delivering reliable chunk-level auditability with sub-100 ms overhead and lower memory. It even works on a closed model!
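A rough sketch of the general idea (my own illustration; the layer band, pooling, and chunking choices below are assumptions, not parsed's method): pool attention weights over a band of layers and heads, then sum the mass the generated tokens place on each input chunk.

    import torch

    def attention_attribution(attentions, chunk_spans, layers=range(8, 16)):
        """Attribute generated tokens to input chunks by pooling attention weights.

        attentions: list of per-layer tensors [heads, tgt_len, src_len]
                    (e.g. from a model called with output_attentions=True).
        chunk_spans: (start, end) token index pairs marking input chunks.
        layers: which layers to average; mid-to-late layers are a common heuristic.
        """
        # Stack the chosen layers, average over layers and heads -> [tgt_len, src_len]
        stacked = torch.stack([attentions[l] for l in layers])
        pooled = stacked.mean(dim=(0, 1))

        # Sum the attention mass from all generated tokens into each chunk,
        # then normalise so the chunk scores sum to 1.
        scores = torch.stack([pooled[:, s:e].sum() for s, e in chunk_spans])
        return scores / scores.sum()

    # Example: 3 chunks of a 30-token prompt, dummy attention from a 24-layer, 16-head model
    dummy = [torch.rand(16, 10, 30).softmax(-1) for _ in range(24)]
    print(attention_attribution(dummy, [(0, 10), (10, 20), (20, 30)]))
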
parsed (@parsedlabs) 's Twitter Profile Photo

We discovered that teaching models why answers are correct, not just what to output, dramatically improves training efficiency.

By making latent strategies explicit during training (e.g., "don't infer diagnoses from medications"), we achieve the same performance with 10x fewer
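One simple way to read "making latent strategies explicit" (a generic illustration under my own assumptions, not parsed's actual pipeline): fold the strategy into the supervised target, so the model is trained to state the rule before the answer rather than leave it implicit.

    # Illustrative only: fold an explicit strategy into a supervised training target.
    def build_example(question, strategy, answer):
        """Return a (prompt, target) pair where the target states the rule before the answer."""
        prompt = f"Question: {question}\nState your reasoning rule, then answer."
        target = f"Strategy: {strategy}\nAnswer: {answer}"
        return {"prompt": prompt, "target": target}

    example = build_example(
        question="Patient is on metformin. Do they have diabetes?",
        strategy="Don't infer diagnoses from medications; require an explicit diagnosis in the record.",
        answer="Cannot be determined from medication history alone.",
    )
    print(example["target"])
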
Paras Stefanopoulos (@stefanopopoulos) 's Twitter Profile Photo

RGT is available in our platform right now for our customers. Havin' fun, building frontier tech, seeing downstream customers getting real value from OS models and eating kebabs 🔥 We plan on exposing more of our web-app so the public can interact with these methods as well as

parsed (@parsedlabs) 's Twitter Profile Photo

Introducing Lumina. We've built an adaptive evaluation engine that discovers failures and evolves its own outputs, all by iterating with the customer in the loop. Proper evals can only be constructed by “touching grass”, and we think this holds incredible promise for steering
parsed (@parsedlabs) 's Twitter Profile Photo

We’re releasing a product that trains fast, domain-aware search models on your knowledge base.

Drop in your KB and we synthesise data, then use RL with verifiable rewards to train <4B models. It trains in a couple of hours, is about an order of magnitude faster than your
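A minimal sketch of what a "verifiable reward" can mean for a search model (my illustrative assumptions about the setup, not the product's actual reward): because queries are synthesised from the knowledge base, each one carries a known gold passage, so the reward is an exactly checkable function of the retriever's ranking rather than a judgment call.

    # Illustrative verifiable-reward sketch for retrieval RL (assumed setup).
    def verifiable_reward(ranked_passage_ids, gold_id, k=5):
        """1.0 if the known-correct passage is in the top-k, reciprocal-rank
        partial credit if it appears lower, 0.0 otherwise. The gold passage
        comes from synthesised data, so the check is exact."""
        if gold_id in ranked_passage_ids[:k]:
            return 1.0
        if gold_id in ranked_passage_ids:
            return 1.0 / (ranked_passage_ids.index(gold_id) + 1)
        return 0.0

    print(verifiable_reward(["p7", "p2", "p9"], gold_id="p2"))  # 1.0: gold passage in top-5
    print(verifiable_reward(["p7", "p2", "p9"], gold_id="p4"))  # 0.0: gold passage missing
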
Charlie O'Neill (@charles0neill) 's Twitter Profile Photo

parsed has been acquired by Baseten.

Big Token wants you to believe the future is a monoculture: one model to rule everything, one bill to pay forever. Rent the demigod, trust that next month's update will finally solve your problem, and pray that GPT-(n+1) happens to
Justin Mateen (@justin_mateen) 's Twitter Profile Photo

The power of compounding is widely understood. What’s underappreciated is when the value is actually created.

Compounding is continuous, but when you look at it in decade blocks, the pattern becomes obvious. Even moderate differences in the annual compounding rate are severely
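A quick worked example of the decade-block framing (the rates are my own illustration, not the author's figures): at a fixed annual rate each successive decade adds far more absolute value than the last, and a few points of difference in the annual rate open a wide gap over three decades.

    # Illustrative decade-block compounding at two example annual rates.
    for rate in (0.07, 0.10):
        value = 1.0
        print(f"annual rate {rate:.0%}:")
        for decade in range(1, 4):
            new_value = value * (1 + rate) ** 10
            print(f"  decade {decade}: {value:.2f} -> {new_value:.2f} (+{new_value - value:.2f})")
            value = new_value
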
Tuhin Srivastava (@tuhinone) 's Twitter Profile Photo

Baseten’s day 0 bet was that inference was the technology that would enable the best user experiences AI could deliver: fast, smart, reliable, secure. And that those experiences would rely not only on a handful of giant general intelligence models, but millions of specialized

Paras Stefanopoulos (@stefanopopoulos) 's Twitter Profile Photo

OpenClaw w/ Kimi K2.5 is so good... The inference speeds on Baseten are nuts! To really knock your socks off... this "X" was written by yours truly, OpenClaw + Kimi K2.5 😎

Baseten (@basetenco) 's Twitter Profile Photo

LLMs are amnesiacs. Once context fills up, they forget everything. To fight this means grappling with a core question: how do you update a neural network without breaking what it already knows?

In this piece, Charlie O'Neill and Harry Partridge argue that continual learning is