Sam Paech (@sam_paech)'s Twitter Profile
Sam Paech

@sam_paech

AI tinkerer, maintainer of EQ-Bench

ID: 704133968

Link: https://eqbench.com

Joined: 19-07-2012 01:34:16

726 Tweets

1.1K Followers

145 Following

Been seeing some chatter that the new mistral small 3.2 writes a lot like deepseek v3. This analysis of their slop profiles confirms.

I think the network representation here makes a bit more sense than the phylo tree, given the complicated nature of model lineages.

Let me spruik Judgemark real quick cuz I think it's neat:

It works by measuring *separability* -- the ability to pin down writing ability in a blind test.

The evaluated judge grades a set of models' writing outputs and we calc the error bar overlap.

Less overlap: better judge.
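
To make the "error bar overlap" idea concrete, here is a minimal sketch in Python. It is not Judgemark's actual implementation: the function names, the normal-approximation 95% interval, the averaging over model pairs, and the scores themselves are all assumptions made up for illustration.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def confidence_interval(scores, z=1.96):
    """Approximate 95% interval for the mean score (normal approximation, an assumption)."""
    m = mean(scores)
    half_width = z * stdev(scores) / math.sqrt(len(scores))
    return m - half_width, m + half_width


def interval_overlap(a, b):
    """Length of the overlap between two (low, high) intervals; 0.0 if they are disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def mean_pairwise_overlap(scores_by_model):
    """Average error bar overlap across all pairs of models graded by one judge."""
    intervals = {name: confidence_interval(s) for name, s in scores_by_model.items()}
    pairs = list(combinations(intervals, 2))
    return sum(interval_overlap(intervals[x], intervals[y]) for x, y in pairs) / len(pairs)


# Hypothetical scores that two judges gave to the same three models' writing samples.
sharp_judge = {
    "model_a": [8.0, 8.2, 7.9, 8.1],
    "model_b": [5.0, 5.3, 4.9, 5.1],
    "model_c": [2.0, 2.2, 1.9, 2.1],
}
noisy_judge = {
    "model_a": [8.0, 4.0, 9.0, 5.0],
    "model_b": [6.0, 3.0, 8.0, 5.0],
    "model_c": [4.0, 7.0, 2.0, 6.0],
}

print("sharp judge overlap:", mean_pairwise_overlap(sharp_judge))
print("noisy judge overlap:", mean_pairwise_overlap(noisy_judge))
```

Under these assumptions, the judge whose per-model intervals overlap less is the one that separates the models more cleanly, which is the sense of "less overlap: better judge" above.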