Stanislas Polu (@spolu)'s Twitter Profile
Stanislas Polu

@spolu

_co-founder+engineer(https://t.co/fCirsLjeo2), _alumni(https://t.co/8jAnpFAkp1, https://t.co/e99AaHzlA0, https://t.co/4jg6knqi2S, https://t.co/kXE6PNf8xH)

ID:10580512

Link: https://spolu.now.sh | Joined: 26-11-2007 02:35:35

8.6K Tweets

13.6K Followers

606 Following

Stanislas Polu (@spolu):

Clear negative correlation between accuracy and reasoning gap. This goes directly against the hypothesis that larger models are more contaminated.

Best news for largest language models in a long time!

WTF is going on with Mistral Large 5-shot without CoT?

Stanislas Polu (@spolu):

Next week I'll run a model on all the conversations of the week to estimate (usefulness, time saved or lost in minutes) so that I can compute # of humans saved / week by Dust users :)
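The arithmetic behind that "humans saved / week" figure can be sketched as follows. This is only an illustration: the 40-hour work week, the function name, and the sample per-conversation estimates are assumptions, not numbers or code from Dust.

```python
# Assumption: one "human" equals a 40-hour work week, expressed in minutes.
WORK_WEEK_MINUTES = 40 * 60


def humans_saved_per_week(minutes_saved_per_conversation):
    """Convert per-conversation time estimates into full-time equivalents.

    Each entry is the estimated minutes saved by one conversation;
    negative values mean the conversation cost the user time.
    """
    return sum(minutes_saved_per_conversation) / WORK_WEEK_MINUTES


# Hypothetical model estimates for one week of conversations.
estimates = [15, -5, 30, 0, 80]
print(humans_saved_per_week(estimates))
```

Keeping lost time as negative minutes lets a single sum capture the net effect across the week.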

Stanislas Polu (@spolu):

Semantic search is powerful but bad at quantitative questions (by construction).

To circumvent that, we built Table Queries📓

Any structured data in your company (Google Sheets, Notion DBs, CSVs...) gets turned into JIT in-memory SQLite DBs that models can query using SQL👨‍🏫
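A minimal sketch of the general idea, using only Python's standard library: load a CSV into an in-memory SQLite database on demand, then answer a quantitative question with SQL. This is not Dust's implementation. The helper names, the `accounts` table, and the sample data are hypothetical, and the SQL string stands in for a query a model would generate.

```python
import csv
import io
import sqlite3


def load_csv_into_memory(conn, table, csv_text):
    """Create `table` from CSV text, using the first row as column names."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)


def run_query(conn, sql):
    """Run a SQL query (here standing in for model-generated SQL)."""
    return conn.execute(sql).fetchall()


# Hypothetical structured data, e.g. exported from a spreadsheet.
SAMPLE_CSV = "team,seats\nsales,12\nsupport,7\nengineering,30\n"

# ":memory:" keeps the database in RAM; it disappears when conn is closed.
conn = sqlite3.connect(":memory:")
load_csv_into_memory(conn, "accounts", SAMPLE_CSV)

# CSV values arrive as text, so cast before aggregating.
total = run_query(conn, "SELECT SUM(CAST(seats AS INTEGER)) FROM accounts")[0][0]
print(total)
```

An aggregate like this is exactly the kind of question semantic search struggles with, since the answer is in no single retrieved chunk.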

Stanislas Polu (@spolu):

We made two hard bets with Dust:

- A horizontal platform with access to all the SaaS relied on by our users (Notion, GitHub, Slack, Drive, Intercom, ...)
- Not one Assistant, but many Assistants specialized on specific tasks.
- Capability to do semantic retrieval but also

Stanislas Polu (@spolu):

Anybody tried to make models play chess against one another in standard algebraic notation?

We know models are quite good at it. But who wins?

Mistral-Large vs Claude 2 vs Gemini 1.5 vs GPT-4
