Eric (@ericmitchellai)'s Twitter Profile
Eric

@ericmitchellai

I like AI & music. Working on making LLMs easier & safer to use. Final year PhD student at Stanford advised by Chelsea Finn & Chris Manning.

ID:942749627627065344

Link: https://ericmitchell.ai | Joined: 18-12-2017 13:33:21

569 Tweets

3.6K Followers

487 Following

Unum (@unum_cloud)'s Twitter Profile Photo

Multimodal DPO, large & tiny multimodal matryoshka embeddings, and 1st party ONNX support for 10x lighter deployments 🥳

In partnership with Nebius, Unum is releasing a new set of pocket-sized multimodal models, already available on Hugging Face 🤗

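The "matryoshka" part is what enables the lighter deployments: a prefix of the full embedding vector is itself a usable lower-dimensional embedding. A minimal sketch of that idea in Python (not Unum's actual API; dimensions and names are illustrative):

import numpy as np

def truncate_matryoshka(embedding: np.ndarray, dim: int) -> np.ndarray:
    # Keep the first `dim` coordinates and re-normalize to unit length,
    # which is all a matryoshka-trained model needs for a smaller embedding.
    prefix = embedding[..., :dim]
    return prefix / np.linalg.norm(prefix, axis=-1, keepdims=True)

# Toy usage: a full 768-d vector truncated to 64 dims for a cheaper index.
full = np.random.randn(768).astype(np.float32)
full /= np.linalg.norm(full)
small = truncate_matryoshka(full, 64)
print(small.shape)  # (64,)
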
Eric (@ericmitchellai)'s Twitter Profile Photo

ICML tip: upload your paper PDF and review to Claude/GPT-4/Gemini with the prompt:

'What do you think is the main point of the paper? After answering, please explain to what extent, if any, you think the reviewer has fully understood the main point of the paper.'

Better than therapy 🥹
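
A minimal sketch of how one might script this tip, assuming the pypdf and openai packages; the file paths, model name, and prompt packaging are placeholders rather than any official workflow:

from pypdf import PdfReader
from openai import OpenAI

def read_pdf(path: str) -> str:
    # Extract plain text from every page of the PDF.
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)

paper = read_pdf("paper.pdf")        # placeholder path
review = open("review.txt").read()   # placeholder path

prompt = (
    "What do you think is the main point of the paper? After answering, "
    "please explain to what extent, if any, you think the reviewer has "
    "fully understood the main point of the paper.\n\n"
    f"PAPER:\n{paper}\n\nREVIEW:\n{review}"
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4-turbo",  # assumption; any capable chat model works
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)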

Alexander Khazatsky (@SashaKhazatsky)'s Twitter Profile Photo

After two years, it is my pleasure to introduce "DROID: A Large-Scale In-the-Wild Robot Manipulation Dataset"

DROID is the most diverse robotic interaction dataset ever released, including 385 hours of data collected across 564 diverse scenes in real-world households and offices

Eric (@ericmitchellai)'s Twitter Profile Photo

See also: database vendors prohibiting publishing benchmarks of their software in their EULAs!

stackoverflow.com/questions/1211…

Eric (@ericmitchellai)'s Twitter Profile Photo

More work showing how careful (or careless!) data selection hugely impacts model quality.

DPO (offline RL generally) is powerful, but you still need to train on data worth learning from!

The data specifies the implicit reward function you're ultimately optimizing, after all...
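
A sketch of what that means concretely, following the standard DPO objective: the preference pairs define an implicit reward r(x, y) = beta * (log pi(y|x) - log pi_ref(y|x)), and the loss simply pushes the chosen response's reward above the rejected one's. Tensor names and values below are illustrative:

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards defined by the data and the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the reward margin: if the "chosen" data is low quality,
    # this is still exactly what the policy gets pushed toward.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with a batch of two preference pairs (summed sequence log-probs).
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -8.0]),
                torch.tensor([-13.0, -10.0]), torch.tensor([-13.5, -9.0]))
print(loss.item())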

Eric (@ericmitchellai)'s Twitter Profile Photo

In light of all of the discussion of 'self-awareness', remember that we are still *explicitly telling* our systems:
- who they are
- what they do and don't know (stuff up to August 2023)
- what their personality/tendencies are

Is personality emergent or... part of the prompt?
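
For concreteness, the "explicit telling" above usually happens in a system prompt along these lines (a hypothetical example, not any vendor's actual prompt):

SYSTEM_PROMPT = """\
You are ExampleAssistant, a large language model built by ExampleCo.
Your knowledge cutoff is August 2023; say so when asked about later events.
You are curious, helpful, and concise, and you politely decline unsafe requests."""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Who are you, and how current is your knowledge?"},
]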

Eric (@ericmitchellai)'s Twitter Profile Photo

Three models, three different answers 😎

Claude 3 is AGI confirmed

Separately, what will it take to get a model to actually ask 'do you want the answer for the beginning or end of day 4?' This question, as stated, is ambiguous.

Rohan Taori (@rtaori13)'s Twitter Profile Photo

"One needs to learn to love and enjoy the little things in life. One also needs to discover one's true calling and then should do everything to pursue the selected path," - wise words Archit Sharma

tribuneindia.com/news/amritsar/…

Aran Komatsuzaki (@arankomatsuzaki)'s Twitter Profile Photo

A Critical Evaluation of AI Feedback for Aligning Large Language Models

Shows that the improvements from the RL step of LLM finetuning are almost entirely due to the widespread practice of using a weaker teacher model (e.g. GPT-3.5) for SFT data collection than the model used as the critic

AK (@_akhaliq)'s Twitter Profile Photo

Stanford presents RLVF

Learning from Verbal Feedback without Overgeneralization

The diversity of contexts in which large language models (LLMs) are deployed requires the ability to modify or customize default model behaviors to incorporate nuanced requirements and preferences.
