Joe Fox (@josephdfox) 's Twitter Profile
Joe Fox

@josephdfox

I research entrepreneurship, angel investing, startups, iatrogenesis. Following calling w @Alexakaye3 & thanking God for it. Like != endorsement

ID: 84117383

Joined: 21-10-2009 17:12:06

1.1K Tweets

1.1K Followers

2.2K Following

Niko McCarty 🧫 (@nikomccarty) 's Twitter Profile Photo

Mixtures of engineered bacteria were able to:

- Identify if a number is prime
- Check if a letter in a string is a vowel
- Determine the max number of pieces of a pie obtained from n straight cuts.

Answers are printed by expressing fluorescent proteins in different patterns.
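For reference, the three computations themselves are tiny; here is a conventional Python sketch (the pie-cut count is the lazy caterer formula n(n+1)/2 + 1), purely for comparison with what the bacterial mixtures compute:

```python
# Conventional implementations of the three tasks the engineered bacteria solved.
# (Illustrative only -- the point of the work is that bacterial mixtures compute these.)

def is_prime(n: int) -> bool:
    """True if n is prime (trial division)."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def is_vowel(s: str, i: int) -> bool:
    """True if the i-th letter of s is a vowel."""
    return s[i].lower() in "aeiou"

def max_pie_pieces(n: int) -> int:
    """Max pieces of a pie from n straight cuts (lazy caterer sequence)."""
    return n * (n + 1) // 2 + 1

print(is_prime(7), is_vowel("hello", 1), max_pie_pieces(3))  # True True 7
```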
Haize Labs (@haizelabs) 's Twitter Profile Photo

We're excited to share our new preprint introducing endless jailbreaks via bijection learning. 

Our attack exploits the advanced reasoning abilities of frontier LLMs like GPT-4o and Claude 3.5 Sonnet, revealing a critical model vulnerability that arises from capabilities.
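A toy sketch of the bijective-encoding idea: teach the model an arbitrary string bijection in-context and converse through it. The random letter substitution below is my simplification of the general mechanism, not the paper's actual mappings or jailbreak pipeline:

```python
import random
import string

# A random bijection over lowercase letters; the attack teaches a mapping like
# this to the model in-context and then sends queries through it.
# (Sketch of the encoding step only, not the full attack.)
random.seed(0)
letters = list(string.ascii_lowercase)
shuffled = letters[:]
random.shuffle(shuffled)
encode_map = dict(zip(letters, shuffled))
decode_map = {v: k for k, v in encode_map.items()}

def encode(text: str) -> str:
    return "".join(encode_map.get(c, c) for c in text.lower())

def decode(text: str) -> str:
    return "".join(decode_map.get(c, c) for c in text.lower())

msg = "hello world"
assert decode(encode(msg)) == msg  # the mapping round-trips, as any bijection must
print(encode(msg))
```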
Ziang Xiao (@ziangxiao) 's Twitter Profile Photo

Be aware if you plan to derive anything about human behaviors with "LLM participants." In this 100-page paper, we show how current LLM-generated psychometric responses cannot capture the nuances where human individuality resides, and how to evaluate this properly. #AI4SocialScience

Joe Fox (@josephdfox) 's Twitter Profile Photo

This paper, plus the recent huggingface.co/datasets/proj-… release/discussion, as well as other surveys on personality adherence in LLMs (e.g. arxiv.org/pdf/2406.01171), are critical for anyone trying to simulate customer personas or potential customers in the innovation space.

Neel Nanda (@neelnanda5) 's Twitter Profile Photo

Our paper on individual neurons that regulate an LLM's confidence was accepted to NeurIPS! Great work by Alessandro Stolfo and Ben Wu @ICLR. Check it out if you want to learn about wild mechanisms that productively exploit LayerNorm's non-linearity and the null space of the unembedding!
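A toy illustration of how a direction in the unembedding's null space can still move confidence: it adds nothing to the logits directly, but it changes the norm that the final normalization divides by, so every logit gets rescaled by the same factor (an effective temperature change). The RMS-style normalization and the exact-null-space toy unembedding below are my simplifications, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 64, 50  # d_model > vocab so W_U has an exact null space (toy choice)

W_U = rng.normal(size=(d_model, vocab))          # unembedding: residual stream -> logits

# A direction v with v @ W_U == 0: it writes nothing readable by the unembedding.
_, _, Vt = np.linalg.svd(W_U.T)                  # right singular vectors span R^{d_model}
v = Vt[-1]
assert np.allclose(v @ W_U, 0.0, atol=1e-8)

def rms_norm(x):
    # RMS-style normalization standing in for LayerNorm (unit gain, no bias).
    return x / np.sqrt((x ** 2).mean())

x = rng.normal(size=d_model)                     # a residual-stream state
logits = rms_norm(x) @ W_U
logits_bumped = rms_norm(x + 5.0 * v) @ W_U      # a "confidence neuron" writes along v

# v contributes nothing to x @ W_U, but it inflates the norm the normalizer divides by,
# so all logits shrink by the same factor -- a flatter, less confident softmax.
scale = np.sqrt((x ** 2).mean() / ((x + 5.0 * v) ** 2).mean())
assert np.allclose(logits_bumped, scale * logits)
```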

Christoph Riedl (@criedl) 's Twitter Profile Photo

Large study shows humans can learn from AI feedback but access to AI also amplifies existing inequalities by increasing the skill gap and reduces intellectual diversity: everyone learns to specialize in the same areas arxiv.org/abs/2409.18660
Joe Fox (@josephdfox) 's Twitter Profile Photo

The thing I was most excited about in the OpenAI dev day talk was their improved Evals offering, which includes integration of more customizable testing criteria. There are a lot of tools for this outside of OAI, but it's neat to see it as part of the whole offering. Factuality, sentiment,
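Vendor tooling aside, the underlying pattern is to express each testing criterion as a grader and run it over model outputs. A generic sketch below; the Criterion structure and the toy factuality/sentiment graders are hypothetical illustrations, not the OpenAI Evals API:

```python
from dataclasses import dataclass
from typing import Callable

# Generic eval-harness sketch: a criterion is just a named grading function.
# (Hypothetical structure for illustration; not the OpenAI Evals API.)

@dataclass
class Criterion:
    name: str
    grade: Callable[[str, str], bool]   # (model_output, reference) -> pass/fail

def contains_reference_fact(output: str, reference: str) -> bool:
    # Crude "factuality" stand-in: does the output mention the reference answer?
    return reference.lower() in output.lower()

def is_positive_sentiment(output: str, reference: str) -> bool:
    # Crude "sentiment" stand-in: keyword check instead of a learned classifier.
    return any(w in output.lower() for w in ("great", "good", "love", "excellent"))

criteria = [
    Criterion("factuality", contains_reference_fact),
    Criterion("sentiment", is_positive_sentiment),
]

def run_eval(samples: list[tuple[str, str]]) -> dict[str, float]:
    """samples: (model_output, reference) pairs; returns pass rate per criterion."""
    return {
        c.name: sum(c.grade(out, ref) for out, ref in samples) / len(samples)
        for c in criteria
    }

print(run_eval([("Paris is great, and it is the capital of France.", "Paris")]))
```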

Ziqian Zhong (@fjzzq2002) 's Twitter Profile Photo

🧙‍♂️ Does all of a transformer's magic come from training?
In our NeurIPS 2024 paper, we discovered that for many tasks, merely training the embedding and unembedding layers of transformers yields surprisingly strong performance! A thread 🧵
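A minimal sketch of that recipe as I read it: keep a randomly initialized transformer body frozen and train only the token embedding and the output projection. Toy PyTorch, with sizes and architecture choices of my own rather than the paper's exact setup:

```python
import torch
import torch.nn as nn

vocab, d_model, n_layers, seq = 1000, 128, 4, 32   # toy sizes, not the paper's configuration

embed = nn.Embedding(vocab, d_model)
body = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=n_layers,
)
unembed = nn.Linear(d_model, vocab)

# Freeze the transformer body; only the embedding and unembedding receive gradients.
for p in body.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(unembed.parameters()), lr=1e-3
)

tokens = torch.randint(0, vocab, (8, seq))                          # dummy batch (batch, seq)
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq)   # decoder-style masking
hidden = body(embed(tokens), mask=causal_mask)
logits = unembed(hidden)                                            # (8, seq, vocab)

# Next-token prediction loss; gradients flow through the frozen body into the embeddings.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab), tokens[:, 1:].reshape(-1)
)
loss.backward()
optimizer.step()
```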
Nora Belrose (@norabelrose) 's Twitter Profile Photo

We generate explanations for millions of features extracted from Llama 3.1 and Gemma. You can download them at huggingface.co/datasets/Eleut…. Our analysis confirms that SAE latents are much more interpretable than neurons, even when neurons are sparsified using top-k postprocessing.
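"Top-k postprocessing" of neurons here means keeping only the k largest activations per token and zeroing the rest; a minimal PyTorch illustration of that generic operation (my own toy sizes and k, not EleutherAI's code):

```python
import torch

acts = torch.randn(4, 3072)   # (tokens, neurons) MLP activations, toy shape
k = 32
topk = torch.topk(acts, k, dim=-1)
sparse = torch.zeros_like(acts).scatter_(-1, topk.indices, topk.values)
# Each row now keeps only its k largest activations; everything else is zero.
```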

Joe Fox (@josephdfox) 's Twitter Profile Photo

I like the commentary on feature overlap when comparing SAEs trained on residual streams vs. MLPs, and the implications for interpretability choices when on a budget.

Marcel Binz (@marcel_binz) 's Twitter Profile Photo

Excited to announce Centaur -- the first foundation model of human cognition. Centaur can predict and simulate human behavior in any experiment expressible in natural language. You can readily download the model from Hugging Face and test it yourself: huggingface.co/marcelbinz/Lla…
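Loading and prompting it follows the standard Hugging Face pattern; a sketch with a placeholder repo id (the real model id sits behind the truncated link above) and a made-up prompt:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/centaur-model"   # placeholder; substitute the actual Hugging Face path

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Hypothetical experiment description expressed in natural language.
prompt = "In this experiment you will choose between two gambles..."
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```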

Decart (@decartai) 's Twitter Profile Photo

1/ We are excited to introduce Oasis, the world's first real-time AI world model, developed in collaboration with Etched. Imagine a video game entirely generated by AI, or a video you can interact with—constantly rendered at 20 fps, in real-time, with zero latency

Yanzhe Zhang (@stevenyzzhang) 's Twitter Profile Photo

Humans sometimes get distracted by pop-ups… but for AI agents, it’s worse!

Pop-ups explicitly designed for agents can make them click 87% of the time, majorly derailing their tasks.

<a href="/taoyds/">Tao Yu</a> <a href="/Diyi_Yang/">Diyi Yang</a> 

arxiv.org/abs/2411.02391
github.com/SALT-NLP/Popup…
Jacob Farrar (@jacobmfarrar) 's Twitter Profile Photo

What a blast! A huge thanks to @zipsmbb Head Coach John Groce for joining us on this week's episode of Zips Nation Insider. We talked so long that this one will probably be a 2-parter. The episode will stream on all of 330ToGO's platforms.

Goodfire (@goodfireai) 's Twitter Profile Photo

We're open-sourcing Sparse Autoencoders (SAEs) for Llama 3.3 70B and Llama 3.1 8B! These are, to the best of our knowledge, the first open-source SAEs for models at this scale and capability level.
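For readers who haven't looked inside one: an SAE of this kind is an overcomplete encoder/decoder trained to reconstruct a model's activations under a sparsity penalty. A minimal PyTorch sketch with toy dimensions, a plain ReLU, and an L1 penalty, which is my simplification rather than the released architecture:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: overcomplete dictionary trained to reconstruct activations."""
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_sae)
        self.decoder = nn.Linear(d_sae, d_model)

    def forward(self, x):
        latents = torch.relu(self.encoder(x))   # sparse feature activations
        recon = self.decoder(latents)           # reconstruction of the input
        return recon, latents

sae = SparseAutoencoder(d_model=512, d_sae=8192)   # toy sizes
acts = torch.randn(64, 512)                        # stand-in residual-stream activations

recon, latents = sae(acts)
# Reconstruction error plus an L1 term that pushes most latents to zero.
loss = ((recon - acts) ** 2).mean() + 1e-3 * latents.abs().mean()
loss.backward()
```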