Joe Fox (@josephdfox) 's Twitter Profile
Joe Fox

@josephdfox

I research entrepreneurship, angel investing, startups, iatrogenesis. Following calling w @Alexakaye3 & thanking God for it. Like != endorsement

ID: 84117383

Joined: 21-10-2009 17:12:06

1.1K Tweets

1.1K Followers

2.2K Following

Niko McCarty 🧫 (@nikomccarty) 's Twitter Profile Photo

Mixtures of engineered bacteria were able to:

- Identify if a number is prime
- Check if a letter in a string is a vowel
- Determine the max number of pieces of a pie obtained from n straight cuts.

Answers are printed by expressing fluorescent proteins in different patterns.
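For reference, the three computations themselves are tiny; here is a conventional Python sketch (the pie-cut count is the lazy caterer formula n(n+1)/2 + 1), purely for comparison with what the bacterial mixtures compute:

```python
# Conventional implementations of the three tasks the engineered bacteria solved.
# (Illustrative only -- the point of the work is that bacterial mixtures compute these.)

def is_prime(n: int) -> bool:
    """True if n is prime (trial division)."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def is_vowel(s: str, i: int) -> bool:
    """True if the i-th letter of s is a vowel."""
    return s[i].lower() in "aeiou"

def max_pie_pieces(n: int) -> int:
    """Max pieces of a pie from n straight cuts (lazy caterer sequence)."""
    return n * (n + 1) // 2 + 1

print(is_prime(7), is_vowel("hello", 1), max_pie_pieces(3))  # True True 7
```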
Haize Labs (@haizelabs) 's Twitter Profile Photo

We're excited to share our new preprint introducing endless jailbreaks via bijection learning. 

Our attack exploits the advanced reasoning abilities of frontier LLMs like GPT-4o and Claude 3.5 Sonnet, revealing a critical model vulnerability that arises from capabilities.
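A toy sketch of the bijective-encoding idea: teach the model an arbitrary string bijection in-context and converse through it. The random letter substitution below is my simplification of the general mechanism, not the paper's actual mappings or jailbreak pipeline:

```python
import random
import string

# A random bijection over lowercase letters; the attack teaches a mapping like
# this to the model in-context and then sends queries through it.
# (Sketch of the encoding step only, not the full attack.)
random.seed(0)
letters = list(string.ascii_lowercase)
shuffled = letters[:]
random.shuffle(shuffled)
encode_map = dict(zip(letters, shuffled))
decode_map = {v: k for k, v in encode_map.items()}

def encode(text: str) -> str:
    return "".join(encode_map.get(c, c) for c in text.lower())

def decode(text: str) -> str:
    return "".join(decode_map.get(c, c) for c in text.lower())

msg = "hello world"
assert decode(encode(msg)) == msg  # the mapping round-trips, as any bijection must
print(encode(msg))
```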
Ziang Xiao (@ziangxiao) 's Twitter Profile Photo

Be aware if you plan to derive anything about human behaviors with "LLM participants." In this 100-page paper, we show how current LLM-generated psychometric responses cannot capture the nuances where human individuality resides, and how to evaluate this properly. #AI4SocialScience

Joe Fox (@josephdfox) 's Twitter Profile Photo

This paper, plus the recent huggingface.co/datasets/proj-… release/discussion, as well as other surveys on personality adherence in LLMs (e.g. arxiv.org/pdf/2406.01171), are critical for anyone trying to simulate customer personas or potential customers in the innovation space.

Neel Nanda (@neelnanda5) 's Twitter Profile Photo

Our paper on individual neurons that regulate an LLM's confidence was accepted to NeurIPS! Great work by Alessandro Stolfo and Ben Wu @ICLR. Check it out if you want to learn about wild mechanisms that productively exploit LayerNorm's non-linearity and the null space of the unembedding!
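A toy illustration of how a direction in the unembedding's null space can still move confidence: it adds nothing to the logits directly, but it changes the norm that the final normalization divides by, so every logit gets rescaled by the same factor (an effective temperature change). The RMS-style normalization and the exact-null-space toy unembedding below are my simplifications, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 64, 50  # d_model > vocab so W_U has an exact null space (toy choice)

W_U = rng.normal(size=(d_model, vocab))          # unembedding: residual stream -> logits

# A direction v with v @ W_U == 0: it writes nothing readable by the unembedding.
_, _, Vt = np.linalg.svd(W_U.T)                  # right singular vectors span R^{d_model}
v = Vt[-1]
assert np.allclose(v @ W_U, 0.0, atol=1e-8)

def rms_norm(x):
    # RMS-style normalization standing in for LayerNorm (unit gain, no bias).
    return x / np.sqrt((x ** 2).mean())

x = rng.normal(size=d_model)                     # a residual-stream state
logits = rms_norm(x) @ W_U
logits_bumped = rms_norm(x + 5.0 * v) @ W_U      # a "confidence neuron" writes along v

# v contributes nothing to x @ W_U, but it inflates the norm the normalizer divides by,
# so all logits shrink by the same factor -- a flatter, less confident softmax.
scale = np.sqrt((x ** 2).mean() / ((x + 5.0 * v) ** 2).mean())
assert np.allclose(logits_bumped, scale * logits)
```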

Christoph Riedl (@criedl) 's Twitter Profile Photo

Large study shows humans can learn from AI feedback but access to AI also amplifies existing inequalities by increasing the skill gap and reduces intellectual diversity: everyone learns to specialize in the same areas arxiv.org/abs/2409.18660
Joe Fox (@josephdfox) 's Twitter Profile Photo

The thing I was most excited about in the OpenAI dev day talk was their improved Evals offering, which includes integration of more customizable testing criteria. There are a lot of tools for this outside of OAI, but it's neat to see it as part of the whole offering. Factuality, sentiment,
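Vendor tooling aside, the underlying pattern is to express each testing criterion as a grader and run it over model outputs. A generic sketch below; the Criterion structure and the toy factuality/sentiment graders are hypothetical illustrations, not the OpenAI Evals API:

```python
from dataclasses import dataclass
from typing import Callable

# Generic eval-harness sketch: a criterion is just a named grading function.
# (Hypothetical structure for illustration; not the OpenAI Evals API.)

@dataclass
class Criterion:
    name: str
    grade: Callable[[str, str], bool]   # (model_output, reference) -> pass/fail

def contains_reference_fact(output: str, reference: str) -> bool:
    # Crude "factuality" stand-in: does the output mention the reference answer?
    return reference.lower() in output.lower()

def is_positive_sentiment(output: str, reference: str) -> bool:
    # Crude "sentiment" stand-in: keyword check instead of a learned classifier.
    return any(w in output.lower() for w in ("great", "good", "love", "excellent"))

criteria = [
    Criterion("factuality", contains_reference_fact),
    Criterion("sentiment", is_positive_sentiment),
]

def run_eval(samples: list[tuple[str, str]]) -> dict[str, float]:
    """samples: (model_output, reference) pairs; returns pass rate per criterion."""
    return {
        c.name: sum(c.grade(out, ref) for out, ref in samples) / len(samples)
        for c in criteria
    }

print(run_eval([("Paris is great, and it is the capital of France.", "Paris")]))
```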

Ziqian Zhong (@fjzzq2002) 's Twitter Profile Photo

🧙‍♂️ Does all of a transformer's magic come from training?
In our NeurIPS 2024 paper, we discovered that for many tasks, merely training the embedding and unembedding layers of transformers yields surprisingly strong performance! A thread 🧵
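A minimal sketch of that recipe as I read it: keep a randomly initialized transformer body frozen and train only the token embedding and the output projection. Toy PyTorch, with sizes and architecture choices of my own rather than the paper's exact setup:

```python
import torch
import torch.nn as nn

vocab, d_model, n_layers, seq = 1000, 128, 4, 32   # toy sizes, not the paper's configuration

embed = nn.Embedding(vocab, d_model)
body = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=n_layers,
)
unembed = nn.Linear(d_model, vocab)

# Freeze the transformer body; only the embedding and unembedding receive gradients.
for p in body.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(unembed.parameters()), lr=1e-3
)

tokens = torch.randint(0, vocab, (8, seq))                          # dummy batch (batch, seq)
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq)   # decoder-style masking
hidden = body(embed(tokens), mask=causal_mask)
logits = unembed(hidden)                                            # (8, seq, vocab)

# Next-token prediction loss; gradients flow through the frozen body into the embeddings.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab), tokens[:, 1:].reshape(-1)
)
loss.backward()
optimizer.step()
```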
Nora Belrose (@norabelrose) 's Twitter Profile Photo

We generate explanations for millions of features extracted from Llama 3.1 and Gemma. You can download them at huggingface.co/datasets/Eleut…. Our analysis confirms that SAE latents are much more interpretable than neurons, even when neurons are sparsified using top-k postprocessing.
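"Top-k postprocessing" of neurons here means keeping only the k largest activations per token and zeroing the rest; a minimal PyTorch illustration of that generic operation (my own toy sizes and k, not EleutherAI's code):

```python
import torch

acts = torch.randn(4, 3072)   # (tokens, neurons) MLP activations, toy shape
k = 32
topk = torch.topk(acts, k, dim=-1)
sparse = torch.zeros_like(acts).scatter_(-1, topk.indices, topk.values)
# Each row now keeps only its k largest activations; everything else is zero.
```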

Joe Fox (@josephdfox) 's Twitter Profile Photo

I like the commentary on feature overlap when comparing SAEs trained on residual streams vs. MLPs, and the implications for interpretability choices when on a budget.

Marcel Binz (@marcel_binz) 's Twitter Profile Photo

Excited to announce Centaur -- the first foundation model of human cognition. Centaur can predict and simulate human behavior in any experiment expressible in natural language. You can readily download the model from Hugging Face and test it yourself: huggingface.co/marcelbinz/Lla…
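Loading and prompting it follows the standard Hugging Face pattern; a sketch with a placeholder repo id (the real model id sits behind the truncated link above) and a made-up prompt:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/centaur-model"   # placeholder; substitute the actual Hugging Face path

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Hypothetical experiment description expressed in natural language.
prompt = "In this experiment you will choose between two gambles..."
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```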

Decart (@decartai) 's Twitter Profile Photo

1/ We are excited to introduce Oasis, the world's first real-time AI world model, developed in collaboration with Etched. Imagine a video game entirely generated by AI, or a video you can interact with—constantly rendered at 20 fps, in real-time, with zero latency

Yanzhe Zhang (@stevenyzzhang) 's Twitter Profile Photo

Humans sometimes get distracted by pop-ups… but for AI agents, it’s worse!

Pop-ups explicitly designed for agents can make them click 87% of the time, majorly derailing their tasks.

<a href="/taoyds/">Tao Yu</a> <a href="/Diyi_Yang/">Diyi Yang</a> 

arxiv.org/abs/2411.02391
github.com/SALT-NLP/Popup…
Jacob Farrar (@jacobmfarrar) 's Twitter Profile Photo

What a blast! A huge thanks to @zipsmbb Head Coach John Groce for joining us on this week's episode of Zips Nation Insider. We talked so long that this one will probably be a 2-parter. The episode will stream on all of 330ToGO's platforms.

Goodfire (@goodfireai) 's Twitter Profile Photo

We're open-sourcing Sparse Autoencoders (SAEs) for Llama 3.3 70B and Llama 3.1 8B! These are, to the best of our knowledge, the first open-source SAEs for models at this scale and capability level.
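For readers who haven't looked inside one: an SAE of this kind is an overcomplete encoder/decoder trained to reconstruct a model's activations under a sparsity penalty. A minimal PyTorch sketch with toy dimensions, a plain ReLU, and an L1 penalty, which is my simplification rather than the released architecture:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: overcomplete dictionary trained to reconstruct activations."""
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_sae)
        self.decoder = nn.Linear(d_sae, d_model)

    def forward(self, x):
        latents = torch.relu(self.encoder(x))   # sparse feature activations
        recon = self.decoder(latents)           # reconstruction of the input
        return recon, latents

sae = SparseAutoencoder(d_model=512, d_sae=8192)   # toy sizes
acts = torch.randn(64, 512)                        # stand-in residual-stream activations

recon, latents = sae(acts)
# Reconstruction error plus an L1 term that pushes most latents to zero.
loss = ((recon - acts) ** 2).mean() + 1e-3 * latents.abs().mean()
loss.backward()
```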