Amy Lu (@amyxlu)'s Twitter Profile
Amy Lu

@amyxlu

CS PhD student @berkeley_ai, AI for drug discovery @PrescientDesign. Prev: @GoogleAI @insitro @UofT 🇨🇦

ID: 741271352439672833

Link: https://amyxlu.github.io/ · Joined: 10-06-2016 14:10:37

571 Tweets

2.2K Followers

1.1K Following

Amy Lu (@amyxlu):

FWIW: I personally don't think a scaling "wall" is conclusive, but I do think that the signal-to-noise ratio in language >> images >>> proteins >> DNA. So the LLM / "compressor" should be more intentional, esp. since mutation assays look at such remarkably fine-grained details

Amy Lu (@amyxlu):

I think we started using transformers in ~2019 because protein and DNA also have global and local patterns, like language. Maybe somewhere along the way we forgot that some biological tasks don’t neatly fit self-attention’s strengths and are mostly local patterns (for ex.,
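
For illustration (a generic sketch, not from this thread): a mostly-local biological pattern, like a short sequence motif, is naturally captured by a small 1D convolution over one-hot DNA, with no global self-attention involved. All names and sizes below are placeholders.

```python
# Minimal sketch: modeling *local* sequence patterns with a 1D convolution
# whose receptive field is motif-sized, rather than global self-attention.
import torch
import torch.nn as nn

VOCAB = "ACGT"

def one_hot(seq: str) -> torch.Tensor:
    idx = torch.tensor([VOCAB.index(c) for c in seq])
    return nn.functional.one_hot(idx, num_classes=len(VOCAB)).float().T  # (4, L)

class LocalMotifModel(nn.Module):
    """Stack of convolutions with a small, fixed receptive field."""
    def __init__(self, channels: int = 32, kernel_size: int = 9):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(4, channels, kernel_size, padding="same"),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding="same"),
            nn.ReLU(),
        )
        self.head = nn.Linear(channels, 1)  # e.g. a per-position score

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.net(x)                                   # (B, C, L)
        return self.head(h.transpose(1, 2)).squeeze(-1)   # (B, L)

x = one_hot("ACGTACGTGGCATTACGT").unsqueeze(0)            # (1, 4, L)
print(LocalMotifModel()(x).shape)                         # torch.Size([1, 18])
```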

Amy Lu (@amyxlu):

Deadline is on May 26 AoE!! Best Paper awards are given for each track (incl. the AI for Science track) ✏️🤖🧪🗺️

Ahmed Alaa (@_ahmedmalaa):

Our Rising Stars series is back with Paula Nicoleta Gradu sharing a control-theoretic perspective on medication tapering problems! Link: youtu.be/IP9unMmvbNE?si…

Amy Lu (@amyxlu):

It’s finally happening!!! Diffusion is so much more satisfying than autoregressive for protein & DNA sequences that don’t really have directionality 🥹 Waiting for this to empirically land & replace BERT/one-step discrete diffusion for protein foundation models 👀
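
A minimal sketch of what order-agnostic training looks like for a masked (absorbing-state) discrete diffusion model over protein/DNA tokens — a generic illustration, not the specific models referenced above; the tiny denoiser and all hyperparameters are placeholders.

```python
# Minimal sketch: one training step of masked (absorbing-state) discrete
# diffusion for a sequence model. Note there is no left-to-right ordering
# anywhere in this objective.
import torch
import torch.nn as nn

VOCAB_SIZE, MASK_ID, D = 21, 20, 64   # 20 amino acids + [MASK]

class TinyDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D)
        layer = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D, VOCAB_SIZE)

    def forward(self, tokens):                       # (B, L) -> (B, L, V)
        return self.head(self.encoder(self.embed(tokens)))

def diffusion_loss(model, x0):
    """x0: (B, L) clean tokens. Sample a corruption level t, mask that
    fraction of positions, and predict the originals at masked positions."""
    B, L = x0.shape
    t = torch.rand(B, 1)                             # corruption level per sample
    is_masked = torch.rand(B, L) < t
    xt = torch.where(is_masked, torch.full_like(x0, MASK_ID), x0)
    logits = model(xt)
    return nn.functional.cross_entropy(
        logits[is_masked], x0[is_masked]             # loss only on masked slots
    )

model = TinyDenoiser()
x0 = torch.randint(0, 20, (8, 128))                  # toy batch of sequences
print(diffusion_loss(model, x0).item())
```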

Biology+AI Daily (@biologyaidaily):

Flash Invariant Point Attention

1. FlashIPA introduces a linear-scaling reformulation of Invariant Point Attention (IPA), a core algorithm in protein and RNA structure modeling. It achieves SE(3)-invariant geometry-aware attention with dramatically reduced memory and runtime,
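
For context, here is a rough NumPy sketch of the geometric "point" term that IPA-style attention computes, with a numerical check that the attention logits are unchanged under a global rotation and translation of all frames. This shows only the invariance idea — it is not FlashIPA's linear-scaling formulation, and all shapes and names are illustrative.

```python
# Rough sketch of the "point" term in IPA-style attention and a numerical
# check of its SE(3) invariance. NOT the FlashIPA algorithm itself.
import numpy as np

rng = np.random.default_rng(0)
L, P = 6, 4                                   # residues, points per residue

def random_rotation():
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return q * np.sign(np.linalg.det(q))      # proper rotation, det = +1

# Per-residue rigid frames (R_i, t_i) and learned local query/key points.
R = np.stack([random_rotation() for _ in range(L)])        # (L, 3, 3)
t = rng.normal(size=(L, 3))                                # (L, 3)
q_local = rng.normal(size=(L, P, 3))
k_local = rng.normal(size=(L, P, 3))

def point_logits(R, t):
    """Logits from squared distances between globally-placed points."""
    q_glob = np.einsum("lij,lpj->lpi", R, q_local) + t[:, None, :]
    k_glob = np.einsum("lij,lpj->lpi", R, k_local) + t[:, None, :]
    d2 = ((q_glob[:, None] - k_glob[None, :]) ** 2).sum(-1)   # (L, L, P)
    return -d2.sum(-1)                                        # (L, L)

# Apply one global rigid transform to every frame; logits should not change.
Rg, tg = random_rotation(), rng.normal(size=3)
R2 = np.einsum("ij,ljk->lik", Rg, R)
t2 = t @ Rg.T + tg
print(np.allclose(point_logits(R, t), point_logits(R2, t2)))  # True
```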
Amy Lu (@amyxlu):

also to be clear!! some of the best academic works in discrete diffusion have been for proteins/molecules/DNA. The point here is that Google/Inception etc. fully investing in this could bear more fruit than is apparent at first glance. Also, this idea of scratch pad / baked-in error correction /

Kevin Frans (@kvfrans):

Over the past year, I've been compiling some "alchemist's notes" on deep learning. Right now it covers basic optimization, architectures, and generative models.

Focus is on learnability -- each page has nice graphics and an end-to-end implementation.

notes.kvfrans.com
Amy Lu (@amyxlu):

Submission deadline is now **May 31 AoE**! Best Paper awards are given for the robotics, RL theory, language modeling, and AI for Science tracks. Exploration is an evolving and expanding area of research -- excited to explore (ha ha) the intersections together 🗺️🧭🤖🧪

Amy Lu (@amyxlu):

MFW people are surprised that scaling up transformer-based protein language models didn't help with hella high-resolution variant effect fitness prediction tasks

Albert Gu (@_albertgu):

Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence.

Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
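
As a toy illustration of the chunking idea (a hypothetical sketch, not the architecture from the paper): a small scorer predicts chunk boundaries directly from byte embeddings, and each chunk is pooled into a single vector, so no fixed tokenizer is needed.

```python
# Toy illustration of "dynamic chunking" over raw bytes: a tiny scorer marks
# chunk boundaries, and bytes are mean-pooled into one vector per chunk.
import torch
import torch.nn as nn

D = 32
embed = nn.Embedding(256, D)            # byte-level input, no tokenizer
boundary_scorer = nn.Linear(D, 1)       # learned boundary logits

def dynamic_chunk(byte_ids: torch.Tensor):
    """byte_ids: (L,) byte values 0-255 as a long tensor.
    Returns (num_chunks, D) chunk embeddings."""
    h = embed(byte_ids)                                        # (L, D)
    is_boundary = torch.sigmoid(boundary_scorer(h)).squeeze(-1) > 0.5
    is_boundary[0] = True                                      # first byte starts a chunk
    chunk_id = torch.cumsum(is_boundary.long(), dim=0) - 1     # (L,)
    num_chunks = int(chunk_id.max()) + 1
    # Mean-pool the bytes belonging to each chunk.
    sums = torch.zeros(num_chunks, D).index_add_(0, chunk_id, h)
    counts = torch.zeros(num_chunks).index_add_(
        0, chunk_id, torch.ones_like(chunk_id, dtype=torch.float)
    )
    return sums / counts[:, None]

byte_ids = torch.tensor(list(b"hello dynamic chunking"), dtype=torch.long)
chunks = dynamic_chunk(byte_ids)
print(byte_ids.shape[0], "bytes ->", chunks.shape[0], "chunks")
```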
Deepak Pathak (@pathak2206):

Thrilled to finally release this study! 🚀 We view (discrete) diffusion models as implicitly doing data augmentation over autoregressive. Through this lens, we find that diffusion outperforms AR in data-constrained settings, but it requires larger models and way more epochs to

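A quick back-of-the-envelope illustration of the data-augmentation view (a generic sketch, not the paper's setup): one sequence admits a single left-to-right factorization for AR training, but exponentially many masked views for diffusion training, so extra epochs keep showing the model new corruptions of the same data.

```python
# Counting training "views" of one sequence under AR vs. masked diffusion.
import random

seq = list("MKTAYIAKQR")          # one toy protein sequence
L = len(seq)

# AR: one fixed left-to-right ordering -> L (prefix -> next token) pairs.
ar_views = [(tuple(seq[:i]), seq[i]) for i in range(L)]
print("AR training pairs per sequence:", len(ar_views))

# Diffusion: any nonempty subset of positions can be masked, so the same
# sequence has exponentially many possible corrupted views.
print("possible masked views per sequence:", 2 ** L - 1)

# Each epoch samples fresh mask patterns, so many epochs over a small
# dataset keep producing views the model has not seen before.
rng = random.Random(0)

def sample_mask_pattern():
    t = rng.uniform(0.1, 0.9)                  # corruption level for this view
    return frozenset(i for i in range(L) if rng.random() < t)

views = {sample_mask_pattern() for _ in range(1000)}
print("distinct mask patterns in 1000 sampled views:", len(views))
```
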
Anshul Kundaje (anshulkundaje@bluesky) (@anshulkundaje):

Data diversity, quality & relevance rule over model size any day of the week. Very clever approach of generating synthetic protein sequences from backbone structures to give big boosts to pLMs.
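
A hypothetical sketch of the augmentation loop the tweet describes, with `sample_sequences_from_backbone` standing in for whatever inverse-folding model is actually used; the real sampling, filtering, and training details are not specified here.

```python
# Hypothetical sketch: sample synthetic sequences from backbone structures
# and mix them into the pLM training corpus. The inverse-folding call below
# is a placeholder stub, not a real model.
import random

def sample_sequences_from_backbone(backbone, n: int, rng: random.Random):
    """Placeholder: in practice this would call an inverse-folding model
    (structure -> sequence) conditioned on `backbone` coordinates."""
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    length = backbone["length"]
    return ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(n)]

def augment_corpus(natural_seqs, backbones, per_structure=4, seed=0):
    rng = random.Random(seed)
    synthetic = []
    for backbone in backbones:
        synthetic.extend(sample_sequences_from_backbone(backbone, per_structure, rng))
    # Diversity/relevance of the added data is the point, not raw volume.
    return natural_seqs + synthetic

corpus = augment_corpus(
    natural_seqs=["MKTAYIAKQR"],
    backbones=[{"length": 12}, {"length": 30}],
)
print(len(corpus), "training sequences after augmentation")
```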