Yunha Hwang (@micro_yunha)'s Twitter Profile
Yunha Hwang

@micro_yunha

Building genomic intelligence @tatta_bio

ID: 1125797908262027264

Website: https://www.yunhahwang.com/ | Joined: 07-05-2019 16:21:52

510 Tweets

1.1K Followers

1.1K Following

Sergey Ovchinnikov (@sokrypton)

Testing on more examples (below is toxin/anti-toxin).

Note: the notebook currently only works on modern GPUs (L4, A100, etc.) due to flash attention requirements, and these GPUs require a Colab Pro subscription.

colab.research.google.com/github/sokrypt…

Sergey Ovchinnikov (@sokrypton)

Arne Elofsson @arneelof.bsky.social Yes, it's all about the data you train on. Ultimately, we believe pLMs/gLMs are just storing coevolution matrices. ESM2 was never provided with pairs of proteins, so it never "stored" this information (unless the pair of "proteins" are also "domains" in another organism). (1/2)

Sergey Ovchinnikov (@sokrypton)

Patrick Bryant We expect any LLM (for any task) to be highly dependent on the space it was trained on. This was the motivation behind the OMG database and subsequent training: to diversify and expand into new sequence space (new protein families not in UniProt, or multi-protein-spanning sequences).

Andre Cornman (@ancornman1)

We are releasing OMG, an Open MetaGenomic dataset on Hugging Face. Similar to FineWeb for NLP, OMG is a massive dataset for open science in genomics. We train a genomic language model, gLM2, on OMG, demonstrating new capabilities like unsupervised protein-protein interaction prediction.

Leo Zang (@leotz03)

Large protein databases reveal structural complementarity and functional locality
- Cluster AFDB with Foldseek, annotate with DeepFRI, generate embeddings with Geometricus, and use PaCMAP for dimension reduction
Preprint: biorxiv.org/content/10.110…

Atteo (@mozarellapesto)

Thanks to Sergey Ovchinnikov for the colab. These results are super cool...
This is a homotetramer. I used ColabFold to predict the structure and find the interfaces, then overlaid them onto the co-evolution plot. The results, although rough, are pretty interesting.

Hannes Stärk (@hannesstaerk)

On Monday we chat about the OMG dataset: Open MetaGenomic Corpus for mixed-modality genomic language modeling biorxiv.org/content/10.110…

With the author Andre Cornman from Tatta Bio.

Join us on Zoom at 11am EDT / 5pm CEST: portal.valencelabs.com/logg

Panoplia Laboratories (@panoplialabs)

See our work profiled in the new Asimov Press pandemic prevention mini-issue! Learn why we think antivirals are needed for "Day Zero" of the next pandemic – and get a glimpse into the research that we + others are carrying out to try and make that possible.

Nishant Jha (@parambulat0r)

I'm excited to launch the free tier of PlatePlanner! PlatePlanner is a tool for research scientists that helps them quickly create beautiful platemaps.

Leo Zang (@leotz03)

Fine-tuning protein language models boosts predictions across diverse tasks | Nature Communications
- Fine-tune pLMs (ESM2, ProtT5, Ankh) on different tasks (GB1, GFP, AAV, Location, Meltome, Stability, Disorder Prediction, and Secondary Structure Prediction)
- Explore various PEFT

Meg T (she/her/hers) (@megthescientist)

September 3rd: Kaiyi Jiang, EVOLVEpro
September 17th: Jeff Ruffolo, ProseLM
October 1st: Amy Lu, CHEAP
October 15th: Kapil Devkota, Ray-gun
October 29th: Andre Cornman, The OMG dataset & gLM
With more announced soon✨

Heng Li (@lh3lh3)

Preprint on "BWT construction and search at the terabase scale". We can compress 100 human genomes to 11GB in 21 hours, find SMEMs with it, do affine-gap alignment and retrieve similar local haplotypes. 7.3Tb commonly sequenced bacterial genomes ⇒ 30GB arxiv.org/abs/2409.00613

Melania Nowicka (@melanianowicka)

🎉Our paper 'Beware of data leakage from protein LLM pretraining' was accepted at #MLCB2024! Meet Leon and Tobias at the spotlight talk and poster session on Thursday in Seattle to chat about how to address this important problem!! Jakub Bartoszewicz x.com/jmbartoszewicz…

Google DeepMind (@googledeepmind)

We’re presenting AlphaProteo: an AI system for designing novel proteins that bind more successfully to target molecules. 🧬 It could help scientists better understand how biological systems function, save time in research, advance drug design and more. 🧵 dpmd.ai/3XuMqbX