Thomas Sounack (@tsounack)'s Twitter Profile
Thomas Sounack

@tsounack

AI/ML Engineer @ Dana-Farber Cancer Institute | Stanford alum

ID: 1796616219132661760

Joined: 31-05-2024 18:54:47

25 Tweets

70 Followers

54 Following

Antoine Chaffin (@antoine_chaffin)'s Twitter Profile Photo

You can just continue pre-train things ✨
Happy to announce the release of BioClinical ModernBERT, a ModernBERT model whose pre-training has been continued on medical data
The result: SOTA performance on various medical tasks with long context support and ModernBERT efficiency
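
Since the post above hinges on the idea that domain adaptation is "just" continued pre-training, here is a minimal masked-language-modeling sketch with Hugging Face transformers showing what that looks like in code. The corpus file name, masking rate, and hyperparameters are illustrative assumptions, not the BioClinical ModernBERT recipe; the starting checkpoint is assumed to be the public ModernBERT base model.

```python
# Minimal sketch of continued pre-training as masked language modeling with
# Hugging Face transformers. All file names and hyperparameters here are
# illustrative assumptions, NOT the BioClinical ModernBERT training recipe.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_id = "answerdotai/ModernBERT-base"  # assumed starting checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForMaskedLM.from_pretrained(base_id)

# Hypothetical in-domain corpus: one clinical/biomedical note per line.
dataset = load_dataset("text", data_files={"train": "clinical_notes.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=8192)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking: a fraction of tokens is replaced by [MASK] in each batch.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="modernbert-clinical-cpt",
    per_device_train_batch_size=8,
    learning_rate=1e-4,
    num_train_epochs=1,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```
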
LightOn (@lightonio)'s Twitter Profile Photo

🚀 Announcing BioClinical ModernBERT, a SOTA encoder for healthcare AI, developed by Thomas Sounack for Dana-Farber Cancer Institute in collaboration with Harvard University, LightOn, Massachusetts Institute of Technology (MIT), McGill University, Albany Med Health System, and Microsoft Research.

Seamless continued pre-training enables SOTA
Jeremy Howard (@jeremyphoward)'s Twitter Profile Photo

Your daily reminder that fine-tuning is just continued pretraining. Super cool results from Antoine Chaffin, who is putting this knowledge into practice to improve medical AI:

Josh Davis (@joshp_davis)'s Twitter Profile Photo

BioClinical ModernBERT is out! Built on the largest, most diverse biomedical/clinical dataset to date ‼️ Delivers SOTA across the board. Thrilled to be part of this effort led by Thomas Sounack

Mike Dupont (@introsp3ctor)'s Twitter Profile Photo

codepen.io/jmikedupont2/p… colab.research.google.com/drive/1uSx8yYZ… next demo visualizing BioClinical-ModernBERT-base embeddings on a sphere
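
The linked Colab isn't reproduced here, but one plausible way to get that kind of sphere plot is to mean-pool the model's token embeddings, project them to 3D, and L2-normalize so every point lands on the unit sphere. A rough sketch under those assumptions follows; the Hub repo ID and the example sentences are guesses, not taken from the demo.

```python
# Rough sketch: sentence embeddings from BioClinical ModernBERT projected onto
# the unit sphere in 3D. The repo ID and sentences are assumptions for illustration.
import torch
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

model_id = "thomas-sounack/BioClinical-ModernBERT-base"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

sentences = [
    "Patient presents with chest pain and shortness of breath.",
    "MRI shows no evidence of metastatic disease.",
    "Started metformin 500 mg twice daily for type 2 diabetes.",
    "Post-operative course was uncomplicated; discharged home on day three.",
]

with torch.no_grad():
    enc = tokenizer(sentences, padding=True, return_tensors="pt")
    hidden = model(**enc).last_hidden_state                # (batch, seq, dim)
    mask = enc["attention_mask"].unsqueeze(-1).float()     # ignore padding tokens
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling

# Reduce to 3D, then L2-normalize so each point sits on the unit sphere.
coords = PCA(n_components=3).fit_transform(pooled.numpy())
coords /= (coords ** 2).sum(axis=1, keepdims=True) ** 0.5
print(coords)  # 3D coordinates ready for a scatter plot on a sphere
```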

𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8)'s Twitter Profile Photo

BioClinical ModernBERT: A State-of-the-Art Long-Context Encoder for Biomedical and Clinical NLP

→ Built on ModernBERT with 8K context, RoPE, and fast unpadded inference

Trained via two-phase continued pretraining:

- Phase 1: 160.5B tokens (PubMed + PMC + 20 diverse clinical
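
For readers who want to poke at the long-context claim above, a quick sketch follows: load the checkpoint, confirm the configured context window, and encode a long note in a single pass. The Hub repo ID is an assumption; the 8K figure comes from the post above.

```python
# Quick check of the long-context setup described in the post above.
# The Hub repo ID is an assumption; the 8K window is what the post claims.
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

model_id = "thomas-sounack/BioClinical-ModernBERT-base"
print(AutoConfig.from_pretrained(model_id).max_position_embeddings)  # expect 8192

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

# A long clinical note can be encoded in one pass instead of being chunked.
long_note = "Discharge summary: patient admitted with sepsis... " * 500
inputs = tokenizer(long_note, truncation=True, max_length=8192, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
print(out.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```
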
Thomas Sounack (@tsounack)'s Twitter Profile Photo

Exciting to see BioClinical ModernBERT (base) ranked #2 among trending fill-mask models - right after BERT!

The large version is currently at #4.

Grateful for the interest, and can’t wait to see what projects people apply it to!
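
For anyone wanting to try the fill-mask task the model is trending under, a minimal pipeline sketch follows. The repo ID is assumed; swap in the actual checkpoint name if it differs, and the example sentence is made up.

```python
# Minimal fill-mask usage matching how the model is listed on the Hub.
# The repo ID is an assumption; the example sentence is made up.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="thomas-sounack/BioClinical-ModernBERT-base")

masked = f"The patient was started on {fill_mask.tokenizer.mask_token} for hypertension."
for pred in fill_mask(masked):
    print(f"{pred['token_str']:>15}  {pred['score']:.3f}")
```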