Hong-Yu Zhou (@HongYuZhou14):

IRENE was built upon a unified transformer-based architecture (also the basic architecture of ChatGPT).

We show that such a neat, easy-to-implement solution works surprisingly well in multimodal medical AI!

🧵 2/4
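As an illustrative sketch only (not the IRENE code; all shapes, widths, and projections here are made up), a unified architecture projects every modality into one shared token space and runs a single attention stack over the concatenated sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # shared embedding width (hypothetical)

# Project each modality's raw features into the same token space.
img_tokens = rng.standard_normal((49, 32)) @ rng.standard_normal((32, d))  # 7x7 image patches
txt_tokens = rng.standard_normal((20, 24)) @ rng.standard_normal((24, d))  # clinical-text tokens

# One unified token sequence: no modality-specific branches.
x = np.concatenate([img_tokens, txt_tokens], axis=0)

def self_attention(x):
    # Single-head scaled dot-product attention over the mixed sequence:
    # every image token can attend to every text token and vice versa.
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

out = self_attention(x)
print(out.shape)  # (69, 16): same sequence length, fused representations
```

The point of the sketch is the concatenation step: once both modalities live in one token sequence, a plain transformer layer fuses them with no extra machinery.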

Hong-Yu Zhou (@HongYuZhou14):

IRENE outperforms previous image-only and non-unified multimodal diagnosis models by large margins in the identification of pulmonary disease (by 12% and 9%, respectively) and adverse clinical outcomes in patients with COVID-19 (by 29% and 7%, respectively).

🧵 4/4

Hong-Yu Zhou (@HongYuZhou14):

IRENE addresses the differences among modalities with bidirectional multimodal attention.

The proposed attention bridges the gap between token-level modality-specific features and high-level diagnosis-oriented holistic representations.

🧵 3/4
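A minimal sketch of what "bidirectional" means here, assuming standard cross-attention (this is not the IRENE implementation; shapes and names are invented): each modality's tokens query the other modality's tokens, in both directions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
img = rng.standard_normal((4, d))  # image tokens (hypothetical)
txt = rng.standard_normal((6, d))  # clinical-text tokens (hypothetical)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(q, kv):
    # Queries from one modality attend over keys/values from the other.
    return softmax(q @ kv.T / np.sqrt(d)) @ kv

img2txt = cross_attention(img, txt)  # image tokens enriched with text context
txt2img = cross_attention(txt, img)  # text tokens enriched with image context
```

Running the attention in both directions is what lets token-level, modality-specific features inform (and be informed by) the holistic representation.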

Kevin K. Yang 楊凱筌 (@KevinKaichuang):

Combine PubMedBERT, to encode natural-language protein annotations, with masked language modeling of the protein sequence for pretraining.

Hong-Yu Zhou

biorxiv.org/content/10.110…
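For readers unfamiliar with the masked-language-modeling half of that recipe, here is a toy sketch of the masking step (not the paper's code; the sequence, mask rate, and token names are made up). A fraction of residues is hidden and the model is trained to reconstruct them:

```python
import random

random.seed(0)
MASK = "<mask>"

def mask_sequence(seq, rate=0.15):
    # Randomly hide a fraction of residues; the pretraining objective is to
    # predict the hidden residues (here, optionally conditioned on an
    # annotation embedding from a text encoder such as PubMedBERT).
    tokens = list(seq)
    targets = {}
    for i in range(len(tokens)):
        if random.random() < rate:
            targets[i] = tokens[i]  # remember the true residue
            tokens[i] = MASK        # hide it from the model
    return tokens, targets

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # arbitrary example sequence
tokens, targets = mask_sequence(seq)
```

The text-annotation encoder supplies extra context at the masked positions; the masking itself is standard BERT-style corruption.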

Hong-Yu Zhou (@HongYuZhou14):

Amazing efforts! MIMIC-CXR has greatly advanced research in medical imaging (of course including mine😺). Now it is time to turn to the multi-modal perspective!🤪

Hong-Yu Zhou (@HongYuZhou14):

Really love the idea of performing evaluation based on radiologists' feedback. An important step towards generating reliable radiology reports with machine learning.

Eric Topol (@EricTopol):

A bit of versatility😉
'We introduce MedInterp, the largest multimodal dataset to date for medical image interpretation, consisting of over 13 million annotated instances spanning 11 tasks across 3 modalities'
arxiv.org/abs/2405.07988
Pranav Rajpurkar Julián Nicolás Acosta Hong-Yu Zhou

Hong-Yu Zhou (@HongYuZhou14):

We announced MedVersa, a generalist AI that excels in multifaceted medical image interpretation! 🚀🩺

👉arxiv.org/abs/2405.07988

MedVersa has two promising features🧐:

1. Learning from vision and language supervision. This maximizes the flexibility of the framework. Imagine

Hong-Yu Zhou (@HongYuZhou14):

Really love the phrase 'opportunistic screening'. It is a field with great potential in which AI can play a vital role!
👇

Hong-Yu Zhou (@HongYuZhou14):

Two papers on multi-modal & self-supervised learning for biomedicine were accepted by ICLR 2023:
X-ray+Radiology report:
arxiv.org/abs/2301.13155
Protein+Gene ontology:
arxiv.org/abs/2301.13154
For both papers, we released the code (pre-training & fine-tuning) and pre-trained models.

Hong-Yu Zhou (@HongYuZhou14):

The non-deterministic nature of generative LLMs like GPT prevents them from being a reliable tool for discriminative tasks.

Hong-Yu Zhou (@HongYuZhou14):

Thanks to Kevin for sharing our paper, which was accepted at ICLR'23! Our approach can encode knowledge graphs into pre-trained models via a masked modeling objective. The whole framework is easy to implement. Code and models are available at github.com/RL4M/KeAP.

Raffaele Di Giacomo, PhD (@sciqst):

Eric Topol Pranav Rajpurkar Julián Nicolás Acosta Hong-Yu Zhou Cool development! MedInterp looks like a game-changer for medical image interpretation with its extensive dataset and coverage of multiple modalities. Such innovations could significantly enhance diagnostic accuracy and efficiency.

For more deep dives and

Hong-Yu Zhou (@HongYuZhou14):

In arxiv.org/abs/2301.13154, we showed that using cross-attention can be a simple yet effective way to perform pre-training on multi-modal data. The whole training process can be simply guided by a masked modeling objective!
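As a toy sketch of that recipe, assuming standard cross-attention and a masked-reconstruction loss (this is not the paper's code; the shapes, mask positions, and modality names are invented): positions are hidden in one sequence, and the corrupted sequence queries the other modality for the information needed to reconstruct them.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 8
tokens = rng.standard_normal((n, d))   # e.g. protein-residue embeddings
context = rng.standard_normal((5, d))  # tokens from the other modality (e.g. annotation text)

masked = np.zeros(n, dtype=bool)
masked[[1, 4, 7]] = True               # positions hidden from the model
corrupted = tokens.copy()
corrupted[masked] = 0.0                # dummy embedding at masked positions

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Cross-attention: the corrupted sequence queries the other modality.
recon = softmax(corrupted @ context.T / np.sqrt(d)) @ context

# Masked modeling objective: reconstruction error only at hidden positions.
loss = ((recon[masked] - tokens[masked]) ** 2).mean()
```

The single scalar `loss` is the appeal: one masked-reconstruction objective drives the cross-modal interaction, with no contrastive pairs or extra heads.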

Hong-Yu Zhou (@HongYuZhou14):

Both papers focus on a common problem: how to leverage multi-modal data to enhance single-modal representations.

In arxiv.org/abs/2301.13155, we found 'a simple summation operation' is surprisingly effective at bridging the gap between masked image and text modeling.
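The summation itself is as plain as it sounds. A hedged sketch (not the paper's code; widths and variable names are made up): once both branches are projected to a shared width, fusion is element-wise addition, token by token.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 12
img_emb = rng.standard_normal((10, d))  # masked-image branch features (hypothetical)
txt_emb = rng.standard_normal((10, d))  # masked-text branch features (hypothetical)

# The fusion step is literally element-wise addition: both branches share
# one width, so their token features can be summed before a shared decoder
# reconstructs the masked content.
fused = img_emb + txt_emb
```

Compared with concatenation or gated fusion, summation adds no parameters at all, which is what makes the result surprising.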

Hong-Yu Zhou (@HongYuZhou14):

Compared to PCRLv1 (ICCV'21), PCRLv2 has the following merits:

1. Simpler implementation (no mixup or attention modules).

2. Faster pre-training (2× speed-up): < 24 hours on 4 NVIDIA TITAN V GPUs (12 GB memory each).

3. Consistent performance gains on 5 well-known datasets.

3/3
