Tim Woelfle (@timwoelfle)'s Twitter Profile
Tim Woelfle

@timwoelfle

Neurology resident interested in the intersection of health and artificial intelligence: digital biomarkers, pragmatic trials, wearables, reproducible research

ID: 239842085

Link: https://timwoelfle.de

Joined: 18-01-2011 15:15:22

131 Tweets

158 Followers

94 Following

Lars G. Hemkens (@lghemkens)

We tested how we can best collaborate with AI to do systematic reviews, meta-research or assess study designs - fantastic team and teamwork, thank you Tim Woelfle et al!!

David Nunan (@dnunan79)

The study I was waiting for (and knew would be done). Echoes my (disappointing) experience of Cochrane RoB using GPT4. And don’t ask it to do anything around data integrity checks!

Adam Rodman (@adamrodmanmd)

Fantastic study (and great research methodology) on the abilities of LLMs to perform evidence appraisal. Certainly something a lot of us have been hoping for. TL;DR: humans outperform LLMs alone, but human + AI performs quite well in some settings.

Kari Tikkinen (@karitikkinen)

“Current LLMs alone appraised evidence worse than humans. Human-AI collaboration may reduce workload for the second human rater for the assessment of reporting (PRISMA) and methodological rigor (AMSTAR) but not for complex tasks such as PRECIS-2.” #EBM #AI

Tim Woelfle (@timwoelfle)

Great study benchmarking LLMs on clinical oncology questions! They employ some similar techniques as we do, in particular the consistency approach on repeated prompts. The self-assessed confidence is a very interesting approach I'd like to see more in the future.

Yann LeCun (@ylecun)

As long as AI systems are trained to reproduce human-generated data (e.g. text) and have no search/planning/reasoning capability, performance will saturate below or around human level. Furthermore, the amount of trials needed to reach that level will be far larger than the

The BMJ (@bmj_latest)

Recommendations for researchers on when and how to conduct citation searching and how to report it bmj.com/content/385/bm…

Gordon H. Guyatt (@guyattgh)

Why were so few RCTs done to find out optimal COVID control strategies (masks, isolation)? Why so few RCTs of educational strategies? We conduct uncontrolled experiments over & over, remain in the dark. Cultural change to accept RCTs outside conventional medicine urgently needed.

Max Welling (@wellingmax)

Shall we please stop worrying about rogue AI and instead worry about the Atlantic Overturning Circulation crossing a tipping point. It seems close and would make Europe basically unlivable. (Thanks to Jonas Köhler for the link) youtu.be/ZHNNW8c_FaA?si…

Tim Woelfle (@timwoelfle)

Our work "Benchmarking Human-AI Collaboration for Common Evidence Appraisal Tools" is published in Journal of Clinical Epidemiology! doi.org/10.1016/j.jcli… Evidence appraisal tools are very resource intensive but LLMs may assist human raters. Wonder how OpenAI's o1 & Meta's Llama 3.1 will perform?

Lars G. Hemkens (@lghemkens)

Human-AI collaboration may save time for a second human rater for reporting and bias assessments. We tested Claude-3-Opus, Claude-2, GPT-4, GPT-3.5, Mixtral-8x22B. Wonderful work led by Tim Woelfle published in Journal of Clinical Epidemiology jclinepi.com/article/S0895-…

RC2NB (@rc2nb)

🚀 Excited to share our latest Journal of Neurology publication on dreaMS app. Six gamified, adaptive cognitive tests (<10 min) improve sensitivity to change by addressing floor/ceiling & practice effects. Big thanks to our team & partners! Read more: link.springer.com/article/10.100…