Tim Woelfle (@timwoelfle) Twitter Tweets • TwiCopy

Tim Woelfle

@timwoelfle

Neurology resident interested in the intersection of health and artificial intelligence: digital biomarkers, pragmatic trials, wearables, reproducible research

ID: 239842085

Link: https://timwoelfle.de

Joined: 18-01-2011 15:15:22

131 Tweets

158 Followers

94 Following

Lars G. Hemkens

@lghemkens

a year ago

We tested how we can best collaborate with AI to do systematic reviews, meta-research or assess study designs - fantastic team and teamwork, thank you Tim Woelfle et al!!

Likes: 16, Replies: 2, Retweets: 4

David Nunan

@dnunan79

a year ago

The study I was waiting for (and knew would be done). Echoes my (disappointing) experience of Cochrane RoB using GPT4. And don’t ask it to do anything around data integrity checks!

Likes: 9, Replies: 1, Retweets: 3

Adam Rodman

Fantastic study (and great research methodology) on the abilities of LLMs to perform evidence appraisal. Certainly something a lot of us have been hoping for. TL;DR: humans outperform LLMs alone, but human + AI performs quite well in some settings.

Likes: 12, Replies: 0, Retweets: 4

Kari Tikkinen

@karitikkinen

a year ago

“Current LLMs alone appraised evidence worse than humans. Human-AI collaboration may reduce workload for the second human rater for the assessment of reporting (PRISMA) and methodological rigor (AMSTAR) but not for complex tasks such as PRECIS-2.” #EBM #AI

Likes: 4, Replies: 0, Retweets: 1

Tim Woelfle

@timwoelfle

a year ago

Great study benchmarking LLMs on clinical oncology questions! They employ some techniques similar to ours, in particular the consistency approach on repeated prompts. The self-assessed confidence is a very interesting approach I'd like to see more of in the future.

Likes: 2, Replies: 0, Retweets: 0

Yann LeCun

@ylecun

a year ago

As long as AI systems are trained to reproduce human-generated data (e.g. text) and have no search/planning/reasoning capability, performance will saturate below or around human level. Furthermore, the amount of trials needed to reach that level will be far larger than the

Likes: 4.4K, Replies: 242, Retweets: 757

The BMJ

@bmj_latest

a year ago

Recommendations for researchers on when and how to conduct citation searching and how to report it bmj.com/content/385/bm…

Likes: 12, Replies: 3, Retweets: 12

Gordon H. Guyatt

@guyattgh

a year ago

Why were so few RCTs done to find out optimal COVID control strategies (masks, isolation)? Why so few RCTs of educational strategies? We conduct uncontrolled experiments over & over, remain in the dark. Cultural change to accept RCTs outside conventional medicine urgently needed.

Likes: 57, Replies: 3, Retweets: 18

Max Welling

@wellingmax

a year ago

Shall we please stop worrying about rogue AI and instead worry about the Atlantic Overturning Circulation crossing a tipping point. It seems close and would make Europe basically unlivable. (Thanks to Jonas Köhler for the link) youtu.be/ZHNNW8c_FaA?si…

Likes: 223, Replies: 9, Retweets: 28

Tim Woelfle

@timwoelfle

9 months ago

Our work "Benchmarking Human-AI Collaboration for Common Evidence Appraisal Tools" is published in Journal of Clinical Epidemiology! doi.org/10.1016/j.jcli… Evidence appraisal tools are very resource intensive but LLMs may assist human raters. Wonder how OpenAI's o1 & Meta's Llama 3.1 will perform?

Likes: 6, Replies: 0, Retweets: 5

Lars G. Hemkens

@lghemkens

9 months ago

Human-AI collaboration may save time for a second human rater for reporting and bias assessments. We tested Claude-3-Opus, Claude-2, GPT-4, GPT-3.5, Mixtral-8x22B. Wonderful work led by Tim Woelfle published in Journal of Clinical Epidemiology jclinepi.com/article/S0895-…

Likes: 16, Replies: 1, Retweets: 4

RC2NB

@rc2nb

5 months ago

🚀 Excited to share our latest Journal of Neurology publication on the dreaMS app. Six gamified, adaptive cognitive tests (<10 min) improve sensitivity to change by addressing floor/ceiling & practice effects. Big thanks to our team & partners! Read more: link.springer.com/article/10.100…

Likes: 4, Replies: 0, Retweets: 3
