Hongbo (@hongbo00231523)'s Twitter Profile
Hongbo

@hongbo00231523

PhD student @NlpWestlake, supervised by Prof. Yue Zhang. Previously @cuhksz and @sheffielduni.

ID: 1398104727317934082

Joined: 28-05-2021 02:32:05

9 Tweets

18 Followers

141 Following

Xin Zhang | 张鑫 (@xinzhangai)'s Twitter Profile Photo

AutoSurvey is a tool designed to help researchers stay updated with the latest advancements in computer science! AutoSurvey can automatically generate comprehensive literature reviews. arxiv.org/abs/2406.10252

Guangsheng Bao (@gshbao)'s Twitter Profile Photo


⛄️Excited to share our work on causal analysis of LLMs at COLING 2025!💖 Hongbo, Linyi Yang, Cunxiang Wang

"How Likely Do LLMs with CoT Mimic Human Reasoning?"

Paper: arxiv.org/pdf/2402.16048

Guangsheng Bao (@gshbao)'s Twitter Profile Photo


LLMs often rely on correlations, not causation. ❤️‍🔥

Our causal analyses show that RLVR-trained LRMs move closer to true causal reasoning — but distilled LRMs and LLMs do not⁉️

🧠 Paper: "Correlation or Causation?"
📘 arxiv.org/pdf/2509.17380

Hongbo (@hongbo00231523)'s Twitter Profile Photo


⛄️ Excited to share our EMNLP 2025 paper:
Direct Value Optimization (DVO) 🌲

💡 Instead of pairwise DPO-style tuning,
DVO learns directly from value signals in MCTS search data,
enabling efficient RL training for reasoning LLMs ⚡️

📘 arxiv.org/pdf/2502.13723