Bo Liu (Benjamin Liu) (@benjamin_eecs)'s Twitter Profile
Bo Liu (Benjamin Liu)

@benjamin_eecs

Reinforcement Learning PhD @NUSingapore | Undergrad @PKU1898 | Building autonomous decision-making systems | Ex-intern @MSFTResearch @deepseek_ai

ID: 1493495244771377156

Link: http://benjamin-eecs.github.io · Joined: 15-02-2022 07:59:55

61 Tweets

117 Followers

255 Following

Quentin Gallouédec (@qgallouedec):

Which is the best RL agent on the Hub? Now you can find out, thanks to the Open RL Leaderboard 🏆!

🧩 Features:
- Automatic evaluation of models on the 🤗 Hub
- Compatible with all torch-based RL libraries
- Supports 87 environments, with more to come 🔥

huggingface.co/spaces/open-rl…
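
The "automatic evaluation" half of this is easy to picture. A minimal sketch of such an evaluation loop, assuming only `gymnasium` and treating the policy as any callable from observation to action; the real leaderboard loads trained models from the Hub, and `evaluate` here is my own stand-in:

```python
import gymnasium as gym

def evaluate(policy, env_id="CartPole-v1", episodes=10):
    """Roll a policy in an environment and report its mean return."""
    env = gym.make(env_id)
    returns = []
    for _ in range(episodes):
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    env.close()
    return sum(returns) / len(returns)

# Trivial fixed-action baseline; a leaderboard entry would pass a
# trained policy loaded from the Hub instead.
print(evaluate(lambda obs: 0))
```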

DeepSeek (@deepseek_ai):

🚀 Launching DeepSeek-V2: The Cutting-Edge Open-Source MoE Model!

🌟 Highlights:
> Places top 3 in AlignBench, surpassing GPT-4 and close to GPT-4-Turbo.
> Ranks top-tier in MT-Bench, rivaling LLaMA3-70B and outperforming Mixtral 8x22B.
> Specializes in math, code and reasoning.

Bo Liu (Benjamin Liu) (@benjamin_eecs):

Formal math is a great representation space for RL people who love accurate reward signals. It might be the key to general reasoning.
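
That accuracy is easy to state as code. A minimal sketch of the reward interface, where `check_proof` is a hypothetical stand-in for a real proof kernel such as Lean's type checker:

```python
def proof_reward(theorem: str, proof: str, check_proof) -> float:
    # `check_proof` stands in for a real proof kernel: it either accepts
    # the proof or it doesn't, so the reward is exact and binary, with
    # no noisy or gameable learned reward model in the loop.
    return 1.0 if check_proof(theorem, proof) else 0.0
```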

DeepSeek (@deepseek_ai):

DeepSeek-Coder-V2: First Open Source Model Beats GPT4-Turbo in Coding and Math

> Excels in coding and math, beating GPT4-Turbo, Claude3-Opus, Gemini-1.5Pro, Codestral.
> Supports 338 programming languages and 128K context length.
> Fully open-sourced with two sizes: 230B (also

SSI Inc. (@ssi):

Superintelligence is within reach. Building safe superintelligence (SSI) is the most important technical problem of our time. We've started the world's first straight-shot SSI lab, with one goal and one product: a safe superintelligence. It's called Safe Superintelligence

DeepSeek (@deepseek_ai):

Respect to Claude 3.5 Sonnet's Artifacts! DeepSeek-Coder-V2 can do the same cool stuff directly in your browser. Visit coder.deepseek.com -> select "Coder V2" -> input prompt -> click "Run HTML" to see the magic happen! #DeepSeekCoder #Claude

OpenAI (@openai):

We’ve trained a model, CriticGPT, to catch bugs in GPT-4’s code. We’re starting to integrate such models into our RLHF alignment pipeline to help humans supervise AI on difficult tasks: openai.com/index/finding-…

OpenAI (@openai):

We trained advanced language models to generate text that weaker models can easily verify, and found it also made these texts easier for human evaluation. This research could help AI systems be more verifiable and trustworthy in the real world. openai.com/index/prover-v…
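
A minimal sketch of that training signal as I read it; `prover` and `verifier` below are hypothetical callables, not the paper's API:

```python
def legibility_reward(problem, prover, verifier) -> float:
    # The prover (a strong model) writes a solution; reward arrives only
    # if a deliberately *weaker* verifier model accepts it, which pushes
    # the prover toward legible, easily checkable reasoning.
    solution = prover(problem)
    return 1.0 if verifier(problem, solution) else 0.0
```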

Richard Sutton (@richardssutton):

The one-step trap (in AI research)

The one-step trap is the common mistake of thinking that all or most of an AI agent's learned predictions can be one-step ones, with all longer-term predictions generated as needed by iterating the one-step predictions. The most important
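
The trap is easiest to see numerically: a small bias in a learned one-step model compounds when you iterate it, while a prediction learned directly at the long horizon pays that bias only once. A toy illustration with made-up numbers:

```python
TRUE_STEP = 0.90      # true dynamics: x' = 0.90 * x
LEARNED_STEP = 0.92   # one-step model with a small (2%) bias

def iterate_one_step(x0: float, horizon: int) -> float:
    # Produce a long-term prediction by iterating the one-step model;
    # the 2% per-step bias compounds multiplicatively.
    x = x0
    for _ in range(horizon):
        x = LEARNED_STEP * x
    return x

x0, h = 1.0, 50
print("true value        :", TRUE_STEP ** h * x0)      # ~0.005
print("iterated one-step :", iterate_one_step(x0, h))  # ~0.015, ~3x off
```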

OpenAI (@openai):

We’ve developed Rule-Based Rewards (RBRs) to align AI behavior safely without needing extensive human data collection, making our systems safer and more reliable for everyday use. openai.com/index/improvin…
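
A sketch of the general shape of a rule-based reward. The rules and weights below are invented for illustration; OpenAI's actual RBRs grade responses against written behavior rules with a model, not with string checks:

```python
# Illustrative only: each "rule" here is a cheap programmatic check, and
# the reward is a weighted count of satisfied rules.
RULES = [
    (lambda r: not r.strip().startswith("I can't help"), 1.0),  # no blanket refusal
    (lambda r: "step" in r.lower(), 0.5),                       # explains itself
    (lambda r: len(r) < 2000, 0.5),                             # stays concise
]

def rule_based_reward(response: str) -> float:
    return sum(weight for rule, weight in RULES if rule(response))

print(rule_based_reward("Here are the steps to do this safely: ..."))  # 2.0
```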

Google DeepMind (@googledeepmind):

We’re presenting the first AI to solve International Mathematical Olympiad problems at a silver medalist level.🥈 It combines AlphaProof, a new breakthrough model for formal reasoning, and AlphaGeometry 2, an improved version of our previous system. 🧵 dpmd.ai/imo-silver
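
For a sense of what the formal side buys here: the target is a machine-checkable proof, so there is no partial credit and no plausible-but-wrong answer. A toy Lean 4 statement, nowhere near IMO difficulty, just to show the shape of the interface:

```lean
-- The kernel either accepts this proof term or rejects it; grading an
-- AlphaProof-style system needs no human judgment at all.
theorem add_comm' (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```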

Jim Fan (@drjimfan):

Exciting updates on Project GR00T! We discovered a systematic way to scale up robot data, tackling the most painful pain point in robotics. The idea is simple: a human collects demonstrations on a real robot, and we multiply that data 1000x or more in simulation. Let's break it down:
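
A toy sketch of that multiplication pattern; the demo format and noise model below are stand-ins I made up, not the GR00T pipeline:

```python
import random

def multiply_demo(demo_actions, num_variants=1000, noise=0.01):
    # One real human demonstration becomes many simulated ones: perturb
    # the recorded actions (a real pipeline would also randomize the
    # simulator's physics, lighting, object poses, etc.) and replay.
    variants = []
    for _ in range(num_variants):
        perturbed = [[a + random.gauss(0.0, noise) for a in step]
                     for step in demo_actions]
        variants.append(perturbed)
    return variants

demo = [[0.1, -0.2], [0.15, -0.1]]   # two timesteps of a 2-DoF action
print(len(multiply_demo(demo)))      # 1000 augmented trajectories
```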

OpenAI (@openai):

We’re sharing the GPT-4o System Card, an end-to-end safety assessment that outlines what we’ve done to track and address safety challenges, including frontier model risks in accordance with our Preparedness Framework. openai.com/index/gpt-4o-s…

nature (@nature):

This week on the Nature Podcast: AIs based on deep learning struggle to keep learning new things, but ‘waking up’ their ‘neurons’ could help overcome this go.nature.com/4dRaF9x
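
The "waking up neurons" idea (continual-backprop-style resets, as I understand the paper) can be sketched in a few lines of PyTorch; the dormancy test and threshold below are my simplifications:

```python
import torch

@torch.no_grad()
def revive_dormant_units(layer: torch.nn.Linear, activations: torch.Tensor,
                         threshold: float = 1e-3) -> int:
    # Units whose mean absolute activation over a batch is ~0 have
    # effectively gone silent; reinitialize their incoming weights so
    # they can learn again, restoring some plasticity.
    dormant = activations.abs().mean(dim=0) < threshold
    if dormant.any():
        fresh = torch.empty_like(layer.weight[dormant])
        torch.nn.init.kaiming_uniform_(fresh)
        layer.weight[dormant] = fresh
        if layer.bias is not None:
            layer.bias[dormant] = 0.0
    return int(dormant.sum())

layer = torch.nn.Linear(8, 4)
acts = torch.zeros(32, 4)                 # pretend every unit went silent
print(revive_dormant_units(layer, acts))  # 4
```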

Machine Learning Street Talk (@mlstreettalk):

We just released our interview with the father of Generative AI - Jürgen Schmidhuber! The G, P, and T in "ChatGPT" (GPT means "Generative Pre-Trained Transformer") go back to Juergen's work of 1990-91 when he published what's now called "Unnormalised Linear Transformers,"

OpenAI (@openai):

We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math. openai.com/index/introduc…