Mark (@mkovarski)'s Twitter Profile
Mark

@mkovarski

AI

ID: 253845203

Joined: 18-02-2011 02:37:24

37.37K Tweets

2.2K Followers

7.7K Following

Mengmi Zhang (@mengmizhang)

I’m excited to present our #NeurIPS2025 Spotlight work in San Diego! Humans rely on sound when vision misleads—but AI doesn’t. We introduce a neuroscience-inspired model that resolves cross-modal conflicts for sound localization.👂Paper&code: arxiv.org/pdf/2505.11217 #AI #AudioAI
StepFun (@stepfun_ai)

🚀 Step-Audio-EditX is now open source!!

✨ Zero-Shot TTS with high timbre similarity
✨ Iterative editing of dozens of audio emotions and speaking styles
✨ Fine-grained control over paralinguistic features

Whether for audio editing, interactive design, or personalized scenarios,

Sapir Harary (@sapirharary)

🚨 New paper alert!

We’re thrilled to share our new preprint “PrefixNLI: Detecting Factual Inconsistencies as Soon as They Arise” ✨

LLMs generate text one token at a time, but factuality checks still wait for a full sentence.

We extend NLI to text prefixes, enabling the
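To illustrate the prefix-checking loop, here is a minimal sketch in Python, assuming an off-the-shelf sentence-level NLI model as the checker (the paper trains prefix-aware models; "roberta-large-mnli" is only a stand-in here):

# Flag the first generated prefix that contradicts the source document.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def first_inconsistent_prefix(source: str, words: list[str]) -> int:
    """Index of the first word whose prefix contradicts the source, else -1."""
    for i in range(len(words)):
        prefix = " ".join(words[: i + 1])
        # Premise = source document, hypothesis = the prefix generated so far.
        out = nli([{"text": source, "text_pair": prefix}])[0]
        if out["label"] == "CONTRADICTION":
            return i
    return -1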
zeng zhiyuan (@zhiyuan_nlper)

🚀 Thrilled to share our new work, "RLoop: A Self-Improving Framework for Reinforcement Learning"!
arxiv.org/pdf/2511.04285
Amanda Bertsch (@abertsch72)

Can LLMs accurately aggregate information over long, information-dense texts? Not yet…

We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
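For a concrete sense of what a simple-to-verify aggregation question looks like, here is a hypothetical example in the same spirit (illustrative only, not an actual Oolong item): the gold answer is an exact count over thousands of lines, so grading is a string comparison, but answering requires reading the whole input.

# Hypothetical aggregation item in the spirit of Oolong (not from the dataset).
import random

random.seed(0)
labels = [random.choice(["spam", "ham"]) for _ in range(5000)]
document = "\n".join(f"message {i}: label={lab}" for i, lab in enumerate(labels))

question = "How many messages in the log are labeled spam?"
gold = sum(lab == "spam" for lab in labels)

def grade(model_answer: str) -> bool:
    # Verification is trivial: exact match against the computed count.
    return model_answer.strip() == str(gold)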
Vals AI (@_valsai)

Kimi this Kimi that. Just Kimi the bottom line.

The Kimi.ai K2 Thinking model has taken 2nd place on Vals Index for open source. It beat out Z.ai’s GLM 4.5, though GLM 4.6 is still holding strong and doesn’t look to be budging anytime soon

Here’s what we found in our
Ellis Brown (@_ellisbrown)

MLLMs are great at understanding videos, but struggle with spatial reasoning—like estimating distances or tracking objects across time.

the bottleneck? getting precise 3D spatial annotations on real videos is expensive and error-prone.

introducing SIMS-V 🤖 [1/n]

Microsoft Developer (@msdev)

Last week, @GitHub released the 2025 Octoverse report. 

Over 180 million developers contributed to more than a billion projects, and for the first time, TypeScript surpassed Python and JavaScript. 

Activity on GitHub is at a record high, with more contributors, repos, and
Raj Dabre (@prajdabre1)

Here's your weekend challenge: Implement speculative decoding.

Step 1: Read the following paper and/or blog: arxiv.org/abs/2211.17192 galacodes.hashnode.dev/speculative-de… (cc Jay Gala | Building Indilingo)
Step 2: Choose a family of models which come in various sizes. My choice would be the Gemma3 or Qwen
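For reference, here is a minimal greedy sketch of the algorithm from the paper in Step 1: the small model drafts k tokens, the large model scores the entire draft in one forward pass, and draft tokens are accepted only while they match the large model's own greedy choice. The model names are placeholders (any pair sharing a tokenizer works), and the rejection-sampling step that makes the sampled version exact is omitted.

# Greedy speculative decoding sketch (after arxiv.org/abs/2211.17192).
# DRAFT/TARGET are placeholder names; both models must share a tokenizer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

DRAFT, TARGET = "Qwen/Qwen2.5-0.5B", "Qwen/Qwen2.5-7B"
tok = AutoTokenizer.from_pretrained(DRAFT)
draft = AutoModelForCausalLM.from_pretrained(DRAFT)
target = AutoModelForCausalLM.from_pretrained(TARGET)

@torch.no_grad()
def generate(prompt: str, max_new_tokens: int = 64, k: int = 4) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    start = ids.shape[1]
    while ids.shape[1] - start < max_new_tokens:
        # 1) Draft k tokens greedily with the small model.
        d_ids = ids
        for _ in range(k):
            nxt = draft(d_ids).logits[:, -1, :].argmax(-1, keepdim=True)
            d_ids = torch.cat([d_ids, nxt], dim=1)
        # 2) Score the whole draft with a single target forward pass.
        t_logits = target(d_ids).logits
        # 3) Accept draft tokens while they match the target's greedy pick.
        n, accepted = ids.shape[1], 0
        while accepted < k and t_logits[0, n + accepted - 1].argmax() == d_ids[0, n + accepted]:
            accepted += 1
        # 4) Keep the accepted prefix plus one token from the target:
        #    the correction on a rejection, or a bonus token if all matched.
        ids = d_ids[:, : n + accepted]
        nxt = t_logits[0, ids.shape[1] - 1].argmax().view(1, 1)
        ids = torch.cat([ids, nxt], dim=1)
    return tok.decode(ids[0, start:], skip_special_tokens=True)

Every accepted draft token is one the target model would have produced anyway, so greedy output is unchanged; the speedup comes from scoring k drafted tokens in one target pass instead of k.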
AgiBot (@agibot_zhiyuan)

🚀 Meet AgiBot A2—your new interactive star for business & fun! 69kg, 169cm, and packed with tech that crushes service + performance.

✨ 96% noise-resistant recognition (hears you even in crowds!)
✨ 2h runtime + quick battery swap (non-stop action!)
✨ 40+ DOF for human-like

Rohan Paul (@rohanpaul_ai)

This paper creates a realistic benchmark to test if AI agents can truly perform end to end LLM research.

Agents need about 6.5x more run time here than older tests, which shows the tasks are harder. 

InnovatorBench packs 20 tasks across 6 areas, covering data work, loss design,
Rohan Paul (@rohanpaul_ai)

The paper introduces HaluMem to pinpoint where AI memory systems hallucinate and shows errors grow during extraction and updates.

It supports 1M+ token contexts.

HaluMem tests memory step by step instead of only final answers.

It measures 3 stages: extraction, updating, and
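As a toy illustration of stage-wise scoring (not the benchmark's actual code), memory items at a given stage can be compared against gold items, so hallucinated memories surface as unsupported extractions rather than only as wrong final answers:

# Toy stage-wise memory scoring (illustrative; not HaluMem's implementation).
def stage_scores(extracted: set[str], gold: set[str]) -> dict[str, float]:
    tp = len(extracted & gold)                  # items supported by gold
    precision = tp / len(extracted) if extracted else 1.0
    recall = tp / len(gold) if gold else 1.0
    # A hallucinated memory is an extracted item with no gold support.
    halluc = (len(extracted) - tp) / len(extracted) if extracted else 0.0
    return {"precision": precision, "recall": recall, "hallucination_rate": halluc}

gold = {"user lives in Berlin", "user has a dog"}
extracted = {"user lives in Berlin", "user has two cats"}  # one hallucination
print(stage_scores(extracted, gold))  # precision 0.5, recall 0.5, halluc. 0.5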
Li Zexin (@xh_lee23)

I was asked whether Chinese robots can play Chinese Kungfu, just like how Chinese people are always asked the same questions. Answers are: Yes, they can. But, I can’t 🤣

Rohan Paul (@rohanpaul_ai)

New AI at Meta paper explains when a smaller, curated dataset beats using everything.

Standard training wastes effort because many examples are redundant or wrong.

They formalize a label generator, a pruning oracle, and a learner.

From this, they derive exact error laws and
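The generator / oracle / learner setup is easy to play with in simulation: generate labels at some noise rate, let an oracle prune the mislabeled points, and compare a noise-sensitive learner trained on each set. A toy sketch of that framing (an illustration only, not the paper's analysis):

# Toy simulation of the label-generator / pruning-oracle / learner framing.
import numpy as np

rng = np.random.default_rng(0)
n_train, noise = 2000, 0.3

def make(n):
    x = rng.uniform(-1, 1, size=(n, 2))
    return x, (x[:, 0] + x[:, 1] > 0).astype(int)  # ground-truth labels

x_tr, y_true = make(n_train)
flip = rng.random(n_train) < noise                 # noisy "label generator"
y_tr = np.where(flip, 1 - y_true, y_true)
x_te, y_te = make(2000)

def one_nn_test_error(xs, ys):
    # Learner: 1-nearest-neighbour, deliberately sensitive to label noise.
    d = ((x_te[:, None, :] - xs[None, :, :]) ** 2).sum(-1)
    return (ys[d.argmin(axis=1)] != y_te).mean()

keep = ~flip                                        # "pruning oracle"
print("all data:   ", one_nn_test_error(x_tr, y_tr))
print("pruned data:", one_nn_test_error(x_tr[keep], y_tr[keep]))

In this toy regime the smaller pruned set wins by a wide margin, matching the tweet's point that curation can beat raw scale when many labels are wrong.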
Alex Prompter (@alex_prompter)

🚨 Google just proposed training AI in space.

Their new paper, “Towards a Future Space-Based, Highly Scalable AI Infrastructure System,” explores building orbital ML data centers powered directly by the Sun: fleets of satellites running TPUs, networked by laser links.

Why?