AI Bites | YouTube Channel (@ai_bites)'s Twitter Profile
AI Bites | YouTube Channel

@ai_bites

AI happenings, papers, and ideas. Open-source online AI education for the world. Formerly @UniofOxford @Oxford_VGG

ID: 2728547289

Link: https://www.youtube.com/c/AIBites
Joined: 29-07-2014 22:36:06

2.2K Tweets

1.1K Followers

706 Following

Here is a novel Gaussian-based representation, "DualGS", for volumetric videos, achieving robust human performance tracking and high-fidelity rendering. Paper: Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos. Link: arxiv.org/abs/2409.08353 Project:

Existing image-to-3D methods do not work well for amateur character drawings in terms of appearance and geometry. This work proposes DrawingSpinUp, a novel system to produce plausible 3D animations and breathe life into character drawings, allowing them to freely spin up,

B-KinD-Multi discovers keypoints without the need for bounding box annotations or manual keypoint labels, and works on a range of organisms and any number of agents. B-KinD-Multi leverages pre-trained video segmentation models to guide keypoint discovery in
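
The idea of letting a segmentation mask steer keypoint discovery can be pictured in a few lines of numpy. This is an illustrative sketch, not the paper's method: `masked_keypoints` is a hypothetical name, a real system learns the heatmaps, and here the mask simply gates a soft-argmax so keypoints can only land on the segmented agent.

```python
import numpy as np

def masked_keypoints(heatmaps, seg_mask):
    """Soft-argmax keypoints restricted to a segmentation mask.

    heatmaps: (K, H, W) unnormalized keypoint confidence maps
    seg_mask: (H, W) binary mask for one agent (e.g. from a
              pre-trained video segmentation model)
    Returns a (K, 2) array of (row, col) coordinates.
    """
    K, H, W = heatmaps.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pts = np.zeros((K, 2))
    for k in range(K):
        w = np.exp(heatmaps[k]) * seg_mask       # zero out pixels outside the mask
        w = w / w.sum()
        pts[k] = (w * ys).sum(), (w * xs).sum()  # confidence-weighted centroid
    return pts
```

With one mask per agent, the same routine extends to any number of agents, which is the property the tweet highlights.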

CtRNet-X is a novel framework capable of estimating the robot pose with partially visible robot manipulators. Our approach leverages Vision-Language Models for fine-grained robot component detection and integrates them into a keypoint-based pose estimation network, which

Paper: Towards Real-Time Generation of Delay-Compensated Video Feeds for Outdoor Mobile Robot Teleoperation Link: arxiv.org/abs/2409.09921 Project: sites.google.com/illinois.edu/c… #AI #LLMs #deeplearning #machinelearning #3D #Robotics #teleop

Here is a novel coarse-to-fine continuous pose diffusion method to enhance the precision of pick-and-place operations within robotic manipulation tasks. Leveraging the capabilities of diffusion networks, we facilitate the accurate perception of object poses. This accurate
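
As a toy picture of coarse-to-fine refinement (not the paper's network), one can run denoising-style updates whose step size shrinks along a decreasing noise schedule. Here a hypothetical oracle `score` stands in for the learned diffusion model; all names are illustrative.

```python
import numpy as np

def refine_pose(init_pose, score_fn, sigmas):
    """Coarse-to-fine refinement: denoising-style updates whose
    magnitude shrinks along a decreasing noise schedule `sigmas`."""
    pose = np.asarray(init_pose, dtype=float)
    for sigma in sigmas:                  # coarse (large sigma) -> fine (small sigma)
        pose = pose + sigma**2 * score_fn(pose, sigma)
    return pose

# Toy stand-in for the learned network: a score pointing toward a
# known target pose (purely for illustration).
target = np.array([0.2, -0.5, 1.0])
score = lambda pose, sigma: (target - pose) / (sigma**2 + 1.0)
```

Early large-sigma steps move the estimate coarsely toward plausible poses; later small-sigma steps make fine corrections, which is the intuition behind the coarse-to-fine schedule.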

Masked Conditional Diffusion (MacDiff) - a unified framework for human skeleton modeling, which learns powerful representations for both discriminative and generative downstream tasks.

Paper: MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion
Link:
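
The "masked" half of masked conditional diffusion can be illustrated with a toy frame-masking routine: visible frames would condition the diffusion decoder while hidden frames must be reconstructed. This is a sketch, not the paper's code; the function name and the zero-fill convention are assumptions.

```python
import numpy as np

_rng = np.random.default_rng(0)

def mask_skeleton(seq, mask_ratio=0.5, rng=_rng):
    """Randomly hide frames of a skeleton sequence; the visible frames
    would serve as the condition for the diffusion decoder.

    seq: (T, J, 3) array of T frames, J joints, 3D joint coordinates.
    Returns (masked_seq, keep), where keep marks visible frames.
    """
    T = seq.shape[0]
    keep = rng.random(T) >= mask_ratio    # True = frame stays visible
    masked = seq.copy()
    masked[~keep] = 0.0                   # hidden frames are zero-filled
    return masked, keep
```

The same representation then serves both discriminative tasks (encode the visible frames) and generative tasks (denoise the hidden ones), which is the unification the tweet describes.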
Recent breakthroughs in deep learning have made it possible to estimate steering angles directly from raw camera inputs. However, the limited available navigation data can hinder optimal feature learning, impacting the system's performance in complex driving scenarios. This

This work studies the problem of generating intermediate images from image pairs with large motion while maintaining semantic consistency. Existing methods are either limited to small motion or focused on topologically similar objects, leading to artifacts and inconsistency in the

Check out our latest video walking through the o1-preview model from @OpenAI.

The video covers reasoning, Chain of Thought (CoT), examples of o1-preview's responses, the o1-mini model, evaluations, and more.

Link: youtu.be/aDIaC6LrAmQ?si…

#AI #Openaio1 #GenerativeAI #LLMs
Building on top of the recent work RoadRunner, this work addresses the challenge of long-range (±100 m) traversability estimation. RoadRunner (M&M) is an end-to-end learning-based framework that directly predicts the traversability and elevation maps at multiple ranges (±50 m,

Phidias supports reference-augmented image-to-3D, text-to-3D, and 3D-to-3D generation, where the 3D reference can be obtained via retrieval or specified by users. Paper: Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with

Recent work showed that large diffusion models can be reused as highly precise monocular depth estimators by casting depth estimation as an image-conditional image generation task. This paper shows that the perceived inefficiency was caused by a flaw in the inference pipeline

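
Why a single step can suffice: for an epsilon-prediction diffusion model, the clean sample is recoverable in closed form from one network evaluation via the standard DDPM forward identity. This is generic DDPM algebra, not the paper's code; the function name is illustrative.

```python
import numpy as np

def predict_x0(x_t, eps_pred, alpha_bar):
    """Single-step recovery of the clean sample (here: a depth map)
    from an epsilon prediction, inverting the DDPM forward identity
    x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps_pred) / np.sqrt(alpha_bar)
```

When the predicted epsilon is exact, this returns the clean estimate in one network call, which is the sense in which multi-step sampling is unnecessary for a deterministic task like depth estimation.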
Recon3DMind, an innovative task aimed at reconstructing 3D visuals from Functional Magnetic Resonance Imaging (fMRI) signals, marking a significant advancement in cognitive neuroscience and computer vision. MinD-3D is a novel and effective three-stage framework specifically

3D Gaussian splats (3DGS) lack spatial autocorrelation of splat features, which leads to suboptimal performance in sparse reconstruction settings. This work introduces neural fields to regularize 3D Gaussian splats for sparse 3D and 4D reconstruction. It proposes a novel

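
One way to picture "neural fields as a regularizer", as a sketch under assumptions rather than the paper's architecture: instead of storing a free feature vector per splat, features are emitted by a small MLP of splat position, so nearby splats automatically receive correlated features.

```python
import numpy as np

_rng = np.random.default_rng(0)

class FeatureField:
    """Tiny random MLP used as a "neural field": splat features become
    a smooth function of splat position rather than free per-splat
    parameters, imposing the spatial autocorrelation that plain 3DGS
    lacks. Purely illustrative; a real field would be trained."""

    def __init__(self, dim_out=8, hidden=32):
        self.W1 = _rng.normal(0.0, 1.0, (3, hidden))
        self.W2 = _rng.normal(0.0, 1.0, (hidden, dim_out))

    def __call__(self, xyz):
        # (N, 3) positions -> (N, dim_out) features, smooth in xyz
        return np.tanh(xyz @ self.W1) @ self.W2
```

Because the field is continuous in position, two splats a millimeter apart cannot have wildly different features, which is exactly the prior that helps in sparse-view settings.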
AMEGO - a representation of long videos. AMEGO breaks the video into Hand-Object Interaction (HOI) tracklets and location segments, forming a semantic-free memory of the video. AMEGO is built in an online fashion, eliminating the need to reprocess past frames. Paper: AMEGO:
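
The memory structure described above, HOI tracklets plus location segments built online, might look roughly like this. Field names and the extend-or-open rule are illustrative assumptions, not the paper's API.

```python
from dataclasses import dataclass, field

@dataclass
class ActiveMemory:
    """Semantic-free memory of a long video: HOI tracklets plus
    location segments, each stored as (start_frame, end_frame, id).
    Built online, so past frames never need reprocessing."""
    hoi_tracklets: list = field(default_factory=list)
    location_segments: list = field(default_factory=list)

    def observe(self, t, track_id=None, loc_id=None):
        """Consume one frame: extend the current tracklet/segment if
        the id is unchanged, otherwise open a new one."""
        for store, item in ((self.hoi_tracklets, track_id),
                            (self.location_segments, loc_id)):
            if item is None:
                continue
            if store and store[-1][2] == item:
                store[-1] = (store[-1][0], t, item)  # extend in place
            else:
                store.append((t, t, item))           # open a new entry
```

Each frame touches only the last entry of each list, so the memory grows in constant time per frame regardless of video length.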

A novel voting-based method extends 2D segmentation models to 3D Gaussian splats. The approach leverages masked gradients, where gradients are filtered by input 2D masks and used as votes to achieve accurate segmentation. Paper: Gradient-Driven 3D

MoRAG is a novel multi-part, fusion-based retrieval-augmented generation strategy for text-based human motion generation. The method enhances motion diffusion models by leveraging additional knowledge obtained through an improved motion retrieval process. Paper: MoRAG -
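
A minimal sketch of multi-part retrieval fusion, with toy embeddings and databases standing in for the paper's learned motion retrieval; all names and the mean-fusion rule are assumptions for illustration.

```python
import numpy as np

def retrieve_and_fuse(text_emb, part_dbs, top_k=2):
    """Per-part retrieval then fusion: for each body part, fetch the
    top-k database motions whose keys are most similar (cosine) to
    the text embedding, average them, and concatenate parts
    feature-wise. The fused motion would then be handed to a motion
    diffusion model as additional knowledge.

    part_dbs: dict part_name -> (motions of shape (M, T, D_part),
                                 keys of shape (M, E))"""
    fused = []
    for part, (motions, keys) in part_dbs.items():
        sims = keys @ text_emb / (
            np.linalg.norm(keys, axis=1) * np.linalg.norm(text_emb) + 1e-8)
        idx = np.argsort(-sims)[:top_k]      # indices of the top-k matches
        fused.append(motions[idx].mean(axis=0))
    return np.concatenate(fused, axis=-1)    # (T, sum of part dims)
```

Retrieving per body part lets the system compose a plausible full-body reference even when no single database motion matches the whole text prompt.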

Imitation learning has proven to be a powerful tool for training complex visuomotor policies. However, current methods often require hundreds to thousands of expert demonstrations to handle high-dimensional visual observations. DynaMo is a new in-domain, self-supervised method