AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.49)

Neural Information Processing Systems, Nov-15-2025, 05:42:54 GMT

Full-Atom Protein Pocket Design via Iterative Refinement

The design and optimization of functional proteins that bind specific ligand molecules is paramount in therapeutics and bio-engineering.

atom, residue, residue type, (14 more...)

Country: Asia > China > Anhui Province > Hefei (0.04)

Genre: Workflow (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)

arXiv.org Artificial Intelligence, Oct-14-2025

BEVCALIB: LiDAR-Camera Calibration via Geometry-Guided Bird's-Eye View Representations

Yuan, Weiduo, Li, Jerry, Yue, Justin, Shah, Divyank, Karydis, Konstantinos, Qiu, Hang

Accurate LiDAR-camera calibration is fundamental to fusing multi-modal perception in autonomous driving and robotic systems. Traditional calibration methods require extensive data collection in controlled environments and cannot compensate for transformation changes during vehicle/robot movement. In this paper, we propose the first model that uses bird's-eye view (BEV) features to perform LiDAR-camera calibration from raw data, termed BEVCALIB. To achieve this, we extract camera BEV features and LiDAR BEV features separately and fuse them into a shared BEV feature space. To fully utilize the geometric information from the BEV feature, we introduce a novel feature selector to filter the most important features in the transformation decoder, which reduces memory consumption and enables efficient training. Extensive evaluations on KITTI, NuScenes, and our own dataset demonstrate that BEVCALIB establishes a new state of the art. Under various noise conditions, BEVCALIB outperforms the best baseline in the literature by an average of (47.08%, 82.32%) on the KITTI dataset, and (78.17%, 68.29%) on the NuScenes dataset, in terms of (translation, rotation), respectively. In the open-source domain, it improves the best reproducible baseline by one order of magnitude. Our code and demo results are available at https://cisl.ucr.edu/BEVCalib.
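The (translation, rotation) comparison above presupposes some way of scoring a predicted LiDAR-to-camera extrinsic against a reference. The abstract does not state the exact metric, so the following is only a minimal sketch under one common convention: Euclidean distance for translation and the geodesic angle between rotation matrices for rotation. The function names and the `rot_z` helper are hypothetical and are not taken from the BEVCALIB code.

```python
import math

def rot_z(deg):
    """Hypothetical helper: 3x3 rotation matrix for a yaw of `deg` degrees."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def translation_error(t_pred, t_true):
    """Euclidean distance between predicted and reference translations."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(t_pred, t_true)))

def rotation_error_deg(R_pred, R_true):
    """Geodesic angle (degrees) of the relative rotation R_pred^T R_true."""
    # trace(A^T B) equals the sum of element-wise products of A and B.
    trace = sum(R_pred[i][j] * R_true[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against values slightly outside [-1, 1] from rounding.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

# Assumed example values: a 10 cm offset along x and a 2 degree yaw offset.
t_err = translation_error([0.1, 0.0, 0.0], [0.0, 0.0, 0.0])
r_err = rotation_error_deg(rot_z(10.0), rot_z(12.0))
```

With errors defined this way, a relative improvement over a baseline would be `(baseline_err - model_err) / baseline_err`, computed separately for translation and rotation.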

artificial intelligence, calibration, machine learning, (18 more...)

2506.02587

Country: North America > United States > California (0.28)

Genre: Research Report (0.82)

Industry:

Transportation (0.35)
Information Technology (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Oct-2-2025, 21:36:06 GMT

4e0928de075538c593fbdabb0c5ef2c3-AuthorFeedback.pdf

artificial intelligence, machine learning, proposal, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.49)

arXiv.org Artificial Intelligence, Jul-16-2025

PhysiX: A Foundation Model for Physics Simulations

Nguyen, Tung, Koneru, Arsh, Li, Shufan, Grover, Aditya

Foundation models have achieved remarkable success across video, image, and language domains. By scaling up the number of parameters and training datasets, these models acquire generalizable world knowledge and often surpass task-specific approaches. However, such progress has yet to extend to the domain of physics simulation. A primary bottleneck is data scarcity: while millions of images, videos, and textual resources are readily available on the internet, the largest physics simulation datasets contain only tens of thousands of samples. This data limitation hinders the use of large models, as overfitting becomes a major concern. As a result, physics applications typically rely on small models, which struggle with long-range prediction due to limited context understanding. Additionally, unlike images, videos, or text, which typically exhibit fixed granularity, physics datasets often vary drastically in scale, amplifying the challenges of scaling up multitask training. We introduce PhysiX, the first large-scale foundation model for physics simulation. PhysiX is a 4.5B parameter autoregressive generative model. It uses a discrete tokenizer to encode physical processes at different scales into a sequence of discrete tokens, and employs an autoregressive next-token prediction objective to model such processes in the token space. To mitigate the rounding error in the discretization process, PhysiX incorporates a specialized refinement module. Through extensive experiments, we show that PhysiX effectively addresses the data bottleneck, outperforming task-specific baselines under comparable settings as well as the previous absolute state-of-the-art approaches on The Well benchmark. Our results indicate that knowledge learned from natural videos can be successfully transferred to physics simulation, and that joint training across diverse simulation tasks enables synergistic learning.
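The rounding error mentioned above can be illustrated with a toy example. PhysiX's tokenizer is a learned model whose details are not given in this abstract; the sketch below swaps in a plain uniform scalar quantizer purely to show why mapping continuous values to discrete tokens loses information, which is the gap a refinement stage would need to close. All names and the bin count are assumptions.

```python
def quantize(x, lo, hi, n_bins):
    """Map a value in [lo, hi] to an integer token in [0, n_bins - 1]."""
    width = (hi - lo) / n_bins
    token = int((x - lo) / width)
    # Clamp so that x == hi (or slight overshoot) maps to the last bin.
    return min(max(token, 0), n_bins - 1)

def dequantize(token, lo, hi, n_bins):
    """Decode a token back to the centre of its bin."""
    width = (hi - lo) / n_bins
    return lo + (token + 0.5) * width

# Assumed toy field values and a 16-token vocabulary over [0, 1].
LO, HI, N_BINS = 0.0, 1.0, 16
values = [0.0, 0.03, 0.26, 0.5, 0.777, 1.0]
tokens = [quantize(v, LO, HI, N_BINS) for v in values]
decoded = [dequantize(t, LO, HI, N_BINS) for t in tokens]

# Per-value rounding error; for in-range inputs it is at most half a bin width.
errors = [abs(v - d) for v, d in zip(values, decoded)]
max_error_bound = (HI - LO) / N_BINS / 2
```

In this toy setting the residual `v - d` is exactly what is lost by decoding from tokens alone; a refinement module could be thought of as predicting that residual, though how PhysiX does so is not described here.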

artificial intelligence, machine learning, natural language, (15 more...)

2506.17774

Genre:

Research Report > New Finding (0.66)
Research Report > Promising Solution (0.48)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Jan-22-2025, 15:55:25 GMT

Review for NeurIPS paper: Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

Weaknesses: - The paper is missing a literature review / related work section. While previous works are cited, and authors compare their results w.r.t. Previous works in the literature (many of which are cited in this paper) have already addressed the problems that this paper aims at solving, namely 1) leveraging information from past frames in the video to make predictions in the current frame, and 2) proposing refinement modules for VOS. Although many of these works are indeed cited, the authors do not explicitly mention the relationship between those works and their method, in terms of how those works addressed the issues that this approach is trying to solve, and how the paper's contributions compare to the components of existing approaches designed specifically to address these problems. Although this paper's results are better than those reported in previous works, the scientific contributions are ultimately what matters to the community to build on top of in order to make consistent and grounded progress.

adaptive feature bank, feature bank and uncertain-region refinement, video object segmentation, (6 more...)

Technology: Information Technology > Artificial Intelligence > Vision (0.85)

arXiv.org Artificial Intelligence, Dec-23-2024

C2F-TP: A Coarse-to-Fine Denoising Framework for Uncertainty-Aware Trajectory Prediction

Wang, Zichen, Miao, Hao, Wang, Senzhang, Wang, Renzhi, Wang, Jianxin, Zhang, Jian

Accurately predicting the trajectory of vehicles is critically important for ensuring safety and reliability in autonomous driving. Although considerable research efforts have been made recently, the inherent trajectory uncertainty caused by various factors, including dynamic driving intentions and diverse driving scenarios, still poses significant challenges to accurate trajectory prediction. To address this issue, we propose C2F-TP, a coarse-to-fine denoising framework for uncertainty-aware vehicle trajectory prediction. C2F-TP features an innovative two-stage coarse-to-fine prediction process. Specifically, in the spatial-temporal interaction stage, we propose a spatial-temporal interaction module to capture the inter-vehicle interactions and learn a multimodal trajectory distribution, from which a certain number of noisy trajectories are sampled. Next, in the trajectory refinement stage, we design a conditional denoising model to reduce the uncertainty of the sampled trajectories through a step-wise denoising operation. Extensive experiments are conducted on two real-world datasets, NGSIM and highD, that are widely adopted in trajectory prediction. The results demonstrate the effectiveness of our proposal.
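The two-stage flow described above (sample noisy trajectories, then reduce their uncertainty step by step) can be mimicked with a toy sketch. This is not the C2F-TP model: the learned spatial-temporal module is replaced by Gaussian sampling around an assumed coarse trajectory, and the learned conditional denoiser is replaced by a fixed shrinkage step toward that trajectory. Every name and constant below is hypothetical.

```python
import math
import random

def sample_noisy_trajectories(coarse, sigma, k, seed=0):
    """Stage 1 stand-in: draw k noisy copies of a coarse (x, y) trajectory."""
    rng = random.Random(seed)
    return [[(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
             for x, y in coarse] for _ in range(k)]

def denoise_step(trajs, anchor, alpha):
    """Stage 2 stand-in: move each waypoint a fraction alpha toward the anchor."""
    return [[(x + alpha * (ax - x), y + alpha * (ay - y))
             for (x, y), (ax, ay) in zip(traj, anchor)] for traj in trajs]

def spread(trajs, anchor):
    """Mean Euclidean distance of all waypoints from the anchor trajectory."""
    dists = [math.hypot(x - ax, y - ay)
             for traj in trajs for (x, y), (ax, ay) in zip(traj, anchor)]
    return sum(dists) / len(dists)

# Assumed coarse prediction: straight-line motion over five time steps.
coarse = [(float(t), 0.5 * t) for t in range(5)]
samples = sample_noisy_trajectories(coarse, sigma=1.0, k=6)
spread_before = spread(samples, coarse)

ALPHA, STEPS = 0.3, 4
refined = samples
for _ in range(STEPS):
    refined = denoise_step(refined, coarse, ALPHA)
spread_after = spread(refined, coarse)
```

Because each step scales every deviation by `1 - ALPHA`, the spread shrinks geometrically in this toy. A learned denoiser conditioned on scene context would not behave this uniformly; the sketch only shows the sample-then-refine control flow.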

artificial intelligence, machine learning, trajectory, (17 more...)

2412.13231

Country:

Asia > China (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(5 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology (0.67)
Automobiles & Trucks (0.67)
Transportation > Ground > Road (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

arXiv.org Artificial Intelligence, Dec-13-2024

A dual contrastive framework

Sun, Yuan, Zhang, Zhao, Ortiz, Jorge

In current multimodal tasks, models typically freeze the encoder and decoder while adapting intermediate layers to task-specific goals, such as region captioning. Region-level visual understanding presents significant challenges for large-scale vision-language models. While limited spatial awareness is a known issue, coarse-grained pretraining, in particular, exacerbates the difficulty of optimizing latent representations for effective encoder-decoder alignment. We propose AlignCap, a framework designed to enhance region-level understanding through fine-grained alignment of latent spaces. Our approach introduces a novel latent feature refinement module that enhances conditioned latent space representations to improve region-level captioning performance. We also propose an innovative alignment strategy, the semantic space alignment module, which boosts the quality of multimodal representations. Additionally, we incorporate contrastive learning in a novel manner within both modules to further enhance region-level captioning performance. To address spatial limitations, we employ a General Object Detection (GOD) method as a data preprocessing pipeline that enhances spatial reasoning at the regional level. Extensive experiments demonstrate that our approach significantly improves region-level captioning performance across various tasks.

artificial intelligence, machine learning, natural language, (19 more...)

2412.10348

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

arXiv.org Artificial Intelligence, Oct-25-2024

A Robust and Efficient Visual-Inertial Initialization with Probabilistic Normal Epipolar Constraint

Mu, Changshi, Feng, Daquan, Zheng, Qi, Zhuang, Yuan

Accurate and robust initialization is essential for Visual-Inertial Odometry (VIO), as poor initialization can severely degrade pose accuracy. During initialization, it is crucial to estimate parameters such as accelerometer bias, gyroscope bias, initial velocity, and gravity. The IMU sensor requires precise estimation of gyroscope bias because gyroscope bias affects rotation, velocity, and position. Most existing VIO initialization methods adopt Structure from Motion (SfM) to solve for gyroscope bias. However, SfM is neither stable nor efficient enough in fast motion or degenerate scenes. To overcome these limitations, we extend the rotation-translation-decoupling framework by adding new uncertainty parameters and optimization modules. First, we adopt a gyroscope bias optimizer that incorporates probabilistic normal epipolar constraints. Second, we fuse IMU and visual measurements to solve for velocity, gravity, and scale efficiently. Finally, we design an additional refinement module that effectively diminishes gravity and scale errors. Extensive initialization tests on the EuRoC dataset show that our method reduces the gyroscope bias and rotation estimation error by an average of 16% and 4%, respectively. It also significantly reduces the gravity error, with an average reduction of 29%.
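The claim that gyroscope bias affects rotation can be made concrete with a one-axis toy integration. This is not the paper's optimizer; it only shows, under assumed values for the sample rate, yaw rate, and bias, that an uncorrected constant bias accumulates into a heading error that grows linearly with time. The function name and all constants are hypothetical.

```python
def integrate_yaw(rates, dt, bias=0.0):
    """Integrate yaw-rate samples (rad/s), each corrupted by a constant bias."""
    yaw = 0.0
    for w in rates:
        yaw += (w + bias) * dt
    return yaw

# Assumed scenario: 10 s of data at 100 Hz with a constant true yaw rate.
DT = 0.01
TRUE_RATES = [0.1] * 1000
BIAS = 0.02  # rad/s, assumed uncorrected gyroscope bias

yaw_true = integrate_yaw(TRUE_RATES, DT)
yaw_biased = integrate_yaw(TRUE_RATES, DT, bias=BIAS)

# Heading drift caused by the bias alone; analytically BIAS * duration.
drift = yaw_biased - yaw_true
duration = DT * len(TRUE_RATES)
```

Since accelerometer readings are rotated into the world frame using the estimated orientation, a heading error of this kind would also leak into the integrated velocity and position, which is why the bias has to be estimated during initialization.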

artificial intelligence, estimation, machine learning, (15 more...)

2410.19473

Country:

Asia > China > Hubei Province > Wuhan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning (0.46)