AITopics | Representation & Reasoning

Representation & Reasoning

... includes all of the major AI methods for (a) representing knowledge about a task or a problem area, and (b) reasoning about a problem.

Tell What You Hear From What You See - Video to Audio Generation Through Text

Neural Information Processing Systems, Jun-1-2025, 05:42:13 GMT

The content of visual and audio scenes is multi-faceted, such that a video stream can be paired with various audio streams and vice versa. Therefore, in the video-to-audio generation task, it is imperative to introduce steering approaches for controlling the generated audio. While video-to-audio generation is a well-established generative task, existing methods lack such controllability. In this work, we propose VATT, a multi-modal generative framework that takes a video and an optional text prompt as input, and generates audio and an optional textual description (caption) of the audio. Such a framework has two unique advantages: i) the video-to-audio generation process can be refined and controlled via text, which complements the context of the visual information, and ii) the model can suggest what audio to generate for the video by generating audio captions.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

OpenDlign: Open-World Point Cloud Understanding with Depth-Aligned Images

Neural Information Processing Systems, Jun-1-2025, 05:33:39 GMT

Recent open-world 3D representation learning methods using Vision-Language Models (VLMs) to align 3D point clouds with image-text information have shown superior 3D zero-shot performance.

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.93)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

9c19a2aa1d84e04b0bd4bc888792bd1e-AuthorFeedback.pdf

Neural Information Processing Systems, Jun-1-2025, 05:32:09 GMT

Thank you for pointing out the issue; we'll add more discussion in the revision.

abduction, logic & formal reasoning, machine learning, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.32)

Fast Projected Newton-like Method for Precision Matrix Estimation under Total Positivity

Neural Information Processing Systems, Jun-1-2025, 05:32:05 GMT

The precision matrix in such a distribution is an M-matrix. This problem can be formulated as a sign-constrained log-determinant program. Current algorithms are designed using the block coordinate descent method or the proximal point algorithm, which become computationally challenging in high-dimensional cases due to the requirement to solve numerous nonnegative quadratic programs or large-scale linear systems. To address this issue, we propose a novel algorithm based on the two-metric projection method, incorporating a carefully designed search direction and variable partitioning scheme. Our algorithm substantially reduces computational complexity, and its theoretical convergence is established. Experimental results on synthetic and real-world datasets demonstrate that our proposed algorithm provides a significant improvement in computational efficiency compared to state-of-the-art methods.
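
For orientation, a sign-constrained log-determinant program of the kind mentioned above is commonly written as follows. This is a sketch under assumptions: the symbols $S$ (a sample covariance matrix) and $\Theta$ (the precision matrix) are not named in the abstract, and the paper's exact objective may include additional terms such as regularization.

```latex
\begin{aligned}
\min_{\Theta \succ 0} \quad & -\log\det\Theta + \operatorname{tr}(S\,\Theta) \\
\text{subject to} \quad & \Theta_{ij} \le 0 \quad \text{for all } i \neq j .
\end{aligned}
```

The off-diagonal sign constraint, together with positive definiteness, is what makes $\Theta$ an M-matrix.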

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report (0.48)

Industry: Banking & Finance > Trading (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Voxel Proposal Network via Multi-Frame Knowledge Distillation for Semantic Scene Completion

Kairui Yang

Neural Information Processing Systems, Jun-1-2025, 05:31:30 GMT

Semantic scene completion is a difficult task that involves completing the geometry and semantics of a scene from point clouds in a large-scale environment. Many current methods use 3D/2D convolutions or attention mechanisms, but these have limitations in directly constructing geometry and accurately propagating features from related voxels; completion is likely to fail when features are propagated in a single pass without considering multiple potential pathways. These methods are also generally suitable only for static scenes and struggle to handle dynamic aspects. This paper introduces the Voxel Proposal Network (VPNet), which completes scenes from 3D and Bird's-Eye-View (BEV) perspectives. It includes Confident Voxel Proposal, which uses voxel-wise coordinates to propose confident voxels with high reliability for completion.

completion, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country: Asia > China (0.68)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.68)
(2 more...)

An Information Theoretic Perspective on Conformal Prediction

Qualcomm AI Research

Neural Information Processing Systems, Jun-1-2025, 05:26:05 GMT

Conformal Prediction (CP) is a distribution-free uncertainty estimation framework that constructs prediction sets guaranteed to contain the true answer with a user-specified probability. Intuitively, the size of the prediction set encodes a general notion of uncertainty, with larger sets associated with higher degrees of uncertainty. In this work, we leverage information theory to connect conformal prediction to other notions of uncertainty. More precisely, we prove three different ways to upper bound the intrinsic uncertainty, as described by the conditional entropy of the target variable given the inputs, by combining CP with information-theoretic inequalities. Moreover, we demonstrate two direct and useful applications of this connection between conformal prediction and information theory: (i) more principled and effective conformal training objectives that generalize previous approaches and enable end-to-end training of machine learning models from scratch, and (ii) a natural mechanism to incorporate side information into conformal prediction. We empirically validate both applications in centralized and federated learning settings, showing that our theoretical results translate to lower inefficiency (average prediction set size) for popular CP methods.
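
To make the notion of a prediction set concrete, here is a minimal sketch of standard split conformal prediction for classification. It is not the paper's training objective; the calibration probabilities, the test probabilities, and the function names are hypothetical, and the nonconformity score (one minus the predicted probability of a label) is one common choice among many.

```python
import math

def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every label is kept
    return sorted(scores)[k - 1]

def prediction_set(probs, qhat):
    """Keep every label whose score 1 - p does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]

# Hypothetical calibration data: model probability assigned to the true label.
cal_true_probs = [0.9, 0.8, 0.3, 0.6, 0.95, 0.7, 0.25, 0.55, 0.2, 0.5, 0.4]
cal_scores = [1.0 - p for p in cal_true_probs]

# alpha = 0.25 targets at least 75% coverage under exchangeability.
qhat = conformal_quantile(cal_scores, alpha=0.25)

# Hypothetical class probabilities for one test input.
test_probs = [0.5, 0.35, 0.15]
print(prediction_set(test_probs, qhat))  # [0, 1]
```

A larger threshold admits more labels, which is the sense in which set size reflects uncertainty.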

artificial intelligence, machine learning, prediction, (19 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > New York (0.14)
North America > United States > Massachusetts (0.14)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)
Instructional Material (1.00)
Research Report > New Finding (0.67)

Industry:

Health & Medicine (1.00)
Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming

Ali TehraniJamsaz, Arijit Bhattacharjee, Le Chen, Nesreen K. Ahmed, Amir Yazdanbakhsh

Neural Information Processing Systems, Jun-1-2025, 05:24:16 GMT

Recent advancements in Large Language Models (LLMs) have renewed interest in automatic programming language translation. Encoder-decoder transformer models, in particular, have shown promise in translating between different programming languages. However, translating between a language and its high-performance computing (HPC) extensions remains underexplored due to challenges such as complex parallel semantics. In this paper, we introduce CodeRosetta, an encoder-decoder transformer model designed specifically for translating between programming languages and their HPC extensions.

large language model, machine learning, programming language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States > Iowa (0.14)
North America > United States > California > Santa Clara County (0.14)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(2 more...)

Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models

Neural Information Processing Systems, Jun-1-2025, 05:22:10 GMT

Text-to-image diffusion models have shown remarkable success in generating personalized subjects based on a few reference images. However, current methods often fail when generating multiple subjects simultaneously, resulting in mixed identities with combined attributes from different subjects. In this work, we present MuDI, a novel framework that enables multi-subject personalization by effectively decoupling identities from multiple subjects. Our main idea is to utilize segmented subjects generated by a foundation model for segmentation (Segment Anything) for both training and inference, as a form of data augmentation for training and initialization for the generation process. Moreover, we introduce a new metric to better evaluate the performance of our method on multi-subject personalization. Experimental results show that MuDI can produce high-quality personalized images without identity mixing, even for highly similar subjects, as shown in Figure 1. Specifically, in human evaluation, MuDI obtains twice the success rate of existing baselines for personalizing multiple subjects without identity mixing, and is preferred over 70% of the time against the strongest baseline. Our project page is at https://mudi-t2i.github.io/.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country: Asia (0.14)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.66)

Industry: Leisure & Entertainment (0.92)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
(2 more...)

A Graph Theoretic Additive Approximation of Optimal Transport

Nathaniel Lahn, Deepika Mulchandani, Sharath Raghvendra

Neural Information Processing Systems, Jun-1-2025, 05:21:39 GMT

Transportation cost is an attractive similarity measure between probability distributions due to its many useful theoretical properties. However, solving optimal transport exactly can be prohibitively expensive. Therefore, there has been significant effort towards the design of scalable approximation algorithms.

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia (0.14)
North America > Canada (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Condition Number for Joint Optimization of Cycle-Consistent Networks

Leonidas J. Guibas, Qixing Huang, Zhenxiao Liang

Neural Information Processing Systems, Jun-1-2025, 05:19:22 GMT

A recent trend in optimizing maps, such as dense correspondences between objects or neural networks between pairs of domains, is to optimize them jointly. In this context, there is a natural cycle-consistency constraint, which regularizes composite maps associated with cycles, i.e., they are forced to be identity maps. However, as there is an exponential number of cycles in a graph, how to sample a subset of cycles becomes critical for efficient and effective enforcement of the cycle-consistency constraint. This paper presents an algorithm that selects a subset of weighted cycles to minimize a condition number of the induced joint optimization problem. Experimental results on benchmark datasets justify the effectiveness of our approach for optimizing dense correspondences between 3D shapes and neural networks for predicting dense image flows.
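
As a toy illustration of the cycle-consistency constraint only (not of the paper's cycle-selection algorithm), the sketch below represents each map as a permutation stored in a list and checks whether composing the maps around a cycle gives the identity. The three objects, their point counts, and all names are hypothetical.

```python
def compose(f, g):
    """Return the map x -> g[f[x]], i.e. apply f first, then g."""
    return [g[i] for i in f]

def is_cycle_consistent(maps_along_cycle):
    """Check that composing the maps around the cycle yields the identity."""
    n = len(maps_along_cycle[0])
    identity = list(range(n))
    composite = identity
    for m in maps_along_cycle:
        composite = compose(composite, m)
    return composite == identity

# Hypothetical point-to-point maps between three objects A, B, C.
a_to_b = [1, 2, 0]
b_to_c = [2, 0, 1]
c_to_a = [0, 1, 2]      # closes the cycle A -> B -> C -> A consistently
c_to_a_bad = [1, 0, 2]  # swaps two points, so the cycle no longer closes

print(is_cycle_consistent([a_to_b, b_to_c, c_to_a]))      # True
print(is_cycle_consistent([a_to_b, b_to_c, c_to_a_bad]))  # False
```

With many objects the number of such cycles grows exponentially, which is why only a subset can be checked or enforced in practice.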

artificial intelligence, machine learning, optimization problem, (16 more...)

Neural Information Processing Systems

Country: