

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Columbia University, Google, Cornell University

Neural Information Processing Systems

We present a framework for learning multimodal representations from unlabeled data using convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer (VATT) takes raw signals as inputs and extracts multimodal representations that are rich enough to benefit a variety of downstream tasks. We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance on the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval. Furthermore, we study a modality-agnostic, single-backbone Transformer by sharing weights among the three modalities. We show that the convolution-free VATT outperforms state-of-the-art ConvNet-based architectures in the downstream tasks. In particular, VATT's vision Transformer achieves top-1 accuracies of 82.1% on Kinetics-400, 83.6% on Kinetics-600, 72.7% on Kinetics-700, and 41.1% on Moments in Time, new records achieved without supervised pre-training. Transferring to image classification yields 78.7% top-1 accuracy on ImageNet, compared to 64.7% when training the same Transformer from scratch, showing the generalizability of our model despite the domain gap between videos and images. VATT's audio Transformer also sets a new record on waveform-based audio event recognition, achieving an mAP of 39.4% on AudioSet without any supervised pre-training. VATT's source code is publicly available.
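The multimodal contrastive objective above can be sketched as a symmetric noise-contrastive (InfoNCE) loss between paired embeddings from two modalities, where each clip's video/audio (or video/text) projections form the positive pair and all other items in the batch serve as negatives. The function name, batch shapes, and temperature value below are illustrative assumptions, not VATT's exact implementation:

```python
import numpy as np

def nce_loss(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE between two batches of modality embeddings.

    z_a, z_b: (batch, dim) projections of paired clips (e.g. video/audio).
    Positive pairs sit on the diagonal of the similarity matrix.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)  # L2-normalize
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                      # (batch, batch)

    def ce_diag(l):
        # cross-entropy with the diagonal entries as the targets
        l = l - l.max(axis=1, keepdims=True)                # numeric stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # contrast in both directions (a->b and b->a) and average
    return 0.5 * (ce_diag(logits) + ce_diag(logits.T))

rng = np.random.default_rng(0)
loss = nce_loss(rng.standard_normal((4, 8)), rng.standard_normal((4, 8)))
```

With random, unaligned embeddings the loss sits near log(batch); training drives it down by pulling paired projections together in the common space.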


A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Neural Information Processing Systems

Recent developments of vision large language models (LLMs) have seen remarkable progress, yet still encounter challenges towards multimodal generalists, such as coarse-grained instance-level understanding, lack of unified support for both images and videos, and insufficient coverage across various vision tasks.



Graph Neural Flows for Unveiling Systemic Interactions Among Irregularly Sampled Time Series

Neural Information Processing Systems

Interacting systems are prevalent in nature. It is challenging to accurately predict the dynamics of the system if its constituent components are analyzed independently. We develop a graph-based model that unveils the systemic interactions of time series observed at irregular time points, by using a directed acyclic graph to model the conditional dependencies (a form of causal notation) of the system components and learning this graph in tandem with a continuous-time model that parameterizes the solution curves of ordinary differential equations (ODEs). Our technique, a graph neural flow, leads to substantial enhancements over non-graph-based methods, as well as graph-based methods without the modeling of conditional dependencies. We validate our approach on several tasks, including time series classification and forecasting, to demonstrate its efficacy.
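The core idea above, a DAG constraining which components may influence each other's continuous-time dynamics, can be sketched with a masked weight matrix and a simple Euler integrator between irregular observation times. The adjacency, weights, and the Euler solver below are illustrative stand-ins for the paper's learned graph and flow parameterization:

```python
import numpy as np

# A[i, j] = 1 if component j is a parent of component i (a DAG over 3 series)
A = np.array([[0, 0, 0],
              [1, 0, 0],
              [1, 1, 0]], dtype=float)
W = 0.5 * A  # hypothetical learned interaction weights, masked by the DAG

def dxdt(x):
    # each component's derivative depends only on its DAG parents (plus decay)
    return np.tanh(W @ x) - 0.1 * x

def integrate(x0, t0, t1, steps=100):
    """Euler integration between two (possibly irregular) observation times."""
    x, dt = x0.copy(), (t1 - t0) / steps
    for _ in range(steps):
        x = x + dt * dxdt(x)
    return x

# evolve the state from an observation at t=0.0 to the next one at t=0.7
x1 = integrate(np.array([1.0, 0.0, 0.0]), t0=0.0, t1=0.7)
```

Because `t1 - t0` is a free parameter, the same model handles arbitrary gaps between samples, which is what makes the continuous-time formulation a fit for irregularly sampled series.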



Appendix: Supplementary Material
A Detailed Derivation of Formula 4

Neural Information Processing Systems

We state the PAC-Bayes theorem (Section 4), which bounds the generalization error of any posterior distribution Q over parameters reachable from the training set, given a prior distribution P over parameters that must be chosen in advance, before observing the training set. Letting Q and P be k-dimensional Gaussian distributions (Jiang et al., 2020), the KL-term admits the closed form KL(N(µ_Q, Σ_Q) ‖ N(µ_P, Σ_P)). Nevertheless, we have contributed theoretically to better capturing the true posterior by (1) relaxing an i.i.d. assumption. We recognize that our hypothetical covariance only captures the linear correlation between the weights of neurons (filters), so a gap remains between our hypothetical covariance and the true covariance. We also remark, however, that estimating the "true" posterior from data is itself problematic: using sharpness-like methods (Keskar et al., 2016) to sample parameters and estimate the covariance, for example, easily raises further questions about the accuracy of the estimate and leads to theoretically intractable derivations.
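For the diagonal-covariance special case the Gaussian KL-term above has a simple closed form, 0.5 Σᵢ [log(σ²_P,i / σ²_Q,i) + (σ²_Q,i + (µ_Q,i − µ_P,i)²) / σ²_P,i − 1], which can be computed directly. This is the standard formula for KL between Gaussians, not code from the paper; the variable names are illustrative:

```python
import numpy as np

def kl_gaussians(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) ), closed form."""
    return 0.5 * np.sum(np.log(var_p / var_q)
                        + (var_q + (mu_q - mu_p) ** 2) / var_p
                        - 1.0)

# posterior Q shifted and sharpened relative to a standard-normal prior P
kl = kl_gaussians(np.array([0.5, -0.2]), np.array([1.0, 0.5]),
                  np.zeros(2), np.ones(2))
```

As a sanity check, the KL vanishes when Q equals P and is strictly positive otherwise, which is what makes it usable as a complexity penalty in the PAC-Bayes bound.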


How does Weight Correlation Affect the Generalisation Ability of Deep Neural Networks?

Neural Information Processing Systems

This paper studies the novel concept of weight correlation in deep neural networks and discusses its impact on the networks' generalisation ability. For fully-connected layers, the weight correlation is defined as the average cosine similarity between the weight vectors of neurons; for convolutional layers, it is defined as the cosine similarity between filter matrices. Theoretically, we show that weight correlation can, and should, be incorporated into the PAC-Bayesian framework for the generalisation of neural networks, and that the resulting generalisation bound is monotonic with respect to the weight correlation. We formulate a new complexity measure, which lifts the PAC-Bayes measure with weight correlation, and experimentally confirm that it ranks the generalisation errors of a set of networks more precisely than existing measures. More importantly, we develop a new regulariser for training and provide extensive experiments showing that our approach greatly reduces the generalisation error.
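The fully-connected-layer definition above is directly computable: average the cosine similarity over all distinct pairs of neuron weight vectors (rows of the layer's weight matrix). The sketch below is a minimal reading of that definition; taking the absolute value of each similarity is an assumption on my part (so anti-parallel vectors also count as correlated), and the function name is illustrative:

```python
import numpy as np

def fc_weight_correlation(W):
    """Average pairwise |cosine similarity| between rows of W.

    W: (neurons, inputs) weight matrix of a fully-connected layer;
    each row is one neuron's weight vector.
    """
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm rows
    sims = np.abs(Wn @ Wn.T)                           # pairwise |cosine|
    n = W.shape[0]
    # average over distinct pairs only (exclude the diagonal of ones)
    return (sims.sum() - n) / (n * (n - 1))

corr_parallel = fc_weight_correlation(np.array([[1.0, 0.0], [2.0, 0.0]]))
corr_orthogonal = fc_weight_correlation(np.array([[1.0, 0.0], [0.0, 1.0]]))
```

Parallel rows give a correlation of 1 and orthogonal rows give 0, matching the intuition that highly redundant neurons inflate the measure; for convolutional layers the same computation would apply to flattened filters.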


VPGTrans: Transfer Visual Prompt Generator across LLMs

Neural Information Processing Systems

Since developing a new multimodal LLM (MLLM) by pre-training on a tremendous amount of image-text pairs from scratch is exceedingly resource-intensive, connecting an existing LLM with a comparatively lightweight visual prompt generator (VPG) becomes a feasible paradigm.
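The LLM-plus-VPG paradigm can be sketched in a few lines: a frozen vision encoder's features are mapped by a small trainable module into the LLM's embedding space and prepended to the text embeddings as soft prompts. The single linear projection below is a deliberate simplification (real VPGs are typically small transformers such as a Q-Former), and all dimensions are made-up placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
d_vis, d_llm = 512, 768      # hypothetical vision-feature and LLM embedding dims
n_patches, n_text = 16, 8    # visual tokens per image, text tokens per prompt

# hypothetical lightweight VPG: one linear map into the LLM embedding space;
# only this matrix would be trained (or transferred), the LLM stays frozen
W_vpg = 0.02 * rng.standard_normal((d_vis, d_llm))

vis_feats = rng.standard_normal((n_patches, d_vis))  # frozen vision encoder output
text_embs = rng.standard_normal((n_text, d_llm))     # frozen LLM token embeddings

soft_prompts = vis_feats @ W_vpg                     # (n_patches, d_llm)
# prepend the visual soft prompts to the text sequence fed to the LLM
llm_input = np.concatenate([soft_prompts, text_embs], axis=0)
```

Because the trainable surface is only the VPG, transferring it across LLMs (the VPGTrans setting) amounts to adapting this small module to a new target embedding space rather than re-doing full multimodal pre-training.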