
Coarse-to-fine Animal Pose and Shape Estimation: Supplementary Material

Neural Information Processing Systems

We conduct further ablation studies for our approach in this supplementary material, including a comparison with test-time optimization and a sensitivity analysis of the refinement stage. Additional qualitative results are also provided. We compare our coarse-to-fine approach with the test-time optimization approach. As in our coarse-to-fine pipeline, we use the output of the coarse estimation stage as the initialization. Instead of applying the mesh refinement GCN, we further optimize the SMAL parameters based on the keypoints and silhouettes for 10, 50, 100, and 200 iterations, respectively.
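A minimal sketch of this test-time optimization baseline, assuming a differentiable SMAL layer and rendering utilities (smal_forward, project, and render_silhouette are hypothetical placeholders, not the paper's code):

import torch

def test_time_optimize(params, keypoints_2d, silhouette,
                       smal_forward, project, render_silhouette,
                       n_iters=100, lr=1e-2):
    # Refine the coarse-stage SMAL parameters against 2D evidence.
    params = params.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([params], lr=lr)
    for _ in range(n_iters):  # e.g., 10, 50, 100, or 200 iterations
        optimizer.zero_grad()
        verts, joints = smal_forward(params)  # parameters -> mesh vertices and 3D joints
        loss_kp = ((project(joints) - keypoints_2d) ** 2).mean()  # keypoint reprojection error
        loss_sil = ((render_silhouette(verts) - silhouette) ** 2).mean()  # silhouette error
        (loss_kp + loss_sil).backward()
        optimizer.step()
    return params.detach()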


One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Neural Information Processing Systems

We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos. Leveraging the reasoning capabilities and world knowledge of large language models, and augmented by the Segment Anything Model, VideoLISA generates temporally consistent segmentation masks in videos based on language instructions. Existing image-based methods, such as LISA, struggle with video tasks due to the additional temporal dimension, which requires temporal dynamic understanding and consistent segmentation across frames. VideoLISA addresses these challenges by integrating a Sparse Dense Sampling strategy into the video-LLM, which balances temporal context and spatial detail within computational constraints. Additionally, we propose a One-Token-Seg-All approach using a specially designed token, enabling the model to segment and track objects across multiple frames. Extensive evaluations on diverse benchmarks, including our newly introduced ReasonVOS benchmark, demonstrate VideoLISA's superior performance in video object segmentation tasks involving complex reasoning, temporal understanding, and object tracking. While optimized for videos, VideoLISA also shows promising generalization to image segmentation, revealing its potential as a unified foundation model for language-instructed object segmentation.
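One plausible way to realize such a sampling schedule is sketched below: a few frames sampled sparsely for temporal context plus a dense local window for spatial detail. The exact split is an illustrative assumption, not VideoLISA's published recipe.

import numpy as np

def sparse_dense_indices(num_frames, n_sparse=8, n_dense=4, center=None):
    # Sparse global frames cover the whole clip for temporal context.
    sparse = np.linspace(0, num_frames - 1, n_sparse).astype(int).tolist()
    # A dense contiguous window around `center` preserves spatial detail.
    c = num_frames // 2 if center is None else center
    lo = max(0, min(c - n_dense // 2, num_frames - n_dense))
    dense = list(range(lo, lo + n_dense))
    return sorted(set(sparse + dense))

# Example: a 120-frame clip yields a small, mixed set of frame indices.
print(sparse_dense_indices(120))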


Supplementary Material: Beltrami Flow and Neural Diffusion on Graphs

Neural Information Processing Systems

Datasets. The statistics for the largest connected components of the experimental datasets are given in Table 1.

Replication of results and hyperparameters. Code to regenerate our experimental results, together with the hyperparameters for all datasets, is provided.

Numerical ODE solver. We use the library torchdiffeq [2] to discretise the continuous time evolution and learn the system dynamics. The Pontryagin maximum / adjoint method is used in place of direct backpropagation for all datasets except Cora and Citeseer, owing to the high memory cost of backpropagating directly through the computational graph of the numerical integrator.

Decoupling the terminal integration time between inference and training. At training time we use a fixed terminal time T that is tuned as a hyperparameter.
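A minimal sketch of the solver setup described above, using torchdiffeq's adjoint-based odeint; the vector field below is a generic placeholder, not the paper's graph Beltrami dynamics:

import torch
from torchdiffeq import odeint_adjoint as odeint  # adjoint method instead of direct backprop

class Dynamics(torch.nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.W = torch.nn.Parameter(0.1 * torch.randn(dim, dim))
    def forward(self, t, x):
        return torch.tanh(x @ self.W)  # placeholder vector field

func = Dynamics()
x0 = torch.randn(32, 16)
T = 1.0  # fixed terminal time, tuned as a hyperparameter at training time
xT = odeint(func, x0, torch.tensor([0.0, T]))[-1]  # state at t = T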


From Stochastic Mixability to Fast Rates

Neural Information Processing Systems

Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems in which the data is generated according to some unknown distribution P; it returns a hypothesis f chosen from a fixed class F with small loss l. In the parametric setting, depending upon (l, F, P), ERM can have slow (O(1/√n)) or fast (O(1/n)) rates of convergence of the excess risk as a function of the sample size n. Several results give sufficient conditions for fast rates in terms of joint properties of l, F, and P, such as the margin condition and the Bernstein condition. In the non-statistical prediction-with-expert-advice setting, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss l (there being no role there for F or P). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of (l, F, P), and in so doing provides new insight into the fast-rates phenomenon.
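For reference, the central condition can be stated as follows (a reconstruction of the standard formulation in this line of work; notation may differ slightly from the paper's):

% (l, F, P) is \eta-stochastically mixable for some \eta > 0 if there is
% an f^* \in F such that, for every f \in F,
\mathbb{E}_{Z \sim P}\left[ \exp\big( \eta \, ( l(f^*, Z) - l(f, Z) ) \big) \right] \le 1,
% in which case ERM attains excess risk O(1/n) rather than O(1/\sqrt{n}).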



BTS: Building Timeseries Dataset: Empowering Large-Scale Building Analytics

Neural Information Processing Systems

Buildings play a crucial role in human well-being, influencing occupant comfort, health, and safety. Additionally, they contribute significantly to global energy consumption and carbon emissions, accounting for one-third of total energy usage. Optimizing building performance thus presents a vital opportunity to combat climate change and promote human flourishing. However, research in building analytics has been hampered by the lack of accessible, available, and comprehensive real-world datasets on multiple building operations. In this paper, we introduce the Building TimeSeries (BTS) dataset. Our dataset covers three buildings over a three-year period, comprising more than ten thousand timeseries data points with hundreds of unique classes. Moreover, the metadata is standardized using the Brick schema. To demonstrate the utility of this dataset, we performed benchmarks on the multi-label timeseries classification task. This task represents an essential initial step in addressing challenges related to interoperability in building analytics.
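A toy sketch of the multi-label classification setup such a benchmark implies, with random arrays standing in for the actual BTS features and Brick-style labels:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))                 # per-series features (e.g., summary statistics)
Y = (rng.random((200, 5)) < 0.3).astype(int)   # multi-hot matrix: a series can carry several classes

clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
print(clf.predict(X[:3]))                      # one binary class-membership vector per series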


CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models

Neural Information Processing Systems

Continual learning (CL) aims to help deep neural networks learn new knowledge while retaining what has been learned. Owing to their powerful generalizability, pretrained vision-language models such as Contrastive Language-Image Pre-training (CLIP) [1] have lately gained traction as practical CL candidates. However, the domain mismatch between the pre-training and the downstream CL tasks often calls for finetuning of CLIP on the latter.
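One common (and here purely illustrative) form of such finetuning trains a small residual adapter on frozen CLIP features; this is a sketch of the general pattern, not CLAP4CLIP's probabilistic method:

import torch
import torch.nn as nn

class Adapter(nn.Module):
    # Small residual adapter over frozen CLIP embeddings (illustrative).
    def __init__(self, dim=512, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
    def forward(self, feats):
        return feats + self.net(feats)  # residual path preserves pretrained knowledge

adapter = Adapter()
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-3)  # only the adapter is updated
image_feats = torch.randn(8, 512)   # stand-in for frozen CLIP image embeddings
text_feats = torch.randn(10, 512)   # stand-in for class-prompt text embeddings
logits = adapter(image_feats) @ text_feats.t()  # similarity scores over classes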


Unlocking the Potential of Global Human Expertise

Neural Information Processing Systems

Solving societal problems on a global scale requires the collection and processing of ideas and methods from diverse sets of international experts. As the number and diversity of human experts increase, so does the likelihood that elements in this collective knowledge can be combined and refined to discover novel and better solutions. However, it is difficult to identify, combine, and refine complementary information in an increasingly large and diverse knowledge base. This paper argues that artificial intelligence (AI) can play a crucial role in this process. An evolutionary AI framework, termed RHEA, fills this role by distilling knowledge from diverse models created by human experts into equivalent neural networks, which are then recombined and refined in a population-based search. The framework was implemented in a formal synthetic domain, demonstrating that it is transparent and systematic. It was then applied to the results of the XPRIZE Pandemic Response Challenge, in which over 100 teams of experts across 23 countries submitted models based on diverse methodologies to predict COVID-19 cases and suggest non-pharmaceutical intervention policies for 235 nations, states, and regions across the globe. Building upon this expert knowledge, by recombining and refining the 169 resulting policy suggestion models, RHEA discovered a broader and more effective set of policies than either AI or human experts alone, as evaluated based on real-world data. The results thus suggest that AI can play a crucial role in realizing the potential of human expertise in global problem-solving.
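A toy sketch of the population-based recombine-and-refine loop this describes, operating on flat weight vectors; the crossover and mutation operators here are generic illustrations, not RHEA's actual operators:

import numpy as np

def evolve(population, fitness_fn, generations=50, mut_sigma=0.05, seed=0):
    # Generic population-based search over weight vectors (illustrative).
    rng = np.random.default_rng(seed)
    pop = [p.copy() for p in population]  # e.g., networks distilled from expert models
    for _ in range(generations):
        scores = np.array([fitness_fn(p) for p in pop])
        parents = [pop[i] for i in np.argsort(scores)[-len(pop) // 2:]]  # keep the fitter half
        children = []
        while len(children) < len(pop) - len(parents):
            a, b = rng.choice(len(parents), 2, replace=False)
            mask = rng.random(parents[a].shape) < 0.5            # uniform crossover
            child = np.where(mask, parents[a], parents[b])
            children.append(child + rng.normal(0, mut_sigma, child.shape))  # mutation
        pop = parents + children
    return max(pop, key=fitness_fn)

Here fitness_fn would score a candidate's policy suggestions against held-out real-world data, mirroring the paper's evaluation setup.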


PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition

Neural Information Processing Systems

In this study, we aim to reduce generation latency for Named Entity Recognition (NER) with Large Language Models (LLMs). The main cause of high latency in LLMs is the sequential decoding process, which autoregressively generates all labels and mentions for NER, significantly increasing the sequence length. To this end, we introduce Parallel Decoding in LLM for NER (PaDeLLM-NER), an approach that integrates seamlessly into existing generative model frameworks without requiring additional modules or architectural modifications. PaDeLLM-NER accelerates decoding by generating all mentions simultaneously, i.e., one label-mention pair per sequence.
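A hedged sketch of the underlying idea: issue one prompt per entity label and decode the whole batch in a single pass, so latency stops scaling with the number of labels. The prompt template and the use of gpt2 are illustrative stand-ins, not the paper's implementation.

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"  # stand-in model; the paper targets instruction-tuned LLMs
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token
tok.padding_side = "left"  # left-pad so generation continues from each prompt's end
model = AutoModelForCausalLM.from_pretrained(model_name)

text = "Barack Obama visited Paris in 2015."
labels = ["person", "location", "date"]
prompts = [f"Text: {text}\nList all {lab} entities:" for lab in labels]  # one sequence per label
batch = tok(prompts, return_tensors="pt", padding=True)
out = model.generate(**batch, max_new_tokens=16, pad_token_id=tok.eos_token_id)
for lab, seq in zip(labels, out):
    print(lab, tok.decode(seq, skip_special_tokens=True))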


FIDE: Frequency-Inflated Conditional Diffusion Model for Extreme-Aware Time Series Generation

Neural Information Processing Systems

Time series generation is a crucial aspect of data analysis, playing a pivotal role in learning the temporal patterns and their underlying dynamics across diverse fields. Conventional time series generation methods often struggle to capture extreme values adequately, diminishing their value in critical applications such as scenario planning and risk management for healthcare, finance, climate change adaptation, and beyond. In this paper, we introduce a conditional diffusion model called FIDE to address the challenge of preserving the distribution of extreme values in generative modeling for time series. FIDE employs a novel high-frequency inflation strategy in the frequency domain, preventing premature fade-out of the extreme values. It also extends the traditional diffusion-based model, enabling the generation of samples conditioned on the block maxima, thereby enhancing the model's capacity to capture extreme events. Additionally, the FIDE framework incorporates the Generalized Extreme Value (GEV) distribution within its generative modeling framework, ensuring fidelity to both block maxima and overall data distribution.
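A toy sketch of boosting high-frequency Fourier components of a series; the cutoff and gain are arbitrary illustrative choices rather than FIDE's actual inflation schedule:

import numpy as np

def inflate_high_freq(x, cutoff=0.25, gain=2.0):
    # Amplify FFT coefficients above a normalized-frequency cutoff (Nyquist = 0.5).
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x))
    X[freqs > cutoff] *= gain
    return np.fft.irfft(X, n=len(x))

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.1 * rng.normal(size=256)
x_inflated = inflate_high_freq(x)  # sharp excursions are amplified rather than smoothed away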