AITopics

2502.20132

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States (0.05)
Europe > France (0.04)
(15 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Energy (0.93)
Media > Television (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Zhou, Youran, Bouadjenek, Mohamed Reda, Aryal, Sunil

Developing robust methods to handle missing data in real-world applications effectively

arXiv.org Artificial Intelligence, Feb-28-2025

Missing data is a pervasive challenge spanning diverse data types, including tabular, sensor data, time-series, images and so on. Its origins are multifaceted, resulting in various missing mechanisms. Prior research in this field has predominantly revolved around the assumption of the Missing Completely At Random (MCAR) mechanism. However, Missing At Random (MAR) and Missing Not At Random (MNAR) mechanisms, though equally prevalent, have often remained underexplored despite their significant influence. This PhD project presents a comprehensive research agenda designed to investigate the implications of diverse missing data mechanisms. The principal aim is to devise robust methodologies capable of effectively handling missing data while accommodating the unique characteristics of MCAR, MAR, and MNAR mechanisms. By addressing these gaps, this research contributes to an enriched understanding of the challenges posed by missing data across various industries and data modalities. It seeks to provide practical solutions that enable the effective management of missing data, empowering researchers and practitioners to leverage incomplete datasets confidently.
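
As a minimal sketch of how the three mechanisms named in the abstract differ, the snippet below builds missingness masks for a column `y` given a fully observed column `x`. The function name `make_masks`, the fixed missing rate, and the deterministic "drop the largest values" rule are illustrative assumptions, not anything taken from the paper.

```python
import random

def make_masks(x, y, rate=0.3, seed=0):
    """Return (mcar, mar, mnar) boolean masks; True means y[i] is missing.

    Hypothetical helper for illustration only. Real MAR/MNAR processes are
    usually probabilistic; a hard cut-off is used here to keep it short.
    """
    rng = random.Random(seed)
    n = len(y)
    k = int(rate * n)  # number of entries to hide under each mechanism
    # MCAR: missingness is independent of both x and y.
    mcar_idx = set(rng.sample(range(n), k))
    # MAR: missingness depends only on the observed column x.
    mar_idx = set(sorted(range(n), key=lambda i: x[i], reverse=True)[:k])
    # MNAR: missingness depends on the value of y that goes missing.
    mnar_idx = set(sorted(range(n), key=lambda i: y[i], reverse=True)[:k])
    return tuple([i in idx for i in range(n)]
                 for idx in (mcar_idx, mar_idx, mnar_idx))
```

Under this toy MNAR rule the observed mean of `y` is biased downward because the largest values are the ones hidden, which is one reason a method that assumes MCAR can mislead on such data.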

data type, dataset, mechanism, (16 more...)

2502.19635

Country: Oceania > Australia > Victoria (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Feb-28-2025

Privacy-Aware Joint DNN Model Deployment and Partition Optimization for Delay-Efficient Collaborative Edge Inference

Cheng, Zhipeng, Xia, Xiaoyu, Wang, Hong, Liwang, Minghui, Chen, Ning, Fan, Xuwei, Wang, Xianbin

Edge inference (EI) is a key solution to address the growing challenges of delayed response times, limited scalability, and privacy concerns in cloud-based Deep Neural Network (DNN) inference. However, deploying DNN models on resource-constrained edge devices faces more severe challenges, such as model storage limitations, dynamic service requests, and privacy risks. This paper proposes a novel framework for privacy-aware joint DNN model deployment and partition optimization to minimize long-term average inference delay under resource and privacy constraints. Specifically, the problem is formulated as a complex optimization problem considering model deployment, user-server association, and model partition strategies. To handle the NP-hardness and future uncertainties, a Lyapunov-based approach is introduced to transform the long-term optimization into a single-time-slot problem, ensuring system performance. Additionally, a coalition formation game model is proposed for edge server association, and a greedy-based algorithm is developed for model deployment within each coalition to efficiently solve the problem. Extensive simulations show that the proposed algorithms effectively reduce inference delay while satisfying privacy constraints, outperforming baseline approaches in various scenarios.

algorithm, edge server, inference delay, (10 more...)

2502.16091

Country:

North America > United States (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(7 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Game Theory (1.00)
Information Technology > Communications (1.00)
(3 more...)

arXiv.org Machine Learning, Feb-28-2025

MCNet: Monotonic Calibration Networks for Expressive Uncertainty Calibration in Online Advertising

Dai, Quanyu, Xiao, Jiaren, Du, Zhaocheng, Zhu, Jieming, Luo, Chengxiao, Wu, Xiao-Ming, Dong, Zhenhua

In online advertising, uncertainty calibration aims to adjust a ranking model's probability predictions to better approximate the true likelihood of an event, e.g., a click or a conversion. However, existing calibration approaches may lack the ability to effectively model complex nonlinear relations, consider context features, and achieve balanced performance across different data subsets. To tackle these challenges, we introduce a novel model called Monotonic Calibration Networks, featuring three key designs: a monotonic calibration function (MCF), an order-preserving regularizer, and a field-balance regularizer. The nonlinear MCF is capable of naturally modeling and universally approximating the intricate relations between uncalibrated predictions and the posterior probabilities, thus being much more expressive than existing methods. MCF can also integrate context features using a flexible model architecture, thereby achieving context awareness. The order-preserving and field-balance regularizers promote the monotonic relationship between adjacent bins and the balanced calibration performance on data subsets, respectively. Experimental results on both public and industrial datasets demonstrate the superior performance of our method in generating well-calibrated probability predictions.

calibration function, neural network, probability, (15 more...)

doi: 10.1145/3696410.3714802

2503.00334

Country:

Oceania > Australia > New South Wales > Sydney (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Hong Kong (0.04)
(5 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Marketing (1.00)
Information Technology > Services (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.97)
Information Technology > Data Science > Data Mining (0.94)

Guzman-Olivares, Daniel, Schmidt, Philipp, Golebiowski, Jacek, Bekasov, Artur

Clustering Context in Off-Policy Evaluation

arXiv.org Machine Learning, Feb-28-2025

Off-policy evaluation can leverage logged data to estimate the effectiveness of new policies in e-commerce, search engines, media streaming services, or automatic diagnostic tools in healthcare. However, the performance of baseline off-policy estimators like IPS deteriorates when the logging policy significantly differs from the evaluation policy. Recent work proposes sharing information across similar actions to mitigate this problem. In this work, we propose an alternative estimator that shares information across similar contexts using clustering. We study the theoretical properties of the proposed estimator, characterizing its bias and variance under different conditions. We also compare the performance of the proposed estimator and existing approaches in various synthetic problems, as well as a real-world recommendation dataset. Our experimental results confirm that clustering contexts improves estimation accuracy, especially in deficient information settings.
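
For readers unfamiliar with the IPS baseline the abstract mentions, here is a minimal sketch of the standard inverse propensity scoring estimator. It is not the clustering-based estimator proposed in the paper; the log format `(context, action, reward, logging_prob)` and the name `ips_estimate` are assumptions made for illustration.

```python
def ips_estimate(logs, target_policy):
    """Estimate the value of target_policy from logged bandit feedback.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability the logging policy assigned to
          the logged action (assumed to be > 0).
    target_policy: function (context, action) -> probability under the
          policy being evaluated.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Importance weight corrects for the mismatch between policies.
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)
```

When the logging policy rarely takes an action that the evaluation policy favours, the importance weight becomes large and the estimate high-variance, which is presumably the deterioration the abstract refers to.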

assumption 3, estimator, off-policy evaluation, (13 more...)

2502.21304

Country:

North America > United States (0.04)
Asia > Middle East > Jordan (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.48)
Leisure & Entertainment (0.48)
Information Technology > Services (0.34)
Media > Music (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Machine Learning, Feb-28-2025

Amortized Conditional Independence Testing

Duong, Bao, Hoang, Nu, Nguyen, Thin

Testing for the conditional independence structure in data is a fundamental and critical task in statistics and machine learning, which finds natural applications in causal discovery, a highly relevant problem to many scientific disciplines. Existing methods seek to design explicit test statistics that quantify the degree of conditional dependence, which is highly challenging yet cannot capture nor utilize prior knowledge in a data-driven manner. In this study, an entirely new approach is introduced, where we instead propose to amortize conditional independence testing and devise ACID (Amortized Conditional Independence test), a novel transformer-based neural network architecture that learns to test for conditional independence. ACID can be trained on synthetic data in a supervised learning fashion, and the learned model can then be applied to any dataset of similar natures or adapted to new domains by fine-tuning with a negligible computational cost. Our extensive empirical evaluations on both synthetic and real data reveal that ACID consistently achieves state-of-the-art performance against existing baselines under multiple metrics, and is able to generalize robustly to unseen sample sizes, dimensionalities, as well as non-linearities with a remarkably low inference time.

architecture, dataset, test statistics, (13 more...)

2502.20925

Country:

Oceania > Australia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Vukadin, Davor, Šilić, Marin, Delač, Goran

Large Language Models as Attribution Regularizers for Efficient Model Training

arXiv.org Artificial Intelligence, Feb-27-2025

Large Language Models (LLMs) have demonstrated remarkable performance across diverse domains. However, effectively leveraging their vast knowledge for training smaller downstream models remains an open challenge, especially in domains like tabular data learning, where simpler models are often preferred due to interpretability and efficiency. In this paper, we introduce a novel yet straightforward method for incorporating LLM-generated global task feature attributions into the training process of smaller networks. Specifically, we propose an attribution-matching regularization term that aligns the training dynamics of the smaller model with the insights provided by the LLM. By doing so, our approach yields superior performance in few-shot learning scenarios. Notably, our method requires only black-box API access to the LLM, making it easy to integrate into existing training pipelines with minimal computational overhead. Furthermore, we demonstrate how this method can be used to address common issues in real-world datasets, such as skewness and bias. By integrating high-level knowledge from LLMs, our approach improves generalization, even when training data is limited or imbalanced. We validate its effectiveness through extensive experiments across multiple tasks, demonstrating improved learning efficiency and model robustness.

conference, dataset, language model, (15 more...)

2502.20268

Country:

Oceania > Australia (0.28)
Europe > Austria > Vienna (0.14)
Europe > Croatia (0.14)
(6 more...)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Damato, Stefano, Azzimonti, Dario, Corani, Giorgio

Forecasting intermittent time series with Gaussian Processes and Tweedie likelihood

arXiv.org Machine Learning, Feb-27-2025

We introduce the use of Gaussian Processes (GPs) for the probabilistic forecasting of intermittent time series. The model is trained in a Bayesian framework that accounts for the uncertainty about the latent function and marginalizes it out when making predictions. We couple the latent GP variable with two types of forecast distributions: the negative binomial (NegBinGP) and the Tweedie distribution (TweedieGP). While the negative binomial has already been used in forecasting intermittent time series, this is the first time in which a fully parameterized Tweedie density is used for intermittent time series. We properly evaluate the Tweedie density, which is both zero-inflated and heavy tailed, avoiding simplifying assumptions made in existing models. We test our models on thousands of intermittent count time series. Results show that our models provide consistently better probabilistic forecasts than the competitors. In particular, TweedieGP obtains the best estimates of the highest quantiles, thus showing that it is more flexible than NegBinGP.
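
To illustrate why a Tweedie distribution with power parameter between 1 and 2 places mass at exactly zero, the sketch below samples it through its standard compound Poisson-gamma representation. This is only a simulation aid under that textbook parameterization (mean `mu`, dispersion `phi`, power `p`); the function name `sample_tweedie` is hypothetical and the code is not the TweedieGP model from the paper.

```python
import math
import random

def sample_tweedie(mu, phi, p, rng):
    """Draw one Tweedie(mu, phi, p) sample, assuming 1 < p < 2."""
    lam = mu ** (2 - p) / (phi * (2 - p))   # Poisson rate of the event count
    shape = (2 - p) / (p - 1)               # gamma shape of each jump
    scale = phi * (p - 1) * mu ** (p - 1)   # gamma scale of each jump
    # Knuth's multiplication method for a Poisson(lam) draw.
    limit, count, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    # A sum of zero jumps is exactly 0, giving the point mass at zero.
    return sum(rng.gammavariate(shape, scale) for _ in range(count))
```

In this representation the probability of an exact zero is `exp(-lam)`, so a small mean or a large dispersion yields many zeros, which matches the intermittent-demand setting the abstract describes.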

artificial intelligence, data mining, machine learning, (20 more...)

2502.19086

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Retail (0.67)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Balmaseda, Vicente, Wang, Bokun, Lin, Ching-Long, Yang, Tianbao

Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning

arXiv.org Artificial Intelligence, Feb-27-2025

In self-supervised contrastive learning, negative pairs are typically constructed using an anchor image and a sample drawn from the entire dataset, excluding the anchor. However, this approach can result in the creation of negative pairs with similar semantics, referred to as "false negatives", leading to their embeddings being falsely pushed apart. To address this issue, we introduce GloFND, an optimization-based approach that automatically learns on the fly the threshold for each anchor data to identify its false negatives during training. In contrast to previous methods for false negative discovery, our approach globally detects false negatives across the entire dataset rather than locally within the mini-batch. Moreover, its per-iteration computation cost remains independent of the dataset size. Experimental results on image and image-text data demonstrate the effectiveness of the proposed method. Our implementation is available at https://github.com/vibalcam/GloFND .

dataset, false negative, glofnd, (15 more...)

2502.20612

Country:

North America > United States > Texas (0.04)
North America > United States > Iowa (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre:

Research Report > Experimental Study (0.47)
Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Johns, Brandon, Zhou, Zhuomin, Abdi, Elahe

ServoLNN: Lagrangian Neural Networks Driven by Servomechanisms

arXiv.org Artificial Intelligence, Feb-27-2025

Combining deep learning with classical physics facilitates the efficient creation of accurate dynamical models. In a recent class of neural network, Lagrangian mechanics is hard-coded into the architecture, and training the network learns the given system. However, the current architectures do not facilitate the modelling of dynamical systems that are driven by servomechanisms (e.g. servomotors, stepper motors, current sources, volumetric pumps). This article presents ServoLNN, a new architecture to model dynamical systems that are driven by servomechanisms. ServoLNN is compatible for use in real-time applications, where the driving motion is known only just-in-time. A PyTorch implementation of ServoLNN is provided. The derivations and results reveal the occurrence of a possible family of solutions that the training may converge on. The effect of the family of solutions on the predicted physical quantities is explored, as is the resolution to reduce the family of solutions to a single solution. Resultantly, the architecture can simultaneously accurately find the energies, power, rate of work, mass matrix, generalised accelerations, generalised forces, and the generalised forces that drive the servomechanisms.

architecture, generalised coordinate, neural network, (16 more...)

2502.19802

Country: Oceania > Australia > Victoria (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)