AITopics

In recent years, modeling and analysis of interval-valued time series have garnered increasing attention in econometrics, finance, and statistics. However, these studies have predominantly focused on statistical inference in the forecasting of univariate and multivariate interval-valued time series, overlooking another important aspect: classification. In this paper, we introduce a classification approach that treats intervals as unified entities, applicable to both univariate and multivariate interval-valued time series. Specifically, we first extend the point-valued time series imaging methods to interval-valued scenarios using the $D_K$-distance, enabling the imaging of interval-valued time series. Then, we employ suitable deep learning model for classification on the obtained imaging dataset, aiming to achieve classification for interval-valued time series. In theory, we derived a sharper excess risk bound for deep multiclassifiers based on offset Rademacher complexity. Finally, we validate the superiority of the proposed method through comparisons with various existing point-valued time series classification methods in both simulation studies and real data applications.

artificial intelligence, interval-valued time sery, machine learning, (19 more...)

2504.04667

Country: Asia > China (0.46)

Genre: Research Report (0.63)

Industry: Energy > Oil & Gas (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Kacprzak, Tomasz, Kamper, Francois, Heiss, Michael W., Janka, Gianluca, Dillner, Ann M., Takahama, Satoshi

Scalable Approximate Algorithms for Optimal Transport Linear Models

Recently, linear regression models incorporating an optimal transport (OT) loss have been explored for applications such as supervised unmixing of spectra, music transcription, and mass spectrometry. However, these task-specific approaches often do not generalize readily to a broader class of linear models. In this work, we propose a novel algorithmic framework for solving a general class of non-negative linear regression models with an entropy-regularized OT datafit term, based on Sinkhorn-like scaling iterations. Our framework accommodates convex penalty functions on the weights (e.g. squared-$\ell_2$ and $\ell_1$ norms), and admits additional convex loss terms between the transported marginal and target distribution (e.g. squared error or total variation). We derive simple multiplicative updates for common penalty and datafit terms. This method is suitable for large-scale problems due to its simplicity of implementation and straightforward parallelization.

algorithm, artificial intelligence, machine learning, (13 more...)

2504.04609

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Government > Regional Government > North America Government > United States Government (0.46)
Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Better Rates for Random Task Orderings in Continual Linear Models

Evron, Itay, Levinstein, Ran, Schliserman, Matan, Sherman, Uri, Koren, Tomer, Soudry, Daniel, Srebro, Nathan

We study the common continual learning setup where an overparameterized model is sequentially fitted to a set of jointly realizable tasks. We analyze the forgetting, i.e., loss on previously seen tasks, after $k$ iterations. For linear models, we prove that fitting a task is equivalent to a single stochastic gradient descent (SGD) step on a modified objective. We develop novel last-iterate SGD upper bounds in the realizable least squares setup, and apply them to derive new results for continual learning. Focusing on random orderings over $T$ tasks, we establish universal forgetting rates, whereas existing rates depend on the problem dimensionality or complexity. Specifically, in continual regression with replacement, we improve the best existing rate from $O((d-r)/k)$ to $O(\min(k^{-1/4}, \sqrt{d-r}/k, \sqrt{Tr}/k))$, where $d$ is the dimensionality and $r$ the average task rank. Furthermore, we establish the first rates for random task orderings without replacement. The obtained rate of $O(\min(T^{-1/4}, (d-r)/T))$ proves for the first time that randomization alone, with no task repetition, can prevent catastrophic forgetting in sufficiently long task sequences. Finally, we prove a similar $O(k^{-1/4})$ universal rate for the forgetting in continual linear classification on separable data. Our universal rates apply for broader projection methods, such as block Kaczmarz and POCS, illuminating their loss convergence under i.i.d and one-pass orderings.

artificial intelligence, machine learning, nullnull, (16 more...)

2504.04579

Country:

Europe (0.28)
Asia (0.28)
North America > United States (0.27)

Genre: Research Report > New Finding (0.87)

Industry: Education (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

arXiv.org Artificial IntelligenceApr-6-2025

New Intent Discovery with Pre-training and Contrastive Learning

Zhang, Yuwei, Zhang, Haode, Zhan, Li-Ming, Lam, Albert Y. S., Wu, Xiao-Ming

New intent discovery aims to uncover novel intent categories from user utterances to expand the set of supported intent classes. It is a critical task for the development and service expansion of a practical dialogue system. Despite its importance, this problem remains under-explored in the literature. Existing approaches typically rely on a large amount of labeled utterances and employ pseudo-labeling methods for representation learning and clustering, which are label-intensive, inefficient, and inaccurate. In this paper, we provide new solutions to two important research questions for new intent discovery: (1) how to learn semantic utterance representations and (2) how to better cluster utterances. Particularly, we first propose a multi-task pre-training strategy to leverage rich unlabeled data along with external labeled data for representation learning. Then, we design a new contrastive loss to exploit self-supervisory signals in unlabeled data for clustering. Extensive experiments on three intent recognition benchmarks demonstrate the high effectiveness of our proposed method, which outperforms state-of-the-art methods by a large margin in both unsupervised and semi-supervised scenarios. The source code will be available at https://github.com/zhang-yu-wei/MTP-CLNN.

artificial intelligence, computational linguistic, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2205.12914

Country:

Asia (0.68)
North America > United States > Minnesota (0.28)
North America > United States > California (0.28)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.69)

Hu, Hengrui, Kothari, Anai N., Banerjee, Anjishnu

A Novel Algorithm for Personalized Federated Learning: Knowledge Distillation with Weighted Combination Loss

Federated learning (FL) offers a privacy-preserving framework for distributed machine learning, enabling collaborative model training across diverse clients without centralizing sensitive data. However, statistical heterogeneity, characterized by non-independent and identically distributed (non-IID) client data, poses significant challenges, leading to model drift and poor generalization. This paper proposes a novel algorithm, pFedKD-WCL (Personalized Federated Knowledge Distillation with Weighted Combination Loss), which integrates knowledge distillation with bi-level optimization to address non-IID challenges. pFedKD-WCL leverages the current global model as a teacher to guide local models, optimizing both global convergence and local personalization efficiently. We evaluate pFedKD-WCL on the MNIST dataset and a synthetic dataset with non-IID partitioning, using multinomial logistic regression and multilayer perceptron models. Experimental results demonstrate that pFedKD-WCL outperforms state-of-the-art algorithms, including FedAvg, FedProx, Per-FedAvg, and pFedMe, in terms of accuracy and convergence speed.

artificial intelligence, federated learning, machine learning, (13 more...)

2504.04642

Country: North America > United States (0.16)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.56)

Flores, Gerardo, Schiff, Abigail, Smith, Alyssa H., Fukuyama, Julia A, Wilson, Ashia C.

A Consequentialist Critique of Binary Classification Evaluation Practices

ML-supported decisions, such as ordering tests or determining preventive custody, often involve binary classification based on probabilistic forecasts. Evaluation frameworks for such forecasts typically consider whether to prioritize independent-decision metrics (e.g., Accuracy) or top-K metrics (e.g., Precision@K), and whether to focus on fixed thresholds or threshold-agnostic measures like AUC-ROC. We highlight that a consequentialist perspective, long advocated by decision theorists, should naturally favor evaluations that support independent decisions using a mixture of thresholds given their prevalence, such as Brier scores and Log loss. However, our empirical analysis reveals a strong preference for top-K metrics or fixed thresholds in evaluations at major conferences like ICML, FAccT, and CHIL. To address this gap, we use this decision-theoretic framework to map evaluation metrics to their optimal use cases, along with a Python package, briertools, to promote the broader adoption of Brier scores. In doing so, we also uncover new theoretical connections, including a reconciliation between the Brier Score and Decision Curve Analysis, which clarifies and responds to a longstanding critique by (Assel, et al. 2017) regarding the clinical utility of proper scoring rules.

artificial intelligence, brier score, machine learning, (17 more...)

2504.04528

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.92)

Industry:

Law (0.93)
Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Halihal, Morad, Routtenberg, Tirza, Poor, H. Vincent

Cramer-Rao Bounds for Laplacian Matrix Estimation

Abstract--In this paper, we analyze the performance of the estimation of Laplacian matrices under general observatio n models. Laplacian matrix estimation involves structural c on-straints, including symmetry and null-space properties, a long with matrix sparsity. By exploiting a linear reparametriza tion that enforces the structural constraints, we derive closed -form matrix expressions for the Cram er-Rao Bound (CRB) specifically tailored to Laplacian matrix estimation. We further extend the derivation to the sparsity-constrained case, introduc ing two oracle CRBs that incorporate prior information of the suppo rt set, i.e. the locations of the nonzero entries in the Laplaci an matrix. We examine the properties and order relations betwe en the bounds, and provide the associated Slepian-Bangs formu la for the Gaussian case. We demonstrate the use of the new CRBs in three representative applications: (i) topology identi fication in power systems, (ii) graph filter identification in diffuse d models, and (iii) precision matrix estimation in Gaussian M arkov random fields under Laplacian constraints. The CRBs are eval - uated and compared with the mean-squared-errors (MSEs) of the constrained maximum likelihood estimator (CMLE), whic h integrates both equality and inequality constraints along with sparsity constraints, and of the oracle CMLE, which knows the locations of the nonzero entries of the Laplacian matrix . We perform this analysis for the applications of power syste m topology identification and graphical LASSO, and demonstra te that the MSEs of the estimators converge to the CRB and oracle CRB, given a sufficient number of measurements. Graph-structured data and signals arise in numerous applications, including power systems, communications, finance, social networks, and biological networks, for analysis and inference of networks [ 2 ], [ 3 ]. In this context, the Laplacian matrix, which captures node connectivity and edge weights, serves as a fundamental tool for clustering [ 4 ], modeling graph diffusion processes [ 5 ], [ 6 ], topology inference [ 6 ]-[ 12 ], anomaly detection [ 13 ], graph-based filtering [ 14 ]-[ 18 ], and analyzing smoothness on graphs [ 19 ]. M. Halihal and T. Routtenberg are with the School of Electric al and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel, e-mail: moradha@post.bgu.ac.il, tirzar@b gu.ac.il.

artificial intelligence, data mining, machine learning, (18 more...)

2504.04576

Country: Asia > Middle East > Israel > Southern District > Beer-Sheva (0.24)

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Energy > Power Industry (0.93)

Technology:

Information Technology > Communications (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Vyas, Niall, Hegde, Disha, Cockayne, Jon

Randomised Postiterations for Calibrated BayesCG

arXiv.org Machine LearningApr-5-2025

The Bayesian conjugate gradient method offers probabilistic solutions to linear systems but suffers from poor calibration, limiting its utility in uncertainty quantification tasks. Recent approaches leveraging postit-erations to construct priors have improved computational properties but failed to correct calibration issues. In this work, we propose a novel randomised postiteration strategy that enhances the calibration of the BayesCG posterior while preserving its favourable convergence characteristics. We present theoretical guarantees for the improved calibration, supported by results on the distribution of posterior errors. Numerical experiments demonstrate the efficacy of the method in both synthetic and inverse problem settings, showing enhanced uncertainty quantification and better propagation of uncertainties through computational pipelines.

artificial intelligence, machine learning, posterior, (16 more...)

2504.04247

Genre: Research Report (0.51)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Winer, Mike, Hanin, Boris

Deep Neural Nets as Hamiltonians

arXiv.org Artificial IntelligenceApr-5-2025

Neural networks are complex functions of both their inputs and parameters. Much prior work in deep learning theory analyzes the distribution of network outputs at a fixed a set of inputs (e.g. a training dataset) over random initializations of the network parameters. The purpose of this article is to consider the opposite situation: we view a randomly initialized Multi-Layer Perceptron (MLP) as a Hamiltonian over its inputs. For typical realizations of the network parameters, we study the properties of the energy landscape induced by this Hamiltonian, focusing on the structure of near-global minimum in the limit of infinite width. Specifically, we use the replica trick to perform an exact analytic calculation giving the entropy (log volume of space) at a given energy. We further derive saddle point equations that describe the overlaps between inputs sampled iid from the Gibbs distribution induced by the random MLP. For linear activations we solve these saddle point equations exactly. But we also solve them numerically for a variety of depths and activation functions, including $\tanh, \sin, \text{ReLU}$, and shaped non-linearities. We find even at infinite width a rich range of behaviors. For some non-linearities, such as $\sin$, for instance, we find that the landscapes of random MLPs exhibit full replica symmetry breaking, while shallow $\tanh$ and ReLU networks or deep shaped MLPs are instead replica symmetric.

artificial intelligence, machine learning, neural network, (18 more...)

arXiv.org Artificial Intelligence

2503.23982

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

arXiv.org Machine LearningApr-5-2025

Causal Inference Isn't Special: Why It's Just Another Prediction Problem

Fernández-Loría, Carlos

Causal inference is often portrayed as fundamentally distinct from predictive modeling, with its own terminology, goals, and intellectual challenges. But at its core, causal inference is simply a structured instance of prediction under distribution shift. In both cases, we begin with labeled data from a source domain and seek to generalize to a target domain where outcomes are not observed. The key difference is that in causal inference, the labels -- potential outcomes -- are selectively observed based on treatment assignment, introducing bias that must be addressed through assumptions. This perspective reframes causal estimation as a familiar generalization problem and highlights how techniques from predictive modeling, such as reweighting and domain adaptation, apply directly to causal tasks. It also clarifies that causal assumptions are not uniquely strong -- they are simply more explicit. By viewing causal inference through the lens of prediction, we demystify its logic, connect it to familiar tools, and make it more accessible to practitioners and educators alike.

artificial intelligence, causal inference, machine learning, (13 more...)

2504.0432

Genre: Research Report (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)