AITopics | Supervised Learning

Collaborating Authors

Supervised Learning

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Structured Prediction with Stronger Consistency Guarantees

Neural Information Processing SystemsJan-19-2025, 15:30:48 GMT

We present an extensive study of surrogate losses for structured prediction supported by * H -consistency bounds*. These are recently introduced guarantees that are more relevant to learning than Bayes-consistency, since they are not asymptotic and since they take into account the hypothesis set H used. We first show that no non-trivial H -consistency bound can be derived for widely used surrogate structured prediction losses. We then define several new families of surrogate losses, including *structured comp-sum losses* and *structured constrained losses*, for which we prove H -consistency bounds and thus Bayes-consistency. These loss functions readily lead to new structured prediction algorithms with stronger theoretical guarantees, based on their minimization.

stronger consistency guarantee, structured prediction, surrogate loss, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Add feedback

Towards Sharper Generalization Bounds for Structured Prediction

Neural Information Processing SystemsJan-19-2025, 10:27:46 GMT

In this paper, we investigate the generalization performance of structured prediction learning and obtain state-of-the-art generalization bounds. Our analysis is based on factor graph decomposition of structured prediction algorithms, and we present novel margin guarantees from three different perspectives: Lipschitz continuity, smoothness, and space capacity condition. In the Lipschitz continuity scenario, we improve the square-root dependency on the label set cardinality of existing bounds to a logarithmic dependence. In the smoothness scenario, we provide generalization bounds that are not only a logarithmic dependency on the label set cardinality but a faster convergence rate of order \mathcal{O}(\frac{1}{n}) on the sample size n . In the space capacity scenario, we obtain bounds that do not depend on the label set cardinality and have faster convergence rates than \mathcal{O}(\frac{1}{\sqrt{n}}) .

scenario, sharper generalization bound, structured prediction, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.92)

Add feedback

Lifting Weak Supervision To Structured Prediction

Neural Information Processing SystemsJan-19-2025, 07:15:56 GMT

Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from various sources. WS is theoretically well-understood for binary classification, where simple approaches enable consistent estimation of pseudolabel noise rates. Using this result, it has been shown that downstream models trained on the pseudolabels have generalization guarantees nearly identical to those trained on clean labels. While this is exciting, users often wish to use WS for \emph{structured prediction}, where the output space consists of more than a binary or multi-class label set: e.g. Do the favorable theoretical properties of WS for binary classification lift to this setting?

lifting weak supervision, pseudolabel, structured prediction, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)

Add feedback

A Metric Topology of Deep Learning for Data Classification

Wu, Jwo-Yuh, Huang, Liang-Chi, Li, Wen-Hsuan, Liu, Chun-Hung

arXiv.org Machine LearningJan-19-2025

Empirically, Deep Learning (DL) has demonstrated unprecedented success in practical applications. However, DL remains by and large a mysterious "black-box", spurring recent theoretical research to build its mathematical foundations. In this paper, we investigate DL for data classification through the prism of metric topology. Considering that conventional Euclidean metric over the network parameter space typically fails to discriminate DL networks according to their classification outcomes, we propose from a probabilistic point of view a meaningful distance measure, whereby DL networks yielding similar classification performances are close. The proposed distance measure defines such an equivalent relation among network parameter vectors that networks performing equally well belong to the same equivalent class. Interestingly, our proposed distance measure can provably serve as a metric on the quotient set modulo the equivalent relation. Then, under quite mild conditions it is shown that, apart from a vanishingly small subset of networks likely to predict non-unique labels, our proposed metric space is compact, and coincides with the well-known quotient topological space. Our study contributes to fundamental understanding of DL, and opens up new ways of studying DL using fruitful metric space theory.

artificial intelligence, machine learning, topology, (13 more...)

arXiv.org Machine Learning

2501.11265

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.55)

Add feedback

On Certified Generalization in Structured Prediction

Neural Information Processing SystemsJan-18-2025, 19:04:07 GMT

In structured prediction, target objects have rich internal structure which does not factorize into independent components and violates common i.i.d. This challenge becomes apparent through the exponentially large output space in applications such as image segmentation or scene graph generation.We present a novel PAC-Bayesian risk bound for structured prediction wherein the rate of generalization scales not only with the number of structured examples but also with their size.The underlying assumption, conforming to ongoing research on generative models, is that data are generated by the Knothe-Rosenblatt rearrangement of a factorizing reference measure. This allows to explicitly distill the structure between random output variables into a Wasserstein dependency matrix. Our work makes a preliminary step towards leveraging powerful generative models to establish generalization bounds for discriminative downstream tasks in the challenging setting of structured prediction.

certified generalization, generative model, structured prediction, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Add feedback

Is magnitude 'generically continuous' for finite metric spaces?

Katsumasa, Hirokazu, Roff, Emily, Yoshinaga, Masahiko

arXiv.org Machine LearningJan-15-2025

Magnitude is a real-valued invariant of metric spaces which, in the finite setting, can be understood as recording the 'effective number of points' in a space as the scale of the metric varies. Motivated by applications in topological data analysis, this paper investigates the stability of magnitude: its continuity properties with respect to the Gromov-Hausdorff topology. We show that magnitude is nowhere continuous on the Gromov-Hausdorff space of finite metric spaces. Yet, we find evidence to suggest that it may be 'generically continuous', in the sense that generic Gromov-Hausdorff limits are preserved by magnitude. We make the case that, in fact, 'generic stability' is what matters for applicability.

finite metric space, magnitude, metric space, (13 more...)

arXiv.org Machine Learning

2501.08745

Country:

Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.88)

Add feedback

Measuring and Reducing Model Update Regression in Structured Prediction for NLP

Neural Information Processing SystemsJan-14-2025, 09:04:49 GMT

Recent advance in deep learning has led to rapid adoption of machine learning based NLP models in a wide range of applications. Despite the continuous gain in accuracy, backward compatibility is also an important aspect for industrial applications, yet it received little research attention. Backward compatibility requires that the new model does not regress on cases that were correctly handled by its predecessor. This work studies model update regression in structured prediction tasks. We choose syntactic dependency parsing and conversational semantic parsing as representative examples of structured prediction tasks in NLP.

model update regression, prediction task, structured prediction, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.91)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.89)

Add feedback

Multimodal semantic retrieval for product search

Liu, Dong, Ramos, Esther Lopez

arXiv.org Artificial IntelligenceJan-13-2025

Semantic retrieval (also known as dense retrieval) based on textual data has been extensively studied for both web search and product search application fields, where the relevance of a query and a potential target document is computed by their dense vector representation comparison. Product image is crucial for e-commence search interactions and is a key factor for customers at product explorations. But its impact for semantic retrieval has not been well studied yet. In this research, we build a multimodal representation for product items in e-commerece search in contrast to pure-text representation of products, and investigate the impact of such representations. The models are developed and evaluated on e-commerce datasets. We demonstrate that a multimodal representation scheme for a product can show improvement either on purchase recall or relevance accuracy in semantic retrieval. Additionally, we provide numerical analysis for exclusive matches retrieved by a multimodal semantic retrieval model versus a text-only semantic retrieval model, to demonstrate the validation of multimodal solutions.

information retrieval, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2501.07365

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.50)

Industry: Information Technology > Services > e-Commerce Services (0.36)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.30)

Add feedback

A cohomology-based Gromov-Hausdorff metric approach for quantifying molecular similarity

Wee, JunJie, Gong, Xue, Tuschmann, Wilderich, Xia, Kelin

arXiv.org Machine LearningJan-13-2025

We introduce, for the first time, a cohomology-based Gromov-Hausdorff ultrametric method to analyze 1-dimensional and higher-dimensional (co)homology groups, focusing on loops, voids, and higher-dimensional cavity structures in simplicial complexes, to address typical clustering questions arising in molecular data analysis. The Gromov-Hausdorff distance quantifies the dissimilarity between two metric spaces. In this framework, molecules are represented as simplicial complexes, and their cohomology vector spaces are computed to capture intrinsic topological invariants encoding loop and cavity structures. These vector spaces are equipped with a suitable distance measure, enabling the computation of the Gromov-Hausdorff ultrametric to evaluate structural dissimilarities. We demonstrate the methodology using organic-inorganic halide perovskite (OIHP) structures. The results highlight the effectiveness of this approach in clustering various molecular structures. By incorporating geometric information, our method provides deeper insights compared to traditional persistent homology techniques.

artificial intelligence, cohomology generator, machine learning, (17 more...)

arXiv.org Machine Learning

2411.13887

Country:

North America > Canada (0.46)
North America > United States > Michigan (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.90)

Add feedback

The Magnitude of Categories of Texts Enriched by Language Models

Bradley, Tai-Danae, Vigneaux, Juan Pablo

arXiv.org Artificial IntelligenceJan-11-2025

The purpose of this article is twofold. Firstly, we use the next-token probabilities given by a language model to explicitly define a $[0,1]$-enrichment of a category of texts in natural language, in the sense of Bradley, Terilla, and Vlassopoulos. We consider explicitly the terminating conditions for text generation and determine when the enrichment itself can be interpreted as a probability over texts. Secondly, we compute the M\"obius function and the magnitude of an associated generalized metric space $\mathcal{M}$ of texts using a combinatorial version of these quantities recently introduced by Vigneaux. The magnitude function $f(t)$ of $\mathcal{M}$ is a sum over texts $x$ (prompts) of the Tsallis $t$-entropies of the next-token probability distributions $p(-|x)$ plus the cardinality of the model's possible outputs. The derivative of $f$ at $t=1$ recovers a sum of Shannon entropies, which justifies seeing magnitude as a partition function. Following Leinster and Schulman, we also express the magnitude function of $\mathcal M$ as an Euler characteristic of magnitude homology and provide an explicit description of the zeroeth and first magnitude homology groups.

category, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2501.06662

Country: North America > United States > California (0.46)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.37)

Add feedback