AITopics | Supervised Learning

Supplementary Material for "Towards Sharper Generalization Bounds for Structured Prediction"

Shaojie Li

Neural Information Processing Systems, Feb-11-2025, 00:15:22 GMT

In this supplementary material, we provide complete proofs of the theorems of the main paper.

artificial intelligence, inductive learning, machine learning, (19 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.41)

Towards Sharper Generalization Bounds for Structured Prediction

Shaojie Li

Neural Information Processing Systems, Feb-11-2025, 00:15:19 GMT

In this paper, we investigate the generalization performance of structured prediction learning and obtain state-of-the-art generalization bounds. Our analysis is based on factor graph decomposition of structured prediction algorithms, and we present novel margin guarantees from three different perspectives: Lipschitz continuity, smoothness, and space capacity condition. In the Lipschitz continuity scenario, we improve the square-root dependency on the label set cardinality of existing bounds to a logarithmic dependence.

artificial intelligence, generalization, machine learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)

Joint Metric Space Embedding by Unbalanced OT with Gromov-Wasserstein Marginal Penalization

Beier, Florian, Piening, Moritz, Beinert, Robert, Steidl, Gabriele

arXiv.org Artificial Intelligence, Feb-11-2025

We propose a new approach for unsupervised alignment of heterogeneous datasets, which maps data from two different domains without any known correspondences to a common metric space. Our method is based on an unbalanced optimal transport problem with Gromov-Wasserstein marginal penalization. It can be seen as a counterpart to the recently introduced joint multidimensional scaling method. We prove that there exists a minimizer of our functional and that for penalization parameters going to infinity, the corresponding sequence of minimizers converges to a minimizer of the so-called embedded Wasserstein distance. Our model can be reformulated as a quadratic, multi-marginal, unbalanced optimal transport problem, for which a bi-convex relaxation admits a numerical solver via block-coordinate descent. We provide numerical examples for joint embeddings in Euclidean as well as non-Euclidean spaces.

artificial intelligence, joint metric space embedding, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2502.0751

Country: Europe > Germany (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.64)

A Appendix

Neural Information Processing Systems, Feb-10-2025, 21:15:33 GMT

A.1 Random processes. The notion of a random vector in infinite-dimensional vector spaces is not general enough to describe many models of noise, such as the white noise described in Example 2.7. To overcome this problem, one possibility is to consider the noise as a random process on Y (see the approach in [16]). Here we assume that the random process is linear, bounded, and has zero mean. However, the converse is not true, as shown by the following example. Hence the random process setting extends the usual formalism of random variables.

artificial intelligence, data quality, machine learning, (19 more...)

Technology:

Information Technology > Data Science > Data Quality (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Lifting Weak Supervision To Structured Prediction

Neural Information Processing Systems, Feb-10-2025, 20:45:30 GMT

Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from a variety of sources. WS is theoretically well understood for binary classification, where simple approaches enable consistent estimation of pseudolabel noise rates. Using this result, it has been shown that downstream models trained on the pseudolabels have generalization guarantees nearly identical to those trained on clean labels. While this is exciting, users often wish to use WS for structured prediction, where the output space consists of more than a binary or multi-class label set: e.g.

artificial intelligence, inductive learning, machine learning, (19 more...)

Country: North America > United States > Wisconsin (0.28)

Genre:

Workflow (0.46)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)

Lifting Weak Supervision To Structured Prediction

Neural Information Processing Systems, Feb-10-2025, 20:45:26 GMT

Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from a variety of sources. WS is theoretically well understood for binary classification, where simple approaches enable consistent estimation of pseudolabel noise rates. Using this result, it has been shown that downstream models trained on the pseudolabels have generalization guarantees nearly identical to those trained on clean labels. While this is exciting, users often wish to use WS for structured prediction, where the output space consists of more than a binary or multi-class label set: e.g.

artificial intelligence, machine learning, manifold, (17 more...)

Country: North America > United States > Wisconsin (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.72)

A Framework for Fast and Stable Representations of Multiparameter Persistent Homology Decompositions

Neural Information Processing Systems, Feb-10-2025, 19:05:05 GMT

Topological data analysis (TDA) is an area of data science that focuses on using invariants from algebraic topology to provide multiscale shape descriptors for geometric data sets, such as graphs and point clouds. One of the most important such descriptors is persistent homology, which encodes the change in shape as a filtration parameter changes; a typical parameter is the feature scale. For many data sets, it is useful to simultaneously vary multiple filtration parameters, for example feature scale and density. While the theoretical properties of single parameter persistent homology are well understood, less is known about the multiparameter case. In particular, a central question is the problem of representing multiparameter persistent homology by elements of a vector space for integration with standard machine learning algorithms.

artificial intelligence, machine learning, representation, (17 more...)

Country: North America > United States (0.93)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.34)

Geometric Algebra Transformer

Johann Brehmer

Neural Information Processing Systems, Feb-10-2025, 18:39:52 GMT

Problems involving geometric data arise in physics, chemistry, robotics, computer vision, and many other fields. Such data can take numerous forms, for instance points, direction vectors, translations, or rotations, but to date there is no single architecture that can be applied to such a wide variety of geometric types while respecting their symmetries. In this paper we introduce the Geometric Algebra Transformer (GATr), a general-purpose architecture for geometric data. GATr represents inputs, outputs, and hidden states in the projective geometric (or Clifford) algebra, which offers an efficient 16-dimensional vector-space representation of common geometric objects as well as operators acting on them. GATr is equivariant with respect to E(3), the symmetry group of 3D Euclidean space. As a Transformer, GATr is versatile, efficient, and scalable. We demonstrate GATr in problems from n-body modeling to wall-shear-stress estimation on large arterial meshes to robotic motion planning. GATr consistently outperforms both non-geometric and equivariant baselines in terms of error, data efficiency, and scalability.
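
The 16-dimensional figure quoted above can be checked by counting basis elements. The sketch below is not taken from the paper; it assumes the algebra is generated by four basis vectors (named e0 to e3 here purely for illustration, with e0 the extra projective direction), so that every subset of generators gives one basis blade.

```python
from itertools import combinations
from math import comb

# Assumed generators: one projective direction (e0) plus three spatial ones.
generators = ["e0", "e1", "e2", "e3"]

# A basis blade is the product of a subset of generators; the empty subset
# is the scalar, the full subset is the pseudoscalar.
blades = [
    combo
    for grade in range(len(generators) + 1)
    for combo in combinations(generators, grade)
]

# Number of blades per grade: scalars, vectors, bivectors, trivectors, pseudoscalar.
grade_counts = [comb(len(generators), k) for k in range(len(generators) + 1)]

print(len(blades))    # total dimension of the algebra
print(grade_counts)   # dimension of each grade
```

Under this assumption the grades contribute 1, 4, 6, 4, and 1 components, which sum to 2^4 = 16.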

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country: Europe (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

On the Stochastic Stability of Deep Markov Models

Ján Drgoňa

Neural Information Processing Systems, Feb-10-2025, 18:03:54 GMT

Definition 4. Given a metric space (X, d), a mapping T: X → X is called contractive if there exists a constant c ∈ [0, 1) and a metric d such that the following holds: d(T(x
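
The excerpt above is cut off before the inequality. As an illustration only, the sketch below assumes the standard textbook contraction condition d(T(x), T(y)) ≤ c · d(x, y) and checks it numerically for a hypothetical affine map on the real line; the map, the constant, and the sample points are all made up for this example and do not come from the paper.

```python
def T(x: float) -> float:
    """Hypothetical map on the real line: halves distances, then shifts by 1."""
    return 0.5 * x + 1.0


def d(x: float, y: float) -> float:
    """Usual metric on the real line."""
    return abs(x - y)


c = 0.5  # assumed contraction constant, c in [0, 1)
points = [-3.0, 0.0, 1.5, 10.0]

# Check the assumed condition d(T(x), T(y)) <= c * d(x, y) on all sample pairs
# (a small tolerance absorbs floating-point rounding).
holds = all(
    d(T(x), T(y)) <= c * d(x, y) + 1e-12
    for x in points
    for y in points
)

# Iterating a contraction converges to its unique fixed point (here x = 2,
# since 0.5 * 2 + 1 = 2).
x = 10.0
for _ in range(60):
    x = T(x)

print(holds, x)
```

A finite sample can only fail to refute the condition, not prove it; for this affine map the inequality holds with equality for every pair, which is why c = 0.5 is the natural constant.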

artificial intelligence, machine learning, stochastic stability, (15 more...)

Country: North America > United States (0.71)

Technology: