AITopics | Supervised Learning

Collaborating Authors

Supervised Learning

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Stochastic Structured Prediction under Bandit Feedback, Christopher Lo

Neural Information Processing SystemsMar-12-2024, 13:00:28 GMT

Stochastic structured prediction under bandit feedback follows a learning protocol where on each of a sequence of iterations, the learner receives an input, predicts an output structure, and receives partial feedback in form of a task loss evaluation of the predicted structure. We present applications of this learning scenario to convex and non-convex objectives for structured prediction and analyze them as stochastic first-order methods. We present an experimental evaluation on problems of natural language processing over exponential output spaces, and compare convergence speed across different objectives under the practical criterion of optimal task performance on development data and the optimization-theoretic criterion of minimal squared gradient norm. Best results under both criteria are obtained for a non-convex objective for pairwise preference learning under bandit feedback.

algorithm, objective, optimization, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(2 more...)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.92)

Add feedback

Improved Deep Metric Learning with Multi-class N-pair Loss Objective

Neural Information Processing SystemsMar-12-2024, 12:29:35 GMT

Deep metric learning has gained much popularity in recent years, following the success of deep learning. However, existing frameworks of deep metric learning based on contrastive loss and triplet loss often suffer from slow convergence, partially because they employ only one negative example while not interacting with the other negative classes in each update. In this paper, we propose to address this problem with a new metric learning objective called multi-class N-pair loss. The proposed objective function firstly generalizes triplet loss by allowing joint comparison among more than one negative examples - more specifically, N-1 negative examples - and secondly reduces the computational burden of evaluating deep embedding vectors via an efficient batch construction strategy using only N pairs of examples, instead of (N+1) N. We demonstrate the superiority of our proposed loss to the triplet loss as well as other competing loss functions for a variety of tasks on several visual recognition benchmark, including fine-grained object recognition and verification, image clustering and retrieval, and face verification and identification.

metric learning, negative example, triplet loss, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Construction & Engineering (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

Structured Prediction Theory Based on Factor Graph Complexity

Neural Information Processing SystemsMar-12-2024, 11:16:20 GMT

We present a general theoretical analysis of structured prediction with a series of new results. We give new data-dependent margin guarantees for structured prediction for a very wide family of loss functions and a general family of hypotheses, with an arbitrary factor graph decomposition. These are the tightest margin bounds known for both standard multi-class and general structured prediction problems. Our guarantees are expressed in terms of a data-dependent complexity measure, factor graph complexity, which we show can be estimated from data and bounded in terms of familiar quantities for several commonly used hypothesis sets along with a sparsity measure for features and graphs. Our proof techniques include generalizations of Talagrand's contraction lemma that can be of independent interest. We further extend our theory by leveraging the principle of Voted Risk Minimization (VRM) and show that learning is possible even with complex factor graphs. We present new learning bounds for this advanced setting, which we use to design two new algorithms, Voted Conditional Random Field (VCRF) and Voted Structured Boosting (StructBoost). These algorithms can make use of complex features and factor graphs and yet benefit from favorable learning guarantees. We also report the results of experiments with VCRF on several datasets to validate our theory.

algorithm, hypothesis, prediction, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Add feedback

Minimizing Regret on Reflexive Banach Spaces and Nash Equilibria in Continuous Zero-Sum Games

Neural Information Processing SystemsMar-12-2024, 10:15:53 GMT

We study a general adversarial online learning problem, in which we are given a decision set X in a reflexive Banach space X and a sequence of reward vectors in the dual space of X. At each iteration, we choose an action from X, based on the observed sequence of previous rewards. Our goal is to minimize regret. Using results from infinite dimensional convex analysis, we generalize the method of Dual Averaging to our setting and obtain upper bounds on the worst-case regret that generalize many previous results. Under the assumption of uniformly continuous rewards, we obtain explicit regret bounds in a setting where the decision set is the set of probability distributions on a compact metric space S. Importantly, we make no convexity assumptions on either S or the reward functions. We also prove a general lower bound on the worst-case regret for any online algorithm. We then apply these results to the problem of learning in repeated two-player zero-sum games on compact metric spaces. In doing so, we first prove that if both players play a Hannan-consistent strategy, then with probability 1 the empirical distributions of play weakly converge to the set of Nash equilibria of the game. We then show that, under mild assumptions, Dual Averaging on the (infinite-dimensional) space of probability distributions indeed achieves Hannan-consistency.

algorithm, assumption, sequence, (13 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Industry: Education > Educational Setting (0.34)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.55)

Add feedback

Active Nearest-Neighbor Learning in Metric Spaces

Neural Information Processing SystemsMar-12-2024, 08:00:29 GMT

We propose a pool-based non-parametric active learning algorithm for general metric spaces, called MArgin Regularized Metric Active Nearest Neighbor (MARMANN), which outputs a nearest-neighbor classifier. We give prediction error guarantees that depend on the noisy-margin properties of the input sample, and are competitive with those obtained by previously proposed passive learners. We prove that the label complexity of MARMANN is significantly lower than that of any passive learner with similar error guarantees. Our algorithm is based on a generalized sample compression scheme and a new label-efficient active model-selection procedure.

compression, marmann, passive learner, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
Asia > Middle East > Israel > Southern District > Beer-Sheva (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.62)

Add feedback

Novelty Detection on Radio Astronomy Data using Signatures

Arrubarrena, Paola, Lemercier, Maud, Nikolic, Bojan, Lyons, Terry, Cass, Thomas

arXiv.org Artificial IntelligenceMar-12-2024

We introduce SigNova, a new semi-supervised framework for detecting anomalies in streamed data. While our initial examples focus on detecting radio-frequency interference (RFI) in digitized signals within the field of radio astronomy, it is important to note that SigNova's applicability extends to any type of streamed data. The framework comprises three primary components. Firstly, we use the signature transform to extract a canonical collection of summary statistics from observational sequences. This allows us to represent variable-length visibility samples as finite-dimensional feature vectors. Secondly, each feature vector is assigned a novelty score, calculated as the Mahalanobis distance to its nearest neighbor in an RFI-free training set. By thresholding these scores we identify observation ranges that deviate from the expected behavior of RFI-free visibility samples without relying on stringent distributional assumptions. Thirdly, we integrate this anomaly detector with Pysegments, a segmentation algorithm, to localize consecutive observations contaminated with RFI, if any. This approach provides a compelling alternative to classical windowing techniques commonly used for RFI detection. Importantly, the complexity of our algorithm depends on the RFI pattern rather than on the size of the observation window. We demonstrate how SigNova improves the detection of various types of RFI (e.g., broadband and narrowband) in time-frequency visibility data. We validate our framework on the Murchison Widefield Array (MWA) telescope and simulated data and the Hydrogen Epoch of Reionization Array (HERA).

frequency, rfi, threshold, (16 more...)

arXiv.org Artificial Intelligence

2402.14892

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Oceania > Australia (0.04)
North America > United States > New Jersey (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.92)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.54)

Add feedback

Chasing Convex Functions with Long-term Constraints

Lechowicz, Adam, Christianson, Nicolas, Sun, Bo, Bashir, Noman, Hajiesmaili, Mohammad, Wierman, Adam, Shenoy, Prashant

arXiv.org Artificial IntelligenceFeb-21-2024

This paper introduces and studies a novel class of online metric problems with long-term demand constraints motivated by emerging applications in the design of sustainable systems. In convex function chasing with a long-term constraint, an online player aims to satisfy a demand by making decisions in a normed vector space, paying a hitting cost based on time-varying convex cost functions which are revealed online, and switching cost defined by the norm. The player is constrained to ensure that the entire demand is satisfied at or before the time horizon ends, and their objective is to minimize their total cost. The generality of this problem makes it applicable to a wide variety of online resource allocation problems; in this paper, we consider one such special case, discussing its connections to other online settings and suggestions towards broad new areas of inquiry in online optimization with long-term constraints. Our motivation to introduce these problems is rooted in an emerging class of carbon-aware control problems for sustainable systems. A shared objective involves minimizing carbon emissions by shifting flexible workloads temporally and/or spatially to better leverage low-carbon electricity generation (e.g., renewables such as solar and wind).

algorithm, constraint, cost function, (16 more...)

arXiv.org Artificial Intelligence

2402.14012

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > California (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(8 more...)

Genre: Research Report (0.64)

Industry: Energy > Renewable (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Add feedback

Fast, smooth and adaptive regression in metric spaces

Neural Information Processing SystemsFeb-16-2024, 11:06:48 GMT

It was recently shown that certain nonparametric regressors can escape the curse of dimensionality in the sense that their convergence rates adapt to the intrinsic dimension of data (\cite{BL:65, SK:77}). We prove some stronger results in more general settings. In particular, we consider a regressor which, by combining aspects of both tree-based regression and kernel regression, operates on a general metric space, yields a smooth function, and evaluates in time O(\log n) . We derive a tight convergence rate of the form n {-2/(2 d)} where d is the Assouad dimension of the input space.

adaptive regression, dimension, metric space, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.69)

Add feedback

Cocktail Party Processing via Structured Prediction

Neural Information Processing SystemsFeb-16-2024, 06:58:15 GMT

While human listeners excel at selectively attending to a conversation in a cocktail party, machine performance is still far inferior by comparison. We show that the cocktail party problem, or the speech separation problem, can be effectively approached via structured prediction. To account for temporal dynamics in speech, we employ conditional random fields (CRFs) to classify speech dominance within each time-frequency unit for a sound mixture. To capture complex, nonlinear relationship between input and output, both state and transition feature functions in CRFs are learned by deep neural networks. The formulation of the problem as classification allows us to directly optimize a measure that is well correlated with human speech intelligibility.

cocktail party processing, structured prediction, wang

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.67)

Add feedback

Learning Adaptive Value of Information for Structured Prediction

Neural Information Processing SystemsFeb-16-2024, 06:02:31 GMT

Discriminative methods for learning structured models have enabled wide-spread use of very rich feature representations. However, the computational cost of feature extraction is prohibitive for large-scale or time-sensitive applications, often dominating the cost of inference in the models. Significant efforts have been devoted to sparsity-based model selection to decrease this cost. Such feature selection methods control computation statically and miss the opportunity to fine-tune feature extraction to each input at run-time. We address the key challenge of learning to control fine-grained feature extraction adaptively, exploiting non-homogeneity of the data.

information, learning adaptive value, structured prediction, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.40)

Add feedback