AITopics | Supervised Learning

Collaborating Authors

Supervised Learning

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Explicit Best Arm Identification in Linear Bandits Using No-Regret Learners

Zaki, Mohammadi, Mohan, Avi, Gopalan, Aditya

arXiv.org Machine LearningJun-13-2020

We study the problem of best arm identification in linearly parameterised multi-armed bandits. Given a set of feature vectors $\mathcal{X}\subset\mathbb{R}^d,$ a confidence parameter $\delta$ and an unknown vector $\theta^*,$ the goal is to identify $\arg\max_{x\in\mathcal{X}}x^T\theta^*$, with probability at least $1-\delta,$ using noisy measurements of the form $x^T\theta^*.$ For this fixed confidence ($\delta$-PAC) setting, we propose an explicitly implementable and provably order-optimal sample-complexity algorithm to solve this problem. Previous approaches rely on access to minimax optimization oracles. The algorithm, which we call the \textit{Phased Elimination Linear Exploration Game} (PELEG), maintains a high-probability confidence ellipsoid containing $\theta^*$ in each round and uses it to eliminate suboptimal arms in phases. PELEG achieves fast shrinkage of this confidence ellipsoid along the most confusing (i.e., close to, but not optimal) directions by interpreting the problem as a two player zero-sum game, and sequentially converging to its saddle point using low-regret learners to compute players' strategies in each round. We analyze the sample complexity of PELEG and show that it matches, up to order, an instance-dependent lower bound on sample complexity in the linear bandit setting. We also provide numerical results for the proposed algorithm consistent with its theoretical guarantees.

algorithm, bandit, sample complexity, (13 more...)

arXiv.org Machine Learning

2006.07562

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.34)

Add feedback

Neural Architecture Search using Bayesian Optimisation with Weisfeiler-Lehman Kernel

Ru, Binxin, Wan, Xingchen, Dong, Xiaowen, Osborne, Michael

arXiv.org Machine LearningJun-13-2020

Bayesian optimisation (BO) has been widely used for hyperparameter optimisation but its application in neural architecture search (NAS) is limited due to the non-continuous, high-dimensional and graph-like search spaces. Current approaches either rely on encoding schemes, which are not scalable to large architectures and ignore the implicit topological structure of architectures, or use graph neural networks, which require additional hyperparameter tuning and a large amount of observed data, which is particularly expensive to obtain in NAS. We propose a neat BO approach for NAS, which combines the Weisfeiler-Lehman graph kernel with a Gaussian process surrogate to capture the topological structure of architectures, without having to explicitly define a Gaussian process over high-dimensional vector spaces. We also harness the interpretable features learnt via the graph kernel to guide the generation of new architectures. We demonstrate empirically that our surrogate model is scalable to large architectures and highly data-efficient; competing methods require 3 to 20 times more observations to achieve equally good prediction performance as ours. We finally show that our method outperforms existing NAS approaches to achieve state-of-the-art results on NAS datasets.

architecture, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

2006.07556

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Add feedback

Propositionalization and Embeddings: Two Sides of the Same Coin

Lavrač, Nada, Škrlj, Blaž, Robnik-Šikonja, Marko

arXiv.org Machine LearningJun-8-2020

Data preprocessing is an important component of machine learning pipelines, which requires ample time and resources. An integral part of preprocessing is data transformation into the format required by a given learning algorithm. This paper outlines some of the modern data processing techniques used in relational learning that enable data fusion from different input data types and formats into a single table data representation, focusing on the propositionalization and embedding data transformation approaches. While both approaches aim at transforming data into tabular data format, they use different terminology and task definitions, are perceived to address different goals, and are used in different contexts. This paper contributes a unifying framework that allows for improved understanding of these two data transformation techniques by presenting their unified definitions, and by explaining the similarities and differences between the two approaches as variants of a unified complex data transformation task. In addition to the unifying framework, the novelty of this paper is a unifying methodology combining propositionalization and embeddings, which benefits from the advantages of both in solving complex data transformation and learning tasks. We present two efficient implementations of the unifying methodology: an instance-based PropDRM approach, and a feature-based PropStar approach to data transformation and learning, together with their empirical evaluation on several relational problems. The results show that the new algorithms can outperform existing relational learners and can solve much larger problems.

data mining, logic & formal reasoning, machine learning, (25 more...)

arXiv.org Machine Learning

doi: 10.1007/s10994-020-05890-8

2006.0441

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(11 more...)

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(8 more...)

Add feedback

Unsupervised Graph Representation by Periphery and Hierarchical Information Maximization

Bandyopadhyay, Sambaran, Aggarwal, Manasvi, Murty, M. Narasimha

arXiv.org Machine LearningJun-8-2020

Deep representation learning on non-Euclidean data types, such as graphs, has gained significant attention in recent years. Invent of graph neural networks has improved the state-of-the-art for both node and the entire graph representation in a vector space. However, for the entire graph representation, most of the existing graph neural networks are trained on a graph classification loss in a supervised way. But obtaining labels of a large number of graphs is expensive for real world applications. Thus, we aim to propose an unsupervised graph neural network to generate a vector representation of an entire graph in this paper. For this purpose, we combine the idea of hierarchical graph neural networks and mutual information maximization into a single framework. We also propose and use the concept of periphery representation of a graph and show its usefulness in the proposed algorithm which is referred as GraPHmax. We conduct thorough experiments on several real-world graph datasets and compare the performance of GraPHmax with a diverse set of both supervised and unsupervised baseline algorithms. Experimental results show that we are able to improve the state-of-the-art for multiple graph level tasks on several real-world datasets, while remain competitive on the others.

artificial intelligence, machine learning, representation, (15 more...)

arXiv.org Machine Learning

2006.04696

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Add feedback

Earballs: Neural Transmodal Translation

Port, Andrew, Kim, Chelhwon, Patel, Mitesh

arXiv.org Machine LearningJun-5-2020

As is expressed in the adage "a picture is worth a thousand words", when using spoken language to communicate visual information, brevity can be a challenge. This work describes a novel technique for leveraging machine learned feature embeddings to translate visual (and other types of) information into a perceptual audio domain, allowing users to perceive this information using only their aural faculty. The system uses a pretrained image embedding network to extract visual features and embed them in a compact subset of Euclidean space -- this converts the images into feature vectors whose $L^2$ distances can be used as a meaningful measure of similarity. A generative adversarial network (GAN) is then used to find a distance preserving map from this metric space of feature vectors into the metric space defined by a target audio dataset equipped with either the Euclidean metric or a mel-frequency cepstrum-based psychoacoustic distance metric. We demonstrate this technique by translating images of faces into human speech-like audio. For both target audio metrics, the GAN successfully found a metric preserving mapping, and in human subject tests, users were able to accurately classify audio translations of faces.

artificial intelligence, feature vector, machine learning, (15 more...)

arXiv.org Machine Learning

2005.13291

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.99)

Add feedback

Demystifying Orthogonal Monte Carlo and Beyond

Lin, Han, Chen, Haoxian, Zhang, Tianyi, Laroche, Clement, Choromanski, Krzysztof

arXiv.org Machine LearningMay-27-2020

Orthogonal Monte Carlo (OMC) is a very effective sampling algorithm imposing structural geometric conditions (orthogonality) on samples for variance reduction. Due to its simplicity and superior performance as compared to its Quasi Monte Carlo counterparts, OMC is used in a wide spectrum of challenging machine learning applications ranging from scalable kernel methods to predictive recurrent neural networks, generative models and reinforcement learning. However theoretical understanding of the method remains very limited. In this paper we shed new light on the theoretical principles behind OMC, applying theory of negatively dependent random variables to obtain several new concentration results. We also propose a novel extensions of the method leveraging number theory techniques and particle algorithms, called Near-Orthogonal Monte Carlo (NOMC). We show that NOMC is the first algorithm consistently outperforming OMC in applications ranging from kernel methods to approximating distances in probabilistic metric spaces.

artificial intelligence, estimator, machine learning, (19 more...)

arXiv.org Machine Learning

2005.1359

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)
(11 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Add feedback

Information-Theoretic Generalization Bounds for Meta-Learning and Applications

Jose, Sharu Theresa, Simeone, Osvaldo

arXiv.org Machine LearningMay-14-2020

Meta-learning, or "learning to learn", refers to techniques that infer an inductive bias from data corresponding to multiple related tasks with the goal of improving the sample efficiency for new, previously unobserved, tasks. A key performance measure for meta-learning is the meta-generalization gap, that is, the difference between the average loss measured on the meta-training data and on a new, randomly selected task. This paper presents novel information-theoretic upper bounds on the meta-generalization gap. Two broad classes of meta-learning algorithms are considered that uses either separate within-task training and test sets, like MAML, or joint within-task training and test sets, like Reptile. Extending the existing work for conventional learning, an upper bound on the meta-generalization gap is derived for the former class that depends on the mutual information (MI) between the output of the meta-learning algorithm and its input meta-training data. For the latter, the derived bound includes an additional MI between the output of the per-task learning procedure and corresponding data set to capture within-task uncertainty. Tighter bounds are then developed, under given technical conditions, for the two classes via novel Individual Task MI (ITMI) bounds. Applications of the derived bounds are finally discussed, including a broad class of noisy iterative algorithms for meta-learning.

artificial intelligence, inductive learning, machine learning, (15 more...)

arXiv.org Machine Learning

2005.04372

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.70)

Add feedback

Multi-Instance Multi-Label Learning for Gene Mutation Prediction in Hepatocellular Carcinoma

Xu, Kaixin, Zhao, Ziyuan, Gu, Jiapan, Zeng, Zeng, Ying, Chan Wan, Choon, Lim Kheng, Hua, Thng Choon, Chow, Pierce KH

arXiv.org Machine LearningMay-8-2020

Gene mutation prediction in hepatocellular carcinoma (HCC) is of great diagnostic and prognostic value for personalized treatments and precision medicine. In this paper, we tackle this problem with multi-instance multi-label learning to address the difficulties on label correlations, label representations, etc. Furthermore, an effective oversampling strategy is applied for data imbalance. Experimental results have shown the superiority of the proposed approach.

artificial intelligence, inductive learning, machine learning, (14 more...)

arXiv.org Machine Learning

2005.04073

Country:

North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > Canada (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
Asia > China (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.47)

Add feedback

A learning problem whose consistency is equivalent to the non-existence of real-valued measurable cardinals

Pestov, Vladimir G.

arXiv.org Machine LearningMay-4-2020

We show that the $k$-nearest neighbour learning rule is universally consistent in a metric space $X$ if and only if it is universally consistent in every separable subspace of $X$ and the density of $X$ is less than every real-measurable cardinal. In particular, the $k$-NN classifier is universally consistent in every metric space whose separable subspaces are sigma-finite dimensional in the sense of Nagata and Preiss if and only if there are no real-valued measurable cardinals. The latter assumption is relatively consistent with ZFC, however the consistency of the existence of such cardinals cannot be proved within ZFC. Our results were inspired by an example sketched by C\'erou and Guyader in 2006 at an intuitive level of rigour.

artificial intelligence, machine learning, measurable cardinal, (17 more...)

arXiv.org Machine Learning

2005.01886

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > Middle East > Israel (0.04)
South America > Brazil > Bahia > Salvador (0.04)
(6 more...)

Genre: Research Report (0.70)

Industry: Education > Focused Education > Special Education (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.59)

Add feedback

Feature extraction and similar image search with OpenCV for newbies

#artificialintelligenceMay-1-2020, 14:46:10 GMT

Image features For this task, first of all, we need to understand what is an Image Feature and how we can use it. Image feature is a simple image pattern, based on which we can describe what we see on the image. For example cat eye will be a feature on a image of a cat. The main role of features in computer vision(and not only) is to transform visual information into the vector space. Ok, but how to get this features from the image?

algorithm, extraction and similar image search, feature vector, (12 more...)

#artificialintelligence

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining > Feature Extraction (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.40)

Add feedback