Inductive Learning


Hypergraph based semi-supervised learning algorithms applied to speech recognition problem: a novel approach

arXiv.org Machine Learning

Most network-based speech recognition methods are based on the assumption that the labels of two adjacent speech samples in the network are likely to be the same. However, this pairwise assumption is incomplete: it misses the information carried by a group of speech samples that show very similar patterns and tend to have similar labels. A natural way to overcome this information loss is to represent the feature data of the speech samples as a hypergraph. Thus, in this paper, three hypergraph Laplacian based semi-supervised learning methods (un-normalized, random walk, and symmetric normalized) are introduced and applied to a hypergraph constructed from the feature data of the speech samples in order to predict their labels. Experimental results show that the sensitivity of these three hypergraph Laplacian based semi-supervised learning methods is greater than the sensitivity of the Hidden Markov Model method (the current state-of-the-art method for speech recognition) and of graph-based semi-supervised learning methods (the current state-of-the-art network-based methods for classification problems) applied to a network created from the same feature data.
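The symmetric normalized hypergraph Laplacian that this family of methods builds on can be sketched in a few lines of numpy. The toy incidence matrix and unit edge weights below are illustrative assumptions, not data from the paper:

```python
import numpy as np

# Toy hypergraph: 4 speech samples (vertices) and 2 hyperedges, each
# grouping samples with similar patterns. H[v, e] = 1 iff vertex v
# belongs to hyperedge e. (Illustrative values, not the paper's data.)
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]], dtype=float)
w = np.array([1.0, 1.0])                 # hyperedge weights
W = np.diag(w)

Dv = np.diag(H @ w)                      # vertex degree matrix
De = np.diag(H.sum(axis=0))              # hyperedge degree matrix

# Symmetric normalized hypergraph Laplacian (Zhou et al., 2006):
#   L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(Dv)))
Theta = Dv_inv_sqrt @ H @ W @ np.linalg.inv(De) @ H.T @ Dv_inv_sqrt
L = np.eye(H.shape[0]) - Theta

# L is symmetric positive semi-definite, so f^T L f serves as the
# smoothness penalty when propagating labels over the hypergraph.
print(np.allclose(L, L.T))
```

The un-normalized and random-walk variants the abstract lists differ only in how the degree matrices are applied to the same H, W, and De.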


Yann LeCun @EPFL - "Self-supervised learning: could machines learn like humans?"

#artificialintelligence

Conference talk by Yann LeCun, a computer scientist working in machine learning, computer vision, mobile robotics, and computational neuroscience, who sees self-supervised learning as a potential solution to problems in reinforcement learning: it has the advantage of taking both input and output as parts of a complete system, which makes it effective in tasks such as image completion, image transfer, and time-series prediction. While a model's complexity increases with the addition of feedback information, self-supervised learning models significantly reduce human involvement in the process.


Supervised Learning – Everything You Need To Know

#artificialintelligence

Supervised learning – a blessing we have in this machine era. It maps inputs to outputs, using labelled training data to infer a function from a set of training examples. The majority of practical machine learning in use today is supervised learning. AILabPage defines Machine Learning as "A focal point where business, data and experience meets emerging technology and decides to work together".
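The idea of inferring a mapping from labelled examples can be shown with a deliberately simple learner. The 1-nearest-neighbour rule and the toy data below are illustrative assumptions, not part of the article:

```python
# A deliberately simple supervised learner: a 1-nearest-neighbour
# classifier (an illustrative choice, not from the article). It
# deduces its function directly from labelled training examples.

def nearest_neighbour(train, query):
    """Return the label of the training example closest to `query`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(train, key=lambda example: sq_dist(example[0], query))
    return best[1]

# Labelled training data: (input features, label) pairs.
train = [((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"),
         ((5.0, 5.0), "dog"), ((4.8, 5.3), "dog")]

# The learned mapping generalizes to unseen inputs.
print(nearest_neighbour(train, (1.1, 0.9)))  # -> cat
print(nearest_neighbour(train, (5.1, 4.9)))  # -> dog
```

Every supervised method, from decision trees to deep networks, follows this same contract: labelled pairs in, a function from inputs to outputs out.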


Forget dumping games designers for AI – turns out it takes two to tango

#artificialintelligence

AI can get pretty good at creating content like images and videos, so researchers are trying to get it to design game levels. Machines are okay working on their own and can regurgitate the same material seen in the numerous training examples fed to them by their human creators. That's fine if what you're after is more of the same, but it's boring for games. Game-designing bots need more creativity, and the best place to learn it is from humans. A team of researchers from the Georgia Institute of Technology conducted a series of experiments in which humans partnered with bots to come up with new levels for Super Mario, the popular Nintendo platform game.


From Data to AI with the Machine Learning Canvas (Part III)

#artificialintelligence

I like to think of ML tasks as questions in a certain format, for which the system we're building gives answers. The question has to be about a certain "object" of the real world (which we call the input). In the supervised learning paradigm -- which we're focusing on in this series -- we would make the system learn from example objects AND from the answers for each of them. The inputs in those questions are an email and a property. The Data Sources listed in the LEARN part of the Canvas (see Part II) should provide information about these inputs.


Tangent-Normal Adversarial Regularization for Semi-supervised Learning

arXiv.org Machine Learning

The ever-increasing size of modern datasets, combined with the difficulty of obtaining label information, has made semi-supervised learning of significant practical importance in modern machine learning applications. Compared with supervised learning, the key difficulty in semi-supervised learning is how to make full use of the unlabeled data. In order to exploit the manifold information provided by unlabeled data, we propose a novel regularization, called tangent-normal adversarial regularization, which is composed of two parts. One is applied along the tangent space of the data manifold, aiming to enforce local invariance of the classifier on the manifold, while the other is performed on the normal space orthogonal to the tangent space, intending to impose robustness on the classifier against noise that causes the observed data to deviate from the underlying data manifold. The two terms complement each other and jointly enforce smoothness along two directions that are crucial for semi-supervised learning. Both regularizers are realized via the strategy of virtual adversarial training. Our method achieves state-of-the-art performance on semi-supervised learning tasks on both an artificial dataset and the FashionMNIST dataset.
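Virtual adversarial training, the mechanism the abstract names behind both regularizers, can be sketched roughly as follows. The toy linear softmax classifier and the numerical gradient (standing in for the extra backprop pass a deep network would use) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Toy linear classifier standing in for the deep network (assumption).
W = rng.normal(size=(3, 5))
x = rng.normal(size=5)

def perturbed_kl(r):
    """How much the prediction changes when x is perturbed by r."""
    return kl(softmax(W @ x), softmax(W @ (x + r)))

# One power-iteration step of virtual adversarial training: estimate
# the perturbation direction that most increases the output KL.
xi, eps, h = 1e-2, 0.5, 1e-4
d = rng.normal(size=5)
d /= np.linalg.norm(d)

# Numerical gradient of the KL w.r.t. the perturbation at xi * d
# (the real method obtains this via one extra backprop pass).
g = np.zeros(5)
for i in range(5):
    step = np.zeros(5)
    step[i] = h
    g[i] = (perturbed_kl(xi * d + step) - perturbed_kl(xi * d - step)) / (2 * h)

r_adv = eps * g / (np.linalg.norm(g) + 1e-12)

# The VAT regularizer penalizes the model's sensitivity along r_adv.
vat_loss = perturbed_kl(r_adv)
print(vat_loss > 0.0)
```

The tangent-normal variant in the paper restricts this perturbation search to the tangent and normal spaces of the data manifold separately, yielding the two complementary terms.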


Salesforce open-sources TransmogrifAI, the machine learning library that powers Einstein

#artificialintelligence

Data scientists spend weeks and months not only preprocessing the data on which models are to be trained, but extracting useful features (i.e., informative attributes) from that data, narrowing down algorithms, and ultimately building (or attempting to build) a system that performs well not just within the confines of a lab, but in the real world. Salesforce's new toolkit aims to ease that burden somewhat. On GitHub today, the San Francisco-based cloud computing company published TransmogrifAI, an automated machine learning library for structured data -- the kind of searchable, neatly categorized data found in spreadsheets and databases -- that performs feature engineering, feature selection, and model training in just three lines of code. It's written in Scala, built on top of Apache Spark (some of the same technology that powers Salesforce's AI platform Einstein), and was designed from the ground up for scalability. To that end, it can process datasets ranging from dozens to millions of rows and run on clustered machines on top of Spark or on an off-the-shelf laptop.


Use Amazon Mechanical Turk with Amazon SageMaker for supervised learning Amazon Web Services

#artificialintelligence

Supervised learning needs labels, or annotations, that tell the algorithm what the right answers are during the training phase of your project. In fact, many of the examples of using MXNet, TensorFlow, and PyTorch start with annotated data sets you can use to explore the various features of those frameworks. Unfortunately, when you move from the examples to your own application, it's much less common to have a fully annotated data set at your fingertips. This tutorial will show you how to use Amazon Mechanical Turk (MTurk) from within your Amazon SageMaker notebook to get annotations for your data set and use them for training. TensorFlow provides an example that uses an Estimator to classify irises with a neural-network classifier.


False Positive Reduction by Actively Mining Negative Samples for Pulmonary Nodule Detection in Chest Radiographs

arXiv.org Artificial Intelligence

Generating large quantities of quality labeled data in medical imaging is very time-consuming and expensive. The performance of supervised algorithms for various imaging tasks has improved drastically over the years; however, the availability of data to train these algorithms has become one of the main bottlenecks for implementation. To address this, we propose a semi-supervised learning method in which pseudo-negative labels from unlabeled data are used to further refine the performance of a pulmonary nodule detection network in chest radiographs. After training with the proposed network, the false positive rate was reduced from 0.4864 to 0.1266 while sensitivity was maintained at 0.89.
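The pseudo-negative mining step the abstract describes can be sketched as a simple thresholding rule over the current model's scores; the function name, threshold, and scores below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def mine_pseudo_negatives(scores, tau=0.2):
    """Indices of unlabeled samples the current model scores confidently
    below tau; these are relabelled as pseudo-negatives and added back
    to the training set to suppress false positives."""
    return np.where(scores < tau)[0]

# Hypothetical detector outputs (probability of containing a nodule)
# on four unlabeled chest-radiograph patches.
scores = np.array([0.05, 0.90, 0.10, 0.50])
pseudo_negatives = mine_pseudo_negatives(scores)
print(pseudo_negatives)  # -> [0 2]
```

Retraining on these mined negatives is what drives the reported drop in false positive rate without sacrificing sensitivity on the labeled positives.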


Cycle Consistent Adversarial Denoising Network for Multiphase Coronary CT Angiography

arXiv.org Artificial Intelligence

In coronary CT angiography, a series of CT images is taken at different levels of radiation dose during the examination. Although this reduces the total radiation dose, image quality during the low-dose phases is significantly degraded. To address this problem, here we propose a novel semi-supervised learning technique that removes the noise from CT images obtained in the low-dose phases by learning from CT images in the routine-dose phases. Although a supervised learning approach is not possible due to differences in the underlying heart structure between the two phases, the images in the two phases are closely related, so we propose a cycle-consistent adversarial denoising network to learn the non-degenerate mapping between the low- and high-dose cardiac phases. Experimental results showed that the proposed method effectively reduces the noise in low-dose CT images while preserving detailed texture and edge information. Moreover, thanks to the cycle-consistency and identity losses, the proposed network does not create any artificial features that are not present in the input images. Visual grading and quality evaluation also confirm that the proposed method provides significant improvement in diagnostic quality.
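The two losses the abstract credits with preventing invented features can be sketched with placeholder generators. The functions G and F below are trivial stand-ins for the paper's denoising networks, assumed purely for illustration:

```python
import numpy as np

# Trivial stand-ins for the two generators (illustrative assumptions):
def G(low):    # maps low-dose images toward the routine-dose domain
    return low * 0.9

def F(high):   # maps routine-dose images toward the low-dose domain
    return high / 0.9

def l1(a, b):
    return float(np.mean(np.abs(a - b)))

x = np.ones((4, 4))        # toy low-dose image
y = np.full((4, 4), 2.0)   # toy routine-dose image

# Cycle consistency: mapping to the other domain and back must recover
# the input, so the network cannot invent structures.
cycle_loss = l1(F(G(x)), x) + l1(G(F(y)), y)

# Identity: feeding a generator an image already in its target domain
# should leave it (nearly) unchanged.
identity_loss = l1(G(y), y) + l1(F(x), x)

print(cycle_loss < 1e-6)
```

In training, both terms are added to the adversarial loss, anchoring the denoised output to the content of the input image.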