The objective of this course is to give you a wholistic understanding of machine learning, covering theory, application, and inner workings of supervised, unsupervised, and deep learning algorithms. In this series, we'll be covering linear regression, K Nearest Neighbors, Support Vector Machines (SVM), flat clustering, hierarchical clustering, and neural networks. For each major algorithm that we cover, we will discuss the high level intuitions of the algorithms and how they are logically meant to work. Next, we'll apply the algorithms in code using real world data sets along with a module, such as with Scikit-Learn. Finally, we'll be diving into the inner workings of each of the algorithms by recreating them in code, from scratch, ourselves, including all of the math involved.
Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, and achieves state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories.
This short paper is describing a demonstrator that is complementing the paper "Towards Cross-Media Feature Extraction" in these proceedings. The demo is exemplifying the use of textual resources, out of which semantic information can be extracted, for supporting the semantic annotation and indexing of associated video material in the soccer domain. Entities and events extracted from textual data are marked-up with semantic classes derived from an ontology modeling the soccer domain. We show further how extracted Audio-Video features by video analysis can be taken into account for additional annotation of specific soccer event types, and how those different types of annotation can be combined.
Creating mood sensing technology has become very popular in recent years. There is a wide range of companies trying to detect your emotions from what you write, the tone of your voice, or from the expressions on your face. All of these companies offer their technology online through cloud-based programming interfaces (APIs). As part of my offline emotion sensing hardware (Project Jammin), I have already built early prototypes of facial expression and speech content recognition for emotion detection. In this short article I describe the missing part, a voice tone analyzer.