Education
Have You Tried Using a 'Nearest Neighbor Search'?
Roughly a year and a half ago, I had the privilege of taking a graduate "Introduction to Machine Learning" course under the tutelage of the fantastic Professor Leslie Kaelbling. While I learned a great deal over the course of the semester, there was one minor point that she made to the class which stuck with me more than I expected it to at the time: before using a really fancy or sophisticated or "in-vogue" machine learning algorithm to solve your problem, try a simple Nearest Neighbor Search first. Let's say I gave you a bunch of data points, each with a location in space and a value, and then asked you to predict the value of a new point in space. Perhaps the values of your data are binary (just +s and -s) and you've heard of Support Vector Machines. Should you give that a shot?
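To make the setup above concrete, here is a minimal sketch of a one-nearest-neighbor prediction, assuming Euclidean distance and a tiny made-up data set. The function name `nearest_neighbor_predict` and the toy points are illustrative assumptions, not taken from the original post.

```python
import math


def nearest_neighbor_predict(points, values, query):
    """Return the value of the training point closest to `query`.

    Assumes Euclidean distance; `points` and `values` are parallel
    sequences (hypothetical names chosen for this illustration).
    """
    if not points or len(points) != len(values):
        raise ValueError("points and values must be non-empty and equal in length")
    # Index of the training point with the smallest distance to the query.
    best_index = min(range(len(points)), key=lambda i: math.dist(points[i], query))
    return values[best_index]


# Toy data: two clusters with binary values (-1 and +1).
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
values = [-1, -1, 1, 1]

# The query lies closest to (5.0, 5.0), so it inherits that point's value.
prediction = nearest_neighbor_predict(points, values, (4.5, 4.0))
print(prediction)
```

A brute-force scan like this is linear in the number of data points per query, which is usually fine as a first baseline before reaching for anything more elaborate.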
The Future Of Big Data Is Bigger Than You Can Possibly Imagine
Imagine a world without government, schools, a legal system, law enforcement, or companies. It's a world unlike the one we currently live in--but based on the evolution of technology and how we use it--representative of what the world may become. Imagine a computer infrastructure that could--with global knowledge and the ability to enact precise tweaks to the social and economic structure--drive the evolution of society. This is the idea behind the Universal Graph. In mathematics, this is a graph (or network) in which a piece of information can be connected with other pieces of information until all finite information is integrated.
Bilingual Distributed Word Representations from Document-Aligned Comparable Data
Vulić, Ivan, Moens, Marie-Francine
We propose a new model for learning bilingual word representations from non-parallel document-aligned data. Following the recent advances in word representation learning, our model learns dense real-valued word vectors, that is, bilingual word embeddings (BWEs). Unlike prior work on inducing BWEs which heavily relied on parallel sentence-aligned corpora and/or readily available translation resources such as dictionaries, the article reveals that BWEs may be learned solely on the basis of document-aligned comparable data without any additional lexical resources or syntactic information. We present a comparison of our approach with previous state-of-the-art models for learning bilingual word representations from comparable data that rely on the framework of multilingual probabilistic topic modeling (MuPTM), as well as with distributional local context-counting models. We demonstrate the utility of the induced BWEs in two semantic tasks: (1) bilingual lexicon extraction, (2) suggesting word translations in context for polysemous words. Our simple yet effective BWE-based models significantly outperform the MuPTM-based and context-counting representation models from comparable data as well as prior BWE-based models, and acquire the best reported results on both tasks for all three tested language pairs.
Using "The Machine Stops" for Teaching Ethics in Artificial Intelligence and Computer Science
Burton, Emanuelle (University of Chicago) | Goldsmith, Judy (University of Kentucky) | Mattei, Nicholas (Data61 and University of New South Wales)
A key front for ethical questions in artificial intelligence, and computer science more generally, is teaching students how to engage with the questions they will face in their professional careers based on the tools and technologies we teach them. In past work (and current teaching) we have advocated for the use of science fiction as an appropriate tool which enables AI researchers to engage students and the public on the current state and potential impacts of AI. We present teaching suggestions for E.M. Forster's 1909 story, "The Machine Stops," to teach topics in computer ethics. In particular, we use the story to examine ethical issues related to being constantly available for remote contact, physically isolated, and dependent on a machine --- all without mentioning computer games or other media to which students have strong emotional associations. We give a high-level view of common ethical theories and indicate how they inform the questions raised by the story and afford a structure for thinking about how to address them.
Simultaneous Influencing and Mapping for Health Interventions
Marcolino, Leandro Soriano (University of Southern California) | Lakshminarayanan, Aravind (Indian Institute of Technology, Madras) | Yadav, Amulya (University of Southern California) | Tambe, Milind (University of Southern California)
Influence Maximization is an active topic, but full knowledge of the social network graph has always been assumed. However, the graph may actually be unknown beforehand. For example, when selecting a subset of a homeless population to attend interventions concerning health, we deal with a network that is not fully known. Hence, we introduce the novel problem of simultaneously influencing and mapping (i.e., learning) the graph. We study a class of algorithms, where we show that: (i) traditional algorithms may have arbitrarily low performance; (ii) we can effectively influence and map when the independence of objectives hypothesis holds; (iii) when it does not hold, the upper bound for the influence loss converges to 0. We run extensive experiments over four real-life social networks, where we study two alternative models, and obtain significantly better results in both than traditional approaches.
Create your apps with the help of cloud machine learning
The machine learning solution from Google offers learning services with pre-trained models as well as the option of generating your own tailor-made models. A neural net-based platform, it performs better and more accurately than other learning systems on the market. The technology is currently available for developers in a limited preview phase. The cloud machine learning tool can be used with some of the technologies that Google employs in its services, such as voice searches and translations in Gmail, therefore speeding up the development process. Its main advantages include greater speed, scalability and usability for all the applications featured in these services.
Visualizing and Understanding Recurrent Networks SkillsCast
Recurrent Neural Networks (RNNs), and specifically a variant with Long Short-Term Memory (LSTM), are enjoying renewed interest as a result of successful applications in a wide range of machine learning problems that involve sequential data. I will summarize my own experience with training these models for automated image captioning and for generating text character by character, with a particular focus on understanding the source of their impressive performance and their limitations.
Building online communities: Numenta
We caught up with Matt Taylor from Numenta -- an organization whose mission is to lead a new era of machine intelligence and build computer systems around the principles of the brain. Matt shared his thoughts and insights on the open source community around their exciting projects. Find out what he says, and check out the Numenta community channel on Gitter. Tell us a little bit about yourself and the Numenta community. How did it all begin?
Deep Learning for Internet of Things Using H2O
H2O is a feature-rich open source machine learning platform known for its R and Spark integration and its ease of use. This is an overview of using H2O deep learning for data science with the Internet of Things. H2O is an Open Source machine learning platform for smarter applications. At the Data Science for IoT course, we have been following H2O for features such as Open Source, R integration, Spark integration, Deep Learning and its ease of use. This blog is authored by Sibanjan Das and Ajit Jaokar as part of our work at the Data Science for IoT course exploring H2O Deep Learning for Internet of Things.