If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
Human–Robot Interaction challenges Artificial Intelligence in many regards: dynamic, partially unknown environments that were not originally designed for robots; a broad variety of situations with rich semantics to understand and interpret; physical interactions with humans that require fine, low-latency, yet socially acceptable control strategies; and natural, multi-modal communication, which mandates common-sense knowledge and the representation of possibly divergent mental models. This article is an attempt to characterise these challenges and to exhibit a set of key decisional issues that need to be addressed for a cognitive robot to successfully share space and tasks with a human. We first identify the needed individual and collaborative cognitive skills: geometric reasoning and situation assessment based on perspective-taking and affordance analysis; acquisition and representation of knowledge models for multiple agents (humans and robots, with their specificities); situated, natural and multi-modal dialogue; human-aware task planning; and human–robot joint task achievement. The article discusses each of these abilities, presents working implementations, and shows how they combine in a coherent and original deliberative architecture for human–robot interaction. Supported by experimental results, we finally show how explicit knowledge management, both symbolic and geometric, proves instrumental to richer and more natural human–robot interactions by pushing for pervasive, human-level semantics within the robot's deliberative system.
The word2vec method based on skip-gram with negative sampling (Mikolov et al., 2013) was published in 2013 and had a large impact on the field, mainly through its accompanying software package, which enabled efficient training of dense word representations and straightforward integration into downstream models. In some respects, we have come far since then: word embeddings have established themselves as an integral part of Natural Language Processing (NLP) models. In other respects, we might as well still be in 2013, as we have not found a way to pre-train word embeddings that supersedes the original word2vec. This post focuses on the deficiencies of word embeddings and on how recent approaches have tried to resolve them. Unless otherwise stated, this post discusses pre-trained word embeddings, i.e., word representations learned on a large corpus using word2vec and its variants.
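To make the skip-gram with negative sampling objective concrete, here is a minimal sketch of the loss for a single (center, context) training pair, following the objective described by Mikolov et al. (2013). The observed context word is scored with log σ(u·v), and each of k sampled noise words is scored with log σ(-u·v). Vectors are plain Python lists, and the function names are illustrative. Sampling negatives from the noise distribution and the gradient updates performed by the word2vec package are deliberately left out.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


def sgns_loss(center, context, negatives) -> float:
    """Skip-gram negative-sampling loss for one (center, context) pair.

    center:    input ("word") vector of the center word
    context:   output ("context") vector of the observed context word
    negatives: output vectors of k words drawn from a noise distribution
    """
    # Reward a high score for the true context word...
    loss = -log_sigmoid(dot(context, center))
    # ...and a low score for every sampled negative word.
    for neg in negatives:
        loss -= log_sigmoid(-dot(neg, center))
    return loss


# With all-zero vectors every sigmoid is 0.5, so the loss is (1 + k) * log 2.
zero = [0.0, 0.0, 0.0]
print(sgns_loss(zero, zero, [zero, zero]))  # ≈ 2.079
```

Training lowers this loss by pushing a word's vector toward the vectors of words it actually co-occurs with and away from randomly sampled ones.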
Sure, Facebook has "M", Google has "Google Now", and Siri's voice isn't always that of a woman. But it does feel worth noting that (typically male-dominated) engineering groups routinely give women's names to the things you issue commands to. Is artificial intelligence work about Adams making Eves? The response to this critique is usually about the voices people trust and find easy to understand. Adrienne LaFrance over at The Atlantic does a good job discussing those points, so go read her article.
This reminds me a bit of what /u/sixwings used to say. I think the idea was that (most) neural networks were still basically just rule-based systems and that they all used supervised learning (even the reinforcement/unsupervised learning ones). I will also note that often the network's inputs and outputs are symbolic in the sense that we associate them with local and (somewhat) interpretable meanings (although this is a bit more debatable for things like pixels). Under all of this lies a question of what "a GOFAI approach" is. Neural networks have certainly been around for a very long time, so someone could say they're old(-fashioned), good and AI...
There is a dance, precisely choreographed and executed, that we perform throughout our lives. This is the dance formed by our movements. Our movements are our actions and the final outcome of our decision-making processes. Single actions are built into reusable sequences, sequences are composed into complex routines, routines are arranged into elegant choreographies, and so the complexity of human action is realised. This synergy, the composition of actions into increasingly complex units, suggests the desirability of a modular and hierarchical approach to the selection and execution of actions.
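One way to picture this composition is as a tree. Primitive actions are the leaves, and every higher-level unit (sequence, routine, choreography) is an ordered collection of smaller units. The sketch below is a hypothetical illustration of that idea using the composite pattern. The class names are invented for the example, and the sketch models only execution order, not how a hierarchical system would select among alternative actions.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Action:
    """A primitive action: a leaf of the hierarchy (hypothetical class)."""
    name: str

    def execute(self) -> List[str]:
        return [self.name]


@dataclass
class Composite:
    """A reusable unit (sequence, routine, choreography) built from smaller units."""
    name: str
    children: List[Union[Action, "Composite"]] = field(default_factory=list)

    def execute(self) -> List[str]:
        # Executing a composite means executing its parts in order;
        # the recursion bottoms out at primitive actions.
        performed: List[str] = []
        for child in self.children:
            performed.extend(child.execute())
        return performed


step = Action("step")
turn = Action("turn")
# A sequence is reused twice inside a routine, which is part of a choreography.
sequence = Composite("step-turn", [step, turn])
routine = Composite("routine", [sequence, sequence])
choreography = Composite("choreography", [routine, Action("bow")])
print(choreography.execute())  # ['step', 'turn', 'step', 'turn', 'bow']
```

Because `sequence` is defined once and reused, a change to it propagates to every routine that contains it. That reuse is the practical payoff of the modular, hierarchical view.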
In July 2015, Google's public-relations machine was in full-on crisis mode. Earlier that year, the search giant had announced Photos, an AI-driven app that used machine learning to automatically tag and organize your pictures based on the people, places and things depicted in them. It was an exciting step forward, but Photos wasn't perfect. While the app was capable of recognizing some faces, it mistook others. It would have been easy to pass this off as a routine software bug if it weren't for the nature of the failure.
We are awash with text: books, papers, blogs, tweets, news, and, increasingly, text transcribed from spoken utterances. Working with text is hard because it requires drawing on knowledge from diverse domains such as linguistics, machine learning, statistical methods, and, these days, deep learning. Deep learning methods are starting to out-compete classical and statistical methods on some challenging natural language processing problems, using single, simpler models. In this crash course, you will discover how to get started and confidently develop deep learning models for natural language processing problems using Python in 7 days. This is a big and important post.
Accepted to be published in IEEE Access (2017). We propose a coupled 3D Convolutional Neural Network (CNN) architecture that maps both modalities into a shared representation space in which the correspondence of audio-visual streams can be evaluated using the learned multimodal features. The proposed architecture incorporates spatial and temporal information jointly to effectively capture the correlation between the temporal information of the two modalities. Using a relatively small network architecture and a much smaller training dataset, our proposed method surpasses the performance of existing similar methods for audio-visual matching that use CNNs for feature representation. We also demonstrate that an effective pair selection method can significantly increase performance.
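The abstract does not spell out the training objective, so the following is only a sketch under an assumption. Coupled (Siamese-style) networks that judge whether two inputs correspond are commonly trained with a contrastive loss, which pulls genuine (matching) audio-visual pairs together and pushes impostor pairs at least a margin apart. Here the embeddings are plain lists standing in for the outputs of the audio and visual sub-networks, and `margin` is a hypothetical hyperparameter.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(audio_emb: Sequence[float], visual_emb: Sequence[float],
                     is_genuine: bool, margin: float = 1.0) -> float:
    """Contrastive loss for one audio-visual pair (assumed objective).

    Genuine pairs are penalised by their squared distance; impostor pairs
    are penalised only while they are closer than `margin`.
    """
    d = euclidean(audio_emb, visual_emb)
    if is_genuine:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


audio = [0.0, 0.0]
print(contrastive_loss(audio, [0.0, 0.5], True))   # genuine pair, d = 0.5: 0.125
print(contrastive_loss(audio, [0.0, 0.5], False))  # impostor inside margin: 0.125
print(contrastive_loss(audio, [0.0, 2.0], False))  # impostor beyond margin: 0.0
```

Under this assumed loss, impostor pairs that are already farther apart than `margin` contribute nothing. This is one plausible reason why choosing informative pairs, rather than random ones, can matter so much for training.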
The content reconstructions from lower layers (a, b, c) are almost exact replicas of the original image. In the higher layers of the network, however, the detailed pixel information is lost while the high-level structure of the scene is preserved (d, e). The style representation, in turn, computes correlations between the different features within several layers of the CNN. Reconstructing the style from representations built on an increasing number of layers creates images that match the style on an increasing scale as you move up the network's hierarchy.
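In the neural style transfer formulation of Gatys et al., the feature correlations in the style representation are collected in a Gram matrix. The sketch below assumes that a layer's feature maps have been flattened into an N × M matrix F (N filters, M spatial positions). It computes G = F Fᵀ together with a per-layer style loss normalised by 1/(4N²M²). It uses plain Python lists instead of a deep learning framework, so the function names are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """G[i][j] = sum_k F[i][k] * F[j][k]: correlation of filters i and j
    over all spatial positions k of one layer (F is N filters x M positions)."""
    n = len(features)
    return [[sum(fi * fj for fi, fj in zip(features[i], features[j]))
             for j in range(n)] for i in range(n)]


def layer_style_loss(generated: Matrix, style: Matrix) -> float:
    """Squared difference of Gram matrices, normalised by 1 / (4 N^2 M^2)."""
    n, m = len(generated), len(generated[0])
    g, a = gram_matrix(generated), gram_matrix(style)
    total = sum((g[i][j] - a[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (4 * n ** 2 * m ** 2)


# Two filters observed at three spatial positions.
F = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
print(gram_matrix(F))  # [[5.0, 2.0], [2.0, 2.0]]

# Shuffling the spatial positions (columns) leaves the Gram matrix unchanged.
F_shuffled = [[2.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]]
print(layer_style_loss(F, F_shuffled))  # 0.0
```

The sum runs over all positions k, so reordering the columns of F leaves G unchanged. This is why the style representation keeps texture-like statistics while discarding the global arrangement of the scene.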
In machine learning, there are many ways to build a product or solution, and each way rests on different assumptions. It is often not obvious how to identify which assumptions are reasonable. People new to machine learning make mistakes that will often feel silly in hindsight. I've compiled a list of the top mistakes that novice machine learning engineers make. Hopefully, you can learn from these common errors and build more robust solutions that bring real value.