Collaborating Authors

A Survey of Available Corpora for Building Data-Driven Dialogue Systems Artificial Intelligence

During the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models. In the area of dialogue systems, the trend is less obvious, and most practical systems are still built through significant engineering and expert knowledge. Nevertheless, several recent results suggest that data-driven approaches are feasible and quite promising. To facilitate research in this area, we have carried out a wide survey of publicly available datasets suitable for data-driven learning of dialogue systems. We discuss important characteristics of these datasets, how they can be used to learn diverse dialogue strategies, and their other potential uses. We also examine methods for transfer learning between datasets and the use of external knowledge. Finally, we discuss appropriate choice of evaluation metrics for the learning objective.

Alphabet's Next Billion-Dollar Business: 10 Industries To Watch - CB Insights Research


Alphabet is using its dominance in the search and advertising spaces -- and its massive size -- to find its next billion-dollar business. From healthcare to smart cities to banking, here are 10 industries the tech giant is targeting. With growing threats from its big tech peers Microsoft, Apple, and Amazon, Alphabet's drive to disrupt has become more urgent than ever before. The conglomerate is leveraging the power of its first moats -- search and advertising -- and its massive scale to find its next billion-dollar businesses. To protect its current profits and grow more broadly, Alphabet is edging its way into industries adjacent to the ones where it has already found success and entering new spaces entirely to find opportunities for disruption. Evidence of Alphabet's efforts is showing up in several major industries. For example, the company is using artificial intelligence to understand the causes of diseases like diabetes and cancer and how to treat them. Those learnings feed into community health projects that serve the public, and also help Alphabet's effort to build smart cities. Elsewhere, Alphabet is using its scale to build a better virtual assistant and own the consumer electronics software layer. It's also leveraging that scale to build a new kind of Google Pay-operated checking account. In this report, we examine how Alphabet and its subsidiaries are currently working to disrupt 10 major industries -- from electronics to healthcare to transportation to banking -- and what else might be on the horizon. Within the world of consumer electronics, Alphabet has already found dominance with one product: Android. Mobile operating system market share globally is controlled by the Linux-based OS that Google acquired in 2005 to fend off Microsoft and Windows Mobile. Today, however, Alphabet's consumer electronics strategy is being driven by its work in artificial intelligence. Google is building some of its own hardware under the Made by Google line -- including the Pixel smartphone, the Chromebook, and the Google Home -- but the company is doing more important work on hardware-agnostic software products like Google Assistant (which is even available on iOS).

Welcome to the Hotel California of Artificial Intelligence


In 1977, the Eagles released "Hotel California", a song about drugs and the effects an addiction has on people. Putting "We are all just prisoners here, of our own device" in the context of our today's digital lifestyle we find a lot of truth. There is a reason why Google provides most of its services for free or why Amazon wants us to have an Echo in every home or why Facebook has become our directory of "friends". What looks pretty convenient is a threat. It is a threat to the end consumers but also a threat to the established economy.

Nonparametric Bayesian Topic Modelling with the Hierarchical Pitman-Yor Processes Machine Learning

The Dirichlet process and its extension, the Pitman-Yor process, are stochastic processes that take probability distributions as a parameter. These processes can be stacked up to form a hierarchical nonparametric Bayesian model. In this article, we present efficient methods for the use of these processes in this hierarchical context, and apply them to latent variable models for text analytics. In particular, we propose a general framework for designing these Bayesian models, which are called topic models in the computer science community. We then propose a specific nonparametric Bayesian topic model for modelling text from social media. We focus on tweets (posts on Twitter) in this article due to their ease of access. We find that our nonparametric model performs better than existing parametric models in both goodness of fit and real world applications.