Collaborating Authors

 IBM Research


Bernoulli Embeddings for Graphs

AAAI Conferences

Just as semantic hashing can accelerate information retrieval, binary valued embeddings can significantly reduce latency in the retrieval of graphical data. We introduce a simple but effective model for learning such binary vectors for nodes in a graph. By imagining the embeddings as independent coin flips of varying bias, continuous optimization techniques can be applied to the approximate expected loss. Embeddings optimized in this fashion consistently outperform the quantization of both spectral graph embeddings and various learned real-valued embeddings, on both ranking and pre-ranking tasks for a variety of datasets.
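The coin-flip view can be illustrated with a small sketch. It is not the paper's model or loss function: it only shows, as an assumption-laden example, why treating each bit as an independent Bernoulli variable gives a smooth quantity to optimize. Here the expected Hamming distance between two nodes is a differentiable function of the bit probabilities, and the final binary codes are obtained by thresholding. The function names and the 0.5 threshold are hypothetical choices made for illustration.

```python
def expected_hamming(p, q):
    """Expected Hamming distance between two independent Bernoulli vectors.

    p[i] and q[i] are the probabilities that bit i equals 1 for each node.
    Bit i differs with probability p[i] * (1 - q[i]) + q[i] * (1 - p[i]).
    """
    return sum(pi * (1.0 - qi) + qi * (1.0 - pi) for pi, qi in zip(p, q))


def binarize(p, threshold=0.5):
    """Turn bit probabilities into a binary code (threshold is an assumption)."""
    return [1 if pi > threshold else 0 for pi in p]


def hamming(a, b):
    """Hamming distance between two binary codes of equal length."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical bit probabilities for two nodes.
node_u = [0.9, 0.1, 0.8]
node_v = [0.2, 0.8, 0.7]

smooth_distance = expected_hamming(node_u, node_v)
hard_distance = hamming(binarize(node_u), binarize(node_v))
```

Because `expected_hamming` varies continuously with the probabilities, gradient-based methods can adjust them, while retrieval at query time only needs the thresholded codes and the integer `hamming` distance.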


Water Advisor - A Data-Driven, Multi-Modal, Contextual Assistant to Help With Water Usage Decisions

AAAI Conferences

We demonstrate Water Advisor, a multi-modal assistant that helps non-experts make sense of complex water quality data and apply it to their specific needs. A user can chat with the tool about water quality and activities of interest, and the system uses AI methods to give advice based on the available water data for a location, the applicable water regulations, and the relevant parameters.

Figure 1: Sample advisories, by the EPA for Flint residents (left) and by the state for visitors (right; Washington State).


Dataset Evolver: An Interactive Feature Engineering Notebook

AAAI Conferences

We present DATASET EVOLVER, an interactive Jupyter notebook-based tool that helps data scientists perform feature engineering for classification tasks. It suggests new features to construct, based on automated feature engineering algorithms. Users can navigate the suggestions in different ways, validate their impact, and selectively accept them. DATASET EVOLVER is a pluggable feature engineering framework to which further exploration strategies can be added. It currently includes meta-learning based exploration and reinforcement learning based exploration. The suggested features are constructed using well-defined mathematical functions and are easily interpretable. Our system takes a mixed-initiative approach in which an automated agent assists the user in solving the complex problem of feature engineering efficiently and effectively. It reduces the effort of a data scientist from hours to minutes.




WatsonPaths: Scenario-Based Question Answering and Inference over Unstructured Information

AI Magazine

We present WatsonPaths, a novel system that can answer scenario-based questions. These include medical questions that present a patient summary and ask for the most likely diagnosis or most appropriate treatment. WatsonPaths builds on the IBM Watson question answering system. WatsonPaths breaks down the input scenario into individual pieces of information, asks relevant subquestions of Watson to conclude new information, and represents these results in a graphical model. Probabilistic inference is performed over the graph to conclude the answer. On a set of medical test preparation questions, WatsonPaths shows a significant improvement in accuracy over multiple baselines.


Ethical Considerations in Artificial Intelligence Courses

AI Magazine

The recent surge in interest in ethics in artificial intelligence may leave many educators wondering how to address moral, ethical, and philosophical issues in their AI courses. As instructors, we want to develop curricula that prepare students not only to be artificial intelligence practitioners, but also to understand the moral, ethical, and philosophical impacts that artificial intelligence will have on society. In this article we provide practical case studies and links to resources for use by AI educators. We also provide concrete suggestions on how to integrate AI ethics into a general artificial intelligence course and how to teach a stand-alone artificial intelligence ethics course.


Compressed Path Databases with Ordered Wildcard Substitutions

AAAI Conferences

Compressed path databases (CPDs) are a state-of-the-art approach to path planning, a core AI problem. In the Grid-based Path Planning Competition, the CPD-based SRC path planning system was the fastest competitor with respect to both computing full optimal paths and computing the first moves of an optimal path. However, on large maps, CPDs can require a significant amount of memory, which can be a serious practical bottleneck. We present an approach that significantly reduces the size of a CPD. Our approach replaces part of the data encoded in a CPD with wildcards ("don’t care" symbols), maintaining the ability to compute optimal paths for all pairs of nodes of an undirected graph. We show that using wildcards in a way that maximizes the memory savings is NP-hard. We consider heuristics that achieve a good performance in practice. We implement our ideas on top of SRC and provide a detailed empirical analysis. Average memory savings can reach a factor of 2. Our first-k-moves lag (i.e., the time before knowing the first k optimal forward moves) increases, but it can be kept within competitive values. The speed of computing full optimal paths improves slightly.
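To make the wildcard idea concrete, here is a minimal sketch under simplifying assumptions. It is not the SRC implementation or the paper's heuristics. It assumes that one row of a CPD stores, for a fixed source node, the first move toward each target node in some fixed node ordering, that rows are run-length encoded, and that some entries have already been marked as wildcards. Deciding which entries may become wildcards is the hard part the abstract refers to and is not attempted here. The names `WILDCARD`, `compress_row`, and `first_move` are hypothetical.

```python
import bisect

WILDCARD = "*"  # hypothetical "don't care" symbol


def compress_row(moves):
    """Run-length encode one row of first moves, letting wildcards join any run.

    Returns a list of (start_index, move) pairs. A run made only of
    wildcards keeps WILDCARD as its move.
    """
    runs = []
    for i, move in enumerate(moves):
        if not runs:
            runs.append([i, move])
            continue
        current = runs[-1][1]
        if move == WILDCARD or move == current:
            continue  # the entry extends the current run
        if current == WILDCARD:
            runs[-1][1] = move  # run so far was all wildcards; adopt this move
            continue
        runs.append([i, move])  # a different concrete move starts a new run
    return [tuple(run) for run in runs]


def first_move(runs, target):
    """Look up the stored move for a target index by binary search over runs."""
    starts = [start for start, _ in runs]
    return runs[bisect.bisect_right(starts, target) - 1][1]


row = ["N", WILDCARD, "N", "E", WILDCARD, WILDCARD, "E", "S"]
runs = compress_row(row)  # three runs instead of eight entries
```

In this toy row the wildcards let the `N` and `E` runs stay unbroken. A lookup on a wildcard position returns whatever move its run carries, so this only works if the stored move is never needed at that position.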


New Results for the GEO-CAPE Observation Scheduling Problem

AAAI Conferences

A challenging Earth-observing satellite scheduling problem was recently studied in (Frank, Do and Tran 2016), for which the best resolution approach so far on the proposed benchmark is a time-indexed Mixed Integer Linear Program (MILP) formulation. This MILP formulation produces feasible solutions but is not able to prove optimality or to provide tight optimality gaps, making it difficult to assess the quality of existing solutions. In this paper, we first introduce an alternative disjunctive MILP formulation that manages to close more than half of the instances of the benchmark. This MILP formulation is then relaxed to provide good bounds on optimal values for the unsolved instances. We then propose a CP Optimizer model that consistently outperforms the original time-indexed MILP formulation, reducing the optimality gap by a factor of more than 4. This Constraint Programming (CP) formulation is very concise: we give its complete OPL implementation in the paper. We report some improvements to this CP model, resulting in an approach that produces optimal or near-optimal solutions (optimality gap smaller than 1%) for about 80% of the instances. Unlike the MILP formulations, it is able to quickly produce good quality schedules, and it is expected to be flexible enough to handle the changing requirements of the application.


Predicting Movie Genre Preferences from Personality and Values of Social Media Users

AAAI Conferences

We propose a novel technique to predict a user’s movie genre preference from her psycholinguistic attributes, obtained from her social media interactions. In particular, we build machine learning based classification models that take a user's tweets as input, derive her psychological attributes (personality and value scores), and give her movie genre preference as output. We train these models using the user's tweets on Twitter and her reviews and ratings of movies of different genres in the Internet Movie Database (IMDb). We exploit a key concept of psychology, namely that an individual’s personality and values may influence her choice in performing different actions in real life. We have investigated how personality and values independently and collectively influence a user's preference for different movie genres. Our proposed model can be used for recommending movies to social media users.