Collaborating Authors

 Xu, Hong


Decision Tree Learning-Inspired Dynamic Variable Ordering for the Weighted CSP

AAAI Conferences

The weighted constraint satisfaction problem (WCSP) is a powerful mathematical framework for combinatorial optimization. Branch-and-bound search is very successful in solving the WCSP but depends critically on the order in which variables are instantiated. In this paper, we introduce a new framework for dynamic variable ordering for solving the WCSP, inspired by regression decision tree learning: variables are ordered dynamically based on samples of random assignments of values to variables and their corresponding total weights. Within this framework, we propose four variable ordering heuristics (sdr, inv-sdr, rr and inv-rr). We compare them with state-of-the-art dynamic variable ordering heuristics and show that sdr and rr outperform them on many real-world and random benchmark instances.
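To make the decision-tree connection concrete, here is a minimal sketch of a sampling-based ordering heuristic built on standard deviation reduction (SDR), the classic regression-tree splitting criterion; reading sdr as that criterion, along with the sampling scheme and function names here, is our illustrative assumption, not the authors' implementation.

```python
import random
import statistics

def sdr_order(variables, domains, total_weight, num_samples=100):
    """Order variables by standard deviation reduction (SDR), estimated
    from random complete assignments, as in regression tree learning.

    variables: list of variable names
    domains: dict mapping each variable to its list of values
    total_weight: function mapping a complete assignment (dict) to its weight
    """
    # Draw random complete assignments and record their total weights.
    samples = []
    for _ in range(num_samples):
        assignment = {v: random.choice(domains[v]) for v in variables}
        samples.append((assignment, total_weight(assignment)))

    sigma_all = statistics.pstdev([w for _, w in samples])

    def sdr(v):
        # Split the samples by the value assigned to v (like a tree split)
        # and measure how much the weighted std. dev. of weights drops.
        groups = {}
        for assignment, w in samples:
            groups.setdefault(assignment[v], []).append(w)
        weighted = sum(len(ws) / len(samples) * statistics.pstdev(ws)
                       for ws in groups.values())
        return sigma_all - weighted

    # Instantiate first the variable whose split most reduces deviation.
    return sorted(variables, key=sdr, reverse=True)

# Hypothetical usage: three Boolean variables with a toy weight function.
order = sdr_order(["x", "y", "z"], {v: [0, 1] for v in "xyz"},
                  total_weight=lambda a: 5 * a["x"] + a["y"] + a["z"])
```

The other heuristics would presumably plug a different score into the same skeleton; the inv- prefix suggests inv-sdr and inv-rr reverse the resulting order.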


Model Asset eXchange: Path to Ubiquitous Deep Learning Deployment

arXiv.org Machine Learning

Deep learning (DL) has recently achieved significant performance gains in traditionally challenging fields such as computer vision and natural language processing. Across many research fields, DL models are evolving rapidly and becoming ubiquitous. Unfortunately, most software developers are not DL experts and often have a difficult time keeping up with the booming DL research output. As a result, it usually takes a significant amount of time for the latest, superior DL models to reach industry. This issue is further exacerbated by the common use of sundry incompatible DL programming frameworks, such as TensorFlow, PyTorch, and Theano. To address this issue, we propose a system, called Model Asset Exchange (MAX), that gives developers easy access to state-of-the-art DL models. Regardless of the underlying DL programming framework, it provides an open-source Python library (called the MAX framework) that wraps DL models and unifies their programming interfaces behind our standardized RESTful APIs. These RESTful APIs enable developers to use the wrapped DL models for inference tasks without needing to understand the different DL programming frameworks. Using MAX, we have wrapped and open-sourced more than 30 state-of-the-art DL models from various research fields, including computer vision, natural language processing, and signal processing. Finally, we demonstrate two web applications built on top of MAX, as well as the process of adding a DL model to MAX.
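A minimal sketch of what calling such a wrapped model might look like; the local URL, endpoint path, and response fields below are illustrative assumptions, not the documented MAX API.

```python
import requests

# Hypothetical MAX model microservice running locally; the endpoint path
# and response schema are assumptions for illustration.
MODEL_URL = "http://localhost:5000/model/predict"

with open("dog.jpg", "rb") as f:
    # The wrapped model is reached through a standardized RESTful API,
    # so no knowledge of the underlying DL framework is needed.
    response = requests.post(MODEL_URL, files={"image": f})

response.raise_for_status()
for prediction in response.json().get("predictions", []):
    print(prediction)
```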


Saec: Similarity-Aware Embedding Compression in Recommendation Systems

arXiv.org Machine Learning

Production recommendation systems rely on embedding methods to represent various features. A pressing practical challenge is that the large embedding matrix incurs a substantial memory footprint during serving as the number of features grows over time. We propose a similarity-aware embedding matrix compression method called Saec to address this challenge. Saec clusters similar features within a field to reduce the embedding matrix size. It also adopts a fast clustering optimization based on feature frequency to drastically reduce clustering time. We implement and evaluate Saec on Numerous, the production distributed machine learning system at Tencent, with 10 days' worth of feature data from the QQ mobile browser. Testbed experiments show that Saec reduces the number of embedding vectors by two orders of magnitude, compresses the embedding size by ~27x, and delivers the same AUC and log loss performance.
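A rough sketch of the two ingredients as we read the abstract: per-field clustering of similar embedding rows, with a frequency-based shortcut that fits the clustering only on frequent features. The function name and the exact seeding rule are illustrative assumptions, not Saec's actual code.

```python
import numpy as np
from sklearn.cluster import KMeans

def compress_field(embeddings, frequencies, num_centroids):
    """Compress one field's embedding matrix by clustering similar rows.

    embeddings: (num_features, dim) array for a single field
    frequencies: per-feature access counts, same length as embeddings
    Returns the centroid matrix plus a per-feature index into it.
    """
    # Frequency-based speedup (our reading of the abstract): fit the
    # clustering on the most frequent features instead of the full matrix.
    top = np.argsort(frequencies)[::-1][:max(10 * num_centroids, num_centroids)]
    kmeans = KMeans(n_clusters=num_centroids, n_init=10).fit(embeddings[top])

    # Every feature (frequent or rare) is then mapped to its nearest
    # centroid, so the stored matrix shrinks to num_centroids rows.
    assignment = kmeans.predict(embeddings)
    return kmeans.cluster_centers_, assignment
```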


Stanza: Layer Separation for Distributed Training in Deep Learning

arXiv.org Machine Learning

The parameter server architecture is prevalently used for distributed deep learning. Each worker machine in a parameter server system trains the complete model, which leads to a hefty amount of network data transfer between workers and servers. We empirically observe that this data transfer has a non-negligible impact on training time. To tackle the problem, we design a new distributed training system called Stanza. Stanza exploits the fact that in many models, such as convolutional neural networks, most data exchange is attributed to the fully connected layers, while most computation is carried out in the convolutional layers. Thus, we propose layer separation in distributed training: the majority of the nodes train only the convolutional layers, and the rest train only the fully connected layers. Gradients and parameters of the fully connected layers no longer need to be exchanged across the cluster, thereby substantially reducing the data transfer volume. We implement Stanza on PyTorch and evaluate its performance on Azure and EC2. Results show that Stanza accelerates training significantly over current parameter server systems: on EC2 instances with Tesla V100 GPUs and 10 Gbps bandwidth, for example, Stanza is 1.34x-13.9x faster for common deep learning models.
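A simplified single-process sketch of the layer-separation idea: partition a model's parameters into the convolutional set (trained by most nodes) and the fully connected set (trained by a few nodes and never shipped across the cluster). The helper below is illustrative, not Stanza's actual code.

```python
import torch.nn as nn

def split_layers(model: nn.Module):
    """Partition parameters into convolutional and fully connected sets.

    In Stanza-style layer separation, gradients for the conv parameters
    are exchanged among the (many) conv workers, while the fc parameters
    live only on the (few) fc workers and never cross the cluster.
    """
    conv_params, fc_params = [], []
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            conv_params += list(module.parameters(recurse=False))
        elif isinstance(module, nn.Linear):
            fc_params += list(module.parameters(recurse=False))
    return conv_params, fc_params

# Example: on a classic CNN such as AlexNet, the fc layers hold most of
# the parameters (hence most transfer volume) while the conv layers
# dominate computation -- the asymmetry layer separation exploits.
```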


Learning Embeddings of Directed Networks with Text-Associated Nodes---with Applications in Software Package Dependency Networks

arXiv.org Machine Learning

A network embedding consists of a vector representation for each node in the network. Network embeddings have shown their usefulness in node classification and visualization in many real-world application domains, such as social networks and web networks. Although directed networks with text associated with each node, such as citation networks and software package dependency networks, are commonplace, to the best of our knowledge their embeddings have not been specifically studied. In this paper, we create PCTADW-1 and PCTADW-2, two neural-network-based algorithms that learn embeddings of directed networks with text associated with each node. We also create two new labeled directed networks with text-associated nodes: the package dependency networks of two popular GNU/Linux distributions, Debian and Fedora. We experimentally demonstrate that the embeddings produced by our neural networks yield better node classification quality than various baselines on these two networks. We further observe a systematic presence of analogies (similar to those in word embeddings) in the network embeddings of software package dependency networks. To the best of our knowledge, this is the first time such a systematic presence of analogies has been observed in network and document embeddings. This may open up a new avenue for algorithmically understanding networks and documents via their embeddings, as well as for better human understanding of network and document embeddings.
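As an illustration of the kind of analogy meant here, an analogy query over learned embeddings can be answered with simple vector arithmetic, exactly as in word embeddings. The function and the example package names below are hypothetical.

```python
import numpy as np

def analogy(emb, a, b, c, k=1):
    """Solve 'a is to b as c is to ?' by nearest neighbors of
    emb[b] - emb[a] + emb[c] under cosine similarity.

    emb: dict mapping node names to 1-D numpy embedding vectors
    """
    target = emb[b] - emb[a] + emb[c]
    target /= np.linalg.norm(target)
    # Rank all other nodes by cosine similarity to the target vector.
    names, vecs = zip(*((n, v / np.linalg.norm(v))
                        for n, v in emb.items() if n not in (a, b, c)))
    sims = np.stack(vecs) @ target
    return [names[i] for i in np.argsort(sims)[::-1][:k]]

# Hypothetical usage on package embeddings:
# analogy(emb, "python3", "python3-numpy", "perl")
# might return a numerics library for Perl.
```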


Measuring Territorial Control in Civil Wars Using Hidden Markov Models: A Data Informatics-Based Approach

arXiv.org Machine Learning

Territorial control is a key aspect shaping the dynamics of civil war. Despite its importance, we lack data on territorial control that are fine-grained enough to account for subnational spatio-temporal variation and that cover a large set of conflicts. To resolve this issue, we propose a theoretical model of the relationship between territorial control and tactical choice in civil war, and we outline how Hidden Markov Models (HMMs) are suited to capturing these theoretical intuitions and estimating levels of territorial control. We discuss the challenges of using HMMs in this application and mitigation strategies for future work.
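A minimal sketch of the modeling intuition: hidden states are levels of territorial control, observations are tactic categories, and decoding recovers the most likely control sequence. All states, categories, and probabilities below are illustrative assumptions, not estimates from the paper.

```python
import numpy as np

# Hidden states: levels of territorial control (illustrative).
states = ["rebel", "contested", "government"]
# Observations: tactic categories, reflecting the intuition that actors
# choose tactics (e.g., terrorism vs. conventional attacks) depending
# on who controls the territory.
obs_types = ["conventional", "terrorism", "no_event"]

A = np.array([[0.80, 0.15, 0.05],   # transition probabilities
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
B = np.array([[0.50, 0.10, 0.40],   # emission probabilities
              [0.40, 0.30, 0.30],
              [0.10, 0.50, 0.40]])
pi = np.array([1 / 3, 1 / 3, 1 / 3])  # uniform initial distribution

def viterbi(obs):
    """Most likely control sequence for an observed tactic sequence."""
    T, N = len(obs), len(states)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi([1, 1, 0, 2, 0]))  # e.g., a district's monthly event types
```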


An Efficient Implementation of Belief Function Propagation

arXiv.org Artificial Intelligence

The local computation technique (Shafer et al. 1987, Shafer and Shenoy 1988, Shenoy and Shafer 1986) is used for propagating belief functions in a so-called Markov tree. In this paper, we describe an efficient implementation of belief function propagation based on the local computation technique. The presented method avoids all redundant computations in the propagation process, thereby lowering the computational complexity relative to other existing implementations (Hsia and Shenoy 1989, Zarley et al. 1988). We also give a combined algorithm for both propagation and re-propagation, which makes the re-propagation process more efficient when one or more of the prior belief functions are changed.
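For reference, the combination step performed at each node of the Markov tree is Dempster's rule. A standalone sketch follows, with mass functions represented as dicts over frozensets (a representation choice of ours, not the paper's data structures).

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions by Dempster's rule.

    Each mass function maps frozensets (subsets of the frame of
    discernment) to masses summing to 1.
    """
    raw, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            raw[inter] = raw.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    # Normalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in raw.items()}

m1 = {frozenset({"rain"}): 0.6, frozenset({"rain", "sun"}): 0.4}
m2 = {frozenset({"sun"}): 0.3, frozenset({"rain", "sun"}): 0.7}
print(dempster_combine(m1, m2))
```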


A Decision Calculus for Belief Functions in Valuation-Based Systems

arXiv.org Artificial Intelligence

A valuation-based system (VBS) provides a general framework for representing knowledge and drawing inferences under uncertainty. Recent studies have shown that the semantics of VBS can represent and solve Bayesian decision problems (Shenoy, 1991a). The purpose of this paper is to propose a decision calculus for Dempster-Shafer (D-S) theory in the framework of VBS. The proposed calculus uses a weighting factor whose role is similar to the probabilistic interpretation of an assumption that disambiguates decision problems represented with belief functions (Strat 1990). We show that with the presented calculus, if a decision problem is represented properly in the valuation network, it can be solved using the fusion algorithm (Shenoy 1991a). We also show that the presented decision calculus reduces to the calculus for Bayesian probability theory when probabilities, instead of belief functions, are given.
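A sketch of the weighting-factor idea as we read it, following Strat's (1990) interpolation between the pessimistic and optimistic expectations over each focal element; the function and variable names are ours, not the paper's calculus.

```python
def expected_utility(mass, utility, rho):
    """Expected utility of a belief-function lottery with weighting
    factor rho in [0, 1], in the style of Strat (1990).

    mass: dict mapping frozensets of outcomes to belief masses
    utility: dict mapping each outcome to its utility
    rho=0 is maximally pessimistic (min within each focal element);
    rho=1 is maximally optimistic (max within each focal element).
    """
    total = 0.0
    for focal, m in mass.items():
        us = [utility[o] for o in focal]
        total += m * ((1 - rho) * min(us) + rho * max(us))
    return total

# With singleton focal elements only, min == max for every focal set,
# so the calculus reduces to ordinary Bayesian expected utility.
mass = {frozenset({"win"}): 0.5, frozenset({"win", "lose"}): 0.5}
print(expected_utility(mass, {"win": 100, "lose": 0}, rho=0.5))
```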


A Belief-Function Based Decision Support System

arXiv.org Artificial Intelligence

In this paper, we present a decision support system based on belief functions and the pignistic transformation. The system integrates an evidential system for belief function propagation with a valuation-based system for Bayesian decision analysis; the two subsystems are connected through the pignistic transformation. The system takes as input the user's "gut feelings" about a situation and suggests which tests, if any, should be performed and in what order, all through a user-friendly interface.
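The pignistic transformation that connects the two subsystems spreads each focal element's mass uniformly over its members, turning a belief function into a probability distribution the Bayesian subsystem can consume. A minimal sketch, using the same dict-over-frozensets representation as the earlier examples:

```python
def pignistic(mass):
    """Pignistic transformation BetP: spread each focal element's mass
    uniformly over its members, yielding a probability distribution.

    mass: dict mapping nonempty frozensets to masses summing to 1.
    """
    betp = {}
    for focal, m in mass.items():
        share = m / len(focal)
        for outcome in focal:
            betp[outcome] = betp.get(outcome, 0.0) + share
    return betp

mass = {frozenset({"flu"}): 0.4, frozenset({"flu", "cold"}): 0.6}
print(pignistic(mass))  # {'flu': 0.7, 'cold': 0.3}
```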