Institute of Software, Chinese Academy of Sciences
NuMWVC: A Novel Local Search for Minimum Weighted Vertex Cover Problem
Li, Ruizhi (Jilin University of Finance and Economics) | Cai, Shaowei (Institute of Software, Chinese Academy of Sciences) | Hu, Shuli (Northeast Normal University) | Yin, Minghao (Northeast Normal University) | Gao, Jian (Dalian Maritime University)
The minimum weighted vertex cover (MWVC) problem is a well known combinatorial optimization problem with important applications. This paper introduces a novel local search algorithm called NuMWVC for MWVC based on three ideas. First, four reduction rules are introduced during the initial construction phase. Second, the configuration checking with aspiration is proposed to reduce cycling problem. Moreover, a self-adaptive vertex removing strategy is proposed to save time.
Information-Theoretic Domain Adaptation Under Severe Noise Conditions
Wang, Wei (Institute of Software, Chinese Academy of Sciences) | Wang, Hao (360 Search Lab, Qihoo 360) | Ran, Zhi-Yong (Chongqing University of Posts and Telecommunications) | He, Ran (Institute of Automation, Chinese Academy of Sciences)
Cross-domain data reconstruction methods derive a shared transformation across source and target domains. These methods usually make a specific assumption on noise, which exhibits limited ability when the target data are contaminated by different kinds of complex noise in practice. To enhance the robustness of domain adaptation under severe noise conditions, this paper proposes a novel reconstruction based algorithm in an information-theoretic setting. Specifically, benefiting from the theoretical property of correntropy, the proposed algorithm is distinguished with: detecting the contaminated target samples without making any specific assumption on noise; greatly suppressing the negative influence of noise on cross-domain transformation. Moreover, a relative entropy based regularization of the transformation is incorporated to avoid trivial solutions with the reaped theoretic advantages, i.e., non-negativity and scale-invariance. For optimization, a half-quadratic technique is developed to minimize the non-convex information-theoretic objectives with explicitly guaranteed convergence. Experiments on two real-world domain adaptation tasks demonstrate the superiority of our method.
Distant Supervision via Prototype-Based Global Representation Learning
Han, Xianpei (Institute of Software, Chinese Academy of Sciences) | Sun, Le (Institute of Software, Chinese Academy of Sciences)
Distant supervision (DS) is a promising technique for relation extraction. Currently, most DS approaches build relation extraction models in local instance feature space, often suffer from the multi-instance problem and the missing label problem. In this paper, we propose a new DS method — prototype-based global representation learning, which can effectively resolve the multi-instance problem and the missing label problem by learning informative entity pair representations, and building discriminative extraction models at the entity pair level, rather than at the instance level. Specifically, we propose a prototype-based embedding algorithm, which can embed entity pairs into a prototype-based global feature space; we then propose a neural network model, which can classify entity pairs into target relation types by summarizing relevant information from multiple instances. Experimental results show that our method can achieve significant performance improvement over traditional DS methods.
Exploring Efficient Strategies for Minesweeper
Tu, Jinzheng (Tsinghua University) | Li, Tianhong (Tsinghua University) | Chen, Shiteng (Institute of Software, Chinese Academy of Sciences) | Zu, Chong (University of California, Berkeley) | Gu, Zhaoquan (The University of Hong Kong)
Minesweeper is a famous single-player computer game, in which the grid of blocks contains some mines and the player is to uncover (probe) all blocks that do not contain any mines. Many heuristic strategies have been prompted to play the game, but the rate of success is not high. In this paper, we explore efficient strategies for the Minesweeper game. First, we show a counterintuitive result that probing the corner blocks could increase the rate of success. Then, we present a series of heuristic strategies, and the combination of them could lead to better results. We also transplant the optimal procedure on the basis of our proposed methods, and it achieves the highest rate of success. Through extensive simulations, a combination of heuristic strategies, "PSEQ", yields a success rate of 81.627(8)%, 78.122(8)%, and 39.616(5)% for beginner, intermediate, and expert levels respectively, outperforming the state-of-the-art strategies. Moreover, the developed quasi-optimal methods, combining the optimal procedure and our heuristic methods, raise the success rate to at least 81.79(2)%, 78.22(3)%, and 40.06(2)% respectively.
Global Distant Supervision for Relation Extraction
Han, Xianpei (Institute of Software, Chinese Academy of Sciences) | Sun, Le (Institute of Software, Chinese Academy of Sciences)
Machine learning approaches to relation extraction are typically supervised and require expensive labeled data. To break the bottleneck of labeled data, a promising approach is to exploit easily obtained indirect supervision knowledge – which we usually refer to as distant supervision (DS). However, traditional DS methods mostly only exploit one specific kind of indirect supervision knowledge – the relations/facts in a given knowledge base, thus often suffer from the problem of lack of supervision. In this paper, we propose a global distant supervision model for relation extraction, which can: 1) compensate the lack of supervision with a wide variety of indirect supervision knowledge; and 2) reduce the uncertainty in DS by performing joint inference across relation instances. Experimental results show that, by exploiting the consistency between relation labels, the consistency between relations and arguments, and the consistency between neighbor instances using Markov logic, our method significantly outperforms traditional DS approaches.
Two Efficient Local Search Algorithms for Maximum Weight Clique Problem
Wang, Yiyuan (Northeast Normal University) | Cai, Shaowei (Institute of Software, Chinese Academy of Sciences) | Yin, Minghao (Northeast Normal University)
The Maximum Weight Clique problem (MWCP) is an important generalization of the Maximum Clique problem with wide applications. This paper introduces two heuristics and develops two local search algorithms for MWCP. Firstly, we propose a heuristic called strong configuration checking (SCC), which is a new variant of a recent powerful strategy called configuration checking (CC) for reducing cycling in local search. Based on the SCC strategy, we develop a local search algorithm named LSCC. Moreover, to improve the performance on massive graphs, we apply a low-complexity heuristic called Best from Multiple Selection (BMS) to select the swapping vertex pair quickly and effectively. The BMS heuristic is used to improve LSCC, resulting in the LSCC+BMS algorithm. Experiments show that the proposed algorithms outperform the state-of-the-art local search algorithm MN/TS and its improved version MN/TS+BMS on the standard benchmarks namely DIMACS and BHOSLIB, as well as a wide range of real world massive graphs.
A Joint Model for Entity Set Expansion and Attribute Extraction from Web Search Queries
Zhang, Zhenzhong (Institute of Software, Chinese Academy of Sciences) | Sun, Le (Institute of Software, Chinese Academy of Sciences) | Han, Xianpei (Institute of Software, Chinese Academy of Sciences)
Entity Set Expansion (ESE) and Attribute Extraction (AE) are usually treated as two separate tasks in Information Extraction (IE). However, the two tasks are tightly coupled, and each task can benefit significantly from the other by leveraging the inherent relationship between entities and attributes. That is, 1) an attribute is important if it is shared by many typical entities of a class; 2) an entity is typical if it owns many important attributes of a class. Based on this observation, we propose a joint model for ESE and AE, which models the inherent relationship between entities and attributes as a graph. Then a graph reinforcement algorithm is proposed to jointly mine entities and attributes of a specific class. Experimental results demonstrate the superiority of our method for discovering both new entities and new attributes.
Preference Planning for Markov Decision Processes
Li, Meilun (Beihang University) | She, Zhikun (Beihang University) | Turrini, Andrea (Institute of Software, Chinese Academy of Sciences) | Zhang, Lijun (Institute of Software, Chinese Academy of Sciences)
The classical planning problem can be enriched with quantitative and qualitative user-defined preferences on how the system behaves on achieving the goal. In this paper, we propose the probabilistic preference planning problem for Markov decision processes, where the preferences are based on an enriched probabilistic LTL-style logic. We develop P4Solver, an SMT-based planner computing the preferred plan by reducing the problem to quadratic programming problem, which can be solved using SMT solvers such as Z3. We illustrate the framework by applying our approach on two selected case studies.
Transfer Feature Representation via Multiple Kernel Learning
Wang, Wei (Institute of Software, Chinese Academy of Sciences) | Wang, Hao (Institute of Software, Chinese Academy of Sciences) | Zhang, Chen (Institute of Software, Chinese Academy of Sciences) | Xu, Fanjiang (Institute of Software, Chinese Academy of Sciences)
Learning an appropriate feature representation across source and target domains is one of the most effective solutions to domain adaptation problems. Conventional cross-domain feature learning methods rely on the Reproducing Kernel Hilbert Space (RKHS) induced by a single kernel. Recently, Multiple Kernel Learning (MKL), which bases classifiers on combinations of kernels, has shown improved performance in the tasks without distribution difference between domains. In this paper, we generalize the framework of MKL for cross-domain feature learning and propose a novel Transfer Feature Representation (TFR) algorithm. TFR learns a convex combination of multiple kernels and a linear transformation in a single optimization which integrates the minimization of distribution difference with the preservation of discriminating power across domains. As a result, standard machine learning models trained in the source domain can be reused for the target domain data. After rewritten into a differentiable formulation, TFR can be optimized by a reduced gradient method and reaches the convergence. Experiments in two real-world applications verify the effectiveness of our proposed method.
DynaDiffuse: A Dynamic Diffusion Model for Continuous Time Constrained Influence Maximization
Xie, Miao (University of Chinese Academy of Sciences) | Yang, Qiusong (Institute of Software, Chinese Academy of Sciences) | Wang, Qing (Institute of Software, Chinese Academy of Sciences) | Cong, Gao (Nanyang Technological University) | Melo, Gerard de (Tsinghua University/Microsoft Research Asia)
Studying the spread of phenomena in social networks is critical but still not fully solved. Existing influence maximization models assume a static network, disregarding its evolution over time. We introduce the continuous time constrained influence maximization problem for dynamic diffusion networks, based on a novel diffusion model called DynaDiffuse. Although the problem is NP-hard, the influence spread functions are monotonic and submodular, enabling fast approximations on top of an innovative stochastic model checking approach. Experiments on real social network data show that our model finds higher quality solutions and our algorithm outperforms state-of-art alternatives.