Goto

Collaborating Authors

 Search


AMER: Automatic Behavior Modeling and Interaction Exploration in Recommender System

arXiv.org Machine Learning

User behavior and feature interactions are crucial in deep learning-based recommender systems. There has been a diverse set of behavior modeling and interaction exploration methods in the literature. Nevertheless, the design of task-aware recommender systems still requires feature engineering and architecture engineering from domain experts. In this work, we introduce AMER, namely Automatic behavior Modeling and interaction Exploration in Recommender systems with Neural Architecture Search (NAS). The core contributions of AMER include the three-stage search space and the tailored three-step searching pipeline. In the first step, AMER searches for residual blocks that incorporate commonly used operations in the block-wise search space of stage 1 to model sequential patterns in user behavior. In the second step, it progressively investigates useful low-order and high-order feature interactions in the non-sequential interaction space of stage 2. Finally, an aggregation multi-layer perceptron (MLP) with shortcut connection is selected from flexible dimension settings of stage~3 to combine features extracted from the previous steps. For efficient and effective NAS, AMER employs the one-shot random search in all three steps. Further analysis reveals that AMER's search space could cover most of the representative behavior extraction and interaction investigation methods, which demonstrates the universality of our design. The extensive experimental results over various scenarios reveal that AMER could outperform competitive baselines with elaborate feature engineering and architecture engineering, indicating both effectiveness and robustness of the proposed method.


Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

arXiv.org Machine Learning

We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support. We prove an upper bound on the number of calls to the generative models needed for MDP-GapE to identify a near-optimal action with high probability. This problem-dependent sample complexity result is expressed in terms of the sub-optimality gaps of the state-action pairs that are visited during exploration. Our experiments reveal that MDP-GapE is also effective in practice, in contrast with other algorithms with sample complexity guarantees in the fixed-confidence setting, that are mostly theoretical.


OpEvo: An Evolutionary Method for Tensor Operator Optimization

arXiv.org Machine Learning

Training and inference efficiency of deep neural networks highly rely on the performance of tensor operators on hardware platforms. Manually optimized tensor operators have limitations in terms of supporting new operators or supporting new hardware platforms. Therefore, automatically optimizing device code configurations of tensor operators is getting increasingly attractive. However, current methods for tensor operator optimization usually suffer from poor sample-efficiency due to the combinatorial search space. In this work, we propose a novel evolutionary method, OpEvo, which efficiently explores the search spaces of tensor operators by introducing a topology-aware mutation operation based on q-random walk distribution to leverage the topological structures over the search spaces. Our comprehensive experiment results show that OpEvo can find the best configuration with the least number of trials and the lowest variance compared with state-of-the-art methods. All code of this work is available online.


Adaptation Strategies for Automated Machine Learning on Evolving Data

arXiv.org Machine Learning

Abstract--Automated Machine Learning (AutoML) systems have been shown to efficiently build good models for new datasets. However, it is often not clear how well they can adapt when the data evolves over time. The main goal of this study is to understand the effect of data stream challenges such as concept drift on the performance of AutoML methods, and which adaptation strategies can be employed to make them more robust. To that end, we propose 6 concept drift adaptation strategies and evaluate their effectiveness on different AutoML approaches. We do this for a variety of AutoML approaches for building machine learning pipelines, including those that leverage Bayesian optimization, genetic programming, and random search with automated stacking. These are evaluated empirically on real-world and synthetic data streams with different types of concept drift. Based on this analysis, we propose ways to develop more sophisticated and robust AutoML techniques. We propose six different adaptation strategies data-driven decision making [42].


Exact and heuristic methods for the discrete parallel machine scheduling location problem

arXiv.org Artificial Intelligence

Scheduling and facility location represent two classes of well-studied combinatorial optimization problems. The main motivation for studying them relies on the broad range of applications (e.g., in public services, industry, logistics, project management, production planning, data processing, etc.), as well as on the challenge in providing efficient solutions, since many of these problems are classified as NPhard (see, e.g., Pinedo 2009, Pinedo 2016, Drezner and Hamacher 2002, and Laporte et al. 2015). Since the 1960s, many works on these topics have been published, but only a few of them has focused on studying these problems in an integrated fashion. Due to the limited capacity of the computers of two decades ago, it was usual to solve integrated combinatorial optimization problems using sequential approaches, i.e., solving each problem separately in such a way that the solution of one represents an input to the other. However, this strategy does not guarantee the optimality of the overall solution and, in addition, the input solutions may not be feasible for the successor problems. With the recent advances in technology, especially in the computational field, solving integrated combinatorial optimization problems using integrated approaches is becoming more accessible. In this context, the ScheLoc problem combines the job scheduling and facility location in a single and integrated problem.


Copy that! Editing Sequences by Copying Spans

arXiv.org Machine Learning

Neural sequence-to-sequence models are finding increasing use in editing of documents, for example in correcting a text document or repairing source code. In this paper, we argue that common seq2seq models (with a facility to copy single tokens) are not a natural fit for such tasks, as they have to explicitly copy each unchanged token. We present an extension of seq2seq models capable of copying entire spans of the input to the output in one step, greatly reducing the number of decisions required during inference. This extension means that there are now many ways of generating the same output, which we handle by deriving a new objective for training and a variation of beam search for inference that explicitly handle this problem. In our experiments on a range of editing tasks of natural language and source code, we show that our new model consistently outperforms simpler baselines.


Propositionalization and Embeddings: Two Sides of the Same Coin

arXiv.org Machine Learning

Data preprocessing is an important component of machine learning pipelines, which requires ample time and resources. An integral part of preprocessing is data transformation into the format required by a given learning algorithm. This paper outlines some of the modern data processing techniques used in relational learning that enable data fusion from different input data types and formats into a single table data representation, focusing on the propositionalization and embedding data transformation approaches. While both approaches aim at transforming data into tabular data format, they use different terminology and task definitions, are perceived to address different goals, and are used in different contexts. This paper contributes a unifying framework that allows for improved understanding of these two data transformation techniques by presenting their unified definitions, and by explaining the similarities and differences between the two approaches as variants of a unified complex data transformation task. In addition to the unifying framework, the novelty of this paper is a unifying methodology combining propositionalization and embeddings, which benefits from the advantages of both in solving complex data transformation and learning tasks. We present two efficient implementations of the unifying methodology: an instance-based PropDRM approach, and a feature-based PropStar approach to data transformation and learning, together with their empirical evaluation on several relational problems. The results show that the new algorithms can outperform existing relational learners and can solve much larger problems.


Neural Architecture Search with Reinforce and Masked Attention Autoregressive Density Estimators

#artificialintelligence

Neural Architecture Search has become a focus of the Machine Learning community. Techniques span Bayesian optimization with Gaussian priors, evolutionary learning, reinforcement learning based on policy gradient, Q-learning, and Monte-Carlo tree search. In this paper, we present a reinforcement learning algorithm based on policy gradient that uses an attention-based autoregressive model to design the policy network. We demonstrate how performance can be further improved by training an ensemble of policy networks with shared parameters, each network conditioned on a different autoregressive factorization order. On the NASBench-101 search space, it outperforms most algorithms in the literature, including random search.


Think out of the package: Recommending package types for e-commerce shipments

arXiv.org Machine Learning

Multiple product attributes like dimensions, weight, fragility, liquid content etc. determine the package type used by e-commerce companies to ship products. Sub-optimal package types lead to damaged shipments, incurring huge damage related costs and adversely impacting the company's reputation for safe delivery. Items can be shipped in more protective packages to reduce damage costs, however this increases the shipment costs due to expensive packaging and higher transportation costs. In this work, we propose a multi-stage approach that trades-off between shipment and damage costs for each product, and accurately assigns the optimal package type using a scalable, computationally efficient linear time algorithm. A simple binary search algorithm is presented to find the hyper-parameter that balances between the shipment and damage costs. Our approach when applied to choosing package type for Amazon shipments, leads to significant cost savings of tens of millions of dollars in emerging marketplaces, by decreasing both the overall shipment cost and the number of in-transit damages. Our algorithm is live and deployed in the production system where, package types for more than 130,000 products have been modified based on the model's recommendation, realizing a reduction in damage rate of 24%.


Learning Architectures from an Extended Search Space for Language Modeling

arXiv.org Machine Learning

Neural architecture search (NAS) has advanced significantly in recent years but most NAS systems restrict search to learning architectures of a recurrent or convolutional cell. In this paper, we extend the search space of NAS. In particular, we present a general approach to learn both intra-cell and inter-cell architectures (call it ESS). For a better search result, we design a joint learning method to perform intra-cell and inter-cell NAS simultaneously. We implement our model in a differentiable architecture search system. For recurrent neural language modeling, it outperforms a strong baseline significantly on the PTB and WikiText data, with a new state-of-the-art on PTB. Moreover, the learned architectures show good transferability to other systems. E.g., they improve state-of-the-art systems on the CoNLL and WNUT named entity recognition (NER) tasks and CoNLL chunking task, indicating a promising line of research on large-scale pre-learned architectures.