Federated Learning for Ranking Browser History Suggestions
Hartmann, Florian, Suh, Sunah, Komarzewski, Arkadiusz, Smith, Tim D., Segall, Ilana
Federated Learning is a new subfield of machine learning that allows fitting models without collecting the training data itself. Instead of sharing data, users collaboratively train a model by only sending weight updates to a server. To improve the ranking of suggestions in the Firefox URL bar, we make use of Federated Learning to train a model on user interactions in a privacy-preserving way. This trained model replaces a handcrafted heuristic, and our results show that users now type over half a character less to find what they are looking for. To be able to deploy our system to real users without degrading their experience during training, we design the optimization process to be robust. To this end, we use a variant of Rprop for optimization, and implement additional safeguards. By using a numerical gradient approximation technique, our system is able to optimize anything in Firefox that is currently based on handcrafted heuristics. Our paper shows that Federated Learning can be used successfully to train models in privacy-respecting ways.
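The combination of a numerical gradient approximation with a sign-based Rprop update can be sketched as follows. This is a minimal illustration, not the system described above: the abstract does not say which Rprop variant or which approximation scheme is used, so the sketch assumes central finite differences and the iRprop- rule. All names and hyperparameters (`numerical_gradient`, `rprop_minimize`, `eta_plus`, `eta_minus`, the step bounds) are hypothetical.

```python
def numerical_gradient(f, w, eps=1e-4):
    """Central finite-difference approximation of the gradient of f at w."""
    grad = []
    for i in range(len(w)):
        up = w[:i] + [w[i] + eps] + w[i + 1:]
        down = w[:i] + [w[i] - eps] + w[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def rprop_minimize(f, w, steps=100, step_init=0.1,
                   eta_plus=1.2, eta_minus=0.5,
                   step_min=1e-6, step_max=1.0):
    """Minimise f with iRprop- (an assumed variant), using only gradient signs."""
    w = list(w)
    step = [step_init] * len(w)
    prev_grad = [0.0] * len(w)
    for _ in range(steps):
        grad = numerical_gradient(f, w)
        for i in range(len(w)):
            agreement = grad[i] * prev_grad[i]
            if agreement > 0:
                # Same direction as last time: grow the per-weight step size.
                step[i] = min(step[i] * eta_plus, step_max)
            elif agreement < 0:
                # Sign flipped, so we overshot: shrink the step and skip this update.
                step[i] = max(step[i] * eta_minus, step_min)
                grad[i] = 0.0
            # Only the sign of the gradient is used, never its magnitude.
            w[i] -= sign(grad[i]) * step[i]
            prev_grad[i] = grad[i]
    return w
```

Because each update depends only on the sign of the gradient and on a bounded per-weight step size, a single noisy or extreme gradient estimate cannot move a weight arbitrarily far, which is one plausible reason such a rule suits training on live users.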
Comprehensive decision-strategy space exploration for efficient territorial planning strategies
Billaud, Olivier, Soubeyrand, Maxence, Luque, Sandra, Lenormand, Maxime
(Author affiliation: TETIS, Univ Montpellier, AgroParisTech, Cirad, CNRS, Irstea, Montpellier, France.) Multi-Criteria Decision Analysis (MCDA) is a well-known decision support tool that can be used in a wide variety of contexts. It is particularly useful for territorial planning in situations where several actors with different, and sometimes contradictory, points of view have to make a decision regarding land use development. While the impact of the weights used to represent the relative importance of criteria has been widely studied in the recent literature, the impact of order weights determination has rarely been investigated. This paper presents a spatial sensitivity analysis to assess the impact of order weights determination in Multi-Criteria Analysis by Ordered Weighted Averaging. We propose a methodology based on an efficient exploration of the decision-strategy space defined by the level of risk and tradeoff in the decision process. We illustrate our approach with a land use planning process in the South of France. The objective is to find suitable areas for urban development while preserving green areas and their associated ecosystem services. The ecosystem service approach has the potential to widen the scope of traditional landscape-ecological planning by including ecosystem-based benefits, including social and economic benefits, green infrastructures and biophysical parameters, in urban and territorial planning. We show that in this particular case the decision-strategy space can be divided into four clusters. Each of them is associated with a map summarizing the average spatial suitability distribution used to identify potential areas for urban development.
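To make the roles of order weights, risk and tradeoff concrete, the following sketch computes an Ordered Weighted Average for a single location together with two summary measures of the order weights. It assumes Yager's convention (criterion values ranked from highest to lowest, risk expressed as "orness") and the commonly used dispersion-based tradeoff measure; the paper may use a different but equivalent convention. The function names are hypothetical, and the criterion importance weights mentioned above are left out for brevity.

```python
def owa(values, order_weights):
    """Ordered Weighted Average: weights apply to ranks, not to specific criteria."""
    ranked = sorted(values, reverse=True)  # highest criterion score first
    return sum(w * v for w, v in zip(order_weights, ranked))


def orness(order_weights):
    """Risk level: 1 behaves like max (optimistic), 0 like min (pessimistic)."""
    n = len(order_weights)
    total = sum((n - i) * w for i, w in enumerate(order_weights, start=1))
    return total / (n - 1)


def tradeoff(order_weights):
    """Tradeoff level: 1 for equal weights, 0 when one rank takes all the weight."""
    n = len(order_weights)
    spread = sum((w - 1.0 / n) ** 2 for w in order_weights)
    return 1.0 - (n * spread / (n - 1)) ** 0.5


# Suitability scores of one location on three criteria (illustrative values).
scores = [0.2, 0.9, 0.5]
for weights in ([1.0, 0.0, 0.0], [1 / 3, 1 / 3, 1 / 3], [0.0, 0.0, 1.0]):
    print(weights, owa(scores, weights), orness(weights), tradeoff(weights))
```

Each choice of order weights corresponds to one point in the decision-strategy space: the three weight vectors in the usage loop behave like the maximum, the plain average and the minimum of the criteria respectively.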
Autoencoding undirected molecular graphs with neural networks
Olsen, Jeppe Johan Waarkjær, Christensen, Peter Ebert, Hansen, Martin Hangaard, Johansen, Alexander Rosenberg
We propose a machine learning model, inspired by language modeling from natural language processing, which can automatically correct molecules in discrete representations using a structure rule learned from a collection of undirected molecular graphs. Using discrete representations of molecules allows cheap, fast, and coarse-grained insights. We introduce an adaptation of a modern neural network architecture, the Transformer, which can learn relationships between atoms and bonds. The algorithm thereby solves the unsupervised task of recovering partially observed molecules represented as undirected graphs. This is, to our knowledge, the first work that can automatically learn any discrete molecular structure rule with input exclusively consisting of a training set of molecules. In this work, the neural network successfully approximates the octet rule and relations in hypervalent molecules and ions when trained on the ZINC and QM9 datasets. These results provide encouraging evidence that neural networks can learn advanced molecular structure rules and dataset-specific properties, as the Transformer surpasses a strong octet-rule baseline.
Study of Distributed Robust Beamforming with Low-Rank and Cross-Correlation Techniques
In this work, we present a novel robust distributed beamforming (RDB) approach based on low-rank and cross-correlation techniques. The proposed RDB approach mitigates the effects of channel errors in wireless networks equipped with relays based on the exploitation of the cross-correlation between the received data from the relays at the destination and the system output and low-rank techniques. The relay nodes are equipped with an amplify-and-forward (AF) protocol and the channel errors are modeled using an additive matrix perturbation, which results in degradation of the system performance. The proposed method, denoted low-rank and cross-correlation RDB (LRCC-RDB), considers a total relay transmit power constraint in the system and the goal of maximizing the output signal-to-interference-plus-noise ratio (SINR). We carry out a performance analysis of the proposed LRCC-RDB technique along with a computational complexity study. The proposed LRCC-RDB does not require any costly online optimization procedure and simulations show an excellent performance as compared to previously reported algorithms.
Consider ethical and social challenges in smart grid research
Robu, Valentin, Flynn, David, Andoni, Merlinda, Mokhtar, Maizura
Artificial Intelligence and Machine Learning are increasingly seen as key technologies for building more decentralised and resilient energy grids, but researchers must consider the ethical and social implications of their use. Energy grids are undergoing rapid changes, requiring new ways both to process the large amounts of data generated from the power system and, increasingly, to take smart operational decisions [1]. On the data side, the UK and most EU countries have committed to a target of offering a smart meter to every home by 2020 [2], with similar monitoring being installed in other parts of the energy network. This has led some to refer to a "data tsunami", requiring the development of new machine learning techniques to deal with the ensuing challenge of extracting useful information from this data, often in real time. Another trend is the use of AI techniques (such as those from multi-agent systems, computational game theory and decision making under uncertainty) to take autonomous allocation and control decisions. This is driven increasingly by the move towards more decentralised energy systems, where prosumers (consumers with their own micro-generation and storage) can generate and source their own electricity through peer-to-peer (P2P) trading in local energy markets and community energy schemes.
An Optimized and Energy-Efficient Parallel Implementation of Non-Iteratively Trained Recurrent Neural Networks
Zini, Julia El, Rizk, Yara, Awad, Mariette
Recurrent neural networks (RNNs) have been successfully applied to various sequential decision-making tasks, natural language processing applications, and time-series predictions. Such networks are usually trained through back-propagation through time (BPTT), which is prohibitively expensive, especially when the length of the time dependencies and the number of hidden neurons increase. To reduce the training time, extreme learning machines (ELMs) have recently been applied to RNN training, reaching a 99% speedup on some applications. Due to its non-iterative nature, ELM training, when parallelized, has the potential to reach higher speedups than BPTT. In this work, we present \opt, an optimized parallel RNN training algorithm based on ELM that takes advantage of the GPU shared memory and of parallel QR factorization algorithms to efficiently reach optimal solutions. The theoretical analysis of the proposed algorithm is presented on six RNN architectures, including LSTM and GRU, and its performance is empirically tested on ten time-series prediction applications. \opt is shown to reach a speedup of up to 845 times over its sequential counterpart and to require up to 20 times less time to train than parallel BPTT.
Domain-Aware Dynamic Networks
Zhang, Tianyuan, Wu, Bichen, Wang, Xin, Gonzalez, Joseph, Keutzer, Kurt
Deep neural networks with more parameters and FLOPs have higher capacity and generalize better to diverse domains. However, to be deployed on edge devices, the model's complexity has to be constrained due to limited compute resources. In this work, we propose a method to improve the model capacity without increasing inference-time complexity. Our method is based on an assumption of data locality: for an edge device, within a short period of time, the input data to the device are sampled from a single domain with relatively low diversity. Therefore, it is possible to utilize a specialized, low-complexity model to achieve good performance in that input domain. To leverage this, we propose the Domain-aware Dynamic Network (DDN), a high-capacity dynamic network in which each layer contains multiple weights. During inference, based on the input domain, DDN dynamically combines those weights into one single weight that specializes in the given domain. This way, DDN can keep the inference-time complexity low but still maintain a high capacity. Experiments show that without increasing the parameters, FLOPs, and actual latency, DDN achieves up to 2.6% higher AP50 than a static network on the BDD100K object-detection benchmark.
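The idea of combining several weights into a single domain-specific weight can be illustrated for one linear layer. This is a sketch under stated assumptions, not the DDN architecture itself: the abstract does not say how the combination coefficients are derived from the input domain, so they are passed in as given values, and the helper names `matvec` and `combine_weights` are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def combine_weights(weights, coefficients):
    """Collapse several same-shaped weight matrices into one weighted sum."""
    rows, cols = len(weights[0]), len(weights[0][0])
    return [
        [sum(c * w[r][k] for c, w in zip(coefficients, weights)) for k in range(cols)]
        for r in range(rows)
    ]


# Two candidate weights for one layer, and assumed per-domain coefficients.
layer_weights = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
domain_coefficients = [0.25, 0.75]  # assumed to come from some domain estimator

# Combine once per domain, then reuse the single matrix for every input.
specialized = combine_weights(layer_weights, domain_coefficients)
x = [1.0, 2.0]
print(matvec(specialized, x))
```

For a linear layer, applying the combined matrix gives the same result as mixing the outputs of every candidate matrix, but it costs one matrix-vector product per input instead of one per candidate. That is the sense in which extra capacity need not add inference-time cost once the weights for a domain have been combined.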
City2City: Translating Place Representations across Cities
Yabe, Takahiro, Tsubouchi, Kota, Shimizu, Toru, Sekimoto, Yoshihide, Ukkusuri, Satish V.
Large mobility datasets collected from various sources have allowed us to observe, analyze, predict and solve a wide range of important urban challenges. In particular, studies have generated place representations (or embeddings) from mobility patterns in a similar manner to word embeddings to better understand the functionality of different places within a city. However, studies have been limited to generating such representations for individual cities and have lacked an inter-city perspective, which has made it difficult to transfer the insights gained from the place representations across different cities. In this study, we attempt to bridge this research gap by treating cities and languages analogously. We apply methods developed for unsupervised machine language translation tasks to translate place representations across different cities. Real-world mobility data collected from mobile phone users in two cities in Japan are used to test our place representation translation methods. Translated place representations are validated using land-use data, and results show that our methods were able to accurately translate place representations from one city to another.
Label Dependent Deep Variational Paraphrase Generation
Shakeri, Siamak, Sethy, Abhinav
Generating paraphrases that are lexically similar but semantically different is a challenging task. Paraphrases of this form can be used to augment data sets for various NLP tasks such as machine reading comprehension and question answering with nontrivial negative examples. In this article, we propose a deep variational model to generate paraphrases conditioned on a label that specifies whether the paraphrases are semantically related or not. We also present new training recipes and KL regularization techniques that improve the performance of variational paraphrasing models. Our proposed model demonstrates promising results in enhancing the generative power of the model by employing label-dependent generation on paraphrasing datasets.
Trading Convergence Rate with Computational Budget in High Dimensional Bayesian Optimization
Tran-The, Hung, Gupta, Sunil, Rana, Santu, Venkatesh, Svetha
Scaling Bayesian optimisation (BO) to high-dimensional search spaces is an active and open research problem, particularly when no assumptions are made on function structure. The main reason is that at each iteration, BO requires finding the global maximum of an acquisition function, which is itself a non-convex optimisation problem in the original search space. With growing dimensions, the computational budget for this maximisation gets increasingly short, leading to an inaccurate solution of the maximisation. This inaccuracy adversely affects both the convergence and the efficiency of BO. We propose a novel approach where the acquisition function only requires maximisation on a discrete set of low-dimensional subspaces embedded in the original high-dimensional search space. Our method is free of any low-dimensional structure assumption on the function, unlike many recent high-dimensional BO methods. Optimising the acquisition function in low-dimensional subspaces allows our method to obtain accurate solutions within a limited computational budget. We show that in spite of this convenience, our algorithm remains convergent. In particular, the cumulative regret of our algorithm only grows sub-linearly with the number of iterations. More importantly, as evident from our regret bounds, our algorithm provides a way to trade the convergence rate with the number of subspaces used in the optimisation. Finally, when the number of subspaces is "sufficiently large", our algorithm's cumulative regret is at most $\mathcal{O}^{*}(\sqrt{T\gamma_T})$ as opposed to $\mathcal{O}^{*}(\sqrt{DT\gamma_T})$ for the GP-UCB of Srinivas et al. (2012), removing a crucial factor of $\sqrt{D}$, where $D$ is the dimensionality of the input space.