Adaptive Stopping Rule for Kernel-based Gradient Descent Algorithms

arXiv.org Machine Learning

In this paper, we propose an adaptive stopping rule for kernel-based gradient descent (KGD) algorithms. We introduce the empirical effective dimension to quantify the increments of iterations in KGD and derive an implementable early stopping strategy. We analyze the performance of the adaptive stopping rule in the framework of learning theory. Using the recently developed integral operator approach, we rigorously prove the optimality of the adaptive stopping rule by establishing optimal learning rates for KGD equipped with this rule. Furthermore, a sharp bound on the number of iterations in KGD equipped with the proposed early stopping rule is also given to demonstrate its computational advantage.

Introduction

In financial studies, clinical medicine, gene analysis and engineering applications, data of input-output pairs are collected to pursue the relation between input and output. Kernel methods [14], which map input points from the input space to some feature space and then synthesize the estimator in the feature space, have been widely used for this purpose. The research was supported by the National Natural Science Foundation of China [Grant Nos. To overcome their computational and storage bottlenecks, numerous techniques (e.g., distributed learning [41, 22], localized learning [27] and random sketching [17, 39]) have been further developed to equip kernel methods in the recent era of big data. KGD, as a popular realization of kernel methods, succeeds in avoiding the saturation of KRR in theory [16], reducing the computational burden of KPCA in computation [13], and scaling better than KPLS and KCG [23]. Therefore, it has been widely used in regression [40], classification [37] and minimum error entropy principle analysis [21].
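To make the two ingredients named in the abstract concrete, the following is a minimal sketch of plain kernel gradient descent for least squares together with one common definition of the empirical effective dimension, the trace of (K/n)(K/n + λI)^{-1}. The abstract does not state the paper's actual stopping rule, so none is implemented here. The Gaussian kernel, the synthetic data, the step size, and the heuristic pairing of iteration t with λ = 1/(step · t) are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(X, Z, width=1.0):
    # Pairwise squared distances, then the Gaussian (RBF) kernel (an assumed choice).
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * width**2))

def empirical_effective_dimension(K, lam):
    # Trace of (K/n)(K/n + lam*I)^{-1}, computed from the eigenvalues of K/n.
    n = K.shape[0]
    eig = np.clip(np.linalg.eigvalsh(K / n), 0.0, None)
    return float(np.sum(eig / (eig + lam)))

def kgd(K, y, step, n_iter):
    # Gradient descent on the empirical least-squares risk in the RKHS:
    # alpha_{t+1} = alpha_t - (step / n) * (K @ alpha_t - y),
    # where the estimator is f_t(x) = sum_i alpha_t[i] * k(x_i, x).
    n = K.shape[0]
    alpha = np.zeros(n)
    path = []
    for _ in range(n_iter):
        alpha = alpha - (step / n) * (K @ alpha - y)
        path.append(alpha.copy())
    return path

# Synthetic one-dimensional regression data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(60)

K = gaussian_kernel(X, X, width=0.3)
path = kgd(K, y, step=1.0, n_iter=200)

# Training residual along the iteration path.
residuals = [float(np.linalg.norm(K @ a - y)) for a in path]

# Heuristic assumption: iteration t acts roughly like regularisation lam = 1 / (step * t).
eff_dims = [empirical_effective_dimension(K, 1.0 / (1.0 * t)) for t in (1, 10, 100)]
```

In this sketch the training residual can only shrink as iterations proceed, while the effective dimension at the matched regularisation level grows, which is the tension an early stopping rule has to balance.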


D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis for Multiple High-dimensional Datasets

arXiv.org Machine Learning

Such studies include The Cancer Genome Atlas (TCGA; Hoadley et al., 2018) with multi-platform genomic data for tumor samples, and Human Connectome Project (HCP; Van Essen et al., 2013) with multi-modal brain images of healthy adults, among many others (Crawford et al., 2016; Jensen et al., 2017). The use of multiple data types can enhance our understanding of the etiology of many complex diseases, such as cancers (Ciriello et al., 2015; Campbell et al., 2018) and neurodegenerative diseases (Weiner et al., 2013; Saeed et al., 2017). Researchers have hence become highly interested in studying the shared information and individual features across multi-type datasets by separating their common and distinctive variation structures (van der Kloet et al., 2016; Smilde et al., 2017; Li et al., 2018). Let $Y_k \in \mathbb{R}^{p_k \times n}$ be the $k$-th row-mean centered dataset obtained on a common set of $n$ objects for $k = 1, \dots, K$, where $p_k$ is the number of variables for the $k$-th dataset. One popular approach for disentangling their common and distinctive variation structures is to decompose each data matrix into

$$Y_k = X_k + E_k = C_k + D_k + E_k \quad \text{for } k = 1, \dots, K, \tag{1}$$

where $\{X_k\}_{k=1}^{K}$ are low-rank signal matrices with $\{E_k\}_{k=1}^{K}$ being additive noise matrices, $\{C_k\}_{k=1}^{K}$ are low-rank common-variation matrices that represent the signal data coming from the common mechanism shared across all datasets, and $\{D_k\}_{k=1}^{K}$ are low-rank distinctive-variation matrices, each from the distinctive mechanism of a single dataset that is not shared by all.


Streaming automatic speech recognition with the transformer model

arXiv.org Machine Learning

Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art results in end-to-end automatic speech recognition (ASR). Recently, the transformer architecture, which uses self-attention to model temporal context information, has been shown to achieve significantly lower word error rates (WERs) compared to recurrent neural network (RNN) based system architectures. Despite its success, its practical usage is limited to offline ASR tasks, since encoder-decoder architectures typically require an entire speech utterance as input. In this work, we propose a transformer-based end-to-end ASR system for streaming ASR, where an output must be generated shortly after each spoken word. To achieve this, we apply time-restricted self-attention for the encoder and triggered attention for the encoder-decoder attention mechanism. Our proposed streaming transformer architecture achieves 2.7% and 7.0% WER for the "clean" and "other" test data of LibriSpeech, which to our knowledge is the best published streaming end-to-end ASR result for this task.
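The idea of time-restricted self-attention can be illustrated with a small masking sketch: each frame attends only to a bounded window of past and future frames, so a single layer never needs the whole utterance. This is not the paper's implementation. The function names, the window sizes, and the omission of learned query, key and value projections are simplifying assumptions, and triggered attention is not shown.

```python
import numpy as np

def time_restricted_mask(n_frames, left, right):
    # mask[t, s] is True when query frame t may attend to key frame s,
    # i.e. when s lies within `left` frames of the past and `right` frames of the future.
    idx = np.arange(n_frames)
    offset = idx[None, :] - idx[:, None]  # key index minus query index
    return (offset >= -left) & (offset <= right)

def masked_self_attention(X, mask):
    # Scaled dot-product self-attention with the inputs used directly as
    # queries, keys and values (no learned projections, for brevity).
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)       # forbid attention outside the window
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability before exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X, weights

# Toy input: 6 frames with 4 features each (hypothetical values).
mask = time_restricted_mask(6, left=2, right=1)
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
out, weights = masked_self_attention(X, mask)
```

With `right=1`, the output at frame t depends on frames up to t + 1 only, which bounds the latency of one layer. When several such layers are stacked, the look-ahead of the individual layers adds up.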


MushroomRL: Simplifying Reinforcement Learning Research

arXiv.org Machine Learning

MushroomRL is an open-source Python library developed to simplify the process of implementing and running Reinforcement Learning (RL) experiments. Compared to other available libraries, MushroomRL has been created with the purpose of providing a comprehensive and flexible framework to minimize the effort in implementing and testing novel RL methodologies. Indeed, the architecture of MushroomRL is built in such a way that every component of an RL problem is already provided, and most of the time users can focus solely on the implementation of their own algorithms and experiments. The result is a library from which RL researchers can significantly benefit in the critical phase of the empirical analysis of their work. MushroomRL stable code, tutorials and documentation can be found at https://github.com/MushroomRL/mushroom-rl.


Non-Parametric Learning of Lifted Restricted Boltzmann Machines

arXiv.org Artificial Intelligence

We consider the problem of discriminatively learning restricted Boltzmann machines in the presence of relational data. Unlike previous approaches that employ a rule learner (for structure learning) and a weight learner (for parameter learning) sequentially, we develop a gradient-boosted approach that performs both simultaneously. Our approach learns a set of weak relational regression trees, whose paths from root to leaf are conjunctive clauses and represent the structure, and whose leaf values represent the parameters. When the learned relational regression trees are transformed into a lifted RBM, its hidden nodes are precisely the conjunctive clauses derived from the relational regression trees. This leads to a more interpretable and explainable model. Our empirical evaluations clearly demonstrate this aspect, while displaying no loss in effectiveness of the learned models.


Knowledge Graphs for Innovation Ecosystems

arXiv.org Artificial Intelligence

Innovation ecosystems can be naturally described as a collection of networked entities, such as experts, institutions, projects, technologies and products. Representing these entities and their relations in a machine-readable form is not entirely attainable, due to the existence of abstract concepts such as knowledge and due to the confidential, non-public nature of this information, but even a partial depiction is of strong interest. Representing innovation ecosystems as knowledge graphs would enable the generation of reports with new insights and the execution of advanced data analysis tasks. An ontology to capture the essential entities and relations is presented, as well as a description of data sources that can be used to populate innovation knowledge graphs. Finally, the application case of the Universidad Politecnica de Madrid is presented, as well as an insight into future applications.


Algorithms for Optimizing Fleet Staging of Air Ambulances

arXiv.org Artificial Intelligence

In a disaster situation, air ambulance rapid response will often be the determining factor in patient survival. Obstacles intensify this circumstance, with geographical remoteness and limitations in vehicle placement making it an arduous task. Considering these elements, the arrangement of responders is a critical decision of the utmost importance. Utilizing real mission data, this research structured an optimal coverage problem with integer linear programming. For accurate comparison, the Gurobi optimizer was programmed with the developed model and timed for performance. A solution implementing base ranking followed by both local and Tabu search-based algorithms was created. The local search algorithm proved insufficient for maximizing coverage, while the Tabu search achieved near-optimal results. In the latter case, the total vehicle travel distance was minimized and the runtime was significantly shorter than that of Gurobi. Furthermore, variations utilizing parallel CUDA processing further decreased the algorithmic runtime. These proved superior as the number of test missions increased, while also maintaining the same minimized distance.
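The pipeline described above (rank bases, then improve the selection with Tabu search) can be sketched on a toy coverage problem. This is a minimal illustration under stated assumptions, not the authors' algorithm: the boolean matrix `covers`, the swap neighbourhood, the tabu tenure, and the aspiration rule are hypothetical choices, and travel distance, real mission data and CUDA parallelism are left out.

```python
import numpy as np
from collections import deque

def coverage(covers, chosen):
    # Number of missions reachable from at least one chosen base.
    return int(covers[sorted(chosen)].any(axis=0).sum())

def tabu_search(covers, k, n_iter=50, tenure=3):
    n_bases = covers.shape[0]
    # Base ranking: start from the k bases that individually cover the most missions.
    ranked = np.argsort(-covers.sum(axis=1), kind="stable")
    current = set(int(b) for b in ranked[:k])
    best, best_val = set(current), coverage(covers, current)
    tabu = deque(maxlen=tenure)  # bases recently swapped out may not re-enter for a while
    for _ in range(n_iter):
        move, move_val = None, -1
        for out_b in current:
            for in_b in range(n_bases):
                if in_b in current:
                    continue
                cand = (current - {out_b}) | {in_b}
                val = coverage(covers, cand)
                # Aspiration: a tabu base may re-enter only if it beats the best so far.
                if in_b in tabu and val <= best_val:
                    continue
                if val > move_val:
                    move, move_val = (out_b, in_b), val
        if move is None:
            break
        # Take the best admissible swap even if it is worse than the current
        # selection; this is what lets the search leave local optima.
        out_b, in_b = move
        current = (current - {out_b}) | {in_b}
        tabu.append(out_b)
        if move_val > best_val:
            best, best_val = set(current), move_val
    return best, best_val

# Toy instance: rows are candidate bases, columns are missions (hypothetical data).
covers = np.array([
    [1, 1, 1, 1, 0, 0],  # base 0 reaches missions 0-3
    [1, 1, 1, 0, 0, 0],  # base 1 reaches missions 0-2
    [0, 0, 0, 0, 1, 1],  # base 2 reaches missions 4-5
    [0, 0, 0, 1, 0, 0],  # base 3 reaches mission 3
], dtype=bool)
best, best_val = tabu_search(covers, k=2)
```

On this instance the ranking alone picks bases 0 and 1, which overlap heavily, and the search replaces base 1 with base 2 to reach every mission.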


Debate Dynamics for Human-comprehensible Fact-checking on Knowledge Graphs

arXiv.org Artificial Intelligence

We propose a novel method for fact-checking on knowledge graphs based on debate dynamics. The underlying idea is to frame the task of triple classification as a debate game between two reinforcement learning agents which extract arguments -- paths in the knowledge graph -- with the goal of justifying the fact as being true (thesis) or false (antithesis), respectively. Based on these arguments, a binary classifier, referred to as the judge, decides whether the fact is true or false. The two agents can be considered as sparse feature extractors that present interpretable evidence for either the thesis or the antithesis. In contrast to black-box methods, the arguments enable the user to gain an understanding of the decision of the judge. Moreover, our method allows for interactive reasoning on knowledge graphs where the users can raise additional arguments or evaluate the debate taking common sense reasoning and external information into account. Such interactive systems can increase the acceptance of various AI applications based on knowledge graphs and can further lead to higher efficiency, robustness, and fairness.


A Probabilistic Simulator of Spatial Demand for Product Allocation

arXiv.org Artificial Intelligence

Connecting consumers with relevant products is a very important problem in both online and offline commerce. In physical retail, product placement is an effective way to connect consumers with products. However, selecting product locations within a store can be a tedious process. Moreover, learning important spatial patterns in offline retail is challenging due to the scarcity of data and the high cost of exploration and experimentation in the physical world. To address these challenges, we propose a stochastic model of spatial demand in physical retail. We show that the proposed model is more predictive of demand than existing baselines. We also perform a preliminary study into different automation techniques and show that an optimal product allocation policy can be learned through Deep Q-Learning.


GRIDS: Interactive Layout Design with Integer Programming

arXiv.org Artificial Intelligence

Grid layouts are used by designers to spatially organise user interfaces when sketching and wireframing. However, their design is largely time-consuming manual work. This is challenging due to combinatorial explosion and complex objectives, such as alignment, balance, and expectations regarding positions. This paper proposes a novel optimisation approach for the generation of diverse grid-based layouts. Our mixed integer linear programming (MILP) model offers a rigorous yet efficient method for grid generation that ensures packing, alignment, grouping, and preferential positioning of elements. Further, we present techniques for interactive diversification, enhancement, and completion of grid layouts (Figure 1). These capabilities are demonstrated using GRIDS, a wireframing tool that provides designers with real-time layout suggestions. We report findings from a ratings study (N = 13) and a design study (N = 16), lending evidence for the benefit of computational grid generation during early stages of design.