Hardware


Towards Big Topic Modeling

arXiv.org Machine Learning

To solve the big topic modeling problem, we need to reduce both the time and space complexities of batch latent Dirichlet allocation (LDA) algorithms. Although parallel LDA algorithms on multi-processor architectures have low time and space complexities, their communication costs among processors often scale linearly with the vocabulary size and the number of topics, leading to a serious scalability problem. To reduce the communication complexity among processors for better scalability, we propose a novel communication-efficient parallel topic modeling architecture based on the power law, which consumes orders of magnitude less communication time when the number of topics is large. We combine the proposed communication-efficient parallel architecture with the online belief propagation (OBP) algorithm, referred to as POBP, for big topic modeling tasks. Extensive empirical results confirm that POBP has the following advantages for solving the big topic modeling problem compared with recent state-of-the-art parallel LDA algorithms on multi-processor architectures: 1) high accuracy, 2) communication efficiency, 3) fast speed, and 4) constant memory usage.
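
The abstract does not spell out the power-law scheme itself; as a rough, hedged illustration of why a Zipfian vocabulary makes sparse synchronization attractive, the sketch below compares the size of a dense word-topic count update (V x K entries) with a sparse update that transmits only the rows of words actually touched in a mini-batch. The sizes, the Zipf sampler, and the variable names are illustrative assumptions, not the paper's protocol.

```python
# Illustrative sketch only: compares dense vs. sparse synchronization payloads
# for a word-topic count matrix under a Zipf-like word distribution. This is
# NOT the paper's power-law architecture, just a back-of-the-envelope argument
# for why word sparsity within a batch reduces communication volume.
import numpy as np

V, K = 100_000, 1_000          # vocabulary size, number of topics (assumed)
tokens_per_batch = 50_000      # tokens processed between synchronizations

rng = np.random.default_rng(0)
ranks = np.arange(1, V + 1)
p = 1.0 / ranks                # Zipf-like frequencies: probability ~ 1/rank
p /= p.sum()
words = rng.choice(V, size=tokens_per_batch, p=p)

touched = np.unique(words).size
dense_entries = V * K          # full matrix sent every synchronization
sparse_entries = touched * K   # only rows for words seen in this batch

print(f"distinct words touched: {touched} / {V}")
print(f"dense payload : {dense_entries:,} counts")
print(f"sparse payload: {sparse_entries:,} counts "
      f"({dense_entries / sparse_entries:.1f}x smaller)")
```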


Optimally fuzzy temporal memory

arXiv.org Artificial Intelligence

Any learner with the ability to predict the future of a structured time-varying signal must maintain a memory of the recent past. If the signal has a characteristic timescale relevant to future prediction, the memory can be a simple shift register: a moving window extending into the past, requiring storage resources that grow linearly with the timescale to be represented. However, an independent general-purpose learner cannot a priori know the characteristic prediction-relevant timescale of the signal. Moreover, many naturally occurring signals show scale-free long-range correlations, implying that the natural prediction-relevant timescale is essentially unbounded. Hence, the learner should maintain information from the longest possible timescale allowed by resource availability. Here we construct a fuzzy memory system that optimally sacrifices the temporal accuracy of information in a scale-free fashion in order to represent prediction-relevant information from exponentially long timescales. Using several illustrative examples, we demonstrate the advantage of the fuzzy memory system over a shift register in time series forecasting of natural signals. When the available storage resources are limited, we suggest that a general-purpose learner would be better off committing to such a fuzzy memory system.
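
The paper's construction is not reproduced here; as a crude caricature of the trade-off it describes, the sketch below contrasts a fixed-length shift register with a bank of leaky integrators whose time constants are geometrically spaced, so the same number of cells covers an exponentially longer stretch of the past at the price of temporal blurring. All parameter choices are illustrative assumptions.

```python
# Crude caricature of a scale-free "fuzzy" memory: a bank of leaky integrators
# with geometrically spaced time constants covers exponentially long timescales
# with the same number of cells as a short shift register.
import numpy as np

N = 8                                   # storage cells available (assumed)
taus = 2.0 ** np.arange(N)              # time constants 1, 2, 4, ..., 128 steps

def run(signal):
    shift = np.zeros(N)                 # exact window over the last N samples
    fuzzy = np.zeros(N)                 # blurred view reaching back ~taus[-1] samples
    for x in signal:
        shift = np.roll(shift, 1)
        shift[0] = x
        fuzzy += (x - fuzzy) / taus     # one leaky-integration update per cell
    return shift, fuzzy

rng = np.random.default_rng(1)
shift, fuzzy = run(rng.standard_normal(500))
print("shift register sees back", N, "steps exactly")
print("fuzzy memory blends information back roughly", int(taus[-1]), "steps")
```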


Event management for large scale event-driven digital hardware spiking neural networks

arXiv.org Artificial Intelligence

The interest in brain-like computation has led to the design of a plethora of innovative neuromorphic systems. Individually, spiking neural networks (SNNs), event-driven simulation, and digital hardware neuromorphic systems each receive a lot of attention. Despite the popularity of event-driven SNNs in software, very few digital hardware architectures are found. This is because existing hardware solutions for event management scale badly with the number of events. This paper introduces the structured heap queue, a pipelined digital hardware data structure, and demonstrates its suitability for event management. The structured heap queue scales gracefully with the number of events, allowing the efficient implementation of large-scale digital hardware event-driven SNNs. The scaling is linear for memory, logarithmic for logic resources, and constant for processing time. The use of the structured heap queue is demonstrated on a field-programmable gate array (FPGA) with an image segmentation experiment and an SNN of 65,536 neurons and 513,184 synapses. Events can be processed at a rate of one every 7 clock cycles, and a 406×158 pixel image is segmented in 200 ms.
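
The structured heap queue is a pipelined hardware data structure and is not reproduced here; a software analogue built on an ordinary binary heap conveys what event management means in an event-driven SNN: spikes are scheduled with a delivery time and processed strictly in timestamp order. The toy network, weights, and delays below are assumptions for illustration only.

```python
# Software analogue of event management in an event-driven SNN: spike
# deliveries are queued with a timestamp and popped in time order. The paper's
# structured heap queue is a pipelined hardware realisation of this idea.
import heapq

# Toy network: neuron -> list of (target neuron, synaptic delay, weight)
synapses = {0: [(1, 2.0, 1.2), (2, 3.0, 0.9)],
            1: [(2, 1.0, 0.5)],
            2: []}
threshold = 1.0
potential = {n: 0.0 for n in synapses}

events = []                              # min-heap ordered by delivery time
heapq.heappush(events, (0.0, 0, 1.5))    # external input drives neuron 0 at t=0

while events:
    t, neuron, weight = heapq.heappop(events)
    potential[neuron] += weight
    if potential[neuron] >= threshold:
        potential[neuron] = 0.0          # fire, reset, and deliver to targets
        print(f"t={t:4.1f}  neuron {neuron} spikes")
        for target, delay, w in synapses[neuron]:
            heapq.heappush(events, (t + delay, target, w))
```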


Realtime Simulation of a Cerebellar Spiking Network Model Towards Neuroprosthesis

AAAI Conferences

Neuroprosthesis aims to replace a brain damaged or degenerated by accidents or aging with an artificial brain that simulates, and thereby restores, the impaired brain functions. To replace the real brain, the artificial brain has to reproduce the same functions as the real brain, and the simulation has to run in realtime. We have built a large-scale spiking network model of the cerebellum that is composed of more than 100,000 neuron units and acts as a versatile supervised learning machine for spatiotemporal information. We implement it on a graphics processing unit (GPU) to conduct the numerical simulation in realtime, owing to the parallel computing capability of GPUs. We propose to use the present model towards neuroprosthesis.
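
The cerebellar model itself is not reproduced here; a vectorised leaky integrate-and-fire update illustrates the per-neuron data parallelism that maps well onto a GPU, where every neuron's state is advanced independently at each time step. The neuron model, population size, and constants below are assumptions, not the paper's cerebellum model.

```python
# Illustrative data-parallel update of a population of leaky integrate-and-fire
# neurons. On a GPU each neuron maps to a thread; here NumPy vectorisation
# stands in for that parallelism. This is not the paper's cerebellar model.
import numpy as np

N, dt, steps = 100_000, 1.0, 100          # neurons, time step (ms), steps (assumed)
tau, v_thresh, v_reset = 20.0, 1.0, 0.0

rng = np.random.default_rng(2)
v = np.zeros(N)
spike_count = 0
for _ in range(steps):
    i_ext = rng.normal(0.05, 0.05, size=N)   # external input current
    v += dt / tau * (-v) + i_ext             # leak + drive for all neurons at once
    spiking = v >= v_thresh
    v[spiking] = v_reset                     # reset the neurons that fired
    spike_count += int(spiking.sum())
print(f"{spike_count} spikes from {N} neurons over {steps} steps")
```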


A Short Note on Gaussian Process Modeling for Large Datasets using Graphics Processing Units

arXiv.org Machine Learning

The graphics processing unit (GPU) has emerged as a powerful and cost-effective processor for general-purpose computing. GPUs are capable of an order of magnitude more floating-point operations per second than modern central processing units (CPUs), and thus offer great promise for computationally intensive statistical applications. Fitting complex statistical models with a large number of parameters and/or to large datasets is often very computationally expensive. In this study, we focus on Gaussian process (GP) models -- statistical models commonly used for emulating expensive computer simulators. We demonstrate that the computational cost of implementing GP models can be significantly reduced by using a CPU+GPU heterogeneous computing system over an analogous implementation on a traditional computing system with no GPU acceleration. Our small study suggests that GP models are fertile ground for further implementation on CPU+GPU systems.
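
The study's benchmarks are not reproduced here; the sketch below simply shows where the cost in GP fitting lives: forming and factorising an n x n covariance matrix is O(n^3), and that dense linear algebra is exactly what a CPU+GPU system can offload (for instance by swapping NumPy for a GPU array library such as CuPy). The kernel, data, and noise level are illustrative assumptions.

```python
# Minimal GP regression fit highlighting the O(n^3) Cholesky factorisation that
# dominates cost for large n; this dense linear algebra is the GPU-friendly
# part. Kernel, data, and noise level here are illustrative assumptions.
import numpy as np

def sq_exp_kernel(x1, x2, lengthscale=0.3, variance=1.0):
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(3)
n = 2_000
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

K = sq_exp_kernel(x, x) + 1e-2 * np.eye(n)    # noisy covariance matrix, n x n
L = np.linalg.cholesky(K)                      # O(n^3): the expensive step
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

x_star = np.linspace(0, 1, 5)
mean = sq_exp_kernel(x_star, x) @ alpha        # posterior predictive mean
print(np.round(mean, 3))
```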


Pictures of Processes: Automated Graph Rewriting for Monoidal Categories and Applications to Quantum Computing

arXiv.org Artificial Intelligence

This work is about diagrammatic languages, how they can be represented, and what they in turn can be used to represent. More specifically, it focuses on representations and applications of string diagrams. String diagrams are used to represent a collection of processes, depicted as "boxes" with multiple (typed) inputs and outputs, depicted as "wires". If we allow plugging input and output wires together, we can intuitively represent complex compositions of processes, formalised as morphisms in a monoidal category. [...] The first major contribution of this dissertation is the introduction of a discretised version of a string diagram called a string graph. String graphs form a partial adhesive category, so they can be manipulated using double-pushout graph rewriting. Furthermore, we show how string graphs modulo a rewrite system can be used to construct free symmetric traced and compact closed categories on a monoidal signature. The second contribution is in the application of graphical languages to quantum information theory. We use a mixture of diagrammatic and algebraic techniques to prove a new classification result for strongly complementary observables. [...] We also introduce a graphical language for multipartite entanglement and illustrate a simple graphical axiom that distinguishes the two maximally-entangled tripartite qubit states: GHZ and W. [...] The third contribution is a description of two software tools developed in part by the author to implement much of the theoretical content described here. The first tool is Quantomatic, a desktop application for building string graphs and graphical theories, as well as performing automated graph rewriting visually. The second is QuantoCoSy, which performs fully automated, model-driven theory creation using a procedure called conjecture synthesis.
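
As a very small, informal illustration of the "boxes and wires" reading of string diagrams, the sketch below represents processes as boxes with typed input and output wires and composes them sequentially (plugging outputs into inputs) and in parallel (juxtaposition). It is a toy under assumed names, not the string-graph formalism or Quantomatic's data structures.

```python
# Toy illustration of string-diagram composition: boxes are processes with
# typed input/output wires; sequential composition plugs wires together and
# parallel (monoidal) composition juxtaposes boxes. Informal sketch only.
from dataclasses import dataclass

@dataclass
class Box:
    name: str
    inputs: tuple      # input wire types
    outputs: tuple     # output wire types

def compose(f, g):
    """Sequential composition: the outputs of f are plugged into the inputs of g."""
    if f.outputs != g.inputs:
        raise TypeError(f"cannot plug {f.outputs} into {g.inputs}")
    return Box(f"{g.name} . {f.name}", f.inputs, g.outputs)

def tensor(f, g):
    """Parallel composition: boxes drawn side by side."""
    return Box(f"{f.name} (x) {g.name}", f.inputs + g.inputs, f.outputs + g.outputs)

copy = Box("copy", ("qubit",), ("qubit", "qubit"))
cnot = Box("cnot", ("qubit", "qubit"), ("qubit", "qubit"))
print(compose(copy, cnot))    # a qubit is copied, then both wires feed a CNOT
print(tensor(copy, copy))     # two independent copies in parallel
```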


Position Paper: Embracing Heterogeneity—Improving Energy Efficiency for Interactive Services on Heterogeneous Data Center Hardware

AAAI Conferences

Data centers today are heterogeneous: they have servers from multiple generations and multiple vendors; server machines have multiple cores that are capable of running at different speeds, and some have general-purpose graphics processing units (GPGPUs). Hardware trends indicate that future processors will have heterogeneous cores with different speeds and capabilities. This environment enables new advances in power saving and application optimization. It also poses new challenges, as current systems software is ill-suited for heterogeneity. In this position paper, we focus on interactive applications and outline some of the techniques to embrace heterogeneity. We show that heterogeneity can be exploited to deliver interactive services in an energy-efficient manner. For example, our initial study suggests that neither high-end nor low-end servers alone are very effective in servicing a realistic workload, which typically has requests with varying service demands. High-end servers achieve good throughput, but the energy costs are high. Low-end servers are energy-efficient for short requests, but they may not be able to serve long requests at the desired quality of service. In this work, we show that a heterogeneous system can be a better choice than an equivalent homogeneous system to deliver interactive services in a cost-effective manner, transforming heterogeneity from a resource management nightmare into an asset. We highlight some of the challenges and opportunities and the role of AI and machine learning techniques for hosting large interactive services in data centers.
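
The position paper does not give a concrete policy; the sketch below is one simple way the stated insight could look in code: requests with small estimated service demand go to energy-efficient low-end servers, and only long requests go to high-end servers that can still meet the latency target. The server parameters, latency target, and example demands are illustrative assumptions.

```python
# Illustrative heterogeneity-aware dispatcher: short requests go to low-power
# servers and long requests to fast servers, so latency targets are met without
# paying high-end energy costs on every request. All parameters are assumed.
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    speed: float        # work units per second
    power: float        # watts while serving

LOW  = Server("low-end",  speed=1.0, power=30.0)
HIGH = Server("high-end", speed=4.0, power=150.0)
LATENCY_TARGET = 1.0     # seconds

def dispatch(demand):
    """Pick the lowest-power server that still meets the latency target."""
    for server in (LOW, HIGH):                      # ordered by power draw
        if demand / server.speed <= LATENCY_TARGET:
            return server
    return HIGH                                     # best effort for huge requests

for demand in (0.2, 0.8, 2.5):
    s = dispatch(demand)
    latency = demand / s.speed
    energy = s.power * latency
    print(f"demand={demand:>4}  ->  {s.name:8}  latency={latency:.2f}s  energy={energy:.1f} J")
```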


DHPA* and SHPA*: Efficient Hierarchical Pathfinding in Dynamic and Static Game Worlds

AAAI Conferences

In 2004, Botea et al. published the HPA* algorithm (Hierarchical Pathfinding A*), the first detailed study of hierarchical pathfinding in games. However, HPA* can be optimized further. In this paper, we introduce the DHPA* and SHPA* hierarchical pathfinding algorithms, along with a metric for comparing the dynamic performance of pathfinding algorithms in games. We show that DHPA* searches up to an order of magnitude faster than HPA*, but consumes significantly more memory and produces slightly less optimal paths. The SHPA* algorithm searches up to five times faster than HPA* and consumes less memory, but it also produces slightly less optimal paths and is only suited to static environments. When comparing the algorithms in dynamic environments, our results show that DHPA* often outperforms HPA*. Unlike many other hierarchical pathfinding algorithms, both solutions presented in this paper maintain a read-only terrain representation during search, which simplifies programming and debugging, and improves multithreaded performance.
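
The hierarchical layers of HPA*, DHPA*, and SHPA* are not reproduced here; as background, the sketch below gives a compact A* on a 4-connected grid, the flat search that hierarchical pathfinders run both on the abstract cluster graph and when refining abstract paths inside clusters. The grid map and heuristic are illustrative assumptions.

```python
# Compact A* on a 4-connected grid with a Manhattan heuristic. Hierarchical
# pathfinders such as HPA* run searches like this on an abstract cluster graph
# and again when refining the abstract path; only the flat search is shown.
import heapq

GRID = ["....#....",
        "..#.#.##.",
        "..#...#..",
        "..####...",
        "........."]

def neighbors(pos):
    r, c = pos
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] != "#":
            yield nr, nc

def astar(start, goal):
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start)]        # entries are (f, g, position)
    g = {start: 0}
    parent = {start: None}
    while frontier:
        _, cost, current = heapq.heappop(frontier)
        if current == goal:                  # reconstruct the path
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in neighbors(current):
            if cost + 1 < g.get(nxt, float("inf")):
                g[nxt] = cost + 1
                parent[nxt] = current
                heapq.heappush(frontier, (g[nxt] + h(nxt), g[nxt], nxt))
    return None

print(astar((0, 0), (4, 8)))
```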


Perfect Hashing for State Space Exploration on the GPU

AAAI Conferences

This paper exploits the parallel computing power of graphics cards to accelerate state space search. We illustrate that modern graphics processing units (GPUs) have the potential to speed up breadth-first search significantly. For a bitvector representation of the search frontier, GPU algorithms with one and two bits per state are presented. Efficient perfect hash functions and their inverses are explored in order to achieve enhanced compression. We report speed-ups of up to a factor of 27 with respect to single-core CPU computation.
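
The GPU kernels are not reproduced here; the CPU sketch below shows the core idea: states (here, permutations) are indexed by a perfect, invertible hash (the Lehmer-code rank), so the visited set and frontier can be flat arrays with one byte (or bit) per state instead of an explicit open/closed list. The pancake-sorting state space is an illustrative stand-in for the benchmark domains.

```python
# CPU sketch of bitvector breadth-first search with an invertible perfect hash.
# States are permutations of N items; rank/unrank via the Lehmer code gives a
# bijection onto 0..N!-1, so "visited" is a flat byte array indexed by rank.
# Pancake-sorting moves (prefix reversals) are an illustrative state space.
from math import factorial

N = 6
FACT = [factorial(i) for i in range(N)]

def rank(perm):
    """Perfect hash: Lehmer-code rank of a permutation of 0..N-1."""
    r, items = 0, list(range(N))
    for i, p in enumerate(perm):
        idx = items.index(p)
        r += idx * FACT[N - 1 - i]
        items.pop(idx)
    return r

def unrank(r):
    """Inverse of rank(): recover the permutation from its index."""
    items, perm = list(range(N)), []
    for i in range(N):
        idx, r = divmod(r, FACT[N - 1 - i])
        perm.append(items.pop(idx))
    return tuple(perm)

def successors(perm):
    """Pancake moves: reverse a prefix of length 2..N."""
    for k in range(2, N + 1):
        yield perm[:k][::-1] + perm[k:]

visited = bytearray(factorial(N))      # one byte per state (one bit would also do)
start = rank(tuple(range(N)))
visited[start] = 1
frontier, depth = [start], 0
while frontier:
    nxt = []
    for r in frontier:
        for succ in successors(unrank(r)):
            s = rank(succ)
            if not visited[s]:
                visited[s] = 1
                nxt.append(s)
    if nxt:
        depth += 1
    frontier = nxt
print(f"explored {sum(visited)} states; BFS depth from the sorted state: {depth}")
```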


Use of a Quantum Computer and the Quick Medical Reference To Give an Approximate Diagnosis

arXiv.org Artificial Intelligence

The Quick Medical Reference (QMR) is a compendium of statistical knowledge connecting diseases to findings (symptoms). The information in the QMR can be represented as a Bayesian network. The inference problem for the QMR (or, in more medical language, giving a diagnosis) is, given some findings, to find the probability of each disease. Rejection sampling and likelihood-weighted sampling (a.k.a. likelihood weighting) are two simple algorithms for making approximate inferences from an arbitrary Bayesian net (and from the QMR Bayesian net in particular). Heretofore, the samples for these two algorithms have been obtained with a conventional "classical" computer. In this paper, we show that two analogous algorithms exist for the QMR Bayesian net in which the samples are obtained with a quantum computer. We expect that these two algorithms, implemented on a quantum computer, can also be used to make inferences (and predictions) with other Bayesian nets.
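
The quantum sampling procedure is not reproduced here; its classical baseline is easy to sketch: likelihood weighting on a tiny two-layer noisy-OR network in the style of QMR-DT, where diseases have independent priors, each finding is a noisy-OR of its parent diseases, and observed findings are not sampled but contribute weights. The disease priors, link strengths, and leak probability below are made-up illustrative numbers.

```python
# Likelihood weighting on a tiny two-layer noisy-OR network in the style of
# QMR-DT: diseases have independent priors, each finding is a noisy-OR of its
# parent diseases. Evidence findings are not sampled; they contribute weights.
# All numbers are made up for illustration.
import random

priors = {"flu": 0.05, "malaria": 0.01}                 # P(disease = 1)
leak = 0.01                                             # P(finding | no disease)
links = {"fever": {"flu": 0.6, "malaria": 0.9},         # per-edge activation probabilities
         "chills": {"malaria": 0.7}}
evidence = {"fever": 1, "chills": 1}                    # observed findings

def p_finding_given(diseases, finding):
    """Noisy-OR: the finding stays off only if the leak and every active edge fail."""
    p_off = 1 - leak
    for d, strength in links[finding].items():
        if diseases[d]:
            p_off *= 1 - strength
    return 1 - p_off

def likelihood_weighting(n_samples=100_000, seed=4):
    random.seed(seed)
    weighted = {d: 0.0 for d in priors}
    total = 0.0
    for _ in range(n_samples):
        diseases = {d: random.random() < p for d, p in priors.items()}
        w = 1.0
        for f, value in evidence.items():               # weight by evidence likelihood
            p1 = p_finding_given(diseases, f)
            w *= p1 if value == 1 else (1 - p1)
        total += w
        for d in priors:
            if diseases[d]:
                weighted[d] += w
    return {d: weighted[d] / total for d in priors}

print(likelihood_weighting())   # approximate P(disease | findings)
```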