 Problem-Independent Architectures


Inception Network Deep Learning Architecture

#artificialintelligence

Sharing some of my experience on simplifying the cost and on choosing the best combination of hyperparameters.


Improving the Privacy and Accuracy of ADMM-Based Distributed Algorithms

arXiv.org Machine Learning

The alternating direction method of multipliers (ADMM) is a popular method for designing distributed versions of a machine learning algorithm, whereby local computations are performed on local data and the outputs are exchanged among neighbors in an iterative fashion. During this iterative process, private data can leak. A differentially private ADMM was proposed in prior work (Zhang & Zhu, 2017), where only the privacy loss of a single node during one iteration was bounded; this makes it difficult to balance the tradeoff between the utility attained through distributed computation and the privacy guarantees when considering the total privacy loss of all nodes over the entire iterative process. We propose a perturbation method for ADMM where the perturbed term is correlated with the penalty parameters; this is shown to improve the utility and privacy simultaneously. The method is based on a modified ADMM where each node independently determines its own penalty parameter in every iteration and decouples it from the dual updating step size. The condition for convergence of the modified ADMM and the lower bound on the convergence rate are also derived.
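For readers unfamiliar with the iteration pattern the abstract refers to, the following is a minimal sketch of standard (unperturbed) consensus ADMM, not the modified or private variant the paper proposes. It makes several simplifying assumptions that are not in the abstract: each node holds a single number, the local objective is a squared distance to that number, all nodes share one fixed penalty parameter `rho`, and agreement is reached through a global averaging step rather than neighbor-to-neighbor exchange. The function name and parameters are hypothetical.

```python
def consensus_admm_average(local_values, rho=1.0, iterations=100):
    """Plain consensus ADMM (scaled form) for: min sum_i 0.5*(x_i - a_i)^2  s.t. x_i = z.

    Illustrative assumptions: scalar data, one shared penalty rho, and a
    global averaging step. No privacy perturbation is applied here.
    """
    n = len(local_values)
    x = [0.0] * n  # local primal variables, one per node
    u = [0.0] * n  # scaled dual variables, one per node
    z = 0.0        # shared consensus variable

    for _ in range(iterations):
        # Local step: each node solves its own small problem using only its data a_i.
        x = [(a + rho * (z - u_i)) / (1.0 + rho) for a, u_i in zip(local_values, u)]
        # Exchange step: nodes share x_i + u_i and agree on a common value.
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n
        # Dual step: each node accumulates its disagreement with the consensus.
        u = [u_i + x_i - z for x_i, u_i in zip(x, u)]
    return z


if __name__ == "__main__":
    print(consensus_admm_average([1.0, 2.0, 6.0]))
```

In this toy problem the consensus value approaches the mean of the local values. The quantities exchanged at every iteration are what a privacy analysis has to account for, which is why bounding the loss of a single iteration alone says little about the whole run.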


Cyc - Wikipedia

#artificialintelligence

The need for a massive symbolic artificial intelligence project of this ilk was born in the early 1980s out of a large number of experiences early AI researchers had had over the previous 25 years, wherein their AI programs would generate encouraging early results but then fail to "scale up", that is, fail to cope with novel situations and problems outside the narrow area they were conceived and engineered to cope with. Douglas Lenat and Alan Kay publicized this need,[1][2][3] and organized a meeting at Stanford in 1983 to consider the problem; the back-of-the-envelope calculations by them and colleagues including Marvin Minsky, Allen Newell, Edward Feigenbaum, and John McCarthy indicated that the effort would require between 1000 and 3000 person-years, and hence would not fit into the standard academic project model. Fortuitously, events within a year of that meeting enabled that Manhattan-Project-sized effort to get underway. The project was started in July 1984 as the flagship project of the 400-person Microelectronics and Computer Technology Corporation (MCC), a research consortium started by two dozen large United States-based corporations "to counter a then ominous Japanese effort in AI, the so-called 'fifth-generation' project."[4] The US Government reacted to the Fifth Generation threat by passing the National Cooperative Research Act of 1984, which for the first time allowed US companies to "collude" on long-term high-risk high-payoff research, and MCC and Sematech sprang up to take advantage of that ten-year opportunity.


Sustainable Deep Learning Architectures require Manageability

#artificialintelligence

This is a very important consideration that is often overlooked by many in the field of Artificial Intelligence (AI). I suspect there are very few academic researchers who understand this aspect. The work performed in academe is distinctly different from the work required to make a product that is sustainable and economically viable. It is the difference between computer code that is written to demonstrate a new discovery and code that is written to support the operations of a company. The former kind tends to be exploratory and throwaway, while the latter kind tends to be exploitive and requires sustainability.


Hiding in the Crowd: A Massively Distributed Algorithm for Private Averaging with Malicious Adversaries

arXiv.org Machine Learning

Through browsing the web, engaging in online social networks and interacting with connected devices, we are producing ever growing amounts of sensitive personal data. This has fueled the massive development of innovative personalized services which extract value from users' data using machine learning techniques. In today's dominant approach, users hand over their personal data to the service provider, who stores everything on centralized or tightly coupled systems hosted in data centers. Unfortunately, this poses important risks regarding the privacy of users. To mitigate these risks, some approaches have been proposed to learn from datasets owned by several parties who do not want to disclose their data. However, they typically suffer from some drawbacks: (partially) homomorphic encryption schemes (Paillier, 1999; Graepel et al., 2012; Aslett et al., 2015) require the existence of a trusted third party, secure multi-party computation techniques (Yao, 1982; Lindell and Pinkas, 2009) are generally intractable when the number of parties is large, and exchanging noisy sketches of the data through (local) differential privacy (Dwork, 2006; Duchi et al., 2012) only provides approximate solutions which are quite inaccurate in the highly distributed setting considered here. Furthermore, many of these techniques are not robust to the presence of malicious parties who may try to manipulate the outcome of the algorithm. In this paper, our goal is to design a massively distributed protocol to collaboratively compute averages over the data of thousands to millions of users (some of them honest-but-curious and some corrupted by a malicious party), with arbitrary accuracy and in a way that preserves their privacy. For machine learning algorithms whose sufficient statistics are averages (e.g., kernel-based algorithms in primal space and decision trees), this could be used as a primitive to privately learn more complex models.


Learning architectures based on quantum entanglement: a simple matrix product state algorithm for image recognition

arXiv.org Machine Learning

It is a fundamental but still elusive question whether methods based on quantum mechanics, in particular on quantum entanglement, can be used for classical information processing and machine learning. Even a partial answer to this question would bring important insights to both machine learning and quantum mechanics. In this work, we implement simple numerical experiments, related to pattern/image classification, in which we represent the classifiers by quantum matrix product states (MPS). A classical machine learning algorithm is then applied to these quantum states. We explicitly show how quantum features (i.e., single-site and bipartite entanglement) can emerge in images represented this way; entanglement here characterizes the importance of data, and this information can be used in practice to improve the learning procedures. Thanks to the low demands on the dimensions and number of the unitary matrices necessary to construct the MPS, we expect such numerical experiments could open new paths in classical machine learning and, at the same time, shed light on generic quantum simulations/computations.


Active Online Learning Architecture for Multimodal Sensor-based ADL Recognition

AAAI Conferences

Long-term observation of changes in Activities of Daily Living (ADL) is important for assisting older people to stay active longer by preventing aging-associated diseases such as disuse syndrome. Previous studies have proposed a number of ways to detect the state of a person using a single type of sensor data. However, recognizing more complicated states requires properly integrating data from multiple sensors, and the technology for doing so remains a challenge. In addition, previous methods lack the ability to deal with misclassified data unknown at the training phase. In this paper, we propose an architecture for multimodal sensor-based ADL recognition which spontaneously acquires knowledge from data of unknown label type. Evaluation experiments are conducted to test the architecture's abilities to recognize ADL and construct data-driven reactive planning by integrating three types of dataflows, acquire new concepts, and expand existing concepts semi-autonomously and in real time. By adding extension plugins to Fluentd, we expanded its functions and developed an extended model, Fluentd++. The results of the evaluation experiments indicate that the architecture is able to achieve the above required functions satisfactorily.


An Improved Oscillating-Error Classifier with Branching

arXiv.org Machine Learning

This paper extends earlier work based on an oscillating error correction technique [7]. The method uses an error correction update with a very simple rule: the error adjustment is either added or subtracted, depending on whether the variable value is currently larger or smaller than the desired value. This is related to cellular automata [2], where the small add-or-subtract decision gives the classifier an added dimension of flexibility. The results reported in the first paper were unusually good over a wide range of datasets, and it was subsequently found that an error had been made in how the classifier decides on the correct output category. The earlier paper measured the error amount between the desired output value and the value produced by the corresponding classifier. If the error was small enough, then the classification was considered to be correct. When training the classifier, the data rows for each category would be put together and averaged. The classifier would then try to learn these average values, but that would lead to distinct weight sets for each output category. It was overlooked that even if the desired output category correctly classified the input data row, one of the other category weight sets could produce an even smaller error.
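The evaluation flaw described above can be illustrated with a small sketch. This is not the paper's classifier: as a stand-in for the learned weight sets, each category is represented directly by the average of its training rows, and the error is taken to be a sum of absolute differences. All function names, the error measure, the threshold value, and the direction of the `adjust` rule (subtract when too large, add when too small) are assumptions made for illustration.

```python
def adjust(value, desired, step):
    """Assumed form of the add-or-subtract rule: move value toward desired by step."""
    if value > desired:
        return value - step
    if value < desired:
        return value + step
    return value


def category_averages(rows, labels):
    """Average the training rows of each category (stand-in for learned weight sets)."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def total_error(row, target):
    """Sum of absolute differences between a row and a category target."""
    return sum(abs(v - t) for v, t in zip(row, target))


def passes_threshold(row, averages, desired_label, threshold):
    """Earlier-style check: only the desired category's error is examined."""
    return total_error(row, averages[desired_label]) <= threshold


def classify(row, averages):
    """Stricter check: the category with the smallest error wins."""
    return min(averages, key=lambda label: total_error(row, averages[label]))


if __name__ == "__main__":
    rows = [[0.0, 0.0], [4.0, 4.0], [3.0, 3.0], [3.0, 3.0]]
    labels = ["a", "a", "b", "b"]
    averages = category_averages(rows, labels)
    row = [2.8, 2.8]  # suppose this row truly belongs to category "a"
    print(passes_threshold(row, averages, "a", threshold=2.0))
    print(classify(row, averages))
```

With this toy data, the row passes the threshold test for its desired category "a", yet category "b" produces a smaller error, so a decision based on the smallest error assigns it to "b". Counting the first check as a correct classification is the kind of overestimate the paragraph describes.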


An Efficient Deep Neural Architecture for Multilingual Sentiment Analysis in Twitter

AAAI Conferences

Sentiment analysis of tweets is often monolingual and the models provided by machine learning classifiers are usually not applicable across distinct languages. Cross-language sentiment classification usually relies on machine translation strategies in which a source language is translated to the desired target language. Machine translation is costly and the provided results are limited by the quality of the translation that is performed. In this paper, we propose an efficient translation-free deep neural architecture for performing multilingual sentiment analysis of tweets. Our proposed approach benefits from a cost-effective character-based embedding and from optimized convolutions to learn from multiple distinct languages. The resulting model is capable of learning latent features from all languages used during training at once and it does not require any translation process to be performed whatsoever. We empirically evaluate the efficiency and effectiveness of the proposed approach in tweet corpora from four different languages and we show that it presents the best trade-off among four distinct state-of-the-art deep neural architectures for sentiment analysis.


A Deep Neural Architecture for Kitchen Activity Recognition

AAAI Conferences

Computer-based human activity recognition of daily living has recently attracted much interest due to its applicability to ambient assisted living. Such applications require the automatic recognition of high-level activities composed of multiple actions performed by human beings in a given environment. We propose a deep neural architecture for kitchen activity recognition, which uses an ensemble of machine learning models and hand-crafted features to extract more information from the data. Experiments show that our approach achieves state-of-the-art results for identifying cooking actions in a well-known kitchen dataset.