AITopics

Ororbia, Alexander, Mali, Ankur, Kifer, Daniel, Giles, C. Lee

Lifelong Neural Predictive Coding: Sparsity Yields Less Forgetting when Learning Cumulatively

arXiv.org Machine Learning May-25-2019

In lifelong learning systems, especially those based on artificial neural networks, one of the biggest obstacles is the severe inability to retain old knowledge as new information is encountered. This phenomenon is known as catastrophic forgetting. In this paper, we present a new connectionist model, the Sequential Neural Coding Network, and its learning procedure, grounded in the neurocognitive theory of predictive coding. The architecture experiences significantly less forgetting as compared to standard neural models and outperforms a variety of previously proposed remedies and methods when trained across multiple task datasets in a stream-like fashion. The promising performance demonstrated in our experiments offers motivation that directly incorporating mechanisms prominent in real neuronal systems, such as competition, sparse activation patterns, and iterative input processing, can create viable pathways for tackling the challenge of lifelong machine learning.

artificial intelligence, machine learning, s-ncn, (16 more...)

1905.10696

Country:

Africa > Mali (0.05)
Oceania > New Zealand (0.04)
North America > United States > Pennsylvania > Centre County > University Park (0.04)
North America > United States > New York > Monroe County > Rochester (0.04)

Genre: Research Report (0.70)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Education (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning May-25-2019

ArSentD-LEV: A Multi-Topic Corpus for Target-based Sentiment Analysis in Arabic Levantine Tweets

Baly, Ramy, Khaddaj, Alaa, Hajj, Hazem, El-Hajj, Wassim, Shaban, Khaled Bashir

Sentiment analysis is a highly subjective and challenging task. Its complexity further increases when applied to the Arabic language, mainly because of the large variety of dialects that are unstandardized and widely used on the Web, especially in social media. While many datasets have been released to train sentiment classifiers in Arabic, most of these datasets contain shallow annotation, only marking the sentiment of the text unit, such as a word, a sentence, or a document. In this paper, we present the Arabic Sentiment Twitter Dataset for the Levantine dialect (ArSenTD-LEV). Based on findings from analyzing tweets from the Levant region, we created a dataset of 4,000 tweets with the following annotations: the overall sentiment of the tweet, the target to which the sentiment was expressed, how the sentiment was expressed, and the topic of the tweet. Results confirm the importance of these annotations for improving the performance of a baseline sentiment classifier. They also confirm the performance gap that arises when training in one domain and testing in another.

machine learning, natural language, tweet, (18 more...)

1906.0183

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Syria (0.05)
Asia > Middle East > Lebanon > Beirut Governorate > Beirut (0.05)
(5 more...)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine Learning May-25-2019

Sherlock: A Deep Learning Approach to Semantic Data Type Detection

Hulsebos, Madelon, Hu, Kevin, Bakker, Michiel, Zgraggen, Emanuel, Satyanarayan, Arvind, Kraska, Tim, Demiralp, Çağatay, Hidalgo, César

Correctly detecting the semantic type of data columns is crucial for data science tasks such as automated data cleaning, schema matching, and data discovery. Existing data preparation and analysis systems rely on dictionary lookups and regular expression matching to detect semantic types. However, these matching-based approaches often are not robust to dirty data and only detect a limited number of types. We introduce Sherlock, a multi-input deep neural network for detecting semantic types. We train Sherlock on $686,765$ data columns retrieved from the VizNet corpus by matching $78$ semantic types from DBpedia to column headers. We characterize each matched column with $1,588$ features describing the statistical properties, character distributions, word embeddings, and paragraph vectors of column values. Sherlock achieves a support-weighted F$_1$ score of $0.89$, exceeding that of machine learning baselines, dictionary and regular expression benchmarks, and the consensus of crowdsourced annotations.
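The support-weighted F$_1$ score mentioned above averages per-class F$_1$ scores, weighting each class by its number of true instances. A minimal sketch of that metric follows; the column labels are hypothetical and chosen only for illustration, and this is not the evaluation code used for Sherlock.

```python
from collections import Counter


def support_weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its count in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives and predicted positives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n_true / total) * f1
    return score


# Hypothetical semantic-type labels for six columns (illustration only).
y_true = ["city", "city", "city", "name", "name", "age"]
y_pred = ["city", "city", "name", "name", "name", "age"]
print(round(support_weighted_f1(y_true, y_pred), 3))
```

Weighting by support means frequent types dominate the score, so a high value does not by itself show that rare types are detected well.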

artificial intelligence, machine learning, natural language, (18 more...)

1905.10688

Country:

North America > United States > Alaska > Anchorage Municipality > Anchorage (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > New York > New York County > New York City (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence May-24-2019, 09:57:37 GMT

The Amazing Ways Chinese Face Recognition Company Megvii (Face++) Uses AI And Machine Vision

Megvii Technology, a Chinese company founded in 2011 and widely known for its Face++ system, is one of the world leaders in facial recognition and artificial intelligence technology. While it might be best known for Face++, Megvii uses artificial intelligence and machine vision in a variety of amazing ways. Megvii was conceived by friends and Tsinghua University graduates Yin Qi, Yang Mu, and Tang Wenbin. After tremendous success in China (helped by the ability to train algorithms on China's vast pool of data), with clients such as Ant Financial, Vivo (smartphones), and Didi Chuxing (ride-sharing) and investments from Bank of China, the State-Owned Venture Capital Fund, the China-Russian Investment Fund, and other private investors including Ant Financial (Alibaba's payment affiliate), Megvii is ready to go global. It has projects slated in the coming year for Japan, Europe, the Middle East, Southeast Asia, and the United States and has secured a distributor in Thailand.

artificial intelligence, chinese face recognition company megvii, megvii, (10 more...)

#artificialintelligence

Country:

North America > United States (0.26)
Europe > Middle East (0.26)
Asia > Thailand (0.26)
(5 more...)

Industry: Banking & Finance (0.93)

Technology: Information Technology > Artificial Intelligence > Vision > Face Recognition (0.78)

arXiv.org Artificial Intelligence May-24-2019

Using Deep Networks and Transfer Learning to Address Disinformation

Dhamani, Numa, Azunre, Paul, Gleason, Jeffrey L., Corcoran, Craig, Honke, Garrett, Kramer, Steve, Morgan, Jonathon

We also demonstrate the ability to use this architecture to transfer knowledge from labeled data in one domain to related (supervised and unsupervised) tasks. Character-level neural networks and transfer learning are particularly valuable tools in the disinformation space because of the messy nature of social media, lack of labeled data, and the multi-channel tactics of influence campaigns. We demonstrate their effectiveness in several tasks relevant for detecting disinformation: spam emails, review bombing, political sentiment, and conversation clustering.

... the detection of inflammatory, inauthentic, or otherwise nefarious communication. Character-level convolutional neural networks (CNNs) are particularly well-suited for this task, as opposed to a word-level model, because they allow for non-vernacular discourse, misspelling, and other social media features (e.g., emoticons) to be learned without the constraint of fixed vocabularies (Zhang et al., 2015). We implement an adaptation of a neural network architecture recently demonstrated to be effective for text classification (Zhang et al., 2015; Józefowicz et al., 2016). The method is purely content-based and does not require any additional ...

artificial intelligence, deep network and transfer learning, machine learning, (13 more...)

arXiv.org Artificial Intelligence

1905.10412

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > New York > Broome County > Binghamton (0.05)
North America > United States > Texas > Travis County > Austin (0.04)
Africa > Nigeria (0.04)

Genre: Research Report (0.50)

Industry:

Media > News (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Hawkins, Cole, Zhang, Zheng

Bayesian Tensorized Neural Networks with Automatic Rank Selection

Tensor decomposition is an effective approach to compress over-parameterized neural networks and to enable their deployment on resource-constrained hardware platforms. However, directly applying tensor compression in the training process is a challenging task due to the difficulty of choosing a proper tensor rank. In order to achieve this goal, this paper proposes a Bayesian tensorized neural network. Our Bayesian method performs automatic model compression via an adaptive tensor rank determination. We also present approaches for posterior density calculation and maximum a posteriori (MAP) estimation for the end-to-end training of our tensorized neural network. We provide experimental validation on a fully connected neural network, a CNN and a residual neural network where our work produces $7.4\times$ to $137\times$ more compact neural networks directly from the training.
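To illustrate why the choice of rank matters for compression, the sketch below counts parameters for a single fully connected layer before and after factorization. It uses a plain low-rank matrix factorization as a simplified stand-in; it is not the tensor format or the Bayesian rank selection proposed in the paper, and the layer sizes and ranks are assumptions chosen for illustration.

```python
def dense_params(n_in, n_out):
    """Parameter count of a dense weight matrix W with shape (n_out, n_in)."""
    return n_in * n_out


def low_rank_params(n_in, n_out, rank):
    """Parameter count when W is approximated by U @ V,
    with U of shape (n_out, rank) and V of shape (rank, n_in)."""
    return rank * (n_in + n_out)


def compression_ratio(n_in, n_out, rank):
    """How many times smaller the factorized layer is than the dense one."""
    return dense_params(n_in, n_out) / low_rank_params(n_in, n_out, rank)


# Hypothetical 1024 x 1024 layer at three candidate ranks.
for rank in (8, 32, 128):
    print(rank, compression_ratio(1024, 1024, rank))
```

A smaller rank gives a larger compression ratio but a coarser approximation of the original weights, which is the trade-off that makes picking the rank by hand difficult.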

artificial intelligence, bayesian inference, machine learning, (14 more...)

1905.10478

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > Jordan (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Bosman, Anna Sergeevna, Engelbrecht, Andries, Helbig, Mardé

Loss Surface Modality of Feed-Forward Neural Network Architectures

It has been argued in the past that high-dimensional neural networks do not exhibit local minima capable of trapping an optimisation algorithm. However, the relationship between loss surface modality and the neural architecture parameters, such as the number of hidden neurons per layer and the number of hidden layers, remains poorly understood. This study employs fitness landscape analysis to study the modality of neural network loss surfaces under various feed-forward architecture settings. An increase in the problem dimensionality is shown to yield a more searchable and more exploitable loss surface. An increase in the hidden layer width is shown to effectively reduce the number of local minima, and simplify the shape of the global attractor. An increase in the architecture depth is shown to sharpen the global attractor, thus making it more exploitable.

architecture, artificial intelligence, machine learning, (16 more...)

1905.10268

Country:

Africa > South Africa > Gauteng > Pretoria (0.05)
Europe > United Kingdom > Wales (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Curtó, J. D., Zarza, H. C.

Doctor of Crosswise: Reducing Over-parametrization in Neural Networks

Dr. of Crosswise proposes a new architecture to reduce over-parametrization in Neural Networks. It introduces an operand for rapid computation in the framework of Deep Learning that leverages learned weights. The formalism is described in detail providing both an accurate elucidation of the mechanics and the theoretical implications.

artificial intelligence, learning, machine learning, (16 more...)

1905.10324

Country:

Africa > Senegal > Kolda Region > Kolda (0.05)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models

Teng, Yunfei, Gao, Wenbo, Chalus, Francois, Choromanska, Anna, Goldfarb, Donald, Weller, Adrian

We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective formulation does not change the location of stationary points compared to the original optimization problem; (ii) we avoid convergence decelerations caused by pulling local workers descending to different local minima to each other (i.e. to the average of their parameters); (iii) our update by design breaks the curse of symmetry (the phenomenon of being trapped in poorly generalizing sub-optimal solutions in symmetric non-convex landscapes); and (iv) our approach is more communication efficient since it broadcasts only parameters of the leader rather than all workers. We provide theoretical analysis of the batch version of the proposed algorithm, which we call Leader Gradient Descent (LGD), and its stochastic variant (LSGD). Finally, we implement an asynchronous version of our algorithm and extend it to the multi-leader setting, where we form groups of workers, each represented by its own local leader (the best performer in a group), and update each worker with a corrective direction comprised of two attractive forces: one to the local, and one to the global leader (the best performer among all workers). The multi-leader setting is well-aligned with current hardware architecture, where local workers forming a group lie within a single computational node and different groups correspond to different nodes. For training convolutional neural networks, we empirically demonstrate that our approach compares favorably to state-of-the-art baselines.
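The two forces described above, a regular gradient step and a corrective pull toward the best-performing worker, can be sketched on a toy problem. The abstract does not give the exact update rule, so the form below (a pull term proportional to the distance from the leader), the coefficient names `lr` and `pull`, and the quadratic objective are all assumptions; the sketch is synchronous and single-process, unlike the distributed setting in the paper.

```python
def loss(x):
    """Toy objective f(x) = sum of squares (assumed for illustration)."""
    return sum(xk * xk for xk in x)


def grad(x):
    """Gradient of the toy objective."""
    return [2.0 * xk for xk in x]


def leader_gd(workers, lr=0.1, pull=0.1, steps=50):
    """Each worker takes a gradient step plus a pull toward the current leader."""
    workers = [list(w) for w in workers]  # copy so the inputs are not mutated
    for _ in range(steps):
        # Snapshot the best-performing worker; only its parameters are shared.
        leader = list(min(workers, key=loss))
        for w in workers:
            g = grad(w)
            for k in range(len(w)):
                # Assumed update: gradient force plus attraction to the leader.
                w[k] -= lr * (g[k] + pull * (w[k] - leader[k]))
    return workers


# Hypothetical starting points for three workers.
start = [[3.0, -2.0], [1.0, 4.0], [-5.0, 0.5]]
final = leader_gd(start)
print([loss(w) for w in final])
```

For the leader itself the pull term is zero, so it follows plain gradient descent, while the other workers are drawn toward it instead of toward an average of all workers.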

experiment, minimizer, stationary point, (16 more...)

1905.10395

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)