AITopics

Paeedeh, Naeem, Ghiasi-Shirazi, Kamaledin

Improving the Backpropagation Algorithm with Consequentialism Weight Updates over Mini-Batches

Least mean squares (LMS) is a particular case of the backpropagation (BP) algorithm applied to single-layer neural networks with the mean squared error (MSE) loss. One drawback of the LMS is that the instantaneous weight update is proportional to the square of the norm of the input vector. Normalized least mean squares (NLMS) algorithm amends this drawback by dividing the weight changes by the square of the norm of the input vector. The affine projection algorithm (APA) improved the NLMS algorithm to weight update over a batch of recently seen samples. However, the application of NLMS and APA had been limited to single-layer networks and adaptive filters. In this paper, we consider a virtual target for each neuron of a multi-layer neural network and show that the BP algorithm is equivalent to training the weights of each layer using these virtual targets and the LMS algorithm. We also introduce a consequentialism interpretation of the NLMS and the APA algorithms that justifies their use in multi-layer neural networks. Given any optimization algorithm based on the BP over mini-batches, we propose a novel consequentialism method for updating the weights.Consequently, our proposed weight update can be applied both to plain stochastic gradient descent (SGD) and to momentum methods like RMSProp, Adam, and NAG. These ideas helped us to update the weights more carefully in such a way that minimization of the loss for one sample of the mini-batch does not interfere with other samples in that mini-batch. Our experiments show the usefulness of the proposed method in optimizing deep neural network architectures.

algorithm, deep learning, neural network, (21 more...)

2003.05164

Country: Asia > Middle East > Iran (0.14)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Zukerman, Jenny, Tirer, Tom, Giryes, Raja

BP-DIP: A Backprojection based Deep Image Prior

Deep neural networks are a very powerful tool for many computer vision tasks, including image restoration, exhibiting state-of-the-art results. However, the performance of deep learning methods tends to drop once the observation model used in training mismatches the one in test time. In addition, most deep learning methods require vast amounts of training data, which are not accessible in many applications. To mitigate these disadvantages, we propose to combine two image restoration approaches: (i) Deep Image Prior (DIP), which trains a convolutional neural network (CNN) from scratch in test time using the given degraded image. It does not require any training data and builds on the implicit prior imposed by the CNN architecture; and (ii) a backprojection (BP) fidelity term, which is an alternative to the standard least squares loss that is usually used in previous DIP works. We demonstrate the performance of the proposed method, termed BP-DIP, on the deblurring task and show its advantages over the plain DIP, with both higher PSNR values and better inference run-time.

deep learning, kernel, neural network, (21 more...)

2003.05417

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Coordinate-wise Armijo's condition: General case

Truong, Tuyen Trung

Let $z=(x,y)$ be coordinates for the product space $\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$. Let $f:\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}\rightarrow \mathbb{R}$ be a $C^1$ function, and $\nabla f=(\partial _xf,\partial _yf)$ its gradient. Fix $0<\alpha <1$. For a point $(x,y) \in \mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$, a number $\delta >0$ satisfies Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} f(x-\delta \partial _xf,y-\delta \partial _yf)-f(x,y)\leq -\alpha \delta (||\partial _xf||^2+||\partial _yf||^2). \end{eqnarray*} In one previous paper, we proposed the following {\bf coordinate-wise} Armijo's condition. Fix again $0<\alpha <1$. A pair of positive numbers $\delta _1,\delta _2>0$ satisfies the coordinate-wise variant of Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} [f(x-\delta _1\partial _xf(x,y), y-\delta _2\partial _y f(x,y))]-[f(x,y)]\leq -\alpha (\delta _1||\partial _xf(x,y)||^2+\delta _2||\partial _yf(x,y)||^2). \end{eqnarray*} Previously we applied this condition for functions of the form $f(x,y)=f(x)+g(y)$, and proved various convergent results for them. For a general function, it is crucial - for being able to do real computations - to have a systematic algorithm for obtaining $\delta _1$ and $\delta _2$ satisfying the coordinate-wise version of Armijo's condition, much like Backtracking for the usual Armijo's condition. In this paper we propose such an algorithm, and prove according convergent results. We then analyse and present experimental results for some functions such as $f(x,y)=a|x|+y$ (given by Asl and Overton in connection to Wolfe's method), $f(x,y)=x^3 sin (1/x) + y^3 sin(1/y)$ and Rosenbrock's function.

armijo, backtracking gd, coordinate-wise backtracking gd, (14 more...)

2003.05252

Country:

Europe > Norway > Eastern Norway > Oslo (0.05)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Closure Properties for Private Classification and Online Prediction

Alon, Noga, Beimel, Amos, Moran, Shay, Stemmer, Uri

Let H be a class of boolean functions and consider acomposed class H' that is derived from H using some arbitrary aggregation rule (for example, H' may be the class of all 3-wise majority votes of functions in H). We upper bound the Littlestone dimension of H' in terms of that of H. The bounds are proved using combinatorial arguments that exploit a connection between the Littlestone dimension and Thresholds. As a corollary, we derive closure properties for online learning and private PAC learning. The derived bounds on the Littlestone dimension exhibit an undesirable super-exponential dependence. For private learning, we prove close to optimal bounds that circumvents this suboptimal dependency. The improved bounds on the sample complexity of private learning are derived algorithmically via transforming a private learner for the original class H to a private learner for the composed class H'. Using the same ideas we show that any (proper or improper) private algorithm that learns a class of functions H in the realizable case (i.e., when the examples are labeled by some function in the class) can be transformed to a private algorithm that learns the class H in the agnostic case.

algorithm, learner, probability, (12 more...)

2003.04509

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > District of Columbia > Washington (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(8 more...)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.67)

Blanchet, Jose, Xu, Renyuan, Zhou, Zhengyuan

Delay-Adaptive Learning in Generalized Linear Contextual Bandits

The growing availability of user-specific data has welcomed the exciting era of personalized recommendation, a paradigm that uncovers the heterogeneity across individuals and provides tailored service decisions that lead to improved outcomes. Such heterogeneity is ubiquitous across a variety of application domains (including online advertising, medical treatment assignment, product/news recommendation ([29], [9],[11],[7],[42])) and manifests itself as different individuals responding differently to the recommended items. Rising to this opportunity, contextual bandits ([8, 39, 22, 1, 3]) have emerged to be the predominant mathematical formalism that provides an elegant and powerful formulation: its three core components, the features (representing individual characteristics), the actions (representing the recommendation), and the rewards (representing the observed feedback), capture the salient aspects of the problem and provide fertile ground for developing algorithms that balance exploring and exploiting users' heterogeneity. As such, the last decade has witnessed extensive research efforts in developing effective and efficient contextual bandits algorithms. In particular, two types of algorithms-upper confidence bounds (UCB) based algorithms ([29, 20, 15, 26, 30]) and Thompson sampling (TS) based algorithms ([4, 5, 40, 41, 2])-stand out from this flourishing and fruitful line of work: their theoretical guarantees have been analyzed in many settings, often yielding (near-)optimal regret bounds; their empirical performance have been thoroughly validated, often providing insights into their practical efficacy (including the consensus that TS based algorithms, although sometimes suffering from intensive computation for posterior updates, are generally more effective than their UCB counterparts, whose performance can be sensitive to hyper-parameter tuning). To a large extent, these two family of algorithms have been widely deployed in many modern recommendation engines.

algorithm, bandit, contextual bandit, (14 more...)

2003.05174

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > New York (0.04)
North America > United States > Hawaii (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.55)
Information Technology > Services (0.48)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Daulbaev, Talgat, Katrutsa, Alexandr, Markeeva, Larisa, Gusak, Julia, Cichocki, Andrzej, Oseledets, Ivan

Interpolated Adjoint Method for Neural ODEs

In this paper, we propose a method, which allows us to alleviate or completely avoid the notorious problem of numerical instability and stiffness of the adjoint method for training neural ODE. On the backward pass, we propose to use the machinery of smooth function interpolation to restore the trajectory obtained during the forward integration. We show the viability of our approach, both in theory and practice.

adjoint method, anode, backward pass, (14 more...)

2003.05271

Country:

Asia > Russia (0.15)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Nye, Maxwell I., Solar-Lezama, Armando, Tenenbaum, Joshua B., Lake, Brenden M.

Learning Compositional Rules via Neural Program Synthesis

arXiv.org Artificial IntelligenceMar-11-2020

Many aspects of human reasoning, including language, require learning rules from very little data. Humans can do this, often learning systematic rules from very few examples, and combining these rules to form compositional rule-based systems. Current neural architectures, on the other hand, often fail to generalize in a compositional manner, especially when evaluated in ways that vary systematically from training. In this work, we present a neuro-symbolic model which learns entire rule systems from a small set of examples. Instead of directly predicting outputs from inputs, we train our model to induce the explicit system of rules governing a set of previously seen examples, drawing upon techniques from the neural program synthesis literature. Our rule-synthesis approach outperforms neural meta-learning techniques in three domains: an artificial instruction-learning domain used to evaluate human learning, the SCAN challenge datasets, and learning rule-based translations of number words into integers for a wide range of human languages.

grammar, learning compositional rule, support example, (13 more...)

arXiv.org Artificial Intelligence

2003.05562

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Peponakis, Manolis, Mastora, Anna, Kapidakis, Sarantos, Doerr, Martin

Expressiveness and machine processability of Knowledge Organization Systems (KOS): An analysis of concepts and relations

arXiv.org Artificial IntelligenceMar-11-2020

This study considers the expressiveness (that is the expressive power or expressivity) of different types of Knowledge Organization Systems (KOS) and discusses its potential to be machine-processable in the context of the Semantic Web. For this purpose, the theoretical foundations of KOS are reviewed based on conceptualizations introduced by the Functional Requirements for Subject Authority Data (FRSAD) and the Simple Knowledge Organization System (SKOS); natural language processing techniques are also implemented. Applying a comparative analysis, the dataset comprises a thesaurus (Eurovoc), a subject headings system (LCSH) and a classification scheme (DDC). These are compared with an ontology (CIDOC-CRM) by focusing on how they define and handle concepts and relations. It was observed that LCSH and DDC focus on the formalism of character strings (nomens) rather than on the modelling of semantics; their definition of what constitutes a concept is quite fuzzy, and they comprise a large number of complex concepts. By contrast, thesauri have a coherent definition of what constitutes a concept, and apply a systematic approach to the modelling of relations. Ontologies explicitly define diverse types of relations, and are by their nature machine-processable. The paper concludes that the potential of both the expressiveness and machine processability of each KOS is extensively regulated by its structural rules. It is harder to represent subject headings and classification schemes as semantic networks with nodes and arcs, while thesauri are more suitable for such a representation. In addition, a paradigm shift is revealed which focuses on the modelling of relations between concepts, rather than the concepts themselves.

knowledge organization system, ontology, relation, (10 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s00799-019-00269-0

2003.05258

Country:

Europe > Netherlands > South Holland > The Hague (0.04)
Europe > Germany > Bavaria > Lower Franconia > Würzburg (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(14 more...)

Genre: Research Report > New Finding (0.46)

Industry: Government (0.93)

Technology:

Information Technology > Communications > Web > Semantic Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.92)

arXiv.org Artificial IntelligenceMar-11-2020

Magic: the Gathering is as Hard as Arithmetic

Biderman, Stella

Magic: the Gathering is a popular and famously complicated card game about magical combat. Recently, several authors including Chatterjee and Ibsen-Jensen (2016) and Churchill, Biderman, and Herrick (2019) have investigated the computational complexity of playing Magic optimally. In this paper we show that the ``mate-in-$n$'' problem for Magic is $\Delta^0_n$-hard and that optimal play in two-player Magic is non-arithmetic in general. These results apply to how real Magic is played, can be achieved using standard-size tournament legal decks, and do not rely on stochasticity or hidden information. Our paper builds upon the construction that Churchill, Biderman, and Herrick (2019) used to show that this problem was at least as hard as the halting problem.

construction, gathering, turing machine, (15 more...)

arXiv.org Artificial Intelligence

2003.05119

Country:

North America > United States > Illinois (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Germany (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games (1.00)

Technology: Information Technology > Artificial Intelligence > Games > Computer Games (0.76)