Learning with Confidence
We characterize a notion of confidence that arises in learning or updating beliefs: the amount of trust one has in incoming information and its impact on the belief state. This learner's confidence can be used alongside (and is easily mistaken for) probability or likelihood, but it is fundamentally a different concept -- one that captures many familiar concepts in the literature, including learning rates and number of training epochs, Shafer's weight of evidence, and Kalman gain. We formally axiomatize what it means to learn with confidence, give two canonical ways of measuring confidence on a continuum, and prove that confidence can always be represented in this way. Under additional assumptions, we derive more compact representations of confidence-based learning in terms of vector fields and loss functions. These representations induce an extended language of compound "parallel" observations. We characterize Bayes Rule as the special case of an optimizing learner whose loss representation is a linear expectation.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
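As a rough illustration of the abstract above (and not the paper's axiomatization), one familiar way a scalar confidence can modulate belief updating is the tempered-Bayes update, which raises the likelihood to a power $\beta \geq 0$; $\beta = 1$ recovers Bayes' rule and $\beta = 0$ ignores the observation entirely:

```latex
% Illustrative only: a scalar confidence \beta tempering a Bayesian update.
% \beta = 1 recovers Bayes' rule; \beta = 0 leaves the prior unchanged.
\[
  p_{\beta}(h \mid x) \;=\;
  \frac{p(h)\, p(x \mid h)^{\beta}}{\sum_{h'} p(h')\, p(x \mid h')^{\beta}}
\]
```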
Online Learning of Neural Networks
Daniely, Amit, Mehalel, Idan, Mossel, Elchanan
We study online learning of feedforward neural networks with the sign activation function that implement functions from the unit ball in $\mathbb{R}^d$ to a finite label set $\{1, \ldots, Y\}$. First, we characterize a margin condition that is sufficient and in some cases necessary for online learnability of a neural network: every neuron in the first hidden layer classifies all instances with some margin $\gamma$ bounded away from zero. Quantitatively, we prove that for any net, the optimal mistake bound is at most approximately $\mathtt{TS}(d,\gamma)$, the $(d,\gamma)$-totally-separable packing number, a more restricted variant of the standard $(d,\gamma)$-packing number. We complement this result by constructing a net on which any learner makes $\mathtt{TS}(d,\gamma)$ many mistakes. We also give a quantitative lower bound of approximately $\mathtt{TS}(d,\gamma) \geq \max\{1/(\gamma\sqrt{d})^d, d\}$ when $\gamma \leq 1/2$, implying that for some nets and input sequences every learner will make $\exp(d)$ many mistakes, and that a dimension-free mistake bound is almost always impossible. To remedy this inevitable dependence on $d$, it is natural to seek additional restrictions on the network that remove the dependence on $d$. We study two such restrictions. The first is the multi-index model, in which the function computed by the net depends only on $k \ll d$ orthonormal directions. We prove a mistake bound of approximately $(1.5/\gamma)^{k + 2}$ in this model. The second is the extended margin assumption. In this setting, we assume that all neurons (in all layers) classify every incoming input from the previous layer with margin $\gamma$ bounded away from zero. In this model, we prove a mistake bound of approximately $(\log Y)/\gamma^{O(L)}$, where $L$ is the depth of the network.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Czechia > Prague (0.04)
- (2 more...)
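For readers less familiar with the mistake-bound model used throughout the abstract above, here is a minimal sketch of the standard online protocol, assuming a hypothetical `learner` object with `predict` and `update` methods; bounds like $\mathtt{TS}(d,\gamma)$ cap, in the worst case, the count this loop returns:

```python
# Minimal sketch of the online (mistake-bound) protocol; `learner` and
# `stream` are hypothetical placeholders, not from the paper.
def run_online(learner, stream):
    mistakes = 0
    for x, y in stream:             # x in the unit ball of R^d, y in {1, ..., Y}
        y_hat = learner.predict(x)  # learner commits to a label first
        if y_hat != y:
            mistakes += 1           # worst-case bounds cap this total
        learner.update(x, y)        # learner may update after seeing the truth
    return mistakes
```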
Long Range Navigator (LRN): Extending robot planning horizons beyond metric maps
Schmittle, Matt, Baijal, Rohan, Hatch, Nathan, Scalise, Rosario, Castro, Mateo Guaman, Talia, Sidharth, Khetarpal, Khimya, Boots, Byron, Srinivasa, Siddhartha
A robot navigating an outdoor environment with no prior knowledge of the space must rely on its local sensing to perceive its surroundings and plan. This can come in the form of a local metric map or a local policy with some fixed horizon. Beyond that, there is a fog of unknown space marked with some fixed cost. A limited planning horizon can often result in myopic decisions leading the robot off course, or worse, into very difficult terrain. Ideally, we would like the robot to have full knowledge of a space that can be orders of magnitude larger than a local cost map. In practice, this is intractable due to sparse sensing information and is often computationally expensive. In this work, we make the key observation that long-range navigation only requires identifying good frontier directions for planning, not full map knowledge. To this end, we propose the Long Range Navigator (LRN), which learns an intermediate affordance representation mapping high-dimensional camera images to `affordable' frontiers for planning, and then optimizes for maximum alignment with the desired goal. Notably, LRN is trained entirely on unlabeled egocentric videos, making it easy to scale and adapt to new platforms. Through extensive off-road experiments on Spot and a Big Vehicle, we find that augmenting existing navigation stacks with LRN reduces human interventions at test time and leads to faster decision making, indicating the relevance of LRN. https://personalrobotics.github.io/lrn
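Since the abstract gives no implementation details, the following is only a schematic sketch of the frontier-selection step it describes: hypothetical affordance scores over candidate frontier directions, with the most goal-aligned affordable frontier handed to the local planner. All names and the threshold are assumptions, not from the paper.

```python
import numpy as np

# Schematic sketch (names and threshold are hypothetical): choose the
# 'affordable' frontier direction best aligned with the goal direction.
def select_frontier(frontier_dirs, affordance_scores, goal_dir, min_affordance=0.5):
    goal = goal_dir / np.linalg.norm(goal_dir)
    best_dir, best_align = None, -np.inf
    for d, score in zip(frontier_dirs, affordance_scores):
        if score < min_affordance:       # skip frontiers judged untraversable
            continue
        align = np.dot(d / np.linalg.norm(d), goal)
        if align > best_align:           # keep the most goal-aligned frontier
            best_dir, best_align = d, align
    return best_dir                      # hand off to the local metric planner
```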
On Volume Minimization in Conformal Regression
Le Bars, Batiste, Humbert, Pierre
We study the question of volume optimality in split conformal regression, a topic still poorly understood in comparison to coverage control. Using the fact that the calibration step can be seen as an empirical volume minimization problem, we first derive a finite-sample upper bound on the excess volume loss of the interval returned by the classical split method. This important quantity measures the difference in length between the interval obtained with the split method and the shortest oracle prediction interval. Then, we introduce EffOrt, a methodology that modifies the learning step so that the base prediction function is selected in order to minimize the length of the returned intervals. In particular, our theoretical analysis of the excess volume loss of the prediction sets produced by EffOrt reveals the links between the learning and calibration steps, and notably the impact of the choice of the function class of the base predictor. We also introduce Ad-EffOrt, an extension of the previous method, which produces intervals whose size adapts to the value of the covariate. Finally, we evaluate the empirical performance and the robustness of our methodologies.
- North America > Canada > Ontario > Toronto (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
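For context on the calibration step the abstract refers to, here is a minimal sketch of classical split conformal regression (the baseline the paper analyzes; EffOrt modifies the learning step, which this sketch does not attempt):

```python
import numpy as np

# Minimal split conformal sketch: calibrate on held-out residuals, then
# return a symmetric interval around the point prediction. `predict` is
# any fitted base regressor (a function from inputs to predictions).
def split_conformal_interval(predict, X_cal, y_cal, x_new, alpha=0.1):
    scores = np.abs(y_cal - predict(X_cal))          # calibration residuals
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    qhat = np.quantile(scores, level)                # conformal quantile
    mu = predict(x_new)
    return mu - qhat, mu + qhat                      # interval of length 2*qhat
```

Note that the returned interval has the same length $2\hat{q}$ for every covariate value; Ad-EffOrt, as described above, relaxes exactly this fixed-length behavior by letting the interval size adapt to the covariate.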
On Optimal Learning Under Targeted Data Poisoning
Hanneke, Steve, Karbasi, Amin, Mahmoody, Mohammad, Mehalel, Idan, Moran, Shay
Consider the task of learning a hypothesis class $\mathcal{H}$ in the presence of an adversary that can replace up to an $\eta$ fraction of the examples in the training set with arbitrary adversarial examples. The adversary aims to fail the learner on a particular target test point $x$ which is known to the adversary but not to the learner. In this work we aim to characterize the smallest achievable error $\epsilon=\epsilon(\eta)$ by the learner in the presence of such an adversary in both realizable and agnostic settings. We fully achieve this in the realizable setting, proving that $\epsilon=\Theta(\mathtt{VC}(\mathcal{H})\cdot \eta)$, where $\mathtt{VC}(\mathcal{H})$ is the VC dimension of $\mathcal{H}$. Remarkably, we show that the upper bound can be attained by a deterministic learner. In the agnostic setting we reveal a more elaborate landscape: we devise a deterministic learner with a multiplicative regret guarantee of $\epsilon \leq C\cdot\mathtt{OPT} + O(\mathtt{VC}(\mathcal{H})\cdot \eta)$, where $C > 1$ is a universal numerical constant. We complement this by showing that for any deterministic learner there is an attack which worsens its error to at least $2\cdot \mathtt{OPT}$. This implies that a multiplicative deterioration in the regret is unavoidable in this case. Finally, the algorithms we develop for achieving the optimal rates are inherently improper. Nevertheless, we show that for a variety of natural concept classes, such as linear classifiers, it is possible to retain the dependence $\epsilon=\Theta_{\mathcal{H}}(\eta)$ by a proper algorithm in the realizable setting. Here $\Theta_{\mathcal{H}}$ conceals a polynomial dependence on $\mathtt{VC}(\mathcal{H})$.
- Asia > Middle East > Israel (0.04)
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Difference between Local Response Normalization and Batch Normalization
Normalization has become an important component of deep neural networks, compensating for the unbounded nature of certain activation functions such as ReLU and ELU. With these activation functions, the layer outputs are not constrained to a bounded range (such as [-1, 1] for tanh); rather, they can grow as large as the training allows. To keep these unbounded activations from inflating the output layer values, normalization is applied just before the activation function. There are two common normalization techniques used in deep neural networks, and they are often confused by beginners. In this tutorial, both normalization techniques are explained in detail, highlighting their key differences.
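To make the contrast concrete, here is a minimal NumPy sketch of both techniques; the LRN hyperparameters shown (k, n, alpha, beta) are the AlexNet-style defaults, and shapes follow the common (batch, channels, height, width) layout:

```python
import numpy as np

def local_response_norm(x, n=5, k=2.0, alpha=1e-4, beta=0.75):
    # x: (N, C, H, W). Each activation is divided by a factor computed
    # from the sum of squares over n neighboring channels at the same
    # spatial position. No learned parameters are involved.
    _, C, _, _ = x.shape
    out = np.empty_like(x)
    half = n // 2
    for c in range(C):
        lo, hi = max(0, c - half), min(C, c + half + 1)
        denom = (k + alpha * np.sum(x[:, lo:hi] ** 2, axis=1)) ** beta
        out[:, c] = x[:, c] / denom
    return out

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (N, C, H, W); gamma, beta: (1, C, 1, 1) learned scale and shift.
    # Each channel is standardized using statistics computed over the
    # batch and spatial dimensions (training-mode behavior).
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta
```

The difference is visible directly in the code: LRN is parameter-free and normalizes across neighboring channels at each spatial position, while batch normalization learns a per-channel scale and shift and normalizes across the batch.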