AITopics | Daniely, Amit

Collaborating Authors

Daniely, Amit

SGD Learns the Conjugate Kernel Class of the Network

Daniely, Amit

Neural Information Processing Systems, Dec-31-2017

While stochastic gradient descent (SGD) from a random initialization is probably the most popular supervised learning algorithm today, we have very few results that describe conditions guaranteeing its success. Indeed, to the best of our knowledge, Andoni et al. [2014] provide the only known result of this form, and it is valid in a rather restricted setting: depth-2 networks, a Gaussian underlying distribution, full gradient descent (rather than SGD), and a regression task in which the learnt function is a constant-degree polynomial. We build on the framework of Daniely et al. [2016], which associates a reproducing kernel with a network architecture, to establish guarantees on SGD in a rather general setting.

activation, artificial intelligence, neural network, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

SGD Learns the Conjugate Kernel Class of the Network

Daniely, Amit

arXiv.org Machine Learning, May-19-2017

We show that the standard stochastic gradient descent (SGD) algorithm is guaranteed to learn, in polynomial time, a function that is competitive with the best function in the conjugate kernel space of the network, as defined in Daniely, Frostig and Singer. The result holds for log-depth networks from a rich family of architectures. To the best of our knowledge, it is the first polynomial-time guarantee for the standard neural network learning algorithm for networks of depth more than two. As corollaries, it follows that for neural networks of any depth between $2$ and $\log(n)$, SGD is guaranteed to learn, in polynomial time, constant-degree polynomials with polynomially bounded coefficients. Likewise, it follows that SGD on large enough networks can learn any continuous function (not in polynomial time), complementing classical expressivity results.

activation, deep learning, neural network, (18 more...)

arXiv.org Machine Learning

1702.08503

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity

Daniely, Amit, Frostig, Roy, Singer, Yoram

arXiv.org Artificial Intelligence, May-19-2017

We develop a general duality between neural networks and compositional kernels, striving towards a better understanding of deep learning. We show that initial representations generated by common random initializations are sufficiently rich to express all functions in the dual kernel space. Hence, though the training objective is hard to optimize in the worst case, the initial weights form a good starting point for optimization. Our dual view also reveals a pragmatic and aesthetic perspective of neural networks and underscores their expressive power.

activation, deep learning, neural network, (20 more...)

arXiv.org Artificial Intelligence

1602.05897

Country: Europe > United Kingdom > England (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity

Daniely, Amit, Frostig, Roy, Singer, Yoram

Neural Information Processing Systems, Dec-31-2016

We develop a general duality between neural networks and compositional kernel Hilbert spaces. We introduce the notion of a computation skeleton, an acyclic graph that succinctly describes both a family of neural networks and a kernel space. Random neural networks are generated from a skeleton through node replication followed by sampling from a normal distribution to assign weights. The kernel space consists of functions that arise by compositions, averaging, and non-linear transformations governed by the skeleton's graph topology and activation functions. We prove that random networks induce representations which approximate the kernel space. In particular, it follows that random weight initialization often yields a favorable starting point for optimization despite the worst-case intractability of training neural networks.
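The statement that random networks induce representations approximating the kernel space can be illustrated in the simplest possible case. The sketch below is not the authors' code: it assumes a single fully connected hidden layer, unit-norm inputs, standard Gaussian weights, and a ReLU scaled by sqrt(2) so that its second moment under a standard Gaussian is 1. Under those assumptions the inner product of the random features should concentrate, as the width grows, around the known closed form for normalized ReLU (the degree-1 arc-cosine kernel). The function names are hypothetical and chosen only for this illustration.

```python
import math
import random


def relu_dual_kernel(rho):
    """Closed form of E[s(u) s(v)] for s(t) = sqrt(2) * max(0, t), where
    (u, v) are standard Gaussians with correlation rho."""
    rho = max(-1.0, min(1.0, rho))  # guard against rounding outside [-1, 1]
    return (math.sqrt(1.0 - rho * rho) + (math.pi - math.acos(rho)) * rho) / math.pi


def random_relu_features(x, weights):
    """Map x to the hidden layer of a random one-layer ReLU network.
    The sqrt(2 / width) scale makes feature inner products estimate the kernel."""
    scale = math.sqrt(2.0 / len(weights))
    return [scale * max(0.0, sum(w_i * x_i for w_i, x_i in zip(w, x)))
            for w in weights]


def empirical_kernel(x, y, width, rng):
    """Inner product of the random representations of x and y."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(len(x))] for _ in range(width)]
    fx = random_relu_features(x, weights)
    fy = random_relu_features(y, weights)
    return sum(a * b for a, b in zip(fx, fy))


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, 0.0]
    y = [0.5, math.sqrt(0.75)]  # unit vector with <x, y> = 0.5
    print("random features:", empirical_kernel(x, y, 20000, rng))
    print("closed form:    ", relu_dual_kernel(0.5))
```

Increasing the width should shrink the gap between the two printed values. Deeper skeletons compose such kernels layer by layer, which this one-layer sketch does not attempt to show.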

artificial intelligence, neural network, skeleton, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England (0.14)
Europe > Spain (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

More data speeds up training time in learning halfspaces over sparse vectors

Daniely, Amit, Linial, Nati, Shalev-Shwartz, Shai

Neural Information Processing Systems, Dec-31-2013

The increased availability of data in recent years has led several authors to ask whether it is possible to use data as a computational resource. That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required to perform the learning task? We give the first positive answer to this question for a natural supervised learning problem: we consider agnostic PAC learning of halfspaces over $3$-sparse vectors in $\{-1,1,0\}^n$. This class is inefficiently learnable using $O\left(n/\epsilon^2\right)$ examples. Our main contribution is a novel, non-cryptographic methodology for establishing computational-statistical gaps, which allows us to show that, under a widely believed assumption that refuting random $\mathrm{3CNF}$ formulas is hard, efficiently learning this class using $O\left(n/\epsilon^2\right)$ examples is impossible. We further show that under stronger hardness assumptions, even $O\left(n^{1.499}/\epsilon^2\right)$ examples do not suffice. On the other hand, we show a new algorithm that learns this class efficiently using $\tilde{\Omega}\left(n^2/\epsilon^2\right)$ examples. This formally establishes the tradeoff between sample and computational complexity for a natural supervised learning problem.

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel (0.15)

Industry: Education > Focused Education > Special Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.67)

Multiclass Learning Approaches: A Theoretical Comparison with Implications

Daniely, Amit, Sabato, Sivan, Shwartz, Shai S.

Neural Information Processing Systems, Dec-31-2012

We theoretically analyze and compare the following five popular multiclass classification methods: One vs. All, All Pairs, Tree-based classifiers, Error Correcting Output Codes (ECOC) with randomly generated code matrices, and Multiclass SVM. In the first four methods, the classification is based on a reduction to binary classification. We consider the case where the binary classifier comes from a class of VC dimension $d$, and in particular from the class of halfspaces over $\mathbb{R}^d$. We analyze both the estimation error and the approximation error of these methods. Our analysis reveals interesting conclusions of practical relevance, regarding the success of the different approaches under various conditions. Our proof technique employs tools from VC theory to analyze the approximation error of hypothesis classes. This is in sharp contrast to most, if not all, previous uses of VC theory, which only deal with estimation error.
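To make the idea of a reduction to binary classification concrete, here is a minimal sketch of two of the methods named above, One vs. All and All Pairs, with halfspaces as the binary class. It is an illustration under stated assumptions, not the construction analyzed in the paper: the binary learner is a plain perceptron (chosen only because it is short), the data are assumed linearly separable for every binary subproblem, and all function names are hypothetical.

```python
import itertools


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(points, labels, epochs=200):
    """Learn a halfspace (w, b) from labels in {-1, +1} with the perceptron rule."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * (dot(w, x) + b) <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training point is strictly on its correct side
            break
    return w, b


def halfspace_score(model, x):
    w, b = model
    return dot(w, x) + b


def train_one_vs_all(points, labels):
    """One binary problem per class: that class against all the others."""
    return {c: train_perceptron(points, [1 if y == c else -1 for y in labels])
            for c in sorted(set(labels))}


def predict_one_vs_all(models, x):
    """Predict the class whose halfspace gives the largest score."""
    return max(models, key=lambda c: halfspace_score(models[c], x))


def train_all_pairs(points, labels):
    """One binary problem per unordered pair of classes, using only their examples."""
    models = {}
    for a, b in itertools.combinations(sorted(set(labels)), 2):
        pair = [(x, y) for x, y in zip(points, labels) if y in (a, b)]
        models[(a, b)] = train_perceptron([x for x, _ in pair],
                                          [1 if y == a else -1 for _, y in pair])
    return models


def predict_all_pairs(models, x):
    """Each pairwise classifier votes; ties go to the smallest class label."""
    votes = {}
    for (a, b), model in models.items():
        winner = a if halfspace_score(model, x) > 0 else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(sorted(votes), key=lambda c: votes[c])
```

With k classes, One vs. All trains k binary classifiers on the full sample, while All Pairs trains k(k-1)/2 classifiers, each on the examples of two classes only. The two reductions therefore induce different multiclass hypothesis classes, which is the kind of difference the estimation and approximation error analysis compares.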

approximation error, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Asia > Middle East > Israel (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.36)
