AITopics | approximation curve

approximation curve

Is Deeper Better only when Shallow is Good?

Eran Malach, Shai Shalev-Shwartz

Neural Information Processing Systems | Feb-12-2026, 08:47:27 GMT

While current works account for the importance of depth for the expressive power of neural-networks, it remains an open question whether these benefits are exploited during a gradient-based optimization process.

artificial intelligence, arxiv preprint arxiv, machine learning, (17 more...)

Country:

Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Is Deeper Better only when Shallow is Good?

Eran Malach, Shai Shalev-Shwartz

Neural Information Processing Systems | Oct-2-2025, 20:31:27 GMT

Neural Information Processing Systems http://nips.cc/

approximation curve, artificial intelligence, machine learning, (16 more...)

Country:

North America (0.28)
Asia > Middle East > Israel (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Reviews: Is Deeper Better only when Shallow is Good?

Neural Information Processing Systems | Jan-24-2025, 04:14:09 GMT

This is a good paper that suggests excellent directions for new work. The key point is captured in this statement: "we conjecture that a distribution which cannot be approximated by a shallow network cannot be learned using a gradient-based algorithm, even when using a deep architecture." The authors provide first steps towards investigating this claim. There has been a small amount of work on the typical expressivity of neural networks, in addition to the "worst-case approach." See the papers "Complexity of linear regions in deep networks" and "Deep ReLU Networks Have Surprisingly Few Activation Patterns" by Hanin and Rolnick, which prove that while the number of linear regions can be made to grow exponentially with the depth, the typical number of linear regions is much smaller. See also "Do deep nets really need to be deep?" by Ba and Caruana, which indicates that once deep networks have learned a function, shallow networks can often be trained to distill the deep networks without appreciable performance loss.
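The review's remark that the number of linear regions "can be made to grow exponentially with the depth" can be illustrated with a standard textbook construction. The sketch below is not taken from the paper or the review: it composes a tent map, written as a two-unit ReLU layer, and counts the linear pieces of the result on [0, 1]. The names `tent`, `deep_tent`, and `count_linear_pieces` are hypothetical and chosen only for this illustration.

```python
def relu(x):
    return max(0.0, x)


def tent(x):
    # Tent map on [0, 1] as a two-unit ReLU layer:
    # equals 2x for x < 0.5 and 2 - 2x for x >= 0.5.
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_tent(x, depth):
    # Compose the tent layer `depth` times (a deep, narrow ReLU network).
    for _ in range(depth):
        x = tent(x)
    return x


def count_linear_pieces(depth, grid_bits=12):
    # Evaluate on a dyadic grid so that every breakpoint (a multiple of
    # 2**-depth) falls exactly on a grid point; requires depth <= grid_bits.
    n = 2 ** grid_bits
    xs = [i / n for i in range(n + 1)]
    ys = [deep_tent(x, depth) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) * n for i in range(n)]
    pieces = 1
    for a, b in zip(slopes, slopes[1:]):
        if abs(a - b) > 1e-6:  # a slope change marks a new linear piece
            pieces += 1
    return pieces


if __name__ == "__main__":
    for depth in range(1, 6):
        print(depth, count_linear_pieces(depth))
```

Each extra layer doubles the number of pieces in this worst-case construction, which is the exponential growth the review contrasts with the much smaller typical count reported by Hanin and Rolnick.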

deep network, deeper better only, linear region, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

Is Deeper Better only when Shallow is Good?

Eran Malach, Shai Shalev-Shwartz

arXiv.org Machine Learning | Mar-8-2019

Understanding the power of depth in feed-forward neural networks is an ongoing challenge in the field of deep learning theory. While current works account for the importance of depth for the expressive power of neural-networks, it remains an open question whether these benefits are exploited during a gradient-based optimization process. In this work we explore the relation between expressivity properties of deep networks and the ability to train them efficiently using gradient-based algorithms. We give a depth separation argument for distributions with fractal structure, showing that they can be expressed efficiently by deep networks, but not with shallow ones. These distributions have a natural coarse-to-fine structure, and we show that the balance between the coarse and fine details has a crucial effect on whether the optimization process is likely to succeed. We prove that when the distribution is concentrated on the fine details, gradient-based algorithms are likely to fail. Using this result we prove that, at least in some distributions, the success of learning deep networks depends on whether the distribution can be well approximated by shallower networks, and we conjecture that this property holds in general.
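To make the phrase "coarse-to-fine structure" concrete, here is a minimal sketch using the middle-thirds Cantor set, a generic fractal chosen only for illustration. It is an assumption that this resembles the setting of the abstract; it is not claimed to be the distribution the authors study. The helper names `cantor_level` and `total_length` are hypothetical.

```python
from fractions import Fraction


def cantor_level(n):
    # Level-n approximation of the middle-thirds Cantor set:
    # a list of closed intervals (lo, hi) with exact rational endpoints.
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for lo, hi in intervals:
            third = (hi - lo) / 3
            refined.append((lo, lo + third))   # keep the left third
            refined.append((hi - third, hi))   # keep the right third
        intervals = refined
    return intervals


def total_length(intervals):
    # Sum of interval lengths, kept exact with Fraction.
    return sum((hi - lo for lo, hi in intervals), Fraction(0))


if __name__ == "__main__":
    for n in range(5):
        level = cantor_level(n)
        print(n, len(level), total_length(level))
```

Low levels capture the coarse shape with a handful of intervals, while each further level adds detail at a scale three times finer. In this toy picture, a distribution "concentrated on the fine details" would place most of its mass where only the deep levels differ from the shallow ones.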

approximation curve, artificial intelligence, machine learning, (16 more...)

arXiv:1903.03488

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)