Parameter Symmetry Breaking and Restoration Determines the Hierarchical Learning in AI Systems
Ziyin, Liu, Xu, Yizhou, Poggio, Tomaso, Chuang, Isaac
A growing number of phenomena that are virtually universal to the learning process have been discovered in contemporary AI systems. These phenomena are shared by models with different architectures, trained on different datasets, and with different training techniques. The existence of such universal phenomena calls for one or a few universal explanations. To date, however, most of these phenomena are described by narrow theories tailored to explain each phenomenon separately, often focusing on specific models trained on specific tasks or loss functions, and in isolation from other phenomena that are indispensable parts of the deep learning phenomenology. It is therefore desirable to have a universal perspective, if not a universal theory, that explains as many phenomena as possible. In the spirit of science, a universal perspective should be independent of system details such as minor variations in architecture, the choice of loss function, or the training technique. A universal theory would give the field a simplified paradigm for thinking about and understanding AI systems, as well as a potential design principle for a new generation of more efficient and capable models.
Formation of Representations in Neural Networks
Ziyin, Liu, Chuang, Isaac, Galanti, Tomer, Poggio, Tomaso
Understanding neural representations will help open the black box of neural networks and advance our scientific understanding of modern AI systems. However, how complex, structured, and transferable representations emerge in modern neural networks has remained a mystery. Building on previous results, we propose the Canonical Representation Hypothesis (CRH), which posits a set of six alignment relations that universally govern the formation of representations in most hidden layers of a neural network. Under the CRH, the latent representations (R), weights (W), and neuron gradients (G) become mutually aligned during training. This alignment implies that neural networks naturally learn compact representations, in which neurons and weights are invariant to task-irrelevant transformations. We then show that the breaking of the CRH leads to the emergence of reciprocal power-law relations between R, W, and G, which we refer to as the Polynomial Alignment Hypothesis (PAH). We present a minimal-assumption theory demonstrating that the balance between gradient noise and regularization is crucial for the emergence of the canonical representations. The CRH and PAH raise the exciting possibility of unifying key deep learning phenomena, including neural collapse and the neural feature ansatz, within a single framework.
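As a rough illustration of the kind of alignment the CRH describes, the following minimal sketch trains a two-layer tanh network with minibatch SGD and weight decay on synthetic data, then measures a Frobenius-cosine alignment between the representation Gram matrix R, the outgoing-weight Gram matrix W, and the neuron-gradient Gram matrix G. The architecture, data, alignment metric, and the particular pairing of matrices are illustrative assumptions, not the paper's six relations.

```python
# Minimal sketch (illustrative, not the authors' code or definitions).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, n, batch = 20, 30, 512, 32
X = rng.normal(size=(n, d_in))
y = np.tanh(X @ rng.normal(size=(d_in, 1)))        # synthetic regression target

W1 = rng.normal(size=(d_in, d_h)) / np.sqrt(d_in)  # layer whose hidden neurons we probe
W2 = rng.normal(size=(d_h, 1)) / np.sqrt(d_h)
lr, wd = 0.05, 1e-3                                # SGD step size and weight decay

def alignment(A, B):
    """Cosine similarity between two symmetric matrices (Frobenius inner product)."""
    return np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B))

for step in range(5000):
    idx = rng.integers(0, n, size=batch)           # minibatch -> gradient noise
    Xb, yb = X[idx], y[idx]
    H = np.tanh(Xb @ W1)                           # hidden representation
    err = H @ W2 - yb
    gH = err @ W2.T / batch                        # gradient w.r.t. the hidden neurons
    gW1 = Xb.T @ (gH * (1 - H ** 2)) + wd * W1
    gW2 = H.T @ err / batch + wd * W2
    W1 -= lr * gW1
    W2 -= lr * gW2

# Gram matrices of representations, outgoing weights, and neuron gradients.
H = np.tanh(X @ W1)
gH = (H @ W2 - y) @ W2.T / n
R = H.T @ H / n
G = gH.T @ gH / n
Wg = W2 @ W2.T
print("align(R, W):", alignment(R, Wg))
print("align(R, G):", alignment(R, G))
print("align(W, G):", alignment(Wg, G))
```

Following the abstract's claim, it is the balance of gradient noise (from minibatching) and regularization (weight decay) that should push these scores toward 1 in such a toy setting; setting wd to zero would be expected to weaken the alignment.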
Recurrent Neural Networks in the Eye of Differential Equations
Niu, Murphy Yuezhen, Horesh, Lior, Chuang, Isaac
To understand the fundamental trade-offs between the training stability, temporal dynamics, and architectural complexity of recurrent neural networks~(RNNs), we directly analyze RNN architectures using numerical methods for ordinary differential equations~(ODEs). We define a general family of RNNs, the ODERNNs, by relating the composition rules of RNNs to integration methods of ODEs at discrete time steps. We show that the degree of an RNN's functional nonlinearity $n$ and the range of its temporal memory $t$ can be mapped to the stage of the corresponding Runge-Kutta recursion and the order of the time derivative of the ODEs, respectively. We prove that popular RNN architectures, such as LSTM and URNN, fit into different orders of $n$-$t$-ODERNNs. This exact correspondence between RNNs and ODEs helps us establish sufficient conditions for RNN training stability and facilitates more flexible top-down design of new RNN architectures using a large variety of tools from the numerical integration of ODEs. We provide one such example: the Quantum-inspired Universal computing Neural Network~(QUNN), which reduces the required number of training parameters from polynomial in both the data length and the temporal memory length to linear in the temporal memory length only.
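The mapping from RNN composition rules to ODE integration steps can be sketched as follows: a shared vector field f defines an underlying ODE, a one-stage forward-Euler step gives a vanilla-RNN-style cell, and a two-stage Runge-Kutta (midpoint) step gives a higher-stage cell built from the same f. The vector field, step size, and dimensions below are illustrative assumptions, not the paper's $n$-$t$-ODERNN construction.

```python
# Minimal sketch (illustrative): RNN cells as discrete integration steps of
# the ODE  dh/dt = f(h, x) = tanh(W h + U x) - h.
import numpy as np

rng = np.random.default_rng(1)
d_h, d_x, T = 16, 8, 50
W = rng.normal(size=(d_h, d_h)) / np.sqrt(d_h)
U = rng.normal(size=(d_h, d_x)) / np.sqrt(d_x)
dt = 0.1                                   # integration step size

def f(h, x):
    """Shared vector field of the underlying ODE."""
    return np.tanh(W @ h + U @ x) - h

def euler_cell(h, x):
    """One-stage (forward Euler) step: a vanilla-RNN-style recursion."""
    return h + dt * f(h, x)

def rk2_cell(h, x):
    """Two-stage Runge-Kutta (midpoint) step built from the same vector field."""
    k1 = f(h, x)
    k2 = f(h + 0.5 * dt * k1, x)
    return h + dt * k2

xs = rng.normal(size=(T, d_x))
h_euler = np.zeros(d_h)
h_rk2 = np.zeros(d_h)
for x in xs:                               # unroll both cells over the same input sequence
    h_euler = euler_cell(h_euler, x)
    h_rk2 = rk2_cell(h_rk2, x)

print("final-state gap between 1-stage and 2-stage cells:",
      np.linalg.norm(h_euler - h_rk2))
```

In this toy setting, the number of Runge-Kutta stages plays the role of the nonlinearity degree $n$ in the abstract's correspondence, while longer temporal memory would require higher-order time derivatives of the ODE.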