
Collaborating Authors

 Newhouse, Laker


Modular Duality in Deep Learning

arXiv.org Machine Learning

An old idea in optimization theory says that since the gradient is a dual vector it may not be subtracted from the weights without first being mapped to the primal space where the weights reside. We take this idea seriously in this paper and construct such a duality map for general neural networks. Our map, which we call modular dualization, forms a unifying theoretical basis for training algorithms that are a) fast and b) scalable. Modular dualization involves first assigning operator norms to layers based on the semantics of each layer, and then using these layerwise norms to recursively induce a duality map on the weight space of the full neural architecture. We conclude by deriving GPU-friendly algorithms for dualizing Embed, Linear and Conv2D layers -- the latter two of which are based on a rectangular Newton-Schulz iteration (Kovarik, 1970; Björck & Bowie, 1971). A variant of our methods was used to set speed records for training NanoGPT. Overall, we hope that our theory of modular duality will yield a next generation of fast and scalable optimizers for general neural architectures.
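The rectangular Newton-Schulz iteration mentioned above can be illustrated with a minimal cubic variant that pushes a matrix toward the nearest semi-orthogonal matrix $U V^\top$. This is a hedged NumPy sketch under simple assumptions (Frobenius pre-normalization, cubic update, illustrative iteration count), not the paper's GPU-tuned implementation; the function name `dualize_linear` is an assumption for this example.

```python
import numpy as np

def dualize_linear(grad, num_iters=30, eps=1e-8):
    """Map a gradient matrix toward the semi-orthogonal factor U V^T of its SVD
    using a cubic rectangular Newton-Schulz iteration (a simple stand-in for
    the tuned iteration used to dualize Linear/Conv2D layers)."""
    X = grad / (np.linalg.norm(grad) + eps)   # Frobenius normalization: singular values now lie in (0, 1]
    flip = X.shape[0] > X.shape[1]
    if flip:                                  # iterate in the wide orientation so X @ X.T is the small Gram matrix
        X = X.T
    for _ in range(num_iters):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X     # fixed point: all singular values equal to 1
    return X.T if flip else X

# Usage: the singular values of the output are approximately 1.
G = np.random.randn(256, 128)
print(np.round(np.linalg.svd(dualize_linear(G), compute_uv=False)[[0, -1]], 3))
```

Each step maps a singular value $\sigma$ to $\tfrac{3\sigma - \sigma^3}{2}$, which converges to 1 for $\sigma \in (0, \sqrt{3})$; the Frobenius pre-normalization guarantees the starting values lie in that basin.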


Old Optimizer, New Norm: An Anthology

arXiv.org Artificial Intelligence

Deep learning optimizers are often motivated through a mix of convex and approximate second-order theory. We select three such methods -- Adam, Shampoo and Prodigy -- and argue that each method can instead be understood as a squarely first-order method without convexity assumptions. In fact, after switching off exponential moving averages, each method is equivalent to steepest descent under a particular norm. By generalizing this observation, we chart a new design space for training algorithms. Different operator norms should be assigned to different tensors based on the role that the tensor plays within the network. For example, while linear and embedding layers may have the same weight space of $\mathbb{R}^{m\times n}$, these layers play different roles and should be assigned different norms. We hope that this idea of carefully metrizing the neural architecture might lead to more stable, scalable and indeed faster training.
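As a hedged illustration of the steepest-descent viewpoint in the abstract above (a standard identity in generic notation, not necessarily the paper's exact framing): for a gradient $g$, a sharpness parameter $\lambda > 0$, and a norm $\|\cdot\|$ with dual norm $\|\cdot\|^{\dagger}$,

$$\arg\min_{\Delta w}\Big[\langle g, \Delta w\rangle + \tfrac{\lambda}{2}\,\|\Delta w\|^{2}\Big] \;=\; -\frac{\|g\|^{\dagger}}{\lambda}\,\arg\max_{\|t\|\le 1}\langle g, t\rangle.$$

Taking $\|\cdot\| = \|\cdot\|_{\infty}$ gives $\Delta w = -\tfrac{\|g\|_{1}}{\lambda}\,\mathrm{sign}(g)$, i.e. sign descent, which is what Adam reduces to once its exponential moving averages are switched off.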


An Assessment of Model-On-Model Deception

arXiv.org Artificial Intelligence

The trustworthiness of highly capable language models is put at risk when they can produce deceptive outputs, and their reliability is undermined when they are themselves vulnerable to deception. In this paper, we introduce a method to investigate complex, model-on-model deceptive scenarios. We create a dataset of over 10,000 misleading explanations by asking Llama-2 7B, 13B, and 70B and GPT-3.5 to justify the wrong answer to questions from the MMLU benchmark. We find that, when models read these explanations, they are all significantly deceived. Worryingly, models of all capabilities are successful at misleading others, while more capable models are only slightly better at resisting deception. We recommend the development of techniques to detect and defend against deception. Since the release of OpenAI's ChatGPT, large language models (LLMs) have revolutionized information accessibility by providing precise answers and supportive explanations to complex queries (Spatharioti et al., 2023; Caramancion, 2024; OpenAI, 2022). However, LLMs have also demonstrated a propensity to hallucinate explanations that are convincing but incorrect (Zhang et al., 2023; Walters & Wilder, 2023; Xu et al., 2024).
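The evaluation protocol described above can be sketched in a few lines. This is a hedged outline, not the paper's code: `ask_model` is a hypothetical stand-in for a call to Llama-2 or GPT-3.5, and the prompt wording, decoding settings, and MMLU loading are assumptions.

```python
def ask_model(prompt: str) -> str:
    """Placeholder for an LLM call (e.g. via an inference API); stubbed so the sketch runs."""
    return "A"

def make_misleading_explanation(question, choices, wrong_answer):
    # Ask a "deceiver" model to justify the wrong answer.
    prompt = (
        f"Question: {question}\nChoices: {choices}\n"
        f"Argue convincingly that the correct answer is {wrong_answer}."
    )
    return ask_model(prompt)

def is_deceived(question, choices, wrong, explanation):
    # Show a "reader" model the question plus the misleading explanation
    # and check whether it adopts the wrong answer.
    prompt = (
        f"Question: {question}\nChoices: {choices}\n"
        f"A tutor says: {explanation}\nAnswer with a single letter."
    )
    return ask_model(prompt).strip().startswith(wrong)

# Deception rate = fraction of questions where the reader switches to the wrong answer.
expl = make_misleading_explanation("2+2=?", "A) 4  B) 5", "B")
print(is_deceived("2+2=?", "A) 4  B) 5", wrong="B", explanation=expl))
```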


ANTN: Bridging Autoregressive Neural Networks and Tensor Networks for Quantum Many-Body Simulation

arXiv.org Artificial Intelligence

Quantum many-body physics simulation is important for understanding fundamental science and has applications to quantum materials design and quantum technology. However, because the Hilbert space grows exponentially with the particle number, direct simulation is intractable. While tensor networks and neural networks are the two state-of-the-art methods for representing quantum states in approximate simulations, each has its own limitations in terms of expressivity and inductive bias. To address these challenges, we develop a novel architecture, Autoregressive Neural TensorNet (ANTN), which bridges tensor networks and autoregressive neural networks. We show that Autoregressive Neural TensorNet parameterizes normalized wavefunctions, allows for exact sampling, generalizes the expressivity of tensor networks and autoregressive neural networks, and inherits a variety of symmetries from autoregressive neural networks. We demonstrate our approach on quantum state learning as well as on finding the ground state of the challenging 2D $J_1$-$J_2$ Heisenberg model with different system sizes and coupling parameters, outperforming both tensor networks and autoregressive neural networks. Our work opens up new opportunities for scientific simulations of quantum many-body physics and quantum technology.
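The normalization and exact-sampling properties claimed above can be illustrated with a generic autoregressive ansatz. This is a hedged sketch, not the ANTN architecture itself: `conditional_probs` is a toy stand-in for the network's conditionals, and real (phase-free) amplitudes are used for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6  # number of spins

def conditional_probs(prefix):
    """Toy conditional p(x_i = 1 | x_<i); a neural network in practice."""
    h = 0.3 * sum(prefix) - 0.1 * len(prefix)
    p1 = 1.0 / (1.0 + np.exp(-h))
    return np.array([1.0 - p1, p1])

def sample_and_amplitude():
    """Exact ancestral sampling: draw each spin from its conditional and
    accumulate the amplitude psi(x) = prod_i sqrt(p(x_i | x_<i))."""
    x, amp = [], 1.0
    for _ in range(N):
        p = conditional_probs(x)
        s = rng.choice(2, p=p)
        amp *= np.sqrt(p[s])
        x.append(int(s))
    return x, amp

# Normalization holds by construction: sum_x |psi(x)|^2 = sum_x prod_i p(x_i|x_<i) = 1.
print(sample_and_amplitude())
```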