AITopics | Susskind, Josh

Collaborating Authors

Susskind, Josh

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model

Zhang, Yizhe, Gu, Jiatao, Wu, Zhuofeng, Zhai, Shuangfei, Susskind, Josh, Jaitly, Navdeep

arXiv.org Artificial IntelligenceOct-27-2023

Autoregressive models for text sometimes generate repetitive and low-quality output because errors accumulate during the steps of generation. This issue is often attributed to exposure bias - the difference between how a model is trained, and how it is used during inference. Denoising diffusion models provide an alternative approach in which a model can revisit and revise its output. However, they can be computationally expensive and prior efforts on text have led to models that produce less fluent output compared to autoregressive models, especially for longer text and paragraphs. In this paper, we propose PLANNER, a model that combines latent semantic diffusion with autoregressive generation, to generate fluent text while exercising global control over paragraphs. The model achieves this by combining an autoregressive "decoding" module with a "planning" module that uses latent diffusion to generate semantic paragraph embeddings in a coarse-to-fine manner. The proposed method is evaluated on various conditional generation tasks, and results on semantic generation, text completion and summarization show its effectiveness in generating high-quality long-form text in an efficient manner.

artificial intelligence, diffusion model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2306.02531

Country:

Asia (1.00)
North America > United States > Maryland > Prince George's County (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Consumer Products & Services (1.00)
Banking & Finance (1.00)
(6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Control3Diff: Learning Controllable 3D Diffusion Models from Single-view Images

Gu, Jiatao, Gao, Qingzhe, Zhai, Shuangfei, Chen, Baoquan, Liu, Lingjie, Susskind, Josh

arXiv.org Artificial IntelligenceOct-26-2023

Diffusion models have recently become the de-facto approach for generative modeling in the 2D domain. However, extending diffusion models to 3D is challenging due to the difficulties in acquiring 3D ground truth data for training. On the other hand, 3D GANs that integrate implicit 3D representations into GANs have shown remarkable 3D-aware generation when trained only on single-view image datasets. However, 3D GANs do not provide straightforward ways to precisely control image synthesis. To address these challenges, We present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis for single-view datasets. Control3Diff explicitly models the underlying latent distribution (optionally conditioned on external inputs), thus enabling direct control during the diffusion process. Moreover, our approach is general and applicable to any type of controlling input, allowing us to train it with the same diffusion objective without any auxiliary supervision. We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet, using various conditioning inputs such as images, sketches, and text prompts. Please see the project website (\url{https://jiataogu.me/control3diff}) for video comparisons.

artificial intelligence, diffusion model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2304.067

Country: Asia > Japan > Honshū > Chūbu (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Add feedback

What Algorithms can Transformers Learn? A Study in Length Generalization

Zhou, Hattie, Bradley, Arwen, Littwin, Etai, Razin, Noam, Saremi, Omid, Susskind, Josh, Bengio, Samy, Nakkiran, Preetum

arXiv.org Machine LearningOct-24-2023

Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer models can learn the true algorithm for solving a task. We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks. Here, we propose a unifying framework to understand when and how Transformers can exhibit strong length generalization on a given task. Specifically, we leverage RASP (Weiss et al., 2021) -- a programming language designed for the computational model of a Transformer -- and introduce the RASP-Generalization Conjecture: Transformers tend to length generalize on a task if the task can be solved by a short RASP program which works for all input lengths. This simple conjecture remarkably captures most known instances of length generalization on algorithmic tasks. Moreover, we leverage our insights to drastically improve generalization performance on traditionally hard tasks (such as parity and addition). On the theoretical side, we give a simple example where the "min-degree-interpolator" model of learning from Abbe et al. (2023) does not correctly predict Transformers' out-of-distribution behavior, but our conjecture does. Overall, our work provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers.

machine learning, natural language, programming language, (17 more...)

arXiv.org Machine Learning

2310.16028

Country: North America > Canada (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

Add feedback

Matryoshka Diffusion Models

Gu, Jiatao, Zhai, Shuangfei, Zhang, Yizhe, Susskind, Josh, Jaitly, Navdeep

arXiv.org Artificial IntelligenceOct-23-2023

Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges. Existing methods often resort to training cascaded models in pixel space or using a downsampled latent space of a separately trained auto-encoder. In this paper, we introduce Matryoshka Diffusion Models(MDM), an end-to-end framework for high-resolution image and video synthesis. We propose a diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture where features and parameters for small-scale inputs are nested within those of large scales. In addition, MDM enables a progressive training schedule from lower to higher resolutions, which leads to significant improvements in optimization for high-resolution generation. We demonstrate the effectiveness of our approach on various benchmarks, including class-conditioned image generation, high-resolution text-to-image, and text-to-video applications. Remarkably, we can train a single pixel-space model at resolutions of up to 1024x1024 pixels, demonstrating strong zero-shot generalization using the CC12M dataset, which contains only 12 million images.

artificial intelligence, machine learning, resolution, (16 more...)

arXiv.org Artificial Intelligence

2310.15111

Country: Asia (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Adaptivity and Modularity for Efficient Generalization Over Task Complexity

Abnar, Samira, Saremi, Omid, Dinh, Laurent, Wilson, Shantel, Bautista, Miguel Angel, Huang, Chen, Thilak, Vimal, Littwin, Etai, Gu, Jiatao, Susskind, Josh, Bengio, Samy

arXiv.org Artificial IntelligenceOct-13-2023

Can transformers generalize efficiently on problems that require dealing with examples with different levels of difficulty? We introduce a new task tailored to assess generalization over different complexities and present results that indicate that standard transformers face challenges in solving these tasks. These tasks are variations of pointer value retrieval previously introduced by Zhang et al. (2021). We investigate how the use of a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential computation steps (i.e., the depth of the computation graph). Based on our observations, we propose a transformer-based architecture called Hyper-UT, which combines dynamic function generation from hyper networks with adaptive depth from Universal Transformers. This model demonstrates higher accuracy and a fairer allocation of computational resources when generalizing to higher numbers of computation steps. We conclude that mechanisms for adaptive depth and modularity complement each other in improving efficient generalization concerning example complexity. Additionally, to emphasize the broad applicability of our findings, we illustrate that in a standard image recognition task, Hyper- UT's performance matches that of a ViT model but with considerably reduced computational demands (achieving over 70\% average savings by effectively using fewer layers).

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2310.08866

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

Add feedback

Generative Modeling with Phase Stochastic Bridges

Chen, Tianrong, Gu, Jiatao, Dinh, Laurent, Theodorou, Evangelos A., Susskind, Josh, Zhai, Shuangfei

arXiv.org Artificial IntelligenceOct-12-2023

Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs. DMs work by constructing a Stochastic Differential Equation (SDE) in the input space (ie, position space), and using a neural network to reverse it. In this work, we introduce a novel generative modeling framework grounded in \textbf{phase space dynamics}, where a phase space is defined as {an augmented space encompassing both position and velocity.} Leveraging insights from Stochastic Optimal Control, we construct a path measure in the phase space that enables efficient sampling. {In contrast to DMs, our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation.} This early prediction sets the stage for efficient data generation by leveraging additional velocity information along the trajectory. On standard image generation benchmarks, our model yields favorable performance over baselines in the regime of small Number of Function Evaluations (NFEs). Furthermore, our approach rivals the performance of diffusion models equipped with efficient sampling techniques, underscoring its potential as a new tool generative modeling.

artificial intelligence, arxiv preprint arxiv, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2310.07805

Country: Europe > Netherlands (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.54)

Add feedback

Boolformer: Symbolic Regression of Logic Functions with Transformers

d'Ascoli, Stéphane, Bengio, Samy, Susskind, Josh, Abbé, Emmanuel

arXiv.org Artificial IntelligenceSep-21-2023

Deep neural networks, in particuler those based on the Transformer architecture [1], have lead to breakthroughs in computer vision [2] and language modelling [3], and have fuelled the hopes to accelerate scientific discovery [4]. However, their ability to perform simple logic tasks remains limited [5]. These tasks differ from traditional vision or language tasks in the combinatorial nature of their input space, which makes representative data sampling challenging. Reasoning tasks have thus gained major attention in the deep learning community, either with explicit reasoning in the logical domain, e.g., tasks in the realm of arithmetic and algebra [6, 7], algorithmic CLRS tasks [8] or LEGO [9], or implicit reasoning in other modalities, e.g., benchmarks such as Pointer Value Retrieval [10] and Clevr [11] for vision models, or LogiQA [12] and GSM8K [13] for language models. Reasoning also plays a key role in tasks which can be tackled via Boolean modelling, particularly in the fields of biology [14] and medecine [15].

formula, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2309.12207

Genre: Research Report (0.83)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Construction of Paired Knowledge Graph-Text Datasets Informed by Cyclic Evaluation

Mousavi, Ali, Zhan, Xin, Bai, He, Shi, Peng, Rekatsinas, Theo, Han, Benjamin, Li, Yunyao, Pound, Jeff, Susskind, Josh, Schluter, Natalie, Ilyas, Ihab, Jaitly, Navdeep

arXiv.org Artificial IntelligenceSep-20-2023

Datasets that pair Knowledge Graphs (KG) and text together (KG-T) can be used to train forward and reverse neural models that generate text from KG and vice versa. However models trained on datasets where KG and text pairs are not equivalent can suffer from more hallucination and poorer recall. In this paper, we verify this empirically by generating datasets with different levels of noise and find that noisier datasets do indeed lead to more hallucination. We argue that the ability of forward and reverse models trained on a dataset to cyclically regenerate source KG or text is a proxy for the equivalence between the KG and the text in the dataset. Using cyclic evaluation we find that manually created WebNLG is much better than automatically created TeKGen and T-REx. Guided by these observations, we construct a new, improved dataset called LAGRANGE using heuristics meant to improve equivalence between KG and text and show the impact of each of the heuristics on cyclic evaluation. We also construct two synthetic datasets using large language models (LLMs), and observe that these are conducive to models that perform significantly well on cyclic generation of text, but less so on cyclic generation of KGs, probably because of a lack of a consistent underlying ontology.

artificial intelligence, large language model, natural language, (3 more...)

arXiv.org Artificial Intelligence

2309.11669

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.60)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

Add feedback

Stabilizing Transformer Training by Preventing Attention Entropy Collapse

Zhai, Shuangfei, Likhomanenko, Tatiana, Littwin, Etai, Busbridge, Dan, Ramapuram, Jason, Zhang, Yizhe, Gu, Jiatao, Susskind, Josh

arXiv.org Artificial IntelligenceJul-25-2023

Training stability is of great importance to Transformers. In this work, we investigate the training dynamics of Transformers by examining the evolution of the attention layers. In particular, we track the attention entropy for each attention head during the course of training, which is a proxy for model sharpness. We identify a common pattern across different architectures and tasks, where low attention entropy is accompanied by high training instability, which can take the form of oscillating loss or divergence. We denote the pathologically low attention entropy, corresponding to highly concentrated attention scores, as $\textit{entropy collapse}$. As a remedy, we propose $\sigma$Reparam, a simple and efficient solution where we reparametrize all linear layers with spectral normalization and an additional learned scalar. We demonstrate that $\sigma$Reparam successfully prevents entropy collapse in the attention layers, promoting more stable training. Additionally, we prove a tight lower bound of the attention entropy, which decreases exponentially fast with the spectral norm of the attention logits, providing additional motivation for our approach. We conduct experiments with $\sigma$Reparam on image classification, image self-supervised learning, machine translation, speech recognition, and language modeling tasks. We show that $\sigma$Reparam provides stability and robustness with respect to the choice of hyperparameters, going so far as enabling training (a) a Vision Transformer {to competitive performance} without warmup, weight decay, layer normalization or adaptive optimizers; (b) deep architectures in machine translation and (c) speech recognition to competitive performance without warmup and adaptive optimizers. Code is available at \url{https://github.com/apple/ml-sigma-reparam}.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2303.06296

Country:

Asia > Middle East > Israel (0.14)
North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Value function estimation using conditional diffusion models for control

Mazoure, Bogdan, Talbott, Walter, Bautista, Miguel Angel, Hjelm, Devon, Toshev, Alexander, Susskind, Josh

arXiv.org Artificial IntelligenceJun-9-2023

A fairly reliable trend in deep reinforcement learning is that the performance scales with the number of parameters, provided a complimentary scaling in amount of training data. As the appetite for large models increases, it is imperative to address, sooner than later, the potential problem of running out of high-quality demonstrations. In this case, instead of collecting only new data via costly human demonstrations or risking a simulation-to-real transfer with uncertain effects, it would be beneficial to leverage vast amounts of readily-available low-quality data. Since classical control algorithms such as behavior cloning or temporal difference learning cannot be used on reward-free or action-free data out-of-the-box, this solution warrants novel training paradigms for continuous control. We propose a simple algorithm called Diffused Value Function (DVF), which learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model. This model can be efficiently learned from state sequences (i.e., without access to reward functions nor actions), and subsequently used to estimate the value of each action out-of-the-box. We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers, and show promising qualitative and quantitative results on challenging robotics benchmarks.

diffusion model, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2306.0729

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback