AITopics

Tokenization is widely used in large language models because it significantly improves performance. However, tokenization imposes several disadvantages, such as performance biases, increased adversarial vulnerability, decreased character-level modeling performance, and increased modeling complexity. To address these disadvantages without sacrificing performance, we propose SpaceByte, a novel byte-level decoder architecture that closes the performance gap between byte-level and subword autoregressive language modeling. SpaceByte consists of a byte-level Transformer model with extra, larger Transformer blocks inserted in the middle of the layers. We find that performance is significantly improved by applying these larger blocks only after certain bytes, such as space characters, which typically denote word boundaries. Our experiments show that for a fixed training and inference compute budget, SpaceByte outperforms other byte-level architectures and roughly matches the performance of tokenized Transformer architectures.
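The idea of running the larger blocks only after boundary bytes can be illustrated with a minimal sketch. It only marks the positions of space bytes in a UTF-8 string, after which a larger block would run. The function name `boundary_positions` and the choice of a single space as the only boundary byte are assumptions made for illustration; the abstract does not give the exact rule the authors use.

```python
def boundary_positions(data: bytes, boundary_bytes: bytes = b" ") -> list[int]:
    """Return the indices of bytes that count as word boundaries.

    Hypothetical helper: in this simplified sketch a larger block would
    run after each returned position, and only the small byte-level
    layers would run everywhere else.
    """
    # Iterating over a bytes object yields ints; `int in bytes` tests membership.
    return [i for i, b in enumerate(data) if b in boundary_bytes]


text = "the cat sat".encode("utf-8")
positions = boundary_positions(text)
print(positions)  # [3, 7]

# Share of byte positions that would trigger a larger block.
print(len(positions) / len(text))
```

Under this rule the expensive blocks run roughly once per word rather than once per byte, which is the source of the compute saving the abstract describes.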

large language model, machine learning, spacebyte, (22 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.85)

Supplementary Material for "Nonlinear Sufficient Dimension Reduction with a Stochastic Neural Network"

Neural Information Processing Systems - Mar-27-2025, 12:14:03 GMT

For the full data case, we have ς = 0, i.e., the gradient

artificial intelligence, machine learning, neural network, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Nonlinear Sufficient Dimension Reduction with a Stochastic Neural Network

Neural Information Processing Systems - Mar-27-2025, 12:14:00 GMT

Sufficient dimension reduction is a powerful tool for extracting the core information hidden in high-dimensional data and potentially has many important applications in machine learning tasks. However, existing nonlinear sufficient dimension reduction methods often lack the scalability necessary for dealing with large-scale data. We propose a new type of stochastic neural network under a rigorous probabilistic framework and show that it can be used for sufficient dimension reduction on large-scale data. The proposed stochastic neural network is trained using an adaptive stochastic gradient Markov chain Monte Carlo algorithm, whose convergence is also rigorously studied in the paper. Through extensive experiments on real-world classification and regression problems, we show that the proposed method compares favorably with existing state-of-the-art sufficient dimension reduction methods and is computationally more efficient for large-scale data.

artificial intelligence, machine learning, neural network, (16 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Identifying Latent State-Transition Processes for Individualized Reinforcement Learning

Neural Information Processing Systems - Mar-27-2025, 12:13:54 GMT

The application of reinforcement learning (RL) involving interactions with individuals has grown significantly in recent years. These interactions, influenced by factors such as personal preferences and physiological differences, causally influence state transitions, ranging from health conditions in healthcare to learning progress in education. As a result, different individuals may exhibit different state-transition processes. Understanding individualized state-transition processes is essential for optimizing individualized policies. In practice, however, identifying these state-transition processes is challenging, as individual-specific factors often remain latent. In this paper, we establish the identifiability of these latent factors and introduce a practical method that effectively learns these processes from observed state-action trajectories. Experiments on various datasets show that the proposed method can effectively identify latent state-transition processes and facilitate the learning of individualized RL policies.

data mining, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Instructional Material (0.67)

Industry:

Health & Medicine > Consumer Health (1.00)
Education (1.00)
Information Technology > Security & Privacy (0.92)
Health & Medicine > Therapeutic Area > Endocrinology (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

GeoTMI: Predicting Quantum Chemical Property with Easy-to-Obtain Geometry via Positional Denoising

Jeheon Woo

Neural Information Processing Systems - Mar-27-2025, 12:13:46 GMT

As quantum chemical properties depend on molecular geometries, graph neural networks (GNNs) using 3D geometric information have achieved high prediction accuracy in many tasks. However, they often require 3D geometries obtained from high-level quantum mechanical calculations, which are practically infeasible, limiting their applicability to real-world problems. To tackle this, we propose a new training framework, GeoTMI, that employs a denoising process to predict properties accurately using easy-to-obtain geometries (corrupted versions of correct geometries, such as those obtained from low-level calculations). Our starting point was the idea that the correct geometry is the best description of the target property. Hence, to incorporate information about the correct geometry, GeoTMI aims to maximize mutual information between three variables: the correct geometry, the corrupted geometry, and the property. GeoTMI also explicitly updates the corrupted input to approach the correct geometry as it passes through the GNN layers, contributing to more effective denoising. We investigated the performance of the proposed method using 3D GNNs for three prediction tasks: molecular properties, a chemical reaction property, and relaxed energy in a heterogeneous catalytic system. Our results showed consistent improvements in accuracy across various tasks, demonstrating the effectiveness and robustness of GeoTMI.

artificial intelligence, geometry, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia (0.29)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

903c5eb12f2389c4847574df90503d63-Paper-Conference.pdf

Neural Information Processing Systems - Mar-27-2025, 12:13:42 GMT

artificial intelligence, geometry, machine learning, (19 more...)

Neural Information Processing Systems

Country: Asia (0.29)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Almost Surely Asymptotically Constant Graph Neural Networks

Neural Information Processing Systems - Mar-27-2025, 12:13:36 GMT

We present a new angle on the expressive power of graph neural networks (GNNs) by studying how the predictions of real-valued GNN classifiers, such as those classifying graphs probabilistically, evolve as we apply them to larger graphs drawn from some random graph model. We show that the output converges to a constant function, which upper-bounds what these classifiers can uniformly express. This strong convergence phenomenon applies to a very wide class of GNNs, including state-of-the-art models, with aggregates including mean and the attention-based mechanism of graph transformers. Our results apply to a broad class of random graph models, including sparse and dense variants of the Erdős-Rényi model, the stochastic block model, and the Barabási-Albert model. We empirically validate these findings, observing that the convergence phenomenon appears not only on random graphs but also on some real-world graphs.
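A toy numerical sketch can make the convergence claim concrete. It is not the paper's experiment. It assumes a single mean-aggregation layer with no learned weights, i.i.d. uniform node features, mean pooling, and a sigmoid readout on dense Erdős-Rényi graphs. The names `toy_gnn_output` and `output_spread` are hypothetical. The sketch measures how much the graph-level output varies across random graphs of a given size.

```python
import math
import random
import statistics


def toy_gnn_output(n: int, p: float, rng: random.Random) -> float:
    """Graph-level output of a one-layer mean-aggregation toy model on G(n, p)."""
    # i.i.d. node features in [0, 1) (an assumption of this sketch).
    features = [rng.random() for _ in range(n)]

    # Sample an Erdős-Rényi graph: each edge is present with probability p.
    neighbours = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                neighbours[i].append(j)
                neighbours[j].append(i)

    # Mean aggregation over neighbours; isolated nodes keep their own feature.
    hidden = [
        statistics.fmean([features[j] for j in nbrs]) if nbrs else features[i]
        for i, nbrs in enumerate(neighbours)
    ]

    # Mean pooling followed by a sigmoid readout gives a value in (0, 1).
    return 1.0 / (1.0 + math.exp(-statistics.fmean(hidden)))


def output_spread(n: int, p: float = 0.5, trials: int = 20, seed: int = 0) -> float:
    """Standard deviation of the output across independently sampled graphs."""
    rng = random.Random(seed)
    return statistics.pstdev([toy_gnn_output(n, p, rng) for _ in range(trials)])


for n in (20, 200):
    print(n, output_spread(n))
```

In this toy setting the spread shrinks as `n` grows, so the output behaves more and more like a constant that does not depend on the sampled graph. The paper's result covers far more general architectures and graph models than this sketch.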

artificial intelligence, machine learning, probability, (17 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

af2bb2b2280d36f8842e440b4e275152-Supplemental-Conference.pdf

Neural Information Processing Systems - Mar-27-2025, 12:13:35 GMT

artificial intelligence, health & medicine, machine learning, (16 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

af2bb2b2280d36f8842e440b4e275152-Paper-Conference.pdf

Neural Information Processing Systems - Mar-27-2025, 12:13:31 GMT

artificial intelligence, information technology services, machine learning, (15 more...)

Neural Information Processing Systems

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)
Information Technology > Data Science > Data Mining (0.94)

A Method for Evaluating Hyperparameter Sensitivity in Reinforcement Learning

Neural Information Processing Systems - Mar-27-2025, 12:13:19 GMT

The performance of modern reinforcement learning algorithms critically relies on tuning ever-increasing numbers of hyperparameters. Often, small changes in a hyperparameter can lead to drastic changes in performance, and different environments require very different hyperparameter settings to achieve the state-of-the-art performance reported in the literature. We currently lack a scalable and widely accepted approach to characterizing these complex interactions. This work proposes a new empirical methodology for studying, comparing, and quantifying the sensitivity of an algorithm's performance to hyperparameter tuning for a given set of environments. We then demonstrate the utility of this methodology by assessing the hyperparameter sensitivity of several commonly used normalization variants of PPO. The results suggest that several algorithmic performance improvements may, in fact, be a result of an increased reliance on hyperparameter tuning.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Slagle - 2024 - SpaceByte: Towards Deleting Tokenization from Large Language Modeling
