AITopics | natural language

Collaborating Authors

natural language

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Locally Hierarchical Auto-Regressive Modeling for Image Generation Supplementary Document

Neural Information Processing SystemsMay-31-2025, 00:44:00 GMT

A.1 HQ-VAE For designing HQ-VAE, we modify the encoder and decoder architectures of VQ-GAN [A1]; we set the stride of the first convolution layer in the encoder and the last transposed convolution layer in the decoder as two as mentioned in the main paper. The encoder takes an image x and returns encoded feature map z. We adopt HQ-VAE equipped with pixel-unshuffle and -shuffle for resampling. We use HQ-VAE (16 16) with f = 16 for the two-level HQ-TVAE implementation, which gives concise visual codes for efficient training of HQ-Transformer. We train HQ-VAE (16 16) for 50 epochs on the ImageNet training split.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

Add feedback

Fast and Memory-Efficient Exact Attention with IO-Awareness

Neural Information Processing SystemsMay-31-2025, 00:41:52 GMT

Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware-- accounting for reads and writes between levels of GPU memory.

lash, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.67)
North America > Mexico > Mexico City (0.14)

Genre: Research Report (0.67)

Industry:

Information Technology (0.93)
Government > Regional Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback

FastDrag: Manipulate Anything in One Step

Neural Information Processing SystemsMay-30-2025, 23:39:51 GMT

Drag-based image editing using generative models provides precise control over image contents, enabling users to manipulate anything in an image with a few clicks. However, prevailing methods typically adopt n-step iterations for latent semantic optimization to achieve drag-based image editing, which is time-consuming and limits practical applications. In this paper, we introduce a novel one-step dragbased image editing method, i.e., FastDrag, to accelerate the editing process. Central to our approach is a latent warpage function (LWF), which simulates the behavior of a stretched material to adjust the location of individual pixels within the latent space. This innovation achieves one-step latent semantic optimization and hence significantly promotes editing speeds. Meanwhile, null regions emerging after applying LWF are addressed by our proposed bilateral nearest neighbor interpolation (BNNI) strategy.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry: Media (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Switch Head: Accelerating Transformers with Mixture-of-Experts Attention Róbert Csordás 1 Piotr Piękos 2 Jürgen Schmidhuber

Neural Information Processing SystemsMay-30-2025, 23:39:28 GMT

Despite many recent works on Mixture of Experts (MoEs) for resource-efficient Transformer language models, existing methods mostly focus on MoEs for feedforward layers. Previous attempts at extending MoE to the self-attention layer fail to match the performance of the parameter-matched baseline. Our novel SwitchHead is an effective MoE method for the attention layer that successfully reduces both the compute and memory requirements, achieving wall-clock speedup, while matching the language modeling performance of the baseline Transformer. Our novel MoE mechanism allows SwitchHead to compute up to 8 times fewer attention matrices than the standard Transformer. SwitchHead can also be combined with MoE feedforward layers, resulting in fully-MoE "SwitchAll" Transformers. For our 262M parameter model trained on C4, SwitchHead matches the perplexity of standard models with only 44% compute and 27% memory usage. Zero-shot experiments on downstream tasks confirm the performance of SwitchHead, e.g., achieving more than 3.5% absolute improvements on BliMP compared to the baseline with an equal compute resource.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
Asia > Middle East (0.67)
North America > United States > California > Santa Clara County (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents Chang Ma Cheng Yang

Neural Information Processing SystemsMay-30-2025, 23:23:20 GMT

Evaluating Large Language Models (LLMs) as general-purpose agents is essential for understanding their capabilities and facilitating their integration into practical applications. However, the evaluation process presents substantial challenges. A primary obstacle is the benchmarking of agent performance across diverse scenarios within a unified framework, especially in maintaining partially-observable environments and ensuring multi-round interactions. Moreover, current evaluation frameworks mostly focus on the final success rate, revealing few insights during the process and failing to provide a deep understanding of the model abilities.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology (0.92)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Add feedback

Relationship Prompt Learning is Enough for Open-Vocabulary Semantic Segmentation Jiahao Li1, Yang Lu

Neural Information Processing SystemsMay-30-2025, 23:22:57 GMT

Open-vocabulary semantic segmentation (OVSS) aims to segment unseen classes without corresponding labels. Existing Vision-Language Model (VLM)- based methods leverage VLM's rich knowledge to enhance additional explicit segmentation-specific networks, yielding competitive results, but at the cost of extensive training cost. To reduce the cost, we attempt to enable VLM to directly produce the segmentation results without any segmentation-specific networks. Prompt learning offers a direct and parameter-efficient approach, yet it falls short in guiding VLM for pixel-level visual classification. Therefore, we propose the Relationship Prompt Module (RPM), which generates the relationship prompt that directs VLM to extract pixel-level semantic embeddings suitable for OVSS. Moreover, RPM integrates with VLM to construct the Relationship Prompt Network (RPN), achieving OVSS without any segmentation-specific networks.

machine learning, natural language, segmentation, (14 more...)

Neural Information Processing Systems

Country: Asia > China > Fujian Province (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Unity by Diversity: Improved Representation Learning for Multimodal VAEs

Neural Information Processing SystemsMay-30-2025, 23:02:58 GMT

Variational Autoencoders for multimodal data hold promise for many tasks in data analysis, such as representation learning, conditional generation, and imputation. Current architectures either share the encoder output, decoder input, or both across modalities to learn a shared representation. Such architectures impose hard constraints on the model. In this work, we show that a better latent representation can be obtained by replacing these hard constraints with a soft constraint. We propose a new mixture-of-experts prior, softly guiding each modality's latent representation towards a shared aggregate posterior. This approach results in a superior latent representation and allows each encoding to preserve information better from its uncompressed original features. In extensive experiments on multiple benchmark datasets and two challenging real-world datasets, we show improved learned latent representations and imputation of missing data modalities compared to existing methods.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.67)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

043ab21fc5a1607b381ac3896176dac6-AuthorFeedback.pdf

Neural Information Processing SystemsMay-30-2025, 23:01:55 GMT

Q7: What are word-word and word-paragraph co-occurrence statistics and how they are exploited?

artificial intelligence, natural language, similarity, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

MarioGPT: Open-Ended Text2Level Generation through Large Language Models

Neural Information Processing SystemsMay-30-2025, 23:01:01 GMT

Procedural Content Generation (PCG) is a technique to generate complex and diverse environments in an automated way. However, while generating content with PCG methods is often straightforward, generating meaningful content that reflects specific intentions and constraints remains challenging. Furthermore, many PCG algorithms lack the ability to generate content in an open-ended manner. Recently, Large Language Models (LLMs) have shown to be incredibly effective in many diverse domains. These trained LLMs can be fine-tuned, re-using information and accelerating training for new tasks. Here, we introduce MarioGPT, a fine-tuned GPT2 model trained to generate tile-based game levels, in our case Super Mario Bros levels. MarioGPT can not only generate diverse levels, but can be text-prompted for controllable level generation, addressing one of the key challenges of current PCG techniques. As far as we know, MarioGPT is the first text-to-level model and combined with novelty search it enables the generation of diverse levels with varying play-style dynamics (i.e.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Genre: Research Report (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-shot Forecasting of Multivariate Time Series

Neural Information Processing SystemsMay-30-2025, 22:41:13 GMT

Large pre-trained models excel in zero/few-shot learning for language and vision tasks but face challenges in multivariate time series (TS) forecasting due to diverse data characteristics. Consequently, recent research efforts have focused on developing pre-trained TS forecasting models. These models, whether built from scratch or adapted from large language models (LLMs), excel in zero/few-shot forecasting tasks. However, they are limited by slow performance, high computational demands, and neglect of cross-channel and exogenous correlations. To address this, we introduce Tiny Time Mixers (TTM), a compact model (starting from 1M parameters) with effective transfer learning capabilities, trained exclusively on public TS datasets. TTM, based on the light-weight TSMixer architecture, incorporates innovations like adaptive patching, diverse resolution sampling, and resolution prefix tuning to handle pre-training on varied dataset resolutions with minimal model capacity. Additionally, it employs multi-level modeling to capture channel correlations and infuse exogenous signals during fine-tuning. TTM outperforms existing popular benchmarks in zero/few-shot forecasting by (4-40%), while reducing computational requirements significantly. Moreover, TTMs are lightweight and can be executed even on CPU-only machines, enhancing usability and fostering wider adoption in resource-constrained environments.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report > Experimental Study (0.92)

Industry: Information Technology (1.00)

Technology: