AITopics | background

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

What if the Universe Isn't as Uniform as Scientists Think?

WIRED, Jul-6-2026, 09:30:00 GMT

A study based on 47 million galaxies found that the cosmic web retains patterns on enormous scales, which could force a reevaluation of a pillar of cosmology. One of the fundamental pillars of modern cosmology may be beginning to wobble. A study published in Nature has found evidence that the universe may not behave the same way in every direction on the largest observable scales. "What we found is a network of enormous filaments and walls of galaxies that remain aligned and interconnected across billions of light-years," says Francesco Sylos Labini, research director of physics at the Enrico Fermi Research Center in Italy and the study's lead author.

artificial intelligence, galaxy, universe

Country:

North America > United States (0.29)
Europe > Italy (0.25)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.48)

Technology: Information Technology > Artificial Intelligence (0.48)

Robustness in Both Domains: CLIP Needs a Robust Text Encoder

Neural Information Processing Systems, Jun-23-2026, 03:42:16 GMT

Adversarial input attacks can significantly shift CLIP embeddings. This can affect the downstream robustness of models that incorporate CLIP in their pipeline, such as text-to-image generative models or large vision-language models. While some efforts have been made toward making CLIP image encoders robust, the robustness of text encoders remains unexplored. In this work, we address this gap in the literature. We propose LEAF: an efficient adversarial finetuning method for the text domain that scales to large CLIP models. Our models significantly improve zero-shot adversarial accuracy in the text domain while maintaining the vision performance provided by robust image encoders. When combined with text-to-image diffusion models, LEAF improves generation quality under adversarial noise. In multimodal retrieval tasks, LEAF improves recall under adversarial noise over standard CLIP models. Finally, we show that robust text encoders facilitate better reconstruction of input text from its embedding via direct optimization.
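
The abstract does not specify which text-domain attack LEAF defends against. A common threat model for text encoders is a character-level perturbation limited to a few edits, chosen to push the embedding away from that of the clean string. The following is a minimal sketch of that idea, assuming a greedy random search and a toy character-trigram hashing encoder (`toy_embed`, hypothetical) standing in for a CLIP text encoder. It is neither LEAF's attack nor its training procedure, and every name in it is illustrative.

```python
import math
import random
import string
import zlib

DIM = 64
ALPHABET = string.ascii_lowercase + " "


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a CLIP text encoder: hashed character trigrams, L2-normalized."""
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Inputs are unit-norm, so the dot product equals the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def random_edit(text: str, rng: random.Random) -> str:
    """Apply one random character-level edit: insertion, deletion, or substitution."""
    ops = ["insert"] if not text else ["insert", "delete", "substitute"]
    op = rng.choice(ops)
    if op == "insert":
        pos = rng.randrange(len(text) + 1)
        return text[:pos] + rng.choice(ALPHABET) + text[pos:]
    pos = rng.randrange(len(text))
    if op == "delete":
        return text[:pos] + text[pos + 1:]
    return text[:pos] + rng.choice(ALPHABET) + text[pos + 1:]


def greedy_char_attack(text: str, k: int = 2, candidates: int = 50, seed: int = 0):
    """Greedily apply up to k edits, each time keeping the candidate whose
    embedding is least similar to the clean embedding."""
    rng = random.Random(seed)
    clean = toy_embed(text)
    current = text
    for _ in range(k):
        best, best_sim = current, cosine(clean, toy_embed(current))
        for _ in range(candidates):
            cand = random_edit(current, rng)
            sim = cosine(clean, toy_embed(cand))
            if sim < best_sim:
                best, best_sim = cand, sim
        current = best
    return current, cosine(clean, toy_embed(current))


if __name__ == "__main__":
    adv, sim = greedy_char_attack("a photo of a dog", k=2)
    print(repr(adv), round(sim, 3))
```

Because each greedy step applies at most one edit, the returned string stays within edit distance `k` of the input. In an adversarial finetuning loop, strings like this would typically be generated on the fly and the encoder trained to keep their embeddings close to the clean ones; that loop is omitted here.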

large language model, machine learning, natural language

Country:

Europe > Switzerland (0.28)
North America > Canada (0.28)

Industry: Media > News (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Mitigating Occlusions in Virtual Try-On via A Simple-Yet-Effective Mask-Free Framework

Neural Information Processing Systems, Jun-23-2026, 02:04:36 GMT

This paper investigates occlusion problems in virtual try-on (VTON) tasks. Based on how they affect the try-on results, the occlusion issues of existing VTON methods can be grouped into two categories: (1) Inherent Occlusions, i.e., ghosts of the clothing from the reference input images that persist in the try-on results.

artificial intelligence, machine learning, natural language

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.30)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection

Neural Information Processing Systems, Jun-22-2026, 23:38:34 GMT

Multimodal large language models (M-LLMs) have unlocked new possibilities for various multimodal tasks. However, their potential in image manipulation detection (IMD) remains unexplored. When directly applied to the IMD task, M-LLMs often produce reasoning texts that suffer from hallucinations and overthinking. To address this, we propose ForgerySleuth, which leverages M-LLMs to perform comprehensive clue fusion and generate segmentation outputs indicating the specific regions that have been tampered with. Moreover, we construct the ForgeryAnalysis dataset through the Chain-of-Clues prompt; it pairs images with analysis and reasoning text intended to enrich the IMD task. A data engine is also introduced to build a larger-scale dataset for the pre-training phase. Our extensive experiments demonstrate the effectiveness of ForgeryAnalysis and show that ForgerySleuth significantly outperforms existing methods in generalization, robustness, and explainability.

large language model, machine learning, natural language

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment (0.68)
Information Technology > Security & Privacy (0.67)
Media > Photography (0.45)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Representation-Level Counterfactual Calibration for Debiased Zero-Shot Recognition

Neural Information Processing Systems, Jun-22-2026, 16:39:27 GMT

Object-context shortcuts remain a persistent challenge in vision-language models, undermining zero-shot reliability when test-time scenes differ from familiar training co-occurrences. We recast this issue as a causal inference problem and ask: would the prediction remain if the object appeared in a different environment? To answer this at inference time, we estimate object and background expectations within CLIP's representation space and synthesize counterfactual embeddings by recombining object features with diverse alternative contexts sampled from external datasets, batch neighbors, or text-derived descriptions. By estimating the Total Direct Effect and simulating interventions, we further subtract the background-only activation, preserving beneficial object-context interactions while mitigating hallucinated scores. Without retraining or prompt design, our method substantially improves both worst-group and average accuracy on context-sensitive benchmarks, establishing a new zero-shot state of the art. Beyond performance, our framework provides a lightweight representation-level counterfactual approach, offering a practical causal avenue for debiased and reliable multimodal reasoning. The implementation is available at https://github.com/peipeng98.
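
The abstract describes the recipe only at a high level: recombine the object feature with alternative contexts, then subtract the background-only activation. As a toy illustration of that recipe, the sketch below assumes (hypothetically) that an image embedding is the sum of an object vector and a background vector, scores classes by cosine similarity to class text embeddings, averages scores over the counterfactual contexts, and subtracts the score of the background alone. The vectors, dimensions, and function names are invented for illustration; the authors' actual estimators are in the implementation linked in the abstract.

```python
import math


def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def add(a: list[float], b: list[float]) -> list[float]:
    return [x + y for x, y in zip(a, b)]


def scores(emb: list[float], class_text_embs: list[list[float]]) -> list[float]:
    """Cosine similarity between an image embedding and each class text embedding."""
    e = normalize(emb)
    return [sum(x * y for x, y in zip(e, normalize(t))) for t in class_text_embs]


def calibrated_scores(obj, bg, alt_contexts, class_text_embs):
    """Toy counterfactual calibration under a hypothetical additive embedding model."""
    # Intervene on context: keep the object, swap in each alternative background, average.
    cf = [scores(add(obj, ctx), class_text_embs) for ctx in alt_contexts]
    avg = [sum(col) / len(cf) for col in zip(*cf)]
    # Background-only activation: what the context alone votes for.
    bg_only = scores(bg, class_text_embs)
    return [a - b for a, b in zip(avg, bg_only)]


if __name__ == "__main__":
    # Hypothetical 4-d space: [cow-object, camel-object, grass, sand].
    classes = ["cow", "camel"]
    text_embs = [[1.0, 0.0, 0.8, 0.0], [0.0, 1.0, 0.0, 0.8]]  # prompts absorb the usual habitat
    obj, bg = [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.5]       # a cow standing on sand
    raw = scores(add(obj, bg), text_embs)
    cal = calibrated_scores(obj, bg, [[0.0, 0.0, 1.5, 0.0], [0.0, 0.0, 0.0, 1.5]], text_embs)
    print("raw:", classes[raw.index(max(raw))])         # prints "raw: camel"
    print("calibrated:", classes[cal.index(max(cal))])  # prints "calibrated: cow"
```

In this toy setup the raw score is pulled toward "camel" by the sand background; averaging over swapped contexts and subtracting the background-only score loosely mirrors the Total Direct Effect idea named in the abstract and flips the prediction back to "cow".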

accuracy, large language model, natural language

Country: Europe (0.45)

Genre: Research Report > Experimental Study (1.00)

Industry:

Transportation > Ground > Road (0.45)
Transportation > Infrastructure & Services (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration

Neural Information Processing Systems, Jun-22-2026, 13:52:12 GMT

Text-to-image (T2I) generation has made remarkable progress, yet existing systems still lack intuitive control over spatial composition, object consistency, and multistep editing. We present LayerCraft, a modular framework that uses large language models (LLMs) as autonomous agents to orchestrate structured, layered image generation and editing. LayerCraft supports two key capabilities: (1) structured generation from simple prompts via chain-of-thought (CoT) reasoning, enabling it to decompose scenes, reason about object placement, and guide composition in a controllable, interpretable manner; and (2) layered object integration, allowing users to insert and customize objects, such as characters or props, across diverse images or scenes while preserving identity, context, and style. The system comprises a coordinator agent, the ChainArchitect for CoT-driven layout planning, and the Object Integration Network (OIN) for seamless image editing using off-the-shelf T2I models without retraining. Through applications like batch collage editing and narrative scene generation, LayerCraft empowers non-experts to iteratively design, customize, and refine visual content with minimal manual effort.

arxiv preprint arxiv, large language model, machine learning

Genre: Research Report > Experimental Study (1.00)

Industry: Media (0.88)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

PixPerfect: Seamless Latent Diffusion Local Editing with Discriminative Pixel-Space Refinement

Neural Information Processing Systems, Jun-22-2026, 09:03:12 GMT

Latent Diffusion Models (LDMs) have markedly advanced the quality of image inpainting and local editing. However, the inherent latent compression often introduces pixel-level inconsistencies, such as chromatic shifts, texture mismatches, and visible seams along editing boundaries. Existing remedies, including background-conditioned latent decoding and pixel-space harmonization, usually fail to fully eliminate these artifacts in practice and do not generalize well across different latent representations or tasks. We introduce PixPerfect, a pixel-level refinement framework that delivers seamless, high-fidelity local edits across diverse LDM architectures and tasks. PixPerfect leverages (i) a differentiable discriminative pixel space that amplifies and suppresses subtle color and texture discrepancies, (ii) a comprehensive artifact simulation pipeline that exposes the refiner to realistic local editing artifacts during training, and (iii) a direct pixel-space refinement scheme that ensures broad applicability across diverse latent representations and tasks. Extensive experiments on inpainting, object removal, and insertion benchmarks demonstrate that PixPerfect substantially enhances perceptual fidelity and downstream editing performance, establishing a new standard for robust and high-fidelity localized image editing.
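
The abstract mentions an artifact simulation pipeline that exposes the refiner to realistic local editing artifacts during training, without detailing it. One artifact of the kind it lists is a chromatic shift confined to the edited region. The sketch below creates a (corrupted, clean) training pair by applying random per-channel gains inside a binary mask; the function name, the gain range, and the image representation (nested lists of RGB floats in [0, 1]) are all assumptions, and this is not the paper's pipeline, which the abstract describes as more comprehensive.

```python
import random

Pixel = tuple[float, float, float]
Image = list[list[Pixel]]
Mask = list[list[bool]]


def simulate_chromatic_shift(image: Image, mask: Mask, max_gain: float = 0.1, seed=None):
    """Hypothetical artifact simulator: return (corrupted, clean, gains), with
    per-channel gains applied only where the mask is True."""
    rng = random.Random(seed)
    gains = tuple(1.0 + rng.uniform(-max_gain, max_gain) for _ in range(3))
    corrupted: Image = []
    for row, mask_row in zip(image, mask):
        new_row = []
        for pixel, inside in zip(row, mask_row):
            if inside:
                # Scale each channel, then clamp back to the valid [0, 1] range.
                new_row.append(tuple(min(1.0, max(0.0, c * g)) for c, g in zip(pixel, gains)))
            else:
                new_row.append(pixel)
        corrupted.append(new_row)
    return corrupted, image, gains


if __name__ == "__main__":
    img = [[(0.5, 0.4, 0.3)] * 4 for _ in range(4)]
    mask = [[r >= 2 and c >= 2 for c in range(4)] for r in range(4)]  # edited 2x2 corner
    corrupted, clean, gains = simulate_chromatic_shift(img, mask, seed=0)
    print(gains, corrupted[3][3], clean[3][3])
```

A refiner trained on such pairs learns to map the corrupted image back to the clean one; a fuller simulation would also perturb texture and introduce seams along the mask boundary.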

artificial intelligence, machine learning, proceedings

Genre: Research Report > Experimental Study (1.00)

Industry: Media (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models

Neural Information Processing Systems, Jun-19-2026, 18:48:10 GMT

For large language models (LLMs), sparse autoencoders (SAEs) have been shown to decompose intermediate representations, which are often not directly interpretable, into sparse sums of interpretable features, facilitating better control and subsequent analysis. However, similar analyses and approaches have been lacking for text-to-image models. We investigate the possibility of using SAEs to learn interpretable features for SDXL Turbo, a few-step text-to-image diffusion model. To this end, we train SAEs on the updates performed by transformer blocks within SDXL Turbo's denoising U-net in its 1-step setting. Interestingly, we find that they generalize to 4-step SDXL Turbo and even to the multi-step SDXL base model (i.e., a different model) without additional training. In addition, we show that their learned features are interpretable, causally influence the generation process, and reveal specialization among the blocks.
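
The abstract does not say which SAE variant the authors use. As a reference point, the sketch below shows one common formulation, a ReLU encoder with an L1 sparsity penalty, plus an optional top-k mode that keeps only the k largest activations. The input `x` stands for one block's update vector; the class name, dimensions, initialization, and `l1_coeff` value are assumptions, and training is omitted. It is written in plain Python for readability rather than speed.

```python
import random


def relu(v: list[float]) -> list[float]:
    return [max(0.0, x) for x in v]


class SparseAutoencoder:
    """Minimal SAE: x -> sparse codes f -> reconstruction x_hat = b_dec + sum_i f_i * d_i."""

    def __init__(self, d_model: int, n_features: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_enc = [[rng.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        # Decoder dictionary: one d_model-dimensional direction per feature.
        self.w_dec = [[rng.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(n_features)]
        self.b_dec = [0.0] * d_model

    def encode(self, x: list[float], k=None) -> list[float]:
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = [sum(w * c for w, c in zip(row, centered)) + b
               for row, b in zip(self.w_enc, self.b_enc)]
        f = relu(pre)
        if k is not None:
            # Top-k variant: zero every activation except the k largest.
            keep = set(sorted(range(len(f)), key=lambda i: f[i], reverse=True)[:k])
            f = [fi if i in keep else 0.0 for i, fi in enumerate(f)]
        return f

    def decode(self, f: list[float]) -> list[float]:
        x_hat = list(self.b_dec)
        for fi, direction in zip(f, self.w_dec):
            if fi != 0.0:  # only active features contribute to the sparse sum
                x_hat = [a + fi * d for a, d in zip(x_hat, direction)]
        return x_hat

    def loss(self, x: list[float], l1_coeff: float = 1e-3, k=None) -> float:
        f = self.encode(x, k)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        return mse + l1_coeff * sum(abs(fi) for fi in f)


if __name__ == "__main__":
    sae = SparseAutoencoder(d_model=3, n_features=3)
    # Hand-set identity dictionary so the decomposition is easy to read.
    sae.w_enc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    sae.w_dec = [row[:] for row in sae.w_enc]
    x = [2.0, 0.0, -1.0]
    f = sae.encode(x)
    print(f, sae.decode(f), sae.loss(x))
```

In the toy run, ReLU drops the negative component, so the reconstruction is imperfect; practical SAEs use many more features than input dimensions so that such directions get their own dictionary entries.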

intervention, large language model, machine learning

Country: Asia > Middle East (0.27)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Sports (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

CGS-GAN: 3D-Consistent Gaussian Splatting GANs for High Resolution Human Head Synthesis

Neural Information Processing Systems, Jun-19-2026, 02:19:27 GMT

Recently, 3D GANs based on 3D Gaussian splatting have been proposed for high-quality synthesis of human heads. However, existing methods stabilize training and enhance rendering quality from steep viewpoints by conditioning the random latent vector on the current camera position. This compromises 3D consistency, as we observe significant identity changes when re-synthesizing the 3D head with each camera shift. Conversely, fixing the camera to a single viewpoint yields high-quality renderings for that perspective but results in poor performance for novel views. Removing view-conditioning typically destabilizes GAN training, often causing the training to collapse.

artificial intelligence, generator, machine learning

Genre:

Research Report > Experimental Study (1.00)
Overview (0.92)

Industry:

Information Technology (1.00)
Media > Photography (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

7f05193e5487287a890df7fbc3554427-Paper-Conference.pdf

Neural Information Processing Systems, Jun-18-2026, 23:29:12 GMT

These qualitative results demonstrate the superior performance of our approach in transferring various reference components, including characters, garments, backgrounds, and motions, to synthesize new target videos.

large language model, machine learning, natural language

Country: Asia > Middle East > UAE (0.28)

Genre: