AITopics | text image

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition that the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Towards Irreversible Attack: Fooling Scene Text Recognition via Multi-Population Coevolution Search

Neural Information Processing Systems, Jun-23-2026, 02:03:07 GMT

Recent work has shown that scene text recognition (STR) models are vulnerable to adversarial examples. Unlike the outputs of non-sequential vision tasks, the output sequence of an STR model carries rich information. However, existing adversarial attacks against STR models cause only a few characters in the predicted text to be wrong. Such results still retain partial information about the original prediction and could easily be corrected by an external dictionary or a language model. We therefore propose the Multi-Population Coevolution Search (MPCS) method, which attacks every character in the image. We first decompose the global optimization objective into sub-objectives to solve the attack pixel concentration problem of previous attack methods. Because this distributed optimization paradigm introduces a new joint perturbation shift problem, we also propose a novel coevolution energy function to address it. Experiments on recent STR models demonstrate the superiority of our method.
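The decomposition idea can be illustrated with a generic cooperative-coevolution loop. This is a minimal sketch under stated assumptions, not MPCS itself: the per-character score `region_score`, the coupling penalty in `joint_energy` (a stand-in for the paper's coevolution energy function, whose form the abstract does not give), and all parameter values are hypothetical. Each character region gets its own population, and each candidate is scored jointly with the current best perturbations of the other regions, so the populations interact through a shared perturbation budget.

```python
import random


def region_score(weights, delta):
    """Hypothetical per-character attack score: how well the perturbation
    aligns with a region-specific sensitive direction."""
    return sum(w * d for w, d in zip(weights, delta))


def joint_energy(all_weights, deltas, budget, lam):
    """Sum of per-character sub-objectives minus a penalty when the total
    L1 perturbation exceeds the budget (a stand-in coupling term)."""
    total = sum(region_score(w, d) for w, d in zip(all_weights, deltas))
    l1 = sum(abs(x) for d in deltas for x in d)
    return total - lam * max(0.0, l1 - budget)


def clip(x, eps):
    """Keep a single perturbation value inside [-eps, eps]."""
    return max(-eps, min(eps, x))


def coevolve(all_weights, eps=0.1, pop_size=8, generations=50,
             budget=1.0, lam=10.0, seed=0):
    rng = random.Random(seed)
    # One population per character region; individuals are perturbation vectors.
    pops = [[[rng.uniform(-eps, eps) for _ in w] for _ in range(pop_size)]
            for w in all_weights]
    best = [pop[0] for pop in pops]
    for _ in range(generations):
        for i, pop in enumerate(pops):
            def fitness(ind):
                # Score candidate i together with the other regions' current best.
                context = best[:i] + [ind] + best[i + 1:]
                return joint_energy(all_weights, context, budget, lam)

            # Gaussian mutation: every parent produces one clipped child.
            children = [[clip(x + rng.gauss(0.0, eps / 4), eps) for x in ind]
                        for ind in pop]
            # (mu + lambda) selection keeps the fittest pop_size individuals.
            survivors = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
            pops[i] = survivors
            best[i] = survivors[0]
    return best, joint_energy(all_weights, best, budget, lam)
```

Because every candidate is evaluated in the context of the other populations' current best members, one region cannot absorb the whole perturbation budget without lowering the joint score, which mirrors the intuition of spreading the attack over all characters instead of concentrating it.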

machine learning, pattern recognition, pixel

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Toward Real-world Text Image Forgery Localization: Structured and Interpretable Data Synthesis

Neural Information Processing Systems, Jun-22-2026, 23:24:13 GMT

Existing Text Image Forgery Localization (T-IFL) methods often generalize poorly because real-world datasets are small and because synthetic data fails to capture the complexity of real-world tampering, creating a distribution gap. To tackle this issue, we propose Fourier Series-based Tampering Synthesis (FSTS), a structured and interpretable framework for synthesizing tampered text images. FSTS first collects 16,750 real-world tampering instances covering five representative tampering types, using a structured pipeline that records human-performed editing traces via multi-format logs (e.g., video, PSD, and editing logs). By analyzing the collected parameters and identifying recurring behavioral patterns at both the individual and population levels, we formulate a hierarchical modeling framework. Specifically, each individual tampering parameter is represented as a compact combination of basis operation-parameter configurations, while the population-level distribution is constructed by aggregating these behaviors. Drawing on the Fourier series, this formulation yields an interpretable approximation in terms of basis functions and their learned weights. By sampling from the modeled distribution, FSTS synthesizes diverse and realistic training data that better reflect real-world forgery traces. Extensive experiments across four evaluation protocols demonstrate that models trained with FSTS data generalize significantly better on real-world datasets. The dataset is available on the project page.
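One possible reading of the hierarchical model, sketched under assumptions: each individual's tampering behavior is a weight vector over a small set of basis operation-parameter configurations, the population-level distribution is the average of the normalized individual weights, and new training configurations are drawn from that mixture. The configurations in `BASIS` and the averaging rule are hypothetical placeholders; the abstract does not specify FSTS's basis set or aggregation rule.

```python
import random

# Hypothetical basis operation-parameter configurations (placeholders only).
BASIS = [
    {"op": "copy_move", "scale": 1.0},
    {"op": "splice", "blur_sigma": 0.5},
    {"op": "inpaint", "radius": 3},
]


def normalize(weights):
    """Turn non-negative weights into a probability vector."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def population_distribution(individual_weights):
    """Aggregate individuals by averaging their normalized basis weights."""
    normed = [normalize(w) for w in individual_weights]
    n = len(normed)
    return [sum(w[j] for w in normed) / n for j in range(len(BASIS))]


def sample_configs(dist, n, seed=0):
    """Draw n basis configurations from the population-level mixture."""
    rng = random.Random(seed)
    indices = rng.choices(range(len(BASIS)), weights=dist, k=n)
    return [BASIS[i] for i in indices]


if __name__ == "__main__":
    dist = population_distribution([[2, 1, 1], [0, 1, 1]])
    print(dist)  # [0.25, 0.375, 0.375]
    print(sample_configs(dist, 3))
```

Keeping the weights explicit is what makes such a model interpretable: each coefficient says how much of a named basis behavior an editor, or the population as a whole, relies on.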

artificial intelligence, machine learning, natural language

Neural Information Processing Systems

Country:

Asia > China (0.28)
Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Instance-Level Composed Image Retrieval

Neural Information Processing Systems, Jun-17-2026, 20:47:15 GMT

Progress in composed image retrieval (CIR), a popular research direction in image retrieval in which the query combines an image and text, is held back by the absence of high-quality training and evaluation data. We introduce a new evaluation dataset, i-CIR, which, unlike existing datasets, focuses on an instance-level class definition. The goal is to retrieve images that contain the same particular object as the visual query, presented under a variety of modifications defined by textual queries. Its design and curation process keep the dataset compact to facilitate future research, while a semi-automated selection of hard negatives keeps it challenging: retrieval on i-CIR is comparable in difficulty to retrieval among more than 40M random distractors.
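i-CIR is an evaluation dataset rather than a method, but the task it evaluates can be made concrete. The sketch below assumes a simple late-fusion baseline (a weighted sum of L2-normalized image and text embeddings, with a hypothetical mixing weight `alpha`), ranks database items by cosine similarity, and scores the ranking with average precision over the images that show the same object instance. The embeddings are assumed to come from some pretrained encoder; none of this is i-CIR's official protocol.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def compose_query(img_emb, txt_emb, alpha=0.5):
    """Late fusion (assumed baseline): weighted sum of normalized embeddings."""
    a, t = l2_normalize(img_emb), l2_normalize(txt_emb)
    return l2_normalize([(1 - alpha) * x + alpha * y for x, y in zip(a, t)])


def rank(query, database):
    """Return database indices sorted by cosine similarity to the query."""
    q = l2_normalize(query)
    scores = [sum(x * y for x, y in zip(q, l2_normalize(d))) for d in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)


def average_precision(ranking, positives):
    """AP over a full ranking; positives are indices of the same instance."""
    hits, total = 0, 0.0
    for r, idx in enumerate(ranking, start=1):
        if idx in positives:
            hits += 1
            total += hits / r
    return total / len(positives) if positives else 0.0
```

With instance-level positives, a hard negative (a different object of the same category) that outranks a true positive lowers AP directly, which is why curated hard negatives can keep a compact dataset difficult.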

large language model, machine learning, natural language

Neural Information Processing Systems

Country: Europe > Czechia (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Consumer Products & Services (0.67)
Media (0.67)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.92)

Text-Aware Real-World Image Super-Resolution via Diffusion Model with Joint Segmentation Decoders

Neural Information Processing Systems, Jun-17-2026, 11:38:42 GMT

The introduction of generative models has significantly advanced image super-resolution (SR) under real-world degradations. However, these models often suffer from fidelity issues, in particular distorted text structures. In this paper, we introduce a novel diffusion-based SR framework, TADiSR, which integrates text-aware attention and joint segmentation decoders to recover not only natural details but also the structural fidelity of text regions in degraded real-world images. We also propose a complete pipeline for synthesizing high-quality images with fine-grained full-image text masks, combining realistic foreground text regions with detailed background content. Extensive experiments demonstrate that our approach substantially improves text legibility in super-resolved images, achieves state-of-the-art performance across multiple evaluation metrics, and generalizes well to real-world scenarios. Our code is available here.
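Jointly training an SR decoder with a segmentation decoder is commonly expressed as a multi-task objective. The sketch assumes a pixel-wise L1 reconstruction term plus a weighted binary cross-entropy term on a predicted text mask; the weight `lam` and both loss choices are hypothetical and not taken from TADiSR, whose loss the abstract does not describe. Images are flattened to lists of floats to keep the example dependency-free.

```python
import math


def l1_loss(pred, target):
    """Mean absolute error between super-resolved and ground-truth pixels."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def bce_loss(prob, mask, eps=1e-7):
    """Mean binary cross-entropy between predicted text probabilities and a 0/1 mask."""
    total = 0.0
    for p, m in zip(prob, mask):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(m * math.log(p) + (1 - m) * math.log(1 - p))
    return total / len(prob)


def joint_loss(sr_pred, hr_target, seg_prob, text_mask, lam=0.1):
    """Reconstruction term plus a weighted segmentation term (hypothetical weighting)."""
    return l1_loss(sr_pred, hr_target) + lam * bce_loss(seg_prob, text_mask)
```

The segmentation term pushes shared features to encode where text is, which is one common way an auxiliary decoder can help the SR branch preserve text structure.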

artificial intelligence, machine learning, natural language

Neural Information Processing Systems

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.46)

TextDiffuser: Diffusion Models as Text Painters

Neural Information Processing Systems, Apr-25-2026, 15:46:49 GMT

Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle to render accurate and coherent text. To address this issue, we introduce TextDiffuser, which focuses on generating images with visually appealing text that is coherent with the background. TextDiffuser consists of two stages: first, a Transformer model generates the layout of keywords extracted from text prompts, and then diffusion models generate images conditioned on the text prompt and the generated layout. Additionally, we contribute MARIO-10M, the first large-scale text image dataset with OCR annotations, containing 10 million image-text pairs with text recognition, detection, and character-level segmentation annotations. We further collect the MARIO-Eval benchmark as a comprehensive tool for evaluating text rendering quality.
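To illustrate how a keyword layout from the first stage could be turned into a character-level conditioning signal, here is a simplified rasterizer. It assumes each layout entry is a keyword with an axis-aligned box `(x0, y0, x1, y1)` (upper bounds exclusive) and simply splits the box width evenly among the characters; a real system would render glyphs with a font. The function name, the label scheme (0 for background, 1 upward for letters of `alphabet`), and the even-split rule are all hypothetical.

```python
def layout_to_char_mask(width, height, layout, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Rasterize (keyword, box) pairs into a character-level label mask.

    0 marks background; letter k of `alphabet` gets label k + 1.
    Characters not in `alphabet` are left as background.
    """
    mask = [[0] * width for _ in range(height)]
    for word, (x0, y0, x1, y1) in layout:
        chars = word.lower()
        if not chars:
            continue
        box_w = x1 - x0
        for k, ch in enumerate(chars):
            label = alphabet.find(ch) + 1
            if label == 0:
                continue  # unknown character: leave as background
            # Split the box width evenly among the word's characters.
            cx0 = x0 + (k * box_w) // len(chars)
            cx1 = x0 + ((k + 1) * box_w) // len(chars)
            for y in range(max(0, y0), min(height, y1)):
                for x in range(max(0, cx0), min(width, cx1)):
                    mask[y][x] = label
    return mask
```

For example, `layout_to_char_mask(6, 2, [("ab", (0, 0, 4, 2))])` labels columns 0 and 1 with 1 (`a`), columns 2 and 3 with 2 (`b`), and leaves the rest as background.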

large language model, machine learning, textdiffuser

Neural Information Processing Systems

Country: North America (0.28)

Genre: Research Report (0.46)

Industry:

Leisure & Entertainment (0.93)
Media > Film (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

TextDiffuser: Diffusion Models as Text Painters

Neural Information Processing Systems, Feb-19-2026, 02:07:35 GMT

TextDiffuser consists of two stages: first, a Transformer model generates the layout of keywords extracted from text prompts, and then diffusion models generate images conditioned on the text prompt and the generated layout.

large language model, machine learning, textdiffuser

Neural Information Processing Systems

Country: