AITopics | ocr

Mechanism Design via the Interim Relaxation

Neural Information Processing Systems, Jun-22-2026, 18:11:04 GMT

We study revenue maximization for agents with additive preferences, subject to downward-closed constraints on the set of feasible allocations. In seminal work, Alaei [Ala14] introduced a powerful multi-to-single agent reduction based on an ex-ante relaxation of the multi-agent problem. This reduction employs a rounding procedure which is an online contention resolution scheme (OCRS) in disguise, a now widely-used method for rounding fractional solutions in online Bayesian and stochastic optimization problems. In this paper, we leverage our vantage point, 10 years after the work of Alaei, with a rich OCRS toolkit and modern approaches to analyzing multi-agent mechanisms; we introduce a general framework for designing non-sequential and sequential multi-agent, revenue-maximizing mechanisms, capturing a wide variety of problems Alaei's framework could not address. Our framework uses an interim relaxation that is rounded to a feasible mechanism using what we call a two-level OCRS, which allows for some structured dependence between the activation of its input elements. For a wide family of constraints, we can construct such schemes using existing OCRSs as a black box; for other constraints, such as knapsack, we construct such schemes from scratch. We demonstrate numerous applications of our framework, including a sequential mechanism that guarantees a $\frac{2e}{e-1} \approx 3.16$ approximation to the optimal revenue for the case of additive agents subject to matroid feasibility constraints. The simplicity of our developed two-level CRSs and OCRSs highlights the strength of our framework: even with a simple analysis, it yields state-of-the-art approximation guarantees across a wide range of settings. Finally, we show how it naturally extends to multi-parameter procurement auctions.
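
As a quick arithmetic check of the approximation factor $\frac{2e}{e-1}$ quoted in this abstract, taking $e \approx 2.71828$ (the rounding to three decimals is ours, not the paper's):

```latex
\[
\frac{2e}{e-1} \approx \frac{2 \times 2.71828}{2.71828 - 1}
              = \frac{5.43656}{1.71828} \approx 3.164
\]
```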

artificial intelligence, constraint, mechanism, (16 more...)

Neural Information Processing Systems

Country:

North America (0.46)
South America > Chile (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Mechanism Design via the Interim Relaxation

Neural Information Processing Systems, Jun-14-2026, 04:06:23 GMT

We study revenue maximization for agents with additive preferences, subject to downward-closed constraints on the set of feasible allocations. In seminal work, Alaei [Ala14] introduced a powerful multi-to-single agent reduction based on an ex-ante relaxation of the multi-agent problem. This reduction employs a rounding procedure which is an online contention resolution scheme (OCRS) in disguise, a now widely-used method for rounding fractional solutions in online Bayesian and stochastic optimization problems. In this paper, we leverage our vantage point, 10 years after the work of Alaei, with a rich OCRS toolkit and modern approaches to analyzing multi-agent mechanisms; we introduce a general framework for designing non-sequential and sequential multi-agent, revenue-maximizing mechanisms, capturing a wide variety of problems Alaei's framework could not address. Our framework uses an interim relaxation that is rounded to a feasible mechanism using what we call a two-level OCRS, which allows for some structured dependence between the activation of its input elements. For a wide family of constraints, we can construct such schemes using existing OCRSs as a black box; for other constraints, such as knapsack, we construct such schemes from scratch. We demonstrate numerous applications of our framework, including a sequential mechanism that guarantees a $\frac{2e}{e-1} \approx 3.16$ approximation to the optimal revenue for the case of additive agents subject to matroid feasibility constraints. Finally, we show how our framework can be easily extended to multi-parameter procurement auctions, where we provide an OCRS for Stochastic Knapsack that might be of independent interest.

artificial intelligence, name change, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

ffeb860479ccae44d84c0de32acd693d-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Feb-18-2026, 04:22:47 GMT

american story, chronicling america, dataset, (15 more...)

Neural Information Processing Systems

Country:

North America > Panama (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(7 more...)

Industry:

Law (1.00)
Information Technology (1.00)
Government > Regional Government (0.46)
Media > News (0.32)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)

open

Neural Information Processing Systems, Feb-16-2026, 11:44:11 GMT

We create GTA (a benchmark for General Tool Agents) to evaluate the general tool-use ability of LLMs in real-world scenarios. Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)?

artificial intelligence, large language model, natural language, (16 more...)

Neural Information Processing Systems

Country: Asia > China (0.04)

Industry:

Health & Medicine (0.46)
Consumer Products & Services (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Structured Extraction from Business Process Diagrams Using Vision-Language Models

Deka, Pritam, Devereux, Barry

arXiv.org Artificial Intelligence, Dec-1-2025

Business Process Model and Notation (BPMN) is a widely adopted standard for representing complex business workflows. While BPMN diagrams are often exchanged as visual images, existing methods primarily rely on XML representations for computational analysis. In this work, we present a pipeline that leverages Vision-Language Models (VLMs) to extract structured JSON representations of BPMN diagrams directly from images, without requiring source model files or textual annotations. We also incorporate optical character recognition (OCR) for textual enrichment and evaluate the generated element lists against ground truth data derived from the source XML files. Our approach enables robust component extraction in scenarios where original source files are unavailable. We benchmark multiple VLMs and observe performance improvements in several models when OCR is used for text enrichment. In addition, we conducted extensive statistical analyses of OCR-based enrichment methods and prompt ablation studies, providing a clearer understanding of their impact on model performance.
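
The evaluation step described above (comparing a model's extracted element list with ground truth derived from the source XML) could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' pipeline: the JSON schema, the `ELEMENT_TYPES` subset, the exact-match rule on lower-cased names, and the choice of precision, recall, and F1 are all hypothetical. Only the BPMN 2.0 model namespace is standard.

```python
import json
import xml.etree.ElementTree as ET

# Standard BPMN 2.0 model namespace.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"

# Hypothetical subset of element types to score; the paper's list may differ.
ELEMENT_TYPES = ("startEvent", "endEvent", "task", "userTask", "exclusiveGateway")


def elements_from_xml(xml_text):
    """Return a set of (type, name) pairs found in a BPMN XML string."""
    root = ET.fromstring(xml_text)
    found = set()
    for kind in ELEMENT_TYPES:
        for node in root.iter(f"{{{BPMN_NS}}}{kind}"):
            found.add((kind, node.get("name", "").strip().lower()))
    return found


def elements_from_json(json_text):
    """Return a set of (type, name) pairs from a model's JSON output.

    Assumes a hypothetical schema: {"elements": [{"type": ..., "name": ...}]}.
    """
    data = json.loads(json_text)
    return {
        (item["type"], item.get("name", "").strip().lower())
        for item in data["elements"]
    }


def precision_recall_f1(predicted, truth):
    """Set-based precision, recall, and F1 over exact (type, name) matches."""
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if hits else 0.0
    return precision, recall, f1


# Made-up ground truth: three elements in a tiny process.
GROUND_TRUTH_XML = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="p1">
    <startEvent id="s1" name="Order received"/>
    <task id="t1" name="Check stock"/>
    <endEvent id="e1" name="Order shipped"/>
  </process>
</definitions>"""

# Made-up model output: two correct elements, one spurious task, one miss.
MODEL_JSON = json.dumps({
    "elements": [
        {"type": "startEvent", "name": "Order received"},
        {"type": "task", "name": "Check stock"},
        {"type": "task", "name": "Pack items"},
    ]
})

truth = elements_from_xml(GROUND_TRUTH_XML)
predicted = elements_from_json(MODEL_JSON)
precision, recall, f1 = precision_recall_f1(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching on names is deliberately strict here; a fuzzier comparison would be needed wherever OCR noise alters label text.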

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.22448

Country: Europe > Greece (0.15)

Genre:

Workflow (0.85)
Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.98)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.88)

Gated Recurrent Convolution Neural Network for OCR

Neural Information Processing Systems, Nov-21-2025, 16:02:04 GMT

Optical Character Recognition (OCR) aims to recognize text in natural images. Inspired by a recently proposed model for general image classification, Recurrent Convolution Neural Network (RCNN), we propose a new architecture named Gated RCNN (GRCNN) for solving this problem. Its critical component, Gated Recurrent Convolution Layer (GRCL), is constructed by adding a gate to the Recurrent Convolution Layer (RCL), the critical component of RCNN. The gate controls the context modulation in RCL and balances the feed-forward information and the recurrent information. In addition, an efficient Bidirectional Long Short-Term Memory (BLSTM) is built for sequence modeling. The GRCNN is combined with BLSTM to recognize text in natural images. The entire GRCNN-BLSTM model can be trained end-to-end. Experiments show that the proposed model outperforms existing methods on several benchmark datasets including the IIIT-5K, Street View Text (SVT) and ICDAR.
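
To make the idea of a gate that balances feed-forward and recurrent information concrete, here is a toy scalar sketch. It is a simplification under assumptions, not the paper's layer: real GRCLs use convolutions over feature maps plus normalization, and the weight names and values below are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def gated_recurrent_step(u, x_prev, w_f, w_r, wg_f, wg_r):
    """One gated recurrent update for a single scalar unit.

    u          feed-forward input (fixed across iterations)
    x_prev     state of the unit from the previous iteration
    w_f, w_r   hypothetical feed-forward and recurrent weights
    wg_f, wg_r hypothetical gate weights
    """
    gate = sigmoid(wg_f * u + wg_r * x_prev)       # value in (0, 1)
    x_new = relu(w_f * u + gate * (w_r * x_prev))  # gate scales the recurrent term
    return x_new, gate


def run_unit(u, steps, w_f=1.0, w_r=0.5, wg_f=0.0, wg_r=0.0):
    """Iterate the unit; the initial state is purely feed-forward."""
    x = relu(w_f * u)
    history = [x]
    for _ in range(steps):
        x, _gate = gated_recurrent_step(u, x, w_f, w_r, wg_f, wg_r)
        history.append(x)
    return history


# Half-open gate (gate weights 0 give sigmoid(0) = 0.5): recurrent context is mixed in.
half_open = run_unit(1.0, steps=1)

# Nearly closed gate: the recurrent term is suppressed and the state stays
# at its feed-forward value.
closed = run_unit(1.0, steps=3, wg_f=-50.0)
print(half_open, closed)
```

With the gate near zero the unit behaves like a plain feed-forward unit; as the gate approaches one it approaches an ungated recurrent update.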

gated recurrent convolution neural network, name change, recurrent convolution neural network, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ChromFound: Towards A Universal Foundation Model for Single-Cell Chromatin Accessibility Data

Jiao, Yifeng, Liu, Yuchen, Zhang, Yu, Guo, Xin, Wu, Yushuai, Jiang, Chen, Li, Jiyang, Zhang, Hongwei, Han, Limei, Gao, Xin, Qi, Yuan, Cheng, Yuan

arXiv.org Artificial Intelligence, Oct-28-2025

The advent of single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) offers an innovative perspective for deciphering regulatory mechanisms by assembling a vast repository of single-cell chromatin accessibility data. While foundation models have achieved significant success in single-cell transcriptomics, there is currently no foundation model for scATAC-seq that supports zero-shot high-quality cell identification and comprehensive multi-omics analysis simultaneously. Key challenges lie in the high dimensionality and sparsity of scATAC-seq data, as well as the lack of a standardized schema for representing open chromatin regions (OCRs). Here, we present ChromFound, a foundation model tailored for scATAC-seq. ChromFound utilizes a hybrid architecture and genome-aware tokenization to effectively capture genome-wide long contexts and regulatory signals from dynamic chromatin landscapes. Pretrained on 1.97 million cells from 30 tissues and 6 disease conditions, ChromFound demonstrates broad applicability across 6 diverse tasks. Notably, it achieves robust zero-shot performance in generating universal cell representations and exhibits excellent transferability in cell type annotation and cross-omics prediction. By uncovering enhancer-gene links undetected by existing computational methods, ChromFound offers a promising framework for understanding disease risk variants in the noncoding genome.

data mining, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2505.12638

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

ffeb860479ccae44d84c0de32acd693d-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Oct-10-2025, 23:56:01 GMT

data mining, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > Panama (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(8 more...)

Industry:

Law (1.00)
Information Technology (1.00)
Government > Regional Government (0.46)
Media > News (0.32)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(2 more...)

8a75ee6d4b2eb0b777f549a32a5a5c28-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Oct-10-2025, 08:51:41 GMT

final answer, query, rtx 4070, (15 more...)

Neural Information Processing Systems

Country:

North America > United States (0.67)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.67)

Industry:

Consumer Products & Services (0.67)
Government > Regional Government > North America Government > United States Government (0.46)
Transportation > Air (0.46)
Food & Agriculture > Agriculture (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Communications (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR

Vempati, Shashank, Anand, Nishit, Talebailkar, Gaurav, Garai, Arpan, Arora, Chetan

arXiv.org Artificial Intelligence, Sep-1-2025

Conventional optical character recognition (OCR) techniques segmented each character and then recognized it. This made them prone to errors in character segmentation and left them without the context needed to exploit language models. Advances in sequence-to-sequence translation over the last decade led to modern techniques that first detect words and then feed one word at a time to a model, which directly outputs full words as sequences of characters. This allowed better use of language models and bypassed the error-prone character segmentation step. We observe that this transition in style has moved the accuracy bottleneck to word segmentation. Hence, in this paper, we propose a natural and logical progression from word-level OCR to line-level OCR. The proposal bypasses errors in word detection and provides larger sentence context for better use of language models. We show that the proposed technique improves not only the accuracy but also the efficiency of OCR. Despite a thorough literature survey, we did not find any public dataset for training and benchmarking such a shift from word-level to line-level OCR. Hence, we also contribute a meticulously curated dataset of 251 English page images with line-level annotations. Our experiments revealed a notable end-to-end accuracy improvement of 5.4%, underscoring the potential benefits of transitioning to line-level OCR, especially for document images. We also report a 4-times improvement in efficiency compared to word-based pipelines. With continuous improvements in large language models, our methodology also holds the potential to exploit such advances. Project Website: https://nishitanand.github.io/line-level-ocr-website

machine learning, natural language, recognition, (20 more...)

arXiv.org Artificial Intelligence

2508.21693

Country:

North America > United States (0.46)
Asia > India (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
