AITopics | textboxe


textboxe

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Supplementary for Paper2Poster: Benchmarking Multimodal Poster Automation from Scientific Papers

Neural Information Processing Systems | Jun-15-2026, 03:27:31 GMT

A Ablation Study

We conduct ablation studies to evaluate three key design choices in PosterAgent: (1) the binary-tree layout strategy for layout planning; (2) the inclusion of a commenter module as a visual critic; and (3) the use of in-context examples to enhance the visual perception capabilities of the commenter. We define the following variants:

Direct: replacing the binary-tree layout with direct layout generation by an LLM;
Tree: using the binary-tree layout strategy but removing the commenter module;
Tree + Commenter: including the commenter module but without in-context examples;
Tree + Commenter + IC: the full system, with both the commenter and in-context examples.

All ablation variants are implemented using PosterAgent-4o, keeping all other components unchanged to isolate the effect of each factor. We visualize and compare results across five randomly selected papers from Paper2Poster, as shown in Figures 1 to 5.

When prompting the LLM to directly generate poster layouts (Direct), the results are often structurally compromised (e.g., Figures 1a-3a) or resemble blog-style layouts that lack visual hierarchy and appeal (Figures 4a and 5a). Fine-grained layout components, such as text boxes and figures, are especially challenging to synthesize in this setting: for instance, Figures 1a-4a exhibit missing text boxes that leave noticeable blank areas, and Figure 4a fails to preserve the correct aspect ratio of figures.
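
The binary-tree layout strategy is only named in this excerpt, not specified. As a rough illustration of one plausible reason a tree-based planner avoids the blank areas seen with Direct generation, the sketch below assumes a generic slicing (guillotine) tree: each internal node cuts its rectangle in two at a given ratio, and each leaf holds one poster element. Because every cut divides its parent exactly, the leaves tile the whole canvas. The classes `Rect`, `Leaf`, `Split`, the function `layout`, and the example poster are hypothetical and are not PosterAgent's implementation.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Rect:
    """Axis-aligned rectangle: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


@dataclass
class Leaf:
    """A single poster element, e.g. a text box or a figure (hypothetical)."""
    name: str


@dataclass
class Split:
    """Internal node that cuts its rectangle into two children.

    side_by_side=True places the children left | right;
    side_by_side=False stacks them top / bottom.
    ratio is the share of the parent given to `first`.
    """
    side_by_side: bool
    ratio: float
    first: "Node"
    second: "Node"


Node = Union[Leaf, Split]


def layout(node: Node, rect: Rect) -> Dict[str, Rect]:
    """Recursively assign a rectangle to every leaf of a slicing tree."""
    if isinstance(node, Leaf):
        return {node.name: rect}
    if not 0.0 < node.ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    if node.side_by_side:
        w1 = rect.w * node.ratio
        r1 = Rect(rect.x, rect.y, w1, rect.h)
        r2 = Rect(rect.x + w1, rect.y, rect.w - w1, rect.h)
    else:
        h1 = rect.h * node.ratio
        r1 = Rect(rect.x, rect.y, rect.w, h1)
        r2 = Rect(rect.x, rect.y + h1, rect.w, rect.h - h1)
    # The two children exactly cover the parent, so no area is left blank.
    boxes = layout(node.first, r1)
    boxes.update(layout(node.second, r2))  # leaf names are assumed unique
    return boxes


# Example: a title band on top, then a left column and a right column
# that is itself split into a results box above a figure.
poster = Split(
    side_by_side=False, ratio=0.25,
    first=Leaf("title"),
    second=Split(
        side_by_side=True, ratio=0.5,
        first=Leaf("method"),
        second=Split(side_by_side=False, ratio=0.5,
                     first=Leaf("results"), second=Leaf("figure")),
    ),
)
boxes = layout(poster, Rect(0, 0, 48, 36))
for name, r in boxes.items():
    print(name, r)
```

Adjusting a panel means changing one ratio, and the leaves still cover the canvas; free-form coordinates emitted directly by a model carry no such guarantee.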

large language model, machine learning, natural language

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning (0.49)


A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition

Soykan, Gürkan | Yuret, Deniz | Sezgin, Tevfik Metin

arXiv.org Artificial Intelligence | Dec-27-2022

This study focuses on improving the optical character recognition (OCR) data for panels in the COMICS dataset, the largest dataset containing text and images from comic books. To do this, we developed a pipeline for OCR processing and labeling of comic books and created the first text detection and recognition datasets for Western comics, called "COMICS Text+: Detection" and "COMICS Text+: Recognition". We evaluated the performance of state-of-the-art text detection and recognition models on these datasets and found significant improvement in word accuracy and normalized edit distance compared to the text in COMICS. We also created a new dataset called "COMICS Text+", which contains the extracted text from the textboxes in the COMICS dataset. Using the improved text data of COMICS Text+ in a previously published comics processing model resulted in state-of-the-art performance on cloze-style tasks without changing the model architecture. The COMICS Text+ dataset can be a valuable resource for researchers working on tasks including text detection, recognition, and high-level processing of comics, such as narrative understanding, character relations, and story generation. All the data and inference instructions can be accessed at https://github.com/gsoykan/comics_text_plus.
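
The abstract reports word accuracy and normalized edit distance (NED) without defining them. A minimal sketch, assuming word accuracy is the fraction of exact string matches and NED is the Levenshtein distance divided by the length of the longer string. This is a common convention, but the COMICS Text+ evaluation may normalize or preprocess differently (for example, case folding, or reporting 1 - NED so that higher is better). The function names and sample strings are illustrative only.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, gt: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string (assumed convention)."""
    longest = max(len(pred), len(gt))
    return 0.0 if longest == 0 else levenshtein(pred, gt) / longest


def _check(preds: Sequence[str], gts: Sequence[str]) -> None:
    if len(preds) != len(gts) or not gts:
        raise ValueError("need equally sized, non-empty prediction and ground-truth lists")


def word_accuracy(preds: Sequence[str], gts: Sequence[str]) -> float:
    """Fraction of words recognized exactly (case-sensitive here)."""
    _check(preds, gts)
    return sum(p == g for p, g in zip(preds, gts)) / len(gts)


def mean_ned(preds: Sequence[str], gts: Sequence[str]) -> float:
    """Average normalized edit distance over all words; lower is better."""
    _check(preds, gts)
    return sum(normalized_edit_distance(p, g) for p, g in zip(preds, gts)) / len(gts)


preds = ["HELLO", "WORD", "COMIC"]
gts = ["HELLO", "WORLD", "COMICS"]
print(word_accuracy(preds, gts), mean_ned(preds, gts))
```

Word accuracy gives no credit for near misses such as "COMIC" vs. "COMICS", while NED does, which is why OCR work often reports both.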

machine learning, natural language, pattern recognition

arXiv.org Artificial Intelligence

2212.14674

Country: North America > United States (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)


TextBoxes: A Fast Text Detector with a Single Deep Neural Network

Liao, Minghui (Huazhong University of Science and Technology) | Shi, Baoguang (Huazhong University of Science and Technology) | Bai, Xiang (Huazhong University of Science and Technology) | Wang, Xinggang (Huazhong University of Science and Technology) | Liu, Wenyu (Huazhong University of Science and Technology)

AAAI Conferences | Feb-14-2017

This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-processing except standard non-maximum suppression. TextBoxes outperforms competing methods in terms of text localization accuracy and is much faster, taking only 0.09 s per image in a fast implementation. Furthermore, combined with a text recognizer, TextBoxes significantly outperforms state-of-the-art approaches on word spotting and end-to-end text recognition tasks.
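
TextBoxes relies on standard non-maximum suppression (NMS) as its only post-processing step. The greedy variant below is a generic sketch over axis-aligned boxes `(x1, y1, x2, y2)`, not code from the TextBoxes paper; the 0.5 IoU threshold, the function names `iou` and `nms`, and the sample detections are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop every remaining box
    that overlaps it by more than iou_threshold, and repeat.
    Returns the indices of kept boxes in descending score order."""
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two near-duplicate detections of one word plus a separate word.
boxes = [(10, 10, 110, 40), (12, 12, 112, 42), (200, 50, 260, 80)]
scores = [0.9, 0.8, 0.7]
keep = nms(boxes, scores)
print(keep)  # prints [0, 2]
```

The first two boxes overlap with an IoU of about 0.84, so the lower-scoring one is suppressed; the third box does not overlap either and survives.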

artificial intelligence, machine learning, textboxe

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)
