AITopics | donut


donut


If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The Confidence Paradox: Can LLM Know When It's Wrong

Tripathi, Sahil, Nafis, Md Tabrez, Hussain, Imran, Gao, Jiechao

arXiv.org Artificial Intelligence | Oct-29-2025

Document Visual Question Answering (DocVQA) models often produce overconfident or ethically misaligned responses, especially under uncertainty. Existing models like LayoutLMv3, UDOP, and DONUT focus on accuracy but lack ethical calibration. We propose HonestVQA, a model-agnostic, self-supervised framework that aligns model confidence with correctness using weighted loss and contrastive learning. We introduce two new metrics, Honesty Score (H-Score) and Ethical Confidence Index (ECI), to evaluate ethical alignment. HonestVQA improves accuracy and F1 by up to 4.3% across SpDocVQA, InfographicsVQA, and SROIE datasets, while reducing overconfidence. It also generalizes well across domains, achieving 78.9% accuracy and 76.1% F1-score.

large language model, machine learning, question answering, (18 more...)

arXiv.org Artificial Intelligence

2506.23464

Country: Asia > India > NCT (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.35)
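
The HonestVQA summary above is about aligning model confidence with correctness. The H-Score and ECI metrics are not defined in this listing, so the sketch below does not reproduce them. It shows the standard Expected Calibration Error (ECE) instead, as one common way to quantify overconfidence. The function name, the binning scheme, and the sample numbers are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: weighted average gap between confidence and accuracy per bin.

    This is NOT the H-Score or ECI from the paper; it is a generic
    calibration measure used here only as an illustration.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin rather than a bin of its own.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# Hypothetical predictions: two very confident answers, two hesitant ones,
# each group only half correct.
sample_conf = [0.95, 0.95, 0.55, 0.55]
sample_correct = [True, False, True, False]
sample_ece = expected_calibration_error(sample_conf, sample_correct)
```

A lower value means confidence tracks correctness more closely. In the sample, most of the error comes from the high-confidence bin, which is the overconfidence pattern the summary describes.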


DONUT: Physics-aware Machine Learning for Real-time X-ray Nanodiffraction Analysis

Luo, Aileen, Zhou, Tao, Du, Ming, Holt, Martin V., Singer, Andrej, Cherukara, Mathew J.

arXiv.org Artificial Intelligence | Jul-21-2025

Coherent X-ray scattering techniques are critical for investigating the fundamental structural properties of materials at the nanoscale. While advancements have made these experiments more accessible, real-time analysis remains a significant bottleneck, often hindered by artifacts and computational demands. In scanning X-ray nanodiffraction microscopy, which is widely used to spatially resolve structural heterogeneities, this challenge is compounded by the convolution of the divergent beam with the sample's local structure. To address this, we introduce DONUT (Diffraction with Optics for Nanobeam by Unsupervised Training), a physics-aware neural network designed for the rapid and automated analysis of nanobeam diffraction data. By incorporating a differentiable geometric diffraction model directly into its architecture, DONUT learns to predict crystal lattice strain and orientation in real time. Crucially, this is achieved without reliance on labeled datasets or pre-training, overcoming a fundamental limitation for supervised machine learning in X-ray science. We demonstrate experimentally that DONUT accurately extracts all features within the data over 200 times more efficiently than conventional fitting methods.

artificial intelligence, machine learning, physics, (16 more...)

arXiv.org Artificial Intelligence

2507.14038

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Energy (0.69)
Government > Regional Government (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


LLMs can implicitly learn from mistakes in-context

Alazraki, Lisa, Mozes, Maximilian, Campos, Jon Ander, Tan, Yi Chern, Rei, Marek, Bartolo, Max

arXiv.org Artificial Intelligence | Feb-12-2025

Learning from mistakes is a fundamental feature of human intelligence. Previous work has shown that Large Language Models (LLMs) can also learn from incorrect answers when provided with a comprehensive rationale detailing why an answer is wrong or how to correct it. In this work, we examine whether LLMs can learn from mistakes in mathematical reasoning tasks when these explanations are not provided. We investigate if LLMs are able to implicitly infer such rationales simply from observing both incorrect and correct answers. Surprisingly, we find that LLMs perform better, on average, when rationales are eliminated from the context and incorrect answers are simply shown alongside correct ones. This approach also substantially outperforms chain-of-thought prompting in our evaluations. We show that these results are consistent across LLMs of different sizes and varying reasoning abilities. Further, we carry out an in-depth analysis, and show that prompting with both wrong and correct answers leads to greater performance and better generalisation than introducing additional, more diverse question-answer pairs into the context. Finally, we show that new rationales generated by models that have only observed incorrect and correct answers are scored equally as highly by humans as those produced with the aid of exemplar rationales. Our results demonstrate that LLMs are indeed capable of in-context implicit learning.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.0855

Country:

North America > United States (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
(8 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (0.69)
Education (0.45)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


"What is the value of {templates}?" Rethinking Document Information Extraction Datasets for LLMs

Zmigrod, Ran, Shetty, Pranav, Sibue, Mathieu, Ma, Zhiqiang, Nourbakhsh, Armineh, Liu, Xiaomo, Veloso, Manuela

arXiv.org Artificial Intelligence | Oct-20-2024

The rise of large language models (LLMs) for visually rich document understanding (VRDU) has kindled a need for prompt-response, document-based datasets. As annotating new datasets from scratch is labor-intensive, the existing literature has generated prompt-response datasets from available resources using simple templates. For the case of key information extraction (KIE), one of the most common VRDU tasks, past work has typically employed the template "What is the value for the {key}?". However, given the variety of questions encountered in the wild, simple and uniform templates are insufficient for creating robust models in research and industrial contexts. In this work, we present K2Q, a diverse collection of five datasets converted from KIE to a prompt-response format using a plethora of bespoke templates. The questions in K2Q can span multiple entities and be extractive or boolean. We empirically compare the performance of seven baseline generative models on K2Q with zero-shot prompting. We further compare three of these models when training on K2Q versus training on simpler templates to motivate the need for our work. We find that creating diverse and intricate KIE questions enhances the performance and robustness of VRDU models. We hope this work encourages future studies on data quality for generative model training.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2024.findings-emnlp.770

2410.15484

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(18 more...)

Genre: Research Report (0.82)

Industry: Banking & Finance (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
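
The K2Q summary above describes converting key information extraction (KIE) annotations into prompt-response pairs with templates. A minimal sketch of that idea follows. Only the simple template string is quoted from the summary; the annotation dictionary, the hand-written templates, and the function name are hypothetical and are not taken from the K2Q datasets.

```python
# Hypothetical KIE annotation for one document: key -> extracted value.
annotations = {
    "invoice_number": "INV-0042",
    "total": "118.50",
    "vendor": "Acme Ltd",
}

# The simple, uniform template quoted in the summary.
SIMPLE_TEMPLATE = "What is the value for the {key}?"

# Hypothetical hand-written templates standing in for "bespoke" ones.
BESPOKE_TEMPLATES = {
    "invoice_number": ["Which invoice number appears on this document?"],
    "total": ["How much is owed in total?"],
}


def to_prompt_response(annotations, bespoke=None):
    """Turn key-value annotations into prompt-response pairs.

    Keys without a bespoke template fall back to the simple template.
    """
    bespoke = bespoke or {}
    pairs = []
    for key, value in annotations.items():
        readable_key = key.replace("_", " ")
        for template in bespoke.get(key, [SIMPLE_TEMPLATE]):
            # Templates without a {key} placeholder are left unchanged.
            prompt = template.format(key=readable_key)
            pairs.append({"prompt": prompt, "response": value})
    return pairs


simple_pairs = to_prompt_response(annotations)
diverse_pairs = to_prompt_response(annotations, BESPOKE_TEMPLATES)
```

With the simple template every prompt has the same shape, which is the uniformity the summary argues against. The bespoke mapping varies the wording per key while leaving the responses unchanged.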


TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing

Zmigrod, Ran, Ma, Zhiqiang, Nourbakhsh, Armineh, Shah, Sameena

arXiv.org Artificial Intelligence | Feb-7-2024

Visually Rich Form Understanding (VRFU) poses a complex research problem due to the documents' highly structured nature and yet highly variable style and content. Current annotation schemes decompose form understanding and omit key hierarchical structure, making development and evaluation of end-to-end models difficult. In this paper, we propose a novel F1 metric to evaluate form parsers and describe a new content-agnostic, tree-based annotation scheme for VRFU: TreeForm. We provide methods to convert previous annotation schemes into TreeForm structures and evaluate TreeForm predictions using a modified version of the normalized tree-edit distance. We present initial baselines for our end-to-end performance metric and the TreeForm edit distance, averaged over the FUNSD and XFUND datasets, of 61.5 and 26.4 respectively. We hope that TreeForm encourages deeper research in annotating, modeling, and evaluating the complexities of form-like documents.

annotation, computational linguistic, treeform, (14 more...)

arXiv.org Artificial Intelligence

2402.05282

Country:

Oceania > Australia > New South Wales > Sydney (0.05)
North America > Canada > Quebec > Montreal (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
(7 more...)

Genre:

Overview (0.68)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.54)


Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

Cao, Haoyu, Bao, Changcun, Liu, Chaohu, Chen, Huang, Yin, Kun, Liu, Hao, Liu, Yinsong, Jiang, Deqiang, Sun, Xing

arXiv.org Artificial Intelligence | Sep-3-2023

We propose a novel end-to-end document understanding model called SeRum (SElective Region Understanding Model) for extracting meaningful information from document images, including document analysis, retrieval, and office automation. Unlike state-of-the-art approaches that rely on multi-stage technical schemes and are computationally expensive, SeRum converts document image understanding and recognition tasks into a local decoding process of the visual tokens of interest, using a content-aware token merge module. This mechanism enables the model to pay more attention to regions of interest generated by the query decoder, improving the model's effectiveness and increasing the decoding speed of the generative scheme. We also designed several pre-training tasks to enhance the understanding and local awareness of the model. Experimental results demonstrate that SeRum achieves state-of-the-art performance on document understanding tasks and competitive results on text spotting tasks. SeRum represents a substantial advancement towards enabling efficient and effective end-to-end document understanding.

computational linguistic, information, serum, (15 more...)

arXiv.org Artificial Intelligence

2309.01131

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Washington > King County > Seattle (0.14)
Europe > Portugal > Lisbon > Lisbon (0.04)
(15 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
(2 more...)


Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

Lee, Kenton, Joshi, Mandar, Turc, Iulia, Hu, Hexiang, Liu, Fangyu, Eisenschlos, Julian, Khandelwal, Urvashi, Shaw, Peter, Chang, Ming-Wei, Toutanova, Kristina

arXiv.org Artificial Intelligence | Jun-15-2023

Visually-situated language is ubiquitous -- sources range from textbooks with diagrams to web pages with images and tables, to mobile apps with buttons and forms. Perhaps due to this diversity, previous work has typically relied on domain-specific recipes with limited sharing of the underlying data, model architectures, and objectives. We present Pix2Struct, a pretrained image-to-text model for purely visual language understanding, which can be finetuned on tasks containing visually-situated language. Pix2Struct is pretrained by learning to parse masked screenshots of web pages into simplified HTML. The web, with its richness of visual elements cleanly reflected in the HTML structure, provides a large source of pretraining data well suited to the diversity of downstream tasks. Intuitively, this objective subsumes common pretraining signals such as OCR, language modeling, and image captioning. In addition to the novel pretraining strategy, we introduce a variable-resolution input representation and a more flexible integration of language and vision inputs, where language prompts such as questions are rendered directly on top of the input image. For the first time, we show that a single pretrained model can achieve state-of-the-art results in six out of nine tasks across four domains: documents, illustrations, user interfaces, and natural images.

artificial intelligence, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2210.03347

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (0.93)
Health & Medicine > Consumer Health (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.44)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.34)


Supermassive black hole: First EVER full resolution photo is revealed

Daily Mail - Science & tech | Apr-13-2023, 11:12:02 GMT

It is a thing of mesmerising beauty: humanity's first glimpse at the only full resolution photo of a supermassive black hole ever produced. This 'orange donut', as it has been dubbed, sits at the heart of the Messier 87 galaxy 55 million light-years from Earth and in 2019 became the first black hole to be directly imaged by astronomers. Now, with the help of artificial intelligence (AI) machine learning, it has received its first official makeover -- and the results reveal that rather than being a 'fuzzy donut', it is actually more of a 'skinny donut'. Scientists say this new perspective of the supermassive black hole will 'play a critical role in our ability to understand its behaviour' and could help explain how the stellar phenomenon 'eats' matter. They called it a 'golden opportunity' to learn more about black hole physics.

black hole, full resolution photo, galaxy, (13 more...)

Daily Mail - Science & tech

Country:

Europe (0.15)
North America > United States > Ohio (0.05)
North America > United States > New York (0.05)
(5 more...)

Genre: Research Report > New Finding (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Solve a mystery box like a data scientist

#artificialintelligence | Jan-20-2023, 13:25:13 GMT

What happens when a data scientist gets a riddle in the form of a box? Of course he will (try to) approach it as a data problem. In this article I will describe the whole process, and to be honest, it was not as easy as I thought. As with many problems, you can get completely lost, and only by talking to a couple of friends did I get back on track. As a data scientist, I like to approach this problem in a data-driven manner. I realize that this method is far from the most obvious solution. But it was a very fun endeavor: collecting too much data, training a transformer model to extract values from a video, and eventually using a minimizer to find the solution. This article is a summary of this (mostly) fun journey! I have divided this article into a couple of (for me) logical steps. All images in this article have been taken or generated by me unless stated otherwise in the separate captions (of which there are none in this article).

artificial intelligence, dataset, machine learning, (18 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (0.81)
Information Technology > Artificial Intelligence > Machine Learning (0.47)


What and Why Tidy Data?

#artificialintelligence | Jul-12-2022, 18:55:38 GMT

Data scientists like to work with tidy data because it makes visualization, data manipulation, and modeling much easier. Common coding environments for data science, including RStudio, pandas in Python, and related packages, have been designed to work well with tidy data. The first critical step in investigating a dataset is tidying. We will take a look at each rule from R for Data Science and see how you can format a data frame for each donut that you, as a data scientist/baker, can use to visualize, explore, or model your data.

data science project, dataset, tidy data, (10 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.76)
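
The tidy-data summary above refers to the rules from R for Data Science: each variable is a column, each observation is a row, and each value is a cell. A small pandas sketch of tidying a wide table is given below. The donut sales table and its column names are invented for illustration and do not come from the article.

```python
import pandas as pd

# Hypothetical "messy" table: one row per donut, one column per day of sales.
wide = pd.DataFrame(
    {
        "donut": ["glazed", "jelly"],
        "mon": [12, 7],
        "tue": [15, 9],
    }
)

# Tidy form: the day becomes its own variable (column), and every
# donut/day combination becomes its own observation (row).
tidy = wide.melt(id_vars="donut", var_name="day", value_name="sold")

# Once tidy, grouped summaries need no special handling per day column.
totals = tidy.groupby("donut")["sold"].sum()
```

In the wide table the day is hidden in the column names, so adding a new day changes the schema. In the tidy table a new day is just additional rows, which is why grouping, plotting, and modeling code tends to stay unchanged.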
