NVIDIA Nemotron Parse 1.1
Chumachenko, Kateryna, Deshmukh, Amala Sanjay, Seppanen, Jarno, Karmanov, Ilia, Chen, Chia-Chih, Voegtle, Lukas, Fischer, Philipp, Wawrzos, Marek, Motiian, Saeid, Ageev, Roman, Wu, Kedi, Milesi, Alexandre, Moosaei, Maryam, Pawelec, Krzysztof, Subramanian, Padmavathy, Samadi, Mehrzad, Yu, Xin, Dear, Celina, Stoddard, Sarah, Diamond, Jenna, Oliver, Jesse, Chraghchian, Leanna, Skelly, Patrick, Balough, Tom, Xu, Yao, Scowcroft, Jane Polak, Korzekwa, Daniel, Hanley, Darragh, Bhaskar, Sandip, Roman, Timo, Sapra, Karan, Tao, Andrew, Catanzaro, Bryan
We introduce Nemotron-Parse-1.1, a lightweight document parsing and OCR model that advances the capabilities of its predecessor, Nemoretriever-Parse-1.0. Nemotron-Parse-1.1 delivers improvements across general OCR, markdown formatting, structured table parsing, and text extraction from pictures, charts, and diagrams. It also supports a longer output sequence length for visually dense documents. As with its predecessor, it extracts bounding boxes of text segments, as well as corresponding semantic classes. Nemotron-Parse-1.1 follows an encoder-decoder architecture with 885M parameters, including a compact 256M-parameter language decoder. It achieves competitive accuracy on public benchmarks, making it a strong lightweight OCR solution. We release the model weights publicly on Hugging Face, as well as an optimized NIM container, along with a subset of the training data as part of the broader Nemotron-VLM-v2 dataset. Additionally, we release Nemotron-Parse-1.1-TC, which operates on a reduced vision token length, offering a 20% speed improvement with minimal quality degradation.
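The released weights are intended to be loaded with standard Hugging Face tooling. The sketch below shows one plausible way to run such a model on a page image; the repository name, task prompt, and processor/model classes are assumptions for illustration, not the documented Nemotron-Parse-1.1 interface, so consult the model card for actual usage.

```python
# Hedged usage sketch for an encoder-decoder document-parsing VLM hosted on Hugging Face.
# The repo id, task prompt, and generation settings are placeholders (assumptions), not the
# documented Nemotron-Parse-1.1 API; see the official model card for the real interface.
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

MODEL_ID = "nvidia/nemotron-parse-1.1"  # placeholder repo name

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID, trust_remote_code=True)

page = Image.open("page.png").convert("RGB")
inputs = processor(images=page, text="<parse>", return_tensors="pt")  # task token is assumed
outputs = model.generate(**inputs, max_new_tokens=2048)
# Expected output: markdown text with bounding-box and semantic-class annotations
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```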
Towards Transparent Reasoning: What Drives Faithfulness in Large Language Models?
McMillan, Teague, Dominici, Gabriele, Gjoreski, Martin, Langheinrich, Marc
Large Language Models (LLMs) often produce explanations that do not faithfully reflect the factors driving their predictions. In healthcare settings, such unfaithfulness is especially problematic: explanations that omit salient clinical cues or mask spurious shortcuts can undermine clinician trust and lead to unsafe decision support. We study how inference- and training-time choices shape explanation faithfulness, focusing on factors practitioners can control at deployment. We evaluate three LLMs (GPT-4.1-mini, LLaMA 70B, LLaMA 8B) on two datasets, BBQ (social bias) and MedQA (medical licensing questions), and manipulate the number and type of few-shot examples, prompting strategies, and training procedure. Our results show: (i) both the quantity and quality of few-shot examples significantly impact model faithfulness; (ii) faithfulness is sensitive to prompting design; (iii) the instruction-tuning phase improves measured faithfulness on MedQA. These findings offer insights into strategies for enhancing the interpretability and trustworthiness of LLMs in sensitive domains.
CombiGraph-Vis: A Curated Multimodal Olympiad Benchmark for Discrete Mathematical Reasoning
Mahdavi, Hamed, Mahdavinia, Pouria, Farhadi, Alireza, Mohammadipour, Pegah, Malek, Samira, Daliri, Majid, Mohammadipour, Pedram, Hashemi, Alireza, Khasahmadi, Amir, Honavar, Vasant
State-of-the-art (SOTA) LLMs have progressed from struggling on proof-based Olympiad problems to solving most of the IMO 2025 problems, with leading systems reportedly handling 5 of 6 problems. Given this progress, we assess how well these models can grade proofs: detecting errors, judging their severity, and assigning fair scores beyond binary correctness. We study proof-analysis capabilities using a corpus of 90 Gemini 2.5 Pro-generated solutions that we grade on a 1-4 scale with detailed error annotations, and on MathArena solution sets for IMO/USAMO 2025 scored on a 0-7 scale. Our analysis shows that models can reliably flag incorrect (including subtly incorrect) solutions but exhibit calibration gaps in how partial credit is assigned. To address this, we introduce agentic workflows that extract and analyze reference solutions and automatically derive problem-specific rubrics for a multi-step grading process. We instantiate and compare different design choices for the grading workflows, and evaluate their trade-offs. Across our annotated corpus and MathArena, our proposed workflows achieve higher agreement with human grades and more consistent handling of partial credit across metrics. We release all code, data, and prompts/logs to facilitate future research.
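As a concrete illustration of the multi-step idea, the sketch below derives a problem-specific rubric from a reference solution and then grades a candidate proof against it. The prompts, model name, and two-call structure are illustrative assumptions, not the authors' released workflow.

```python
# Minimal two-step rubric-grading sketch (illustrative assumptions, not the released pipeline):
# step 1 turns a reference solution into a 0-7 partial-credit rubric, step 2 grades against it.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4.1"  # assumed grader model

def derive_rubric(problem: str, reference_solution: str) -> str:
    # Step 1: extract a problem-specific rubric from the reference solution.
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content":
                   f"Problem:\n{problem}\n\nReference solution:\n{reference_solution}\n\n"
                   "Write a 0-7 grading rubric that assigns partial credit to the key steps."}])
    return resp.choices[0].message.content

def grade(problem: str, rubric: str, candidate: str) -> str:
    # Step 2: grade the candidate proof against the derived rubric.
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content":
                   f"Problem:\n{problem}\n\nRubric:\n{rubric}\n\nCandidate solution:\n{candidate}\n\n"
                   "List the errors, judge their severity, and give a 0-7 score with justification."}])
    return resp.choices[0].message.content
```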
Meet Your New Client: Writing Reports for AI -- Benchmarking Information Loss in Market Research Deliverables
Simmering, Paul F., Schulz, Benedikt, Tabino, Oliver, Wittenburg, Georg
As organizations adopt retrieval-augmented generation (RAG) for their knowledge management systems (KMS), traditional market research deliverables face new functional demands. While PDF reports and slides have long served human readers, they are now also "read" by AI systems to answer user questions. To future-proof reports being delivered today, this study evaluates information loss during their ingestion into RAG systems. It compares how well PDF and PowerPoint (PPTX) documents converted to Markdown can be used by an LLM to answer factual questions in an end-to-end benchmark. Findings show that while text is reliably extracted, significant information is lost from complex objects like charts and diagrams. This suggests a need for specialized, AI-native deliverables to ensure research insights are not lost in translation.
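A minimal sketch of such an end-to-end check follows: convert a PDF deliverable to Markdown (pymupdf4llm is one converter that can do this), ask an LLM factual questions over the converted text, and score the answers. The question set, the `ask_llm` stand-in, and the substring-match scoring are simplifying assumptions, not the benchmark's actual protocol.

```python
# Sketch of an end-to-end ingestion benchmark under simplifying assumptions: PDF -> Markdown,
# factual QA over the Markdown, naive substring scoring. `ask_llm` stands in for any
# chat-completion or RAG call; it is not part of the study's released code.
from typing import Callable
import pymupdf4llm  # PDF-to-Markdown converter (one of several options)

def benchmark_pdf(path: str, qa_pairs: list[tuple[str, str]],
                  ask_llm: Callable[[str, str], str]) -> float:
    markdown = pymupdf4llm.to_markdown(path)          # ingestion step: PDF -> Markdown
    correct = 0
    for question, gold in qa_pairs:
        answer = ask_llm(markdown, question)          # e.g. retrieval + LLM answer
        correct += gold.strip().lower() in answer.lower()
    return correct / len(qa_pairs)                    # fraction of facts that survived ingestion
```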
The Hidden Structure -- Improving Legal Document Understanding Through Explicit Text Formatting
Braun, Christian, Lilienbeck, Alexander, Mentjukov, Daniel
Legal contracts possess an inherent, semantically vital structure (e.g., sections, clauses) that is crucial for human comprehension but whose impact on LLM processing remains under-explored. This paper investigates the effects of explicit input text structure and prompt engineering on the performance of GPT-4o and GPT-4.1 on a legal question-answering task using an excerpt of the CUAD. We compare model exact-match accuracy across various input formats: well-structured plain text (human-generated from CUAD), plain text cleaned of line breaks, plain text extracted by Azure OCR, plain text extracted by GPT-4o Vision, and extracted (and interpreted) Markdown (MD) from GPT-4o Vision. To gauge the effect of prompt engineering, we assess the impact of shifting task instructions to the system prompt and of explicitly informing the model about the structured nature of the input. Our findings reveal that GPT-4o demonstrates considerable robustness to variations in input structure but falls short in overall performance. Conversely, GPT-4.1's performance is markedly sensitive; poorly structured inputs yield suboptimal results (identical to GPT-4o's), while well-structured formats (original CUAD text, GPT-4o Vision text, and GPT-4o MD) improve exact-match accuracy by ~20 percentage points. Optimizing the system prompt to include task details and an advisory about structured input further raises GPT-4.1's accuracy by an additional ~10-13 percentage points, with Markdown ultimately achieving the highest performance under these conditions (79% overall exact-match accuracy). This research empirically demonstrates that while newer models exhibit greater resilience, careful input structuring and strategic prompt design remain critical for optimizing LLM performance and can significantly affect outcomes in high-stakes legal applications.
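To make the prompt-engineering comparison concrete, the sketch below builds two message configurations in the paper's spirit: a baseline that packs everything into the user turn, and a variant that moves the task instructions into the system prompt and advises the model that the input is structured. The wording is illustrative, not the paper's exact prompts.

```python
# Illustrative prompt configurations (assumed wording, not the paper's exact prompts).
TASK = "Answer the question using only the contract. Reply with the exact answer span."

def baseline_messages(contract_text: str, question: str) -> list[dict]:
    # Baseline: task instructions and contract both in the user turn.
    return [{"role": "user", "content": f"{TASK}\n\n{contract_text}\n\nQuestion: {question}"}]

def structured_messages(contract_markdown: str, question: str) -> list[dict]:
    # Variant: instructions in the system prompt, plus an advisory about structured input.
    system = (TASK + " The contract is provided as well-structured Markdown whose headings "
              "mark sections and clauses; use this structure to locate the relevant clause.")
    return [{"role": "system", "content": system},
            {"role": "user", "content": f"{contract_markdown}\n\nQuestion: {question}"}]
```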
ReaderLM-v2: Small Language Model for HTML to Markdown and JSON
Wang, Feng, Shi, Zesheng, Wang, Bo, Wang, Nan, Xiao, Han
We present ReaderLM-v2, a compact 1.5 billion parameter language model designed for efficient web content extraction. Our model processes documents up to 512K tokens, transforming messy HTML into clean Markdown or JSON formats with high accuracy -- making it an ideal tool for grounding large language models. The model's effectiveness results from two key innovations: (1) a three-stage data synthesis pipeline that generates high-quality, diverse training data by iteratively drafting, refining, and critiquing web content extraction; and (2) a unified training framework combining continuous pre-training with multi-objective optimization. Extensive evaluation demonstrates that ReaderLM-v2 outperforms GPT-4o-2024-08-06 and other larger models by 15-20% on carefully curated benchmarks, particularly excelling at documents exceeding 100K tokens, while maintaining significantly lower computational requirements.
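Since ReaderLM-v2 is distributed as a causal language model on Hugging Face, a plain `generate()` call over an instruction plus raw HTML should be enough to produce Markdown. The repo id and the instruction wording below are assumptions drawn from typical usage; check the model card for the exact prompt format.

```python
# Hedged usage sketch: instruction + raw HTML in, Markdown out. The repo id and the
# instruction text are assumptions; consult the ReaderLM-v2 model card for specifics.
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "jinaai/ReaderLM-v2"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

html = "<html><body><h1>Quarterly results</h1><p>Revenue grew 12%.</p></body></html>"
messages = [{"role": "user", "content":
             f"Extract the main content from the given HTML and convert it to Markdown.\n\n{html}"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```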
Idiosyncrasies in Large Language Models
Sun, Mingjie, Yin, Yida, Xu, Zhiqiu, Kolter, J. Zico, Liu, Zhuang
In this work, we unveil and study idiosyncrasies in Large Language Models (LLMs) -- unique patterns in their outputs that can be used to distinguish the models. To do so, we consider a simple classification task: given a particular text output, the objective is to predict the source LLM that generated the text. We evaluate this synthetic task across various groups of LLMs and find that simply fine-tuning existing text embedding models on LLM-generated texts yields excellent classification accuracy. Notably, we achieve 97.1% accuracy on held-out validation data in the five-way classification problem involving ChatGPT, Claude, Grok, Gemini, and DeepSeek. Our further investigation reveals that these idiosyncrasies are rooted in word-level distributions. These patterns persist even when the texts are rewritten, translated, or summarized by an external LLM, suggesting that they are also encoded in the semantic content. Additionally, we leverage LLMs as judges to generate detailed, open-ended descriptions of each model's idiosyncrasies. Finally, we discuss the broader implications of our findings, particularly for training on synthetic data and inferring model similarity. Code is available at https://github.com/locuslab/llm-idiosyncrasies.
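A simplified version of this classification setup is sketched below. The paper fine-tunes text embedding models end to end; as a lighter stand-in, this sketch feeds frozen sentence embeddings into a logistic-regression head, and the embedding model choice and data loading are assumptions for illustration.

```python
# Simplified sketch of source-LLM classification: frozen sentence embeddings plus a
# logistic-regression head (the paper fine-tunes the embedding model itself).
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

LABELS = ["ChatGPT", "Claude", "Grok", "Gemini", "DeepSeek"]  # five-way task from the abstract

def train_source_classifier(train_texts, train_labels, val_texts, val_labels):
    encoder = SentenceTransformer("all-MiniLM-L6-v2")   # any off-the-shelf embedding model
    clf = LogisticRegression(max_iter=1000)
    clf.fit(encoder.encode(train_texts), train_labels)  # labels are entries of LABELS
    val_acc = accuracy_score(val_labels, clf.predict(encoder.encode(val_texts)))
    return clf, val_acc
```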
Multi-modal Retrieval Augmented Multi-modal Generation: A Benchmark, Evaluate Metrics and Strong Baselines
Ma, Zi-Ao, Lan, Tian, Tu, Rong-Cheng, Hu, Yong, Huang, Heyan, Mao, Xian-Ling
This paper investigates the task of Multi-modal Retrieval-Augmented Multi-modal Generation (M$^2$RAG). The task requires foundation models to browse multi-modal web pages, with mixed text and images, and generate multi-modal responses to user queries; such responses offer better information density and readability. Given the early stage of research on M$^2$RAG, systematic studies and analysis are lacking. To fill this gap, we construct a benchmark for the M$^2$RAG task, equipped with a suite of text-modal and multi-modal metrics to analyze the capabilities of existing foundation models. Based on the comprehensive evaluation results on our benchmark, we also propose several effective methods for foundation models to accomplish this task. Extensive experimental results reveal several intriguing phenomena that merit further research.
Does Prompt Formatting Have Any Impact on LLM Performance?
He, Jia, Rungta, Mukund, Koleczek, David, Sekhon, Arshdeep, Wang, Franklin X, Hasan, Sadid
In the realm of Large Language Models (LLMs), prompt optimization is crucial for model performance. Although previous research has explored aspects like rephrasing prompt contexts, using various prompting techniques (like in-context learning and chain-of-thought), and ordering few-shot examples, our understanding of LLM sensitivity to prompt templates remains limited. Therefore, this paper examines the impact of different prompt templates on LLM performance. We formatted the same contexts into various human-readable templates, including plain text, Markdown, JSON, and YAML, and evaluated their impact across tasks like natural language reasoning, code generation, and translation using OpenAI's GPT models. Experiments show that GPT-3.5-turbo's performance varies by up to 40% in a code translation task depending on the prompt template, while larger models like GPT-4 are more robust to these variations. Our analysis highlights the need to reconsider the use of fixed prompt templates, as different formats can significantly affect model performance.
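The manipulation being studied is straightforward to reproduce: render the same question and context into several human-readable templates and compare downstream accuracy. The field names and layout below are arbitrary illustrative choices, not the paper's exact templates.

```python
# Render one question/context pair into four templates (plain text, Markdown, JSON, YAML).
# Field names and layout are illustrative, not the paper's exact templates. Requires PyYAML.
import json
import yaml

def render(question: str, context: str, fmt: str) -> str:
    if fmt == "plain":
        return f"Context: {context}\nQuestion: {question}\nAnswer:"
    if fmt == "markdown":
        return f"## Context\n{context}\n\n## Question\n{question}\n\n## Answer\n"
    if fmt == "json":
        return json.dumps({"context": context, "question": question, "answer": ""}, indent=2)
    if fmt == "yaml":
        return yaml.safe_dump({"context": context, "question": question, "answer": ""}, sort_keys=False)
    raise ValueError(f"unknown format: {fmt}")

prompts = {fmt: render("Which format scored best?", "Formatting can shift accuracy.", fmt)
           for fmt in ("plain", "markdown", "json", "yaml")}
```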
SWEb: A Large Web Dataset for the Scandinavian Languages
Norlund, Tobias, Isbister, Tim, Gyllensten, Amaru Cuba, Santos, Paul Dos, Petrelli, Danila, Ekgren, Ariel, Sahlgren, Magnus
This paper presents the hitherto largest pretraining dataset for the Scandinavian languages: the Scandinavian WEb (SWEb), comprising over one trillion tokens. The paper details the collection and processing pipeline, and introduces a novel model-based text extractor that significantly reduces complexity in comparison with rule-based approaches. We also introduce a new cloze-style benchmark for evaluating language models in Swedish, and use this test to compare models trained on the SWEb data to models trained on FineWeb, with competitive results. All data, models and code are shared openly.

Large language models have made significant strides in recent years due to their general capabilities in language-processing tasks. This progress has been largely driven by the development of extensive and high-quality pretraining datasets sourced from open web data (Wenzek et al., 2020; Brown et al., 2020; Abadji et al., 2022; Penedo et al., 2023; 2024). However, the majority of research aimed at improving pretraining data focuses on high-resource languages such as English. Our goal is to create a large-scale and high-performing open pretraining dataset specifically for the Scandinavian (North Germanic) languages: Swedish, Danish, Norwegian, and Icelandic. Existing large-scale datasets for these languages primarily include mC4 (Xue et al., 2021), OSCAR (Abadji et al., 2022), and HPLT Datasets 1.2 (de Gibert et al., 2024). The Scandinavian portion of mC4 comprises approximately 100B tokens, OSCAR 23.01 about 10B tokens, and HPLT about 35B tokens, all of which are relatively small considering that state-of-the-art large language models today are trained on trillions of high-quality tokens.
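The cloze-style evaluation mentioned above can be approximated with a standard log-likelihood comparison: score each candidate completion under a language model and pick the most likely one. The stand-in model and the toy sentence below are assumptions for illustration, not items from the released benchmark.

```python
# Sketch of a cloze-style evaluation: pick the candidate completion with the highest
# log-likelihood. The stand-in model and toy Swedish sentence are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")        # stand-in; swap in a Swedish LM
model = AutoModelForCausalLM.from_pretrained("gpt2")

def cloze_pick(prompt: str, candidates: list[str]) -> str:
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[-1]
    scores = []
    for cand in candidates:
        ids = tokenizer(prompt + cand, return_tensors="pt").input_ids
        labels = ids.clone()
        labels[:, :prompt_len] = -100                    # score only the completion tokens
        with torch.no_grad():
            loss = model(ids, labels=labels).loss        # mean NLL of the completion
        scores.append(-loss.item())
    return candidates[scores.index(max(scores))]

print(cloze_pick("Stockholm är huvudstaden i ", ["Sverige.", "Danmark.", "Norge."]))
```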