AITopics | pdf file

Information Extraction From Fiscal Documents Using LLMs

Aggarwal, Vikram, Kulkarni, Jay, Mascarenhas, Aditi, Narang, Aakriti, Raman, Siddarth, Shah, Ajay, Thomas, Susan

arXiv.org Artificial Intelligence, Nov-25-2025

Large Language Models (LLMs) have demonstrated remarkable capabilities in text comprehension, but their ability to process complex, hierarchical tabular data remains underexplored. We present a novel approach to extracting structured data from multi-page government fiscal documents using LLM-based techniques. Applied to annual fiscal documents from the State of Karnataka in India (200+ pages), our method achieves high accuracy through a multi-stage pipeline that leverages domain knowledge, sequential context, and algorithmic validation. A large challenge with traditional OCR methods is the inability to verify the accurate extraction of numbers. When applied to fiscal data, the inherent structure of fiscal tables, with totals at each level of the hierarchy, allows for robust internal validation of the extracted data. We use these hierarchical relationships to create multi-level validation checks. We demonstrate that LLMs can read tables and also process document-specific structural hierarchies, offering a scalable process for converting PDF-based fiscal disclosures into research-ready databases. Our implementation shows promise for broader applications across developing country contexts.
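The validation idea in the abstract above, where totals at each level of the hierarchy are used to check the extracted numbers, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Node` class, the `find_mismatches` function, the `tolerance` parameter, and all figures are hypothetical and chosen only to show the check.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One row of a hierarchical table: a label, the extracted amount, and child rows."""
    label: str
    amount: int
    children: list["Node"] = field(default_factory=list)


def find_mismatches(node: Node, tolerance: int = 0, path: str = "") -> list[str]:
    """Return the paths of all parent rows whose amount differs from the sum of their children."""
    here = f"{path}/{node.label}" if path else node.label
    mismatches = []
    if node.children:
        child_sum = sum(child.amount for child in node.children)
        # A parent row fails validation when its stated total and the
        # sum of its children disagree by more than the tolerance.
        if abs(child_sum - node.amount) > tolerance:
            mismatches.append(here)
        # Recurse so that every level of the hierarchy is checked.
        for child in node.children:
            mismatches.extend(find_mismatches(child, tolerance, here))
    return mismatches


# Hypothetical figures, not taken from any real fiscal document.
budget = Node("Department", 600, [
    Node("Salaries", 400, [Node("Staff", 250), Node("Officers", 150)]),
    Node("Grants", 200, [Node("Scheme A", 120), Node("Scheme B", 70)]),  # children sum to 190
])

print(find_mismatches(budget))
```

In this sketch, a flagged path only says that a parent and its children disagree; it does not say which of the numbers was misread, so a real pipeline would presumably need a follow-up step to re-extract or review the flagged rows.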

information, large language model, natural language, (17 more...)

arXiv:2511.10659

Country: Asia > India > Karnataka (0.27)

Genre:

Research Report (1.00)
Overview > Innovation (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

7ccaa4f9a89cce6619093226f26b84e6-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Sep-28-2025, 09:39:10 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Country:

Asia > China (1.00)
Oceania > Australia (0.94)
Europe > Germany (0.68)
(34 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Passenger (1.00)
Materials > Metals & Mining (1.00)
Law > Environmental Law (1.00)
(15 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Meet Your New Client: Writing Reports for AI -- Benchmarking Information Loss in Market Research Deliverables

Simmering, Paul F., Schulz, Benedikt, Tabino, Oliver, Wittenburg, Georg

arXiv.org Artificial Intelligence, Aug-25-2025

As organizations adopt retrieval-augmented generation (RAG) for their knowledge management systems (KMS), traditional market research deliverables face new functional demands. While PDF reports and slides have long served human readers, they are now also "read" by AI systems to answer user questions. To future-proof reports being delivered today, this study evaluates information loss during their ingestion into RAG systems. It compares how well PDF and PowerPoint (PPTX) documents converted to Markdown can be used by an LLM to answer factual questions in an end-to-end benchmark. Findings show that while text is reliably extracted, significant information is lost from complex objects like charts and diagrams. This suggests a need for specialized, AI-native deliverables to ensure research insights are not lost in translation.

large language model, layout element, machine learning, (22 more...)

arXiv:2508.15817

Genre: Research Report > New Finding (0.48)

Industry: Marketing (0.62)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

An NLP Benchmark Dataset for Assessing Corporate Climate Policy Engagement

Neural Information Processing Systems, Aug-20-2025, 12:19:38 GMT

As societal awareness of climate change grows, corporate climate policy engagements are attracting attention.

artificial intelligence, machine learning, natural language, (17 more...)

Country:

Asia > India (1.00)
Asia > China (1.00)
Oceania > Australia (0.94)
(9 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Passenger (1.00)
Materials > Metals & Mining (1.00)
Industrial Conglomerates (1.00)
(12 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering

Cao, Ruisheng, Zhang, Hanchong, Huang, Tiancheng, Kang, Zhangyi, Zhang, Yuxin, Sun, Liangtai, Li, Hanqi, Miao, Yuxun, Fan, Shuai, Chen, Lu, Yu, Kai

arXiv.org Artificial Intelligence, Jun-3-2025

The increasing number of academic papers poses significant challenges for researchers to efficiently acquire key details. While retrieval augmented generation (RAG) shows great promise in large language model (LLM) based automated question answering, previous works often isolate neural and symbolic retrieval despite their complementary strengths. Moreover, conventional single-view chunking neglects the rich structure and layout of PDFs, e.g., sections and tables. In this work, we propose NeuSym-RAG, a hybrid neural symbolic retrieval framework which combines both paradigms in an interactive process. By leveraging multi-view chunking and schema-based parsing, NeuSym-RAG organizes semi-structured PDF content into both the relational database and vectorstore, enabling LLM agents to iteratively gather context until sufficient to generate answers. Experiments on three full PDF-based QA datasets, including a self-annotated one AIRQA-REAL, show that NeuSym-RAG stably defeats both the vector-based RAG and various structured baselines, highlighting its capacity to unify both retrieval schemes and utilize multiple views. Code and data are publicly available at https://github.com/X-LANCE/NeuSym-RAG.

large language model, machine learning, natural language, (22 more...)

arXiv:2505.19754

Country:

Asia (0.28)
North America > United States (0.28)
Europe > Austria (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Adobe Acrobat Pro review: Still the gold standard

PCWorld, Dec-12-2024, 15:00:00 GMT

Acrobat Pro's comprehensive PDF features show why it's still the editor against which all others are judged. Editor's note: This review was updated December 9, 2024 to reflect the addition of AI Assistant and current pricing. Adobe created the PDF two decades ago and its PDF editor has continued to rule the category, despite what many users felt was its exorbitant price. But a couple of years back, Acrobat adopted a cloud subscription model that now makes it more affordable for folks without an enterprise budget. Acrobat Pro is composed of three components: Acrobat, which allows you to perform a variety of editing functions on your PDFs on desktop and mobile devices; Adobe Document Cloud, which lets you create and export PDF files, as well as store and send files and collect electronic signatures; and Acrobat Reader, which enables you to read, print, and sign PDFs.

acrobat, adobe acrobat, artificial intelligence, (13 more...)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Mobile (0.35)

Polish Medical Exams: A new dataset for cross-lingual medical knowledge transfer assessment

Grzybowski, Łukasz, Pokrywka, Jakub, Ciesiółka, Michał, Kaczmarek, Jeremi I., Kubis, Marek

arXiv.org Artificial Intelligence, Nov-30-2024

Large Language Models (LLMs) have demonstrated significant potential in handling specialized tasks, including medical problem-solving. However, most studies predominantly focus on English-language contexts. This study introduces a novel benchmark dataset based on Polish medical licensing and specialization exams (LEK, LDEK, PES) taken by medical doctor candidates and practicing doctors pursuing specialization. The dataset was web-scraped from publicly available resources provided by the Medical Examination Center and the Chief Medical Chamber. It comprises over 24,000 exam questions, including a subset of parallel Polish-English corpora, where the English portion was professionally translated by the examination center for foreign candidates. By creating a structured benchmark from these existing exam questions, we systematically evaluate state-of-the-art LLMs, including general-purpose, domain-specific, and Polish-specific models, and compare their performance against human medical students. Our analysis reveals that while models like GPT-4o achieve near-human performance, significant challenges persist in cross-lingual translation and domain-specific understanding. These findings underscore disparities in model performance across languages and medical specialties, highlighting the limitations and ethical considerations of deploying LLMs in clinical practice.

exam, large language model, machine learning, (19 more...)

arXiv:2412.00559

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Poland > Greater Poland Province > Poznań (0.04)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine > Diagnostic Medicine (0.93)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

PDFs are now even easier to work with thanks to the new AI features in PDFelement

PCWorld, Oct-23-2024, 19:15:15 GMT

PDFelement is already a well-established name when it comes to working with PDFs, thanks to its impressive range of features and affordable price. But the developers at Wondershare haven't rested on their laurels, as a new upgrade brings a host of AI tools and enhancements that will make it even easier to edit, annotate, extract information and share the results. If you regularly deal with PDF files, the updated PDFelement version 11 release could be about to make your life a whole lot simpler. The AI revolution is well underway, and the updated PDFelement brings AI-powered tools that are focussed on improving how users interact with PDF files. With these new abilities you can get work done in the least amount of time and with a minimum of fuss.

ai feature, artificial intelligence, pdfelement, (6 more...)

Technology: Information Technology > Artificial Intelligence (1.00)

Reviews: Dimensionality Reduction has Quantifiable Imperfections: Two Geometric Bounds

Neural Information Processing Systems, Oct-7-2024, 03:45:32 GMT

This paper investigates Dimensionality Reduction (DR) maps in an information retrieval setting. In particular, the authors show that no DR map can attain both perfect precision and perfect recall. Further, they show theoretical bounds for the precision and the Wasserstein distance of a continuous DR map. They also run simulations in various settings. Quality: They give theoretical equivalences of precision and recall (Proposition 1) and show that a perfect map does not exist (Theorem 1).

dimensionality reduction, geometric bound, quantifiable imperfection, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.71)

KaPQA: Knowledge-Augmented Product Question-Answering

Eppalapally, Swetha, Dangi, Daksh, Bhat, Chaithra, Gupta, Ankita, Zhang, Ruiyi, Agarwal, Shubham, Bagga, Karishma, Yoon, Seunghyun, Lipka, Nedim, Rossi, Ryan A., Dernoncourt, Franck

arXiv.org Artificial Intelligence, Jul-22-2024

Question-answering for domain-specific applications has recently attracted much interest due to the latest advancements in large language models (LLMs). However, accurately assessing the performance of these applications remains a challenge, mainly due to the lack of suitable benchmarks that effectively simulate real-world scenarios. To address this challenge, we introduce two product question-answering (QA) datasets focused on Adobe Acrobat and Photoshop products to help evaluate the performance of existing models on domain-specific product QA tasks. Additionally, we propose a novel knowledge-driven RAG-QA framework to enhance the performance of the models in the product QA task. Our experiments demonstrated that inducing domain knowledge through query reformulation allowed for increased retrieval and generative performance when compared to standard RAG-QA methods. This improvement, however, is slight, and thus illustrates the challenge posed by the datasets introduced.

dataset, knowledge augmented method, query, (15 more...)

arXiv:2407.16073

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > Dominican Republic (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology (0.46)
Banking & Finance (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)