Evaluating Hydro-Science and Engineering Knowledge of Large Language Models

Hu, Shiruo, Shan, Wenbo, Li, Yingjia, Wan, Zhiqi, Yu, Xinpeng, Qi, Yunjia, Xia, Haotian, Xiao, Yang, Liu, Dingxiao, Wang, Jiaru, Gong, Chenxu, Zhang, Ruixi, Wu, Shuyue, Cui, Shibo, Lai, Chee Hui, Luo, Wei, He, Yubin, Xu, Bin, Zhao, Jianshi

arXiv.org Artificial Intelligence

Hydro-Science and Engineering (Hydro-SE) is a critical and irreplaceable domain that secures human water supply, generates clean hydropower energy, and mitigates flood and drought disasters. Featuring multiple engineering objectives, Hydro-SE is an inherently interdisciplinary domain that integrates scientific knowledge with engineering expertise. This integration necessitates extensive expert collaboration in decision-making, which poses difficulties for intelligent automation. With the rapid advancement of large language models (LLMs), their potential application in the Hydro-SE domain is being increasingly explored. However, the knowledge and application abilities of LLMs in Hydro-SE have not been sufficiently evaluated. To address this issue, we propose the Hydro-SE LLM evaluation benchmark (Hydro-SE Bench), which contains 4,000 multiple-choice questions. Hydro-SE Bench covers nine subfields and enables evaluation of LLMs on three aspects: basic conceptual knowledge, engineering application ability, and reasoning and calculation ability. The evaluation results on Hydro-SE Bench show that accuracy ranges from 0.74 to 0.80 for commercial LLMs and from 0.41 to 0.68 for small-parameter LLMs. While LLMs perform well in subfields closely related to the natural and physical sciences, they struggle with domain-specific knowledge such as industry standards and hydraulic structures. Model scaling mainly improves reasoning and calculation abilities, but there is still great potential for LLMs to better handle problems in practical engineering applications. This study highlights the strengths and weaknesses of LLMs for Hydro-SE tasks, providing model developers with clear training targets and Hydro-SE researchers with practical guidance for applying LLMs.
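Accuracy on a multiple-choice benchmark like Hydro-SE Bench is typically reported both overall and per subfield. A minimal sketch of that scoring loop (the data layout and subfield names here are illustrative, not the benchmark's actual format):

```python
from collections import defaultdict

def evaluate(predictions, questions):
    """Compute overall and per-subfield accuracy for a
    multiple-choice benchmark (hypothetical data layout)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for q, pred in zip(questions, predictions):
        total[q["subfield"]] += 1
        if pred == q["answer"]:
            correct[q["subfield"]] += 1
    per_subfield = {s: correct[s] / total[s] for s in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_subfield

# Toy example: two subfields, four questions.
questions = [
    {"subfield": "hydrology", "answer": "B"},
    {"subfield": "hydrology", "answer": "C"},
    {"subfield": "hydraulic structures", "answer": "A"},
    {"subfield": "hydraulic structures", "answer": "D"},
]
overall, per_subfield = evaluate(["B", "C", "A", "B"], questions)
```

Per-subfield breakdowns like this are what let the authors separate strong areas (natural-science-adjacent subfields) from weak ones (industry standards, hydraulic structures).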


Mathematics is hard for mathematicians to understand too

Science

At a recent conference on mathematics in the age of automated proofs, mathematician and Fields Medalist Akshay Venkatesh presented “How do we talk to our students about AI?” He quoted an email he'd received from a young student who asked, “Do you believe that mathematics is worth being studied in a world in which a machine can answer everything for you? What do you believe would be the ‘job’ of a mathematician in this world?” Venkatesh framed AI as an opportunity to correct what he called an “essential gap that has opened between the practice of mathematics and our values.” Mathematician William Thurston has explained these values by writing, “mathematics is not about numbers, equations, computations, or algorithms: it is about understanding.” But Venkatesh argued that the record on this is terrible, lamenting that “for a typical paper or talk, very few of us understand it.” He is not alone in thinking that something is wrong with the current state of mathematics research.


Demystify, Use, Reflect: Preparing students to be informed LLM-users

Chandrashekar, Nikitha Donekal, Nizamani, Sehrish Basir, Ellis, Margaret, Ramakrishnan, Naren

arXiv.org Artificial Intelligence

We transitioned our post-CS1 course that introduces various subfields of computer science so that it integrates Large Language Models (LLMs) in a structured, critical, and practical manner. It aims to help students develop the skills needed to engage meaningfully and responsibly with AI. The course now includes explicit instruction on how LLMs work, exposure to current tools, ethical issues, and activities that encourage student reflection on personal use of LLMs as well as the larger evolving landscape of AI-assisted programming. In class, we demonstrate the use and verification of LLM outputs, guide students in the use of LLMs as an ingredient in a larger problem-solving loop, and require students to disclose and acknowledge the nature and extent of LLM assistance. Throughout the course, we discuss risks and benefits of LLMs across CS subfields. In our first iteration of the course, we collected and analyzed data from students' pre- and post-course surveys. Student understanding of how LLMs work became more technical, and their verification and use of LLMs shifted to be more discerning and collaborative. These strategies can be used in other courses to prepare students for the AI-integrated future.


AstroMMBench: A Benchmark for Evaluating Multimodal Large Language Models Capabilities in Astronomy

Shi, Jinghang, Tang, Xiaoyu, Huang, Yang, Li, Yuyang, Kong, Xiao, Zhang, Yanxia, Yue, Caizhan

arXiv.org Artificial Intelligence

Astronomical image interpretation presents a significant challenge for applying multimodal large language models (MLLMs) to specialized scientific tasks. Existing benchmarks focus on general multimodal capabilities but fail to capture the complexity of astronomical data. To bridge this gap, we introduce AstroMMBench, the first comprehensive benchmark designed to evaluate MLLMs in astronomical image understanding. AstroMMBench comprises 621 multiple-choice questions across six astrophysical subfields, curated and reviewed by 15 domain experts for quality and relevance. We conducted an extensive evaluation of 25 diverse MLLMs, including 22 open-source and 3 closed-source models, using AstroMMBench. The results show that Ovis2-34B achieved the highest overall accuracy (70.5%), demonstrating leading capabilities even compared to strong closed-source models. Performance varied across the six astrophysical subfields: models found domains such as cosmology and high-energy astrophysics particularly challenging, while performing relatively better in others, such as instrumentation and solar astrophysics. These findings underscore the vital role of domain-specific benchmarks like AstroMMBench in critically evaluating MLLM performance and guiding their targeted development for scientific applications. AstroMMBench provides a foundational resource and a dynamic tool to catalyze advancements at the intersection of AI and astronomy.


EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving

Zhou, Xiyuan, Wang, Xinlei, He, Yirui, Wu, Yang, Zou, Ruixi, Cheng, Yuheng, Xie, Yulu, Liu, Wenxuan, Zhao, Huan, Xu, Yan, Gu, Jinjin, Zhao, Junhua

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown strong performance on mathematical reasoning under well-posed conditions. However, real-world engineering problems require more than symbolic mathematical computation -- they demand handling uncertainty, context, and open-ended scenarios. Existing benchmarks fail to capture these complexities. We introduce EngiBench, a hierarchical benchmark designed to evaluate LLMs on solving engineering problems. It spans three levels of increasing difficulty (foundational knowledge retrieval, multi-step contextual reasoning, and open-ended modeling) and covers diverse engineering subfields. To facilitate a deeper understanding of model performance, we systematically rewrite each problem into three controlled variants (perturbed, knowledge-enhanced, and math abstraction), enabling us to separately evaluate the model's robustness, domain-specific knowledge, and mathematical reasoning abilities. Experiment results reveal a clear performance gap across levels: models struggle more as tasks get harder, perform worse when problems are slightly changed, and fall far behind human experts on high-level engineering tasks. These findings reveal that current LLMs still lack the high-level reasoning needed for real-world engineering, highlighting the need for future models with deeper and more reliable problem-solving capabilities. Our source code and data are available at https://github.com/EngiBench/EngiBench.


DentalBench: Benchmarking and Advancing LLMs Capability for Bilingual Dentistry Understanding

Zhu, Hengchuan, Xu, Yihuan, Li, Yichen, Meng, Zijie, Liu, Zuozhu

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) and medical LLMs (Med-LLMs) have demonstrated strong performance on general medical benchmarks. However, their capabilities in specialized medical fields such as dentistry, which require deeper domain-specific knowledge, remain underexplored due to the lack of targeted evaluation resources. In this paper, we introduce DentalBench, the first comprehensive bilingual benchmark designed to evaluate and advance LLMs in the dental domain. DentalBench consists of two main components: DentalQA, an English-Chinese question-answering (QA) benchmark with 36,597 questions spanning 4 tasks and 16 dental subfields; and DentalCorpus, a large-scale, high-quality corpus with 337.35 million tokens curated for dental domain adaptation, supporting both supervised fine-tuning (SFT) and retrieval-augmented generation (RAG). We evaluate 14 LLMs, covering proprietary, open-source, and medical-specific models, and reveal significant performance gaps across task types and languages. Further experiments with Qwen-2.5-3B demonstrate that domain adaptation substantially improves model performance, particularly on knowledge-intensive and terminology-focused tasks, and highlight the importance of domain-specific benchmarks for developing trustworthy and effective LLMs tailored to healthcare applications.


ArXivBench: When You Should Avoid Using ChatGPT for Academic Writing

Li, Ning, Zhang, Jingran, Cui, Justin

arXiv.org Artificial Intelligence

Large language models (LLMs) demonstrate strong capabilities in reasoning and question answering, yet their tendency to generate factually incorrect content remains a critical challenge. This study evaluates proprietary and open-source LLMs on generating relevant research papers with accurate arXiv links. Our evaluation reveals critical academic risks: LLMs frequently generate incorrect arXiv links or references to non-existent papers, fundamentally undermining their ability to properly attribute research contributions to the actual authors. We introduce arXivBench, a benchmark specifically designed to assess LLM performance across eight major subject categories on arXiv and five subfields within computer science, one of the most popular of those categories. Our findings show concerning accuracy variations across subjects, with Claude-3.5-Sonnet exhibiting a substantial advantage in generating both relevant and accurate responses. Notably, most LLMs perform significantly better in Artificial Intelligence than in other subfields. This benchmark provides a standardized tool for evaluating LLM reliability in scientific contexts, promoting more dependable academic use in research environments. Our code and dataset are available at https://github.com/liningresearch/arXivBench and https://huggingface.co/datasets/arXivBenchLLM/arXivBench.
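A first-pass defense against the fabricated-link failure mode the abstract describes is simply validating the identifier format before attempting any lookup. A sketch (this is not the benchmark's verification pipeline, and it checks syntax only, not whether the paper actually exists):

```python
import re

# New-style arXiv identifiers (since 2007): YYMM.NNNN or YYMM.NNNNN,
# optionally followed by a version suffix like "v2".
ARXIV_ID = re.compile(r"(?:arxiv\.org/abs/)?(\d{4}\.\d{4,5})(v\d+)?$")

def extract_arxiv_id(link: str):
    """Return the bare arXiv ID if the link is well-formed, else None."""
    m = ARXIV_ID.search(link.strip())
    return m.group(1) if m else None
```

A well-formed ID that passes this check can still point to the wrong paper, which is why the benchmark also has to verify relevance and attribution, not just link syntax.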


Using Large Language Models to Study Mathematical Practice

D'Alessandro, William

arXiv.org Artificial Intelligence

The philosophy of mathematical practice (PMP) looks to evidence from working mathematics to help settle philosophical questions. One prominent program under the PMP banner is the study of explanation in mathematics, which aims to understand what sorts of proofs mathematicians consider explanatory and what role the pursuit of explanation plays in mathematical practice. In an effort to address worries about cherry-picked examples and file-drawer problems in PMP, a handful of authors have recently turned to corpus analysis methods as a promising alternative to small-scale case studies. This paper reports the results from such a corpus study facilitated by Google's Gemini 2.5 Pro, a model whose reasoning capabilities, advances in hallucination control, and large context window allow for the accurate analysis of hundreds of pages of text per query. Based on a sample of 5000 mathematics papers from arXiv.org, the experiments yielded a dataset of hundreds of useful annotated examples. The study's aim was to gain insight into questions like the following: How often do mathematicians make claims about explanation in the relevant sense? Do mathematicians' explanatory practices vary in any noticeable way by subject matter? Which philosophical theories of explanation are most consistent with a large body of non-cherry-picked examples? How might philosophers make further use of AI tools to gain insights from large datasets of this kind? As the first PMP study making extensive use of LLM methods, it also seeks to begin a conversation about these methods as research tools in practice-oriented philosophy and to evaluate the strengths and weaknesses of current models for such work.


Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner

Zhang, Wenchuan, Zhang, Penghao, Guo, Jingru, Cheng, Tao, Chen, Jie, Zhang, Shuwan, Zhang, Zhang, Yi, Yuhao, Bu, Hong

arXiv.org Artificial Intelligence

Recent advances in vision language models (VLMs) have enabled broad progress in the general medical field. However, pathology still remains a more challenging subdomain, with current pathology-specific VLMs exhibiting limitations in both diagnostic accuracy and reasoning plausibility. Such shortcomings are largely attributable to the nature of current pathology datasets, which are primarily composed of image-description pairs that lack the depth and structured diagnostic paradigms employed by real-world pathologists. In this study, we leverage pathology textbooks and real-world pathology experts to construct high-quality, reasoning-oriented datasets. Building on this, we introduce Patho-R1, a multimodal RL-based pathology Reasoner, trained through a three-stage pipeline: (1) continued pretraining on 3.5 million image-text pairs for knowledge infusion; (2) supervised fine-tuning on 500k high-quality Chain-of-Thought samples to incentivize reasoning; (3) reinforcement learning using Group Relative Policy Optimization and Decoupled Clip and Dynamic sAmpling Policy Optimization strategies for multimodal reasoning quality refinement. To further assess the alignment quality of our dataset, we propose Patho-CLIP, trained on the same figure-caption corpus used for continued pretraining. Comprehensive experimental results demonstrate that both Patho-CLIP and Patho-R1 achieve robust performance across a wide range of pathology-related tasks, including zero-shot classification, cross-modal retrieval, Visual Question Answering, and Multiple Choice Questions. Our project is available at the Patho-R1 repository: https://github.com/Wenchuan-Zhang/Patho-R1.


ChemRxivQuest: A Curated Chemistry Question-Answer Database Extracted from ChemRxiv Preprints

Amiri, Mahmoud, Bocklitz, Thomas

arXiv.org Artificial Intelligence

The rapid expansion of chemistry literature poses significant challenges for researchers seeking to efficiently access domain-specific knowledge. To support advancements in chemistry-focused natural language processing (NLP), we present ChemRxivQuest, a curated dataset of 970 high-quality question-answer (QA) pairs derived from 155 ChemRxiv preprints across 17 subfields of chemistry. Each QA pair is explicitly linked to its source text segment to ensure traceability and contextual accuracy. ChemRxivQuest was constructed using an automated pipeline that combines optical character recognition (OCR), GPT-4o-based QA generation, and a fuzzy matching technique for answer verification. The dataset emphasizes conceptual, mechanistic, applied, and experimental questions, enabling applications in retrieval-based QA systems, search engine development, and fine-tuning of domain-adapted large language models. We analyze the dataset's structure, coverage, and limitations, and outline future directions for expansion and expert validation. ChemRxivQuest provides a foundational resource for chemistry NLP research, education, and tool development.
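The abstract mentions fuzzy matching to verify that generated answers are grounded in the source text. One common way to do this is to slide a window over the source passage and score similarity with a sequence matcher; a minimal sketch under assumed function names and an assumed threshold (the paper's actual technique may differ):

```python
from difflib import SequenceMatcher

def fuzzy_match(answer, source_text, threshold=0.85):
    """Check whether an answer closely matches some word window of the
    source text. Returns (verified, best_similarity_score).
    Illustrative only -- not ChemRxivQuest's actual verification code."""
    answer = answer.lower().strip()
    words = source_text.lower().split()
    n = len(answer.split())  # compare against windows of the same word count
    best = 0.0
    for i in range(max(1, len(words) - n + 1)):
        window = " ".join(words[i:i + n])
        best = max(best, SequenceMatcher(None, answer, window).ratio())
    return best >= threshold, best
```

Windowed matching like this tolerates small OCR or paraphrase differences while still rejecting answers that have no close counterpart in the source segment, which is what "traceability" requires here.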