Goto

Collaborating Authors

 textbook



AgriRegion: Region-Aware Retrieval for High-Fidelity Agricultural Advice

Fanuel, Mesafint, Mahmoud, Mahmoud Nabil, Marshal, Crystal Cook, Lakhotia, Vishal, Dari, Biswanath, Roy, Kaushik, Zhang, Shaohu

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated significant potential in democratizing access to information. However, in the domain of agriculture, general-purpose models frequently suffer from contextual hallucination, which provides non-factual advice or answers are scientifically sound in one region but disastrous in another due to variations in soil, climate, and local regulations. We introduce AgriRegion, a Retrieval-Augmented Generation (RAG) framework designed specifically for high-fidelity, region-aware agricultural advisory. Unlike standard RAG approaches that rely solely on semantic similarity, AgriRegion incorporates a geospatial metadata injection layer and a region-prioritized re-ranking mechanism. By restricting the knowledge base to verified local agricultural extension services and enforcing geo-spatial constraints during retrieval, AgriRegion ensures that the advice regarding planting schedules, pest control, and fertilization is locally accurate. We create a novel benchmark dataset, AgriRegion-Eval, which comprises 160 domain-specific questions across 12 agricultural subfields. Experiments demonstrate that AgriRegion reduces hallucinations by 10-20% compared to state-of-the-art LLMs systems and significantly improves trust scores according to a comprehensive evaluation.


Small Language Models Reshape Higher Education: Courses, Textbooks, and Teaching

Zhang, Jian, Shao, Jia

arXiv.org Artificial Intelligence

While large language models (LLMs) have introduced novel paradigms in science and education, their adoption in higher education is constrained by inherent limitations. These include a tendency to produce inaccuracies and high computational requirements, which compromise the strict demands for accurate and reliable knowledge essential in higher education. Small language models (MiniLMs), by contrast, offer distinct advantages in professional education due to their lightweight nature and precise retrieval capabilities. This research takes "Atmospheric Physics" as an example. We established a specialized corpus and image repository by gathering over 550,000 full-text PDFs from over 130 international well-respected journals in Earth and environmental science. From this collection, we extracted over 100 million high-quality sentence-level corpus and more than 3 million high-resolution academic images. Using MiniLMs, these resources were organized into a high-dimensional vector library for precise retrieval and efficient utilization of extensive educational content. Consequently, we systematically redesigned the courses, textbooks, and teaching strategies for "Atmospheric Physics" based on MiniLMs. The course is designed as a "interdisciplinary-frontier" system, breaking down traditional boundaries between atmospheric science, space science, hydrology, and remote sensing. Teaching materials are transformed from static, lagging text formats into a dynamic digital resource library powered by MiniLM. For teaching methods, we have designed a question-based learning pathway. This paradigm promotes a shift from passive knowledge transfer to active cognitive development. Consequently, this MiniLM-driven "Atmospheric Physics" course demonstrates a specific avenue for "AI for education".


SHRAG: AFrameworkfor Combining Human-Inspired Search with RAG

Ryu, Hyunseok, Shin, Wonjune, Park, Hyun

arXiv.org Artificial Intelligence

Retrieval-Augmented Generation (RAG) is gaining recognition as one of the key technological axes for next generation information retrieval, owing to its ability to mitigate the hallucination phenomenon in Large Language Models (LLMs)and effectively incorporate up-to-date information. However, specialized expertise is necessary to construct ahigh-quality retrieval system independently; moreover, RAGdemonstratesrelativelyslowerprocessing speeds compared to conventional pure retrieval systems because it involves both retrieval and generation stages. Accordingly, this study proposes SHRAG, a novel framework designed to facilitate the seamless integration of Information Retrieval and RAG while simultaneously securing precise retrieval performance. SHRAG utilizes a Large Language Model as a Query Strategist to automatically transform unstructured natural language queries into logically structured search queries, subsequently performing Boolean retrieval to emulate the search process of an expert human searcher. Furthermore, it incorporates multilingual query expansion and a multilingual embedding model, enabling it to perform efficient cross-lingual question answering within the multilingual dataset environment of the ScienceON Challenge. Experimental results demonstrate that the proposed method, combining logical retrieval capabilities and generative reasoning, can significantly enhance the accuracy and reliability of RAG systems. Furthermore, SHRAG movesbeyondconventionaldocument-centric retrieval methods, presenting the potential for a new search paradigm capable of providing direct and reliable responses to queries.


HSKBenchmark: Modeling and Benchmarking Chinese Second Language Acquisition in Large Language Models through Curriculum Tuning

Yang, Qihao, Wang, Xuelin, Chen, Jiale, Dong, Xuelian, Hao, Yuxin, Hao, Tianyong

arXiv.org Artificial Intelligence

Language acquisition is vital to revealing the nature of human language intelligence and has recently emerged as a promising perspective for improving the interpretability of large language models (LLMs). However, it is ethically and practically infeasible to conduct experiments that require controlling human learners' language inputs. This poses challenges for the verifiability and scalability of language acquisition modeling, particularly in Chinese second language acquisition (SLA). While LLMs provide a controllable and reproducible alternative, a systematic benchmark to support phase-wise modeling and assessment is still lacking. In this paper, we present HSKBenchmark, the first benchmark for staged modeling and writing assessment of LLMs in Chinese SLA. It covers HSK levels 3 to 6 and includes authentic textbooks with 6.76 million tokens, 16K synthetic instruction samples, 30 test topics, and a linguistically grounded evaluation system. To simulate human learning trajectories, we introduce a curriculum-tuning framework that trains models from beginner to advanced levels. An evaluation system is created to examine level-based grammar coverage, writing errors, lexical and syntactic complexity, and holistic scoring. We also build HSKAgent, fine-tuned on 10K learner compositions. Extensive experimental results demonstrate that HSKBenchmark not only models Chinese SLA effectively, but also serves as a reliable benchmark for dynamic writing assessment in LLMs. Our fine-tuned LLMs have writing performance on par with advanced human learners and exhibit human-like acquisition characteristics. The HSKBenchmark, HSKAgent, and checkpoints serve as foundational tools and resources, with the potential to pave the way for future research on language acquisition modeling and LLMs interpretability. Code and data are publicly available at: https://github.com/CharlesYang030/HSKB.


CardioEmbed: Domain-Specialized Text Embeddings for Clinical Cardiology

Young, Richard J., Matthews, Alice M.

arXiv.org Artificial Intelligence

Biomedical text embeddings have primarily been developed using research literature from PubMed, yet clinical cardiology practice relies heavily on procedural knowledge and specialized terminology found in comprehensive textbooks rather than research abstracts. This research practice gap limits the effectiveness of existing embedding models for clinical applications incardiology. This study trained CardioEmbed, a domain-specialized embedding model based on Qwen3-Embedding-8B, using contrastive learning on a curated corpus of seven comprehensive cardiology textbooks totaling approximately 150,000 sentences after deduplication. The model employs InfoNCE loss with in-batch negatives and achieves 99.60% retrieval accuracy on cardiac-specific semantic retrieval tasks, a +15.94 percentage point improvement over MedTE, the current state-of-the-art medical embedding model. On MTEB medical benchmarks, the model obtained BIOSSES 0.77 Spearman and SciFact 0.61 NDCG@10, indicating competitive performance on related biomedical domains. Domain-specialized training on comprehensive clinical textbooks yields near-perfect cardiology retrieval (99.60% Acc@1), improving over MedTE by +15.94 percentage points.


Accelerating Diffusion LLM Inference via Local Determinism Propagation

Kong, Fanheng, Zhang, Jingyuan, Liu, Yahui, Wu, Zirui, Tian, Yu, W., Victoria, Zhou, Guorui

arXiv.org Artificial Intelligence

Diffusion large language models (dLLMs) represent a significant advancement in text generation, offering parallel token decoding capabilities. However, existing open-source implementations suffer from quality-speed trade-offs that impede their practical deployment. Conservative sampling strategies typically decode only the most confident token per step to ensure quality (i.e., greedy decoding), at the cost of inference efficiency due to repeated redundant refinement iterations--a phenomenon we term delayed decoding. Through systematic analysis of dLLM decoding dynamics, we characterize this delayed decoding behavior and propose a training-free adaptive parallel decoding strategy, named LocalLeap, to address these inefficiencies. LocalLeap is built on two fundamental empirical principles: local determinism propagation centered on high-confidence anchors and progressive spatial consistency decay. By applying these principles, LocalLeap identifies anchors and performs localized relaxed parallel decoding within bounded neighborhoods, achieving substantial inference step reduction through early commitment of already-determined tokens without compromising output quality. Comprehensive evaluation on various benchmarks demonstrates that LocalLeap achieves 6.94$\times$ throughput improvements and reduces decoding steps to just 14.2\% of the original requirement, achieving these gains with negligible performance impact. The source codes are available at: https://github.com/friedrichor/LocalLeap.


OpenStaxQA: A multilingual dataset based on open-source college textbooks

Gupta, Pranav

arXiv.org Artificial Intelligence

We present OpenStaxQA, an evaluation benchmark specific to college-level educational applications based on 43 open-source college textbooks in English, Spanish, and Polish, available under a permissive Creative Commons license. We finetune and evaluate large language models (LLMs) with approximately 7 billion parameters on this dataset using quantized low rank adapters (QLoRa). Additionally we also perform a zero-shot evaluation on the AI2 reasoning challenge dev dataset in order to check if OpenStaxQA can lead to an improved performance on other tasks. We also discuss broader impacts relevant to datasets such as OpenStaxQA.


7fd3b80fb1884e2927df46a7139bb8bf-AuthorFeedback.pdf

Neural Information Processing Systems

We begin by sincerely thanking all reviewers for carefully reading our paper and providing useful feedback. If accepted, we will amend the manuscript accordingly. Generalization properties of heterogeneous boosted ensembles (R2). The weights correspond to how many times each subclass appears in the ensemble. What is the role of Θ (R4)?


Comparing RAG and GraphRAG for Page-Level Retrieval Question Answering on Math Textbook

Chen, Eason, Li, Chuangji, Li, Shizhuo, Xiao, Zimo, Lin, Jionghao, Koedinger, Kenneth R.

arXiv.org Artificial Intelligence

Technology-enhanced learning environments often help students retrieve relevant learning content for questions arising during self-paced study. Large language models (LLMs) have emerged as novel aids for information retrieval during learning. While LLMs are effective for general-purpose question-answering, they typically lack alignment with the domain knowledge of specific course materials such as textbooks and slides. We investigate Retrieval-Augmented Generation (RAG) and GraphRAG, a knowledge graph-enhanced RAG approach, for page-level question answering in an undergraduate mathematics textbook. While RAG has been effective for retrieving discrete, contextually relevant passages, GraphRAG may excel in modeling interconnected concepts and hierarchical knowledge structures. We curate a dataset of 477 question-answer pairs, each tied to a distinct textbook page. We then compare the standard embedding-based RAG methods to GraphRAG for evaluating both retrieval accuracy-whether the correct page is retrieved-and generated answer quality via F1 scores. Our findings show that embedding-based RAG achieves higher retrieval accuracy and better F1 scores compared to GraphRAG, which tends to retrieve excessive and sometimes irrelevant content due to its entity-based structure. We also explored re-ranking the retrieved pages with LLM and observed mixed results, including performance drop and hallucinations when dealing with larger context windows. Overall, this study highlights both the promises and challenges of page-level retrieval systems in educational contexts, emphasizing the need for more refined retrieval methods to build reliable AI tutoring solutions in providing reference page numbers.