Goto

Collaborating Authors

 Overview


How does Misinformation Affect Large Language Model Behaviors and Preferences?

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have shown remarkable capabilities in knowledge-intensive tasks, while they remain vulnerable when encountering misinformation. Existing studies have explored the role of LLMs in combating misinformation, but there is still a lack of fine-grained analysis on the specific aspects and extent to which LLMs are influenced by misinformation. To bridge this gap, we present MisBench, the current largest and most comprehensive benchmark for evaluating LLMs' behavior and knowledge preference toward misinformation. MisBench consists of 10,346,712 pieces of misinformation, which uniquely considers both knowledge-based conflicts and stylistic variations in misinformation. Empirical results reveal that while LLMs demonstrate comparable abilities in discerning misinformation, they still remain susceptible to knowledge conflicts and stylistic variations. Based on these findings, we further propose a novel approach called Reconstruct to Discriminate (RtD) to strengthen LLMs' ability to detect misinformation. Our study provides valuable insights into LLMs' interactions with misinformation, and we believe MisBench can serve as an effective benchmark for evaluating LLM-based detectors and enhancing their reliability in real-world applications. Codes and data are available at https://github.com/GKNL/MisBench.


StreamLink: Large-Language-Model Driven Distributed Data Engineering System

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have shown remarkable proficiency in natural language understanding (NLU), opening doors for innovative applications. We introduce StreamLink - an LLM-driven distributed data system designed to improve the efficiency and accessibility of data engineering tasks. We build StreamLink on top of distributed frameworks such as Apache Spark and Hadoop to handle large data at scale. One of the important design philosophies of StreamLink is to respect user data privacy by utilizing local fine-tuned LLMs instead of a public AI service like ChatGPT. With help from domain-adapted LLMs, we can improve our system's understanding of natural language queries from users in various scenarios and simplify the procedure of generating database queries like the Structured Query Language (SQL) for information processing. We also incorporate LLM-based syntax and security checkers to guarantee the reliability and safety of each generated query. StreamLink illustrates the potential of merging generative LLMs with distributed data processing for comprehensive and user-centric data engineering. With this architecture, we allow users to interact with complex database systems at different scales in a user-friendly and security-ensured manner, where the SQL generation reaches over 10\% of execution accuracy compared to baseline methods, and allow users to find the most concerned item from hundreds of millions of items within a few seconds using natural language.


A Novel Convolutional Neural Network-Based Framework for Complex Multiclass Brassica Seed Classification

arXiv.org Artificial Intelligence

Agricultural research has accelerated in recent years, yet farmers often lack the time and resources for on-farm research due to the demands of crop production and farm operations. Seed classification offers valuable insights into quality control, production efficiency, and impurity detection. Early identification of seed types is critical to reducing the cost and risk associated with field emergence, which can lead to yield losses or disruptions in downstream processes like harvesting. Seed sampling supports growers in monitoring and managing seed quality, improving precision in determining seed purity levels, guiding management adjustments, and enhancing yield estimations. This study proposes a novel convolutional neural network (CNN)-based framework for the efficient classification of ten common Brassica seed types. The approach addresses the inherent challenge of texture similarity in seed images using a custom-designed CNN architecture. The model's performance was evaluated against several pre-trained state-of-the-art architectures, with adjustments to layer configurations for optimized classification. Experimental results using our collected Brassica seed dataset demonstrate that the proposed model achieved a high accuracy rate of 93 percent.


Collaborative Agentic AI Needs Interoperability Across Ecosystems

arXiv.org Artificial Intelligence

Collaborative agentic AI is projected to transform entire industries by enabling AI-powered agents to autonomously perceive, plan, and act within digital environments. Yet, current solutions in this field are all built in isolation, and we are rapidly heading toward a landscape of fragmented, incompatible ecosystems. In this position paper, we argue that interoperability, achieved by the adoption of minimal standards, is essential to ensure open, secure, web-scale, and widely-adopted agentic ecosystems. To this end, we devise a minimal architectural foundation for collaborative agentic AI, named Web of Agents, which is composed of four components: agent-to-agent messaging, interaction interoperability, state management, and agent discovery. Web of Agents adopts existing standards and reuses existing infrastructure where possible. With Web of Agents, we take the first but critical step toward interoperable agentic systems and offer a pragmatic path forward before ecosystem fragmentation becomes the norm.


OpenReview Should be Protected and Leveraged as a Community Asset for Research in the Era of Large Language Models

arXiv.org Artificial Intelligence

In the era of large language models (LLMs), high-quality, domain-rich, and continuously evolving datasets capturing expert-level knowledge, core human values, and reasoning are increasingly valuable. This position paper argues that OpenReview -- the continually evolving repository of research papers, peer reviews, author rebuttals, meta-reviews, and decision outcomes -- should be leveraged more broadly as a core community asset for advancing research in the era of LLMs. We highlight three promising areas in which OpenReview can uniquely contribute: enhancing the quality, scalability, and accountability of peer review processes; enabling meaningful, open-ended benchmarks rooted in genuine expert deliberation; and supporting alignment research through real-world interactions reflecting expert assessment, intentions, and scientific values. To better realize these opportunities, we suggest the community collaboratively explore standardized benchmarks and usage guidelines around OpenReview, inviting broader dialogue on responsible data use, ethical considerations, and collective stewardship.


Walk&Retrieve: Simple Yet Effective Zero-shot Retrieval-Augmented Generation via Knowledge Graph Walks

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have showcased impressive reasoning abilities, but often suffer from hallucinations or outdated knowledge. Knowledge Graph (KG)-based Retrieval-Augmented Generation (RAG) remedies these shortcomings by grounding LLM responses in structured external information from a knowledge base. However, many KG-based RAG approaches struggle with (i) aligning KG and textual representations, (ii) balancing retrieval accuracy and efficiency, and (iii) adapting to dynamically updated KGs. In this work, we introduce Walk&Retrieve, a simple yet effective KG-based framework that leverages walk-based graph traversal and knowledge verbalization for corpus generation for zero-shot RAG. Built around efficient KG walks, our method does not require fine-tuning on domain-specific data, enabling seamless adaptation to KG updates, reducing computational overhead, and allowing integration with any off-the-shelf backbone LLM. Despite its simplicity, Walk&Retrieve performs competitively, often outperforming existing RAG systems in response accuracy and hallucination reduction. Moreover, it demonstrates lower query latency and robust scalability to large KGs, highlighting the potential of lightweight retrieval strategies as strong baselines for future RAG research.


Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents

arXiv.org Artificial Intelligence

Large-language models (LLMs) and chatbot agents are known to provide wrong outputs at times, and it was recently found that this can never be fully prevented. Hence, uncertainty quantification plays a crucial role, aiming to quantify the level of ambiguity in either one overall number or two numbers for aleatoric and epistemic uncertainty. This position paper argues that this traditional dichotomy of uncertainties is too limited for the open and interactive setup that LLM agents operate in when communicating with a user, and that we need to research avenues that enrich uncertainties in this novel scenario. We review the literature and find that popular definitions of aleatoric and epistemic uncertainties directly contradict each other and lose their meaning in interactive LLM agent settings. Hence, we propose three novel research directions that focus on uncertainties in such human-computer interactions: Underspecification uncertainties, for when users do not provide all information or define the exact task at the first go, interactive learning, to ask follow-up questions and reduce the uncertainty about the current context, and output uncertainties, to utilize the rich language and speech space to express uncertainties as more than mere numbers. We expect that these new ways of dealing with and communicating uncertainties will lead to LLM agent interactions that are more transparent, trustworthy, and intuitive.


A Human-Centric Approach to Explainable AI for Personalized Education

arXiv.org Artificial Intelligence

Deep neural networks form the backbone of artificial intelligence research, with potential to transform the human experience in areas ranging from autonomous driving to personal assistants, healthcare to education. However, their integration into the daily routines of real-world classrooms remains limited. It is not yet common for a teacher to assign students individualized homework targeting their specific weaknesses, provide students with instant feedback, or simulate student responses to a new exam question. While these models excel in predictive performance, this lack of adoption can be attributed to a significant weakness: the lack of explainability of model decisions, leading to a lack of trust from students, parents, and teachers. This thesis aims to bring human needs to the forefront of eXplainable AI (XAI) research, grounded in the concrete use case of personalized learning and teaching. We frame the contributions along two verticals: technical advances in XAI and their aligned human studies. We investigate explainability in AI for education, revealing systematic disagreements between post-hoc explainers and identifying a need for inherently interpretable model architectures. We propose four novel technical contributions in interpretability with a multimodal modular architecture (MultiModN), an interpretable mixture-of-experts model (InterpretCC), adversarial training for explainer stability, and a theory-driven LLM-XAI framework to present explanations to students (iLLuMinaTE), which we evaluate in diverse settings with professors, teachers, learning scientists, and university students. By combining empirical evaluations of existing explainers with novel architectural designs and human studies, our work lays a foundation for human-centric AI systems that balance state-of-the-art performance with built-in transparency and trust.


CPINN-ABPI: Physics-Informed Neural Networks for Accurate Power Estimation in MPSoCs

arXiv.org Artificial Intelligence

Efficient thermal and power management in modern multiprocessor systems-on-chip (MPSoCs) demands accurate power consumption estimation. One of the state-of-the-art approaches, Alternative Blind Power Identification (ABPI), theoretically eliminates the dependence on steady-state temperatures, addressing a major shortcoming of previous approaches. However, ABPI performance has remained unverified in actual hardware implementations. In this study, we conduct the first empirical validation of ABPI on commercial hardware using the NVIDIA Jetson Xavier AGX platform. Our findings reveal that, while ABPI provides computational efficiency and independence from steady-state temperature, it exhibits considerable accuracy deficiencies in real-world scenarios. To overcome these limitations, we introduce a novel approach that integrates Custom Physics-Informed Neural Networks (CPINNs) with the underlying thermal model of ABPI. Our approach employs a specialized loss function that harmonizes physical principles with data-driven learning, complemented by multi-objective genetic algorithm optimization to balance estimation accuracy and computational cost. In experimental validation, CPINN-ABPI achieves a reduction of 84.7\% CPU and 73.9\% GPU in the mean absolute error (MAE) relative to ABPI, with the weighted mean absolute percentage error (WMAPE) improving from 47\%--81\% to $\sim$12\%. The method maintains real-time performance with 195.3~$μ$s of inference time, with similar 85\%--99\% accuracy gains across heterogeneous SoCs.


Position: All Current Generative Fidelity and Diversity Metrics are Flawed

arXiv.org Artificial Intelligence

Any method's development and practical application is limited by our ability to measure its reliability. The popularity of generative modeling emphasizes the importance of good synthetic data metrics. Unfortunately, previous works have found many failure cases in current metrics, for example lack of outlier robustness and unclear lower and upper bounds. We propose a list of desiderata for synthetic data metrics, and a suite of sanity checks: carefully chosen simple experiments that aim to detect specific and known generative modeling failure modes. Based on these desiderata and the results of our checks, we arrive at our position: all current generative fidelity and diversity metrics are flawed. This significantly hinders practical use of synthetic data. Our aim is to convince the research community to spend more effort in developing metrics, instead of models. Additionally, through analyzing how current metrics fail, we provide practitioners with guidelines on how these metrics should (not) be used.