AITopics | approved

Collaborating Authors

approved

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Analysis and Explainability of LLMs Via Evolutionary Methods

Gallagher, Shannon K., Rallapalli, Swati, Brooks, Tyler, Loughin, Chuck, Sezgin, Michele, Yurko, Ronald

arXiv.org Machine LearningMay-6-2026

Evolutionary methods have long been useful for analysis and explanation in genetics, biology, ecology, and related fields. In this work, we extend these methods to neural networks, specifically large language models (LLMs), to better analyze and explain relationships among models. We show how relating weights to genotypes and output text to phenotypes can improve our understanding of model lineage, important datasets, the roles of different model layers, and visualization of model relationships. We demonstrate this in a controlled experiment, where our estimated evolutionary trees reliably recover the topology of the ground-truth training tree. We further identify the most important weight layers according to weight differences and show through phenotypic experiments that one training dataset appears to contribute more useful information than the others. Finally, we generate an unsupervised evolutionary tree of black-box foundation models. Throughout, we provide visualizations that support a clearer understanding of evolutionary relationships among LLMs.

large language model, machine learning, public release and unlimited distribution, (19 more...)

arXiv.org Machine Learning

2605.0293

Country: North America > United States (1.00)

Genre: Research Report > Experimental Study (0.55)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Time-Series Anomaly Classification for Launch Vehicle Propulsion Systems: Fast Statistical Detectors Enhancing LSTM Accuracy and Data Quality

Engelstad, Sean P., Darr, Sameul R., Taliaferro, Matthew, Goyal, Vinay K.

arXiv.org Machine LearningJan-13-2026

Supporting Go/No-Go decisions prior to launch requires assessing real-time telemetry data against redline limits established during the design qualification phase. Family data from ground testing or previous flights is commonly used to detect initiating failure modes and their timing; however, this approach relies heavily on engineering judgment and is more error-prone for new launch vehicles. To address these limitations, we utilize Long-Term Short-Term Memory (LSTM) networks for supervised classification of time-series anomalies. Although, initial training labels derived from simulated anomaly data may be suboptimal due to variations in anomaly strength, anomaly settling times, and other factors. In this work, we propose a novel statistical detector based on the Mahalanobis distance and forward-backward detection fractions to adjust the supervised training labels. We demonstrate our method on digital twin simulations of a ground-stage propulsion system with 20.8 minutes of operation per trial and O(10^8) training timesteps. The statistical data relabeling improved precision and recall of the LSTM classifier by 7% and 22% respectively.

anomaly, artificial intelligence, machine learning, (20 more...)

arXiv.org Machine Learning

2601.06186

Country: North America > United States > California > Los Angeles County (0.46)

Genre: Research Report (0.82)

Industry: Aerospace & Defense (0.72)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

AI Bill of Materials and Beyond: Systematizing Security Assurance through the AI Risk Scanning (AIRS) Framework

Nathanson, Samuel, Lee, Alexander, Kieffer, Catherine Chen, Junkin, Jared, Ye, Jessica, Saeed, Amir, Lockhart, Melanie, Fink, Russ, Peterson, Elisha, Watkins, Lanier

arXiv.org Artificial IntelligenceNov-18-2025

Assurance for artificial intelligence (AI) systems remains fragmented across software supply-chain security, adversarial machine learning, and governance documentation. Existing transparency mechanisms - including Model Cards, Datasheets, and Software Bills of Materials (SBOMs) - advance provenance reporting but rarely provide verifiable, machine-readable evidence of model security. This paper introduces the AI Risk Scanning (AIRS) Framework, a threat-model-based, evidence-generating framework designed to operationalize AI assurance. The AIRS Framework evolved through three progressive pilot studies - Smurf (AIBOM schema design), OPAL (operational validation), and Pilot C (AIRS) - that reframed AI documentation from descriptive disclosure toward measurable, evidence-bound verification. The framework aligns its assurance fields to the MITRE ATLAS adversarial ML taxonomy and automatically produces structured artifacts capturing model integrity, packaging and serialization safety, structural adapters, and runtime behaviors. Currently, the AIRS Framework is scoped to provide model-level assurances for LLMs, but it could be expanded to include other modalities and cover system-level threats (e.g. application-layer abuses, tool-calling). A proof-of-concept on a quantized GPT-OSS-20B model demonstrates enforcement of safe loader policies, per-shard hash verification, and contamination and backdoor probes executed under controlled runtime conditions. Comparative analysis with SBOM standards of SPDX 3.0 and CycloneDX 1.6 reveals alignment on identity and evaluation metadata, but identifies critical gaps in representing AI-specific assurance fields. The AIRS Framework thus extends SBOM practice to the AI domain by coupling threat modeling with automated, auditable evidence generation, providing a principled foundation for standardized, trustworthy, and machine-verifiable AI risk documentation.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.12668

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Data Fusion of Deep Learned Molecular Embeddings for Property Prediction

Appleton, Robert J, Barnes, Brian C, Strachan, Alejandro

arXiv.org Artificial IntelligenceOct-29-2025

Data - driven approaches such as deep learning can result in predictive models for material properties with exceptional accuracy and efficiency. However, in many applications, data is sparse, severely limiting their accuracy and applicability . To improve predictions, techniques such as transfer learning and multi - task learning have been used. T he performance of multi - task learning models depend s on the strength of the underlying correlations between tasks and the completeness of the dataset . S tandard multi - task models tend to underperform when trained on sparse datasets with weakly correlated properties. To address this gap, we fuse deep - learned embeddings generated by independent pre - trained single - task models, resulting in a multi - task model that inherit s rich, property - specific representations. By re - using (rather than re - training) these embeddings, the resulting fused model outperforms standard multi - task models and can be extended with fewer trainable parameters . We demonstrate this technique on a widely used benchmark dataset of quantum chemistry data for small molecules as well as a newly compiled sparse dataset of experimental data collected from literature and our own quant um chemistry and thermochemical calculations.

artificial intelligence, distribution statement, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1021/acs.jcim.5c01728

2504.07297

Country: North America > United States > Indiana (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Government > Military (0.68)
Government > Regional Government (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SDQM: Synthetic Data Quality Metric for Object Detection Dataset Evaluation

Zenith, Ayush, Zumbrun, Arnold, Raut, Neel, Lin, Jing

arXiv.org Artificial IntelligenceOct-9-2025

The performance of machine learning models depends heavily on training data. The scarcity of large-scale, well-annotated datasets poses significant challenges in creating robust models. To address this, synthetic data generated through simulations and generative models has emerged as a promising solution, enhancing dataset diversity and improving the performance, reliability, and resilience of models. However, evaluating the quality of this generated data requires an effective metric. This paper introduces the Synthetic Dataset Quality Metric (SDQM) to assess data quality for object detection tasks without requiring model training to converge. This metric enables more efficient generation and selection of synthetic datasets, addressing a key challenge in resource-constrained object detection tasks. In our experiments, SDQM demonstrated a strong correlation with the mean Average Precision (mAP) scores of YOLOv11, a leading object detection model, while previous metrics only exhibited moderate or weak correlations. Additionally, it provides actionable insights for improving dataset quality, minimizing the need for costly iterative training. This scalable and efficient metric sets a new standard for evaluating synthetic data.

artificial intelligence, distribution unlimited, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.06596

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report > New Finding (0.88)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.88)

Add feedback

Gamma Mixture Modeling for Cosine Similarity in Small Language Models

Player, Kevin

arXiv.org Artificial IntelligenceOct-8-2025

We study the cosine similarity of sentence transformer embeddings and observe that they are well modeled by gamma mixtures. From a fixed corpus, we measure similarities between all document embeddings and a reference query embedding. Empirically we find that these distributions are often well captured by a gamma distribution shifted and truncated to [ 1, 1], and in many cases, by a gamma mixture. We propose a heuristic model in which a hierarchical clustering of topics naturally leads to a gamma-mixture structure in the similarity scores. Finally, we outline an expectation-maximization algorithm for fitting shifted gamma mixtures, which provides a practical tool for modeling similarity distributions.

machine learning, natural language, public release and unlimited distribution, (11 more...)

arXiv.org Artificial Intelligence

2510.05309

Country: North America > United States (0.68)

Genre: Research Report (0.82)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

Polysemantic Dropout: Conformal OOD Detection for Specialized LLMs

Gupta, Ayush, Kaur, Ramneet, Roy, Anirban, Cobb, Adam D., Chellappa, Rama, Jha, Susmit

arXiv.org Artificial IntelligenceSep-17-2025

We propose a novel inference-time out-of-domain (OOD) detection algorithm for specialized large language models (LLMs). Despite achieving state-of-the-art performance on in-domain tasks through fine-tuning, specialized LLMs remain vulnerable to incorrect or unreliable outputs when presented with OOD inputs, posing risks in critical applications. Our method leverages the Inductive Conformal Anomaly Detection (ICAD) framework, using a new non-conformity measure based on the model's dropout tolerance. Motivated by recent findings on polysemanticity and redundancy in LLMs, we hypothesize that in-domain inputs exhibit higher dropout tolerance than OOD inputs. We aggregate dropout tolerance across multiple layers via a valid ensemble approach, improving detection while maintaining theoretical false alarm bounds from ICAD. Experiments with medical-specialized LLMs show that our approach detects OOD inputs better than baseline methods, with AUROC improvements of $2\%$ to $37\%$ when treating OOD datapoints as positives and in-domain test datapoints as negatives.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2509.04655

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Government > Military (1.00)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

From Firewalls to Frontiers: AI Red-Teaming is a Domain-Specific Evolution of Cyber Red-Teaming

Sinha, Anusha, Grimes, Keltin, Lucassen, James, Feffer, Michael, VanHoudnos, Nathan, Wu, Zhiwei Steven, Heidari, Hoda

arXiv.org Artificial IntelligenceSep-16-2025

A red team simulates adversary attacks to help defenders find effective strategies to defend their systems in a real-world operational setting. As more enterprise systems adopt AI, red-teaming will need to evolve to address the unique vulnerabilities and risks posed by AI systems. We take the position that AI systems can be more effectively red-teamed if AI red-teaming is recognized as a domain-specific evolution of cyber red-teaming. Specifically, we argue that existing Cyber Red Teams who adopt this framing will be able to better evaluate systems with AI components by recognizing that AI poses new risks, has new failure modes to exploit, and often contains unpatchable bugs that re-prioritize disclosure and mitigation strategies. Similarly, adopting a cybersecurity framing will allow existing AI Red Teams to leverage a well-tested structure to emulate realistic adversaries, promote mutual accountability with formal rules of engagement, and provide a pattern to mature the tooling necessary for repeatable, scalable engagements. In these ways, the merging of AI and Cyber Red Teams will create a robust security ecosystem and best position the community to adapt to the rapidly changing threat landscape.

ai system, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.11398

Country: North America > United States (1.00)

Genre: Research Report (0.40)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Add feedback

A Data-Based Architecture for Flight Test without Test Points

Harp, D. Isaiah, Ott, Joshua, Alora, John, Asmar, Dylan

arXiv.org Artificial IntelligenceJun-4-2025

The justification for the "test point" derives from the test pilot's obligation to reproduce faithfully the pre-specified conditions of some model prediction. Pilot deviation from those conditions invalidates the model assumptions. Flight test aids have been proposed to increase accuracy on more challenging test points. However, the very existence of databands and tolerances is the problem more fundamental than inadequate pilot skill. We propose a novel approach, which eliminates test points. We start with a high-fidelity digital model of an air vehicle. Instead of using this model to generate a point prediction, we use a machine learning method to produce a reduced-order model (ROM). The ROM has two important properties. First, it can generate a prediction based on any set of conditions the pilot flies. Second, if the test result at those conditions differ from the prediction, the ROM can be updated using the new data. The outcome of flight test is thus a refined ROM at whatever conditions were flown. This ROM in turn updates and validates the high-fidelity model. We present a single example of this "point-less" architecture, using T-38C flight test data. We first use a generic aircraft model to build a ROM of longitudinal pitching motion as a hypersurface. We then ingest unconstrained flight test data and use Gaussian Process Regression to update and condition the hypersurface. By proposing a second-order equivalent system for the T-38C, this hypersurface then generates parameters necessary to assess MIL-STD-1797B compliance for longitudinal dynamics.

artificial intelligence, data-based architecture, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2506.02315

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > San Luis Obispo County > San Luis Obispo (0.04)

Genre: Research Report (1.00)

Industry:

Transportation > Air (1.00)
Government > Military > Air Force (1.00)
Aerospace & Defense > Aircraft (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding

Padhi, Trilok, Kaur, Ramneet, Cobb, Adam D., Acharya, Manoj, Roy, Anirban, Samplawski, Colin, Matejek, Brian, Berenbeim, Alexander M., Bastian, Nathaniel D., Jha, Susmit

arXiv.org Artificial IntelligenceMay-8-2025

We introduce a novel approach for calibrating uncertainty quantification (UQ) tailored for multi-modal large language models (LLMs). Existing state-of-the-art UQ methods rely on consistency among multiple responses generated by the LLM on an input query under diverse settings. However, these approaches often report higher confidence in scenarios where the LLM is consistently incorrect. This leads to a poorly calibrated confidence with respect to accuracy. To address this, we leverage cross-modal consistency in addition to self-consistency to improve the calibration of the multi-modal models. Specifically, we ground the textual responses to the visual inputs. The confidence from the grounding model is used to calibrate the overall confidence. Given that using a grounding model adds its own uncertainty in the pipeline, we apply temperature scaling - a widely accepted parametric calibration technique - to calibrate the grounding model's confidence in the accuracy of generated responses. We evaluate the proposed approach across multiple multi-modal tasks, such as medical question answering (Slake) and visual question answering (VQAv2), considering multi-modal models such as LLaVA-Med and LLaVA. The experiments demonstrate that the proposed framework achieves significantly improved calibration on both tasks.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.03788

Country: North America > United States (1.00)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry:

Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback