AITopics

2507.18866

Country:

Asia (0.68)
North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Nephrology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Abdullah, Abdulhady Abas, Gandomi, Amir H., Rashid, Tarik A, Mirjalili, Seyedali, Abualigah, Laith, Živković, Milena, Veisi, Hadi

The Role of Orthographic Consistency in Multilingual Embedding Models for Text Classification in Arabic-Script Languages

arXiv.org Artificial IntelligenceJul-28-2025

In natural language processing, multilingual models like mBERT and XLM-RoBERTa promise broad coverage but often struggle with languages that share a script yet differ in orthographic norms and cultural context. This issue is especially notable in Arabic-script languages such as Kurdish Sorani, Arabic, Persian, and Urdu. We introduce the Arabic Script RoBERTa (AS-RoBERTa) family: four RoBERTa-based models, each pre-trained on a large corpus tailored to its specific language. By focusing pre-training on language-specific script features and statistics, our models capture patterns overlooked by general-purpose models. When fine-tuned on classification tasks, AS-RoBERTa variants outperform mBERT and XLM-RoBERTa by 2 to 5 percentage points. An ablation study confirms that script-focused pre-training is central to these gains. Error analysis using confusion matrices shows how shared script traits and domain-specific content affect performance. Our results highlight the value of script-aware specialization for languages using the Arabic script and support further work on pre-training strategies rooted in script and language specificity.

arabic, machine learning, natural language, (21 more...)

2507.18762

Country: Asia > Middle East (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceJul-28-2025

Agentic Program Repair from Test Failures at Scale: A Neuro-symbolic approach with static analysis and test execution feedback

Maddila, Chandra, Tait, Adam, Chang, Claire, Cheng, Daniel, Ahmad, Nauman, Murali, Vijayaraghavan, Roch, Marshall, Avondet, Arnaud, Meltzer, Aaron, Montalvao, Victor, Hopko, Michael, Waterson, Chris, Thakkar, Parth, Fernandez, Renuka, Kristensen, Kristian, Barzily, Sivan, Chen, Sherry, Abreu, Rui, Nagappan, Nachiappan, Shodjai, Payam, Murphy, Killian, Everingham, James, Ramani, Aparna, Rigby, Peter C.

Aim: With the advent of LLMs, sophisticated agentic program repair has become viable at large organizations with large codebases. In this work, we develop an Engineering Agent that fixes the source code based on test failures at scale across diverse software offerings internally. Method: Using Llama as the base, we employ the ReAct harness to develop an agent. We start with a test failure that was triaged by a rule-based test failure bot. We then set up an agentic harness and allow the agent to reason and run a set of 15 actions from reading a file to generating a patch. We provide feedback to the agent through static analysis and test failures so it can refine its solution. We leverage an LLM-as-a-Judge to ensure that the patch conforms to the standards followed by a human review to land fixes. Benchmark Findings: We curated offline benchmarks for our patch generator, the Engineering Agent loop, and the LLM-as-a-Judge. In offline evaluations we found that a specialized 70B model is highly competitive with the much larger but vanilla Llama-405B. In an ablation study, we found that the ReAct harness (neural model) benefited from the symbolic information from static analysis tools and test execution traces. A model that strikes a balance between the solve rate and error rate vs the cost and latency has a benchmark solve rate of 42.3% using an average 11.8 feedback iterations. Production Findings: In a three month period, 80% of the generated fixes were reviewed, of which 31.5% were landed (25.5% of the total number of generated fixes). Feedback from Engineers: We used open coding to extract qualitative themes from engineers' feedback. We saw positive feedback in the form of quick approvals, gratitude, and surprise. We also found mixed feedback when the Engineering Agent's solution was partially correct and it served as a good starting point.

large language model, machine learning, natural language, (17 more...)

2507.18755

Country: North America > United States > California (0.28)

Genre:

Research Report (0.82)
Workflow (0.68)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Ramineni, Varsha, Rahmani, Hossein A., Yilmaz, Emine, Barber, David

Beyond Internal Data: Constructing Complete Datasets for Fairness Testing

arXiv.org Machine LearningJul-25-2025

As AI becomes prevalent in high-risk domains and decision-making, it is essential to test for potential harms and biases. This urgency is reflected by the global emergence of AI regulations that emphasise fairness and adequate testing, with some mandating independent bias audits. However, procuring the necessary data for fairness testing remains a significant challenge. Particularly in industry settings, legal and privacy concerns restrict the collection of demographic data required to assess group disparities, and auditors face practical and cultural challenges in gaining access to data. Further, internal historical datasets are often insufficiently representative to identify real-world biases. This work focuses on evaluating classifier fairness when complete datasets including demographics are inaccessible. We propose leveraging separate overlapping datasets to construct complete synthetic data that includes demographic information and accurately reflects the underlying relationships between protected attributes and model features. We validate the fidelity of the synthetic data by comparing it to real data, and empirically demonstrate that fairness metrics derived from testing on such synthetic data are consistent with those obtained from real data. This work, therefore, offers a path to overcome real-world data scarcity for fairness testing, enabling independent, model-agnostic evaluation of fairness, and serving as a viable substitute where real data is limited.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

2507.18561

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Banking & Finance (1.00)
Government > Regional Government (0.67)
Information Technology > Security & Privacy (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Skorski, Maciej, Landowska, Alina

The Moral Gap of Large Language Models

MFT has found numerous aplications, including analysis of political ideology (Graham et al., 2009), environmental attitudes (Fein-berg and Willer, 2013), vaccine hesitancy (Amin et al., 2017), social "Everyone deserves equal access to healthcare regardless of income" Fairness "Respect your elders and follow traditional "Stand with our troops - they sacrifice everything for our freedom" Loyalty "Marriage is sacred and should be protected The advent of deep learning and particularly transformer architectures marked a significant advancement in moral content analysis. Hoover et al. (2020) first applied deep learning models to moral The recent proposal of applying LLMs to moral content categorization (Bulla et al., 2025) showed promise but suffered from methodological limitations.

large language model, machine learning, natural language, (20 more...)

doi: 10.13140/RG.2.2.26221.70880

2507.18523

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Vaccines (0.54)
Health & Medicine > Therapeutic Area > Immunology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Icoz, Busra, Biricik, Goksel

Automated Code Review Using Large Language Models with Symbolic Reasoning

Code review is one of the key processes in the software development lifecycle and is essential to maintain code quality. However, manual code review is subjective and time consuming. Given its rule-based nature, code review is well suited for automation. In recent years, significant efforts have been made to automate this process with the help of artificial intelligence. Recent developments in Large Language Models (LLMs) have also emerged as a promising tool in this area, but these models often lack the logical reasoning capabilities needed to fully understand and evaluate code. To overcome this limitation, this study proposes a hybrid approach that integrates symbolic reasoning techniques with LLMs to automate the code review process. We tested our approach using the CodexGlue dataset, comparing several models, including CodeT5, CodeBERT, and GraphCodeBERT, to assess the effectiveness of combining symbolic reasoning and prompting techniques with LLMs. Our results show that this approach improves the accuracy and efficiency of automated code review.

large language model, machine learning, natural language, (16 more...)

2507.18476

Country: Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Khademi, Sedigh, Palmer, Christopher, Javed, Muhammad, Clothier, Hazel, Buttery, Jim, Dimaguila, Gerardo Luis, Black, Jim

Actively evaluating and learning the distinctions that matter: Vaccine safety signal detection from emergency triage notes

The rapid development of COVID-19 vaccines has showcased the global community's ability to combat infectious diseases. However, the need for post-licensure surveillance systems has grown due to the limited window for safety data collection in clinical trials and early widespread implementation. This study aims to employ Natural Language Processing (NLP) techniques and Active Learning (AL) to rapidly develop a classifier that detects potential vaccine safety issues from emergency department (ED) notes. ED triage notes, containing expert, succinct vital patient information at the point of entry to health systems, can significantly contribute to timely vaccine safety signal surveillance. While keyword-based classification can be effective, it may yield false positives and demand extensive keyword modifications. This is exacerbated by the infrequency of vaccination-related ED presentations and their similarity to other reasons for ED visits. NLP offers a more accurate and efficient alternative, albeit requiring annotated data, which is often scarce in the medical field. Active learning optimizes the annotation process and the quality of annotated data, which can result in faster model implementation and improved model performance. This work combines active learning, data augmentation, and active learning and evaluation techniques to create a classifier that is used to enhance vaccine safety surveillance from ED triage notes.

machine learning, natural language, training data, (16 more...)

2507.18123

Country: Oceania > Australia (0.94)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.48)

Industry:

Health & Medicine > Therapeutic Area > Vaccines (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Kleinmann, Marcel, Agnihotri, Shashank, Keuper, Margret

Faithful, Interpretable Chest X-ray Diagnosis with Anti-Aliased B-cos Networks

Faithfulness and interpretability are essential for deploying deep neural networks (DNNs) in safety-critical domains such as medical imaging. B-cos networks offer a promising solution by replacing standard linear layers with a weight-input alignment mechanism, producing inherently interpretable, class-specific explanations without post-hoc methods. While maintaining diagnostic performance competitive with state-of-the-art DNNs, standard B-cos models suffer from severe aliasing artifacts in their explanation maps, making them unsuitable for clinical use where clarity is essential. In this work, we address these limitations by introducing anti-aliasing strategies using FLCPooling (FLC) and BlurPool (BP) to significantly improve explanation quality. Our experiments on chest X-ray datasets demonstrate that the modified $\text{B-cos}_\text{FLC}$ and $\text{B-cos}_\text{BP}$ preserve strong predictive performance while providing faithful and artifact-free explanations suitable for clinical application in multi-class and multi-label settings. Code available at: GitHub repository (url: https://github.com/mkleinma/B-cos-medical-paper).

explanation, machine learning, natural language, (20 more...)

2507.16761

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

LoRA-Leak: Membership Inference Attacks Against LoRA Fine-tuned Language Models

Ran, Delong, He, Xinlei, Cong, Tianshuo, Wang, Anyu, Li, Qi, Wang, Xiaoyun

Language Models (LMs) typically adhere to a "pre-training and fine-tuning" paradigm, where a universal pre-trained model can be fine-tuned to cater to various specialized domains. Low-Rank Adaptation (LoRA) has gained the most widespread use in LM fine-tuning due to its lightweight computational cost and remarkable performance. Because the proportion of parameters tuned by LoRA is relatively small, there might be a misleading impression that the LoRA fine-tuning data is invulnerable to Membership Inference Attacks (MIAs). However, we identify that utilizing the pre-trained model can induce more information leakage, which is neglected by existing MIAs. Therefore, we introduce LoRA-Leak, a holistic evaluation framework for MIAs against the fine-tuning datasets of LMs. LoRA-Leak incorporates fifteen membership inference attacks, including ten existing MIAs, and five improved MIAs that leverage the pre-trained model as a reference. In experiments, we apply LoRA-Leak to three advanced LMs across three popular natural language processing tasks, demonstrating that LoRA-based fine-tuned LMs are still vulnerable to MIAs (e.g., 0.775 AUC under conservative fine-tuning settings). We also applied LoRA-Leak to different fine-tuning settings to understand the resulting privacy risks. We further explore four defenses and find that only dropout and excluding specific LM layers during fine-tuning effectively mitigate MIA risks while maintaining utility. We highlight that under the "pre-training and fine-tuning" paradigm, the existence of the pre-trained model makes MIA a more severe risk for LoRA-based LMs. We hope that our findings can provide guidance on data privacy protection for specialized LM providers.

large language model, machine learning, pre-trained model, (20 more...)

2507.18302

Country: Asia (1.00)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Information Security Based on LLM Approaches: A Review

Gong, Chang, Li, Zhongwen, Li, Xiaoqi

Information security is facing increasingly severe challenges, and traditional protection means are difficult to cope with complex and changing threats. In recent years, as an emerging intelligent technology, large language models (LLMs) have shown a broad application prospect in the field of information security. In this paper, we focus on the key role of LLM in information security, systematically review its application progress in malicious behavior prediction, network threat analysis, system vulnerability detection, malicious code identification, and cryptographic algorithm optimization, and explore its potential in enhancing security protection performance. Based on neural networks and Transformer architecture, this paper analyzes the technical basis of large language models and their advantages in natural language processing tasks. It is shown that the introduction of large language modeling helps to improve the detection accuracy and reduce the false alarm rate of security systems. Finally, this paper summarizes the current application results and points out that it still faces challenges in model transparency, interpretability, and scene adaptability, among other issues. It is necessary to explore further the optimization of the model structure and the improvement of the generalization ability to realize a more intelligent and accurate information security protection system.

large language model, machine learning, natural language, (16 more...)

2507.18215

Country: Asia > China (0.15)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)