AITopics

2507.07339

Country: North America > United States > South Carolina (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Surgery > Transplant Surgery (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Bradford, Mariah, Krishnaswamy, Nikhil, Blanchard, Nathaniel

The Impact of Background Speech on Interruption Detection in Collaborative Groups

arXiv.org Artificial IntelligenceJul-11-2025

Interruption plays a crucial role in collaborative learning, shaping group interactions and influencing knowledge construction. AI-driven support can assist teachers in monitoring these interactions. However, most previous work on interruption detection and interpretation has been conducted in single-conversation environments with relatively clean audio. AI agents deployed in classrooms for collaborative learning within small groups will need to contend with multiple concurrent conversations -- in this context, overlapping speech will be ubiquitous, and interruptions will need to be identified in other ways. In this work, we analyze interruption detection in single-conversation and multi-group dialogue settings. We then create a state-of-the-art method for interruption identification that is robust to overlapping speech, and thus could be deployed in classrooms. Further, our work highlights meaningful linguistic and prosodic information about how interruptions manifest in collaborative group interactions. Our investigation also paves the way for future works to account for the influence of overlapping speech from multiple groups when tracking group dialog.

artificial intelligence, interruption, machine learning, (16 more...)

2507.0728

Country: North America > United States > Colorado (0.14)

Genre: Research Report (1.00)

Industry:

Education (0.68)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.47)

Elbasheer, Mohamed, Akinfaderin, Adewale

User-Based Sequential Modeling with Transformer Encoders for Insider Threat Detection

arXiv.org Artificial IntelligenceJul-11-2025

Insider threat detection presents unique challenges due to the authorized status of malicious actors and the subtlety of anomalous behaviors. Existing machine learning methods often treat user activity as isolated events, thereby failing to leverage sequential dependencies in user behavior. In this study, we propose a User-Based Sequencing (UBS) methodology, transforming the CERT insider threat dataset into structured temporal sequences suitable for deep sequential modeling. We deploy a Transformer Encoder architecture to model benign user activity and employ its reconstruction errors as anomaly scores. These scores are subsequently evaluated using three unsupervised outlier detection algorithms: One-Class SVM (OCSVM), Local Outlier Factor (LOF), and Isolation Forest (iForest). Across four rigorously designed test sets, including combinations of multiple CERT dataset releases, our UBS-Transformer pipeline consistently achieves state-of-the-art performance - notably 96.61% accuracy, 99.43% recall, 96.38% F1-score, 95.00% AUROC, and exceptionally low false negative (0.0057) and false positive (0.0571) rates. Comparative analyses demonstrate that our approach substantially outperforms tabular and conventional autoencoder baselines, underscoring the efficacy of sequential user modeling and advanced anomaly detection in the insider threat domain.

artificial intelligence, data mining, machine learning, (17 more...)

2506.23446

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Wild refitting for black box prediction

Wainwright, Martin J.

We describe and analyze a computionally efficient refitting procedure for computing high-probability upper bounds on the instance-wise mean-squared prediction error of penalized nonparametric estimates based on least-squares minimization. Requiring only a single dataset and black box access to the prediction method, it consists of three steps: computing suitable residuals, symmetrizing and scaling them with a pre-factor $ρ$, and using them to define and solve a modified prediction problem recentered at the current estimate. We refer to it as wild refitting, since it uses Rademacher residual symmetrization as in a wild bootstrap variant. Under relatively mild conditions allowing for noise heterogeneity, we establish a high probability guarantee on its performance, showing that the wild refit with a suitably chosen wild noise scale $ρ$ gives an upper bound on prediction error. This theoretical analysis provides guidance into the design of such procedures, including how the residuals should be formed, the amount of noise rescaling in the wild sub-problem needed for upper bounds, and the local stability properties of the block-box procedure. We illustrate the applicability of this procedure to various problems, including non-rigid structure-from-motion recovery with structured matrix penalties; plug-and-play image restoration with deep neural network priors; and randomized sketching with kernel methods.

artificial intelligence, machine learning, modeling & simulation, (20 more...)

2506.2146

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Transportation > Air (0.61)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

A Collectivist, Economic Perspective on AI

Jordan, Michael I.

Information technology is in the midst of a revolution in which omnipresent data collection and machine learning are impacting the human world as never before. The word "intelligence" is being used as a North Star for the development of this technology, with human cognition viewed as a baseline. This view neglects the fact that humans are social animals, and that much of our intelligence is social and cultural in origin. A related issue is that the current view treats the social consequences of technology as an afterthought. The path forward is not merely more data and compute, and not merely more attention paid to cognitive or symbolic representations, but a thorough blending of economic and social concepts with computational and inferential concepts, in the service of system-level designs in which social welfare is a first-class citizen, and with the aspiration that a new human-centric engineering field will emerge.

artificial intelligence, machine learning, platform, (17 more...)

2507.06268

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.84)

Industry:

Health & Medicine (0.94)
Government (0.94)
Law (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Ramirez-Gonzalez, Jose-Hermenegildo, Sun, Ying

Neural Networks for Tamed Milstein Approximation of SDEs with Additive Symmetric Jump Noise Driven by a Poisson Random Measure

This work aims to estimate the drift and diffusion functions in stochastic differential equations (SDEs) driven by a particular class of Lévy processes with finite jump intensity, using neural networks. We propose a framework that integrates the Tamed-Milstein scheme with neural networks employed as non-parametric function approximators. Estimation is carried out in a non-parametric fashion for the drift function $f: \mathbb{Z} \to \mathbb{R}$, the diffusion coefficient $g: \mathbb{Z} \to \mathbb{R}$. The model of interest is given by \[ dX(t) = ξ+ f(X(t))\, dt + g(X(t))\, dW_t + γ\int_{\mathbb{Z}} z\, N(dt,dz), \] where $W_t$ is a standard Brownian motion, and $N(dt,dz)$ is a Poisson random measure on $(\mathbb{R}_{+} \times \mathbb{Z}$, $\mathcal{B} (\mathbb{R}_{+}) \otimes \mathcal{Z}$, $λ( Λ\otimes v))$, with $λ, γ> 0$, $Λ$ being the Lebesgue measure on $\mathbb{R}_{+}$, and $v$ a finite measure on the measurable space $(\mathbb{Z}, \mathcal{Z})$. Neural networks are used as non-parametric function approximators, enabling the modeling of complex nonlinear dynamics without assuming restrictive functional forms. The proposed methodology constitutes a flexible alternative for inference in systems with state-dependent noise and discontinuities driven by Lévy processes.

artificial intelligence, equation, machine learning, (18 more...)

2507.04417

Country: North America > United States > Illinois (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Kulynych, Bogdan, Gomez, Juan Felipe, Kaissis, Georgios, Hayes, Jamie, Balle, Borja, Calmon, Flavio du Pin, Raisaro, Jean Louis

Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy

Differentially private (DP) mechanisms are difficult to interpret and calibrate because existing methods for mapping standard privacy parameters to concrete privacy risks -- re-identification, attribute inference, and data reconstruction -- are both overly pessimistic and inconsistent. In this work, we use the hypothesis-testing interpretation of DP ($f$-DP), and determine that bounds on attack success can take the same unified form across re-identification, attribute inference, and data reconstruction risks. Our unified bounds are (1) consistent across a multitude of attack settings, and (2) tunable, enabling practitioners to evaluate risk with respect to arbitrary (including worst-case) levels of baseline risk. Empirically, our results are tighter than prior methods using $\varepsilon$-DP, Rényi DP, and concentrated DP. As a result, calibrating noise using our bounds can reduce the required noise by 20% at the same risk level, which yields, e.g., more than 15pp accuracy increase in a text classification task. Overall, this unifying perspective provides a principled framework for interpreting and calibrating the degree of protection in DP against specific levels of re-identification, attribute inference, or data reconstruction risk.

artificial intelligence, machine learning, natural language, (17 more...)

2507.06969

Country:

North America > United States (1.00)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Hasan, Md Nahid, Murshed, Md Monzur, Hasan, Md Mahadi, Chowdhury, Faysal A.

A Machine Learning Framework for Breast Cancer Treatment Classification Using a Novel Dataset

arXiv.org Artificial IntelligenceJul-10-2025

Breast cancer (BC) remains a significant global health challenge, with personalized treatment selection complicated by the disease's molecular and clinical heterogeneity. BC treatment decisions rely on various patient-specific clinical factors, and machine learning (ML) offers a powerful approach to predicting treatment outcomes. This study utilizes The Cancer Genome Atlas (TCGA) breast cancer clinical dataset to develop ML models for predicting the likelihood of undergoing chemotherapy or hormonal therapy. The models are trained using five-fold cross-validation and evaluated through performance metrics, including accuracy, precision, recall, specificity, sensitivity, F1-score, and area under the receiver operating characteristic curve (AUROC). Model uncertainty is assessed using bootstrap techniques, while SHAP values enhance interpretability by identifying key predictors. Among the tested models, the Gradient Boosting Machine (GBM) achieves the highest stable performance (accuracy = 0.7718, AUROC = 0.8252), followed by Extreme Gradient Boosting (XGBoost) (accuracy = 0.7557, AUROC = 0.8044) and Adaptive Boosting (AdaBoost) (accuracy = 0.7552, AUROC = 0.8016). These findings underscore the potential of ML in supporting personalized breast cancer treatment decisions through data-driven insights.

artificial intelligence, cancer, machine learning, (17 more...)

2507.06243

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

arXiv.org Artificial IntelligenceJul-10-2025

ADPv2: A Hierarchical Histological Tissue Type-Annotated Dataset for Potential Biomarker Discovery of Colorectal Disease

Yang, Zhiyuan, Li, Kai, Ramandi, Sophia Ghamoshi, Brassard, Patricia, Khellaf, Hakim, Trinh, Vincent Quoc-Huy, Zhang, Jennifer, Chen, Lina, Rowsell, Corwyn, Varma, Sonal, Plataniotis, Kostas, Hosseini, Mahdi S.

Computational pathology (CoPath) leverages histopathology images to enhance diagnostic precision and reproducibility in clinical pathology. However, publicly available datasets for CoPath that are annotated with extensive histological tissue type (HTT) taxonomies at a granular level remain scarce due to the significant expertise and high annotation costs required. Existing datasets, such as the Atlas of Digital Pathology (ADP), address this by offering diverse HTT annotations generalized to multiple organs, but limit the capability for in-depth studies on specific organ diseases. Building upon this foundation, we introduce ADPv2, a novel dataset focused on gastrointestinal histopathology. Our dataset comprises 20,004 image patches derived from healthy colon biopsy slides, annotated according to a hierarchical taxonomy of 32 distinct HTTs of 3 levels. Furthermore, we train a multilabel representation learning model following a two-stage training procedure on our ADPv2 dataset. We leverage the VMamba architecture and achieving a mean average precision (mAP) of 0.88 in multilabel classification of colon HTTs. Finally, we show that our dataset is capable of an organ-specific in-depth study for potential biomarker discovery by analyzing the model's prediction behavior on tissues affected by different colon diseases, which reveals statistical patterns that confirm the two pathological pathways of colon cancer development. Our dataset is publicly available at https://zenodo.org/records/15307021

artificial intelligence, machine learning, natural language, (18 more...)

2507.05656

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (0.89)
Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (0.49)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Khan, Saif Ur Rehman, Asim, Muhammad Nabeel, Vollmer, Sebastian, Dengel, Andreas

FOLC-Net: A Federated-Optimized Lightweight Architecture for Enhanced MRI Disease Diagnosis across Axial, Coronal, and Sagittal Views

arXiv.org Artificial IntelligenceJul-10-2025

The framework is designed to improve performance in the analysis of combined as well as single anatomical perspectives for MRI disease diagnosis. It specifically addresses the performance degradation observed in state-of-the-art (SOTA) models, particularly when processing axial, coronal, and sagittal anatomical planes. The paper introduces the FOLC-Net framework, which incorporates a novel federated-optimized lightweight architecture with approximately 1.217 million parameters and a storage requirement of only 0.9 MB. FOLC-Net integrates Manta-ray foraging optimization (MRFO) mechanisms for efficient model structure generation, global model cloning for scalable training, and ConvNeXt for enhanced client adaptability. The model was evaluated on combined multi-view data as well as individual views, such as axial, coronal, and sagittal, to assess its robustness in various medical imaging scenarios. Moreover, FOLC-Net tests a ShallowFed model on different data to evaluate its ability to generalize beyond the training dataset. The results show that FOLC-Net outperforms existing models, particularly in the challenging sagittal view. For instance, FOLC-Net achieved an accuracy of 92.44% on the sagittal view, significantly higher than the 88.37% accuracy of study method (DL + Residual Learning) and 88.95% of DL models. Additionally, FOLC-Net demonstrated improved accuracy across all individual views, providing a more reliable and robust solution for medical image analysis in decentralized environments. FOLC-Net addresses the limitations of existing SOTA models by providing a framework that ensures better adaptability to individual views while maintaining strong performance in multi-view settings. The incorporation of MRFO, global model cloning, and ConvNeXt ensures that FOLC-Net performs better in real-world medical applications.

artificial intelligence, machine learning, optimization problem, (21 more...)

2507.06763

Country: Europe > Germany (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(3 more...)