AITopics

The demand for text classification is growing significantly in web searching, data mining, web ranking, recommendation systems, and so many other fields of information and technology. This paper illustrates the text classification process on different datasets using some standard supervised machine learning techniques. Text documents can be classified through various kinds of classifiers. Labeled text documents are used to classify the text in supervised classifications. This paper applies these classifiers on different kinds of labeled documents and measures the accuracy of the classifiers. An Artificial Neural Network (ANN) model using Back Propagation Network (BPN) is used with several other models to create an independent platform for labeled and supervised text classification process. An existing benchmark approach is used to analyze the performance of classification using labeled documents. Experimental analysis on real data reveals which model works well in terms of classification accuracy.

machine learning, natural language, text classification, (13 more...)

2509.00983

Genre: Research Report (0.66)

Industry: Education (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.74)
(2 more...)

Gupta, Kishor Datta, Ahsan, Md Manjurul, Haque, Mohd Ariful, George, Roy, Wasi, Azmine Toushik

UrbanInsight: A Distributed Edge Computing Framework with LLM-Powered Data Filtering for Smart City Digital Twins

Cities today generate enormous streams of data from sensors, cameras, and connected infrastructure. While this information offers unprecedented opportunities to improve urban life, most existing systems struggle with scale, latency, and fragmented insights. This work introduces a framework that blends physics-informed machine learning, multimodal data fusion, and knowledge graph representation with adaptive, rule-based intelligence powered by large language models (LLMs). Physics-informed methods ground learning in real-world constraints, ensuring predictions remain meaningful and consistent with physical dynamics. Knowledge graphs act as the semantic backbone, integrating heterogeneous sensor data into a connected, queryable structure. At the edge, LLMs generate context-aware rules that adapt filtering and decision-making in real time, enabling efficient operation even under constrained resources. Together, these elements form a foundation for digital twin systems that go beyond passive monitoring to provide actionable insights. By uniting physics-based reasoning, semantic data fusion, and adaptive rule generation, this approach opens new possibilities for creating responsive, trustworthy, and sustainable smart infrastructures.

large language model, machine learning, natural language, (19 more...)

2509.00936

Country: North America > United States > Oklahoma (0.28)

Genre: Research Report (0.82)

Industry:

Information Technology (1.00)
Transportation (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Adaptive Contrast Adjustment Module: A Clinically-Inspired Plug-and-Play Approach for Enhanced Fetal Plane Classification

Chen, Yang, Zhao, Sanglin, Chen, Baoyu, Gustaf, Mans

Fetal ultrasound standard plane classification is essential for reliable prenatal diagnosis but faces inherent challenges, including low tissue contrast, boundary ambiguity, and operator-dependent image quality variations. To overcome these limitations, we propose a plug-and-play adaptive contrast adjustment module (ACAM), whose core design is inspired by the clinical practice of doctors adjusting image contrast to obtain clearer and more discriminative structural information. The module employs a shallow texture-sensitive network to predict clinically plausible contrast parameters, transforms input images into multiple contrast-enhanced views through differentiable mapping, and fuses them within downstream classifiers. Validated on a multi-center dataset of 12,400 images across six anatomical categories, the module consistently improves performance across diverse models, with accuracy of lightweight models increasing by 2.02 percent, accuracy of traditional models increasing by 1.29 percent, and accuracy of state-of-the-art models increasing by 1.15 percent. The innovation of the module lies in its content-aware adaptation capability, replacing random preprocessing with physics-informed transformations that align with sonographer workflows while improving robustness to imaging heterogeneity through multi-view fusion. This approach effectively bridges low-level image features with high-level semantics, establishing a new paradigm for medical image analysis under real-world image quality variations.

artificial intelligence, machine learning, module, (17 more...)

2509.00808

Country: Asia > China (0.29)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Mukhtiar, Noorain, Mahmood, Adnan, Sheng, Quan Z.

Fairness in Federated Learning: Trends, Challenges, and Opportunities

At the intersection of the cutting-edge technologies and privacy concerns, Federated Learning (FL) with its distributed architecture, stands at the forefront in a bid to facilitate collaborative model training across multiple clients while preserving data privacy. However, the applicability of FL systems is hindered by fairness concerns arising from numerous sources of heterogeneity that can result in biases and undermine a system's effectiveness, with skewed predictions, reduced accuracy, and inefficient model convergence. This survey thus explores the diverse sources of bias, including but not limited to, data, client, and model biases, and thoroughly discusses the strengths and limitations inherited within the array of the state-of-the-art techniques utilized in the literature to mitigate such disparities in the FL training process. We delineate a comprehensive overview of the several notions, theoretical underpinnings, and technical aspects associated with fairness and their adoption in FL-based multidisciplinary environments. Furthermore, we examine salient evaluation metrics leveraged to measure fairness quantitatively. Finally, we envisage exciting open research directions that have the potential to drive future advancements in achieving fairer FL frameworks, in turn, offering a strong foundation for future research in this pivotal area.

artificial intelligence, data mining, machine learning, (20 more...)

doi: 10.1002/aisy.202400836

2509.00799

Country: Oceania > Australia > New South Wales (0.28)

Genre:

Overview > Innovation (0.66)
Research Report > Promising Solution (0.66)
Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Data Science > Data Mining > Big Data (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Selvaganapathy, Sanjeeevan, Nasim, Mehwish

Confident, Calibrated, or Complicit: Probing the Trade-offs between Safety Alignment and Ideological Bias in Language Models in Detecting Hate Speech

We investigate the efficacy of Large Language Models (LLMs) in detecting implicit and explicit hate speech, examining whether models with minimal safety alignment (uncensored) might provide more objective classification capabilities compared to their heavily-aligned (censored) counterparts. While uncensored models theoretically offer a less constrained perspective free from moral guardrails that could bias classification decisions, our results reveal a surprising trade-off: censored models significantly outperform their uncensored counterparts in both accuracy and robustness, achieving 78.7% versus 64.1% strict accuracy. However, this enhanced performance comes with its own limitation -- the safety alignment acts as a strong ideological anchor, making censored models resistant to persona-based influence, while uncensored models prove highly malleable to ideological framing. Furthermore, we identify critical failures across all models in understanding nuanced language such as irony. We also find alarming fairness disparities in performance across different targeted groups and systemic overconfidence that renders self-reported certainty unreliable. These findings challenge the notion of LLMs as objective arbiters and highlight the need for more sophisticated auditing frameworks that account for fairness, calibration, and ideological consistency.

accuracy, large language model, machine learning, (17 more...)

2509.00673

Genre: Research Report > New Finding (1.00)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Face4FairShifts: A Large Image Benchmark for Fairness and Robust Learning across Visual Domains

Lin, Yumeng, Li, Dong, Wu, Xintao, Shao, Minglai, Zhao, Xujiang, Chen, Zhong, Zhao, Chen

Ensuring fairness and robustness in machine learning models remains a challenge, particularly under domain shifts. We present Face4FairShifts, a large-scale facial image benchmark designed to systematically evaluate fairness-aware learning and domain generalization. The dataset includes 100,000 images across four visually distinct domains with 39 annotations within 14 attributes covering demographic and facial features. Through extensive experiments, we analyze model performance under distribution shifts and identify significant gaps. Our findings emphasize the limitations of existing related datasets and the need for more effective fairness-aware domain adaptation techniques. Face4FairShifts provides a comprehensive testbed for advancing equitable and reliable AI systems. The dataset is available online at https://meviuslab.github.io/Face4FairShifts/.

artificial intelligence, dataset, machine learning, (16 more...)

2509.00658

Country:

Europe (0.67)
North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.93)
Media (0.92)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Enabling Trustworthy Federated Learning via Remote Attestation for Mitigating Byzantine Threats

Zhang, Chaoyu, Jin, Heng, Shi, Shanghao, Yu, Hexuan, Johns, Sydney, Hou, Y. Thomas, Lou, Wenjing

Federated Learning (FL) has gained significant attention for its privacy-preserving capabilities, enabling distributed devices to collaboratively train a global model without sharing raw data. However, its distributed nature forces the central server to blindly trust the local training process and aggregate uncertain model updates, making it susceptible to Byzantine attacks from malicious participants, especially in mission-critical scenarios. Detecting such attacks is challenging due to the diverse knowledge across clients, where variations in model updates may stem from benign factors, such as non-IID data, rather than adversarial behavior. Existing data-driven defenses struggle to distinguish malicious updates from natural variations, leading to high false positive rates and poor filtering performance. To address this challenge, we propose Sentinel, a remote attestation (RA)-based scheme for FL systems that regains client-side transparency and mitigates Byzantine attacks from a system security perspective. Our system employs code instrumentation to track control-flow and monitor critical variables in the local training process. Additionally, we utilize a trusted training recorder within a Trusted Execution Environment (TEE) to generate an attestation report, which is cryptographically signed and securely transmitted to the server. Upon verification, the server ensures that legitimate client training processes remain free from program behavior violation or data manipulation, allowing only trusted model updates to be aggregated into the global model. Experimental results on IoT devices demonstrate that Sentinel ensures the trustworthiness of the local training integrity with low runtime and memory overhead.

artificial intelligence, federated learning, machine learning, (15 more...)

2509.00634

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (0.93)
Education (0.91)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Poduri, Aryan, Tailor, Yashwant

Illuminating Patterns of Divergence: DataDios SmartDiff for Large-Scale Data Difference Analysis

Data engineering workflows require reliable differencing across files, databases, and query outputs, yet existing tools falter under schema drift, heterogeneous types, and limited explainability. SmartDiff is a unified system that combines schema-aware mapping, type-specific comparators, and parallel execution. It aligns evolving schemas, compares structured and semi-structured data (strings, numbers, dates, JSON/XML), and clusters results with labels that explain how and why differences occur. On multi-million-row datasets, SmartDiff achieves over 95 percent precision and recall, runs 30 to 40 percent faster, and uses 30 to 50 percent less memory than baselines; in user studies, it reduces root-cause analysis time from 10 hours to 12 minutes. An LLM-assisted labeling pipeline produces deterministic, schema-valid multilabel explanations using retrieval augmentation and constrained decoding; ablations show further gains in label accuracy and time to diagnosis over rules-only baselines. These results indicate SmartDiff's utility for migration validation, regression testing, compliance auditing, and continuous data quality monitoring. Index Terms: data differencing, schema evolution, data quality, parallel processing, clustering, explainable validation, big data

data mining, machine learning, natural language, (23 more...)

2509.00293

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Centralized vs. Federated Learning for Educational Data Mining: A Comparative Study on Student Performance Prediction with SAEB Microdata

Tertulino, Rodrigo

The application of data mining and artificial intelligence in education offers unprecedented potential for personalizing learning and early identification of at-risk students. However, the practical use of these techniques faces a significant barrier in privacy legislation, such as Brazil's General Data Protection Law (LGPD), which restricts the centralization of sensitive student data. To resolve this challenge, privacy-preserving computational approaches are required. The present study evaluates the feasibility and effectiveness of Federated Learning, specifically the FedProx algorithm, to predict student performance using microdata from the Brazilian Basic Education Assessment System (SAEB). A Deep Neural Network (DNN) model was trained in a federated manner, simulating a scenario with 50 schools, and its performance was rigorously benchmarked against a centralized eXtreme Gradient Boosting (XGBoost) model. The analysis, conducted on a universe of over two million student records, revealed that the centralized model achieved an accuracy of 63.96%. Remarkably, the federated model reached a peak accuracy of 61.23%, demonstrating a marginal performance loss in exchange for a robust privacy guarantee. The results indicate that Federated Learning is a viable and effective solution for building collaborative predictive models in the Brazilian educational context, in alignment with the requirements of the LGPD.

artificial intelligence, data mining, machine learning, (19 more...)

2509.00086

Country: South America > Brazil (0.48)

Genre:

Research Report > New Finding (1.00)
Workflow (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Assessment & Standards > Student Performance (1.00)
Education > Educational Setting > Higher Education (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Dahm, Zachery, Vasili, Konstantinos, Theos, Vasileios, Gkouliaras, Konstantinos, Richards, William, Miller, True, Jowers, Brian, Chatzidakis, Stylianos

Experimental Assessment of a Multi-Class AI/ML Architecture for Real-Time Characterization of Cyber Events in a Live Research Reactor

There is increased interest in applying Artificial Intelligence and Machine Learning (AI/ML) within the nuclear industry and nuclear engineering community. Effective implementation of AI/ML could offer benefits to the nuclear domain, including enhanced identification of anomalies, anticipation of system failures, and operational schedule optimization. However, limited work has been done to investigate the feasibility and applicability of AI/ML tools in a functioning nuclear reactor. Here, we go beyond the development of a single model and introduce a multi-layered AI/ML architecture that integrates both information technology and operational technology data streams to identify, characterize, and differentiate (i) among diverse cybersecurity events and (ii) between cyber events and other operational anomalies. Leveraging Purdue Universitys research reactor, PUR-1, we demonstrate this architecture through a representative use case that includes multiple concurrent false data injections and denial-of-service attacks of increasing complexity under realistic reactor conditions. The use case includes 14 system states (1 normal, 13 abnormal) and over 13.8 million multi-variate operational and information technology data points. The study demonstrated the capability of AI/ML to distinguish between normal, abnormal, and cybersecurity-related events, even under challenging conditions such as denial-of-service attacks. Combining operational and information technology data improved classification accuracy but posed challenges related to synchronization and collection during certain cyber events. While results indicate significant promise for AI/ML in nuclear cybersecurity, the findings also highlight the need for further refinement in handling complex event differentiation and multi-class architectures.

artificial intelligence, balance ratio, machine learning, (17 more...)

2509.00076

Country: North America > United States > Indiana > Tippecanoe County (0.15)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Energy > Power Industry > Utilities > Nuclear (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.90)