AITopics

Country:

Europe (0.67)
North America > United States > New York (0.27)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Neural Information Processing Systems, Feb-8-2026, 10:16:40 GMT

Fair Classification with Adversarial Perturbations

In these cases, protected attributes can be "imputed" [

classifier, data mining, machine learning, (20 more...)

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(3 more...)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Agha, Bochra Al, Tajeddine, Razane

Federated Spatiotemporal Graph Learning for Passive Attack Detection in Smart Grids

arXiv.org Artificial Intelligence, Oct-6-2025

Abstract: Smart grids are exposed to passive eavesdropping, where attackers listen silently to communication links. Although no data is actively altered, such reconnaissance can reveal grid topology, consumption patterns, and operational behavior, creating a gateway to more severe targeted attacks. Detecting this threat is difficult because the signals it produces are faint, short-lived, and often disappear when traffic is examined by a single node or along a single timeline. This paper introduces a graph-centric, multimodal detector that fuses physical-layer (Channel State Information (CSI), Signal-to-Noise Ratio (SNR)) and behavioral (latency, Packet Error Rate (PER), event context) indicators over ego-centric star subgraphs and short temporal windows to detect passive attacks. To capture stealthy perturbations, a two-stage encoder is introduced: graph convolution aggregates spatial context across ego-centric star subgraphs, while a bidirectional GRU models short-term temporal dependencies. The encoder transforms heterogeneous features into a unified spatio-temporal representation suitable for classification. Training occurs in a federated learning setup under FedProx, improving robustness to heterogeneous local raw data and contributing to the trustworthiness of decentralized training; raw measurements remain on client devices. The model achieves a testing accuracy of 98.32% per-timestep (F1 The results demonstrate that combining spatial and temporal context enables reliable detection of stealthy reconnaissance while maintaining low false-positive rates, making the approach suitable for non-IID federated smart-grid deployments. Smart grids [1] define new energy systems constructed on the notion of bidirectional communication between consumers and utilities. They enable the management of real-time data across distributed nodes.
However, this open communication exposes the grid to significant risks of passive attacks, which pose a threat to privacy, trust, and stability [2].
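
As a rough illustration of the FedProx training mentioned in the abstract above, the sketch below shows a single local gradient step with the proximal term, which pulls each client's weights toward the current global model. The function name `fedprox_local_step`, the learning rate, and the value of `mu` are illustrative assumptions, not settings reported by the authors.

```python
def fedprox_local_step(w, w_global, grad, lr=0.1, mu=0.01):
    """One local SGD step on the FedProx objective
    F_k(w) + (mu / 2) * ||w - w_global||^2.

    w, w_global and grad are equal-length lists of floats; grad is the
    gradient of the client's own loss F_k at w (assumed precomputed).
    """
    # The gradient of the proximal term is mu * (w - w_global).
    return [wi - lr * (gi + mu * (wi - wgi))
            for wi, wgi, gi in zip(w, w_global, grad)]


# Hypothetical numbers: with a zero task gradient only the proximal
# term acts, shrinking the gap between local and global weights.
updated = fedprox_local_step([1.0, 2.0], [0.0, 2.0], [0.0, 0.0], lr=0.1, mu=1.0)
```

A larger `mu` keeps heterogeneous (non-IID) clients closer to the global model, at the cost of slower local adaptation.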

artificial intelligence, machine learning, node, (14 more...)

2510.02371

Country: Asia (0.46)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology > Security & Privacy (1.00)
Government (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial Intelligence, Sep-16-2025

Adaptive Monitoring and Real-World Evaluation of Agentic AI Systems

Shukla, Manish

Agentic artificial intelligence (AI) -- multi-agent systems that combine large language models with external tools and autonomous planning -- are rapidly transitioning from research laboratories into high-stakes domains. Our earlier "Basic" paper introduced a five-axis framework and proposed preliminary metrics such as goal drift and harm reduction but did not provide an algorithmic instantiation or empirical evidence. This "Advanced" sequel fills that gap. First, we revisit recent benchmarks and industrial deployments to show that technical metrics still dominate evaluations: a systematic review of 84 papers from 2023--2025 found that 83% report capability metrics while only 30% consider human-centred or economic axes [2]. Second, we formalise an Adaptive Multi-Dimensional Monitoring (AMDM) algorithm that normalises heterogeneous metrics, applies per-axis exponentially weighted moving-average thresholds and performs joint anomaly detection via the Mahalanobis distance [7]. Third, we conduct simulations and real-world experiments. AMDM cuts anomaly-detection latency from 12.3 s to 5.6 s on simulated goal drift and reduces false-positive rates from 4.5% to 0.9% compared with static thresholds. We present a comparison table and ROC/PR curves, and we reanalyse case studies to surface missing metrics. Code, data and a reproducibility checklist accompany this paper to facilitate replication. The code supporting this work is available at https://github.com/Manishms18/Adaptive-Multi-Dimensional-Monitoring.
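
The two detection ingredients named in the abstract above, per-axis exponentially weighted moving-average (EWMA) thresholds and a joint Mahalanobis-distance check, can be sketched as follows. This is a minimal pure-Python illustration under assumed parameters (`alpha`, `k`, `warmup`) and hypothetical function names; it is not the authors' AMDM implementation, which is available in their repository.

```python
import math


def ewma_flags(values, alpha=0.3, k=3.0, warmup=5):
    """Flag points more than k EWMA standard deviations from the EWMA mean."""
    mean, var, flags = values[0], 0.0, []
    for i, x in enumerate(values):
        dev = x - mean
        std = math.sqrt(var)
        flags.append(i >= warmup and std > 0 and abs(dev) > k * std)
        # Update after testing so the current point cannot mask itself.
        mean += alpha * dev
        var = (1 - alpha) * (var + alpha * dev * dev)
    return flags


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis(x, baseline):
    """Mahalanobis distance of x from the rows of baseline.

    Assumes the sample covariance of baseline is non-singular.
    """
    n, d = len(baseline), len(x)
    mu = [sum(row[j] for row in baseline) / n for j in range(d)]
    cov = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in baseline) / (n - 1)
            for j in range(d)] for i in range(d)]
    diff = [x[j] - mu[j] for j in range(d)]
    return math.sqrt(sum(a * b for a, b in zip(diff, solve(cov, diff))))


# Hypothetical single-axis metric stream with a jump at the end.
flags = ewma_flags([1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 5.0])
# Hypothetical two-axis baseline and a new observation to score jointly.
dist = mahalanobis([3.0, 1.0], [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
```

The per-axis check adapts its threshold as each metric drifts, while the joint distance can catch combinations of metrics that look unremarkable one axis at a time.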

artificial intelligence, data mining, machine learning, (17 more...)

2509.00115

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Banking & Finance (0.72)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Mohammad Saberian, Nuno Vasconcelos

Multi-Resolution Cascades for Multiclass Object Detection

Neural Information Processing Systems, Feb-9-2025, 15:46:45 GMT

An algorithm for learning fast multiclass object detection cascades is introduced. It produces multi-resolution (MRes) cascades, whose early stages are binary target vs. non-target detectors that eliminate false positives, whose late stages are multiclass classifiers that finely discriminate target classes, and whose middle stages have intermediate numbers of classes, determined in a data-driven manner. This MRes structure is achieved with a new structurally biased boosting algorithm (SBBoost). SBBoost extends previous multiclass boosting approaches, whose boosting mechanisms are shown to implement two complementary data-driven biases: 1) the standard bias towards examples difficult to classify, and 2) a bias towards difficult classes. It is shown that structural biases can be implemented by generalizing this class-based bias, so as to encourage the desired MRes structure.

artificial intelligence, cascade, machine learning, (19 more...)

Country: North America > United States > California > San Diego County > San Diego (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)

Tokpanov, Yury, Millidge, Beren, Glorioso, Paolo, Pilault, Jonathan, Ibrahim, Adam, Whittington, James, Anthony, Quentin

Zyda: A 1.3T Dataset for Open Language Modeling

arXiv.org Artificial Intelligence, Jun-4-2024

The size of large language models (LLMs) has scaled dramatically in recent years and their computational and data requirements have surged correspondingly. State-of-the-art language models, even at relatively smaller sizes, typically require training on at least a trillion tokens. This rapid advancement has eclipsed the growth of open-source datasets available for large-scale LLM pretraining. In this paper, we introduce Zyda (Zyphra Dataset), a dataset under a permissive license comprising 1.3 trillion tokens, assembled by integrating several major respected open-source datasets into a single, high-quality corpus. We apply rigorous filtering and deduplication processes, both within and across datasets, to maintain and enhance the quality derived from the original datasets. Our evaluations show that Zyda not only competes favorably with other open datasets like Dolma, FineWeb, and RefinedWeb, but also substantially improves the performance of comparable models from the Pythia suite. Our rigorous data processing methods significantly enhance Zyda's effectiveness, outperforming even the best of its constituent datasets when used independently.

large language model, machine learning, natural language, (20 more...)

2406.01981

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)

arXiv.org Artificial Intelligence, May-14-2024

Refinement of an Epilepsy Dictionary through Human Annotation of Health-related posts on Instagram

Min, Aehong, Wang, Xuan, Correia, Rion Brattig, Rozum, Jordan, Miller, Wendy R., Rocha, Luis M.

We used a dictionary built from biomedical terminology extracted from various sources such as DrugBank, MedDRA, MedlinePlus, TCMGeneDIT, to tag more than 8 million Instagram posts by users who have mentioned an epilepsy-relevant drug at least once, between 2010 and early 2016. A random sample of 1,771 posts with 2,947 term matches was evaluated by human annotators to identify false-positives. OpenAI's GPT series models were compared against human annotation. Frequent terms with a high false-positive rate were removed from the dictionary. Analysis of the estimated false-positive rates of the annotated terms revealed 8 ambiguous terms (plus synonyms) used in Instagram posts, which were removed from the original dictionary. To study the effect of removing those terms, we constructed knowledge networks using the refined and the original dictionaries and performed an eigenvector-centrality analysis on both networks. We show that the refined dictionary thus produced leads to a significantly different rank of important terms, as measured by their eigenvector-centrality of the knowledge networks. Furthermore, the most important terms obtained after refinement are of greater medical relevance. In addition, we show that OpenAI's GPT series models fare worse than human annotators in this task.
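
The eigenvector-centrality ranking used in the abstract above can be illustrated with a short power-iteration sketch. The adjacency matrix below is a made-up four-node star graph, not data from the study, and the function name and tolerances are assumptions chosen for illustration.

```python
import math


def eigenvector_centrality(adj, iters=200, tol=1e-10):
    """Eigenvector centrality of an undirected graph by power iteration.

    adj is a symmetric adjacency matrix given as a list of lists.
    """
    n = len(adj)
    v = [1.0 / n] * n
    for _ in range(iters):
        # Iterate with (A + I) instead of A: same eigenvectors, but it
        # avoids oscillation on bipartite graphs such as a star.
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v


# Hypothetical network: node 0 is linked to nodes 1, 2 and 3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(star)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

Running the same function on networks built from two dictionaries and comparing the resulting `ranking` lists is one simple way to see how removing ambiguous terms shifts which terms rank as most important.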

annotator, dictionary refinement, social media, (13 more...)

2405.08784

Country:

North America > United States > California > Orange County > Irvine (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New York > Broome County > Binghamton (0.04)
(7 more...)

Genre: Research Report > New Finding (0.94)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Epilepsy (1.00)
Health & Medicine > Therapeutic Area > Genetic Disease (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.45)

Neural Information Processing Systems, Mar-13-2024, 11:30:52 GMT

Multi-Resolution Cascades for Multiclass Object Detection

cascade, detection, detector, (17 more...)

Country: North America > United States > California > San Diego County > San Diego (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)

Jallad, Khloud Al, Aljnidi, Mohamad, Desouki, Mohammad Said

Anomaly detection optimization using big data and deep learning to reduce false-positive

arXiv.org Artificial Intelligence, Sep-28-2022

Anomaly-based intrusion detection systems (IDS) have been a hot research topic because they can detect new threats, whereas signature-based IDS detect only threats whose signatures have been memorized. Interest has grown especially since advanced technologies have increased both the number of hacking tools and the risk impact of an attack. The main problem of any anomaly-based model is its high false-positive rate, which is why anomaly-based IDS are not commonly applied in practice: such models classify an unseen pattern as a threat even when it is normal but simply not included in the training dataset. This type of problem is called overfitting, where the model is not able to generalize. Optimizing anomaly-based models with a big training dataset that includes all possible normal cases may be an optimal solution, but it cannot be applied in practice. Although we can increase the number of training samples to include many more normal cases, we still need a model with a greater ability to generalize. In this research paper, we propose applying a deep model instead of traditional models because it has a greater ability to generalize. Thus, we obtain fewer false positives by using big data and a deep model. We compare machine learning and deep learning algorithms in the optimization of anomaly-based IDS by decreasing the false-positive rate. We ran an experiment on the NSL-KDD benchmark and compared our results with one of the best classifiers used in traditional learning for IDS optimization. The experiment shows a 10% lower false-positive rate with deep learning than with traditional learning.

artificial intelligence, dataset, machine learning, (19 more...)

doi: 10.1186/s40537-020-00346-1

2209.13965

Country:

Asia > Middle East > Syria > Damascus Governorate > Damascus (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > United States > Florida > Orange County > Orlando (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

#artificialintelligence, Apr-21-2022, 08:00:39 GMT

I ran 80,000 simulations to investigate different p-value adjustments

However, in a surprise to approximately no one who works professionally with data, we do not live in an ideal world. A variety of pressures compel many practitioners to perform tens, hundreds, or even thousands of significance tests on the same data set. Some reasons for doing this are better than others but, independent of even the very best motivations, this practice basically breaks everyday statistics. The assurance of getting a small p-value (that chance alone would spur null differences to appear this distinct only 5%, 1%, or 0.1% of the time) is moot when you're playing the odds hundreds, thousands, or tens of thousands of times. A really really big number divided by a big number [or, equivalently here, multiplied by a small proportion] is still a really really big number.
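
To put a number on the point above, the sketch below computes the chance of at least one false positive across many tests, and shows the Bonferroni adjustment as one common correction. It assumes the tests are independent and that every null hypothesis is true; the function names are illustrative, and Bonferroni is used here only as a familiar example, not necessarily one of the adjustments the article compares.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false positive in m independent tests,
    each run at significance level alpha, when every null is true."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Per-test threshold that caps the family-wise error rate at alpha."""
    return alpha / m


# 100 tests at the usual 5% level: a false positive is almost certain.
unadjusted = familywise_error_rate(0.05, 100)  # roughly 0.994

# The same 100 tests at the Bonferroni-adjusted level stay under 5%.
adjusted = familywise_error_rate(bonferroni_alpha(0.05, 100), 100)
```

The adjustment buys this guarantee by demanding much stronger evidence from each individual test, which reduces the power to detect real effects.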

adjustment, classification accuracy, simulation, (11 more...)

#artificialintelligence

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)