AITopics | Accuracy

Accuracy

A Parameter-Free Two-Bit Covariance Estimator with Improved Operator Norm Error Rate

arXiv.org Machine Learning, Aug-30-2023

A covariance matrix estimator using two bits per entry was recently developed by Dirksen, Maly and Rauhut [Annals of Statistics, 50(6), pp. 3538-3562]. The estimator achieves a near-minimax rate for general sub-Gaussian distributions, but it also suffers from two downsides. Theoretically, there is an essential gap in operator norm error between their estimator and the sample covariance when the diagonal of the covariance matrix is dominated by only a few entries. Practically, its performance relies heavily on the dithering scale, which must be tuned according to unknown parameters. In this work, we propose a new 2-bit covariance matrix estimator that addresses both issues simultaneously. Instead of the sign quantizer with uniform dither used by Dirksen et al., we apply a triangular dither before a 2-bit quantizer inspired by the multi-bit uniform quantizer. By using dithering scales that vary across entries, our estimator enjoys an improved operator norm error rate that depends on the effective rank of the underlying covariance matrix rather than on the ambient dimension, thus closing the theoretical gap. Moreover, our method needs no tuning parameter, since the dithering scales are determined entirely by the data. Experimental results on Gaussian samples showcase the strong numerical performance of our estimator. Remarkably, by halving the dithering scales, our estimator often achieves operator norm errors less than twice those of the sample covariance.
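
The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' construction. It assumes a uniform midrise quantizer with step `delta_j` per coordinate, a triangular dither on (-delta_j, delta_j) drawn as the sum of two independent uniforms, and a hypothetical data-driven scale `delta_j = max_k |x_kj|`, which keeps every dithered value inside (-2*delta_j, 2*delta_j) so that only four levels (two bits) occur. With this dither, E[q | x] = x and E[(q - x)^2 | x] = delta^2/4 for every x, so averaging q q^T and subtracting delta^2/4 on the diagonal gives an approximately unbiased covariance estimate (exactly unbiased only if the scales were fixed in advance rather than computed from the same data).

```python
import numpy as np


def midrise_quantize(y, delta):
    """Uniform midrise quantizer with step `delta`: outputs delta * (k + 1/2)."""
    return delta * (np.floor(y / delta) + 0.5)


def two_bit_covariance(X, rng):
    """Sketch of a 2-bit dithered covariance estimate for samples X (n x d).

    Assumption (hypothetical, not the paper's rule): per-coordinate scale
    delta_j = max_k |X[k, j]|, so x + tau stays in (-2*delta, 2*delta) and
    at most 4 quantization levels (2 bits) are used per coordinate.
    """
    n, d = X.shape
    delta = np.abs(X).max(axis=0)
    # Triangular dither on (-delta, delta): sum of two independent uniforms,
    # drawn independently for every entry.
    tau = (rng.uniform(-delta / 2, delta / 2, size=(n, d))
           + rng.uniform(-delta / 2, delta / 2, size=(n, d)))
    Q = midrise_quantize(X + tau, delta)
    # Off-diagonal: independent dithers give E[q_i q_j] = E[x_i x_j].
    # Diagonal: E[q_i^2] = E[x_i^2] + delta_i^2 / 4, so subtract that bias.
    sigma_hat = Q.T @ Q / n
    sigma_hat -= np.diag(delta ** 2 / 4)
    return sigma_hat, Q
```

Using one scale per coordinate, rather than a single scale for the whole vector, lets small-variance coordinates be quantized finely; this mirrors the abstract's point about entry-varying dithering scales, though the paper's actual scale rule may differ.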

artificial intelligence, estimator, machine learning

arXiv.org Machine Learning

2308.16059

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Assessing Cyclostationary Malware Detection via Feature Selection and Classification

Nkongolo, Mike

arXiv.org Artificial Intelligence, Aug-29-2023

Cyclostationarity refers to periodic variation in the statistics of signals and processes, and it is commonly used in signal analysis and network security. In the context of attacks, cyclostationarity helps detect malicious behaviors within network traffic, such as the traffic patterns of Distributed Denial of Service (DDoS) attacks or hidden communication channels used by malware. This approach enhances security by identifying abnormal patterns and informing Network Intrusion Detection Systems (NIDSs) of potential attacks, improving protection against both known and novel threats. This research focuses on identifying and detecting cyclostationary malware behavior. The main goal is to pinpoint the essential cyclostationary features used in NIDSs. These features are extracted using algorithms such as Boruta and Principal Component Analysis (PCA) and then categorized to find the most significant cyclostationary patterns. The aim of this article is to reveal periodically changing malware behaviors through cyclostationarity. The study highlights the importance of spotting cyclostationary malware in NIDSs using established datasets such as KDD99, NSL-KDD, and UGRansome. The UGRansome dataset is designed for anomaly detection research and includes both normal and abnormal network threat categories of zero-day attacks. The Random Forest (RF) and Support Vector Machine (SVM) algorithms are compared, and the effectiveness of Boruta and PCA is evaluated. The findings show that PCA is more promising than Boruta alone for extracting cyclostationary network feature patterns. The analysis also identifies the internet protocol as the most noticeable cyclostationary feature pattern used by malware. Notably, results on the UGRansome dataset surpass those on KDD99 and NSL-KDD, reaching 99% accuracy in signature malware detection with RF and 98% with SVM.
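
To make the notion of cyclostationarity concrete, the sketch below estimates the cyclic autocorrelation R_x^alpha(tau) = mean_n x[n+tau] x[n] exp(-j 2 pi alpha n) of a synthetic signal. This is a textbook signal-processing illustration, not the paper's pipeline (which extracts features with Boruta and PCA and classifies them with RF and SVM); the amplitude-modulated noise generator is a hypothetical example. A cyclostationary signal shows a clear peak at its cycle frequency alpha and near-zero values elsewhere.

```python
import cmath
import math
import random


def cyclic_autocorrelation(x, alpha, lag=0):
    """Estimate R_x^alpha(lag) = mean_n x[n+lag] * x[n] * exp(-j*2*pi*alpha*n)."""
    n_terms = len(x) - lag
    total = 0j
    for n in range(n_terms):
        total += x[n + lag] * x[n] * cmath.exp(-2j * math.pi * alpha * n)
    return total / n_terms


def periodic_variance_noise(n, period, depth, seed=0):
    """Hypothetical test signal: white Gaussian noise whose amplitude is
    modulated by 1 + depth * cos(2*pi*k / period), so its power is periodic."""
    rng = random.Random(seed)
    return [(1 + depth * math.cos(2 * math.pi * k / period)) * rng.gauss(0.0, 1.0)
            for k in range(n)]
```

For noise scaled by 1 + 0.8 cos(2 pi n / 16), the power varies as 1.32 + 1.6 cos(2 pi n / 16) + 0.32 cos(4 pi n / 16), so |R^alpha(0)| should be close to 0.8 at alpha = 1/16, close to 1.32 at alpha = 0, and near 0 at an unrelated alpha such as 0.3.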

cyclostationarity, dataset, malware

arXiv.org Artificial Intelligence

doi: 10.1007/978-981-19-3035-5_41

2308.15237

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Europe > Switzerland (0.04)
Africa > South Africa > Gauteng > Pretoria (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)

Uncertainty-inspired Open Set Learning for Retinal Anomaly Identification

Wang, Meng, Lin, Tian, Wang, Lianyu, Lin, Aidi, Zou, Ke, Xu, Xinxing, Zhou, Yi, Peng, Yuanyuan, Meng, Qingquan, Qian, Yiming, Deng, Guoyao, Wu, Zhiqun, Chen, Junhong, Lin, Jianhong, Zhang, Mingzhi, Zhu, Weifang, Zhang, Changqing, Zhang, Daoqiang, Goh, Rick Siow Mong, Liu, Yong, Pang, Chi Pui, Chen, Xinjian, Chen, Haoyu, Fu, Huazhu

arXiv.org Artificial Intelligence, Aug-29-2023

Failure to recognize samples from classes unseen during training is a major limitation of artificial intelligence in real-world recognition and classification of retinal anomalies. We established an uncertainty-inspired open-set (UIOS) model trained with fundus images of 9 retinal conditions. Besides assessing the probability of each category, UIOS also calculates an uncertainty score to express its confidence. With a thresholding strategy, our UIOS model achieved F1 scores of 99.55%, 97.01% and 91.91% on the internal testing set, the external target-category (TC)-JSIEC dataset and the TC-unseen testing set, respectively, compared to 92.20%, 80.69% and 64.74% for the standard AI model. Furthermore, UIOS correctly predicted high uncertainty scores, which would prompt a manual check, on datasets of non-target-category retinal diseases, low-quality fundus images, and non-fundus images. UIOS provides a robust method for real-world screening of retinal anomalies.
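
The abstract does not say how UIOS computes its uncertainty score. The sketch below assumes an evidential (Dirichlet-based) formulation, a common way to obtain both class probabilities and a single uncertainty value: non-negative per-class evidence e_k gives Dirichlet parameters alpha_k = e_k + 1, probabilities alpha_k / S and uncertainty K / S, with S = sum(alpha_k) and K the number of classes. The function name and the threshold value are hypothetical; samples whose uncertainty exceeds the threshold are deferred to a manual check, mirroring the thresholding strategy described above.

```python
def evidential_decision(evidence, threshold):
    """Return (predicted_class or None, class_probabilities, uncertainty).

    Assumes an evidential (Dirichlet) model: evidence[k] >= 0 per class.
    None means "uncertain, send for manual review".
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]    # Dirichlet parameters
    strength = sum(alpha)
    probs = [a / strength for a in alpha]  # expected class probabilities
    uncertainty = num_classes / strength   # equals 1.0 when there is no evidence
    if uncertainty > threshold:
        return None, probs, uncertainty    # defer to a manual check
    best = max(range(num_classes), key=lambda k: probs[k])
    return best, probs, uncertainty
```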

ai model, dataset, uncertainty score

arXiv.org Artificial Intelligence

2304.03981

Country:

Asia > China > Guangdong Province > Shantou (0.04)
Asia > Singapore (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Identifying Unique Causal Network from Nonstationary Time Series

Kang, Mingyu, Chen, Duxin, Meng, Ning, Yan, Gang, Yu, Wenwu

arXiv.org Artificial Intelligence, Aug-29-2023

Identifying causality is a challenging task in many data-intensive scenarios, and many algorithms have been proposed for it. However, most of them learn the directed acyclic graph (DAG) of a Bayesian network (BN). These BN-based models have only limited causal explainability because of the Markov equivalence class problem. Moreover, they depend on the assumption of stationarity, whereas many time series sampled from complex systems are nonstationary. Nonstationary time series introduce dataset shift, which leads to unsatisfactory performance of these algorithms. To fill these gaps, this paper proposes a novel causation model named Unique Causal Network (UCN). Unlike previous BN-based models, UCN considers the influence of time delay and proves the uniqueness of the obtained network structure, which resolves the Markov equivalence class issue. Furthermore, based on the decomposability property of UCN, a higher-order causal entropy (HCE) algorithm is designed to identify the structure of UCN in a distributed way. The HCE algorithm measures the strength of causality using a nearest-neighbors entropy estimator, which works well on nonstationary time series. Finally, extensive experiments show that the HCE algorithm achieves state-of-the-art accuracy on nonstationary time series compared to the other baseline algorithms.
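
The HCE algorithm itself is not specified in the abstract, but one building block it names, a nearest-neighbors entropy estimator, can be sketched. Below is the classic Kozachenko-Leonenko estimator for one-dimensional samples, H = psi(N) - psi(k) + (1/N) sum_i log(2 r_i), where r_i is the distance from sample i to its k-th nearest neighbor and psi is the digamma function. It is an illustration under these assumptions (1-D data, distinct sample values), not the authors' multivariate causal-entropy implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer n: psi(n) = -gamma + sum_{m=1}^{n-1} 1/m."""
    return -EULER_GAMMA + sum(1.0 / m for m in range(1, n))


def kth_neighbor_distance(xs, i, k):
    """Distance from xs[i] to its k-th nearest neighbor; xs must be sorted."""
    lo, hi = i - 1, i + 1
    dist = 0.0
    for _ in range(k):
        left = xs[i] - xs[lo] if lo >= 0 else math.inf
        right = xs[hi] - xs[i] if hi < len(xs) else math.inf
        if left <= right:
            dist, lo = left, lo - 1
        else:
            dist, hi = right, hi + 1
    return dist


def kl_entropy_1d(samples, k=3):
    """Kozachenko-Leonenko estimate (in nats) of the differential entropy of 1-D samples."""
    xs = sorted(samples)
    n = len(xs)
    if n <= k:
        raise ValueError("need more than k samples")
    log_sum = sum(math.log(2.0 * kth_neighbor_distance(xs, i, k)) for i in range(n))
    return digamma_int(n) - digamma_int(k) + log_sum / n


if __name__ == "__main__":
    rng = random.Random(0)
    gaussian = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    # Estimate vs. the exact entropy of a standard normal, 0.5 * ln(2*pi*e).
    print(kl_entropy_1d(gaussian), 0.5 * math.log(2 * math.pi * math.e))
```

A standard normal has entropy 0.5 ln(2 pi e), about 1.419 nats, so for a few thousand samples the two printed values should agree closely.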

algorithm, entropy, time series

arXiv.org Artificial Intelligence

2211.10085

Country:

North America > United States > California > San Mateo County > San Mateo (0.04)
Asia > China (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)

NBIAS: A Natural Language Processing Framework for Bias Identification in Text

Raza, Shaina, Garg, Muskan, Reji, Deepak John, Bashir, Syed Raza, Ding, Chen

arXiv.org Artificial Intelligence, Aug-29-2023

Bias in textual data can lead to skewed interpretations and outcomes when the data is used. These biases can perpetuate stereotypes, discrimination, or other forms of unfair treatment. An algorithm trained on biased data may make decisions that disproportionately affect a certain group of people. It is therefore crucial to detect and remove these biases to ensure fair and ethical use of data. To this end, we develop a comprehensive and robust framework, NBIAS, that consists of four main layers: data, corpus construction, model development, and evaluation. The dataset is constructed by collecting diverse data from various domains, including social media, healthcare, and job hiring portals. We then apply a transformer-based token classification model that identifies biased words and phrases through a unique named entity, BIAS. In the evaluation, we combine quantitative and qualitative measures to gauge the effectiveness of our models. We achieve accuracy improvements ranging from 1% to 8% over baselines, and we also gain a robust understanding of how the model functions. The proposed approach is applicable to a variety of biases and contributes to the fair and ethical use of textual data.
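
NBIAS frames bias detection as token classification with a BIAS entity. A common way to represent such labels is BIO tagging; assuming tags named `B-BIAS`, `I-BIAS` and `O` (hypothetical names, since the abstract does not give the tag set), the sketch below turns a model's per-token predictions into the biased words and phrases they mark.

```python
def extract_bias_spans(tokens, tags):
    """Group B-BIAS / I-BIAS tagged tokens into (start, end, text) spans.

    `end` is exclusive. A stray I-BIAS without a preceding B-BIAS is
    leniently treated as the start of a new span.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B-BIAS" or (tag == "I-BIAS" and start is None):
            if start is not None:          # close the previous span
                spans.append((start, i))
            start = i
        elif tag != "I-BIAS":              # "O" (or any other tag) ends a span
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:                  # span running to the end
        spans.append((start, len(tags)))
    return [(s, e, " ".join(tokens[s:e])) for s, e in spans]
```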

annotation, machine learning, natural language

arXiv.org Artificial Intelligence

2308.01681

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > Dominican Republic (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.67)

Industry:

Government (1.00)
Media > News (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

EpiDeNet: An Energy-Efficient Approach to Seizure Detection for Embedded Systems

Ingolfsson, Thorir Mar, Chakraborty, Upasana, Wang, Xiaying, Beniczky, Sandor, Ducouret, Pauline, Benatti, Simone, Ryvlin, Philippe, Cossettini, Andrea, Benini, Luca

arXiv.org Artificial Intelligence, Aug-28-2023

Epilepsy is a prevalent neurological disorder that affects millions of individuals globally, and continuous monitoring coupled with automated seizure detection is necessary for effective patient treatment. To enable long-term care in daily-life conditions, comfortable and smart wearable devices with long battery life are required, which in turn demand resource-constrained and energy-efficient computing solutions. In this context, the development of machine learning algorithms for seizure detection faces the challenge of heavily imbalanced datasets. This paper introduces EpiDeNet, a new lightweight seizure detection network, and Sensitivity-Specificity Weighted Cross-Entropy (SSWCE), a new loss function that incorporates sensitivity and specificity, to address this challenge. The proposed EpiDeNet-SSWCE approach detects 91.16% and 92.00% of seizure events on two different datasets (CHB-MIT and PEDESITE, respectively) using only four EEG channels. A three-window majority-voting smoothing scheme combined with the SSWCE loss achieves a 3x reduction in false positives, down to 1.18 FP/h. EpiDeNet is well suited for implementation on low-power embedded platforms, and we evaluate its performance on two ARM Cortex-based platforms (M4F/M7) and two parallel ultra-low power (PULP) systems (GAP8, GAP9). The most efficient implementation (GAP9) achieves an energy efficiency of 40 GMAC/s/W, with an energy consumption per inference of only 0.051 mJ at high performance (726.46 MMAC/s), outperforming the best ARM Cortex-based solutions by approximately 160x in energy efficiency. The EpiDeNet-SSWCE method demonstrates effective and accurate seizure detection on heavily imbalanced datasets while being suited for implementation on energy-constrained platforms.
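
The abstract mentions a three-window majority-voting smoothing scheme without giving details. A minimal sketch, assuming binary per-window predictions (1 = seizure) and a causal vote over the current window and the two preceding ones (ties at the start of a recording resolved to 0), shows how such a vote suppresses isolated false alarms at the cost of a one-window delay.

```python
def majority_smooth(predictions, window=3):
    """Causal majority vote over the current and previous `window - 1` predictions.

    Assumption: binary labels (1 = seizure). Near the start of a recording,
    fewer windows are available; ties are resolved to 0 (no alarm).
    """
    smoothed = []
    for i in range(len(predictions)):
        recent = predictions[max(0, i - window + 1): i + 1]
        smoothed.append(1 if sum(recent) * 2 > len(recent) else 0)
    return smoothed
```

For example, an isolated positive window surrounded by negatives is voted away, while a run of three consecutive positives survives (starting one window late).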

energy-efficient approach, epidenet, seizure detection

arXiv.org Artificial Intelligence

2309.07135

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.87)

Fix Fairness, Don't Ruin Accuracy: Performance Aware Fairness Repair using AutoML

Nguyen, Giang, Biswas, Sumon, Rajan, Hridesh

arXiv.org Artificial Intelligence, Aug-28-2023

Machine learning (ML) is increasingly used in critical decision-making software, but incidents have raised questions about the fairness of ML predictions. To address this issue, new tools and methods are needed to mitigate bias in ML-based software. Previous studies have proposed bias mitigation algorithms that work only in specific situations and often cause a loss of accuracy. We propose a novel approach that uses automated machine learning (AutoML) techniques to mitigate bias. Our approach includes two key innovations: a novel optimization function and a fairness-aware search space. By improving the default optimization function of AutoML and incorporating fairness objectives, we are able to mitigate bias with little to no loss of accuracy. Additionally, we propose a fairness-aware search space pruning method for AutoML to reduce computational cost and repair time. Our approach, built on the state-of-the-art Auto-Sklearn tool, is designed to reduce bias in real-world scenarios. To demonstrate its effectiveness, we evaluated our approach on four fairness problems and 16 different ML models, and the results show a significant improvement over the baseline and existing bias mitigation techniques. Our approach, Fair-AutoML, successfully repaired 60 out of 64 buggy cases, while existing bias mitigation techniques repaired at most 44 out of 64 cases.
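
The abstract does not give Fair-AutoML's optimization function. As a hedged illustration of the general idea of folding a fairness objective into an AutoML cost, the sketch below combines the error rate with the absolute statistical parity difference between two groups; both the choice of fairness metric and the `weight` parameter are assumptions, not the paper's formulation. An AutoML search would then minimize this cost instead of plain error.

```python
def statistical_parity_difference(y_pred, group):
    """P(y_hat = 1 | group = 1) - P(y_hat = 1 | group = 0); 0 means parity."""
    rates = {}
    for g in (0, 1):
        members = [p for p, gi in zip(y_pred, group) if gi == g]
        if not members:
            raise ValueError(f"group {g} is empty")
        rates[g] = sum(members) / len(members)
    return rates[1] - rates[0]


def fairness_aware_cost(y_true, y_pred, group, weight=0.5):
    """Hypothetical AutoML objective (lower is better):
    (1 - accuracy) + weight * |statistical parity difference|."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return (1.0 - accuracy) + weight * abs(statistical_parity_difference(y_pred, group))
```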

artificial intelligence, fair-automl, machine learning

arXiv.org Artificial Intelligence

doi: 10.1145/3611643.3616257

2306.09297

Country:

North America > United States > California > San Francisco County > San Francisco (0.29)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.88)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.71)

When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming

Mozannar, Hussein, Bansal, Gagan, Fourney, Adam, Horvitz, Eric

arXiv.org Artificial Intelligence, Aug-28-2023

AI-powered code-recommendation systems, such as Copilot and CodeWhisperer, provide code suggestions inside a programmer's environment (e.g., an IDE) with the aim of improving their productivity. Since programmers accept and reject suggestions in these scenarios, such a system should ideally use this feedback in furtherance of this goal. In this work, we leverage prior data of programmers interacting with GitHub Copilot, a system used by millions of programmers, to develop interventions that can save programmer time. We propose a utility theory framework that models this interaction with programmers and decides which suggestions to display. Our framework, Conditional suggestion Display from Human Feedback (CDHF), relies on a cascade of models that predict suggestion acceptance in order to selectively hide suggestions, reducing both latency and programmer verification time. Using data from 535 programmers, we perform a retrospective evaluation of CDHF and show that we can avoid displaying a significant fraction of suggestions that would have been rejected, and can do so without full knowledge of the suggestions themselves. Through ablations on user study data, we further demonstrate the importance of incorporating the programmer's latent unobserved state in deciding when to display suggestions. Finally, we show that using suggestion acceptance as a reward signal for deciding which suggestions to display leads to lower-quality suggestions, an unexpected pitfall.
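
The cascade idea can be sketched as a two-stage gate. In this hypothetical version, a first model scores the programmer's context before any suggestion is generated, so a low score avoids the generation latency altogether, and a second model scores the generated suggestion before it is shown. The class name, model signatures and thresholds are illustrative assumptions, not CDHF's actual models.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Cascade:
    # Stage 1 (hypothetical): predicted acceptance from the context alone.
    context_model: Callable[[dict], float]
    # Stage 2 (hypothetical): predicted acceptance given context and suggestion.
    suggestion_model: Callable[[dict, str], float]
    context_threshold: float
    suggestion_threshold: float

    def decide(self, context: dict, generate: Callable[[dict], str]) -> Optional[str]:
        """Return the suggestion to display, or None to hide it."""
        if self.context_model(context) < self.context_threshold:
            return None  # skip generation entirely, saving latency
        suggestion = generate(context)
        if self.suggestion_model(context, suggestion) < self.suggestion_threshold:
            return None  # generated, but predicted to be rejected
        return suggestion
```

Putting the cheap, context-only model first is what allows some suggestions to be hidden without knowing their content at all.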

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2306.0493

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Counterpart Fairness -- Addressing Systematic between-group Differences in Fairness Evaluation

Wang, Yifei, Zhou, Zhengyang, Wang, Liqin, Laurentiev, John, Hou, Peter, Zhou, Li, Hong, Pengyu

arXiv.org Artificial Intelligence, Aug-28-2023

When using machine learning (ML) to aid decision-making, it is critical to ensure that algorithmic decisions are fair, i.e., that they do not discriminate against specific individuals or groups, particularly those from underprivileged populations. Existing group fairness methods require equal group-wise measures, which, however, fail to consider systematic between-group differences. Confounding factors, which are non-sensitive variables that nonetheless manifest systematic differences, can significantly affect fairness evaluation. To tackle this problem, we believe that fairness measurement should be based on comparing counterparts (i.e., individuals who are similar to each other with respect to the task of interest) from different groups, whose group identities cannot be distinguished algorithmically by exploring confounding factors. We have developed a propensity-score-based method for identifying counterparts, which prevents fairness evaluation from comparing "oranges" with "apples". In addition, we propose a counterpart-based statistical fairness index, termed Counterpart-Fairness (CFair), to assess the fairness of ML models. Various empirical studies were conducted to validate the effectiveness of CFair. We publish our code at https://github.com/zhengyjo/CFair.
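
A minimal sketch of counterpart identification, assuming propensity scores (the estimated probability of belonging to one group given the non-sensitive features) have already been computed, for instance with a logistic regression. Each member of group A is greedily matched to the unmatched member of group B with the closest score, within a hypothetical caliper, and predictions are then compared only between matched pairs. This illustrates the "compare counterparts" idea; it is not the CFair index itself, which the abstract does not define.

```python
def match_counterparts(scores_a, scores_b, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on propensity scores, without replacement.

    Returns (index_in_a, index_in_b) pairs whose scores differ by at most `caliper`.
    """
    available = set(range(len(scores_b)))
    pairs = []
    for i in sorted(range(len(scores_a)), key=lambda k: scores_a[k]):
        if not available:
            break
        # Closest unmatched member of B (ties broken by the smaller index).
        j = min(available, key=lambda k: (abs(scores_b[k] - scores_a[i]), k))
        if abs(scores_b[j] - scores_a[i]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


def counterpart_rate_gap(pairs, pred_a, pred_b):
    """Mean difference in positive predictions between matched counterparts."""
    if not pairs:
        raise ValueError("no counterparts were matched")
    return sum(pred_a[i] - pred_b[j] for i, j in pairs) / len(pairs)
```

Individuals without a close counterpart are left out rather than forced into a poor match, which is what keeps the comparison between like and like.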

artificial intelligence, data mining, machine learning

arXiv.org Artificial Intelligence

2305.1816

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Virginia (0.04)
North America > United States > New York (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Closing the Gap in High-Risk Pregnancy Care Using Machine Learning and Human-AI Collaboration

Mozannar, Hussein, Utsumi, Yuria, Chen, Irene Y., Gervasi, Stephanie S., Ewing, Michele, Smith-McLallen, Aaron, Sontag, David

arXiv.org Artificial Intelligence, Aug-28-2023

High-risk pregnancy (HRP) is a pregnancy complicated by factors that can adversely affect outcomes for the mother or the infant. Health insurers use algorithms to identify members who would benefit from additional clinical support. We aimed to build machine learning algorithms to identify pregnant patients and triage them by risk of complication to assist care management. In this retrospective study, we trained a hybrid Lasso-regularized classifier to predict whether a patient is currently pregnant, using claims data from 36,735 insured members of Independence Blue Cross (IBC), a health insurer in Philadelphia. We then trained a linear classifier on a subset of 12,243 members to predict whether a patient will develop gestational diabetes or gestational hypertension. These algorithms were developed in cooperation with the care management team at IBC and integrated into their dashboard. In small user studies with the nurses, we evaluated the impact of integrating our algorithms into their workflow. Compared with using only a set of pre-defined codes that indicate the start of pregnancy, the proposed model predicts an earlier pregnancy start date for 3.54% (95% CI 3.05-4.00) of patients with complications, and never a later one, at the expense of a 5.58% (95% CI 4.05-6.40) false positive rate. The classifier for predicting complications has an AUC of 0.754 (95% CI 0.764-0.788) using data up to the patient's first trimester. Nurses from the care management program expressed a preference for the proposed models over existing approaches. The proposed model outperformed commonly used claim codes for identifying pregnant patients at the expense of a manageable false positive rate. Our complication-risk classifier shows that we can accurately triage patients by risk of complication.
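
The complication classifier is summarized by its AUC. As a reminder of what that number measures, the sketch below computes AUC directly from its rank interpretation: the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one, with ties counted as one half. It is a quadratic-time illustration; library implementations typically work from sorted scores instead.

```python
def auc_mann_whitney(scores, labels):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of patients who do and do not develop a complication.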

artificial intelligence, complication, machine learning

arXiv.org Artificial Intelligence

2305.17261

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Oceania > New Zealand (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)
Health & Medicine > Public Health > Maternal Health (1.00)
Banking & Finance > Insurance (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
