AITopics | Accuracy

Accuracy

I ran 80,000 simulations to investigate different p-value adjustments

#artificialintelligence Apr-21-2022, 08:00:39 GMT

However, in a surprise to approximately no one who works professionally with data, we do not live in an ideal world. A variety of pressures compel many practitioners to perform tens, hundreds, or even thousands of significance tests on the same data set. Some reasons for doing this are better than others, but independent of even the very best motivations, this practice basically breaks everyday statistics. The assurance of getting a small p-value (that chance alone would spur null differences to appear this distinct only 5%, 1%, or 0.1% of the time) is moot when you're playing the odds hundreds, thousands, or tens of thousands of times. A really really big number divided by a big number [or, equivalently here, multiplied by a small proportion] is still a really really big number.
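The excerpt does not show the article's own simulation code or say which adjustments it compares, so the following is only a minimal sketch of the underlying arithmetic. It assumes independent tests whose null hypotheses are all true (so p-values are uniform), and it uses Bonferroni and Benjamini-Hochberg as two familiar example adjustments; the function names are illustrative.

```python
import random


def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_reject(p_values, alpha=0.05):
    """Bonferroni: reject a test only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    the largest rank with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


def simulate_null_tests(m: int, seed: int = 0):
    """Assumption: under a true null, a calibrated p-value is uniform on [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(m)]


if __name__ == "__main__":
    p = simulate_null_tests(1000)
    print("unadjusted false positives:", sum(x <= 0.05 for x in p))
    print("Bonferroni false positives:", sum(bonferroni_reject(p)))
    print("BH false positives:", sum(benjamini_hochberg_reject(p)))
    print("FWER at m=100:", familywise_error_rate(0.05, 100))
```

With every null true, the unadjusted count should hover around 5% of the tests, while both adjustments should reject few or none; the familywise error rate shows why a single 5% threshold stops being reassuring once the number of tests grows.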

adjustment, classification accuracy, simulation, (11 more...)

#artificialintelligence

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Spectrum of inner-product kernel matrices in the polynomial regime and multiple descent phenomenon in kernel ridge regression

Misiakiewicz, Theodor

arXiv.org Machine Learning Apr-21-2022

We study the spectrum of inner-product kernel matrices, i.e., $n \times n$ matrices with entries $h (\langle \textbf{x}_i ,\textbf{x}_j \rangle/d)$ where the $( \textbf{x}_i)_{i \leq n}$ are i.i.d. random covariates in $\mathbb{R}^d$. In the linear high-dimensional regime $n \asymp d$, it was shown that these matrices are well approximated by their linearization, which simplifies into the sum of a rescaled Wishart matrix and an identity matrix. In this paper, we generalize this decomposition to the polynomial high-dimensional regime $n \asymp d^\ell,\ell \in \mathbb{N}$, for data uniformly distributed on the sphere and hypercube. In this regime, the kernel matrix is well approximated by its degree-$\ell$ polynomial approximation and can be decomposed into a low-rank spike matrix, an identity matrix, and a 'Gegenbauer matrix' with entries $Q_\ell (\langle \textbf{x}_i , \textbf{x}_j \rangle)$, where $Q_\ell$ is the degree-$\ell$ Gegenbauer polynomial. We show that the spectrum of the Gegenbauer matrix converges in distribution to a Marchenko-Pastur law. This problem is motivated by the study of the prediction error of kernel ridge regression (KRR) in the polynomial regime $n \asymp d^\kappa, \kappa >0$. Previous work showed that for $\kappa \not\in \mathbb{N}$, KRR fits exactly a degree-$\lfloor \kappa \rfloor$ polynomial approximation to the target function. In this paper, we use our characterization of the kernel matrix to complete this picture and compute the precise asymptotics of the test error in the limit $n/d^\kappa \to \psi$ with $\kappa \in \mathbb{N}$. In this case, the test error can present a double descent behavior, depending on the effective regularization and signal-to-noise ratio at level $\kappa$. Because this double descent can occur each time $\kappa$ crosses an integer, this explains the multiple descent phenomenon in the KRR risk curve observed in several previous works.
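To make the object in this abstract concrete, here is a small sketch that builds an inner-product kernel matrix with entries $h(\langle \textbf{x}_i, \textbf{x}_j \rangle/d)$ for data drawn uniformly from the hypercube $\{-1,+1\}^d$. The choice $h = \exp$, the sizes, and the function names are assumptions made for illustration only; the abstract does not fix any of them.

```python
import math
import random


def sample_hypercube(n: int, d: int, seed: int = 0):
    """Draw n points uniformly from the hypercube {-1, +1}^d."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(d)] for _ in range(n)]


def inner_product_kernel_matrix(xs, h):
    """Return the n x n matrix with entries h(<x_i, x_j> / d)."""
    n, d = len(xs), len(xs[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dot = sum(a * b for a, b in zip(xs[i], xs[j]))
            # The matrix is symmetric, so fill both triangles at once.
            K[i][j] = K[j][i] = h(dot / d)
    return K


if __name__ == "__main__":
    d = 8
    n = d ** 2  # illustrative sizes with n = d^2, i.e. ell = 2
    K = inner_product_kernel_matrix(sample_hypercube(n, d), math.exp)
    print(len(K), len(K[0]), K[0][0])
```

On the hypercube every point satisfies $\langle \textbf{x}_i, \textbf{x}_i \rangle/d = 1$, so each diagonal entry equals $h(1)$; the spectral results described in the abstract concern the eigenvalues of such matrices as $n$ and $d$ grow together.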

artificial intelligence, machine learning, polynomial, (16 more...)

arXiv.org Machine Learning

2204.10425

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

Improving Proximity Classification for Contact Tracing using a Multi-channel Approach

Lanfer, Eric, Hänel, Thomas, van Rijswijk-Deij, Roland, Aschenbruck, Nils

arXiv.org Artificial Intelligence Apr-20-2022

Due to the COVID-19 pandemic, smartphone-based proximity tracing systems became of utmost interest. Many of these systems use BLE signals to estimate the distance between two persons. The quality of this method depends on many factors and, therefore, does not always deliver accurate results. In this paper, we present a multi-channel approach to improve proximity classification, and a novel, publicly available data set that contains matched IEEE 802.11 (2.4 GHz and 5 GHz) and BLE signal strength data, measured in four different environments. We have developed and evaluated a combined classification model based on BLE and IEEE 802.11 signals. Our approach significantly improves the distance classification and consequently also the contact tracing accuracy. We are able to achieve good results with our approach in everyday public transport scenarios. However, in our implementation based on IEEE 802.11 probe requests, we also encountered privacy problems and limitations due to the consistency and interval at which such probes are sent. We discuss these limitations and sketch how our approach could be improved to make it suitable for real-world deployment.

classification, ieee 802, threshold, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/LCN53696.2022.9843531

2201.10401

Country:

Europe > Germany (0.04)
North America > United States (0.04)
Europe > Netherlands > South Holland > Rijswijk (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation (0.66)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.34)
Health & Medicine > Therapeutic Area > Immunology (0.34)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Automate your Machine Learning development pipeline with PyCaret

#artificialintelligence Apr-19-2022, 13:07:34 GMT

Data science is not easy, we all know that. Even programming requires a lot of your cycles to get fully onboarded. Don't get me wrong, I love being a developer to some extent, but it is hard. You can read and watch a ton of videos about how easy it is to get into programming, but as with everything in life, if you are not passionate, you may find some roadblocks along the way. I get it, you may be thinking, "Nice way to start a post! I'm out, dude", but let me tell you that even though becoming a data scientist is a challenge, as we are becoming more data-centric, data-aware, and data-dependent, you need to sort these issues out to become a specialist; that's part of the journey.

machine learning development pipeline, pycaret, setup, (13 more...)

#artificialintelligence

Country: Asia > Taiwan (0.05)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Adaptive Noisy Data Augmentation for Regularized Estimation and Inference in Generalized Linear Models

Li, Yinan, Liu, Fang

arXiv.org Machine Learning Apr-18-2022

We propose the AdaPtive Noise Augmentation (PANDA) procedure to regularize the estimation and inference of generalized linear models (GLMs). PANDA iteratively optimizes the objective function given noise augmented data until convergence to obtain the regularized model estimates. The augmented noises are designed to achieve various regularization effects, including $l_0$, bridge (lasso and ridge included), elastic net, adaptive lasso, and SCAD, as well as group lasso and fused ridge. We examine the tail bound of the noise-augmented loss function and establish the almost sure convergence of the noise-augmented loss function and its minimizer to the expected penalized loss function and its minimizer, respectively. We derive the asymptotic distributions for the regularized parameters, based on which, inferences can be obtained simultaneously with variable selection. PANDA exhibits ensemble learning behaviors that help further decrease the generalization error. Computationally, PANDA is easy to code, leveraging existing software for implementing GLMs, without resorting to complicated optimization techniques. We demonstrate the superior or similar performance of PANDA against the existing approaches of the same type of regularizers in simulated and real-life data. We show that the inferences through PANDA achieve nominal or near-nominal coverage and are far more efficient compared to a popular existing post-selection procedure.
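One way to see why fitting on noise-augmented data can act as a regularizer is the ridge case for a linear model. The derivation below is a sketch under stated assumptions, not the paper's exact construction: append $n_e$ augmented rows with response $0$ and covariate vectors $\mathbf{e}_k$ whose entries are i.i.d. with mean zero and variance $\lambda$ (so $\mathbb{E}[\mathbf{e}_k \mathbf{e}_k^\top] = \lambda I$); the symbols $n_e$, $\mathbf{e}_k$, and $\lambda$ are introduced here for illustration.

```latex
\begin{aligned}
\mathbb{E}_{e}\Big[\sum_{i=1}^{n}\big(y_i-\mathbf{x}_i^\top\boldsymbol\beta\big)^2
  +\sum_{k=1}^{n_e}\big(0-\mathbf{e}_k^\top\boldsymbol\beta\big)^2\Big]
&=\sum_{i=1}^{n}\big(y_i-\mathbf{x}_i^\top\boldsymbol\beta\big)^2
  +\sum_{k=1}^{n_e}\boldsymbol\beta^\top\,\mathbb{E}\big[\mathbf{e}_k\mathbf{e}_k^\top\big]\,\boldsymbol\beta\\
&=\sum_{i=1}^{n}\big(y_i-\mathbf{x}_i^\top\boldsymbol\beta\big)^2
  +n_e\,\lambda\,\|\boldsymbol\beta\|_2^2 .
\end{aligned}
```

In expectation the augmented squared-error loss is therefore the original loss plus a ridge penalty of strength $n_e \lambda$, which is consistent with the abstract's statement that the noise-augmented loss converges to an expected penalized loss.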

artificial intelligence, machine learning, regression, (19 more...)

arXiv.org Machine Learning

2204.08574

Country: North America > United States > Indiana > St. Joseph County > Notre Dame (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

ML classifies gravitational-wave glitches with high accuracy - DataScienceCentral.com

#artificialintelligence Apr-17-2022, 17:00:41 GMT

Caltech/MIT's LIGO, the largest gravitational-wave observatory in the world, collects data on minute space-time ripples from cataclysmic astronomical events like colliding black holes or supernovae. Classifying LIGO's data as either an event of interest or an unknown "glitch" with high accuracy poses a challenge, due to the volume of highly complex data collected by the observatory. A recent dissertation by Columbia University's Robert Colgan [1] proposes a neural network to accurately separate non-astrophysical glitches, achieving significantly higher classification accuracy than previous methods. Gravitational waves, first proposed by Einstein in his general theory of relativity, are caused by massive objects, like black holes, curving spacetime. The waves ripple through the universe at the speed of light, distorting space and time as they compress and stretch distances.

accuracy, glitch, ml classify gravitational-wave glitch, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.39)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.39)

Anomaly Detection in Autonomous Driving: A Survey

Bogdoll, Daniel, Nitsche, Maximilian, Zöllner, J. Marius

arXiv.org Artificial Intelligence Apr-17-2022

Nowadays, there are outstanding strides towards a future with autonomous vehicles on our roads. While the perception systems of autonomous vehicles perform well under closed-set conditions, they still struggle to handle the unexpected. This survey provides an extensive overview of anomaly detection techniques based on camera, lidar, radar, multimodal and abstract object level data. We provide a systematization including detection approach, corner case level, ability for an online application, and further attributes. We outline the state of the art and point out current research gaps.

data mining, detection, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/CVPRW56347.2022.00495

2204.07974

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
Asia > Middle East > Israel (0.04)

Genre:

Overview (0.88)
Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (0.65)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Latest Research From Stanford Introduces 'Domino': A Python Tool for Identifying and Describing Underperforming Slices in Machine Learning Models

#artificialintelligence Apr-15-2022, 02:30:46 GMT

Machine learning and artificial intelligence models have achieved promising results in recent years. The major factor behind their success is the availability and development of vast datasets. However, regardless of how many terabytes of data you have or how skilled you are at data science, machine learning models will be useless and even dangerous if you can't make sense of data records. A slice is a collection of data samples with a common feature. For example, in a picture dataset, photographs of antique vehicles make up a slice.

domino, machine learning model, model underperform, (14 more...)

#artificialintelligence

Country: North America > United States > California > Santa Clara County > Palo Alto (0.05)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)

Multimodal spatiotemporal graph neural networks for improved prediction of 30-day all-cause hospital readmission

Tang, Siyi, Tariq, Amara, Dunnmon, Jared, Sharma, Umesh, Elugunti, Praneetha, Rubin, Daniel, Patel, Bhavik N., Banerjee, Imon

arXiv.org Artificial Intelligence Apr-14-2022

Measures to predict 30-day readmission are considered an important quality factor for hospitals as accurate predictions can reduce the overall cost of care by identifying high-risk patients before they are discharged. While recent deep learning-based studies have shown promising empirical results on readmission prediction, several limitations exist that may hinder widespread clinical utility, such as (a) only patients with certain conditions are considered, (b) existing approaches do not leverage data temporality, (c) individual admissions are assumed independent of each other, which is unrealistic, (d) prior studies are usually limited to a single source of data and single-center data. To address these limitations, we propose a multimodal, modality-agnostic spatiotemporal graph neural network (MM-STGNN) for prediction of 30-day all-cause hospital readmission that fuses multimodal in-patient longitudinal data. By training and evaluating our methods using longitudinal chest radiographs and electronic health records from two independent centers, we demonstrate that MM-STGNN achieves an AUROC of 0.79 on both primary and external datasets. Furthermore, MM-STGNN significantly outperforms the current clinical reference standard, LACE+ score (AUROC=0.61), on the primary dataset. For subset populations of patients with heart and vascular disease, our model also outperforms baselines on predicting 30-day readmission (e.g., 3.7 point improvement in AUROC in patients with heart disease). Lastly, qualitative model interpretability analysis indicates that while patients' primary diagnoses were not explicitly used to train the model, node features crucial for model prediction directly reflect patients' primary diagnoses. Importantly, our MM-STGNN is agnostic to node feature modalities and could be utilized to integrate multimodal data for triaging patients in various downstream resource allocation tasks.

artificial intelligence, machine learning, readmission, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/JBHI.2023.3236888

2204.06766

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)
North America > Canada > Ontario (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Global Counterfactual Explanations: Investigations, Implementations and Improvements

Ley, Dan, Mishra, Saumitra, Magazzeni, Daniele

arXiv.org Machine Learning Apr-14-2022

Counterfactual explanations have been widely studied in explainability, with a range of application-dependent methods emerging in fairness, recourse and model understanding. However, the major shortcoming associated with these methods is their inability to provide explanations beyond the local or instance level. While some works touch upon the notion of a global explanation, typically suggesting to aggregate masses of local explanations in the hope of ascertaining global properties, few provide frameworks that are either reliable or computationally tractable. Meanwhile, practitioners are requesting more efficient and interactive explainability tools. We take this opportunity to investigate existing global methods, with a focus on implementing and improving Actionable Recourse Summaries (AReS), the only known global counterfactual explanation framework for recourse.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

2204.06917

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
North America > United States > New York (0.04)
North America > United States > Maryland (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
