AITopics | Performance Analysis

Addressing Missing Sources with Adversarial Support-Matching

Kehrenberg, Thomas, Bartlett, Myles, Sharmanska, Viktoriia, Quadrianto, Novi

arXiv.org Machine Learning, Mar-24-2022

When trained on diverse labeled data, machine learning models have proven themselves to be a powerful tool in all facets of society. However, due to budget limitations, deliberate or non-deliberate censorship, and other problems during data collection and curation, the labeled training set might exhibit a systematic shortage of data for certain groups. We investigate a scenario in which the absence of certain data is linked to the second level of a two-level hierarchy in the data. Inspired by the idea of protected groups from algorithmic fairness, we refer to the partitions carved by this second level as "subgroups"; we refer to combinations of subgroups and classes, or leaves of the hierarchy, as "sources". To characterize the problem, we introduce the concept of classes with incomplete subgroup support. The representational bias in the training set can give rise to spurious correlations between the classes and the subgroups which render standard classification models ungeneralizable to unseen sources. To overcome this bias, we make use of an additional, diverse but unlabeled dataset, called the "deployment set", to learn a representation that is invariant to subgroup. This is done by adversarially matching the support of the training and deployment sets in representation space. In order to learn the desired invariance, it is paramount that the sets of samples observed by the discriminator are balanced by class; this is easily achieved for the training set, but requires using semi-supervised clustering for the deployment set. We demonstrate the effectiveness of our method with experiments on several datasets and variants of the problem.

artificial intelligence, clustering, machine learning, (16 more...)

arXiv.org Machine Learning

2203.13154

Country:

Europe > United Kingdom > England > East Sussex > Brighton (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

All About Logistic Regression

#artificialintelligence, Mar-22-2022, 12:12:57 GMT

Originally published on Towards AI. Logistic Regression is a supervised machine learning algorithm used in classification problems, where the independent variables are used to assign the dependent variable to one of two or more categories or classes.
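As a rough illustration of the binary case described in this summary, the sketch below fits a logistic regression by plain batch gradient descent on the log-loss. It is not taken from the article: the function names (`fit_logistic`, `predict_proba`), the learning rate, the epoch count, and the toy data are all assumptions chosen for readability.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    # Hypothetical helper: batch gradient descent on the mean log-loss.
    n = len(X)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    # Estimated probability that x belongs to class 1.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data (assumed): one independent variable, two classes.
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
```

Calling `predict_proba(w, b, [0.5])` and `predict_proba(w, b, [4.0])` gives the estimated class-1 probabilities for a low and a high input; thresholding at 0.5 turns them into class labels.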

classification model, independent variable, logistic regression, (11 more...)

#artificialintelligence

Genre:

Research Report > New Finding (0.79)
Research Report > Experimental Study (0.79)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.92)

Sequential algorithmic modification with test data reuse

Feng, Jean, Pennello, Gene, Petrick, Nicholas, Sahiner, Berkman, Pirracchio, Romain, Gossmann, Alexej

arXiv.org Machine Learning, Mar-21-2022

After initial release of a machine learning algorithm, the model can be fine-tuned by retraining on subsequently gathered data, adding newly discovered features, or more. Each modification introduces a risk of deteriorating performance and must be validated on a test dataset. It may not always be practical to assemble a new dataset for testing each modification, especially when most modifications are minor or are implemented in rapid succession. Recent works have shown how one can repeatedly test modifications on the same dataset and protect against overfitting by (i) discretizing test results along a grid and (ii) applying a Bonferroni correction to adjust for the total number of modifications considered by an adaptive developer. However, the standard Bonferroni correction is overly conservative when most modifications are beneficial and/or highly correlated. This work investigates more powerful approaches using alpha-recycling and sequentially-rejective graphical procedures (SRGPs). We introduce novel extensions that account for correlation between adaptively chosen algorithmic modifications. In empirical analyses, the SRGPs control the error rate of approving unacceptable modifications and approve a substantially higher number of beneficial modifications than previous approaches.
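To make the conservativeness of the Bonferroni correction concrete, here is a minimal, non-adaptive sketch comparing it with Holm's step-down procedure, a classical sequentially rejective method. This is not the paper's procedure: it assumes a fixed list of p-values, one per modification, where a small p-value supports approving that modification, and it ignores both the grid discretization and the adaptive choice of modifications. The function names and the example p-values are hypothetical.

```python
def bonferroni_approve(p_values, alpha=0.05):
    # Approve a modification only if its p-value clears alpha / K,
    # where K is the total number of modifications considered.
    k = len(p_values)
    return [p <= alpha / k for p in p_values]


def holm_approve(p_values, alpha=0.05):
    # Holm's step-down procedure: test p-values from smallest to largest,
    # relaxing the threshold after each approval and stopping at the
    # first failure.
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    approved = [False] * k
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (k - rank):
            approved[i] = True
        else:
            break
    return approved


# Hypothetical p-values for four proposed modifications.
p_values = [0.001, 0.013, 0.020, 0.30]
print(bonferroni_approve(p_values))  # [True, False, False, False]
print(holm_approve(p_values))        # [True, True, True, False]
```

Both procedures control the family-wise error rate at `alpha`, but Holm reuses the significance budget freed by each approval, which is the same intuition behind alpha-recycling.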

hypothesis, modification, procedure, (13 more...)

arXiv.org Machine Learning

2203.11377

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > Experimental Study (0.68)

Industry:

Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government > FDA (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)

Benchmarking emergency department triage prediction models with machine learning and large public electronic health records

Xie, Feng, Zhou, Jun, Lee, Jin Wee, Tan, Mingrui, Li, Siqi, Rajnthern, Logasan S/O, Chee, Marcel Lucas, Chakraborty, Bibhas, Wong, An-Kwok Ian, Dagan, Alon, Ong, Marcus Eng Hock, Gao, Fei, Liu, Nan

arXiv.org Artificial Intelligence, Mar-20-2022

The demand for emergency department (ED) services is increasing across the globe, particularly during the current COVID-19 pandemic. Clinical triage and risk assessment have become increasingly challenging due to the shortage of medical resources and the strain on hospital infrastructure caused by the pandemic. As a result of the widespread use of electronic health records (EHRs), we now have access to a vast amount of clinical data, which allows us to develop predictive models and decision support systems to address these challenges. To date, however, there are no widely accepted benchmark ED triage prediction models based on large-scale public EHR data. An open-source benchmarking platform would streamline research workflows by eliminating cumbersome data preprocessing, and facilitate comparisons among different studies and methodologies. In this paper, based on the Medical Information Mart for Intensive Care IV Emergency Department (MIMIC-IV-ED) database, we developed a publicly available benchmark suite for ED triage predictive models and created a benchmark dataset that contains over 400,000 ED visits from 2011 to 2019. We introduced three ED-based outcomes (hospitalization, critical outcomes, and 72-hour ED reattendance) and implemented a variety of popular methodologies, ranging from machine learning methods to clinical scoring systems. We evaluated and compared the performance of these methods on the benchmark tasks. Our code is open-source, allowing anyone with MIMIC-IV-ED data access to perform the same steps in data processing, benchmark model building, and experiments. This study provides future researchers with insights, suggestions, and protocols for managing raw data and developing risk triaging tools for emergency care.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1038/s41597-022-01782-9

2111.11017

Country:

Europe (0.67)
North America > United States > Massachusetts (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

How I'm using Machine Learning to Trade in the Stock Market

#artificialintelligence, Mar-17-2022, 06:41:21 GMT

Disclaimer: This article is about a simple strategy that I have used to create a trading bot. While back-testing shows that the trading bot is profitable, the trading bot is not capable of handling "black swan" events such as market crashes. Also I am not a financial advisor nor a professional trader. I am simply sharing this for entertainment purposes. So trade & read at your own risk.

buying point, lr model, threshold, (13 more...)

#artificialintelligence

Genre: Research Report > New Finding (0.30)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)

Machine Learning for Encrypted Malicious Traffic Detection: Approaches, Datasets and Comparative Study

Wang, Zihao, Fok, Kar-Wai, Thing, Vrizlynn L. L.

arXiv.org Artificial Intelligence, Mar-17-2022

As people's demand for personal privacy and data security becomes a priority, encrypted traffic has become mainstream in the cyber world. However, traffic encryption also shields malicious and illegal traffic introduced by adversaries from being detected. This is especially so in the post-COVID-19 environment, where malicious traffic encryption is growing rapidly. Common security solutions that rely on plain payload content analysis, such as deep packet inspection, are rendered useless. Thus, machine learning based approaches have become an important direction for encrypted malicious traffic detection. In this paper, we formulate a universal framework of machine learning based encrypted malicious traffic detection techniques and provide a systematic review. Furthermore, current research adopts different datasets to train models due to the lack of well-recognized datasets and feature sets. As a result, model performance cannot be compared and analyzed reliably. Therefore, in this paper, we analyse, process and combine datasets from 5 different sources to generate a comprehensive and fair dataset to aid future research in this field. On this basis, we also implement and compare 10 encrypted malicious traffic detection algorithms. We then discuss challenges and propose future directions of research.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.cose.2021.102542

2203.09332

Country:

North America > United States (0.14)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > Canada > New Brunswick > Fredericton (0.04)
(4 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.34)
Health & Medicine > Therapeutic Area > Immunology (0.34)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

GAM(L)A: An econometric model for interpretable Machine Learning

Flachaire, Emmanuel, Hacheme, Gilles, Hué, Sullivan, Laurent, Sébastien

arXiv.org Machine Learning, Mar-17-2022

Despite their high predictive performance, random forest and gradient boosting are often considered black boxes or uninterpretable models, which has raised concerns from practitioners and regulators. As an alternative, we propose in this paper to use partial linear models that are inherently interpretable. Specifically, this article introduces GAM-lasso (GAMLA) and GAM-autometrics (GAMA), denoted as GAM(L)A in short. GAM(L)A combines parametric and non-parametric functions to accurately capture linearities and non-linearities prevailing between dependent and explanatory variables, and a variable selection procedure to control for overfitting issues. Estimation relies on a two-step procedure building upon the double residual method. We illustrate the predictive performance and interpretability of GAM(L)A on a regression and a classification problem. The results show that GAM(L)A outperforms parametric models augmented by quadratic, cubic and interaction effects. Moreover, the results also suggest that the performance of GAM(L)A is not significantly different from that of random forest and gradient boosting.

algorithm, gamla, predictive performance, (15 more...)

arXiv.org Machine Learning

2203.11691

Country:

North America > United States (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Banking & Finance > Credit (0.47)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Learning Distributionally Robust Models at Scale via Composite Optimization

Haddadpour, Farzin, Kamani, Mohammad Mahdi, Mahdavi, Mehrdad, Karbasi, Amin

arXiv.org Machine Learning, Mar-17-2022

To train machine learning models that are robust to distribution shifts in the data, distributionally robust optimization (DRO) has been proven very effective. However, the existing approaches to learning a distributionally robust model either require solving complex optimization problems such as semidefinite programming or a first-order method whose convergence scales linearly with the number of data samples -- which hinders their scalability to large datasets. In this paper, we show how different variants of DRO are simply instances of a finite-sum composite optimization for which we provide scalable methods. We also provide empirical results that demonstrate the effectiveness of our proposed algorithm with respect to the prior art in order to learn robust models from very large datasets.

algorithm, optimization, optimization problem, (14 more...)

arXiv.org Machine Learning

2203.09607

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

'False Positive': These Exoplanets Are Actually Stars

International Business Times, Mar-16-2022, 02:55:06 GMT

Thousands of exoplanets have been discovered, but researchers have now learned that several of them aren't actually planets. Using updated measurement methods, they found that the objects are actually stars. Exoplanets are planets outside our solar system, whether free-floating or orbiting a star. Thousands of them have been discovered since they were first spotted in the 1990s. So far, almost 5,000 exoplanets have been confirmed, while 5,000 more are planetary candidates, meaning objects that may be planets but haven't been confirmed, the Massachusetts Institute of Technology (MIT) noted.

exoplanet, false positive, planet, (7 more...)

International Business Times

Country: North America > United States > Massachusetts (0.26)

Genre: Research Report > New Finding (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)

A Continual Learning Framework for Adaptive Defect Classification and Inspection

Sun, Wenbo, Kontar, Raed Al, Jin, Judy, Chang, Tzyy-Shuh

arXiv.org Artificial Intelligence, Mar-16-2022

Recent development of advanced sensing and high computing technologies has enabled the wide adoption of machine vision to automatically inspect products' dimensional quality for efficient process control and reduced manual inspection cost. The process control procedure requires effective data analysis methods to provide reliable inspection results. In this paper, we consider a high-volume manufacturing system that uses machine vision at the quality inspection station for automatic classification of product defects. Here, classification implies both identifying a defect and classifying its corresponding type. As a motivating example, we consider the scenario where batches of three-dimensional (3D) point cloud data are independently collected from a manufacturing process. The 3D point cloud data is obtained by measuring the 3D location of points on the product surface using a 3D scanner. The location measurements can then be used for fast classification of surface defects, and thus provide timely feedback for process control. Figure 1 (right) shows some exemplar surface defects on a wood product and the corresponding 3D point cloud measurements. The 3D point cloud measurements have a set of defining characteristics that should be considered in the development of defect classification techniques.

artificial intelligence, defect type, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1080/00224065.2023.2224974

2203.08796

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)

Genre: Research Report (0.64)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
