AITopics | Performance Analysis


Performance Analysis


Tuning Parameters for Boosting/Bagging/Random Forest • /r/MachineLearning

@machinelearnbot Apr-17-2016, 21:05:11 GMT

Random forests usually perform quite well with the default settings: bootstrap resampling, unpruned trees, as many trees as you can train in a reasonable amount of time, and sqrt(#features) candidate features tried per split (the mtry parameter). You can then try to optimize these choices by checking the results on out-of-bag data (the samples each tree didn't train on because of the resampling scheme). If you have very unbalanced classes, you should decide on a measure of interest (such as the true positive rate) and try to tune the related parameter. Out-of-bag estimates can be trusted almost as much as a proper cross-validation if you use enough trees and bootstrap resampling.
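To make the out-of-bag idea concrete, here is a minimal sketch in plain Python. It is not tied to any random forest library; the helper names `bootstrap_split` and `default_mtry` are made up for illustration. It draws one bootstrap sample, collects the indices that were never drawn (the out-of-bag set a tree could be evaluated on), and computes the sqrt(#features) default mentioned above.

```python
import math
import random

def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample; return (in_bag, out_of_bag) index lists."""
    # Sample n indices with replacement: this is what a single tree trains on.
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    seen = set(in_bag)
    # Indices never drawn are "out of bag" for this tree.
    out_of_bag = [i for i in range(n_samples) if i not in seen]
    return in_bag, out_of_bag

def default_mtry(n_features):
    """Hypothetical helper: about sqrt(#features) candidates per split."""
    return max(1, math.isqrt(n_features))

rng = random.Random(0)
n = 10_000
_, oob = bootstrap_split(n, rng)
oob_fraction = len(oob) / n  # approaches 1/e (about 0.368) for large n
print(f"out-of-bag fraction: {oob_fraction:.3f}")
print("mtry for 100 features:", default_mtry(100))
```

Since each tree leaves out roughly a third of the samples, every sample is out of bag for many trees once the forest is large, which is why the out-of-bag estimate behaves much like a held-out estimate.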

artificial intelligence, decision tree learning, tuning parameter, (4 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)


Simple one-pass algorithm for penalized linear regression with cross-validation on MapReduce

Yang, Kun

arXiv.org Machine Learning Apr-13-2016

In this paper, we propose a one-pass algorithm on MapReduce for penalized linear regression \[f_\lambda(\alpha, \beta) = \|Y - \alpha\mathbf{1} - X\beta\|_2^2 + p_{\lambda}(\beta)\] where $\alpha$ is the intercept, which can be omitted depending on the application; $\beta$ is the coefficient vector; and $p_{\lambda}$ is the penalty function with penalty parameter $\lambda$. $f_\lambda(\alpha, \beta)$ includes interesting classes such as Lasso, Ridge regression and Elastic-net. Compared to the latest iterative distributed algorithms, which require multiple MapReduce jobs, our algorithm achieves a huge performance improvement; moreover, our algorithm is exact, in contrast to approximate algorithms such as parallel stochastic gradient descent. What further distinguishes our algorithm from others is that it trains the model with cross-validation to choose the optimal $\lambda$ instead of relying on a user-specified one. Key words: penalized linear regression, lasso, elastic-net, ridge, MapReduce
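The abstract does not spell out the algorithm, so the following is only a sketch of one well-known way a single pass can suffice, not the paper's method. For the squared-error loss the data enter only through the sums $X^\top X$ and $X^\top Y$, and these sums add across data chunks. The sketch covers the ridge case without an intercept, where the solution has the closed form $(X^\top X + \lambda I)^{-1} X^\top Y$. All function names and the toy data are assumptions made for illustration.

```python
def chunk_stats(rows):
    """Map step: accumulate (X^T X, X^T y) over one chunk of (x, y) rows."""
    p = len(rows[0][0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, y in rows:
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty

def merge_stats(a, b):
    """Reduce step: statistics of disjoint chunks simply add."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def ridge_from_stats(stats, lam):
    """Ridge coefficients (X^T X + lam * I)^{-1} X^T y, no intercept."""
    xtx, xty = stats
    a = [[v + (lam if i == j else 0.0) for j, v in enumerate(row)]
         for i, row in enumerate(xtx)]
    return solve(a, xty)

# Toy data generated exactly by y = 2*x0 - 1*x1, split into two chunks.
chunk_a = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
chunk_b = [([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
stats = merge_stats(chunk_stats(chunk_a), chunk_stats(chunk_b))

beta_ols = ridge_from_stats(stats, lam=0.0)    # recovers the generating weights
beta_ridge = ridge_from_stats(stats, lam=1.0)  # shrunk towards zero
print(beta_ols, beta_ridge)
```

Once the statistics are merged, trying another $\lambda$ costs only a small linear solve and no further pass over the data. Because the statistics add, the statistics of a held-out fold could in principle be subtracted from the total to obtain each training fold, which is one plausible route to cross-validation in the same pass.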

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Machine Learning

1307.0048

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.64)


[Question] help in Ridge regression • /r/MachineLearning

@machinelearnbot Apr-12-2016, 03:15:13 GMT

This is why Ridge regression is a linear model: the model is a linear combination of its variables/weights.
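For reference, the standard textbook form of ridge regression is given below. The notation ($\beta_0$ for the intercept, $\beta_j$ for the weights, $\lambda$ for the penalty strength) is an assumption here, not the poster's. The penalty changes how the weights are estimated, but the prediction remains a weighted sum of the inputs.

```latex
\hat{y}(x) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j,
\qquad
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\, \beta}
  \; \|y - \beta_0 \mathbf{1} - X\beta\|_2^2 + \lambda \|\beta\|_2^2
```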

artificial intelligence, machine learning, machinelearning

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.80)


Identifying Contributing Factors of Occupant Thermal Discomfort in a Smart Building

Basak, Aniruddha (Carnegie Mellon University, Silicon Valley Campus) | Mengshoel, Ole (Carnegie Mellon University, Silicon Valley Campus) | Hosein, Stefan (University of the West Indies, St. Augustine) | Martin, Rodney (NASA Ames Research Center) | Jayakumaran, Jayasudha (Carnegie Mellon University, Silicon Valley Campus) | Morga, Mario Gurrola (Zapopan's Superior Institute of Technology) | Aghav, Ishwari (Carnegie Mellon University, Silicon Valley Campus)

AAAI Conferences Apr-12-2016

Modeling occupant behavior in smart buildings to reduce energy usage in a more accurate fashion has garnered much recent attention in the literature. Predicting occupant comfort in buildings is a related and challenging problem. In some smart buildings, such as NASA Ames Sustainability Base, there are discrepancies between occupants' actual thermal discomfort and sensors based upon a weighted average of wet bulb, dry bulb, and mean radiant temperature intended to characterize thermal comfort. In this paper we attempt to find other contributing factors to occupant discomfort. For our experiment we use a dataset from a Building Automation System (BAS) at NASA Sustainability Base. We choose one conference room for our experiment and empirically establish the thermal discomfort level for the room's temperature sensor. We use various causality metrics and causal graphs to isolate candidate causes of the target room temperature, and we compare these feature sets according to their capability to predict future instances of discomfort. Moreover, we establish a trade-off between computational and statistical performance of adverse event prediction.

algorithm, artificial intelligence, machine learning, (19 more...)

AAAI Conferences

Workshops at the Thirtieth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
North America > United States > California (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Industry:

Government > Space Agency (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Construction & Engineering > HVAC (0.94)
Information Technology > Smart Houses & Appliances (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Internet of Things (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)


A Novel Method for Mining Semantics from Patterns over ECG Data

Qiu, Zhen (Peking University) | Li, Feifei (Peking University) | Hong, Shenda (Peking University) | Li, Hongyan (Peking University)

AAAI Conferences Apr-12-2016

In intensive care units (ICU), electrocardiogram (ECG) waveforms show diverse variations under different patients' physical conditions. In general, physicians can diagnose patients efficiently by detecting any disorder of heart rate or rhythm and any change in the morphological pattern of ECG data, which contain underlying semantics. To help physicians better analyze ECG data in a fairly short time, it is essential to develop a novel method for mining semantics from ECG patterns. This paper is the first to characterize ECG patterns using a Prefix Scalable Pattern Tree (PSP-Tree). Compared with similar existing methods, PSP-Tree can mine significant semantics, such as scalability, temporality and hierarchy, over ECG patterns. We conduct extensive experiments on real ECG data sets obtained from PhysioBank Community and Beijing No.3 People Hospital. The experimental results show that our method performs more feasibly and effectively than other related work.

machine learning, pattern recognition, scalable pattern, (18 more...)

AAAI Conferences

Workshops at the Thirtieth AAAI Conference on Artificial Intelligence

Country:

Asia > China > Beijing > Beijing (0.25)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.69)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.95)
Information Technology > Data Science (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)


Adaptive Ensemble Learning with Confidence Bounds for Personalized Diagnosis

Tekin, Cem (Bilkent University) | Yoon, Jinsung (University of California, Los Angeles) | Schaar, Mihaela van der (University of California, Los Angeles)

AAAI Conferences Apr-12-2016

With the advances in the field of medical informatics, automated clinical decision support systems are becoming the de facto standard in personalized diagnosis. In order to establish high accuracy and confidence in personalized diagnosis, massive amounts of distributed, heterogeneous, correlated and high-dimensional patient data from different sources such as wearable sensors, mobile applications, Electronic Health Record (EHR) databases, etc. need to be processed. This requires learning both locally and globally due to privacy constraints and/or the distributed nature of the multi-modal medical data. In the last decade, a large number of meta-learning techniques have been proposed in which local learners make online predictions based on their locally collected data instances and feed these predictions to an ensemble learner, which fuses them and issues a global prediction. However, most of these works do not provide performance guarantees or, when they do, these guarantees are asymptotic. None of these existing works provides confidence estimates about the issued predictions or rate-of-learning guarantees for the ensemble learner. In this paper, we provide a systematic ensemble learning method called Hedged Bandits, which comes with both long-run (asymptotic) and short-run (rate of learning) performance guarantees. Moreover, we show that our proposed method outperforms all existing ensemble learning techniques, even in the presence of concept drift.

algorithm, prediction, prediction rule, (16 more...)

AAAI Conferences

Workshops at the Thirtieth AAAI Conference on Artificial Intelligence

Industry:

Health & Medicine > Therapeutic Area (0.69)
Health & Medicine > Health Care Technology > Medical Record (0.54)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)


Predicting 30-Day Risk and Cost of "All-Cause" Hospital Readmissions

Sushmita, Shanu (University of Washington, Tacoma) | Khulbe, Garima (University of Washington, Tacoma) | Hasan, Aftab (University of Washington, Tacoma) | Newman, Stacey (University of Washington, Tacoma) | Ravindra, Padmashree (University of Washington, Tacoma) | Roy, Senjuti Basu (University of Washington, Tacoma) | Cock, Martine De (University of Washington, Tacoma) | Teredesai, Ankur (University of Washington, Tacoma)

AAAI Conferences Apr-12-2016

The hospital readmission rate of patients within 30 days after discharge is broadly accepted as a healthcare quality measure and cost driver in the United States. The ability to estimate hospitalization costs alongside 30-day risk stratification for such readmissions provides additional benefit for accountable care, now a global issue and the foundation for the U.S. government mandate under the Affordable Care Act. Recent data mining efforts predict either healthcare costs or the risk of hospital readmission, but not both. In this paper we present a dual predictive modeling effort that utilizes healthcare data to predict the risk and cost of any hospital readmission ("all-cause"). For this purpose, we explore machine learning algorithms to accurately predict healthcare costs and the risk of 30-day readmission. Results on risk prediction for "all-cause" readmission compared to the standardized readmission tool (LACE) are promising, and the proposed techniques for cost prediction consistently outperform baseline models and demonstrate substantially lower mean absolute error (MAE).

hospital readmission, prediction, readmission, (15 more...)

AAAI Conferences

Workshops at the Thirtieth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Washington > Pierce County > Tacoma (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.69)

Industry:

Health & Medicine > Government Relations & Public Policy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Health & Medicine > Health Care Providers & Services > Reimbursement (0.89)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.97)


Automatic Label Correction and Appliance Prioritization in Single Household Electricity Disaggregation

Valovage, Mark (University of Minnesota) | Gini, Maria (University of Minnesota)

AAAI Conferences Apr-12-2016

Electricity disaggregation focuses on classification of individual appliances by monitoring aggregate electrical signals. In this paper we present a novel algorithm to automatically correct labels, discard contaminated training samples, and boost signal-to-noise ratio through high-frequency noise reduction. We also propose a method for prioritized classification which classifies appliances with the most intense signals first. When tested on four houses in Kaggle's Belkin dataset, these methods automatically relabel over 77% of all training samples and decrease error rate by an average of 45% in both real power and high-frequency noise classification.

appliance, classification, decision tree, (15 more...)

AAAI Conferences

Workshops at the Thirtieth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > California > San Diego County > San Diego (0.04)

Industry: Energy > Power Industry (0.93)

Technology:

Information Technology > Data Science (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


Active Perception for Cyber Intrusion Detection and Defense

Benton, J. (Smart Information Flow Technologies, LLC) | Goldman, Robert P. (Smart Information Flow Technologies, LLC) | Burstein, Mark (Smart Information Flow Technologies, LLC) | Mueller, Joseph (Smart Information Flow Technologies, LLC) | Robertson, Paul (DOLL Labs) | Cerys, Dan (DOLL Labs) | Hoffman, Andreas (DOLL Labs) | Bobrow, Rusty (Bobrow Computational Intelligence, LLC)

AAAI Conferences Apr-12-2016

Most modern network-based intrusion detection systems (IDSs) passively monitor network traffic to identify possible attacks through known vectors. Though useful, this approach has widely known high false positive rates, often causing administrators to suffer from a "cry wolf effect," where they ignore all warnings because so many have been false. In this paper, we focus on a method to reduce this effect using an idea borrowed from computer vision and neuroscience called active perception. Our approach is informed by theoretical ideas from decision theory and recent research results in neuroscience. The active perception agent allocates computational and sensing resources to (approximately) optimize its Value of Information. To do this, it draws on models to direct sensors towards phenomena of greatest interest to inform decisions about cyber defense actions. By identifying critical network assets, the organization's mission measures self-interest (and value of information). This model enables the system to follow leads from inexpensive, inaccurate alerts with targeted use of expensive, accurate sensors. This allows the deployment of sensors to build structured interpretations of situations. From these, an organization can meet mission-centered decision-making requirements with calibrated responses proportional to the likelihood of true detection and degree of threat.

hypothesis, information, sensor, (15 more...)

AAAI Conferences

Workshops at the Thirtieth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Santa Clara County > Los Altos (0.04)
North America > United States > Oregon > Benton County > Corvallis (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.69)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(2 more...)


Effect of Part-of-Speech and Lemmatization Filtering in Email Classification for Automatic Reply

Bonatti, Rogerio (Universidade de Sao Paulo) | Paula, Arthur G. de (Universidade de Sao Paulo) | Lamarca, Victor S. (Universidade de Sao Paulo) | Cozman, Fabio G. (Universidade de Sao Paulo)

AAAI Conferences Apr-12-2016

We study automatic replies to business email messages in Brazilian Portuguese. We present a novel corpus containing messages from a real application, and baseline categorization experiments using Naive Bayes and Support Vector Machines. We then discuss the effect of lemmatization and the role of part-of-speech tagging filtering on precision and recall. Support Vector Machines classification coupled with non-lemmatized selection of verbs, nouns, adjectives and adverbs was the best approach, with 87.3% maximum accuracy. Straightforward lemmatization in Portuguese led to the lowest classification results in the group, with 85.3% and 81.7% precision in SVM and Naive Bayes respectively. Thus, while lemmatization reduced precision and recall, part-of-speech filtering improved overall results.

artificial intelligence, classification, machine learning, (15 more...)

AAAI Conferences

Workshops at the Thirtieth AAAI Conference on Artificial Intelligence

Country:

South America > Brazil > São Paulo (0.06)
South America > Brazil > Rio Grande do Sul (0.04)
South America > Brazil > Minas Gerais (0.04)
(2 more...)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.56)
