AITopics

2003.09737

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Oceania > Australia > Tasmania (0.04)
(10 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.53)

#artificialintelligenceMar-20-2020, 19:07:23 GMT

Understanding Voting Outcomes through Data Science

After the surprising results of the 2016 presidential election, I wanted to better understand the socio-economic and cultural factors that played a role in voting behavior. With the election results in the books, I thought it would be fun to reverse-engineer a predictive model of voting behavior based on some of the widely available county-level data sets. For example, if you want to answer the question "how could the election have been different if the percentage of people with at least a bachelor's degree had been 2% higher nationwide?" you can simply toggle that parameter up to 1.02 and click "Submit" to find out. The predictions are driven by a random forest classification model that has been tuned and trained on 71 distinct county-level attributes. Using real data, the model has a predictive accuracy of 94.6% and an ROC AUC score of 96%.

decision tree, presidential election, split point, (15 more...)

#artificialintelligence

Country:

North America > United States > Wisconsin (0.05)
North America > United States > Ohio (0.05)
North America > United States > Michigan (0.05)

Industry: Government > Voting & Elections (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)

Sohrab, Fahad, Raitoharju, Jenni, Iosifidis, Alexandros, Gabbouj, Moncef

Ellipsoidal Subspace Support Vector Data Description

arXiv.org Artificial IntelligenceMar-20-2020

In this paper, we propose a novel method for transforming data into a low-dimensional space optimized for one-class classification. The proposed method iteratively transforms data into a new subspace optimized for ellipsoidal encapsulation of target class data. We provide both linear and non-linear formulations for the proposed method. The method takes into account the covariance of the data in the subspace; hence, it yields a more generalized solution as compared to Subspace Support Vector Data Description for a hypersphere. We propose different regularization terms expressing the class variance in the projected space. We compare the results with classic and recently proposed one-class classification methods and achieve better results in the majority of cases. The proposed method is also noticed to converge much faster than recently proposed Subspace Support Vector Data Description.

classification, data description, regularization term, (10 more...)

arXiv.org Artificial Intelligence

2003.09504

Country:

Europe > Finland > Pirkanmaa > Tampere (0.04)
Europe > Finland > Central Finland > Jyväskylä (0.04)
Europe > Denmark > Central Jutland > Aarhus (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

arXiv.org Machine LearningMar-20-2020

Prob2Vec: Mathematical Semantic Embedding for Problem Retrieval in Adaptive Tutoring

Su, Du, Yekkehkhany, Ali, Lu, Yi, Lu, Wenmiao

We propose a new application of embedding techniques for problem retrieval in adaptive tutoring. The objective is to retrieve problems whose mathematical concepts are similar. There are two challenges: First, like sentences, problems helpful to tutoring are never exactly the same in terms of the underlying concepts. Instead, good problems mix concepts in innovative ways, while still displaying continuity in their relationships. Second, it is difficult for humans to determine a similarity score that is consistent across a large enough training set. We propose a hierarchical problem embedding algorithm, called Prob2Vec, that consists of abstraction and embedding steps. Prob2Vec achieves 96.88\% accuracy on a problem similarity test, in contrast to 75\% from directly applying state-of-the-art sentence embedding methods. It is interesting that Prob2Vec is able to distinguish very fine-grained differences among problems, an ability humans need time and effort to acquire. In addition, the sub-problem of concept labeling with imbalanced training data set is interesting in its own right. It is a multi-label problem suffering from dimensionality explosion, which we propose ways to ameliorate. We propose the novel negative pre-training algorithm that dramatically reduces false negative and positive ratios for classification, using an imbalanced training data set.

neural network, representation, similarity, (14 more...)

2003.10838

Country: North America > United States > Illinois (0.04)

Genre:

Instructional Material > Course Syllabus & Notes (0.68)
Research Report (0.64)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Barstugan, Mucahid, Ozkaya, Umut, Ozturk, Saban

Coronavirus (COVID-19) Classification using CT Images by Machine Learning Methods

arXiv.org Machine LearningMar-20-2020

This study presents early phase detection of Coronavirus (COVID-19), which is named by World Health Organization (WHO), by machine learning methods. The detection process was implemented on abdominal Computed Tomography (CT) images. The expert radiologists detected from CT images that COVID-19 shows different behaviours from other viral pneumonia. Therefore, the clinical experts specify that COV\.ID-19 virus needs to be diagnosed in early phase. For detection of the COVID-19, four different datasets were formed by taking patches sized as 16x16, 32x32, 48x48, 64x64 from 150 CT images. The feature extraction process was applied to patches to increase the classification performance. Grey Level Co-occurrence Matrix (GLCM), Local Directional Pattern (LDP), Grey Level Run Length Matrix (GLRLM), Grey-Level Size Zone Matrix (GLSZM), and Discrete Wavelet Transform (DWT) algorithms were used as feature extraction methods. Support Vector Machines (SVM) classified the extracted features. 2-fold, 5-fold and 10-fold cross-validations were implemented during the classification process. Sensitivity, specificity, accuracy, precision, and F-score metrics were used to evaluate the classification performance. The best classification accuracy was obtained as 99.68% with 10-fold cross-validation and GLSZM feature extraction method.

classification result, ct image, extraction method, (15 more...)

2003.09424

Country:

Asia > China > Hubei Province > Wuhan (0.05)
Asia > Middle East > Republic of Türkiye > Konya Province > Konya (0.04)
Asia > Middle East > Republic of Türkiye > Amasya Province > Amasya (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)

Srinivasa, Rakshith S, Davenport, Mark A, Romberg, Justin

Localized sketching for matrix multiplication and ridge regression

arXiv.org Machine LearningMar-20-2020

We consider sketched approximate matrix multiplication and ridge regression in the novel setting of localized sketching, where at any given point, only part of the data matrix is available. This corresponds to a block diagonal structure on the sketching matrix. We show that, under mild conditions, block diagonal sketching matrices require only O(stable rank / \epsilon^2) and $O( stat. dim. \epsilon)$ total sample complexity for matrix multiplication and ridge regression, respectively. This matches the state-of-the-art bounds that are obtained using global sketching matrices. The localized nature of sketching considered allows for different parts of the data matrix to be sketched independently and hence is more amenable to computation in distributed and streaming settings and results in a smaller memory and computational footprint.

complexity, matrix, sample complexity, (12 more...)

2003.09097

Country:

Oceania > Australia (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.83)

Jun, Kwang-Sung, Cutkosky, Ashok, Orabona, Francesco

Kernel Truncated Randomized Ridge Regression: Optimal Rates and Low Noise Acceleration

Neural Information Processing SystemsMar-19-2020, 03:02:40 GMT

In this paper we consider the nonparametric least square regression in a Reproducing Kernel Hilbert Space (RKHS). We propose a new randomized algorithm that has optimal generalization error bounds with respect to the square loss, closing a long-standing gap between upper and lower bounds. Moreover, we show that our algorithm has faster finite-time and asymptotic rates on problems where the Bayes risk with respect to the square loss is small. We state our results using standard tools from the theory of least square regression in RKHSs, namely, the decay of the eigenvalues of the associated integral operator and the complexity of the optimal predictor measured through the integral operator. Papers published at the Neural Information Processing Systems Conference.

kernel truncated randomized ridge regression, optimal rate, rate and low noise acceleration, (3 more...)

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)

Jain, Lalit, Jamieson, Kevin G.

A New Perspective on Pool-Based Active Classification and False-Discovery Control

Neural Information Processing SystemsMar-19-2020, 02:30:34 GMT

In many scientific settings there is a need for adaptive experimental design to guide the process of identifying regions of the search space that contain as many true positives as possible subject to a low rate of false discoveries (i.e. Such regions of the search space could differ drastically from a predicted set that minimizes 0/1 error and accurate identification could require very different sampling strategies. Like active learning for binary classification, this experimental design cannot be optimally chosen a priori, but rather the data must be taken sequentially and adaptively in a closed loop. However, unlike classification with 0/1 error, collecting data adaptively to find a set with high true positive rate and low false discovery rate (FDR) is not as well understood. In this paper, we provide the first provably sample efficient adaptive algorithm for this problem.

active classification and false-discovery control, new perspective, pool-based active classification, (2 more...)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Chzhen, Evgenii, Denis, Christophe, Hebiri, Mohamed, Oneto, Luca, Pontil, Massimiliano

Leveraging Labeled and Unlabeled Data for Consistent Fair Binary Classification

Neural Information Processing SystemsMar-19-2020, 01:48:12 GMT

We study the problem of fair binary classification using the notion of Equal Opportunity. It requires the true positive rate to distribute equally across the sensitive groups. Within this setting we show that the fair optimal classifier is obtained by recalibrating the Bayes classifier by a group-dependent threshold. We provide a constructive expression for the threshold. This result motivates us to devise a plug-in classification procedure based on both unlabeled and labeled datasets.

classifier, consistent fair binary classification, leveraging labeled and unlabeled data, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Ravuri, Suman, Vinyals, Oriol

Classification Accuracy Score for Conditional Generative Models

Neural Information Processing SystemsMar-19-2020, 01:33:23 GMT

Deep generative models (DGMs) of images are now sufficiently mature that they produce nearly photorealistic samples and obtain scores similar to the data distribution on heuristics such as Frechet Inception Distance (FID). These results, especially on large-scale datasets such as ImageNet, suggest that DGMs are learning the data distribution in a perceptually meaningful space and can be used in downstream tasks. To test this latter hypothesis, we use class-conditional generative models from a number of model classes--variational autoencoders, autoregressive models, and generative adversarial networks (GANs)--to infer the class labels of real data. We perform this inference by training an image classifier using only synthetic data and using the classifier to predict labels on real data. The performance on this task, which we call Classification Accuracy Score (CAS), reveals some surprising results not identified by traditional metrics and constitute our contributions.

classification accuracy score, conditional generative model, generative model, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.63)