AITopics

1912.05198

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
Asia > India > NCT > Delhi (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > India > NCT > New Delhi (0.04)

Genre: Research Report (0.82)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

#artificialintelligence (Dec-10-2019, 21:54:12 GMT)

An Introduction to Linear Regression Analysis

linear regression analysis

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

#artificialintelligence (Dec-10-2019, 03:44:17 GMT)

Learn classification algorithms using Python and scikit-learn

This tutorial is part of the Machine learning for developers learning path. In this tutorial, we describe the basics of solving a classification-based machine learning problem and give you a comparative study of some of the most popular current algorithms. In the open Notebook, click Run to run the cells one at a time. The rest of the tutorial follows the order of the Notebook. Classification is the task of predicting a feature whose values are categories (classes) rather than continuous quantities.
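As a rough illustration of the kind of comparison the tutorial describes, here is a minimal sketch that scores three common scikit-learn classifiers with 5-fold cross-validation. The Iris dataset, the choice of models, and the helper name `compare_classifiers` are assumptions for illustration; the tutorial's Notebook may use a different dataset and a different set of algorithms.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(cv: int = 5) -> dict:
    """Mean cross-validated accuracy of a few classifiers (hypothetical helper)."""
    X, y = load_iris(return_X_y=True)  # assumed example dataset with 3 classes
    models = {
        # Scaling sits inside the pipeline so it is refit on each training fold
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "k_nearest_neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "decision_tree": DecisionTreeClassifier(random_state=0),
    }
    return {name: float(cross_val_score(model, X, y, cv=cv).mean()) for name, model in models.items()}


if __name__ == "__main__":
    for name, score in compare_classifiers().items():
        print(f"{name}: {score:.3f}")
```

Wrapping the scaler in a pipeline means it is fitted only on each training fold, which keeps test-fold statistics from leaking into the model during the comparison.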

algorithm, classification algorithm, tutorial

Industry: Information Technology (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.30)

Parsa, Mohsen Ghassemi, Zare, Hadi, Ghatee, Mehdi

Unsupervised Feature Selection based on Adaptive Similarity Learning and Subspace Clustering

arXiv.org Machine Learning (Dec-10-2019)

Mohsen Ghassemi Parsa (a), Hadi Zare (a), Mehdi Ghatee (b). (a) Faculty of New Sciences and Technologies, University of Tehran, Iran; (b) Department of Computer Science, Amirkabir University of Technology, Iran.

Abstract: Feature selection methods play an important role in the readability of data and in reducing the complexity of learning algorithms. In recent years, a variety of efforts have investigated feature selection from an unsupervised viewpoint, because labeling large datasets is laborious. In this paper, we propose a novel approach to unsupervised feature selection, initiated from subspace clustering, that preserves the similarities among samples by learning representations of low-dimensional subspaces. A self-expressive model is employed to implicitly learn the cluster similarities in an adaptive manner. The proposed method not only maintains the sample similarities through subspace clustering but also captures discriminative information based on a regularized regression model. Along with a convergence analysis of the proposed method, experimental results on benchmark datasets demonstrate the effectiveness of our approach compared with state-of-the-art methods.
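The self-expressive idea in the abstract can be sketched under simplifying assumptions. This is not the paper's formulation: it uses a ridge-regularized self-representation, min_Z ||X - XZ||_F^2 + lam * ||Z||_F^2, whose closed form is Z = (X^T X + lam I)^(-1) X^T X, builds a symmetric sample affinity from |Z|, and then ranks features with the Laplacian Score, a separate, well-known unsupervised criterion swapped in here for illustration. The function names, the data, and lam are hypothetical.

```python
import numpy as np


def self_expressive_coefficients(X: np.ndarray, lam: float) -> np.ndarray:
    """Ridge self-representation: argmin_Z ||X - X Z||_F^2 + lam * ||Z||_F^2.

    X is a d x n matrix whose columns are samples; the result Z is n x n.
    """
    G = X.T @ X  # n x n Gram matrix of the samples
    n = G.shape[0]
    # Setting the gradient to zero gives (G + lam I) Z = G
    return np.linalg.solve(G + lam * np.eye(n), G)


def affinity_from_coefficients(Z: np.ndarray) -> np.ndarray:
    """Symmetric, nonnegative sample affinity with self-loops removed."""
    W = 0.5 * (np.abs(Z) + np.abs(Z.T))
    np.fill_diagonal(W, 0.0)
    return W


def laplacian_scores(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Laplacian Score per feature (row of X); smaller means the feature
    varies less between samples that W considers similar."""
    degrees = W.sum(axis=1)
    D = np.diag(degrees)
    L = D - W  # graph Laplacian, positive semidefinite for nonnegative symmetric W
    scores = np.empty(X.shape[0])
    for r, f in enumerate(X):
        # Subtract the degree-weighted mean of the feature
        f_tilde = f - (f @ degrees) / degrees.sum()
        scores[r] = (f_tilde @ L @ f_tilde) / (f_tilde @ D @ f_tilde)
    return scores


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 40))  # hypothetical data: 5 features, 40 samples
Z = self_expressive_coefficients(X, lam=0.01)
W = affinity_from_coefficients(Z)
scores = laplacian_scores(X, W)
ranking = np.argsort(scores)  # feature indices, most locality-preserving first
print(ranking)
```

A closed-form ridge penalty keeps the sketch short; sparse or low-rank penalties, as commonly used in subspace clustering, would need an iterative solver instead.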

feature selection, objective function, unsupervised feature selection

1912.05458

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.24)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > Promising Solution (0.54)

Industry: Health & Medicine > Therapeutic Area (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

arXiv.org Machine Learning (Dec-10-2019)

Fenton-Wilkinson Order Statistics and German Tanks: A Case Study of an Orienteering Relay Race

Pääkkönen, Joonas

Ordinal regression falls between discrete-valued classification and continuous-valued regression. Ordinal target variables can be associated with ranked random variables. These random variables are known as order statistics and they are closely related to ordinal regression. However, the challenge of using order statistics for ordinal regression prediction is finding a suitable parent distribution. In this work, we provide a case study of a real-world orienteering relay race by viewing it as a random process. For this process, we show that accurate order statistical ordinal regression predictions of final team rankings, or places, can be obtained by assuming a lognormal distribution of individual leg times. Moreover, we apply Fenton-Wilkinson approximations to intermediate changeover times alongside an estimator for the total number of teams as in the notorious German tank problem. The purpose of this work is, in part, to spark interest in studying the applicability of order statistics in ordinal regression problems.
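The two approximations named in the abstract have standard textbook forms, sketched below. `fenton_wilkinson` approximates a sum of independent lognormal variables (here, hypothetical leg times accumulating into a changeover time) by a single lognormal whose mean and variance match those of the sum. `german_tank_estimate` is the classical minimum-variance unbiased estimator m + m/k - 1 of the largest serial number, given k observed serials with maximum m. The leg-time parameters are made up, and the abstract does not say whether the paper uses exactly these forms.

```python
import math
from typing import Iterable, Sequence, Tuple


def lognormal_moments(mu: float, sigma: float) -> Tuple[float, float]:
    """Mean and variance of exp(N(mu, sigma^2))."""
    mean = math.exp(mu + sigma**2 / 2)
    var = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
    return mean, var


def fenton_wilkinson(params: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit (mu, sigma) of one lognormal to the first two moments of a sum
    of independent lognormals given as (mu_i, sigma_i) pairs."""
    total_mean = 0.0
    total_var = 0.0
    for mu, sigma in params:
        m, v = lognormal_moments(mu, sigma)
        total_mean += m
        total_var += v  # variances add because the terms are independent
    sigma_sq = math.log(1 + total_var / total_mean**2)
    mu_sum = math.log(total_mean) - sigma_sq / 2
    return mu_sum, math.sqrt(sigma_sq)


def german_tank_estimate(serials: Sequence[int]) -> float:
    """MVUE of N from serials sampled without replacement from 1..N."""
    m = max(serials)
    k = len(serials)
    return m + m / k - 1


# Hypothetical log-scale parameters for three legs (times in minutes)
legs = [(3.5, 0.20), (3.6, 0.25), (3.4, 0.30)]
mu_s, sigma_s = fenton_wilkinson(legs)
print(f"changeover time approx. LogNormal(mu={mu_s:.3f}, sigma={sigma_s:.3f})")
print(german_tank_estimate([19, 40, 42, 60]))  # prints 74.0
```

Matching only the first two moments is what makes Fenton-Wilkinson cheap; it is known to be less accurate in the far tails of the sum.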

order statistics, ordinal regression, regression

1912.05034

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.32)

Dereziński, Michał, Liang, Feynman, Mahoney, Michael W.

Exact expressions for double descent and implicit regularization via surrogate random design

arXiv.org Machine Learning (Dec-10-2019)

Double descent refers to the phase transition that is exhibited by the generalization error of unregularized learning models when varying the ratio between the number of parameters and the number of training samples. The recent success of highly over-parameterized machine learning models such as deep neural networks has motivated a theoretical analysis of the double descent phenomenon in classical models such as linear regression, which can also generalize well in the over-parameterized regime. We build on recent advances in Randomized Numerical Linear Algebra (RandNLA) to provide the first exact non-asymptotic expressions for double descent of the minimum norm linear estimator. Our approach involves constructing what we call a surrogate random design to replace the standard i.i.d. design of the training sample. This surrogate design admits exact expressions for the mean squared error of the estimator while preserving the key properties of the standard design. We also establish an exact implicit regularization result for over-parameterized training samples. In particular, we show that, for the surrogate design, the implicit bias of the unregularized minimum norm estimator precisely corresponds to solving a ridge-regularized least squares problem on the population distribution.
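The minimum norm linear estimator at the center of this abstract can be illustrated with a small sketch. It only shows the elementary facts behind the implicit-regularization story in an over-parameterized setting (more features d than samples n): the pseudoinverse solution interpolates the training data, any other interpolator has a larger norm, and ridge regression approaches it as the penalty shrinks to zero. It does not reproduce the paper's surrogate-design results; the sizes and the seed are arbitrary assumptions.

```python
import numpy as np


def min_norm_solution(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Minimum-Euclidean-norm least squares solution via the pseudoinverse."""
    return np.linalg.pinv(X) @ y


def ridge_solution(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Ridge estimator in dual form, w = X^T (X X^T + lam I)^(-1) y (cheap when n < d)."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)


rng = np.random.default_rng(0)
n, d = 20, 50  # hypothetical sizes: more parameters than samples
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w_min = min_norm_solution(X, y)
w_ridge = ridge_solution(X, y, lam=1e-8)  # tiny penalty, close to the lam -> 0 limit

# Any other interpolator differs from w_min by a null-space vector of X,
# which is orthogonal to w_min, so its norm can only be larger.
v = rng.standard_normal(d)
v_null = v - np.linalg.pinv(X) @ (X @ v)
w_other = w_min + v_null

print(np.linalg.norm(w_min), np.linalg.norm(w_other))
```

The dual form of ridge is used because it only needs an n x n solve, which is the cheaper side when n < d.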

expression, matrix, regularization

1912.04533

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Plymouth County > Hanover (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

#artificialintelligence (Dec-9-2019, 06:03:51 GMT)

PySpark for Data Science Workflows

Demonstrated experience in PySpark is one of the most desirable competencies that employers look for when building data science teams, because it enables these teams to own live data products. While I've previously blogged about PySpark, parallelization, and UDFs, I wanted to provide a proper overview of this topic as a book chapter. I'm sharing this complete chapter because I want to encourage the adoption of PySpark as a tool for data scientists. All code examples from this post are available here, and all prerequisites are covered in the sample chapters here. You might want to grab some snacks before diving in!

Spark is a general-purpose computing framework that can scale to massive data volumes. It builds upon prior big data tools such as Hadoop and MapReduce, while providing significant improvements in the expressivity of the languages it supports. One of the core components of Spark is the resilient distributed dataset (RDD), which enables clusters of machines to perform workloads in a coordinated, fault-tolerant manner. In more recent versions of Spark, the DataFrame API provides an abstraction on top of RDDs that resembles the data frame structures in R and Pandas. PySpark is the Python interface to Spark, and it provides an API for working with large-scale datasets in a distributed computing environment.

PySpark is an extremely valuable tool for data scientists because it can streamline the process of translating prototype models into production-grade model workflows. At Zynga, our data science team owns a number of production-grade systems that provide useful signals to our game and marketing teams. By using PySpark, we've been able to reduce the amount of support we need from engineering teams to scale up models from concept to production.

artificial intelligence, data mining, machine learning

Genre: Workflow (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

#artificialintelligence (Dec-9-2019, 02:33:34 GMT)

Interpretability: Cracking open the black box – Part I

"Interpretability is the degree to which a human can understand the cause of a decision." (Miller, Tim [1]) Explainable AI (XAI) is a sub-field of AI that has been gaining ground in the recent past. As a machine learning practitioner dealing with customers day in and day out, I can see why. I've been an analytics practitioner for more than 5 years, and I swear, the hardest part of a machine learning project is not creating the perfect model that beats all the benchmarks. It's the part where you convince the customer why and how it works. Humans have always faced a dichotomy when confronted with the unknown.

coefficient, linear regression, regression

Genre: Research Report (0.30)

Industry: Transportation > Air (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.53)

Maragos, Petros, Theodosis, Emmanouil

Tropical Geometry and Piecewise-Linear Approximation of Curves and Surfaces on Weighted Lattices

arXiv.org Machine Learning (Dec-9-2019)

Tropical Geometry and Mathematical Morphology share the same max-plus and min-plus semiring arithmetic and matrix algebra. In this chapter we summarize some of their main ideas and common (geometric and algebraic) structure, generalize and extend both of them using weighted lattices and a max-$\star$ algebra with an arbitrary binary operation $\star$ that distributes over max, and outline applications to geometry, machine learning, and optimization. Further, we generalize tropical geometrical objects using weighted lattices. Finally, we provide the optimal solution of max-$\star$ equations using morphological adjunctions that are projections on weighted lattices, and apply it to optimal piecewise-linear regression for fitting max-$\star$ tropical curves and surfaces to arbitrary data that constitute polygonal or polyhedral shape approximations. This also includes an efficient algorithm for solving the convex regression problem of data fitting with max-affine functions.
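The max-plus case of the max-$\star$ equations (with $\star$ taken as ordinary addition) admits a short sketch. For a system whose product is (A x)_i = max_j (A_ij + x_j), the classical greatest subsolution is x_j = min_i (b_i - A_ij): it is the largest x whose max-plus product stays below b elementwise, and it solves the system exactly whenever any solution exists. Fitting a max-affine (tropical) curve y = max_j (w_j + s_j x) with fixed slopes s_j then reduces to this formula. The fixed slopes, the sample data, and all function names are assumptions, and the sketch omits the paper's general weighted-lattice setting.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def maxplus_apply(A: Matrix, x: Sequence[float]) -> List[float]:
    """Max-plus matrix-vector product: result[i] = max_j (A[i][j] + x[j])."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]


def greatest_subsolution(A: Matrix, b: Sequence[float]) -> List[float]:
    """Largest x with maxplus_apply(A, x) <= b elementwise: x[j] = min_i (b[i] - A[i][j])."""
    n_cols = len(A[0])
    return [min(b[i] - A[i][j] for i in range(len(A))) for j in range(n_cols)]


def fit_max_affine(xs: Sequence[float], ys: Sequence[float], slopes: Sequence[float]) -> List[float]:
    """Intercepts w of p(x) = max_j (w[j] + slopes[j] * x) lying below the data (hypothetical helper)."""
    A = [[s * x for s in slopes] for x in xs]
    return greatest_subsolution(A, ys)


def evaluate_max_affine(w: Sequence[float], slopes: Sequence[float], x: float) -> float:
    return max(wj + s * x for wj, s in zip(w, slopes))


# Exactly solvable system: b is generated from x_true = [1, 3]
A = [[0.0, 1.0], [2.0, 0.0]]
b = maxplus_apply(A, [1.0, 3.0])    # [4.0, 3.0]
x_hat = greatest_subsolution(A, b)  # recovers [1.0, 3.0]

# Convex regression on samples of y = x^2 with slopes fixed in advance
xs = [i / 4 for i in range(-8, 9)]
ys = [x * x for x in xs]
slopes = [-4.0, -2.0, 0.0, 2.0, 4.0]
w = fit_max_affine(xs, ys, slopes)
max_err = max(y - evaluate_max_affine(w, slopes, x) for x, y in zip(xs, ys))
print(x_hat, w, max_err)
```

The fitted curve is a lower approximation by construction: every sample satisfies p(x_k) <= y_k, and max_err measures how far it sits below the data.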

artificial intelligence, machine learning, multiplication

1912.03891

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
Europe > Greece (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

arXiv.org Machine Learning (Dec-9-2019)

Privacy-preserving data sharing via probabilistic modelling

Jälkö, Joonas, Lagerspetz, Eemil, Haukka, Jari, Tarkoma, Sasu, Kaski, Samuel, Honkela, Antti

Differential privacy allows quantifying privacy loss from computations on sensitive personal data. This loss grows with the number of accesses to the data, making it hard to open the use of such data while respecting privacy. To avoid this limitation, we propose privacy-preserving release of a synthetic version of a data set, which can be used for an unlimited number of analyses with any methods, without affecting the privacy guarantees. The synthetic data generation is based on differentially private learning of a generative probabilistic model which can capture the probability distribution of the original data. We demonstrate empirically that we can reliably reproduce statistical discoveries from the synthetic data. We expect the method to have broad use in sharing anonymized versions of key data sets for research.
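The general recipe in this abstract (learn a generative model under differential privacy, then release samples drawn from it) can be shown with the simplest possible model, a categorical histogram. Each count receives Laplace noise of scale 1/epsilon. Adding or removing one record changes a single count by 1, so the noisy histogram is epsilon-differentially private. Clipping, normalizing, and sampling are post-processing, so any number of synthetic samples carries the same guarantee. This toy model is a stand-in, not the authors' probabilistic model; the categories, data, and epsilon value are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(records: Sequence[str], categories: Sequence[str],
                 epsilon: float, rng: random.Random) -> Dict[str, float]:
    """epsilon-DP estimate of category probabilities (L1 sensitivity of counts is 1)."""
    counts = Counter(records)
    noisy = {c: counts[c] + laplace_noise(1.0 / epsilon, rng) for c in categories}
    # Post-processing: clip negative counts and normalize into a distribution
    clipped = {c: max(v, 0.0) for c, v in noisy.items()}
    total = sum(clipped.values())
    if total == 0.0:
        return {c: 1.0 / len(categories) for c in categories}
    return {c: v / total for c, v in clipped.items()}


def sample_synthetic(probs: Dict[str, float], n: int, rng: random.Random) -> List[str]:
    """Draw synthetic records from the privatized model."""
    cats = list(probs)
    return rng.choices(cats, weights=[probs[c] for c in cats], k=n)


rng = random.Random(0)
categories = ["A", "B", "C"]
records = ["A"] * 50 + ["B"] * 30 + ["C"] * 20  # hypothetical sensitive data
probs = dp_histogram(records, categories, epsilon=1.0, rng=rng)
synthetic = sample_synthetic(probs, n=1000, rng=rng)
print(probs)
```

Only `dp_histogram` touches the sensitive records; `sample_synthetic` sees nothing but the privatized probabilities, which is why repeated sampling does not spend additional privacy budget.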

discovery, regression coefficient, synthetic data

1912.04439

Country:

Europe > Finland > Uusimaa > Helsinki (0.05)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.94)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)