AITopics

1709.03645

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre:

Research Report > Experimental Study (0.89)
Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

#artificialintelligence Sep-9-2017, 03:00:49 GMT

The math of neural networks

Building neural networks is at the heart of any deep learning technique. A neural network is trained through a series of forward and backward propagations that fit the parameters of the model, and its basic unit is the logistic regression classifier. This post builds on the math of logistic regression to describe more advanced neural networks in mathematical terms. A neural network is composed of three types of layers: one input layer, one output layer, and one or more hidden layers. Each layer has the same structure as a logistic regression classifier: a linear transformation followed by an activation function. For a fixed input layer and output layer, we can build a more complex neural network by adding more hidden layers.
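
The layer structure described above can be sketched as a forward pass in plain Python. This is a minimal illustration, not code from the post: the sigmoid activation, the function names (`dense_layer`, `forward`) and the hand-picked weights of the 2-3-1 network are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Logistic activation, the same function used in logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """One layer: a linear transformation followed by an activation.

    weights holds one row per unit in the layer; biases holds one value per unit.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Forward propagation: feed the activations of each layer into the next."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs, one hidden layer of 3 units, 1 output unit.
hidden = ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1])
output = ([[1.0, -1.0, 0.5]], [0.0])
prediction = forward([1.0, 2.0], [hidden, output])
```

With an empty list of hidden layers, `forward` reduces to a single logistic regression classifier, which is the sense in which the network is built from that unit. Backward propagation (training) is not shown.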

artificial intelligence, machine learning, neural network, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

@machinelearnbot Sep-8-2017, 23:46:33 GMT

Python Overtaking R?

I just read two articles that claim that Python is overtaking R for data science and machine learning. From user comments, I learned that R is still strong in certain tasks. I will survey what these tasks are. The first article, by Vincent Granville from DSC, uses proxy metrics (as opposed to asking the users). He uses statistics from Google Trends, Indeed job search terms, and Analytic Talent (the DSC job database) to conclude that Python has overtaken R. This raises the question of whether one group of users (say, Python's) simply searches Google more actively.

data mining, machine learning, python, (11 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.31)

@machinelearnbot Sep-8-2017, 13:35:04 GMT

Building Machine Learning Model is fun using Orange - Analytics Vidhya

In the growing market of Data Science, there are quite a few details that people miss out on. These are tools or techniques that can make you a better performer in the field, ease your efforts, and help you focus on the analytics rather than the trivialities. Here, I will introduce you to another GUI-based tool: Orange. This tool is great for beginners who wish to visualize patterns and understand their data without really knowing how to code. In my previous article, I presented another GUI-based tool, KNIME; follow this link to learn more about it.

artificial intelligence, machine learning, widget, (15 more...)

Genre:

Workflow (0.62)
Instructional Material > Course Syllabus & Notes (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.33)

Wagy, Mark D., Bongard, Josh C., Bagrow, James P., Hines, Paul D. H.

Crowdsourcing Predictors of Residential Electric Energy Usage

arXiv.org Machine Learning Sep-8-2017

Crowdsourcing has been successfully applied in many domains including astronomy, cryptography and biology. In order to test its potential for useful application in a Smart Grid context, this paper investigates the extent to which a crowd can contribute predictive hypotheses to a model of residential electric energy consumption. In this experiment, the crowd generated hypotheses about factors that make one home different from another in terms of monthly energy usage. To implement this concept, we deployed a web-based system within which 627 residential electricity customers posed 632 questions that they thought predictive of energy usage. While this occurred, the same group provided 110,573 answers to these questions as they accumulated. Thus users both suggested the hypotheses that drive a predictive model and provided the data upon which the model is built. We used the resulting question and answer data to build a predictive model of monthly electric energy consumption, using random forest regression. Because of the sparse nature of the answer data, careful statistical work was needed to ensure that these models are valid. The results indicate that the crowd can generate useful hypotheses, despite the sparse nature of the dataset.

artificial intelligence, machine learning, modeling & simulation, (18 more...)

1709.02739

Country:

Europe (0.67)
North America > United States > Vermont > Chittenden County > Burlington (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Energy > Power Industry > Utilities (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

Reggiani, Claudio, Borgne, Yann-Aël Le, Bontempi, Gianluca

Feature selection in high-dimensional dataset using MapReduce

arXiv.org Machine Learning Sep-7-2017

The exponential growth of data generation, measurement and collection in scientific and engineering disciplines leads to the availability of huge and high-dimensional datasets, in domains as varied as text mining, social networks, astronomy, or bioinformatics, to name a few. The only viable path to the analysis of such datasets is to rely on data-intensive distributed computing frameworks [1]. MapReduce has in the last decade established itself as a reference programming model for distributed computing. The model is articulated around two main classes of functions, mappers and reducers, which greatly decrease the complexity of a distributed program while still allowing a wide range of computing tasks to be expressed. MapReduce was popularised by Google research in 2008 [2], and may be executed on parallel computing platforms ranging from specialised hardware units such as parallel field programmable gate arrays (FPGAs) and graphics processing units, to large clusters of commodity machines using, for example, the Hadoop or Spark frameworks [2]-[4]. In particular, the expressiveness of the MapReduce programming model has led to the design of advanced distributed data processing libraries for machine learning and data mining, such as Hadoop Mahout and Spark MLlib. Many of the standard supervised and unsupervised learning techniques (linear and logistic regression, naive Bayes, SVM, random forest, PCA) are now available from these libraries [5]-[7]. However, little attention has so far been given to feature selection algorithms (FSA), which form an essential component of machine learning and data mining workflows. Besides reducing the size of a dataset, FSA also generally improve the performance of classification and regression models by selecting the most relevant features and reducing the noise in the dataset [8].
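
To make the mapper/reducer split concrete, here is a single-process sketch of a simple filter-style feature selection written in the MapReduce style. It is not the algorithm from the paper: ranking features by absolute Pearson correlation with the label is an assumption chosen for brevity, and `mapper`, `reducer`, `map_reduce` and `select_features` are hypothetical names. In a real deployment the grouping step would be carried out by a framework such as Hadoop or Spark.

```python
from collections import defaultdict
from functools import reduce
import math

def mapper(record):
    """Emit (feature_index, sufficient statistics) for one (features, label) record."""
    features, y = record
    for j, x in enumerate(features):
        # count, sum x, sum y, sum x^2, sum y^2, sum x*y
        yield j, (1, x, y, x * x, y * y, x * y)

def reducer(a, b):
    """Combine two partial statistics tuples by element-wise addition."""
    return tuple(p + q for p, q in zip(a, b))

def map_reduce(records):
    """Run the map, shuffle (group by key) and reduce phases in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reduce(reducer, values) for key, values in groups.items()}

def correlation(stats):
    """Pearson correlation between a feature and the label from summed statistics."""
    n, sx, sy, sxx, syy, sxy = stats
    cov = sxy - sx * sy / n
    var_x = sxx - sx * sx / n
    var_y = syy - sy * sy / n
    if var_x <= 0 or var_y <= 0:
        return 0.0  # a constant feature carries no information about the label
    return cov / math.sqrt(var_x * var_y)

def select_features(records, k):
    """Return the indices of the k features most correlated with the label."""
    scores = {j: abs(correlation(s)) for j, s in map_reduce(records).items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data: feature 0 tracks the label, feature 1 is constant, feature 2 is noisy.
data = [
    ([1.0, 5.0, 0.9], 1.0),
    ([2.0, 5.0, 0.1], 2.0),
    ([3.0, 5.0, 0.8], 3.0),
    ([4.0, 5.0, 0.3], 4.0),
]
selected = select_features(data, 2)
```

Because the reducer is a plain element-wise sum, partial results from different machines can be merged in any order, which is what lets this kind of scoring scale across a cluster.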

artificial intelligence, data mining, machine learning, (19 more...)

1709.02327

Country: Europe > Belgium (0.15)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.54)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)

@machinelearnbot Sep-6-2017, 14:05:07 GMT

Time Series Forecasting With Prophet

Prophet is an open source forecasting tool built by Facebook. It can be used for time series modeling and forecasting trends into the future. Prophet is interesting because it's both sophisticated and quite easy to use, so it's possible to generate very good forecasts with relatively little effort or domain knowledge in time series analysis. There are a few requirements you'll need to meet in order to use the library. It uses PyStan to do all of its inference, so PyStan has to be installed.

artificial intelligence, machine learning, prophet, (16 more...)

Country: North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)

Industry: Leisure & Entertainment > Sports > Football (0.50)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.31)

Ting, Daniel, Brochu, Eric

Optimal Sub-sampling with Influence Functions

arXiv.org Machine Learning Sep-6-2017

Sub-sampling is a common and often effective method to deal with the computational challenges of large datasets. However, for most statistical models, there is no well-motivated approach for drawing a non-uniform subsample. We show that the concept of an asymptotically linear estimator and the associated influence function leads to optimal sampling procedures for a wide class of popular models. Furthermore, for linear regression models which have well-studied procedures for non-uniform sub-sampling, we show our optimal influence function based method outperforms previous approaches. We empirically show the improved performance of our method on real datasets.
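
As a rough illustration of non-uniform sub-sampling guided by an influence function, the sketch below estimates a mean, whose influence function at a point x is x minus the mean. It is not taken from the paper: the sampling rule (inclusion probability proportional to the magnitude of the influence, capped at 1), the inverse-probability correction and the function name `influence_subsample_mean` are assumptions made for this example, and `pilot` stands for a rough preliminary estimate obtained elsewhere.

```python
import random

def influence_subsample_mean(data, pilot, m, seed=0):
    """Estimate the mean of data from a non-uniform subsample.

    pilot: rough preliminary estimate of the mean (assumed to be given).
    m: target expected subsample size.
    Each point is kept with probability proportional to |x - pilot|, the
    magnitude of its (estimated) influence, and kept points are reweighted
    by the inverse of that probability so the estimate stays unbiased.
    """
    rng = random.Random(seed)
    n = len(data)
    weights = [abs(x - pilot) for x in data]
    total = sum(weights)
    if total == 0:
        return pilot  # every point equals the pilot estimate
    correction = 0.0
    for x, w in zip(data, weights):
        p = min(1.0, m * w / total)  # inclusion probability, capped at 1
        if p > 0 and rng.random() < p:
            correction += (x - pilot) / p
    return pilot + correction / n

# Hypothetical data with one high-influence point.
values = [1.0, 2.0, 3.0, 10.0]
estimate = influence_subsample_mean(values, pilot=2.0, m=10)
```

Points that sit exactly on the pilot estimate have zero influence and are never drawn, which costs nothing because they would contribute zero to the correction anyway.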

artificial intelligence, influence function, machine learning, (17 more...)

1709.01716

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.71)

@machinelearnbot Sep-5-2017, 20:40:15 GMT

Visualizing Cross-validation Code

Let's visualize to improve your prediction... Let us say you are writing nice and clean machine learning code (e.g. [...]). Your code is OK: first you divided your dataset into two parts, a training set and a testing set, as usual with a function like train_test_split and some random factor. Your prediction could be slightly underfit or overfit, like the figures below. As the name suggests, cross-validation is the next fun thing after learning Linear Regression because it helps to improve your prediction using the K-Fold strategy. What is K-Fold, you ask?
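
As a minimal sketch of the K-Fold idea, assuming nothing beyond the Python standard library: the data is shuffled once, cut into k folds, and each fold takes one turn as the testing set while the rest form the training set. The function name `k_fold_indices` is hypothetical and is not the API of any particular library.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # the "random factor", made repeatable
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Ten samples, five folds: each sample is used for testing exactly once.
splits = list(k_fold_indices(10, 5))
```

Averaging a model's score over the k test folds gives a steadier estimate than a single train/test split, because every sample is held out once.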

artificial intelligence, dataset, machine learning, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.45)

#artificialintelligence Sep-5-2017, 10:35:15 GMT

Ford Motor Credit tests AI's ability to spot overlooked borrowers

Jim Moynes, vice president of risk management at Ford Motor Credit in Dearborn, Mich., first became interested in using machine learning to improve car loan underwriting several years ago. "We were watching what others were working on," he said. "We like to be innovative and try to stay up with what's going on." The company recently ran an experiment to see if machine learning could help its underwriters better understand the loan applications it receives. It was a champion vs. challenger test: Moynes' team took several years of loan data, removed all personally identifiable information from it, and gave it to ZestFinance, a provider of machine-learning-based online lending software, and its own modeling team, which creates logistic regression models to predict potential borrowers' creditworthiness. Each team ran the loan application data through its models and predicted the future performance of the loans.

artificial intelligence, ford motor credit, machine learning, (14 more...)

Country: North America > United States > Michigan > Wayne County > Dearborn (0.25)

Industry: Banking & Finance > Loans (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.57)