AITopics

Industry: Media > News (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

#artificialintelligence, Jan-6-2020, 11:40:50 GMT

5 Great New Features in Latest Scikit-learn Release - KDnuggets

The latest release of Python's workhorse machine learning library includes a number of new features and bug fixes. You can find a full accounting of these changes in the official Scikit-learn 0.22 release highlights, and can find the change log here. Here are 5 new features in the latest release of Scikit-learn which are worth your attention. A new plotting API is available, working without requiring any recomputation. Supported plots include, among others, partial dependence plots, confusion matrices, and ROC curves.

estimator, latest scikit-learn release, permutation importance, (10 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.58)

Merk, Miryam S., Otto, Philipp

Estimation of the spatial weighting matrix for regular lattice data -- An adaptive lasso approach with cross-sectional resampling

arXiv.org Machine Learning, Jan-6-2020

Spatial econometric research typically relies on the assumption that the spatial dependence structure is known in advance and is represented by a deterministic spatial weights matrix. Contrary to classical approaches, we investigate the estimation of sparse spatial dependence structures for regular lattice data. In particular, an adaptive least absolute shrinkage and selection operator (lasso) is used to select and estimate the individual connections of the spatial weights matrix. To recover the spatial dependence structure, we propose cross-sectional resampling, assuming that the random process is exchangeable. The estimation procedure is based on a two-step approach to circumvent simultaneity issues that typically arise from endogenous spatial autoregressive dependencies. The two-step adaptive lasso approach with cross-sectional resampling is verified using Monte Carlo simulations. Finally, we apply the procedure to model nitrogen dioxide ($\mathrm{NO_2}$) concentrations and show that estimating the spatial dependence structure, rather than using prespecified weights matrices, improves the prediction accuracy considerably.
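
For orientation, the adaptive lasso mentioned in the abstract is usually written in the following textbook form. This is a generic sketch, not the paper's two-step spatial formulation: the symbols $y$ (response), $X$ (regressors), $\beta$ (coefficients), $\lambda$ (penalty level), $\gamma$ (weight exponent), and the initial estimate $\tilde\beta$ are all assumptions introduced here for illustration.

$$
\hat\beta = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \sum_{j} \hat w_j \, \lvert \beta_j \rvert,
\qquad
\hat w_j = \frac{1}{\lvert \tilde\beta_j \rvert^{\gamma}}, \quad \gamma > 0 .
$$

Coefficients with a small initial estimate receive a large weight and are pushed toward exactly zero, which is what would allow individual connections of a weights matrix to be selected as well as estimated.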

matrix, neighbor, spatial dependence structure, (11 more...)

2001.01532

Country:

North America > Mexico (0.04)
North America > United States > Connecticut (0.04)
Europe > Germany > Lower Saxony > Hanover (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

#artificialintelligence, Jan-5-2020, 00:37:34 GMT

Episode 2: A Cross Validation Framework

This is the second episode of my video series on applied machine learning. In this episode, we talk about the need for cross-validation and different types of cross-validation. We also see how one can implement a re-usable cross validation framework. In the end we are left with a cross validation framework that can be applied to almost all kinds of machine learning problems.
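
A re-usable framework of the kind described could look roughly like the sketch below. It is not the code from the episode: the function names (`k_fold_indices`, `cross_validate`), the toy majority-label model, and the data are all hypothetical, and only shuffled k-fold splitting is shown.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for shuffled k-fold cross-validation."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size


def cross_validate(fit, predict, score, X, y, n_splits=5, seed=0):
    """Run k-fold CV with user-supplied fit/predict/score callables."""
    scores = []
    for train, valid in k_fold_indices(len(X), n_splits, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in valid])
        scores.append(score([y[i] for i in valid], preds))
    return scores


# Toy usage (hypothetical): a "model" that predicts the training majority label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, X_valid):
    return [model] * len(X_valid)


def accuracy(y_true, y_pred):
    return mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))


X = [[i] for i in range(10)]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
fold_scores = cross_validate(fit_majority, predict_majority, accuracy, X, y, n_splits=5)
print(fold_scores, mean(fold_scores))
```

Passing `fit`, `predict`, and `score` as callables keeps the splitting logic independent of any particular model, which is one way to get the re-usability the episode refers to. Stratified or grouped splitting would need a different index generator.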

cross validation framework, episode 2

Industry:

Media > Television (0.40)
Leisure & Entertainment (0.40)
Education > Focused Education > Special Education (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (1.00)

Jalilifard, Amir, Caridá, Vinicius, Mansano, Alex, Cristo, Rogers

Semantic Sensitive TF-IDF to Determine Word Relevance in Documents

arXiv.org Machine Learning, Jan-5-2020

Keyword extraction has received increasing attention as an important research topic that can lead to advancements in diverse applications such as document context categorization, text indexing and document classification. In this paper we propose STF-IDF, a novel semantic method based on TF-IDF, for scoring word importance of informal documents in a corpus. A set of nearly four million documents from health-care social media was collected and used to train a semantic model and learn word embeddings. Then, the features of the semantic space were utilized to rearrange the original TF-IDF scores through an iterative solution so as to improve the moderate performance of this algorithm on informal texts. After testing the proposed method with 200 randomly chosen documents, our method managed to decrease the TF-IDF mean error rate by about 50%, reaching a mean error of 13.7%, as opposed to 27.2% for the original TF-IDF.
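
As background, the plain TF-IDF baseline that STF-IDF starts from can be sketched as follows. This is not the paper's semantic re-ranking step. The variant used here (term frequency as count over document length, inverse document frequency as the natural log of N over df) is one common choice and an assumption, and the three-document corpus is invented for illustration.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: score} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Invented toy corpus of informal health-related posts.
corpus = [
    "my head hurts and my throat hurts".split(),
    "my doctor said rest".split(),
    "rest and fluids help my throat".split(),
]
for doc_scores in tf_idf(corpus):
    top = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)[:2]
    print(top)
```

A word such as "my" that occurs in every document gets an idf of zero and therefore a score of zero, regardless of how often it appears.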

error rate, iteration, tf-idf, (14 more...)

2001.09896

Country:

South America > Brazil > São Paulo (0.04)
Asia > Middle East > Republic of Türkiye (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.38)

Kuhnert, Nadine, Pflüger, Lea, Maier, Andreas

Prediction of MRI Hardware Failures based on Image Features using Ensemble Learning

arXiv.org Machine Learning, Jan-5-2020

In order to ensure trouble-free operation, prediction of hardware failures is essential. This applies especially to medical systems. Our goal is to determine hardware which needs to be exchanged before failing. In this work, we focus on predicting failures of 20-channel Head/Neck coils using image-related measurements. Thus, we aim to solve a classification problem with two classes, normal and broken coil. To solve this problem, we use data of two different levels. One level refers to one-dimensional features per individual coil channel on which we found a fully connected neural network to perform best. The other data level uses matrices which represent the overall coil condition and feeds a different neural network. We stack the predictions of those two networks and train a Random Forest classifier as the ensemble learner. Thus, combining insights of both trained models improves the prediction results and allows us to determine the coil's condition with an F-score of 94.14% and an accuracy of 99.09%.
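
The abstract reports an F-score noticeably below the accuracy. The sketch below shows how both metrics are computed from confusion-matrix counts and how such a gap can arise when one class is rare. The counts are invented and are not the paper's data; the function name `binary_metrics` is hypothetical, and class imbalance is only one possible explanation.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented example: 1000 coils, 50 of them broken ("broken" is the positive class).
metrics = binary_metrics(tp=45, fp=4, fn=5, tn=946)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because true negatives enter accuracy but not the F-score, a large number of correctly classified normal coils raises accuracy without affecting the F-score.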

coil, matrix, prediction, (13 more...)

2001.01213

Country:

North America > United States (0.04)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Diagnostic Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)

Farnadi, Golnoosh, Getoor, Lise, Moens, Marie-Francine, De Cock, Martine

User Profiling Using Hinge-loss Markov Random Fields

arXiv.org Machine Learning, Jan-5-2020

A variety of approaches have been proposed to automatically infer the profiles of users from their digital footprint in social media. Most of the proposed approaches focus on mining a single type of information, while ignoring other sources of available user-generated content (UGC). In this paper, we propose a mechanism to infer a variety of user characteristics, such as age, gender, and personality traits, which can then be compiled into a user profile. To this end, we model social media users by incorporating and reasoning over multiple sources of UGC as well as social relations. Our model is based on a statistical relational learning framework using Hinge-loss Markov Random Fields (HL-MRFs), a class of probabilistic graphical models that can be defined using a set of first-order logical rules. We validate our approach on data from Facebook with more than 5k users and almost 725k relations. We show how HL-MRFs can be used to develop a generic and extensible user profiling framework by leveraging textual, visual, and relational content in the form of status updates, profile pictures and Facebook page likes. Our experimental results demonstrate that our proposed model successfully incorporates multiple sources of information and outperforms competing methods that use only one source of information or an ensemble method across the different sources for modeling of users in social media.
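
To illustrate how a first-order rule turns into a hinge-loss potential, the sketch below evaluates one rule over soft truth values in [0, 1], using the Lukasiewicz relaxation of conjunction that is standard for HL-MRFs. The rule itself (friends of a young user tend to be young) and the truth values are hypothetical and are not taken from the paper; the function names are illustrative.

```python
def lukasiewicz_and(*truths):
    """Soft conjunction of truth values in [0, 1] (Lukasiewicz t-norm)."""
    return max(0.0, sum(truths) - (len(truths) - 1))


def hinge_potential(body, head, weight=1.0, squared=False):
    """Weighted distance to satisfaction of the rule body -> head.

    The rule is satisfied (potential 0) when the head is at least as true
    as the conjunction of the body atoms.
    """
    distance = max(0.0, lukasiewicz_and(*body) - head)
    return weight * (distance ** 2 if squared else distance)


# Hypothetical rule: Friends(A, B) AND Young(B) -> Young(A)
friends_ab, young_b, young_a = 0.9, 0.8, 0.3
print(hinge_potential([friends_ab, young_b], young_a, weight=2.0))
print(hinge_potential([friends_ab, young_b], 1.0, weight=2.0))
```

Inference in such a model looks for the assignment of the unobserved truth values that minimizes the weighted sum of these potentials over all grounded rules; since each potential is convex, that is a convex optimization problem.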

baseline psl-prior 0, characteristic, information, (13 more...)

2001.01177

Country:

North America > United States > Washington > Pierce County > Tacoma (0.04)
North America > United States > New Jersey (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

arXiv.org Artificial Intelligence, Jan-4-2020

Can x2vec Save Lives? Integrating Graph and Language Embeddings for Automatic Mental Health Classification

Ruch, Alexander

Graph and language embedding models are becoming commonplace in large scale analyses given their ability to represent complex sparse data densely in low-dimensional space. Integrating these models' complementary relational and communicative data may be especially helpful if predicting rare events or classifying members of hidden populations - tasks requiring huge and sparse datasets for generalizable analyses. For example, due to social stigma and comorbidities, mental health support groups often form in amorphous online groups. Predicting suicidality among individuals in these settings using standard network analyses is prohibitive due to resource limits (e.g., memory), and adding auxiliary data like text to such models exacerbates complexity- and sparsity-related issues. Here, I show how merging graph and language embedding models (metapath2vec and doc2vec) avoids these limits and extracts unsupervised clustering data without domain expertise or feature engineering. Graph and language distances to a suicide support group have little correlation ($\rho$ < 0.23), implying the two models are not embedding redundant information. When used separately to predict suicidality among individuals, graph and language data generate relatively accurate results (69% and 76%, respectively); however, when integrated, both data produce highly accurate predictions (90%, with 10% false-positives and 12% false-negatives). Visualizing graph embeddings annotated with predictions of potentially suicidal individuals shows the integrated model could classify such individuals even if they are positioned far from the support group. These results extend research on the importance of simultaneously analyzing behavior and language in massive networks and efforts to integrate embedding models for different kinds of data when predicting and classifying, particularly when they involve rare events.


2001.01126

Country:

North America > United States > New York > Erie County > Buffalo (0.04)
Europe > Spain > Galicia > Madrid (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

#artificialintelligence, Jan-3-2020, 07:51:13 GMT

International evaluation of an artificial intelligence system to identify breast cancer in screening mammography

Screening mammography aims to identify breast cancer before symptoms appear, enabling earlier therapy for more treatable disease. Despite the existence of screening programs worldwide, interpretation of these images suffers from suboptimal rates of false positives and false negatives. Here we present an AI system capable of surpassing a single expert reader in breast cancer prediction performance. Using two large data sets representative of clinical practice in the United States (US) and the United Kingdom (UK), we show an absolute reduction of 5.7%/1.2% (US/UK) in false positives. We show evidence of the system's ability to generalize from the UK sites to the US site.

artificial intelligence system, identify breast cancer, international evaluation, (5 more...)

Country:

North America > United States (0.29)
Europe > United Kingdom (0.29)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.78)

#artificialintelligence, Jan-3-2020, 01:18:26 GMT

How Google AI Is Improving Mammograms

While there has been controversy over when and how often women should be screened for breast cancer using mammograms, studies consistently show that screening can lead to earlier detection of the disease, when it's more treatable. So improving how effectively mammograms can detect abnormal growths that could be cancerous is a priority in the field. AI could play a role in accomplishing that--computer-based machine learning might help doctors to read mammograms more accurately. In a study published Jan. 1 in Nature, researchers from Google Health, and from universities in the U.S. and U.K., report on an AI model that reads mammograms with fewer false positives and false negatives than human experts. The algorithm, based on mammograms taken from more than 76,000 women in the U.K. and more than 15,000 in the U.S., reduced false positive rates by nearly 6% in the U.S., where women are screened every one to two years, and by 1.2% in the U.K., where women are screened every three years.

algorithm, false negative, mammogram, (12 more...)

Country:

North America > United States (0.69)
Europe > United Kingdom (0.48)

Genre: Research Report > New Finding (0.36)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)