AITopics | Accuracy

Accuracy

User Profiling Using Hinge-loss Markov Random Fields

Farnadi, Golnoosh, Getoor, Lise, Moens, Marie-Francine, De Cock, Martine

arXiv.org Machine Learning, Jan-5-2020

A variety of approaches have been proposed to automatically infer the profiles of users from their digital footprint in social media. Most of the proposed approaches focus on mining a single type of information, while ignoring other sources of available user-generated content (UGC). In this paper, we propose a mechanism to infer a variety of user characteristics, such as age, gender, and personality traits, which can then be compiled into a user profile. To this end, we model social media users by incorporating and reasoning over multiple sources of UGC as well as social relations. Our model is based on a statistical relational learning framework using Hinge-loss Markov Random Fields (HL-MRFs), a class of probabilistic graphical models that can be defined using a set of first-order logical rules. We validate our approach on data from Facebook with more than 5k users and almost 725k relations. We show how HL-MRFs can be used to develop a generic and extensible user profiling framework by leveraging textual, visual, and relational content in the form of status updates, profile pictures, and Facebook page likes. Our experimental results demonstrate that our proposed model successfully incorporates multiple sources of information and outperforms competing methods that use only one source of information or an ensemble method across the different sources for modeling users in social media.
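
To make the phrase "defined using a set of first-order logical rules" more concrete: in HL-MRFs as they are commonly presented, each grounded rule contributes a hinge-loss potential that measures how far the rule is from being satisfied under the Łukasiewicz relaxation of logic. The sketch below is only an illustration of that idea. The rule, the weight, and the truth values are hypothetical and are not taken from the paper.

```python
def lukasiewicz_and(*truths):
    """Soft conjunction of truth values in [0, 1] (Lukasiewicz t-norm)."""
    return max(0.0, sum(truths) - (len(truths) - 1))


def hinge_potential(body, head, weight=1.0, squared=False):
    """Weighted distance to satisfaction of the soft rule: body -> head."""
    distance = max(0.0, lukasiewicz_and(*body) - head)
    return weight * (distance ** 2 if squared else distance)


# Hypothetical grounded rule (illustrative only):
#   Likes(user, page) & PageAudience(page, female) -> Gender(user, female)
# with soft truth values 0.9 and 0.8 for the body and 0.4 for the head.
penalty = hinge_potential(body=[0.9, 0.8], head=0.4, weight=2.0)

# A rule whose body is weak or whose head is already believed costs nothing.
satisfied = hinge_potential(body=[0.5, 0.5], head=0.9)

print(penalty, satisfied)
```

Inference in such a model amounts to choosing the unknown truth values that minimize the sum of these penalties over all grounded rules, which is a convex problem because each potential is a hinge.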

baseline psl-prior 0, characteristic, information, (13 more...)

arXiv.org Machine Learning

2001.01177

Country:

North America > United States > Washington > Pierce County > Tacoma (0.04)
North America > United States > New Jersey (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

Can x2vec Save Lives? Integrating Graph and Language Embeddings for Automatic Mental Health Classification

Ruch, Alexander

arXiv.org Artificial Intelligence, Jan-4-2020

Graph and language embedding models are becoming commonplace in large scale analyses given their ability to represent complex sparse data densely in low-dimensional space. Integrating these models' complementary relational and communicative data may be especially helpful if predicting rare events or classifying members of hidden populations - tasks requiring huge and sparse datasets for generalizable analyses. For example, due to social stigma and comorbidities, mental health support groups often form in amorphous online groups. Predicting suicidality among individuals in these settings using standard network analyses is prohibitive due to resource limits (e.g., memory), and adding auxiliary data like text to such models exacerbates complexity- and sparsity-related issues. Here, I show how merging graph and language embedding models (metapath2vec and doc2vec) avoids these limits and extracts unsupervised clustering data without domain expertise or feature engineering. Graph and language distances to a suicide support group have little correlation (ρ < 0.23), implying the two models are not embedding redundant information. When used separately to predict suicidality among individuals, graph and language data generate relatively accurate results (69% and 76%, respectively); however, when integrated, both data produce highly accurate predictions (90%, with 10% false-positives and 12% false-negatives). Visualizing graph embeddings annotated with predictions of potentially suicidal individuals shows the integrated model could classify such individuals even if they are positioned far from the support group. These results extend research on the importance of simultaneously analyzing behavior and language in massive networks and efforts to integrate embedding models for different kinds of data when predicting and classifying, particularly when they involve rare events.

arXiv.org Artificial Intelligence

2001.01126

Country:

North America > United States > New York > Erie County > Buffalo (0.04)
Europe > Spain > Galicia > Madrid (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

International evaluation of an artificial intelligence system to identify breast cancer in screening mammography

#artificialintelligence, Jan-3-2020, 07:51:13 GMT

Screening mammography aims to identify breast cancer before symptoms appear, enabling earlier therapy for more treatable disease. Despite the existence of screening programs worldwide, interpretation of these images suffers from suboptimal rates of false positives and false negatives. Here we present an AI system capable of surpassing a single expert reader in breast cancer prediction performance. Using two large data sets representative of clinical practice in the United States (US) and the United Kingdom (UK), we show an absolute reduction of 5.7%/1.2% (US/UK) in false positives. We show evidence of the system's ability to generalize from the UK sites to the US site.

artificial intelligence system, identify breast cancer, international evaluation, (5 more...)

#artificialintelligence

Country:

North America > United States (0.29)
Europe > United Kingdom (0.29)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.78)

How Google AI Is Improving Mammograms

#artificialintelligence, Jan-3-2020, 01:18:26 GMT

While there has been controversy over when and how often women should be screened for breast cancer using mammograms, studies consistently show that screening can lead to earlier detection of the disease, when it's more treatable. So improving how effectively mammograms can detect abnormal growths that could be cancerous is a priority in the field. AI could play a role in accomplishing that--computer-based machine learning might help doctors to read mammograms more accurately. In a study published Jan. 1 in Nature, researchers from Google Health, and from universities in the U.S. and U.K., report on an AI model that reads mammograms with fewer false positives and false negatives than human experts. The algorithm, based on mammograms taken from more than 76,000 women in the U.K. and more than 15,000 in the U.S., reduced false positive rates by nearly 6% in the U.S., where women are screened every one to two years, and by 1.2% in the U.K., where women are screened every three years.

algorithm, false negative, mammogram, (12 more...)

#artificialintelligence

Country:

North America > United States (0.69)
Europe > United Kingdom (0.48)

Genre: Research Report > New Finding (0.36)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Distributed Stochastic Algorithms for High-rate Streaming Principal Component Analysis

Raja, Haroon, Bajwa, Waheed U.

arXiv.org Machine Learning, Jan-3-2020

This paper considers the problem of estimating the principal eigenvector of a covariance matrix from independent and identically distributed data samples in streaming settings. The streaming rate of data in many contemporary applications can be high enough that a single processor cannot finish an iteration of existing methods for eigenvector estimation before a new sample arrives. This paper formulates and analyzes a distributed variant of the classical Krasulina's method (D-Krasulina) that can keep up with the high streaming rate of data by distributing the computational load across multiple processing nodes. The analysis shows that, under appropriate conditions, D-Krasulina converges to the principal eigenvector in an order-wise optimal manner; i.e., after receiving $M$ samples across all nodes, its estimation error can be $O(1/M)$. In order to reduce the network communication overhead, the paper also develops and analyzes a mini-batch extension of D-Krasulina, which is termed DM-Krasulina. The analysis of DM-Krasulina shows that it can also achieve order-optimal estimation error rates under appropriate conditions, even when some samples have to be discarded within the network due to communication latency. Finally, experiments are performed over synthetic and real-world data to validate the convergence behaviors of D-Krasulina and DM-Krasulina in high-rate streaming settings.
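
The abstract does not spell out the update rule, so the following is a hedged, single-process sketch of the general idea rather than the authors' D-Krasulina or DM-Krasulina. It uses the classical Krasulina direction, lets several simulated "nodes" each average that direction over a local mini-batch, averages across nodes, and applies one shared update. The step-size schedule, the node and batch counts, and the synthetic diagonal covariance are all assumptions made for illustration.

```python
import math
import random


def krasulina_direction(w, x):
    """Classical Krasulina direction: x (x.w) - ((x.w)^2 / ||w||^2) w."""
    xw = sum(xi * wi for xi, wi in zip(x, w))
    w_norm_sq = sum(wi * wi for wi in w)
    return [xi * xw - (xw * xw / w_norm_sq) * wi for xi, wi in zip(x, w)]


def mean_vectors(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def simulate(num_nodes=4, batch_size=8, rounds=300, seed=0):
    rng = random.Random(seed)
    scales = [2.0, 1.0, 0.5]  # per-coordinate std devs: covariance diag(4, 1, 0.25)
    w = [1.0, 1.0, 1.0]       # arbitrary nonzero starting point
    for t in range(1, rounds + 1):
        node_directions = []
        for _ in range(num_nodes):
            # Each simulated node averages the direction over its local mini-batch.
            local = [
                krasulina_direction(w, [s * rng.gauss(0.0, 1.0) for s in scales])
                for _ in range(batch_size)
            ]
            node_directions.append(mean_vectors(local))
        # One averaging step across nodes, then a single shared update.
        direction = mean_vectors(node_directions)
        step = 1.0 / (t + 10)  # assumed decaying step size, not from the paper
        w = [wi + step * di for wi, di in zip(w, direction)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


estimate = simulate()
# The true principal eigenvector here is the first coordinate axis,
# so the absolute first component is the cosine of the estimation angle.
alignment = abs(estimate[0])
print(round(alignment, 3))
```

Because every sample in a round is evaluated against the same iterate, the per-sample work can be done in parallel, which is the property a distributed variant relies on to keep pace with a fast stream.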

d-krasulina, dm-krasulina, iteration, (11 more...)

arXiv.org Machine Learning

2001.01017

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report (1.00)

Industry: Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.33)

The Real-World-Weight Cross-Entropy Loss Function: Modeling the Costs of Mislabeling

Ho, Yaoshiang, Wookey, Samuel

arXiv.org Artificial Intelligence, Jan-3-2020

In this paper, we propose a new metric to measure goodness-of-fit for classifiers, the Real World Cost function. This metric factors in information about a real world problem, such as financial impact, that other measures like accuracy or F1 do not. This metric is also more directly interpretable for users. To optimize for this metric, we introduce the Real-World-Weight Cross-Entropy loss function, in both binary and single-label classification variants. Both variants allow direct input of real world costs as weights. For single-label, multicategory classification, our loss function also allows direct penalization of probabilistic false positives, weighted by label, during the training of a machine learning model. We compare the design of our loss function to the binary crossentropy and categorical crossentropy functions, as well as their weighted variants, to discuss the potential for improvement in handling a variety of known shortcomings of machine learning, ranging from imbalanced classes to medical diagnostic error to reinforcement of social bias. We create scenarios that emulate those issues using the MNIST data set and demonstrate empirical results of our new loss function. Finally, we sketch a proof of this function based on Maximum Likelihood Estimation and discuss future directions.
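
The abstract does not state the loss formula. As a rough illustration of what "direct input of real world costs as weights" could look like in the binary case, the sketch below scales the two terms of ordinary binary cross-entropy by the cost of a false negative and the cost of a false positive. The function name, the parameter names `fn_cost` and `fp_cost`, and all numbers are assumptions for illustration, and the paper's exact definition may differ.

```python
import math


def cost_weighted_bce(y_true, y_prob, fn_cost=1.0, fp_cost=1.0, eps=1e-12):
    """Binary cross-entropy with separate costs on the two error types."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        # Positive examples are charged for missing probability mass
        # (false-negative risk); negative examples are charged for mass
        # placed on the positive class (false-positive risk).
        total += fn_cost * y * math.log(p) + fp_cost * (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.7]

plain = cost_weighted_bce(labels, probs)  # unit costs: ordinary BCE
costly_misses = cost_weighted_bce(labels, probs, fn_cost=10.0, fp_cost=1.0)

print(plain, costly_misses)
```

With unit costs the function reduces to standard binary cross-entropy, so the weights can be read as how many times worse one kind of mistake is than the baseline.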

false negative, false positive, loss function, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ACCESS.2019.2962617

2001.0057

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.88)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.60)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.57)

How to Calculate Precision, Recall, and F-Measure for Imbalanced Classification

#artificialintelligence, Jan-2-2020, 21:29:41 GMT

Classification accuracy is the total number of correct predictions divided by the total number of predictions made for a dataset. As a performance measure, accuracy is inappropriate for imbalanced classification problems. The main reason is that the overwhelming number of examples from the majority class (or classes) will overwhelm the number of examples in the minority class, meaning that even unskillful models can achieve accuracy scores of 90 percent, or 99 percent, depending on how severe the class imbalance happens to be. An alternative to using classification accuracy is to use precision and recall metrics. In this tutorial, you will discover how to calculate and develop an intuition for precision and recall for imbalanced classification.
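
The point about accuracy on imbalanced data can be checked with a small worked example. The sketch below computes accuracy, precision, recall, and F1 from first principles for a made-up dataset of 1,000 examples with 10 positives, comparing a model that always predicts the majority class against one that finds most positives at the cost of some false alarms. The data and the function names are illustrative assumptions, not taken from the tutorial.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) treating `positive` as the minority class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# 10 positives followed by 990 negatives.
y_true = [1] * 10 + [0] * 990

# Unskillful model: always predicts the majority class.
always_negative = [0] * 1000

# Model that finds 8 of 10 positives but raises 12 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978

baseline = scores(y_true, always_negative)
model = scores(y_true, detector)
print(baseline)
print(model)
```

In this example the model that never finds a positive has the higher accuracy (0.99 against 0.986), while precision and recall expose that it is useless for the minority class.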

classification problem, minority class, precision, (16 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Ensemble emotion recognizing with multiple modal physiological signals

Zhang, Jing, Zhang, Yong, Zhan, Suhua, Cheng, Cheng

arXiv.org Machine Learning, Jan-1-2020

Physiological signals, which provide an objective representation of human affective states, are attracting increasing attention in the emotion recognition field. However, a single signal rarely describes an emotion completely and accurately. Models that fuse multiple physiological signals build a unified classification model by exploiting consistent and complementary information from different emotions to improve recognition performance. Existing fusion models usually rely on one particular classification method for recognition, ignoring the different distributions of the multiple signals. To address these problems, we propose an emotion classification model that uses multimodal physiological signals for different emotions. Features are extracted from EEG, EMG, and EOG signals to characterize emotional state on the valence and arousal levels. For characterization, the signals are preprocessed by filtering into four bands (theta, beta, alpha, gamma), and three Hjorth parameters are computed as features. To improve classification performance, an ensemble classifier is built. Experiments are conducted on the benchmark DEAP dataset. For the two-class task, the best result is 94.42% on arousal and 94.02% on valence. For the four-class task, the highest average classification accuracy is 90.74, and the model shows good stability. The influence of different peripheral physiological signals on the results is also analyzed in this paper.
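
The three Hjorth parameters mentioned above have standard definitions: activity is the signal variance, mobility is the square root of the ratio between the variance of the first derivative and the variance of the signal, and complexity is the mobility of the derivative divided by the mobility of the signal. The sketch below computes them with first differences standing in for derivatives. The synthetic 10 Hz sine and the 128 Hz sampling rate are assumptions for illustration, and this is not the authors' feature-extraction code.

```python
import math


def variance(values):
    """Population variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def first_difference(values):
    """Discrete stand-in for the time derivative."""
    return [b - a for a, b in zip(values, values[1:])]


def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) for a non-constant 1-D signal."""
    d1 = first_difference(signal)
    d2 = first_difference(d1)
    activity = variance(signal)
    mobility = math.sqrt(variance(d1) / activity)
    derivative_mobility = math.sqrt(variance(d2) / variance(d1))
    complexity = derivative_mobility / mobility
    return activity, mobility, complexity


# Synthetic test signal: two seconds of a 10 Hz sine at an assumed 128 Hz rate.
fs = 128
signal = [math.sin(2 * math.pi * 10 * n / fs) for n in range(2 * fs)]

activity, mobility, complexity = hjorth_parameters(signal)
print(activity, mobility, complexity)
```

A pure sinusoid has complexity close to 1, so values well above 1 indicate a waveform that deviates from a simple sine shape. In a pipeline like the one described, these three numbers would be computed per channel and per frequency band and concatenated into a feature vector.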

classifier, different classifier, physiological signal, (15 more...)

arXiv.org Machine Learning

2001.00191

Country:

Asia > China > Liaoning Province > Dalian (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > India > Maharashtra > Pune (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Direction Concentration Learning: Enhancing Congruency in Machine Learning

Luo, Yan, Wong, Yongkang, Kankanhalli, Mohan S., Zhao, Qi

arXiv.org Machine Learning, Jan-1-2020

One of the well-known challenges in computer vision tasks is the visual diversity of images, which could result in an agreement or disagreement between the learned knowledge and the visual content exhibited by the current observation. In this work, we first define such an agreement in a concept learning process as congruency. Formally, given a particular task and a sufficiently large dataset, the congruency issue occurs in the learning process when the task-specific semantics in the training data are highly varying. We propose a Direction Concentration Learning (DCL) method to improve congruency in the learning process, where enhancing congruency makes the convergence path less circuitous. The experimental results show that the proposed DCL method generalizes to state-of-the-art models and optimizers, and improves performance on saliency prediction, continual learning, and classification tasks. Moreover, it helps mitigate the catastrophic forgetting problem in the continual learning task. The code is publicly available at https://github.com/luoyan407/congruency.

congruency, dcl method, gradient, (11 more...)

arXiv.org Machine Learning

doi: 10.1109/TPAMI.2019.2963387

1912.08136

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > Canada > Ontario > Toronto (0.14)
Oceania > Australia > Queensland (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Leveraging Semi-Supervised Learning for Fairness using Neural Networks

Noroozi, Vahid, Bahaadini, Sara, Sheikhi, Samira, Mojab, Nooshin, Yu, Philip S.

arXiv.org Artificial Intelligence, Dec-31-2019

There has been growing concern about the fairness of decision-making systems based on machine learning. The shortage of labeled data has always been a challenging problem facing machine-learning-based systems. In such scenarios, semi-supervised learning has been shown to be an effective way of exploiting unlabeled data to improve the performance of the model. Notably, unlabeled data do not contain label information, which itself can be a significant source of bias in training machine learning systems. This inspired us to tackle the challenge of fairness by formulating the problem in a semi-supervised framework. In this paper, we propose a semi-supervised algorithm using neural networks that benefits from unlabeled data to improve not just the performance but also the fairness of the decision-making process. The proposed model, called SSFair, exploits the information in the unlabeled data to mitigate the bias in the training data.

artificial intelligence, fairness, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1912.1323

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > Illinois > Cook County > Evanston (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
