AITopics

Country:

North America > United States > California > San Diego County > San Diego (0.25)
Asia > China (0.05)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.32)

Prabhu, Vinay Uday, Birhane, Abeba

Large image datasets: A pyrrhic win for computer vision?

arXiv.org Machine LearningJul-23-2020

In this paper we investigate problematic practices and consequences of large scale vision datasets. We examine broad issues such as the question of consent and justice as well as specific concerns such as the inclusion of verifiably pornographic images in datasets. Taking the ImageNet-ILSVRC-2012 dataset as an example, we perform a cross-sectional model-based quantitative census covering factors such as age, gender, NSFW content scoring, class-wise accuracy, human-cardinality-analysis, and the semanticity of the image class information in order to statistically investigate the extent and subtleties of ethical transgressions. We then use the census to help hand-curate a look-up-table of images in the ImageNet-ILSVRC-2012 dataset that fall into the categories of verifiably pornographic: shot in a non-consensual setting (up-skirt), beach voyeuristic, and exposed private parts. We survey the landscape of harm and threats both society broadly and individuals face due to uncritical and ill-considered dataset curation practices. We then propose possible courses of correction and critique the pros and cons of these. We have duly open-sourced all of the code and the census meta-datasets generated in this endeavor for the computer vision community to build on. By unveiling the severity of the threats, our hope is to motivate the constitution of mandatory Institutional Review Boards (IRB) for large scale dataset curation processes.

artificial intelligence, deep learning, machine learning, (17 more...)

2006.16923

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Ireland (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(9 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Media (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.66)

#artificialintelligenceJul-22-2020, 09:55:57 GMT

Adaptive Thresholds

Performance monitoring is typically based on comparing measurable values against a set of threshold values. In theory, the IT operations team determines what the thresholds for warnings and alerts should be and sets them. In practice, they usually have no idea what the appropriate values should be. For example, "response time" usually varies based on the time of day and day of week. At night, when the network load is negligible, response times would likely be minimal, too.

artificial intelligence, machine learning, threshold, (10 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.42)

Nazmi, Shabnam, Yan, Xuyang, Homaifar, Abdollah, Doucette, Emily

Evolving Multi-label Classification Rules by Exploiting High-order Label Correlation

arXiv.org Machine LearningJul-22-2020

In multi-label classification tasks, each problem instance is associated with multiple classes simultaneously. In such settings, the correlation between labels contains valuable information that can be used to obtain more accurate classification models. The correlation between labels can be exploited at different levels such as capturing the pair-wise correlation or exploiting the higher-order correlations. Even though the high-order approach is more capable of modeling the correlation, it is computationally more demanding and has scalability issues. This paper aims at exploiting the high-order label correlation within subsets of labels using a supervised learning classifier system (UCS). For this purpose, the label powerset (LP) strategy is employed and a prediction aggregation within the set of the relevant labels to an unseen instance is utilized to increase the prediction capability of the LP method in the presence of unseen labelsets. Exact match ratio and Hamming loss measures are considered to evaluate the rule performance and the expected fitness value of a classifier is investigated for both metrics. Also, a computational complexity analysis is provided for the proposed algorithm. The experimental results of the proposed method are compared with other well-known LP-based methods on multiple benchmark datasets and confirm the competitive performance of this method.

artificial intelligence, inductive learning, machine learning, (15 more...)

2007.11609

Country:

Oceania > New Zealand (0.04)
North America > United States > North Carolina > Guilford County > Greensboro (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)
North America > United States > Alabama (0.04)

Genre: Research Report (0.64)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Kazhuparambil, Subramaniam, Kaushik, Abhishek

Cooking Is All About People: Comment Classification On Cookery Channels Using BERT and Classification Models (Malayalam-English Mix-Code)

arXiv.org Machine LearningJul-22-2020

The scope of a lucrative career promoted by Google through its video distribution platform YouTube has attracted a large number of users to become content creators. An important aspect of this line of work is the feedback received in the form of comments which show how well the content is being received by the audience. However, volume of comments coupled with spam and limited tools for comment classification makes it virtually impossible for a creator to go through each and every comment and gather constructive feedback. Automatic classification of comments is a challenge even for established classification models, since comments are often of variable lengths riddled with slang, symbols and abbreviations. This is a greater challenge where comments are multilingual as the messages are often rife with the respective vernacular. In this work, we have evaluated top-performing classification models for classifying comments which are a mix of different combinations of English and Malayalam (only English, only Malayalam and Mix of English and Malayalam). The statistical analysis of results indicates that Multinomial Naive Bayes, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest and Decision Trees offer similar level of accuracy in comment classification. Further, we have also evaluated 3 multilingual transformer based language models (BERT, DISTILBERT and XLM) and compared their performance to the traditional machine learning classification techniques. XLM was the top-performing BERT model with an accuracy of 67.31. Random Forest with Term Frequency Vectorizer was the best performing model out of all the traditional classification models with an accuracy of 63.59.

artificial intelligence, classification, machine learning, (16 more...)

2007.04249

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Leisure & Entertainment (0.68)
Information Technology > Services (0.68)
Health & Medicine (0.67)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.90)

Dallakyan, Aramayis, Pourahmadi, Mohsen

Fused-Lasso Regularized Cholesky Factors of Large Nonstationary Covariance Matrices of Longitudinal Data

arXiv.org Machine LearningJul-21-2020

Smoothness of the subdiagonals of the Cholesky factor of large covariance matrices is closely related to the degrees of nonstationarity of autoregressive models for time series and longitudinal data. Heuristically, one expects for a nearly stationary covariance matrix the entries in each subdiagonal of the Cholesky factor of its inverse to be nearly the same in the sense that sum of absolute values of successive terms is small. Statistically such smoothness is achieved by regularizing each subdiagonal using fused-type lasso penalties. We rely on the standard Cholesky factor as the new parameters within a regularized normal likelihood setup which guarantees: (1) joint convexity of the likelihood function, (2) strict convexity of the likelihood function restricted to each subdiagonal even when $n

covariance matrix, matrix, subdiagonal, (15 more...)

2007.11168

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Data Science (0.66)

Abraham, Louis, Becigneul, Gary, Coleman, Benjamin, Scholkopf, Bernhard, Shrivastava, Anshumali, Smola, Alexander

Bloom Origami Assays: Practical Group Testing

arXiv.org Machine LearningJul-21-2020

We study the problem usually referred to as group testing in the context of COVID-19. Given n samples collected from patients, how should we select and test mixtures of samples to maximize information and minimize the number of tests? Group testing is a well-studied problem with several appealing solutions, but recent biological studies impose practical constraints for COVID-19 that are incompatible with traditional methods. Furthermore, existing methods use unnecessarily restrictive solutions, which were devised for settings with more memory and compute constraints than the problem at hand. This results in poor utility. In the new setting, we obtain strong solutions for small values of n using evolutionary strategies. We then develop a new method combining Bloom filters with belief propagation to scale to larger values of n (more than 100) with good empirical results. We also present a more accurate decoding algorithm that is tailored for specific COVID-19 settings. This work demonstrates the practical gap between dedicated algorithms and well-known generic solutions. Our efforts results in a new and practical multiplex method yielding strong empirical performance without mixing more than a chosen number of patients into the same probe. Finally, we briefly discuss adaptive methods, casting them into the framework of adaptive sub-modularity.

artificial intelligence, bayesian inference, machine learning, (20 more...)

2008.02641

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Cetin, Irem, Petersen, Steffen E., Napel, Sandy, Camara, Oscar, Ballester, Miguel Angel Gonzalez, Lekadir, Karim

A radiomics approach to analyze cardiac alterations in hypertension

arXiv.org Machine LearningJul-21-2020

Hypertension is a medical condition that is well-established as a risk factor for many major diseases. For example, it can cause alterations in the cardiac structure and function over time that can lead to heart related morbidity and mortality. However, at the subclinical stage, these changes are subtle and cannot be easily captured using conventional cardiovascular indices calculated from clinical cardiac imaging. In this paper, we describe a radiomics approach for identifying intermediate imaging phenotypes associated with hypertension. The method combines feature selection and machine learning techniques to identify the most subtle as well as complex structural and tissue changes in hypertensive subgroups as compared to healthy individuals. Validation based on a sample of asymptomatic hearts that include both hypertensive and non-hypertensive cases demonstrate that the proposed radiomics model is capable of detecting intensity and textural changes well beyond the capabilities of conventional imaging phenotypes, indicating its potential for improved understanding of the longitudinal effects of hypertension on cardiovascular health and disease.

artificial intelligence, hypertension, machine learning, (14 more...)

doi: 10.1109/ISBI.2019.8759440

2007.10717

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
North America > United States (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Germany (0.04)

Genre: Research Report > Experimental Study (0.34)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.48)

#artificialintelligenceJul-20-2020, 21:40:31 GMT

Common Practices -- Part 4

These are the lecture notes for FAU's YouTube Lecture "Deep Learning". This is a full transcript of the lecture video & matching slides. We hope, you enjoy this as much as the videos. Of course, this transcript was created with deep learning techniques largely automatically and only minor manual modifications were performed. If you spot mistakes, please let us know!

artificial intelligence, classifier, machine learning, (16 more...)

Genre:

Research Report (0.70)
Instructional Material > Course Syllabus & Notes (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

#artificialintelligenceJul-20-2020, 16:36:24 GMT

20 Questions to excel in Machine Learning Interview

Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm you're using. This can lead to the model underfitting your data, making it hard for it to have high predictive accuracy and for you to generalize your knowledge from the training set to the test set. Variance is error due to too much complexity in the learning algorithm you're using. This leads to the algorithm being highly sensitive to high degrees of variation in your training data, which can lead your model to overfit the data. You'll be carrying too much noise from your training data for your model to be very useful for your test data.

artificial intelligence, machine learning, probability, (14 more...)

Industry: Health & Medicine > Therapeutic Area (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.74)