AITopics

doi: 10.1016/j.future.2017.08.053

1608.00621

Country:

North America > United States (1.00)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Agrawal, Amritanshu, Fu, Wei, Menzies, Tim

What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering)

arXiv.org Artificial Intelligence, Nov-7-2017

Context: Topic modeling finds human-readable structures in unstructured textual data. A widely used topic modeler is Latent Dirichlet Allocation (LDA). When run on different datasets, LDA suffers from "order effects", i.e., different topics are generated if the order of the training data is shuffled. Such order effects introduce a systematic error into any study that relies on LDA. This error can lead to misleading results; specifically, inaccurate topic descriptions and reduced efficacy of text mining classification. Objective: To provide a method in which the distributions generated by LDA are more stable and can be used for further analysis. Method: We use LDADE, a search-based software engineering tool that tunes LDA's parameters using DE (Differential Evolution). LDADE is evaluated on data from a programmer information exchange site (Stackoverflow), the title and abstract text of thousands of Software Engineering (SE) papers, and software defect reports from NASA. Results were collected across different implementations of LDA (Python+Scikit-Learn, Scala+Spark), across different platforms (Linux, Macintosh), and for different kinds of LDA (VEM, or using Gibbs sampling). Results were scored via topic stability and text mining classification accuracy. Results: In all treatments: (i) standard LDA exhibits very large topic instability; (ii) LDADE's tunings dramatically reduce cluster instability; (iii) LDADE also leads to improved performance for supervised as well as unsupervised learning. Conclusion: Due to topic instability, using standard LDA with its "off-the-shelf" settings should now be deprecated. Also, in future, we should require SE papers that use LDA to test and (if needed) mitigate LDA topic instability. Finally, LDADE is a candidate technology for effectively and efficiently reducing that instability.
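To make the "tune LDA's parameters using DE" step concrete, here is a minimal sketch of a generic DE/rand/1/bin loop in Python. It is not LDADE's implementation: the objective `toy_instability`, the choice of tuning the number of topics k and the Dirichlet priors alpha and beta, the bounds, and the DE settings (population size, F, CR, generations) are all illustrative assumptions. In a real run, the objective would fit LDA on several shuffled orderings of the corpus and score how much the resulting topics disagree, and k would be rounded to an integer.

```python
import random


def differential_evolution(objective, bounds, pop_size=10, f=0.5, cr=0.9,
                           generations=50, seed=0):
    """Minimize `objective` over a box with the classic DE/rand/1/bin scheme.

    bounds is a list of (low, high) pairs, one per tuned parameter.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(value, k):
        low, high = bounds[k]
        return min(max(value, low), high)

    # Random initial population drawn uniformly inside the bounds.
    pop = [[rng.uniform(low, high) for low, high in bounds] for _ in range(pop_size)]
    scores = [objective(p) for p in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct members, none equal to the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            # Binomial crossover between the mutant vector and member i.
            trial = [
                clip(pop[a][k] + f * (pop[b][k] - pop[c][k]), k)
                if rng.random() < cr or k == forced else pop[i][k]
                for k in range(dim)
            ]
            trial_score = objective(trial)
            # Greedy selection: the trial replaces member i if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_instability(params):
    # Hypothetical stand-in for the topic instability of LDA at (k, alpha, beta);
    # a real objective would fit LDA on shuffled copies of the corpus.
    k, alpha, beta = params
    return (k - 20) ** 2 / 100 + (alpha - 0.3) ** 2 + (beta - 0.6) ** 2


best, score = differential_evolution(
    toy_instability, [(10, 100), (0.0, 1.0), (0.0, 1.0)], pop_size=20, generations=200
)
```

DE only needs objective values, not gradients, which suits a noisy, non-differentiable score such as topic overlap across shuffled runs.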

data mining, evolutionary algorithm, machine learning

arXiv.org Artificial Intelligence

1608.08176

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Machine Learning, Nov-6-2017

End-to-End Abnormality Detection in Medical Imaging

Wu, Dufan, Kim, Kyungsang, Dong, Bin, Li, Quanzheng

Nearly all deep learning based image analysis methods work on reconstructed images, which are obtained from the original acquisitions by solving inverse problems. The reconstruction algorithms are designed for human observers and are not necessarily optimized for DNNs. It is therefore desirable to train DNNs directly on the original data, which lie in a different domain from the images. In this work, we propose an end-to-end DNN for abnormality detection in medical imaging. A DNN was built as the unrolled version of an iterative reconstruction algorithm to map the acquisitions to images, followed by a 3D convolutional neural network (CNN) that detects the abnormality in the reconstructed images. The two networks were trained jointly in order to optimize the entire DNN for the detection task from the original acquisitions. The DNN was implemented for lung nodule detection in low-dose chest CT. The proposed end-to-end DNN demonstrated better sensitivity and accuracy for the task than a two-step approach in which the reconstruction and detection DNNs were trained separately. A significant reduction in the false positive rate on suspicious lesions was observed, which is crucial given the known over-diagnosis in low-dose lung CT imaging. The images reconstructed by the proposed end-to-end network also presented enhanced details in the region of interest.
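The idea of building a network as an "unrolled" iterative reconstruction can be illustrated with plain gradient descent on a least-squares data term: a fixed number of iterations becomes a fixed number of layers, and each step size could become a trainable weight. This is a minimal sketch under assumptions: the tiny matrix `a` stands in for a CT forward projector, the step sizes are fixed rather than learned, and neither the paper's actual unrolled architecture nor the 3D CNN detector is reproduced.

```python
def matvec(a, x):
    # Dense matrix-vector product on nested lists.
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def unrolled_reconstruction(a, y, step_sizes):
    """Apply one gradient step on 0.5 * ||A x - y||^2 per entry of step_sizes.

    Unrolling fixes the number of iterations, so each step size could be
    treated as a trainable weight of a network layer.
    """
    at = transpose(a)
    x = [0.0] * len(at)  # start from a zero image
    for step in step_sizes:
        residual = [ax - yi for ax, yi in zip(matvec(a, x), y)]
        grad = matvec(at, residual)  # A^T (A x - y)
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


# Toy 2x2 "forward projector" and measurements; the exact solution is [1, 2].
a = [[1.0, 0.0], [0.0, 2.0]]
y = [1.0, 4.0]
x_hat = unrolled_reconstruction(a, y, step_sizes=[0.2] * 50)
```

In an end-to-end setup, gradients of the detection loss would flow back through these unrolled steps, which is what lets reconstruction and detection be optimized jointly.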

artificial intelligence, machine learning, neural network

1711.02074

Country:

Asia (0.29)
North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning, Nov-6-2017

Trimmed Density Ratio Estimation

Liu, Song, Takeda, Akiko, Suzuki, Taiji, Fukumizu, Kenji

Density ratio estimation (DRE) [18, 11, 27] is an important tool in various branches of machine learning and statistics. Due to its ability to directly model the difference between two probability density functions, DRE finds applications in change detection [13, 6], two-sample testing [32] and outlier detection [1, 26]. In recent years, a sampling framework called the Generative Adversarial Network (GAN) (see e.g., [9, 19]) has used the density ratio function to compare artificial samples from a generative distribution with real samples from an unknown distribution. DRE has also been widely discussed in the statistical literature for adjusting nonparametric density estimation [5], stabilizing the estimation of heavy-tailed distributions [7] and fitting multiple distributions at once [8]. However, as a density ratio function can grow unbounded, DRE can suffer from robustness and stability issues: a few corrupted points may completely mislead the estimator (see Figure 2 in Section 6 for example).
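To see why "a density ratio function can grow unbounded", consider two zero-mean Gaussians where the numerator has the wider spread. The sketch below evaluates r(x) = p(x)/q(x) for p = N(0, 2^2) and q = N(0, 1), which works out to 0.5 * exp(3x^2/8). The helper `trim_largest`, which simply drops the largest ratio values, is a hypothetical simplification meant only to convey the intuition behind trimming; it is not the estimator proposed in the paper.

```python
import math


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def density_ratio(x):
    # r(x) = p(x) / q(x) with p = N(0, 2^2) and q = N(0, 1).
    # Analytically r(x) = 0.5 * exp(3 x^2 / 8), which is unbounded in |x|.
    return gaussian_pdf(x, 0.0, 2.0) / gaussian_pdf(x, 0.0, 1.0)


def trim_largest(values, fraction):
    """Drop the given fraction of values, starting from the largest."""
    keep = len(values) - int(fraction * len(values))
    return sorted(values)[:keep]


points = [0.0, 1.0, 2.0, 4.0, 6.0]
ratios = [density_ratio(x) for x in points]
trimmed = trim_largest(ratios, 0.2)  # removes the single largest ratio (x = 6)
```

A single point far in the tail (here x = 6) has a ratio orders of magnitude larger than the rest, which is exactly the kind of point that can dominate an untrimmed estimator.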

artificial intelligence, data mining, machine learning

1703.03216

Country: Asia (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.48)

#artificialintelligence, Nov-5-2017, 16:20:12 GMT

Artificial intelligence helps detect ovarian cancer early and accurately

Ovarian cancer is difficult to diagnose, particularly in its early stages, when survival rates are much higher. Because there is no consistently reliable screening test to detect ovarian cancer, most women are diagnosed with the disease when it is at an advanced stage. However, researchers at Brigham and Women's Hospital and Dana-Farber Cancer Institute have developed a non-invasive diagnostic test that uses artificial intelligence to accurately detect true cases of early-stage disease. Results of their study were published online this week in the journal eLife. By combining next-generation sequencing with artificial intelligence, the researchers have created a novel blood test based on serum microRNAs (small, non-coding pieces of genetic material that help control where and when genes are activated) for the early diagnosis of ovarian cancer.

cancer, detect ovarian cancer, ovarian cancer

Country: Europe > Poland (0.05)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Ovarian Cancer (1.00)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.57)

#artificialintelligence, Nov-5-2017, 11:05:26 GMT

Testing Machine Learning Algorithms with K-Fold Cross Validation - Talend

In an earlier post on Applying Machine Learning to IoT Sensors, I discussed the process for classifying sensor data with a machine learning algorithm. In this post, I'll give some background on choosing an algorithm and then on using a validation technique. For the technique, I'll show how to apply it and how it can be built in Talend Studio without hand coding. Given a prediction scenario involving a machine learning algorithm, the first question to ask is: which machine learning algorithm is appropriate? Taking the example of predicting a user's activity from mobile phone accelerometer data, we must be able to assign each observation to a category (resting, walking, or running).
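The post builds k-fold cross validation in Talend Studio without hand coding; for comparison, here is a minimal hand-coded sketch of the same validation idea in Python. The majority-class "model" and the toy activity labels are hypothetical placeholders for a real classifier trained on accelerometer features.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k-fold cross validation.

    Folds are contiguous; shuffle the data first if it is ordered (for
    example, if all 'resting' rows come before all 'walking' rows).
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def majority_class_accuracy(train_labels, test_labels):
    # Hypothetical baseline "model": always predict the most common training label.
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(label == majority for label in test_labels) / len(test_labels)


labels = ["resting", "walking", "running", "walking"] * 5  # toy activity labels
fold_scores = [
    majority_class_accuracy([labels[i] for i in train], [labels[i] for i in test])
    for train, test in k_fold_indices(len(labels), k=5)
]
mean_accuracy = sum(fold_scores) / len(fold_scores)
```

Every observation is used for testing exactly once and for training k - 1 times, so the averaged fold score is a less optimistic estimate than accuracy measured on the training data.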

artificial intelligence, machine learning, training dataset

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.45)

#artificialintelligence, Nov-4-2017, 23:15:23 GMT

How to assess quality and correctness of classification models? Part 4 - ROC Curve

We test the classifier at different alpha thresholds. Recall that alpha is the threshold on the estimated probability above which an observation is assigned to one category (the positive class), and below which it is assigned to the other category (the negative class).
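Sweeping alpha and recording the false positive rate and true positive rate at each value is what traces out the ROC curve. The sketch below does this for a handful of toy scores and labels (hypothetical data) and integrates the curve with the trapezoid rule to get the AUC. Whether a score exactly equal to alpha counts as positive is a convention; this sketch assumes `>=`.

```python
def roc_points(scores, labels, thresholds):
    """Return (FPR, TPR) for each threshold alpha.

    An observation is assigned to the positive class when its estimated
    probability is >= alpha (using >= rather than > is a convention choice).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for alpha in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= alpha and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= alpha and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    # Area under the ROC curve, integrating TPR over FPR with the trapezoid rule.
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


scores = [0.9, 0.8, 0.4, 0.3]  # toy estimated probabilities
labels = [1, 0, 1, 0]          # toy true classes
points = roc_points(scores, labels, thresholds=[1.0, 0.85, 0.5, 0.35, 0.0])
```

For these toy values the curve passes through (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1) and (1, 1), giving an AUC of 0.75: three of the four positive/negative pairs are ranked correctly.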

artificial intelligence, machine learning, quality and correctness

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)

Pleiss, Geoff, Raghavan, Manish, Wu, Felix, Kleinberg, Jon, Weinberger, Kilian Q.

On Fairness and Calibration

arXiv.org Machine Learning, Nov-3-2017

The machine learning community has become increasingly concerned with the potential for bias and discrimination in predictive models. This has motivated a growing line of work on what it means for a classification procedure to be "fair." In this paper, we investigate the tension between minimizing error disparity across different population groups and maintaining calibrated probability estimates. We show that calibration is compatible only with a single error constraint (i.e., equal false-negative rates across groups), and show that any algorithm that satisfies this relaxation is no better than randomizing a percentage of the predictions of an existing classifier. These unsettling findings, which extend and generalize existing results, are empirically confirmed on several datasets.
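One way to read "randomizing a percentage of predictions" is sketched below: for the group with the lower error, replace the classifier's score h(x) with the group's base rate mu with probability alpha. Because mu ignores x, a classifier that is calibrated within the group stays calibrated, while the group's generalized (score-based) false-negative rate moves toward 1 - mu. The function names, the toy scores and the target rate are hypothetical, and this is our illustration of the idea rather than the authors' code.

```python
def generalized_fnr(scores, labels):
    """Mean of (1 - h(x)) over true positives: a score-based false-negative rate."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(1.0 - s for s in positives) / len(positives)


def mixing_rate(group_fnr, base_rate, target_fnr):
    """Probability alpha of replacing h(x) with the group base rate mu.

    With probability alpha the score becomes mu, otherwise it stays h(x), so
    the expected generalized FNR is (1 - alpha) * group_fnr + alpha * (1 - mu).
    Outputting mu ignores x, so a calibrated h stays calibrated within the group.
    """
    ceiling = 1.0 - base_rate
    if not min(group_fnr, ceiling) <= target_fnr <= max(group_fnr, ceiling):
        raise ValueError("target FNR is not reachable by mixing in the base rate")
    if ceiling == group_fnr:
        return 0.0
    return (target_fnr - group_fnr) / (ceiling - group_fnr)


# Hypothetical scores and labels for one group.
scores = [0.9, 0.6, 0.2, 0.1]
labels = [1, 1, 0, 0]
fnr = generalized_fnr(scores, labels)
mu = sum(labels) / len(labels)                # group base rate
alpha = mixing_rate(fnr, mu, target_fnr=0.3)  # raise this group's FNR to 0.3
```

Equalizing the error this way only adds noise to the better-served group, which is why the paper describes such algorithms as no better than randomizing part of an existing classifier's output.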

artificial intelligence, classifier, machine learning

1709.02012

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.46)

Industry: Law > Civil Rights & Constitutional Law (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

#artificialintelligence, Nov-2-2017, 00:20:04 GMT

New blood test developed to diagnose ovarian cancer

Investigators from Brigham and Women's Hospital and Dana-Farber Cancer Institute are leveraging the power of artificial intelligence to develop a new technique to detect ovarian cancer early and accurately. The team has identified a network of circulating microRNAs (small, non-coding pieces of genetic material) that are associated with the risk of ovarian cancer and can be detected in a blood sample. Their findings are published online in eLife. Most women are diagnosed with ovarian cancer when the disease is at an advanced stage, at which point only about a quarter of patients survive for at least five years. But for women whose cancer is serendipitously detected at an early stage, survival rates are much higher.

artificial intelligence, machine learning, ovarian cancer

Country:

North America > United States (0.30)
Europe > Poland > Łódź Province > Łódź (0.05)

Genre: Research Report > New Finding (0.69)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Ovarian Cancer (1.00)
Government > Regional Government > North America Government > United States Government > FDA (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.52)

Bouchard, Kristofer E., Bujan, Alejandro F., Roosta-Khorasani, Farbod, Ubaru, Shashanka, Prabhat, Snijders, Antoine M., Mao, Jian-Hua, Chang, Edward F., Mahoney, Michael W., Bhattacharyya, Sharmodeep

Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction

arXiv.org Machine Learning, Nov-2-2017

The increasing size and complexity of scientific data could dramatically enhance discovery and prediction for basic scientific applications. Realizing this potential, however, requires novel statistical analysis methods that are both interpretable and predictive. We introduce Union of Intersections (UoI), a flexible, modular, and scalable framework for enhanced model selection and estimation. Methods based on UoI perform model selection and model estimation through intersection and union operations, respectively. We show that UoI-based methods achieve low-variance and nearly unbiased estimation of a small number of interpretable features, while maintaining high-quality prediction accuracy. We perform extensive numerical investigation to evaluate a UoI algorithm ($UoI_{Lasso}$) on synthetic and real data. In doing so, we demonstrate the extraction of interpretable functional networks from human electrophysiology recordings as well as accurate prediction of phenotypes from genotype-phenotype data with reduced features. We also show (with the $UoI_{L1Logistic}$ and $UoI_{CUR}$ variants of the basic framework) improved prediction parsimony for classification and matrix factorization on several benchmark biomedical data sets. These results suggest that methods based on the UoI framework could improve interpretation and prediction in data-driven discovery across scientific fields.
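The two set operations that give UoI its name can be sketched without any model fitting. In this minimal illustration, the supports selected by bootstrap fits at each regularization level, and the coefficients chosen on each estimation bootstrap, are given as hypothetical precomputed inputs; the Lasso fits and the held-out comparison that chooses among candidate supports are not shown, and plain averaging is one assumed way to combine the estimates.

```python
def candidate_supports(supports_by_lambda):
    """Selection by intersection: for each regularization level, keep only the
    features selected in every bootstrap fit at that level."""
    return [frozenset.intersection(*map(frozenset, runs)) for runs in supports_by_lambda]


def bagged_estimate(per_bootstrap_coefs):
    """Estimation by union: average the coefficient dicts chosen on each
    estimation bootstrap; the result's support is the union of their supports."""
    totals = {}
    for coefs in per_bootstrap_coefs:
        for feature, value in coefs.items():
            totals[feature] = totals.get(feature, 0.0) + value
    n = len(per_bootstrap_coefs)
    return {feature: value / n for feature, value in totals.items()}


# Hypothetical supports from three bootstrap fits at two regularization levels.
supports = candidate_supports([[{0, 1, 2}, {0, 1}, {0, 1, 3}], [{0}, {0, 2}, {0}]])
# Hypothetical coefficients chosen on two estimation bootstraps.
estimate = bagged_estimate([{0: 1.0, 1: 0.5}, {0: 3.0}])
```

Intersecting across resamples removes features that appear only by chance (low false positives), while averaging over resamples reduces the variance of the surviving coefficients.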

algorithm, artificial intelligence, machine learning

1705.07585

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)