Accuracy
Big data is used to sentence criminals. Can algorithms predict future risk?
In 2013, a man named Eric L. Loomis was sentenced for eluding police and driving a car without the owner's consent. When the judge weighed Loomis' sentence, he considered an array of evidence, including the results of an automated risk assessment tool called COMPAS. Loomis' COMPAS score indicated he was at a "high risk" of committing new crimes. Considering this prediction, the judge sentenced him to seven years. Loomis challenged his sentence, arguing it was unfair to use the data-driven score against him.
50 Questions to Test True Data Science Knowledge
Explain what regularization is and why it is useful. What are the benefits and drawbacks of specific methods, such as ridge regression and LASSO? Explain what a local optimum is and why it is important in a specific context, such as k-means clustering. What are specific ways for determining if you have a local optimum problem? What can be done to avoid local optima?
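As a small illustration of the ridge versus LASSO question, the sketch below solves both penalized least-squares problems in closed form for a single feature with no intercept. The function names and the toy data are hypothetical, and the objectives are assumed to be (1/2)·Σ(y − w·x)² plus either (λ/2)·w² for ridge or λ·|w| for LASSO. Under those assumptions, ridge shrinks the coefficient smoothly, while LASSO can set it exactly to zero.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0.0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ridge_1d(xs, ys, lam):
    """Minimize 0.5 * sum((y - w*x)^2) + 0.5 * lam * w^2 (assumed objective)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_1d(xs, ys, lam):
    """Minimize 0.5 * sum((y - w*x)^2) + lam * |w| (assumed objective)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam) / sxx


if __name__ == "__main__":
    # Hypothetical toy data with a weak linear signal (least-squares slope 0.1).
    xs = [1.0, 2.0, 3.0]
    ys = [0.1, 0.2, 0.3]
    print("ridge:", ridge_1d(xs, ys, lam=1.0))
    print("lasso:", lasso_1d(xs, ys, lam=2.0))
```

With a large enough penalty the LASSO coefficient is dropped entirely, which is why it doubles as a feature selector. Ridge keeps every feature but with a smaller weight.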
Ten Steps of EM Suffice for Mixtures of Two Gaussians
Daskalakis, Constantinos, Tzamos, Christos, Zampetakis, Manolis
The Expectation-Maximization (EM) algorithm is a widely used method for maximum likelihood estimation in models with latent variables. For estimating mixtures of Gaussians, its iteration can be viewed as a soft version of the k-means clustering algorithm. Despite its wide use and applications, there are essentially no known convergence guarantees for this method. We provide global convergence guarantees for mixtures of two Gaussians with known covariance matrices. We show that the population version of EM, where the algorithm is given access to infinitely many samples from the mixture, converges geometrically to the correct mean vectors, and provide simple, closed-form expressions for the convergence rate. As a simple illustration, we show that, in one dimension, ten steps of the EM algorithm initialized at infinity result in less than 1\% error estimation of the means. In the finite sample regime, we show that, under a random initialization, $\tilde{O}(d/\epsilon^2)$ samples suffice to compute the unknown vectors to within $\epsilon$ in Mahalanobis distance, where $d$ is the dimension. In particular, the error rate of the EM based estimator is $\tilde{O}\left(\sqrt{d \over n}\right)$ where $n$ is the number of samples, which is optimal up to logarithmic factors.
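To make the EM iteration concrete, here is a minimal one-dimensional sketch. It assumes a simplified setting that is not spelled out in the abstract: a balanced mixture 0.5·N(μ, σ²) + 0.5·N(−μ, σ²) with known σ. In that setting the E-step and M-step collapse into the single update μ ← mean(x · tanh(μ·x / σ²)). It runs on a finite sample rather than the population version, a large finite starting value stands in for "initialized at infinity", and the function names and parameter values are hypothetical.

```python
import math
import random


def draw_samples(true_mu, n, sigma=1.0, seed=0):
    """Draw n points from 0.5*N(true_mu, sigma^2) + 0.5*N(-true_mu, sigma^2)."""
    rng = random.Random(seed)
    return [
        rng.gauss(true_mu if rng.random() < 0.5 else -true_mu, sigma)
        for _ in range(n)
    ]


def em_two_gaussians(samples, mu_init, sigma=1.0, steps=10):
    """Run EM for the symmetric two-component mixture; return all iterates."""
    mu = mu_init
    history = [mu]
    for _ in range(steps):
        # tanh(mu * x / sigma^2) is the difference between the posterior
        # probabilities of the +mu and -mu components (E-step); averaging
        # x times that difference is the M-step for mu.
        mu = sum(x * math.tanh(mu * x / sigma**2) for x in samples) / len(samples)
        history.append(mu)
    return history


if __name__ == "__main__":
    data = draw_samples(true_mu=2.0, n=20000)
    for step, estimate in enumerate(em_two_gaussians(data, mu_init=10.0)):
        print(step, round(estimate, 4))
```

Printing the iterates shows how quickly the estimate settles when the two components are well separated. The exact values depend on the sample drawn.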
Embedding Feature Selection for Large-scale Hierarchical Classification
Large-scale Hierarchical Classification (HC) involves datasets consisting of thousands of classes and millions of training instances with high-dimensional features, posing several big data challenges. Feature selection, which aims to select the subset of discriminant features, is an effective strategy for dealing with the large-scale HC problem. It speeds up the training process, reduces the prediction time and minimizes the memory requirements by compressing the total size of learned model weight vectors. The majority of studies have also shown feature selection to be competent and successful in improving the classification accuracy by removing irrelevant features. In this work, we investigate various filter-based feature selection methods for dimensionality reduction to solve the large-scale HC problem. Our experimental evaluation on text and image datasets with varying distribution of features, classes and instances shows up to 3x order of speed-up on massive datasets and up to 45% less memory requirements for storing the weight vectors of the learned model, without any significant loss (and with improvement for some datasets) in the classification accuracy. Source Code: https://cs.gmu.edu/~mlbio/featureselection.
Graphical Nonconvex Optimization for Optimal Estimation in Gaussian Graphical Models
Sun, Qiang, Tan, Kean Ming, Liu, Han, Zhang, Tong
We consider the problem of learning high-dimensional Gaussian graphical models. The graphical lasso is one of the most popular methods for estimating Gaussian graphical models. However, it does not achieve the oracle rate of convergence. In this paper, we propose the graphical nonconvex optimization for optimal estimation in Gaussian graphical models, which is then approximated by a sequence of convex programs. Our proposal is computationally tractable and produces an estimator that achieves the oracle rate of convergence. The statistical error introduced by the sequential approximation using the convex programs is clearly demonstrated via a contraction property. The rate of convergence can be further improved using the notion of sparsity pattern. The proposed methodology is then extended to semiparametric graphical models. We show through numerical studies that the proposed estimator outperforms other popular methods for estimating Gaussian graphical models.
UFC 212 Betting Odds, PPV Info For Jose Aldo vs. Max Holloway And Entire Fight Card
The rightful owner of the title Conor McGregor never defended will finally be determined Saturday night. The UFC featherweight championship fight between Jose Aldo and Max Holloway highlights UFC 212 from Rio de Janeiro, Brazil. The main event is the only championship fight on the card, and the latest UFC 212 betting odds indicate that the current champion will hold onto his belt. The event costs $69.99 to order in HD on pay-per-view, and it's scheduled to start at 9 p.m. EDT. Fans can watch the fights with a live stream online by ordering UFC 212 at ufc.tv for $59.99.
Multiple Kernel Learning and Automatic Subspace Relevance Determination for High-dimensional Neuroimaging Data
Ayhan, Murat Seckin, Raghavan, Vijay, Alzheimer's Disease Neuroimaging Initiative
Alzheimer's disease is a major cause of dementia. Its diagnosis requires accurate biomarkers that are sensitive to disease stages. In this respect, we regard probabilistic classification as a method of designing a probabilistic biomarker for disease staging. Probabilistic biomarkers naturally support the interpretation of decisions and evaluation of uncertainty associated with them. In this paper, we obtain probabilistic biomarkers via Gaussian Processes. Gaussian Processes enable probabilistic kernel machines that offer flexible means to accomplish Multiple Kernel Learning. Exploiting this flexibility, we propose a new variation of Automatic Relevance Determination and tackle the challenges of high dimensionality through multiple kernels. Our research results demonstrate that the Gaussian Process models are competitive with or better than the well-known Support Vector Machine in terms of classification performance even in the cases of single kernel learning. Extending the basic scheme towards the Multiple Kernel Learning, we improve the efficacy of the Gaussian Process models and their interpretability in terms of the known anatomical correlates of the disease. For instance, the disease pathology starts in and around the hippocampus and entorhinal cortex. Through the use of Gaussian Processes and Multiple Kernel Learning, we have automatically and efficiently determined those portions of neuroimaging data. In addition to their interpretability, our Gaussian Process models are competitive with recent deep learning solutions under similar settings.
The machine learning paradox
The O'Reilly Artificial Intelligence conference in New York is June 26-29, 2017. To train a machine learning system, you start with a lot of training data: millions of photos, for example. You divide that data into a training set and a test set. You use the training set to "train" the system so it can identify those images correctly. Then you use the test set to see how well the training works: how good is it at labeling a different set of images?
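The train/test procedure described above can be sketched in a few lines. This is a minimal illustration, not any particular library's API. The helper names, the 80/20 split, and the toy one-dimensional "model" (a learned threshold) are all assumptions made for the example.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Toy 'training': put the decision threshold midway between class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, examples):
    """Fraction of examples whose label matches the thresholded prediction."""
    correct = sum(1 for x, y in examples if (1 if x > threshold else 0) == y)
    return correct / len(examples)


if __name__ == "__main__":
    # Hypothetical labeled data: the label is 1 when the feature exceeds 0.5.
    rng = random.Random(42)
    data = [(x, 1 if x > 0.5 else 0) for x in (rng.random() for _ in range(1000))]
    train, test = train_test_split(data)
    threshold = fit_threshold(train)
    print("train accuracy:", accuracy(threshold, train))
    print("test accuracy:", accuracy(threshold, test))
```

The test accuracy is the number that matters here, because the model never saw those examples while the threshold was being fitted.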
Blockchains for Artificial Intelligence - The BigchainDB Blog
[This article was first published on Dataconomy on Dec 21, 2016; I'm reposting here for ease of access.] In recent years, AI (artificial intelligence) researchers have finally cracked problems that they've worked on for decades, from Go to human-level speech recognition. A key piece was the ability to gather and learn on mountains of data, which pulled error rates past the success line. In short, big data has transformed AI, to an almost unreasonable level. Blockchain technology could transform AI too, in its own particular ways. Some applications of blockchains to AI are mundane, like audit trails on AI models. Some appear almost unreasonable, like AI that can own itself -- AI DAOs. All of them are opportunities. This article will explore these applications. Before we discuss applications, let's first review what's different about blockchains compared to traditional big-data distributed databases like MongoDB. We can think of blockchains as "blue ocean" databases: they escape the "bloody red ocean" of sharks competing in an existing market, opting instead to be in a blue ocean of uncontested market space.
Using Artificial Intelligence to Reduce Customer Churn in Private Banking Ayasdi
It is no secret that private banking is in turmoil. While our view is that large banks possess a massive competitive advantage given the amount of data they create, trade in and see, private banking is an area of concern. Technology driven start-ups have made real inroads with millennials and private wealth management growth has stalled at many banks. Given the fixed cost nature of the supporting infrastructure this can quickly eat into earnings. There are a number of areas that banks can focus on, from aligning costs more effectively to enhancing the customer experience, but one chronic and elusive prize is churn.