Performance Analysis
Choice of K in K-fold Cross Validation for Classification in Financial Market
Cross validation is often used as a tool for model selection across classifiers, as discussed in detail in the following paper (https://ssrn.com/abstract). However, one question often comes up: how should K be chosen in K-fold cross validation? The rule-of-thumb choice often suggested in the literature, based on non-financial markets, is K = 10. The question is: does that choice hold for financial markets?
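To make the role of K concrete, here is a minimal sketch of how K-fold cross validation partitions a data set, written in plain Python. The function names `k_fold_indices` and `fold_sizes` are hypothetical and not taken from the paper. The sketch shuffles the sample indices, which is an assumption that may not suit time-ordered financial data.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for K-fold cross validation.

    Assumption: samples are exchangeable, so shuffling is acceptable.
    For time-ordered data this assumption may not hold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def fold_sizes(n_samples, k):
    """Return a list of (train size, test size), one entry per fold."""
    return [(len(tr), len(te)) for tr, te in k_fold_indices(n_samples, k)]
```

With 100 samples, `fold_sizes(100, 10)` gives ten folds that each train on 90 samples and test on 10, while `fold_sizes(100, 2)` trains on only 50. A larger K leaves more data for training in each fold but requires more model fits, which is the trade-off behind any choice of K.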
Graphical Nonconvex Optimization for Optimal Estimation in Gaussian Graphical Models
Sun, Qiang, Tan, Kean Ming, Liu, Han, Zhang, Tong
We consider the problem of learning high-dimensional Gaussian graphical models. The graphical lasso is one of the most popular methods for estimating Gaussian graphical models. However, it does not achieve the oracle rate of convergence. In this paper, we propose the graphical nonconvex optimization for optimal estimation in Gaussian graphical models, which is then approximated by a sequence of convex programs. Our proposal is computationally tractable and produces an estimator that achieves the oracle rate of convergence. The statistical error introduced by the sequential approximation using the convex programs is clearly demonstrated via a contraction property. The rate of convergence can be further improved using the notion of sparsity pattern. The proposed methodology is then extended to semiparametric graphical models. We show through numerical studies that the proposed estimator outperforms other popular methods for estimating Gaussian graphical models.
UFC 212 Betting Odds, PPV Info For Jose Aldo vs. Max Holloway And Entire Fight Card
The rightful owner of the title Conor McGregor never defended will finally be determined Saturday night. The UFC featherweight championship fight between Jose Aldo and Max Holloway highlights UFC 212 from Rio de Janeiro, Brazil. The main event is the only championship fight on the card, and the latest UFC 212 betting odds indicate that the current champion will hold onto his belt. The event costs $69.99 to order in HD on pay-per-view, and it's scheduled to start at 9 p.m. EDT. Fans can watch the fights with a live stream online by ordering UFC 212 at ufc.tv for $59.99.
Multiple Kernel Learning and Automatic Subspace Relevance Determination for High-dimensional Neuroimaging Data
Ayhan, Murat Seckin, Raghavan, Vijay, Alzheimer's Disease Neuroimaging Initiative
Alzheimer's disease is a major cause of dementia. Its diagnosis requires accurate biomarkers that are sensitive to disease stages. In this respect, we regard probabilistic classification as a method of designing a probabilistic biomarker for disease staging. Probabilistic biomarkers naturally support the interpretation of decisions and evaluation of uncertainty associated with them. In this paper, we obtain probabilistic biomarkers via Gaussian Processes. Gaussian Processes enable probabilistic kernel machines that offer flexible means to accomplish Multiple Kernel Learning. Exploiting this flexibility, we propose a new variation of Automatic Relevance Determination and tackle the challenges of high dimensionality through multiple kernels. Our research results demonstrate that the Gaussian Process models are competitive with or better than the well-known Support Vector Machine in terms of classification performance even in the cases of single kernel learning. Extending the basic scheme towards the Multiple Kernel Learning, we improve the efficacy of the Gaussian Process models and their interpretability in terms of the known anatomical correlates of the disease. For instance, the disease pathology starts in and around the hippocampus and entorhinal cortex. Through the use of Gaussian Processes and Multiple Kernel Learning, we have automatically and efficiently determined those portions of neuroimaging data. In addition to their interpretability, our Gaussian Process models are competitive with recent deep learning solutions under similar settings.
The machine learning paradox
The O'Reilly Artificial Intelligence conference in New York is June 26-29, 2017. To train a machine learning system, you start with a lot of training data: millions of photos, for example. You divide that data into a training set and a test set. You use the training set to "train" the system so it can identify those images correctly. Then you use the test set to see how well the training works: how good is it at labeling a different set of images?
Cross-validation in R: a do-it-yourself and a black box approach
In my previous post, we saw that R-squared can lead to a misleading interpretation of the quality of our regression fit, in terms of prediction power. One thing that R-squared offers no protection against is overfitting. On the other hand, cross validation, by allowing us to have cases in our testing set that are different from the cases in our training set, inherently offers protection against overfitting. In leave-one-out cross validation, one case in our data set is used as the test set, while the remaining cases are used as the training set. We iterate through the data set, until all cases have served as the test set.
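The one-case-at-a-time procedure described above can be sketched in a few lines. The post itself works in R; the version below is a plain Python illustration under stated assumptions: a simple one-variable linear regression, squared error as the score, and the hypothetical helper names `fit_line` and `loocv_mse`.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b).

    Assumes the x values are not all identical.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def loocv_mse(xs, ys):
    """Mean squared prediction error when each case is held out once."""
    errors = []
    for i in range(len(xs)):
        # Train on every case except case i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        # Score the fitted line on the single held-out case.
        errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return sum(errors) / len(errors)
```

Because every prediction is made for a case the model never saw during fitting, the resulting error reflects predictive performance rather than goodness of fit on the training data.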
Blockchains for Artificial Intelligence – The BigchainDB Blog
[This article was first published on Dataconomy on Dec 21, 2016; I'm reposting here for ease of access.] In recent years, AI (artificial intelligence) researchers have finally cracked problems that they've worked on for decades, from Go to human-level speech recognition. A key piece was the ability to gather and learn on mountains of data, which pulled error rates past the success line. In short, big data has transformed AI, to an almost unreasonable level. Blockchain technology could transform AI too, in its own particular ways. Some applications of blockchains to AI are mundane, like audit trails on AI models. Some appear almost unreasonable, like AI that can own itself -- AI DAOs. All of them are opportunities. This article will explore these applications. Before we discuss applications, let's first review what's different about blockchains compared to traditional big-data distributed databases like MongoDB. We can think of blockchains as "blue ocean" databases: they escape the "bloody red ocean" of sharks competing in an existing market, opting instead to be in a blue ocean of uncontested market space.
Using Artificial Intelligence to Reduce Customer Churn in Private Banking Ayasdi
It is no secret that private banking is in turmoil. While our view is that large banks possess a massive competitive advantage given the amount of data they create, trade in and see – private banking is an area of concern. Technology-driven start-ups have made real inroads with millennials, and private wealth management growth has stalled at many banks. Given the fixed-cost nature of the supporting infrastructure, this can quickly eat into earnings. There are a number of areas that banks can focus on, from aligning costs more effectively to enhancing the customer experience – but one chronic and elusive prize is churn.
WWE Extreme Rules 2017: Predictions, Match Card For 'Monday Night Raw' PPV
WWE's next pay-per-view is set for Sunday night in Baltimore with Extreme Rules 2017. It will exclusively feature wrestlers from the "Monday Night Raw" roster. Six matches are on the Extreme Rules card, and the main event will determine the No.1 contender for Brock Lesnar's WWE Universal Championship. Below are predictions for every match at the event. Reigns is really the only superstar that has no chance in this match. The winner is expected to face Lesnar at the Great Balls of Fire PPV on July 9, and Reigns vs. Lesnar would only happen at SummerSlam or WrestleMania.
Optimization of Tree Ensembles
Tree ensemble models such as random forests and boosted trees are among the most widely used and practically successful predictive models in applied machine learning and business analytics. Although such models have been used to make predictions based on exogenous, uncontrollable independent variables, they are increasingly being used to make predictions where the independent variables are controllable and are also decision variables. In this paper, we study the problem of tree ensemble optimization: given a tree ensemble that predicts some dependent variable using controllable independent variables, how should we set these variables so as to maximize the predicted value? We formulate the problem as a mixed-integer optimization problem. We theoretically examine the strength of our formulation, provide a hierarchy of approximate formulations with bounds on approximation quality, and exploit the structure of the problem to develop two large-scale solution methods, one based on Benders decomposition and one based on iteratively generating tree split constraints. We test our methodology on real data sets, including two case studies in drug design and customized pricing, and show that our methodology can efficiently solve large-scale instances to near or full optimality, and outperforms solutions obtained by heuristic approaches. In our drug design case, we show how our approach can identify compounds that efficiently trade off predicted performance and novelty with respect to existing, known compounds. In our customized pricing case, we show how our approach can efficiently determine optimal store-level prices under a random forest model that delivers excellent predictive accuracy.
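To illustrate the problem statement only, the sketch below maximizes a tiny hand-built tree ensemble by naive enumeration. This is not the paper's mixed-integer formulation or either of its large-scale methods. It relies on the fact that an ensemble's prediction is piecewise constant, so it is enough to try one representative point per interval between split thresholds. The tuple-based tree encoding and all function names are assumptions made for this example.

```python
from itertools import product

# Assumed encoding: a leaf is a float; an internal node is a tuple
# (feature_index, threshold, left_subtree, right_subtree), and a point
# goes left when x[feature_index] <= threshold.


def predict_tree(tree, x):
    """Follow splits from the root until a leaf value is reached."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree


def predict_ensemble(trees, x):
    """Average the predictions of all trees."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)


def collect_thresholds(tree, thresholds):
    """Record every split threshold, grouped by feature index."""
    if isinstance(tree, tuple):
        feature, threshold, left, right = tree
        thresholds[feature].add(threshold)
        collect_thresholds(left, thresholds)
        collect_thresholds(right, thresholds)


def maximize_ensemble(trees, n_features):
    """Brute-force search; cost grows exponentially with n_features."""
    thresholds = [set() for _ in range(n_features)]
    for tree in trees:
        collect_thresholds(tree, thresholds)
    candidates = []
    for ts in thresholds:
        ordered = sorted(ts)
        # Each threshold represents the interval ending at it; one extra
        # point covers the region above the largest threshold.
        candidates.append(ordered + [ordered[-1] + 1.0] if ordered else [0.0])
    best_x = max(product(*candidates), key=lambda x: predict_ensemble(trees, x))
    return best_x, predict_ensemble(trees, best_x)
```

The number of candidate points is the product of the per-feature interval counts, so this approach stops being practical very quickly as features and splits grow. That blow-up is the motivation for a structured optimization formulation such as the one the abstract describes.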