 Accuracy


Forecasting market states

arXiv.org Machine Learning

In common terminology, there are periods of 'bull' market in which prices are more likely to rise and periods of 'bear' market in which prices are more likely to fall. These different 'states' of markets are commonly attributed in the literature to unobservable, or latent, regimes representing a set of macroeconomic, market and sentiment variables. Many time series models presented in the literature have tried to capture this phenomenon. Among the most popular methods, it is worth mentioning the TAR models (Tong 1978), which try to estimate 'structural breaks' in the time series process, and the Markov Switching models (Hamilton 1989), where the changes in regime are parametrized by means of an unobserved state variable typically modelled as a Markov chain. However, the application of TAR models in finance is frequently criticized since it cannot be established with certainty when a structural break has occurred in economic time series and the prior knowledge of major economic events could lead to bias in inference (Campbell et al. 1997). Markov switching models, on the other hand, are highly affected by the curse of dimensionality. In particular, for slightly more complex dynamics than the original proposal (Hamilton 1989), we need to rely on variational inference techniques or MCMC methods (Tsay 2005, Kim and Nelson 1999). This implies that, in a multivariate context and particularly if November 27, 2018 ForecastingMarketStates v2.1
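The idea of a latent regime modelled as a Markov chain can be illustrated with a small simulation. The following is only a sketch, not the model from the paper: the two regime names follow the abstract, while the function name, the transition probabilities and the per-regime return parameters are hypothetical values chosen for illustration.

```python
import random

def simulate_regimes(n_steps, p_stay_bull=0.95, p_stay_bear=0.90, seed=0):
    """Simulate a two-state Markov chain of market regimes and daily returns."""
    rng = random.Random(seed)
    # Hypothetical (mean, standard deviation) of returns in each regime.
    params = {"bull": (0.001, 0.01), "bear": (-0.002, 0.02)}
    stay = {"bull": p_stay_bull, "bear": p_stay_bear}
    state = "bull"  # assumed starting regime
    states, returns = [], []
    for _ in range(n_steps):
        mean, std = params[state]
        states.append(state)
        returns.append(rng.gauss(mean, std))
        # Stay in the current regime with probability stay[state], else switch.
        if rng.random() >= stay[state]:
            state = "bear" if state == "bull" else "bull"
    return states, returns

states, returns = simulate_regimes(1000)
print(states.count("bull"), states.count("bear"))
```

In a real application only the returns would be observed; the regime sequence is the unobserved state that a Markov switching model tries to infer.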


Sentiment Analysis of Financial News Articles using Performance Indicators

arXiv.org Machine Learning

Mining financial text documents and understanding the sentiments of individual investors, institutions and markets is an important and challenging problem in the literature. Current approaches to mine sentiments from financial texts largely rely on domain-specific dictionaries. However, dictionary-based methods often fail to accurately predict the polarity of financial texts. This paper aims to improve the state-of-the-art and introduces a novel sentiment analysis approach that employs the concept of financial and non-financial performance indicators. It presents a hierarchical sentiment classifier model based on association rule mining to predict the polarity of financial texts as positive, neutral or negative. The performance of the proposed model is evaluated on a benchmark financial dataset. The model is also compared against other state-of-the-art dictionary-based and machine-learning-based approaches and the results are found to be quite promising. The novel use of performance indicators for financial sentiment analysis offers interesting and useful insights.


5 Key Terms You Should Know About Machine Learning MarkTechPost

#artificialintelligence

Machine learning is changing the way we assess algorithmic approaches to problem-solving. Many developers are using it to improve complex decisions and tasks worldwide. Machine learning represents the future of algorithmic approaches, and it is a model that can help advance technology as a whole. If you're interested in getting into machine learning, it's very important that you understand some of the basic concepts involved in the machine learning process and its development. This term has to do with the varying levels of sensitivity and specificity that are directly represented in the ROC curve.
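The link between sensitivity, specificity and the ROC curve can be made concrete with a few lines of code. This is a minimal sketch in plain Python; the function names and the toy labels and scores are assumptions made for illustration, and the area under the ROC curve is computed here as the fraction of positive/negative pairs that the scores rank correctly.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: two negatives and two positives with model scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(sensitivity_specificity(y_true, scores, threshold=0.5))  # (0.5, 1.0)
print(roc_auc(y_true, scores))  # 0.75
```

Each threshold gives one (sensitivity, specificity) pair; sweeping the threshold traces out the ROC curve, and the area under it summarises performance across all thresholds.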


Predicting Diabetes Disease Evolution Using Financial Records and Recurrent Neural Networks

arXiv.org Machine Learning

Managing patients with chronic diseases is a major and growing healthcare challenge in several countries. A chronic condition, such as diabetes, is an illness that lasts a long time and does not go away, and often leads to the patient's health gradually getting worse. While recent works involve raw electronic health records (EHR) from hospitals, this work uses only financial records from health plan providers to predict diabetes disease evolution with a self-attentive recurrent neural network. Financial data are used because they can serve as an interface to international standards, as the records standard encodes medical procedures. The main goal was to assess high-risk diabetics, so we predict records related to acute complications of diabetes such as amputations and debridements, revascularization and hemodialysis. Our work succeeds in anticipating complications between 60 and 240 days in advance with an area under the ROC curve ranging from 0.81 to 0.94. In this paper we describe the first half of a work-in-progress developed within a health plan provider with an area under the ROC curve ranging from 0.81 to 0.83. This assessment will give healthcare providers the chance to intervene earlier and head off hospitalizations. We are aiming to deliver personalized predictions and personalized recommendations to individual patients, with the goal of improving outcomes and reducing costs.


Distinguishing correlation from causation using genome-wide association studies

arXiv.org Machine Learning

Genome-wide association studies (GWAS) have emerged as a rich source of genetic clues into disease biology, and they have revealed strong genetic correlations among many diseases and traits. Some of these genetic correlations may reflect causal relationships. We developed a method to quantify causal relationships between genetically correlated traits using GWAS summary association statistics. In particular, our method quantifies what part of the genetic component of trait 1 is also causal for trait 2 using mixed fourth moments $E(\alpha_1^2\alpha_1\alpha_2)$ and $E(\alpha_2^2\alpha_1\alpha_2)$ of the bivariate effect size distribution. If trait 1 is causal for trait 2, then SNPs affecting trait 1 (large $\alpha_1^2$) will have correlated effects on trait 2 (large $\alpha_1\alpha_2$), but not vice versa. We validated this approach in extensive simulations. Across 52 traits (average $N=331$k), we identified 30 putative genetically causal relationships, many novel, including an effect of LDL cholesterol on decreased bone mineral density. More broadly, we demonstrate that it is possible to distinguish between genetic correlation and causation using genetic association data.
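The asymmetry between the two mixed fourth moments can be seen in a toy simulation. This is a sketch under strong assumptions, not the authors' estimator: effects on trait 1 are drawn from a unit-variance Laplace distribution (heavy-tailed), trait 1 is assumed fully causal for trait 2 with a hypothetical effect `gamma`, and the remaining effects on trait 2 are independent Gaussian noise. Under these assumptions $E(\alpha_1^2\alpha_1\alpha_2) - E(\alpha_2^2\alpha_1\alpha_2) = \gamma(1-\gamma^2)(\kappa-3)$, where $\kappa$ is the kurtosis of $\alpha_1$, so the first moment is larger whenever the trait 1 effects are heavier-tailed than Gaussian.

```python
import random

def mixed_fourth_moments(n_snps=200_000, gamma=0.5, seed=0):
    """Estimate E(a1^2 * a1*a2) and E(a2^2 * a1*a2) in a toy causal model."""
    rng = random.Random(seed)
    c = (1.0 - gamma ** 2) ** 0.5  # keeps the variance of a2 equal to 1
    m12 = m21 = 0.0
    for _ in range(n_snps):
        # Heavy-tailed (Laplace) effect on trait 1, scaled to unit variance.
        a1 = (rng.expovariate(1.0) - rng.expovariate(1.0)) / 2 ** 0.5
        # Effect on trait 2: causal share gamma * a1 plus an independent part.
        a2 = gamma * a1 + c * rng.gauss(0.0, 1.0)
        m12 += a1 ** 2 * a1 * a2
        m21 += a2 ** 2 * a1 * a2
    return m12 / n_snps, m21 / n_snps

m12, m21 = mixed_fourth_moments()
print(m12 > m21)  # expected to be True when trait 1 is causal for trait 2
```

With the defaults above the theoretical values are 3 and 1.875 respectively; the simulated estimates fluctuate around these.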


Recent Advances in Open Set Recognition: A Survey

arXiv.org Machine Learning

In real-world recognition/classification tasks, limited by various objective factors, it is usually difficult to collect training samples to exhaust all classes when training a recognizer or classifier. A more realistic scenario is open set recognition (OSR), where incomplete knowledge of the world exists at training time, and unknown classes can be submitted to an algorithm during testing, requiring the classifiers not only to accurately classify the seen classes, but also to effectively deal with the unseen ones. This paper provides a comprehensive survey of existing open set recognition techniques covering various aspects ranging from related definitions, representations of models, datasets, experiment setup and evaluation metrics. Furthermore, we briefly analyze the relationships between OSR and its related tasks including zero-shot, one-shot (few-shot) recognition/learning techniques, classification with reject option, and so forth. We also give an overview of open world recognition, which can be seen as a natural extension of OSR. Importantly, we highlight the limitations of existing approaches and point out some promising subsequent research directions in this field.
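Classification with a reject option, one of the related tasks named above, can be sketched with a simple confidence threshold. This is a generic baseline for illustration only, not a specific method from the survey; the function name, the label set and the threshold value are assumptions.

```python
def classify_with_reject(probabilities, labels, threshold=0.7):
    """Return the most probable known label, or 'unknown' if confidence is low."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] < threshold:
        return "unknown"  # reject: the input may belong to an unseen class
    return labels[best]

labels = ["cat", "dog", "bird"]
print(classify_with_reject([0.90, 0.05, 0.05], labels))  # cat
print(classify_with_reject([0.40, 0.35, 0.25], labels))  # unknown
```

A fixed threshold on the classifier's own confidence is easy to implement, but a closed-set classifier can still be confidently wrong on unseen classes, which is one motivation for dedicated open set methods.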


Synthetic Lung Nodule 3D Image Generation Using Autoencoders

arXiv.org Machine Learning

Computer aided diagnosis, where a software tool analyzes the patient's medical imaging results to suggest a possible diagnosis, is a promising direction: from an input low-resolution 3D CT scan, image processing techniques can be used to classify nodules in the lung scan as potentially cancerous or benign. But such systems require quality 3D training images to ensure the classifiers are adequately trained with sufficient generality. Cancerous lung nodule detection still suffers from a dearth of training images which hampers the ability to effectively automate and improve the analysis of CT scans for cancer risks (Valente et al., 2016). In this work, we propose to address this problem by automatically generating synthetic 3D images of nodules, to augment the training dataset of such systems with meaningful (yet computer-generated) lung nodule images. This is the full length paper for work originally presented at the 3rd International Workshop on Biomedical Informatics with Optimization and Machine Learning in conjunction with International Joint Conference on Artificial Intelligence (IJCAI) (Kommrusch & Pouchet, 2018). Li et al. showed how to analyze nodules using computed features from the 3D images (such as volume, degree of compactness and irregularity, etc.) (Q.


How to Use Heuristics for Differential Privacy

arXiv.org Machine Learning

We develop theory for using heuristics to solve computationally hard problems in differential privacy. Heuristic approaches have enjoyed tremendous success in machine learning, for which performance can be empirically evaluated. However, privacy guarantees cannot be evaluated empirically, and must be proven, without making heuristic assumptions. We show that learning problems over broad classes of functions can be solved privately and efficiently, assuming the existence of a non-private oracle for solving the same problem. Our first algorithm yields a privacy guarantee that is contingent on the correctness of the oracle. We then give a reduction which applies to a class of heuristics which we call certifiable, which allows us to convert oracle-dependent privacy guarantees to worst-case privacy guarantees that hold even when the heuristic standing in for the oracle might fail in adversarial ways. Finally, we consider a broad class of functions that includes most classes of simple boolean functions studied in the PAC learning literature, including conjunctions, disjunctions, parities, and discrete halfspaces. We show that there is an efficient algorithm for privately constructing synthetic data for any such class, given a non-private learning oracle. This in particular gives the first oracle-efficient algorithm for privately generating synthetic data for contingency tables. The most intriguing question left open by our work is whether or not every problem that can be solved differentially privately can be privately solved with an oracle-efficient algorithm. While we do not resolve this, we give a barrier result that suggests that any generic oracle-efficient reduction must fall outside of a natural class of algorithms (which includes the algorithms given in this paper).


An Adaptive Oversampling Learning Method for Class-Imbalanced Fault Diagnostics and Prognostics

arXiv.org Machine Learning

Data-driven fault diagnostics and prognostics suffer from the class-imbalance problem in industrial systems, which raises challenges for common machine learning algorithms because it becomes difficult to learn the features of the minority class samples. Synthetic oversampling methods are commonly used to tackle these problems by generating minority class samples to balance the distributions between majority and minority classes. However, many oversampling methods are inappropriate because they cannot generate effective and useful minority class samples according to different distributions of data, which further complicates the process of learning samples. Thus, this paper proposes a novel adaptive oversampling technique: EM-based Weighted Minority Oversampling TEchnique (EWMOTE) for industrial fault diagnostics and prognostics. The method comprises a weighted minority sampling strategy to identify hard-to-learn informative minority fault samples and an Expectation Maximization (EM) based imputation algorithm to generate fault samples. To validate the performance of the proposed method, experiments are conducted on two real datasets. The results show that the method could achieve better performance than other oversampling-based baseline models not only on binary-class but also on multi-class imbalance learning tasks at different imbalance ratios.
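The general idea of synthetic oversampling can be sketched as follows. This is not EWMOTE: it is a simplified interpolation-based scheme in which a base minority sample is drawn, optionally according to weights, and a synthetic point is placed on the line segment to another randomly chosen minority sample. The function name, the `weights` argument and the toy data are assumptions, and the EM-based imputation step described in the abstract is not reproduced.

```python
import random

def oversample_minority(minority, n_new, weights=None, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Draw the base sample, favouring higher-weighted samples if weights are given.
        base = rng.choices(minority, weights=weights, k=1)[0]
        other = rng.choice(minority)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
new_points = oversample_minority(minority, n_new=5, weights=[3, 1, 1])
print(len(new_points))  # 5
```

The weights stand in for whatever measure marks a sample as hard to learn; how such weights are derived is the substance of a method like the one proposed and is left open here.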


The Taboo Trap: Behavioural Detection of Adversarial Samples

arXiv.org Machine Learning

Deep Neural Networks (DNNs) have become a powerful tool for a wide range of problems. Yet recent work has shown an increasing variety of adversarial samples that can fool them. Most existing detection mechanisms impose significant costs, either by using additional classifiers to spot adversarial samples, or by requiring the DNN to be restructured. In this paper, we introduce a novel defence. We train our DNN so that, as long as it is working as intended on the kind of inputs we expect, its behavior is constrained, in that a set of behaviors are taboo. If it is exposed to adversarial samples, they will often cause a taboo behavior, which we can detect. As an analogy, we can imagine that we are teaching our robot good manners; if it's ever rude, we know it's come under some bad influence. This defence mechanism is very simple and, although it involves a modest increase in training, has almost zero computation overhead at runtime -- making it particularly suitable for use in embedded systems. Taboos can be both subtle and diverse. Just as humans' choice of language can convey a lot of information about location, affiliation, class and much else that can be opaque to outsiders but that enables members of the same group to recognise each other, so also taboo choice can encode and hide information. We can use this to make adversarial attacks much harder. It is a well-established design principle that the security of a system should not depend on the obscurity of its design, but on some variable (the key) which can differ between implementations and be changed as necessary. We explain how taboos can be used to equip a classifier with just such a key, and to tune the keying mechanism to adversaries of various capabilities. We evaluate the performance of a prototype against a wide range of attacks and show how our simple defense can work well in practice.
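The detection step can be sketched in a few lines once a concrete taboo is chosen. The abstract does not say what form a taboo takes, so the following rests on an assumption made purely for illustration: a taboo behavior is taken to be any monitored activation exceeding a fixed threshold that the trained network stays below on expected inputs. The function names, the threshold and the activation values are hypothetical.

```python
def is_taboo(activations, threshold):
    """True if any monitored activation breaks the (assumed) taboo threshold."""
    return any(a > threshold for a in activations)

def detect_adversarial(batch_activations, threshold=5.0):
    """Flag each input whose activations violate the taboo."""
    return [is_taboo(acts, threshold) for acts in batch_activations]

# Hypothetical recorded activations for two inputs.
batch = [
    [0.1, 2.0, 3.4],  # stays under the threshold: treated as a normal input
    [0.3, 7.5, 1.2],  # exceeds the threshold: flagged as possibly adversarial
]
print(detect_adversarial(batch))  # [False, True]
```

Under this assumption the runtime check is a single comparison per monitored activation, consistent with the low overhead described above, and the threshold (or the set of monitored units) could play the role of the key that differs between implementations.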