A new comprehensive approach to nonlinear time series analysis and modeling is developed in the present paper. We introduce novel data-specific mid-distribution based Legendre Polynomial (LP) like nonlinear transformations of the original time series Y(t) that enables us to adapt all the existing stationary linear Gaussian time series modeling strategy and made it applicable for non-Gaussian and nonlinear processes in a robust fashion. The emphasis of the present paper is on empirical time series modeling via the algorithm LPTime. We demonstrate the effectiveness of our theoretical framework using daily S&P 500 return data between Jan/2/1963 - Dec/31/2009. Our proposed LPTime algorithm systematically discovers all the `stylized facts' of the financial time series automatically all at once, which were previously noted by many researchers one at a time.
We study the problem of learning to rank from multiple sources. Though multi-view learning and learning to rank have been studied extensively leading to a wide range of applications, multi-view learning to rank as a synergy of both topics has received little attention. The aim of the paper is to propose a composite ranking method while keeping a close correlation with the individual rankings simultaneously. We propose a multi-objective solution to ranking by capturing the information of the feature mapping from both within each view as well as across views using autoencoder-like networks. Moreover, a novel end-to-end solution is introduced to enhance the joint ranking with minimum view-specific ranking loss, so that we can achieve the maximum global view agreements within a single optimization process. The proposed method is validated on a wide variety of ranking problems, including university ranking, multi-view lingual text ranking and image data ranking, providing superior results.
We propose a new localized inference algorithm for answering marginalization queries in large graphical models with the correlation decay property. Given a query variable and a large graphical model, we define a much smaller model in a local region around the query variable in the target model so that the marginal distribution of the query variable can be accurately approximated. We introduce two approximation error bounds based on the Dobrushin's comparison theorem and apply our bounds to derive a greedy expansion algorithm that efficiently guides the selection of neighbor nodes for localized inference. We verify our theoretical bounds on various datasets and demonstrate that our localized inference algorithm can provide fast and accurate approximation for large graphical models.
Modern web sites provide easy access to large amounts of data via open application programming interfaces. Users interacting with these sites constantly change the underlying data sets, which can be represented in graph-structured form. Nodes in these dynamic graph structures exhibit dependencies over time. Analysing these dependencies is crucial for understanding and predicting the dynamics inherent to temporally changing graph structures on the web. When the graphs become large however, it is not feasible to take into account all properties of the graph and in general it is unclear how to choose the appropriate features. Moreover, comparing two nodes becomes difficult, if the nodes do not share exactly the same features. In this work we propose an algorithm that automatically learns the features that govern temporal dependencies between nodes in large dynamic graph structures. We present preliminary results of applying the algorithm to data collected from the web, discuss potential extensions of the framework and anticipate how a major problem in data mining, sparse data, could be tackled by leveraging Linked Data.
We present a novel framework for multi-label learning that explicitly addresses the challenge arising from the large number of classes and a small size of training data. The key assumption behind this work is that two examples tend to have large overlap in their assigned class memberships if they share high similarity in their input patterns. We capitalize this assumption by first computing two sets of similarities, one based on the input patterns of examples, and the other based on the class memberships of the examples. We then search for the optimal assignment of class memberships to the unlabeled data that minimizes the difference between these two sets of similarities. The optimization problem is formulated as a constrained Nonnegative Matrix Factorization (NMF) problem, and an algorithm is presented to efficiently find the solution. Compared to the existing approaches for multi-label learning, the proposed approach is advantageous in that it is able to explore both the unlabeled data and the correlation among different classes simultaneously. Experiments with text categorization show that our approach performs significantly better than several state-of-the-art classification techniques when the number of classes is large and the size of training data is small.