Goto

Collaborating Authors: Alibaba Group


Accelerated Sparse Linear Regression via Random Projection

AAAI Conferences

In this paper, we present an accelerated numerical method based on random projection for sparse linear regression. Previous studies have shown that, under appropriate conditions, gradient-based methods enjoy a geometric convergence rate when applied to this problem. However, the time complexity of evaluating the gradient is as large as $\mathcal{O}(nd)$, where $n$ is the number of data points and $d$ is the dimensionality, making those methods inefficient for large-scale and high-dimensional datasets. To address this limitation, we first utilize random projection to find a rank-$k$ approximation of the data matrix, reducing the cost of gradient evaluation to $\mathcal{O}(nk+dk)$, a significant improvement when $k$ is much smaller than $d$ and $n$. Then, we solve the sparse linear regression problem via a proximal gradient method with a homotopy strategy to generate sparse intermediate solutions. Theoretical analysis shows that our method also achieves a global geometric convergence rate, and moreover the sparsity of all the intermediate solutions is well-bounded over the iterations. Finally, we conduct experiments to demonstrate the efficiency of the proposed method.
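A minimal sketch of the two ingredients the abstract describes, under illustrative assumptions: it uses a plain randomized range finder for the rank-$k$ approximation and plain ISTA rather than the paper's homotopy strategy, and all names, the oversampling amount, and the step size are assumptions.

```python
import numpy as np

def rank_k_approx(X, k, oversample=10, seed=0):
    """Randomized range finder: X ~= Q @ B, with Q having ~k orthonormal columns."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Omega = rng.standard_normal((d, k + oversample))  # random projection matrix
    Q, _ = np.linalg.qr(X @ Omega)                    # orthonormal basis for range(X @ Omega)
    B = Q.T @ X                                       # (k + oversample) x d factor
    return Q, B

def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def sparse_regression(X, y, lam, k, n_iter=500):
    """Proximal gradient (ISTA) on the rank-k surrogate; O(nk + dk) per iteration."""
    Q, B = rank_k_approx(X, k)
    n, d = X.shape
    step = n / np.linalg.norm(B, 2) ** 2              # 1/L with L = ||QB||_2^2 / n
    w = np.zeros(d)
    for _ in range(n_iter):
        r = Q @ (B @ w) - y                           # residual via the factors only
        grad = B.T @ (Q.T @ r) / n                    # never forms the full n x d product
        w = soft_threshold(w - step * grad, step * lam)
    return w
```

Each iteration touches only the factors Q and B, which is where the $\mathcal{O}(nk+dk)$ gradient cost quoted in the abstract comes from.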


Stochastic Optimization for Kernel PCA

AAAI Conferences

Kernel Principal Component Analysis (PCA) is a popular extension of PCA which is able to find nonlinear patterns in data. However, the application of kernel PCA to large-scale problems remains a big challenge, due to its quadratic space complexity and cubic time complexity in the number of examples. To address this limitation, we utilize techniques from stochastic optimization to solve kernel PCA with linear space and time complexities per iteration. Specifically, we formulate it as a stochastic composite optimization problem, where a nuclear norm regularizer is introduced to promote low-rankness, and then develop a simple algorithm based on stochastic proximal gradient descent. During the optimization process, the proposed algorithm always maintains a low-rank factorization of iterates that can be conveniently held in memory. Compared to previous iterative approaches, a remarkable property of our algorithm is that it is equipped with an explicit rate of convergence. Theoretical analysis shows that the solution of our algorithm converges to the optimal one at an O(1/T) rate, where T is the number of iterations.
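The following is a minimal, illustrative sketch of stochastic proximal gradient descent with a nuclear-norm-type regularizer, not the authors' exact formulation: the objective (a Frobenius fit to the kernel matrix), the sampling scheme, and all parameters are assumptions, and the iterate is materialized as a dense matrix here for clarity, whereas the paper keeps only the low-rank factors in memory.

```python
import numpy as np

def psd_soft_threshold(W, lam):
    """Prox of lam * trace(W) over the PSD cone (equal to the nuclear norm on
    PSD matrices): eigen-decompose and soft-threshold the eigenvalues at lam."""
    vals, vecs = np.linalg.eigh(W)
    vals = np.maximum(vals - lam, 0.0)
    keep = vals > 0
    return vecs[:, keep], vals[keep]                  # low-rank factors of the iterate

def stochastic_kernel_pca(K, lam=0.1, eta=0.5, T=200, batch=32, seed=0):
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    U, s = np.zeros((n, 1)), np.zeros(1)              # iterate W = U diag(s) U^T, starts at 0
    for _ in range(T):
        W = (U * s) @ U.T                             # materialized only for clarity
        idx = rng.choice(n, size=batch, replace=False)
        G = np.zeros_like(W)                          # stochastic estimate of grad = W - K
        G[np.ix_(idx, idx)] = (W - K)[np.ix_(idx, idx)] * (n / batch) ** 2
        U, s = psd_soft_threshold(W - eta * G, eta * lam)
    return U, s                                       # top factors ~ principal directions
```

The prox step after every stochastic gradient step is what keeps the iterate low-rank, which is the property the abstract highlights.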


Fast and Accurate Refined Nyström-Based Kernel SVM

AAAI Conferences

In this paper, we focus on improving the performance of the Nyström-based kernel SVM. Although the Nyström approximation has been studied extensively and its application to kernel classification has been exhibited in several studies, there still exists a potentially large gap between the performance of a classifier learned with the Nyström approximation and that learned with the original kernel. In this work, we make novel contributions to bridge this gap without increasing the training costs too much by proposing a refined Nyström-based kernel classifier. We adopt a two-step approach: in the first step, we learn a sufficiently good dual solution; in the second step, we use the obtained dual solution to construct a new set of bases for the Nyström approximation and re-train a refined classifier. Our approach to learning a good dual solution is based on a sparse-regularized dual formulation with the Nyström approximation, which can be solved with the same time complexity as solving the standard formulation. We justify our approach by establishing a theoretical guarantee on the error of the dual solution learned in the first step with respect to the optimal dual solution under appropriate conditions. The experimental results demonstrate that (i) the dual solution obtained by our approach in the first step is closer to the optimal solution and yields improved prediction performance; and (ii) the second step, using the obtained dual solution to re-train the model, further improves the performance.
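For reference, here is a minimal sketch of the standard (unrefined) Nyström feature construction that this line of work builds on; it is not the paper's two-step refined method, and the landmark count, kernel, and classifier settings are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import LinearSVC

def nystrom_features(X, m=200, gamma=0.1, seed=0):
    """Build Nystrom features Z such that K ~= Z @ Z.T from m sampled landmarks."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)   # landmark (basis) points
    C = rbf_kernel(X, X[idx], gamma=gamma)            # n x m kernel block
    W = C[idx]                                        # m x m landmark block
    vals, vecs = np.linalg.eigh(W)
    M = vecs / np.sqrt(np.maximum(vals, 1e-12))       # W^{-1/2}
    return C @ M, idx

# Usage sketch: Z, idx = nystrom_features(X_train); clf = LinearSVC().fit(Z, y_train)
```

The paper's refinement keeps this pipeline but replaces the sampled bases with a new set of bases constructed from the dual solution learned in the first step.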


Portfolio Choices with Orthogonal Bandit Learning

AAAI Conferences

The investigation and development of new methods from diverse perspectives to shed light on portfolio choice problems has never stagnated in financial research. Recently, multi-armed bandits have drawn intensive attention in various machine learning applications in online settings. The tradeoff between exploration and exploitation to maximize rewards in bandit algorithms naturally establishes a connection to portfolio choice problems. In this paper, we present a bandit algorithm for conducting online portfolio choices by effectively exploiting correlations among multiple arms. By constructing orthogonal portfolios from multiple assets and integrating them with the upper confidence bound bandit framework, we derive the optimal portfolio strategy that represents a combination of passive and active investments according to a risk-adjusted reward function. Compared with oft-quoted trading strategies from the finance and machine learning fields across representative real-world market datasets, the proposed algorithm demonstrates superiority in both risk-adjusted return and cumulative wealth.
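A hedged sketch of the idea: treat each orthogonal (eigen-)portfolio as an arm and run a UCB rule over them. Here the reward is the raw realized portfolio return, and the UCB constant, warm-up length, and weight normalization are all assumptions; the paper uses a risk-adjusted reward function.

```python
import numpy as np

def orthogonal_portfolios(returns):
    """Eigenvectors of the sample covariance give mutually orthogonal weight vectors."""
    cov = np.cov(returns, rowvar=False)
    _, V = np.linalg.eigh(cov)
    return V / np.abs(V).sum(axis=0)                  # normalize each portfolio's weights

def ucb_portfolio(returns, warmup=10, c=2.0):
    T, d = returns.shape
    V = orthogonal_portfolios(returns[:warmup])       # arms built from early data
    rewards = [[] for _ in range(d)]
    wealth = 1.0
    for t in range(warmup, T):
        ucb = np.array([np.mean(r) + c * np.sqrt(np.log(t) / len(r)) if r else np.inf
                        for r in rewards])
        a = int(np.argmax(ucb))                       # explore/exploit over portfolios
        reward = float(returns[t] @ V[:, a])          # realized return of the chosen arm
        rewards[a].append(reward)
        wealth *= 1.0 + reward
    return wealth
```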


Nyström Approximation for Sparse Kernel Methods: Theoretical Analysis and Empirical Evaluation

AAAI Conferences

Kernel methods (Schölkopf and Smola 2002; Xu et al. 2009) have received a lot of attention in recent studies of machine learning. These methods project data into high-dimensional or even infinite-dimensional spaces via kernel mapping functions. Despite the strong generalization ability induced by kernel methods, they usually suffer from the high computational complexity of calculating the kernel matrix (also called the Gram matrix). Although low-rank decomposition techniques (e.g., Cholesky Decomposition (Fine and Scheinberg 2002; Bach and Jordan 2005)) and truncating methods (e.g., Kernel Tapering (Shen, Xu, and Allebach 2014; Furrer, Genton, and Nychka 2006)) can accelerate the calculation of the kernel matrix, they still need to compute the … If kernels are not low rank, however, Nyström approximations can usually lead to suboptimal performances. To alleviate the strong assumption made in seeking the approximation bounds, we adopt a more general assumption that the design matrix K satisfies the restricted isometry property (Koltchinskii 2011). In particular, the new assumption obeys the restricted eigenvalue condition (Koltchinskii 2011; Bickel, Ritov, and Tsybakov 2009), which has been shown to be more general than several other similar assumptions used in the sparsity literature (Candes and Tao 2007; Donoho, Elad, and Temlyakov 2006; Zhang and Huang 2008). Based on the restricted eigenvalue condition, we provide error bounds for the kernel approximation and the recovery rate in sparse kernel regression.
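A minimal sketch connecting the two threads of this excerpt: the columns of a Nyström approximation serve as the design matrix of a sparse ($\ell_1$-regularized) kernel regression. The landmark count, kernel, and regularization weight are illustrative assumptions, and the sketch does not verify the restricted eigenvalue condition that the paper's bounds rely on.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics.pairwise import rbf_kernel

def sparse_kernel_regression(X, y, m=100, gamma=0.1, alpha=0.01, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)   # landmark points
    K_nm = rbf_kernel(X, X[idx], gamma=gamma)         # Nystrom design matrix (n x m)
    model = Lasso(alpha=alpha).fit(K_nm, y)           # sparse weights over landmarks
    return model, idx

# Prediction sketch: model.predict(rbf_kernel(X_new, X_train[idx], gamma=0.1))
```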


Transaction Costs-Aware Portfolio Optimization via Fast Löwner-John Ellipsoid Approximation

AAAI Conferences

However, implementing such a strategy requires rebalancing continually as asset prices fluctuate, and therefore will lead to high or even infinite transaction costs. Since then, researchers have tried to address this issue by solving Merton's portfolio problem in the presence of transaction costs. Among them, the proportional transaction costs model, as a suitable model for brokerage commissions and bid-ask spread costs, typifies the common situation for normal investors (Brandt 2010; Cvitanic 2001; …). By combining the VFI framework with policy parameterization, the proposed ADP method enjoys the complementary advantages of low approximation errors from VFI and high computational efficiency from policy parameterization. Briefly, the components from VFI pave the way for effectively parameterizing a complex policy in a high-dimensional space; the components from policy parameterization provide a pathway to efficiently evaluating the strategy and bypassing the issue of error amplification.
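As a concrete illustration of the proportional transaction costs model mentioned above (the cost rate and the numbers are assumptions, not from the paper): the cost of a rebalance is proportional to the absolute value traded in each asset.

```python
import numpy as np

def rebalance_cost(w_old, w_new, wealth, rate=0.001):
    """Proportional transaction costs: rate * wealth * total absolute turnover."""
    turnover = np.abs(np.asarray(w_new) - np.asarray(w_old)).sum()
    return rate * wealth * turnover

# Example: moving $1M from a 60/40 to a 50/50 allocation at 10 bps costs
# rebalance_cost([0.6, 0.4], [0.5, 0.5], 1_000_000) == 200.0
```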


Multi-View Point Registration via Alternating Optimization

AAAI Conferences

Multi-view point registration is a relatively less studied problem compared with two-view point registration. Directly applying pairwise registration often leads to matching discrepancy, as the mapping between two point sets can be determined either by direct correspondences or by any intermediate point set. Also, local two-view registration tends to be sensitive to noise. We propose a novel multi-view registration method, where the optimal registration is achieved via an efficient and effective alternating concave minimization process. We further extend our solution to a general case that arises in practice: registration among point sets with different cardinalities. Extensive empirical evaluations against peer methods on both synthetic data and real images suggest that our method is robust to large disturbances. In particular, it is shown that our method outperforms peer point matching methods and performs competitively against graph matching approaches. The latter approaches utilize additional second-order information at the cost of exponentially increased run-time, and are thus usually less efficient.
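For intuition, here is a hedged sketch of an alternating scheme for the two-view case (the paper solves the multi-view problem jointly via alternating concave minimization, which this is not): alternate between solving for correspondences given the current transform and solving for the rigid transform given the correspondences.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def register(P, Q, n_iter=20):
    """Align point set P (n x dim) to Q (n x dim); returns rotation R, translation t."""
    dim = P.shape[1]
    R, t = np.eye(dim), np.zeros(dim)
    for _ in range(n_iter):
        # Step 1: correspondences minimizing the total matching distance (Hungarian)
        row, col = linear_sum_assignment(cdist(P @ R.T + t, Q))
        A, B = P[row], Q[col]
        # Step 2: Procrustes solution for the rigid transform given correspondences
        A0, B0 = A - A.mean(0), B - B.mean(0)
        U, _, Vt = np.linalg.svd(A0.T @ B0)
        D = np.eye(dim)
        D[-1, -1] = np.sign(np.linalg.det(U @ Vt))    # guard against reflections
        R = (U @ D @ Vt).T
        t = B.mean(0) - A.mean(0) @ R.T
    return R, t
```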


Probabilistic Attributed Hashing

AAAI Conferences

Due to their simplicity and efficiency, many hashing methods have recently been developed for large-scale similarity search. Most of the existing hashing methods focus on mapping low-level features to binary codes, but neglect attributes that are commonly associated with data samples. Attribute data, such as image tags, product brands, and user profiles, can represent human recognition better than low-level features. However, attributes have specific characteristics, including high-dimensional, sparse, and categorical properties, which are hard to leverage in existing hashing learning frameworks. In this paper, we propose a hashing learning framework, Probabilistic Attributed Hashing (PAH), to integrate attributes with low-level features. The connections between attributes and low-level features are built through sharing a common set of latent binary variables, i.e., hash codes, through which attributes and features can complement each other. Finally, we develop an efficient iterative learning algorithm, which is generally feasible for large-scale applications. Extensive experiments and a comparison study are conducted on two public datasets, i.e., DBLP and NUS-WIDE. The results clearly demonstrate that the proposed PAH method substantially outperforms the peer methods.
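A minimal, illustrative sketch of the shared-latent-code idea, not the authors' probabilistic model: both views are projected into a common Hamming space, and the shared binary codes come from the fused projections. The projections here are random, whereas PAH learns them; all names and sizes are assumptions.

```python
import numpy as np

def shared_hash_codes(features, attributes, n_bits=32, seed=0):
    """Fuse low-level features and attributes into one set of binary codes."""
    rng = np.random.default_rng(seed)
    Wf = rng.standard_normal((features.shape[1], n_bits))
    Wa = rng.standard_normal((attributes.shape[1], n_bits))
    fused = features @ Wf + attributes @ Wa           # both views vote on each bit
    return (fused > 0).astype(np.uint8)               # shared latent binary codes

def hamming_rank(query_code, codes):
    """Rank database items by Hamming distance to a query code."""
    return np.argsort((codes != query_code).sum(axis=1))
```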