AITopics | Ping Li

The method of random projection has been a popular tool for data compression, similarity search, and machine learning. In many practical scenarios, applying quantization on randomly projected data could be very helpful to further reduce storage cost and facilitate more efficient retrievals, while only suffering from little loss in accuracy. In real-world applications, however, data collected from different sources may be quantized under different schemes, which calls for a need to study the asymmetric quantization problem. In this paper, we investigate the cosine similarity estimators derived in such setting under the Lloyd-Max (LM) quantization scheme. We thoroughly analyze the biases and variances of a series of estimators including the basic simple estimators, their normalized versions, and their debiased versions. Furthermore, by studying the monotonicity, we show that the expectation of proposed estimators increases with the true cosine similarity, on a broader family of stair-shaped quantizers. Experiments on nearest neighbor search justify the theory and illustrate the effectiveness of our proposed estimators.

data mining, information retrieval, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

Add feedback

Re-randomized Densification for One Permutation Hashing and Bin-wise Consistent Weighted Sampling

Ping Li, Xiaoyun Li, Cun-Hui Zhang

Neural Information Processing SystemsMar-26-2025, 09:40:27 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, data mining, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Möbius Transformation for Fast Inner Product Search on Graph

Zhixin Zhou, Shulong Tan, Zhaozhuo Xu, Ping Li

Neural Information Processing SystemsMar-22-2025, 16:37:21 GMT

We present a fast search on graph algorithm for Maximum Inner Product Search (MIPS). This optimization problem is challenging since traditional Approximate Nearest Neighbor (ANN) search methods may not perform efficiently in the nonmetric similarity measure.

data mining, information retrieval, machine learning, (21 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > California (0.28)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(2 more...)

Add feedback

Outlier Detection and Robust PCA Using a Convex Measure of Innovation

Mostafa Rahmani, Ping Li

Neural Information Processing SystemsJan-27-2025, 20:15:29 GMT

This paper presents a provable and strong algorithm, termed Innovation Search (iSearch), to robust Principal Component Analysis (PCA) and outlier detection. An outlier by definition is a data point which does not participate in forming a low dimensional structure with a large number of data points in the data. In other words, an outlier carries some innovation with respect to most of the other data points.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Towards Practical Alternating Least-Squares for CCA

Zhiqiang Xu, Ping Li

Neural Information Processing SystemsJan-26-2025, 11:44:37 GMT

Alternating least-squares (ALS) is a simple yet effective solver for canonical correlation analysis (CCA). In terms of ease of use, ALS is arguably practitioners' first choice. Despite recent provably guaranteed variants, the empirical performance often remains unsatisfactory. To promote the practical use of ALS for CCA, we propose truly alternating least-squares. Instead of approximately solving two independent linear systems, in each iteration, it simply solves two coupled linear systems of half the size. It turns out that this coupling procedure is able to bring significant performance improvements in practical setting.

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
Asia (0.68)
North America > Canada (0.47)
North America > United States (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Random Projections with Asymmetric Quantization

Xiaoyun Li, Ping Li

Neural Information Processing SystemsJan-26-2025, 04:41:29 GMT

The method of random projection has been a popular tool for data compression, similarity search, and machine learning. In many practical scenarios, applying quantization on randomly projected data could be very helpful to further reduce storage cost and facilitate more efficient retrievals, while only suffering from little loss in accuracy. In real-world applications, however, data collected from different sources may be quantized under different schemes, which calls for a need to study the asymmetric quantization problem. In this paper, we investigate the cosine similarity estimators derived in such setting under the Lloyd-Max (LM) quantization scheme. We thoroughly analyze the biases and variances of a series of estimators including the basic simple estimators, their normalized versions, and their debiased versions. Furthermore, by studying the monotonicity, we show that the expectation of proposed estimators increases with the true cosine similarity, on a broader family of stair-shaped quantizers. Experiments on nearest neighbor search justify the theory and illustrate the effectiveness of our proposed estimators.

data mining, information retrieval, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

Add feedback

Re-randomized Densification for One Permutation Hashing and Bin-wise Consistent Weighted Sampling

Ping Li, Xiaoyun Li, Cun-Hui Zhang

Neural Information Processing SystemsJan-26-2025, 02:27:44 GMT

Jaccard similarity is widely used as a distance measure in many machine learning and search applications. Typically, hashing methods are essential for the use of Jaccard similarity to be practical in large-scale settings. For hashing binary (0/1) data, the idea of one permutation hashing (OPH) with densification significantly accelerates traditional minwise hashing algorithms while providing unbiased and accurate estimates. In this paper, we propose a "re-randomization" strategy in the process of densification and we show that it achieves the smallest variance among existing densification schemes. The success of this idea inspires us to generalize one permutation hashing to weighted (non-binary) data, resulting in the so-called "binwise consistent weighted sampling (BCWS)" algorithm. We analyze the behavior of BCWS and compare it with a recent alternative. Experiments on a range of datasets and tasks confirm the effectiveness of proposed methods. We expect that BCWS will be adopted in practice for training kernel machines and fast similarity search.

artificial intelligence, data mining, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Add feedback

Generalization Error Analysis of Quantized Compressive Learning

Xiaoyun Li, Ping Li

Neural Information Processing SystemsJan-22-2025, 00:56:59 GMT

In this paper, we consider the learning problem where the projected data is further compressed by scalar quantization, which is called quantized compressive learning. Generalization error bounds are derived for three models: nearest neighbor (NN) classifier, linear classifier and least squares regression. Besides studying finite sample setting, our asymptotic analysis shows that the inner product estimators have deep connection with NN and linear classification problem through the variance of their debiased counterparts. By analyzing the extra error term brought by quantization, our results provide useful implications to the choice of quantizers in applications involving different learning tasks. Empirical study is also conducted to validate our theoretical findings.

artificial intelligence, machine learning, quantizer, (17 more...)

Neural Information Processing Systems

Country: