Leveraging Spatial and Temporal Correlations in Sparsified Mean Estimation
We study the problem of estimating, at a central server, the mean of a set of vectors distributed across several nodes (one vector per node). When the vectors are high-dimensional, the communication cost of sending entire vectors may be prohibitive, and the nodes may need to use sparsification techniques. While most existing work on sparsified mean estimation is agnostic to the characteristics of the data vectors, in many practical applications, such as federated learning, the data may exhibit spatial correlations (similarities between the vectors sent by different nodes) or temporal correlations (similarities in the data sent by a single node over different iterations of the algorithm). We leverage these correlations by simply modifying the decoding method the server uses to estimate the mean. We provide an analysis of the resulting estimation error, as well as experiments on PCA, K-Means, and logistic regression, which show that our estimators consistently outperform more sophisticated and expensive sparsification methods.
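The core idea of decoding with temporal correlation can be sketched as follows. This is a toy simulation, not the paper's exact estimator: `rand_k_sparsify`, the drift model, and the plug-in (unscaled) decoding are illustrative assumptions. The server fills coordinates a node did not send with that node's previously decoded vector rather than with zero.

```python
import numpy as np

def rand_k_sparsify(x, k, rng):
    """Send only k randomly chosen coordinates of x (indices + values)."""
    idx = rng.choice(x.size, size=k, replace=False)
    return idx, x[idx]

def decode_with_memory(msgs, memory):
    """Decoding that exploits temporal correlation: unsent coordinates
    are filled with the node's previously decoded vector (a plug-in
    sketch, ignoring the unbiasedness scaling a rand-k estimator uses)."""
    decoded = []
    for (idx, vals), prev in zip(msgs, memory):
        est = prev.copy()      # start from last round's estimate
        est[idx] = vals        # overwrite freshly received coordinates
        decoded.append(est)
    return np.mean(decoded, axis=0), decoded

def decode_zero_fill(msgs, d):
    """Baseline decoding: unsent coordinates are treated as zero."""
    decoded = []
    for idx, vals in msgs:
        est = np.zeros(d)
        est[idx] = vals
        decoded.append(est)
    return np.mean(decoded, axis=0)
```

With slowly drifting node vectors (strong temporal correlation), the memory-based decoder's error falls well below the zero-fill baseline after a few rounds, since stale coordinates remain close to their true values.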
Random Projections with Asymmetric Quantization
The method of random projection has been a popular tool for data compression, similarity search, and machine learning. In many practical scenarios, applying quantization to randomly projected data can be very helpful for further reducing storage cost and enabling more efficient retrieval, while incurring only a small loss in accuracy.
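A minimal sketch of the asymmetric setting: both query and database vectors are projected with a shared Gaussian matrix, but only the database side is quantized while the query stays full precision. The specific projection scaling and uniform scalar quantizer below are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def random_projection(X, m, rng):
    """JL-style Gaussian random projection from d to m dimensions."""
    R = rng.normal(size=(X.shape[-1], m)) / np.sqrt(m)
    return X @ R

def uniform_quantize(z, bits):
    """Uniform scalar quantizer with 2**bits levels over z's range."""
    c = np.max(np.abs(z))
    step = 2 * c / (2 ** bits - 1)
    return np.round((z + c) / step) * step - c
```

Even at 4 bits on the database side, cosine similarity between a full-precision projected query and a quantized projected database vector stays close to the original similarity, since the quantization noise is roughly uncorrelated with the query.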
Generating Synthetic Relational Tabular Data via Structural Causal Models
Hoppe, Frederik, Franz, Astrid, Kleinemeier, Lars, Göbel, Udo
Synthetic tabular data generation has received increasing attention in recent years, particularly with the emergence of foundation models for tabular data. The breakthrough success of TabPFN (Hollmann et al., 2025), which leverages vast quantities of synthetic tabular datasets derived from structural causal models (SCMs), demonstrates the critical role synthetic data plays in developing powerful tabular foundation models. However, most real-world tabular data exists in relational formats spanning multiple interconnected tables, a structure not adequately addressed by current generation methods. In this work, we extend the SCM-based approach by developing a novel framework that generates realistic synthetic relational tabular data, including causal relationships across tables. Our experiments confirm that this framework is able to construct relational datasets with complex inter-table dependencies mimicking real-world scenarios.
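The flavor of cross-table causal edges can be illustrated with a toy SCM, not the authors' framework: a parent table with a within-table causal edge, and a child table whose feature depends causally on the linked parent row. All variable names and coefficients here are hypothetical.

```python
import numpy as np

def sample_relational_scm(n_parents, rng):
    """Toy relational SCM: parents carry x1 -> x2, children inherit a
    causal signal from their parent's x2 across the table boundary."""
    # Parent table: within-table causal edge x1 -> x2
    x1 = rng.normal(size=n_parents)
    x2 = 0.8 * x1 + 0.2 * rng.normal(size=n_parents)
    parents = {"id": np.arange(n_parents), "x1": x1, "x2": x2}
    # Child table: random fan-out per parent, foreign key parent_id
    n_children = rng.poisson(3, size=n_parents)
    pid = np.repeat(parents["id"], n_children)
    # Cross-table causal edge: child's y depends on its parent's x2
    y = 1.5 * x2[pid] + 0.3 * rng.normal(size=pid.size)
    children = {"parent_id": pid, "y": y}
    return parents, children
```

A generated child feature then correlates strongly with its parent's feature through the foreign-key link, which is exactly the kind of inter-table dependency a relational generator needs to reproduce.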
Kernel Recursive Least Squares Dictionary Learning Algorithm
Alipoor, Ghasem, Skretting, Karl
Data factorization methods have met with considerable success in discovering latent features of signals encountered in a wide range of applications. In this framework, the representation bases, which make up the columns of the basis matrix or dictionary, are learned from the available samples of the target environment. An example is sparse representation (SR), in which the dictionary is intended to best represent the data with a small number of atoms, far fewer than the dimension of the signal space. It has been shown that, in addition to yielding a more informative representation of signals, imposing sparsity constraints on the representation coefficients can improve generalization performance and computational efficiency [1, 2, 3]. Furthermore, sparse representation is more robust to noise, redundancy, and missing data. These features are mainly attributed to the fact that the intrinsic dimension of natural signals is usually much smaller than their apparent dimension, and hence SR in an appropriate dictionary can extract these intrinsic features more efficiently. SR has been a successful strategy, receiving considerable attention and achieving state-of-the-art results in many applications, e.g.
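The sparse-coding step that dictionary learning alternates with can be sketched via orthogonal matching pursuit. This illustrates representing a signal with a few atoms of a given dictionary only; the paper's contribution, a kernel recursive least squares algorithm for learning the dictionary itself, is not implemented here.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: greedily select up to k atoms of
    dictionary D (columns, assumed unit norm) to sparsely represent y."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j in support:
            break
        support.append(j)
        # Least-squares refit on the selected atoms
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x
```

On a signal synthesized from a few atoms of a random dictionary, the recovered coefficient vector is k-sparse and the residual drops sharply after the greedy selections.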
Quasicyclic Principal Component Analysis
Rumsey, Susanna E., Draper, Stark C., Kschischang, Frank R.
We present quasicyclic principal component analysis (QPCA), a generalization of principal component analysis (PCA), that determines an optimized basis for a dataset in terms of families of shift-orthogonal principal vectors. This is of particular interest when analyzing cyclostationary data, whose cyclic structure is not exploited by the standard PCA algorithm. We first formulate QPCA as an optimization problem, which we show may be decomposed into a series of PCA problems in the frequency domain. We then formalize our solution as an explicit algorithm and analyze its computational complexity. Finally, we provide some examples of applications of QPCA to cyclostationary signal processing data, including an investigation of carrier pulse recovery, a presentation of methods for estimating an unknown oversampling rate, and a discussion of an appropriate approach for pre-processing data with a non-integer oversampling rate in order to better apply the QPCA algorithm.
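Since QPCA decomposes into a series of standard PCA problems in the frequency domain, the building block it repeatedly invokes is ordinary PCA via eigendecomposition of the sample covariance. The sketch below shows only that building block, not the QPCA algorithm itself.

```python
import numpy as np

def pca(X, r):
    """Top-r principal components of X (rows = samples, cols = features)."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (X.shape[0] - 1)  # sample covariance
    w, V = np.linalg.eigh(C)          # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:r]   # keep the r largest
    return w[order], V[:, order]
```

In QPCA this routine would be applied once per DFT bin of the cyclostationary data; as a sanity check, on rank-one-plus-noise data the top component recovers the planted direction.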