AITopics

Learning over massive data stored in different locations is essential in many real-world applications. However, sharing data is full of challenges due to the increasing demands of privacy and security with the growing use of smart mobile devices and IoT devices. Federated learning provides a potential solution to privacy-preserving and secure machine learning, by means of jointly training a global model without uploading data distributed on multiple devices to a central server. However, most existing work on federated learning adopts machine learning models with full-precision weights, and almost all these models contain a large number of redundant parameters that do not need to be transmitted to the server, which incurs excessive communication costs. To address this issue, we propose a federated trained ternary quantization (FTTQ) algorithm, which optimizes the quantized networks on the clients through a self-learning quantization factor. A proof of the convergence of the quantization factor and of the unbiasedness of FTTQ is given. In addition, we propose a ternary federated averaging protocol (T-FedAvg) to reduce the upstream and downstream communication of federated learning systems. Empirical experiments are conducted to train widely used deep learning models on publicly available datasets, and our results demonstrate the effectiveness of FTTQ and T-FedAvg compared with the canonical federated learning algorithms in reducing communication costs and maintaining the learning performance.
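To make the ternary idea concrete, here is a minimal sketch of generic threshold-based ternarization. It is not the FTTQ algorithm itself: the fixed `threshold_ratio` and the mean-magnitude scale `alpha` are assumptions standing in for the self-learned quantization factor mentioned in the abstract, and the function names are hypothetical.

```python
def ternarize(weights, threshold_ratio=0.05):
    """Map full-precision weights to a scale alpha and codes in {-1, 0, +1}.

    Assumption: the threshold is fixed at threshold_ratio * max|w|.
    FTTQ, by contrast, learns its quantization factor during training.
    """
    if not weights:
        return 0.0, []
    delta = threshold_ratio * max(abs(w) for w in weights)
    # Weights with small magnitude are zeroed; the rest keep only their sign.
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    # Scale = mean magnitude of the surviving weights (0.0 if none survive).
    alpha = sum(kept) / len(kept) if kept else 0.0
    return alpha, codes


def dequantize(alpha, codes):
    """Rebuild approximate weights from the transmitted scale and codes."""
    return [alpha * c for c in codes]
```

Under this scheme a client would transmit one floating-point scale per layer plus one three-valued code per weight, rather than a full-precision value per weight.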

algorithm, federated learning, learning, (13 more...)

2003.03564

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > United Kingdom > England > Surrey > Guildford (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Promising Solution (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Adversarial Multimodal Representation Learning for Click-Through Rate Prediction

Li, Xiang, Wang, Chao, Tan, Jiwei, Zeng, Xiaoyi, Ou, Dan, Zheng, Bo

For better user experience and business effectiveness, Click-Through Rate (CTR) prediction has been one of the most important tasks in E-commerce. Although extensive CTR prediction models have been proposed, learning good representations of items from multimodal features is still less investigated, considering that an item in E-commerce usually contains multiple heterogeneous modalities. Previous works either concatenate the multiple modality features, which is equivalent to giving a fixed importance weight to each modality, or learn dynamic weights of different modalities for different items through techniques such as attention mechanisms. However, a problem is that there usually exists common redundant information across multiple modalities. The dynamic weights of different modalities computed by using the redundant information may not correctly reflect the different importance of each modality. To address this, we explore the complementarity and redundancy of modalities by considering modality-specific and modality-invariant features differently. We propose a novel Multimodal Adversarial Representation Network (MARN) for the CTR prediction task. A multimodal attention network first calculates the weights of multiple modalities for each item according to its modality-specific features. Then a multimodal adversarial network learns modality-invariant representations, where a double-discriminator strategy is introduced. Finally, we obtain the multimodal item representations by combining both modality-specific and modality-invariant representations. We conduct extensive experiments on both public and industrial datasets, and the proposed method consistently achieves remarkable improvements over the state-of-the-art methods. Moreover, the approach has been deployed in an operational E-commerce system, and online A/B testing further demonstrates its effectiveness.
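The contrast between fixed and dynamic modality weights can be sketched as follows. This is an illustrative softmax-attention fusion only, not MARN: the per-modality `scores` are assumed to be given (in an attention network they would be computed from each item's features), and both function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features, scores):
    """Weighted sum of per-modality feature vectors.

    features: list of equal-length vectors, one per modality.
    scores:   one unnormalized attention score per modality (assumed given).
    Returns the fused vector and the normalized weights.
    """
    weights = softmax(scores)
    dim = len(features[0])
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights
```

Passing equal scores gives every modality the same weight for every item; item-dependent scores let the weights vary per item, which is the dynamic case the abstract describes.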

modality, multiple modality, representation, (14 more...)

doi: 10.1145/3366423.3380163

2003.07162

Country:

Asia > Taiwan > Taiwan Province > Taipei (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Experimental Study (0.68)

Industry: Information Technology > Services > e-Commerce Services (0.75)

Technology:

Information Technology > Communications (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.93)
Information Technology > e-Commerce (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Veroneze, Rosana, Von Zuben, Fernando J.

New advances in enumerative biclustering algorithms with online partitioning

This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets. By avoiding a priori partitioning and itemization of the dataset, RIn-Close_CVC implements an online partitioning, which is demonstrated here to lead to more informative biclustering results. The improved algorithm, called RIn-Close_CVC3, keeps the attractive properties of RIn-Close_CVC, as formally proved here, and is characterized by: a drastic reduction in memory usage; a consistent gain in runtime; the additional ability to handle datasets with missing values; and the additional ability to operate with attributes characterized by distinct distributions or even mixed data types. The experimental results include synthetic and real-world datasets used to perform scalability and sensitivity analyses. As a practical case study, a parsimonious set of relevant and interpretable mixed-attribute-type rules is obtained in the context of supervised descriptive pattern mining.

algorithm, bicluster, dataset, (13 more...)

2003.04726

Country:

South America > Brazil > São Paulo > Campinas (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Slovakia > Košice > Košice (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Biomedical Informatics (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.45)

Yang, Qianqian, Mashhadi, Mahdi Boloursaz, Gunduz, Deniz

Distributed Deep Convolutional Compression for Massive MIMO CSI Feedback

Massive multiple-input multiple-output (MIMO) systems require downlink channel state information (CSI) at the base station (BS) to achieve spatial diversity and multiplexing gains. In a frequency division duplex (FDD) multiuser massive MIMO network, each user needs to compress and feed back its downlink CSI to the BS. The CSI overhead scales with the number of antennas, users and subcarriers, and becomes a major bottleneck for the overall spectral efficiency. In this paper, we propose a deep learning (DL)-based CSI compression scheme, called DeepCMC, composed of convolutional layers followed by quantization and entropy coding blocks. In comparison with previous DL-based CSI reduction structures, DeepCMC includes quantization and entropy coding blocks and minimizes a weighted rate-distortion cost, which enables a trade-off between the CSI quality and its feedback overhead. Simulation results demonstrate that DeepCMC outperforms the state-of-the-art CSI compression schemes in terms of the reconstruction quality of CSI for the same compression rate. We also propose a distributed version of DeepCMC for a multi-user MIMO scenario to encode and reconstruct the CSI from multiple users in a distributed manner. Distributed DeepCMC not only utilizes the inherent CSI structures of a single MIMO user for compression, but also benefits from the correlations among the channel matrices of nearby users to further improve the performance in comparison with DeepCMC.

architecture, deepcmc, matrix, (15 more...)

2003.04684

Country:

North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Removing Disparate Impact of Differentially Private Stochastic Gradient Descent on Model Accuracy

Xu, Depeng, Du, Wei, Wu, Xintao

When we enforce differential privacy in machine learning, the utility-privacy trade-off is different w.r.t. each group. Gradient clipping and random noise addition disproportionately affect underrepresented and complex classes and subgroups, which results in inequality in utility loss. In this work, we analyze the inequality in utility loss by differential privacy and propose a modified differentially private stochastic gradient descent (DPSGD), called DPSGD-F, to remove the potential disparate impact of differential privacy on the protected group. DPSGD-F adjusts the contribution of samples in a group depending on the group clipping bias such that differential privacy has no disparate impact on group utility. Our experimental evaluation shows how group sample size and group clipping bias affect the impact of differential privacy in DPSGD, and how adaptive clipping for each group helps to mitigate the disparate impact caused by differential privacy in DPSGD-F.
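The two operations the abstract singles out, gradient clipping and random noise addition, can be sketched as a single step of the standard DPSGD gradient computation. This is a plain-Python illustration of the baseline only, not DPSGD-F: the function name and the `rng` parameter are hypothetical, and a single shared `clip_norm` is assumed, whereas DPSGD-F adapts clipping per group.

```python
import math
import random


def dp_sgd_gradient(per_sample_grads, clip_norm, noise_multiplier, rng=None):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average.

    per_sample_grads: list of equal-length gradient vectors, one per sample.
    clip_norm:        maximum L2 norm allowed for any single gradient.
    noise_multiplier: noise standard deviation as a multiple of clip_norm.
    """
    if rng is None:
        rng = random.Random()
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Gradients within the bound pass unchanged; larger ones are rescaled.
        scale = 1.0 if norm <= clip_norm else clip_norm / norm
        for i, g in enumerate(grad):
            total[i] += g * scale
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_sample_grads) for v in noisy]
```

Samples whose gradients have large norms lose the most signal under a shared bound, which is one way to picture how clipping can affect groups unequally.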

differential privacy, gradient, privacy, (17 more...)

2003.03699

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(12 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Angle-Based Cost-Sensitive Multicategory Classification

Yang, Yi, Guo, Yuxuan, Chang, Xiangyu

Many real-world classification problems come with costs which can vary for different types of misclassification. It is thus important to develop cost-sensitive classifiers which minimize the total misclassification cost. Although binary cost-sensitive classifiers have been well-studied, solving multicategory classification problems is still challenging. A popular approach to address this issue is to construct K classification functions for a K-class problem and remove the redundancy by imposing a sum-to-zero constraint. However, such a method usually results in higher computational complexity and inefficient algorithms. In this paper, we propose a novel angle-based cost-sensitive classification framework for multicategory classification without the sum-to-zero constraint. Loss functions that are included in the angle-based cost-sensitive classification framework are further justified to be Fisher consistent. To show the usefulness of the framework, two cost-sensitive multicategory boosting algorithms are derived as concrete instances. Numerical experiments demonstrate that the proposed boosting algorithms yield competitive classification performance against other existing boosting approaches.
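One common construction in the angle-based classification literature replaces the K constrained functions with a single (K-1)-dimensional function and assigns each class a vertex of a regular simplex; the predicted class is the vertex with the largest inner product (smallest angle). The sketch below illustrates that coding only. Whether the paper uses exactly these vertices is an assumption, costs are not modeled, and both function names are hypothetical.

```python
import math


def simplex_vertices(k):
    """Return k unit vectors in R^(k-1) forming a regular simplex centered at 0."""
    if k < 2:
        raise ValueError("need at least two classes")
    dim = k - 1
    vertices = [[1.0 / math.sqrt(dim)] * dim]
    shift = -(1.0 + math.sqrt(k)) / dim ** 1.5
    spike = math.sqrt(k / dim)
    for j in range(dim):
        v = [shift] * dim
        v[j] += spike
        vertices.append(v)
    return vertices


def predict(f_x, vertices):
    """Pick the class whose vertex has the largest inner product with f(x)."""
    scores = [sum(a * b for a, b in zip(f_x, v)) for v in vertices]
    return max(range(len(scores)), key=scores.__getitem__)
```

Because the vertices already sum to the zero vector, no explicit sum-to-zero constraint has to be imposed on the learned function.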

algorithm, classification, loss function, (14 more...)

2003.03691

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Banking & Finance > Credit (0.93)
Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Ghazanfari, Behzad, Afghah, Fatemeh, Hajiaghayi, MohammadTaghi

Inverse Feature Learning: Feature learning based on Representation Learning of Error

This paper proposes inverse feature learning as a novel supervised feature learning technique that learns a set of high-level features for classification based on an error representation approach. The key contribution of this method is to learn the representation of error as high-level features, while current representation learning methods interpret error by loss functions which are obtained as a function of differences between the true labels and the predicted ones. One advantage of such a learning method is that the learned features for each class are independent of the learned features for other classes; therefore, this method can learn simultaneously, meaning that it can learn new classes without retraining. Error representation learning can also help with generalization and reduce the chance of over-fitting by adding a set of impactful features to the original data set, which capture the relationships between each instance and different classes through an error generation and analysis process. This method can be particularly effective on data sets where the instances of each class have diverse feature representations, or on those with imbalanced classes. The experimental results show that the proposed method results in significantly better performance compared to the state-of-the-art classification techniques for several popular data sets. We hope this paper can open a new path to utilize the proposed perspective of error representation learning in different feature learning domains.

inverse feature, learning, representation, (13 more...)

2003.03689

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > United States > Colorado (0.04)
North America > United States > Arizona > Coconino County > Flagstaff (0.04)
Asia > India (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)

Geometry and Topology of Deep Neural Networks' Decision Boundaries

Liu, Bo

Geometry and topology of decision regions are closely related to classification performance and robustness against adversarial attacks. In this paper, we use differential geometry and topology to explore theoretically the geometrical and topological properties of decision regions produced by deep neural networks (DNNs). The goals are to obtain some geometrical and topological properties of decision regions for given DNN models, and to provide some principled guidance for designing and regularizing DNNs. First, we give the curvatures of decision boundaries in terms of network weights. Based on the rotation index theorem and the Gauss-Bonnet-Chern theorem, we then propose methods to identify the closedness and connectivity of given decision boundaries, and obtain the Euler characteristics of closed ones, all without the need to solve decision boundaries explicitly. Finally, we give necessary conditions on network architectures in order to produce closed decision boundaries, and sufficient conditions on network weights for producing zero curvature (flat or developable) decision boundaries.

boundary, decision boundary, input space, (14 more...)

2003.03687

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)

Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets

Runge, Jakob

We consider causal discovery from time series using conditional independence (CI) based network learning algorithms such as the PC algorithm. The PC algorithm is divided into a skeleton phase, where adjacencies are determined based on efficiently selected CI tests, and subsequent phases, where links are oriented utilizing the Markov and Faithfulness assumptions. Here we show that autocorrelation makes the PC algorithm much less reliable, with very low adjacency and orientation detection rates and inflated false positives. We propose a new algorithm, called PCMCI$^+$, which extends the PCMCI method from [Runge et al., 2019b] to also include discovery of contemporaneous links. It separates the skeleton phase for lagged and contemporaneous conditioning sets and modifies the conditioning sets for the individual CI tests. We show that this algorithm now benefits from increasing autocorrelation and yields much more adjacency detection power and especially more orientation recall for contemporaneous links while controlling false positives and having much shorter runtimes. Numerical experiments indicate that the algorithm can be of considerable use in many application scenarios for dozens of variables and large time delays.

autocorrelation, ci test, pcmci 0, (14 more...)

2003.03685

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Spain > Canary Islands (0.04)
Europe > Germany (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.66)

Ilhan, Fatih, Kozat, Suleyman Serdar

Modeling of Spatio-Temporal Hawkes Processes with Randomized Kernels

We investigate spatio-temporal event analysis using point processes. Inferring the dynamics of event sequences spatiotemporally has many practical applications including crime prediction, social media analysis, and traffic forecasting. In particular, we focus on spatio-temporal Hawkes processes that are commonly used due to their capability to capture excitations between event occurrences. We introduce a novel inference framework based on randomized transformations and gradient descent to learn the process. We replace the spatial kernel calculations by randomized Fourier feature-based transformations. The randomization introduced by this representation provides flexibility while modeling the spatial excitation between events. Moreover, the system described by the process is expressed in closed form in terms of scalable matrix operations. During the optimization, we use a maximum likelihood estimation approach and gradient descent while properly handling positivity and orthonormality constraints. The experimental results show the improvements achieved by the introduced method in terms of fitting capability on synthetic and real datasets with respect to the conventional inference methods in the spatio-temporal Hawkes process literature. We also analyze the triggering interactions between event types and how their dynamics change in space and time through the interpretation of learned parameters.
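The idea of replacing a kernel evaluation with a randomized feature map can be illustrated with the standard random Fourier feature construction for a Gaussian kernel. This is a generic sketch, not the paper's inference framework: the Gaussian form of the spatial kernel, the `bandwidth` parameter, and the function names are all assumptions made for illustration.

```python
import math
import random


def make_rff(dim, num_features, bandwidth, seed=0):
    """Build a feature map z such that z(x) . z(y) approximates a Gaussian kernel."""
    rng = random.Random(seed)
    # Frequencies are drawn from the kernel's spectral density, N(0, 1/bandwidth^2).
    freqs = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def transform(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return transform


def gaussian_kernel(x, y, bandwidth):
    """Exact kernel value exp(-||x - y||^2 / (2 * bandwidth^2)) for comparison."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))
```

With the features in hand, pairwise kernel values reduce to inner products of fixed-length vectors, which is what allows the computation to be written as matrix operations.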

event sequence, point process, spatiotemporal hawke process, (13 more...)

2003.03671

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
Asia > Middle East > Republic of Türkiye > Ankara Province > Ankara (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)