FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis
Gang, Arpita, Bajwa, Waheed U.
Principal Component Analysis (PCA) is a fundamental data preprocessing tool in the world of machine learning. While PCA is often thought of as a dimensionality reduction method, the purpose of PCA is actually two-fold: dimension reduction and uncorrelated feature learning. Furthermore, the enormous dimensionality and sample sizes of modern-day datasets have rendered centralized PCA solutions unusable. In that vein, this paper reconsiders the problem of PCA when data samples are distributed across nodes in an arbitrarily connected network. While a few solutions for distributed PCA exist, they either overlook the uncorrelated feature learning aspect of PCA, tend to have high communication overhead that makes them inefficient, and/or lack 'exact' or 'global' convergence guarantees. To overcome these issues, this paper proposes a distributed PCA algorithm termed FAST-PCA (Fast and exAct diSTributed PCA). The proposed algorithm is efficient in terms of communication and is proven to converge linearly and exactly to the principal components, leading to dimension reduction as well as uncorrelated features. These claims are further supported by experimental results.
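As a reference point for the distributed setting considered above, the following is a minimal sketch of the two-fold purpose of (centralized) PCA: projecting data onto the top-K eigenvectors of the sample covariance both reduces dimension and yields empirically uncorrelated features. The synthetic data, the choice K = 2, and all variable names are illustrative assumptions, not taken from the paper.

```python
# Minimal centralized-PCA sketch: dimension reduction + uncorrelated features.
# Synthetic data and K = 2 are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 10))  # 500 samples, 10 features
X -= X.mean(axis=0)                                                 # center the data

C = (X.T @ X) / X.shape[0]            # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]     # reorder eigenpairs by decreasing eigenvalue
W = eigvecs[:, order[:2]]             # top-2 principal components (K = 2)

Y = X @ W                             # reduced-dimension representation
print(np.round(np.cov(Y.T), 3))       # off-diagonal entries ~ 0: features are uncorrelated
```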
A Linearly Convergent Algorithm for Distributed Principal Component Analysis
Gang, Arpita, Bajwa, Waheed U.
Principal Component Analysis (PCA) is the workhorse tool for dimensionality reduction in this era of big data. While often overlooked, the purpose of PCA is not only to reduce data dimensionality, but also to yield features that are uncorrelated. This paper focuses on this dual objective of PCA, namely, dimensionality reduction and decorrelation of features, which requires estimating the eigenvectors of a data covariance matrix, as opposed to only estimating the subspace spanned by the eigenvectors. The ever-increasing volume of data in the modern world often requires storage of data samples across multiple machines, which precludes the use of centralized PCA algorithms. Although a few distributed solutions to the PCA problem have been proposed recently, convergence guarantees and/or communications overhead of these solutions remain a concern. With an eye towards communications efficiency, this paper introduces a feedforward neural network-based, one-time-scale distributed PCA algorithm termed Distributed Sanger's Algorithm (DSA) that estimates the eigenvectors of a data covariance matrix when data are distributed across an undirected and arbitrarily connected network of machines. Furthermore, the proposed algorithm is shown to converge linearly to a neighborhood of the true solution. Numerical results are also provided to demonstrate the efficacy of the proposed solution.
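The paper's exact DSA update is not reproduced here; the sketch below only illustrates the general flavor of a one-time-scale distributed eigenvector estimator, combining a Sanger-rule (generalized Hebbian) step on a local covariance with averaging over network neighbors. The ring topology, mixing weights, step size, and iteration count are illustrative assumptions.

```python
# Sketch: Sanger-rule step + neighbor averaging over a ring of nodes.
# Topology, mixing weights, step size, and iteration count are assumptions,
# not the paper's actual specification.
import numpy as np

rng = np.random.default_rng(1)
d, K, n_nodes = 8, 3, 4
# Each node holds a local sample covariance C_i; their average is the global covariance.
A = rng.standard_normal((200, d))
C_local = [(Ai.T @ Ai) / Ai.shape[0] for Ai in np.split(A, n_nodes)]

# Doubly stochastic mixing weights for a ring: self plus two neighbors.
W_mix = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W_mix[i, i] = 0.5
    W_mix[i, (i - 1) % n_nodes] = 0.25
    W_mix[i, (i + 1) % n_nodes] = 0.25

X = [rng.standard_normal((d, K)) for _ in range(n_nodes)]  # local eigenvector estimates
eta = 0.05
for _ in range(2000):
    # Consensus step: mix estimates with neighbors.
    X_mixed = [sum(W_mix[i, j] * X[j] for j in range(n_nodes)) for i in range(n_nodes)]
    # Sanger (generalized Hebbian) step using the local covariance.
    X = [Xi + eta * (Ci @ Xi - Xi @ np.triu(Xi.T @ Ci @ Xi))
         for Xi, Ci in zip(X_mixed, C_local)]

C_global = sum(C_local) / n_nodes
print(np.round(X[0].T @ C_global @ X[0], 3))  # ~diagonal: columns align with eigenvectors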
Adversary-resilient Inference and Machine Learning: From Distributed to Decentralized
Yang, Zhixiong, Gang, Arpita, Bajwa, Waheed U.
Statistical inference and machine learning algorithms have traditionally been developed for data available at a single location. Unlike this centralized setting, modern datasets are increasingly being distributed across multiple physical entities (sensors, devices, machines, data centers, etc.) for a multitude of reasons that range from storage, memory, and computational constraints to privacy concerns and engineering needs. This has necessitated the development of inference and learning algorithms capable of operating on non-collocated data. Such algorithms can be divided into two broad categories, namely, distributed algorithms and decentralized algorithms. Distributed algorithms correspond to the setup in which the data-bearing entities (henceforth referred to as "nodes") only communicate with a single entity (referred to as master node, central server, parameter server, fusion center, etc.), which is tasked with generating the final result. Such distributed setups arise in the context of parallel computing, where the focus is computational speedups and/or overcoming memory/storage bottlenecks, and federated systems, where "raw" data collected by individual nodes cannot be shared with the master node due to either communication constraints (e.g., sensor networks) or privacy concerns (e.g., smartphone data). Decentralized algorithms, on the other hand, correspond to the setup that lacks a central server; instead, individual nodes in this setup communicate among themselves over a network (often ad hoc) to reach a common solution (i.e., achieve consensus) at all nodes. Such decentralized setups arise either out of the need to eliminate single points of failure in distributed setups or due to practical constraints, as in the internet of things and autonomous systems. We refer the reader to Figure 1 for examples of distributed and decentralized setups.

Is it distributed or is it decentralized? Inference and learning from non-collocated data have been studied for decades in computer science, control, signal processing, and statistics. Both among and within these disciplines, however, there is no consensus on the use of the terms "distributed" and "decentralized." Though many works share the definitions provided here, numerous authors use these two terms interchangeably, while some others reverse these definitions. Inference and machine learning algorithms involving non-collocated data are broadly divisible into the categories of (i) distributed algorithms and (ii) decentralized algorithms.
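To make the architectural distinction above concrete, the toy sketch below contrasts the two setups on the simple task of averaging local vectors: a distributed (master/worker) aggregation versus a decentralized consensus over a ring with no central server. The topology, mixing weights, and iteration count are illustrative assumptions, not drawn from the article.

```python
# Toy contrast of distributed vs. decentralized averaging of local vectors.
# Ring topology and mixing weights are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n_nodes, d = 5, 3
local = [rng.standard_normal(d) for _ in range(n_nodes)]

# Distributed: every node sends its vector to a master node, which aggregates.
master_result = sum(local) / n_nodes

# Decentralized: no master; nodes repeatedly average with their ring neighbors.
x = [v.copy() for v in local]
for _ in range(200):
    x = [0.5 * x[i] + 0.25 * x[(i - 1) % n_nodes] + 0.25 * x[(i + 1) % n_nodes]
         for i in range(n_nodes)]

print(np.allclose(x[0], master_result, atol=1e-6))  # True: consensus reaches the same average
```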