Beschastnikh, Ivan
Scalable Data Point Valuation in Decentralized Learning
Pandl, Konstantin D., Huang, Chun-Yin, Beschastnikh, Ivan, Li, Xiaoxiao, Thiebes, Scott, Sunyaev, Ali
Existing research on data valuation in federated and swarm learning focuses on valuing client contributions and works best when data across clients is independent and identically distributed (IID). In practice, data is rarely distributed IID. We develop an approach called DDVal for decentralized data valuation, capable of valuing individual data points in federated and swarm learning. DDVal is based on sharing deep features and approximating Shapley values through a k-nearest neighbor approximation method. This enables novel applications, for example simultaneously rewarding institutions and individuals for providing data to a decentralized machine learning task. Valuing individual data points with DDVal also allows drawing hierarchical conclusions about the contributions of institutions, and we empirically show that DDVal estimates institutional contributions more accurately than existing Shapley value approximation methods for federated learning. Specifically, it reaches a cosine similarity to the true Shapley values of 99.969% in both IID and non-IID data distributions across institutions, compared with 99.301% and 97.250% for the best state-of-the-art methods. DDVal scales with the number of data points instead of the number of clients and has log-linear complexity, which scales more favorably than existing approaches with exponential complexity. We show that DDVal is especially efficient in data distribution scenarios with many clients that each hold few data points, for example more than 16 clients with 8,000 data points each. By integrating DDVal into a decentralized system, we show that it is suitable not only for centralized federated learning but also for decentralized swarm learning, which aligns well with research on emerging internet technologies such as web3 that reward users for providing data to algorithms.
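The abstract's k-nearest-neighbor approximation is in the spirit of the closed-form KNN Shapley recursion of Jia et al. (2019), which needs only one sort per test point and hence runs in log-linear time. Below is a minimal sketch of that recursion in isolation; it assumes deep features have already been extracted and shared, and leaves out DDVal's decentralized machinery.

```python
import numpy as np

def knn_shapley(feats, labels, test_feats, test_labels, k=5):
    """Exact Shapley values of training points under a KNN utility
    (Jia et al., 2019). Inputs are numpy arrays of deep features and
    labels; cost is O(N log N) per test point, i.e. log-linear in the
    number of data points rather than exponential in the clients."""
    n = len(feats)
    values = np.zeros(n)
    for xt, yt in zip(test_feats, test_labels):
        # Sort training points by distance to the test point in feature space.
        order = np.argsort(np.linalg.norm(feats - xt, axis=1))
        match = (labels[order] == yt).astype(float)
        s = np.zeros(n)
        s[-1] = match[-1] / n
        for j in range(n - 2, -1, -1):   # j is the 0-based rank
            rank = j + 1                 # 1-based rank of this point
            s[j] = s[j + 1] + (match[j] - match[j + 1]) / k * min(k, rank) / rank
        values[order] += s               # map back to original indices
    return values / len(test_feats)

# Hierarchical institutional contributions then follow by summation, e.g.
# institution_value = values[points_owned_by_institution].sum()
```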
GlueFL: Reconciling Client Sampling and Model Masking for Bandwidth Efficient Federated Learning
He, Shiqi, Yan, Qifan, Wu, Feijie, Wang, Lanjun, Lécuyer, Mathias, Beschastnikh, Ivan
Federated learning (FL) is an effective technique to directly involve edge devices in machine learning training while preserving client privacy. However, the substantial communication overhead of FL makes training challenging when edge devices have limited network bandwidth. Existing work on optimizing FL bandwidth overlooks downstream transmission and does not account for FL client sampling. In this paper we propose GlueFL, a framework that incorporates new client sampling and model compression algorithms to mitigate the low download bandwidths of FL clients. GlueFL prioritizes recently used clients and bounds the number of changed positions in compression masks in each round. Across three popular FL datasets and three state-of-the-art strategies, GlueFL reduces downstream client bandwidth by 27% on average and reduces training time by 29% on average. One important strategy is client sampling, which limits the number of clients that perform training in each round (McMahan et al., 2017; Luo et al., 2022); sampling reduces both upstream and downstream bandwidth.
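The two mechanisms named above can be sketched independently. The following is a hypothetical, minimal rendering of sticky sampling (prioritizing recently used clients) and of a top-k compression mask whose newly changed positions are bounded per round; the function names, constants, and exact selection rule are illustrative, not GlueFL's implementation.

```python
import random
import numpy as np

def sticky_sample(clients, prev_round, per_round, num_new, rng=random):
    """Keep most participants from the previous round; rotate a few in.
    Returning clients already hold most mask positions, so their
    downloads shrink to the few newly changed coordinates."""
    stay = rng.sample(prev_round, per_round - num_new)
    fresh = rng.sample([c for c in clients if c not in stay], num_new)
    return stay + fresh

def bounded_mask(update, prev_mask, k, max_changed):
    """Top-k magnitude mask over a flat update vector in which at most
    `max_changed` new positions enter relative to last round's mask."""
    mag = np.abs(update)
    keep = {int(i) for i in prev_mask}
    added = 0
    for idx in np.argsort(-mag):         # strongest coordinates first
        if added >= max_changed:
            break
        if int(idx) not in keep:
            keep.add(int(idx))
            added += 1
    # Trim back to k positions, dropping the weakest entries.
    return np.array(sorted(keep, key=lambda i: -mag[i])[:k])
```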
Biscotti: A Ledger for Private and Secure Peer-to-Peer Machine Learning
Shayan, Muhammad, Fung, Clement, Yoon, Chris J. M., Beschastnikh, Ivan
Federated Learning is the current state of the art in supporting secure multi-party ML: data is maintained on the owner's device and is aggregated through a secure protocol. However, this process assumes a trusted centralized infrastructure for coordination, and clients must trust that the central service does not maliciously omit client contributions or use the byproducts of client data. As a response, we propose Biscotti: a fully decentralized P2P approach to multi-party ML, which uses blockchain and crypto primitives to coordinate a privacy-preserving ML process between peering clients. Our evaluation demonstrates that Biscotti is scalable, fault tolerant, and defends against known attacks. For example, Biscotti is able to protect the performance of the global model at scale even when 45% of peers are adversaries trying to poison the model. The implementation can be found at: https://github.com/DistributedML/Biscotti
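One core ingredient of such a coordinator-free design is publicly verifiable, stake-weighted role assignment each round. The toy sketch below keys selection on the latest block hash via SHA-256; it merely stands in for Biscotti's actual VRF-based design and elides the noising, verification, and secure-aggregation steps.

```python
import hashlib

def select_committee(peers, stake, block_hash, size):
    """Toy stake-weighted committee selection: every peer can recompute
    the same committee from public inputs (the ledger's latest block
    hash and the stake table), so no trusted coordinator is needed."""
    def score(peer):
        digest = hashlib.sha256(f"{block_hash}:{peer}".encode()).digest()
        draw = int.from_bytes(digest, "big")
        return draw / max(stake[peer], 1)   # higher stake => lower score
    return sorted(peers, key=score)[:size]

# Example: verifiers for the next round, reproducible by all peers.
# verifiers = select_committee(peers, stake, latest_block_hash, size=3)
```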
Iroko: A Framework to Prototype Reinforcement Learning for Data Center Traffic Control
Ruffy, Fabian, Przystupa, Michael, Beschastnikh, Ivan
Recent networking research has identified that data-driven congestion control (CC) can be more efficient than traditional CC in TCP. Deep reinforcement learning (RL), in particular, has the potential to learn optimal network policies. However, RL suffers from instability and over-fitting, deficiencies which so far render it unacceptable for use in datacenter networks. In this paper, we analyze the requirements for RL to succeed in the datacenter context. We present a new emulator, Iroko, which we developed to support different network topologies, congestion control algorithms, and deployment scenarios. Iroko interfaces with the OpenAI gym toolkit, which allows for fast and fair evaluation of different RL and traditional CC algorithms under the same conditions. We present initial benchmarks of three deep RL algorithms compared to TCP New Vegas and DCTCP. Our results show that these algorithms are able to learn a CC policy which exceeds the performance of TCP New Vegas on dumbbell and fat-tree topologies. We make our emulator open-source and publicly available: https://github.com/dcgym/iroko
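Because Iroko exposes the OpenAI gym interface, an experiment reduces to the standard reset/step loop. The snippet below uses a placeholder environment id and a random policy purely for illustration; see the repository above for the actual registration names and topology/CC configuration.

```python
import gym  # Iroko plugs into the OpenAI gym toolkit

# "dc-iroko-v0" is a hypothetical id; consult the Iroko repository
# for the real environment names and configuration options.
env = gym.make("dc-iroko-v0")
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()   # stand-in for an RL agent's policy
    obs, reward, done, info = env.step(action)
env.close()
```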
Mitigating Sybils in Federated Learning Poisoning
Fung, Clement, Yoon, Chris J. M., Beschastnikh, Ivan
Machine learning (ML) over distributed data is relevant to a variety of domains. Existing approaches, such as federated learning, compose the outputs computed by a group of devices at a central aggregator and run multi-round algorithms to generate a globally shared model. Unfortunately, such approaches are susceptible to a variety of attacks, including model poisoning, which is made substantially worse in the presence of sybils. In this paper we first evaluate the vulnerability of federated learning to sybil-based poisoning attacks. We then describe FoolsGold, a novel defense to this problem that identifies poisoning sybils based on the diversity of client contributions in the distributed learning process. Unlike prior work, our system does not assume that the attackers are in the minority, requires no auxiliary information outside of the learning process, and makes fewer assumptions about clients and their data. In our evaluation we show that FoolsGold exceeds the capabilities of existing state-of-the-art approaches to countering ML poisoning attacks. Our results hold for a variety of conditions, including different distributions of data, varying poisoning targets, and various attack strategies.
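The key signal FoolsGold exploits is that sybils, sharing a poisoning objective, submit unusually similar updates over time. The sketch below follows the published algorithm in spirit: pairwise cosine similarity over historical updates, a "pardoning" rescale, and a logit squashing into per-client learning-rate weights; the constants and edge-case handling are illustrative rather than the authors' exact implementation.

```python
import numpy as np

def foolsgold_weights(histories, eps=1e-5):
    """Per-client learning-rate weights from historical update similarity.
    `histories` is an (n_clients, n_params) array of each client's
    aggregated updates so far; near-duplicate histories (likely sybils)
    are pushed toward weight 0, distinct honest clients toward 1."""
    n = len(histories)
    norms = np.linalg.norm(histories, axis=1, keepdims=True) + eps
    cs = (histories @ histories.T) / (norms * norms.T)
    np.fill_diagonal(cs, 0.0)
    maxcs = cs.max(axis=1)
    # Pardoning: an honest client that merely resembles a highly
    # sybil-like client gets its similarity scaled back down.
    for i in range(n):
        for j in range(n):
            if maxcs[j] > maxcs[i] > 0:
                cs[i, j] *= maxcs[i] / maxcs[j]
    wv = np.clip(1.0 - cs.max(axis=1), 0.0, 1.0)
    if wv.max() > 0:
        wv /= wv.max()                    # most distinct client gets 1
    wv = np.clip(wv, eps, 1.0 - eps)      # keep the logit finite
    wv = np.log(wv / (1.0 - wv)) + 0.5    # logit squashing
    return np.clip(wv, 0.0, 1.0)
```

The weights then scale each client's contribution during aggregation, so a cluster of sybils is collectively down-weighted without assuming attackers are a minority.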