AITopics

2405.07316

Country:

North America > United States > Wisconsin > Eau Claire County > Eau Claire (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Arizona (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Esmaeili, Seyed A., Shin, Suho, Slivkins, Aleksandrs

Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents

arXiv.org Artificial IntelligenceDec-13-2023

We consider a variant of the stochastic multi-armed bandit problem. Specifically, the arms are strategic agents who can improve their rewards or absorb them. The utility of an agent increases if she is pulled more or absorbs more of her rewards but decreases if she spends more effort improving her rewards. Agents have heterogeneous properties, specifically having different means and able to improve their rewards up to different levels. Further, a non-empty subset of agents are ''honest'' and in the worst case always give their rewards without absorbing any part. The principal wishes to obtain a high revenue (cumulative reward) by designing a mechanism that incentives top level performance at equilibrium. At the same time, the principal wishes to be robust and obtain revenue at least at the level of the honest agent with the highest mean in case of non-equilibrium behaviour. We identify a class of MAB algorithms which we call performance incentivizing which satisfy a collection of properties and show that they lead to mechanisms that incentivize top level performance at equilibrium and are robust under any strategy profile. Interestingly, we show that UCB is an example of such a MAB algorithm. Further, in the case where the top performance level is unknown we show that combining second price auction ideas with performance incentivizing algorithms achieves performance at least at the second top level while also being robust.

algorithm, equilibrium, mab algorithm, (16 more...)

2312.07929

Country:

North America > United States > Maryland > Prince George's County > College Park (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.36)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

arXiv.org Artificial IntelligenceMar-19-2023

Deceptive Reinforcement Learning in Model-Free Domains

Lewis, Alan, Miller, Tim

This paper investigates deceptive reinforcement learning for privacy preservation in model-free and continuous action space domains. In reinforcement learning, the reward function defines the agent's objective. In adversarial scenarios, an agent may need to both maximise rewards and keep its reward function private from observers. Recent research presented the ambiguity model (AM), which selects actions that are ambiguous over a set of possible reward functions, via pre-trained $Q$-functions. Despite promising results in model-based domains, our investigation shows that AM is ineffective in model-free domains due to misdirected state space exploration. It is also inefficient to train and inapplicable in continuous action space domains. We propose the deceptive exploration ambiguity model (DEAM), which learns using the deceptive policy during training, leading to targeted exploration of the state space. DEAM is also applicable in continuous action spaces. We evaluate DEAM in discrete and continuous action space path planning environments. DEAM achieves similar performance to an optimal model-based version of AM and outperforms a model-free version of AM in terms of path cost, deceptiveness and training efficiency. These results extend to the continuous domain.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2303.10838

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Vial, Daniel, Shakkottai, Sanjay, Srikant, R.

Robust Multi-Agent Bandits Over Undirected Graphs

arXiv.org Artificial IntelligenceJan-26-2023

We consider a multi-agent multi-armed bandit setting in which $n$ honest agents collaborate over a network to minimize regret but $m$ malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur $O( (m + K/n) \log (T) / \Delta )$ regret in this setting, where $K$ is the number of arms and $\Delta$ is the arm gap. For $m \ll K$, this improves over the single-agent baseline regret of $O(K\log(T)/\Delta)$. In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in $K$ and $n$. In light of this negative result, we propose a new algorithm for which the $i$-th agent has regret $O( ( d_{\text{mal}}(i) + K/n) \log(T)/\Delta)$ on any connected and undirected graph, where $d_{\text{mal}}(i)$ is the number of $i$'s neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where $d_{\text{mal}}(i) = m$), and show the effect of malicious agents is entirely local (in the sense that only the $d_{\text{mal}}(i)$ malicious agents directly connected to $i$ affect its long-term regret).

artificial intelligence, data mining, machine learning, (18 more...)

2203.00076

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > Illinois > Champaign County > Champaign (0.04)
(2 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceOct-5-2022

A Reputation System for Market Security and Equity

Kolonin, Anton, Duong, Deborah, Goertzel, Ben, Pennachin, Cassio, Iklé, Matt, Znidar, Nejc, Argentieri, Marco

We simulate a reputation system in a market to optimise the balance between market security and market equity. We introduce a method of using a reputation system that will stabilise the distribution of wealth in a market in a fair manner. We also introduce metrics of a modified Gini that takes production quality into account, a way to use a weighted Pearson as a tool to optimise balance.

artificial intelligence, machine learning, social media, (18 more...)

2210.02362

Country:

North America > United States > District of Columbia > Washington (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Communications > Social Media (0.87)
Information Technology > Artificial Intelligence > Machine Learning (0.70)

Mugunthan, Vaikkunth, Rahman, Ravi, Kagal, Lalana

BlockFLow: An Accountable and Privacy-Preserving Solution for Federated Learning

arXiv.org Machine LearningJul-7-2020

Federated learning enables the development of a machine learning model among collaborating agents without requiring them to share their underlying data. However, malicious agents who train on random data, or worse, on datasets with the result classes inverted, can weaken the combined model. BlockFLow is an accountable federated learning system that is fully decentralized and privacy-preserving. Its primary goal is to reward agents proportional to the quality of their contribution while protecting the privacy of the underlying datasets and being resilient to malicious adversaries. Specifically, BlockFLow incorporates differential privacy, introduces a novel auditing mechanism for model contribution, and uses Ethereum smart contracts to incentivize good behavior. Unlike existing auditing and accountability methods for federated learning systems, our system does not require a centralized test dataset, sharing of datasets between the agents, or one or more trusted auditors; it is fully decentralized and resilient up to a 50% collusion attack in a malicious trust model. When run on the public Ethereum blockchain, BlockFLow uses the results from the audit to reward parties with cryptocurrency based on the quality of their contribution. We evaluated BlockFLow on two datasets that offer classification tasks solvable via logistic regression models. Our results show that the resultant auditing scores reflect the quality of the honest agents' datasets. Moreover, the scores from dishonest agents are statistically lower than those from the honest agents. These results, along with the reasonable blockchain costs, demonstrate the effectiveness of BlockFLow as an accountable federated learning system.

agent, artificial intelligence, machine learning, (16 more...)

2007.03856

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
(3 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Vial, Daniel, Shakkottai, Sanjay, Srikant, R.

Robust Multi-Agent Multi-Armed Bandits

arXiv.org Machine LearningJul-7-2020

There has been recent interest in collaborative multi-agent bandits, where groups of agents share recommendations to decrease per-agent regret. However, these works assume that each agent always recommends their individual best-arm estimates to other agents, which is unrealistic in envisioned applications (machine faults in distributed computing or spam in social recommendation systems). Hence, we generalize the setting to include honest and malicious agents who recommend best-arm estimates and arbitrary arms, respectively. We show that even with a single malicious agent, existing collaboration-based algorithms fail to improve regret guarantees over a single-agent baseline. We propose a scheme where honest agents learn who is malicious and dynamically reduce communication with them, i.e., "blacklist" them. We show that collaboration indeed decreases regret for this algorithm, when the number of malicious agents is small compared to the number of arms, and crucially without assumptions on the malicious agents' behavior. Thus, our algorithm is robust against any malicious recommendation strategy.

agent, algorithm, artificial intelligence, (18 more...)

2007.03812

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Illinois (0.04)
Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Gupta, Nirupam, Vaidya, Nitin H.

Byzantine Fault Tolerant Distributed Linear Regression

arXiv.org Machine LearningApr-4-2019

This paper considers the problem of Byzantine fault tolerance in distributed linear regression in a multi-agent system. However, the proposed algorithms are given for a more general class of distributed optimization problems, of which distributed linear regression is a special case. The system comprises of a server and multiple agents, where each agent is holding a certain number of data points and responses that satisfy a linear relationship (could be noisy). The objective of the server is to determine this relationship, given that some of the agents in the system (up to a known number) are Byzantine faulty (aka. actively adversarial). We show that the server can achieve this objective, in a deterministic manner, by robustifying the original distributed gradient descent method using norm based filters, namely 'norm filtering' and 'norm-cap filtering', incurring an additional log-linear computation cost in each iteration. The proposed algorithms improve upon the existing methods on three levels: i) no assumptions are required on the probability distribution of data points, ii) system can be partially asynchronous, and iii) the computational overhead (in order to handle Byzantine faulty agents) is log-linear in number of agents and linear in dimension of data points. The proposed algorithms differ from each other in the assumptions made for their correctness, and the gradient filter they use.

artificial intelligence, assumption, machine learning, (15 more...)

1903.08752

Country: North America > United States (0.67)

Genre: Research Report (0.50)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.90)

Rahnama, Arash, Antsaklis, Panos J.

Resilient Learning-Based Control for Synchronization of Passive Multi-Agent Systems under Attack

arXiv.org Machine LearningSep-28-2017

In this paper, we show synchronization for a group of output passive agents that communicate with each other according to an underlying communication graph to achieve a common goal. We propose a distributed event-triggered control framework that will guarantee synchronization and considerably decrease the required communication load on the band-limited network. We define a general Byzantine attack on the event-triggered multi-agent network system and characterize its negative effects on synchronization. The Byzantine agents are capable of intelligently falsifying their data and manipulating the underlying communication graph by altering their respective control feedback weights. We introduce a decentralized detection framework and analyze its steady-state and transient performances. We propose a way of identifying individual Byzantine neighbors and a learning-based method of estimating the attack parameters. Lastly, we propose learning-based control approaches to mitigate the negative effects of the adversarial attack.

agent, artificial intelligence, byzantine agent, (15 more...)

1709.10142

Country: North America > United States (0.27)

Genre: Research Report (0.49)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)