Learning to Balance Altruism and Self-interest Based on Empathy in Mixed-Motive Games

Neural Information Processing Systems

Real-world multi-agent scenarios often involve mixed motives, demanding altruistic agents capable of self-protection against potential exploitation. However, existing approaches often struggle to achieve both objectives. In this paper, motivated by the observation that empathic responses are modulated by learned social relationships between agents, we propose LASE (**L**earning to balance **A**ltruism and **S**elf-interest based on **E**mpathy), a distributed multi-agent reinforcement learning algorithm that fosters altruistic cooperation through gifting while avoiding exploitation by other agents in mixed-motive games. LASE allocates a portion of its rewards to co-players as gifts, with this allocation adapting dynamically to the social relationship --- a metric evaluating the friendliness of each co-player, estimated by counterfactual reasoning. Specifically, the social relationship metric scores each co-player by comparing the estimated $Q$-function of the current joint action to a counterfactual baseline that marginalizes out the co-player's action, with that action's distribution inferred by a perspective-taking module. Comprehensive experiments in spatially and temporally extended mixed-motive games demonstrate LASE's ability to promote group collaboration without compromising fairness, and its capacity to adapt its policy to various types of interactive co-players.
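The counterfactual relationship metric described in the abstract can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: `social_relationship` implements the comparison of the current joint-action Q-value against a baseline that marginalizes the co-player's action under an inferred policy, while the softmax `gift_weights` rule is an assumed allocation scheme.

```python
import numpy as np

def social_relationship(q_values, a_j, inferred_pi_j):
    """Counterfactual friendliness estimate for one co-player.

    q_values:      Q(s, a_i, a_j') for each possible co-player action a_j'
    a_j:           index of the co-player's actual action
    inferred_pi_j: co-player action distribution from perspective-taking
    """
    # Baseline marginalizes the co-player's action under the
    # inferred policy; a friendly co-player's action beats it.
    baseline = float(np.dot(inferred_pi_j, q_values))
    return q_values[a_j] - baseline

def gift_weights(relationships, temperature=1.0):
    """Turn per-co-player relationship scores into gifting weights
    via a softmax (one plausible allocation rule)."""
    z = np.array(relationships, dtype=float) / temperature
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

# A co-player whose action raised our Q-value above the counterfactual
# baseline is judged friendly and receives a larger share of the gift.
q = np.array([1.0, 2.0, 0.5])        # Q(s, a_i, a_j') per co-player action
pi_hat = np.array([0.2, 0.5, 0.3])   # inferred co-player policy
print(social_relationship(q, a_j=1, inferred_pi_j=pi_hat))  # positive: friendly
```

A negative score means the co-player's actual action was worse for us than its average behavior, so its gift share shrinks.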


A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem

Neural Information Processing Systems

Bandit learning is characterized by the tension between long-term exploration and short-term exploitation. However, as has recently been noted, in settings in which the choices of the learning algorithm correspond to important decisions about individual people (such as criminal recidivism prediction, lending, and sequential drug trials), exploration corresponds to explicitly sacrificing the well-being of one individual for the potential future benefit of others. In such settings, one might like to run a ``greedy'' algorithm, which always makes the optimal decision for the individuals at hand --- but doing this can result in a catastrophic failure to learn. In this paper, we consider the linear contextual bandit problem and revisit the performance of the greedy algorithm. We give a smoothed analysis, showing that even when contexts may be chosen by an adversary, small perturbations of the adversary's choices suffice for the algorithm to achieve ``no regret'', perhaps (depending on the specifics of the setting) with a constant amount of initial training data. This suggests that in slightly perturbed environments, exploration and exploitation need not be in conflict in the linear setting.
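A minimal sketch of the greedy algorithm in this setting, assuming per-arm ridge regression estimates; the perturbation model and all names here are illustrative, not the paper's exact construction:

```python
import numpy as np

class GreedyLinearBandit:
    """Greedy (no-exploration) linear contextual bandit: maintain a
    ridge-regression estimate per arm and always pull the empirical best."""
    def __init__(self, n_arms, dim, reg=1.0):
        self.A = [reg * np.eye(dim) for _ in range(n_arms)]  # X^T X + reg*I
        self.b = [np.zeros(dim) for _ in range(n_arms)]      # X^T y

    def choose(self, contexts):
        # contexts[k] is the feature vector for arm k; pick the arm with
        # the highest estimated reward -- no exploration bonus at all.
        estimates = [contexts[k] @ np.linalg.solve(self.A[k], self.b[k])
                     for k in range(len(self.A))]
        return int(np.argmax(estimates))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Smoothed setting: the adversary's chosen context is perturbed by small
# Gaussian noise each round, which is the diversity the analysis relies on.
rng = np.random.default_rng(0)
theta = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # true arm parameters
bandit = GreedyLinearBandit(n_arms=2, dim=2)
for t in range(500):
    base = np.array([0.5, 0.5])                        # adversarial context
    ctx = [base + 0.1 * rng.standard_normal(2) for _ in range(2)]
    k = bandit.choose(ctx)
    r = ctx[k] @ theta[k] + 0.01 * rng.standard_normal()
    bandit.update(k, ctx[k], r)
```

The point of the smoothed analysis is that the noise in `ctx` supplies enough incidental exploration for these purely greedy estimates to converge.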


LECO: Learnable Episodic Count for Task-Specific Intrinsic Reward

Neural Information Processing Systems

Episodic counts are widely used to design simple yet effective intrinsic rewards for reinforcement learning with sparse rewards. However, using episodic counts in a high-dimensional state space, and over long episodes, requires aggressive state compression and fast hashing, which hinders their use in hard, complex exploration environments. Moreover, interference from task-irrelevant observations can cause count-based intrinsic motivation to overlook task-relevant changes of state, and novelty measured episodically can lead the agent to repeatedly revisit states that are familiar across episodes. To resolve these issues, we propose a learnable hash-based episodic count, which we name LECO, that serves efficiently as a task-specific intrinsic reward in hard exploration problems. The proposed intrinsic reward combines episodic novelty with a task-specific modulation: the former employs a vector-quantized variational autoencoder to automatically obtain discrete state codes for fast counting, while the latter regulates the episodic novelty by learning a modulator that optimizes the task-specific extrinsic reward. LECO thereby enables an automatic transition from exploration to exploitation during reinforcement learning. We show experimentally that, in contrast to previous exploration methods, LECO solves hard exploration problems and scales to large state spaces on the most difficult tasks in the MiniGrid and DMLab environments.
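The counting mechanism can be illustrated with a nearest-neighbor stand-in for the VQ-VAE encoder. The codebook, the $1/\sqrt{N}$ bonus form, and the constant modulator below are simplifying assumptions for the sketch, not the paper's implementation:

```python
import numpy as np
from collections import defaultdict

def discretize(state, codebook):
    """Stand-in for the VQ-VAE encoder: map a continuous state to the
    index of its nearest codebook vector (the 'discrete state code')."""
    dists = np.linalg.norm(codebook - state, axis=1)
    return int(np.argmin(dists))

class EpisodicCountBonus:
    """Episodic novelty r = modulator / sqrt(N_epi(code)), where the
    modulator would be learned from extrinsic reward (constant here)."""
    def __init__(self, codebook, modulator=1.0):
        self.codebook = codebook
        self.modulator = modulator
        self.counts = defaultdict(int)

    def reset_episode(self):
        self.counts.clear()  # counts are per-episode, not global

    def intrinsic_reward(self, state):
        code = discretize(state, self.codebook)
        self.counts[code] += 1
        return self.modulator / np.sqrt(self.counts[code])

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
bonus = EpisodicCountBonus(codebook)
bonus.reset_episode()
print(bonus.intrinsic_reward(np.array([0.1, -0.1])))  # first visit: 1.0
print(bonus.intrinsic_reward(np.array([0.05, 0.0])))  # same code again: ~0.707
```

Because the bonus decays within an episode but resets between episodes, a learned modulator is what prevents the agent from re-earning novelty on states that are only superficially new.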


Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret

Neural Information Processing Systems

We propose a new learning framework that captures the tiered structure of many real-world user-interaction applications, where users can be divided into two groups with different tolerances for exploration risk and should be treated separately. In this setting, we simultaneously maintain two policies $\pi^{\text{O}}$ and $\pi^{\text{E}}$: $\pi^{\text{O}}$ (``O'' for ``online'') interacts with more risk-tolerant users from the first tier and minimizes regret by balancing exploration and exploitation as usual, while $\pi^{\text{E}}$ (``E'' for ``exploit'') focuses exclusively on exploitation for risk-averse users from the second tier, utilizing the data collected so far. An important question is whether such a separation yields advantages over the standard online setting (i.e., $\pi^{\text{E}}=\pi^{\text{O}}$) for the risk-averse users. We consider the gap-independent and gap-dependent settings separately. For the former, we prove that the separation is not beneficial from a minimax perspective. For the latter, we show that if Pessimistic Value Iteration is chosen as the exploitation algorithm producing $\pi^{\text{E}}$, we achieve constant regret for risk-averse users, independent of the number of episodes $K$, in sharp contrast to the $\Omega(\log K)$ regret of any online RL algorithm in the same setting; meanwhile, the regret of $\pi^{\text{O}}$ (almost) maintains its online optimality and need not be compromised for the success of $\pi^{\text{E}}$.
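The contrast between the two policies can be sketched in a one-step (bandit) simplification, where pessimism subtracts a count-based confidence penalty rather than running full Pessimistic Value Iteration; all constants and names here are illustrative:

```python
import numpy as np

def ucb_action(counts, reward_sums, t, c=1.0):
    """pi_O: standard optimism -- explore via an upper confidence bonus."""
    mean = reward_sums / np.maximum(counts, 1)
    bonus = c * np.sqrt(np.log(max(t, 2)) / np.maximum(counts, 1))
    scores = mean + bonus
    scores = np.where(counts == 0, np.inf, scores)   # try everything once
    return int(np.argmax(scores))

def pessimistic_action(counts, reward_sums, c=1.0):
    """pi_E: pessimism -- recommend the action with the best lower
    confidence bound, so risk-averse users never pay for exploration."""
    mean = reward_sums / np.maximum(counts, 1)
    penalty = c / np.sqrt(np.maximum(counts, 1))
    scores = mean - penalty
    scores = np.where(counts == 0, -np.inf, scores)  # never recommend untried
    return int(np.argmax(scores))

# pi_O is drawn to the untried action; pi_E sticks with the action whose
# value is best certified by the data pi_O has already collected.
counts = np.array([50, 3, 0])
sums = np.array([40.0, 2.9, 0.0])   # empirical means: 0.8, ~0.97, untried
print(pessimistic_action(counts, sums))  # 0: the well-certified action
print(ucb_action(counts, sums, t=53))    # 2: optimism tries the unknown
```

The division of labor is the point: $\pi^{\text{O}}$ keeps exploring (and absorbing regret) on the first tier, and $\pi^{\text{E}}$ free-rides on that data with a pessimistic decision rule.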


OpenAI's Child Exploitation Reports Increased Sharply This Year

WIRED

The company made 80 times as many reports to the National Center for Missing & Exploited Children during the first six months of 2025 as it did in the same period a year prior. OpenAI sent 80 times as many child exploitation incident reports to the National Center for Missing & Exploited Children (NCMEC) during the first half of 2025 as it did during the same period in 2024, according to a recent update from the company. The NCMEC's CyberTipline is a Congressionally authorized clearinghouse for reporting child sexual abuse material (CSAM) and other forms of child exploitation. Companies are required by law to report apparent child exploitation to the CyberTipline. When a company sends a report, NCMEC reviews it and then forwards it to the appropriate law enforcement agency for investigation.


BEACON: A Unified Behavioral-Tactical Framework for Explainable Cybercrime Analysis with Large Language Models

Sachdeva, Arush, Saravanan, Rajendraprasad, Sarkar, Gargi, Vemuri, Kavita, Shukla, Sandeep Kumar

arXiv.org Artificial Intelligence

Cybercrime has emerged as one of the most pervasive and economically destructive consequences of global digitalization. Contemporary online fraud and deception-based crimes now account for unprecedented financial losses worldwide, exceeding trillions of United States dollars (USD) annually (Morgan, 2016), while also inflicting severe psychological, social, and reputational harm on victims. Unlike classical cyberattacks targeting systems and networks, modern cybercrime increasingly exploits human vulnerabilities rather than purely technical weaknesses, relying on deception, persuasion, impersonation, emotional coercion, and trust manipulation as primary attack vectors (Holt, 2019; Yao, Zheng, Wu, Wu, Gao, Wang and Yang, 2025; Sarkar and Shukla, 2023; Sarkar, Singh, Kumar and Shukla, 2023). Existing cybersecurity frameworks, such as the Cyber Kill Chain and the MITRE ATT&CK framework, provide powerful abstractions for understanding technically sophisticated cyberattacks targeting enterprise systems and critical infrastructure (MITRE Corporation, 2025b,a). However, these models are fundamentally system-centric: they describe how adversaries compromise digital infrastructure, escalate privileges, and exfiltrate data. In contrast, cybercrime, particularly scams, fraud, impersonation, and extortion, primarily targets individual decision-making processes (Louderback and Antonaccio, 2017), often without exploiting any software vulnerability at all. Consequently, the investigative needs of cybercrime differ substantially from those of traditional cyberattacks.


A Taxonomy of Pix Fraud in Brazil: Attack Methodologies, AI-Driven Amplification, and Defensive Strategies

Pizzolato, Glener Lanes, Lopes, Brenda Medeiros, Schepke, Claudio, Kreutz, Diego

arXiv.org Artificial Intelligence

This work presents a review of attack methodologies targeting Pix, the instant payment system launched by the Central Bank of Brazil in 2020. The study aims to identify and classify the main types of fraud affecting users and financial institutions, highlighting the evolution and increasing sophistication of these techniques. The methodology combines a structured literature review with exploratory interviews conducted with professionals from the banking sector. The results show that fraud schemes have evolved from purely social engineering approaches to hybrid strategies that integrate human manipulation with technical exploitation. The study concludes that security measures must advance at the same pace as the growing complexity of attack methodologies, with particular emphasis on adaptive defenses and continuous user awareness.


Data Flows and Colonial Regimes in Africa: A Critical Analysis of the Colonial Futurities Embedded in AI Ecosystems

Ndaka, A., Avila-Acosta, F., Mbula-Ndaka, H., Amera, C., Chauke, S., Majiwa, E.

arXiv.org Artificial Intelligence

In the last few years, Africa has experienced growth in a thriving ecosystem of Artificial Intelligence (AI) technologies and systems, developed and promoted by both local and global technology players. While the sociotechnical imaginaries about these systems promote AI as critical to achieving Africa's sustainable development agenda, some of them have subtly permeated society, recreating new values, cultures, practices, and histories that threaten to marginalize minority groups in the region. Africa predominantly frames AI as an imaginary solution to complex social challenges; this narrative, however, subtly ignores deeper power-related concerns, including data governance, embedded algorithmic colonialism, and the exploitation that propagates new digital colonial sites. Meanwhile, the development of AI ethics in Africa is in its infancy and is predominantly framed through Western lenses, with the social and ethical impacts of AI innovations and applications on African epistemologies and worldviews not prioritized. To ensure that people on the African continent leverage the benefits of AI, these social and ethical impacts need to be critically and explicitly considered and addressed.
This chapter therefore seeks to frame the elemental and invisible problems of AI and big data in the African context by examining digital sites and infrastructure through the lens of power and interests. It presents reflections on how these sites use AI recommendation algorithms to recreate new digital societies in the region, how they can propagate algorithmic colonialism and negative gender norms, and what this means for the regional sustainable development agenda. The chapter proposes adopting business models that embrace response-ability and consider the existence of alternative socio-material worlds of AI. These reflections come mainly from ongoing discussions with Kenyan social media users in the author's monthly user-space talks. Keywords: Artificial Intelligence; algorithmic colonialism; data; response-ability; digital sites. Section 1: Introduction. The growing global interest, combined with rising investments in AI skilling and infrastructure development, is a key driver of the expanding landscape of AI technologies and systems across Africa.


HALO: High-Altitude Language-Conditioned Monocular Aerial Exploration and Navigation

Tao, Yuezhan, Ong, Dexter, Cladera, Fernando, Hughes, Jason, Taylor, Camillo J., Chaudhari, Pratik, Kumar, Vijay

arXiv.org Artificial Intelligence

Abstract-- We demonstrate real-time high-altitude aerial metric-semantic mapping and exploration using a monocular camera paired with a global positioning system (GPS) and an inertial measurement unit (IMU). Our system, named HALO, addresses two key challenges: (i) real-time dense 3D reconstruction using vision at large distances, and (ii) mapping and exploration of large-scale outdoor environments with accurate scene geometry and semantics. We demonstrate that HALO can plan informative paths that exploit this information to complete missions with multiple tasks specified in natural language. Real-world experiments on a custom quadrotor platform show that (i) all modules can run onboard the robot, and (ii) in diverse environments HALO supports effective autonomous execution of missions covering up to 24,600 sq. Experiment videos and more details can be found on our project page: https://tyuezhan.github. Aerial robots operating at high altitudes have a large effective field of view, which can be used very effectively for mapping and exploration. However, high-altitude aerial operation presents some unusual perception challenges. For example, consumer-grade LiDARs provide accurate depth, but their point density at large distances is low. LiDARs are also expensive and heavy, and do not provide the same richness of information as cameras. Vision-based systems are more attractive because they are inexpensive and lightweight.


On Explore-Then-Commit Strategies

Neural Information Processing Systems

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by exploitation are necessarily suboptimal. The results hold regardless of whether or not the difference in means between the two arms is known.
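The strategy class under study can be sketched as follows. A fixed exploration budget `m` stands in here for the general stopping time the paper allows, and all names are illustrative:

```python
import numpy as np

def explore_then_commit(means, horizon, m, seed=0):
    """ETC on a two-armed Gaussian bandit (unit variance): pull each arm
    m times, then commit to the empirically better arm for all remaining
    rounds. Returns the sequence of arms pulled."""
    rng = np.random.default_rng(seed)
    pulls, rewards = [], [[], []]
    # Exploration phase: alternate arms for 2m rounds.
    for t in range(2 * m):
        k = t % 2
        rewards[k].append(rng.normal(means[k], 1.0))
        pulls.append(k)
    # Commit: one irrevocable decision based on the empirical means.
    best = int(np.mean(rewards[1]) > np.mean(rewards[0]))
    pulls.extend([best] * (horizon - 2 * m))
    return pulls

pulls = explore_then_commit(means=[0.0, 5.0], horizon=200, m=30)
```

The paper's result is that any strategy of this shape, however the stopping time is chosen, is necessarily suboptimal compared to fully sequential strategies that keep interleaving exploration with exploitation.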