Meta in row after sacking workers who say they saw smart glasses users having sex

BBC News

Meta is under pressure to explain why it cancelled a major contract with a company it was using to train AI, shortly after some of its Kenya-based workers alleged they had to view graphic content captured by Meta smart glasses. In February, workers at the company, Sama, told two Swedish newspapers they had witnessed glasses users going to the toilet and having sex. Less than two months later, Meta ended its contract with Sama, which Sama said would result in 1,108 workers being made redundant. Meta says it ended the contract because Sama did not meet its standards, a criticism Sama rejects. A Kenyan workers' organisation alleges Meta's decision was caused by the staff speaking out.


Debiasing Conditional Stochastic Optimization

Neural Information Processing Systems

In this paper, we study the conditional stochastic optimization (CSO) problem, which covers a variety of applications including portfolio selection, reinforcement learning, robust learning, and causal inference. The sample-averaged gradient of the CSO objective is biased due to its nested structure, and therefore requires a high sample complexity for convergence. We introduce a general stochastic extrapolation technique that effectively reduces the bias. We show that for nonconvex smooth objectives, combining this extrapolation with variance reduction techniques can achieve a significantly better sample complexity than the existing bounds. Additionally, we develop new algorithms for the finite-sum variant of the CSO problem that also significantly improve upon existing results. Finally, we believe that our debiasing technique has the potential to be a useful tool for addressing similar challenges in other stochastic optimization problems.
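
The bias that comes from the nested structure can be seen in a toy case. The sketch below is not the paper's algorithm: it assumes a hypothetical outer function f(u) = u**2 and inner samples drawn from a Gaussian with mean mu and standard deviation sigma. Under those assumptions the plug-in estimate built from m inner samples has expectation mu**2 + sigma**2 / m, so its bias is sigma**2 / m. A Richardson-style combination of two plug-in estimates, given here only as an illustration of the extrapolation idea, cancels that leading term. All function names are invented for this example.

```python
import random
import statistics


def plug_in_estimate(rng, m, mu=1.0, sigma=2.0):
    """Plug-in estimate of f(E[eta]) for the assumed f(u) = u**2.

    The inner expectation is replaced by the mean of m Gaussian samples,
    which makes the estimate biased because f is nonlinear.
    """
    inner_mean = statistics.fmean(rng.gauss(mu, sigma) for _ in range(m))
    return inner_mean ** 2


def extrapolated_estimate(rng, m, mu=1.0, sigma=2.0):
    """Richardson-style combination 2 * f(mean of 2m) - f(mean of m).

    The O(1/m) bias terms of the two plug-in estimates cancel; for the
    quadratic f assumed here the result is unbiased.
    """
    fine = plug_in_estimate(rng, 2 * m, mu, sigma)
    coarse = plug_in_estimate(rng, m, mu, sigma)
    return 2.0 * fine - coarse


def average(estimator, m, trials=20000, seed=0):
    """Monte Carlo average of an estimator over independent trials."""
    rng = random.Random(seed)
    return statistics.fmean(estimator(rng, m) for _ in range(trials))


if __name__ == "__main__":
    m = 4
    print("target value        :", 1.0 ** 2)
    print("plug-in average     :", average(plug_in_estimate, m))
    print("extrapolated average:", average(extrapolated_estimate, m))
```

With mu = 1, sigma = 2 and m = 4, the plug-in average should sit near 2 while the extrapolated average should sit near the target value 1, at the cost of a higher variance per estimate.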


Multi-Agent Learning with Heterogeneous Linear Contextual Bandits

Neural Information Processing Systems

As trained intelligent systems become increasingly pervasive, multi-agent learning has emerged as a popular framework for studying complex interactions between autonomous agents. Yet, a formal understanding of how and when learners in heterogeneous environments benefit from sharing their respective experiences is still in its infancy. In this paper, we seek answers to these questions in the context of linear contextual bandits. We present a novel distributed learning algorithm based on the upper confidence bound (UCB) algorithm, which we refer to as H-LINUCB, wherein agents cooperatively minimize the group regret under the coordination of a central server. In the setting where the level of heterogeneity or dissimilarity across the environments is known to the agents, we show that H-LINUCB is provably optimal in regimes where the tasks are highly similar or highly dissimilar.
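
For readers unfamiliar with the upper confidence bound idea in linear contextual bandits, the following is a minimal single-agent LinUCB sketch. It is not H-LINUCB: it has no central server, no cooperation between agents and no handling of heterogeneity. The class name, the exploration weight alpha and the ridge parameter of 1 are assumptions made for illustration only.

```python
import math


class LinUCBAgent:
    """Single-agent LinUCB with a ridge-regularised least-squares estimate."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # exploration weight (assumed, not from the paper)
        # Inverse of the design matrix A = I + sum of x x^T; starts at identity.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * context

    def _matvec(self, x):
        """Return A_inv @ x."""
        return [sum(a * xj for a, xj in zip(row, x)) for row in self.A_inv]

    def score(self, x):
        """Estimated reward plus confidence width for context x."""
        theta = self._matvec(self.b)  # ridge estimate A_inv @ b
        a_inv_x = self._matvec(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + self.alpha * width

    def select(self, contexts):
        """Index of the arm with the largest upper confidence bound."""
        return max(range(len(contexts)), key=lambda k: self.score(contexts[k]))

    def update(self, x, reward):
        """Rank-one Sherman-Morrison update of A_inv, then update b."""
        a_inv_x = self._matvec(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        dim = len(x)
        for i in range(dim):
            for j in range(dim):
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


if __name__ == "__main__":
    agent = LinUCBAgent(dim=2)
    arms = [[1.0, 0.0], [0.0, 1.0]]
    agent.update(arms[0], reward=1.0)
    print("chosen arm:", agent.select(arms))
```

Maintaining the inverse directly with a rank-one update avoids a matrix inversion on every round, which keeps the sketch dependency-free.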