
Collaborating Authors

Mansour




Reviews: Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives

Neural Information Processing Systems

Two out of three reviewers appreciated the contributions of this paper, with one expert reviewer praising almost every aspect of the paper. On the negative side, one reviewer took issue with the proposed setting, highlighting that the utility of the proposed objective function is somewhat dubious in the general context of multi-objective decision making. I agree with this reviewer in that having "multi-objective" in the title of the paper may set the wrong expectations for some readers, and I suggest that the authors consider changing the title of the paper for its final version to avoid such misunderstandings. Furthermore, the final version should discuss the relationship between this paper and the very recent work of Rosenberg and Mansour (2019) that studies essentially the same problem in episodic MDPs. Other than these concerns, the paper is worthy of being published without major changes.


Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization

Tiapkin, Daniil, Chzhen, Evgenii, Stoltz, Gilles

arXiv.org Artificial Intelligence

In this paper, we consider the problem of learning in adversarial Markov decision processes (MDPs) with an oblivious adversary in a full-information setting. The agent interacts with an environment during $T$ episodes, each of which consists of $H$ stages, and each episode is evaluated with respect to a reward function that is revealed only at the end of the episode. We propose an algorithm, called APO-MVP, that achieves a regret bound of order $\tilde{\mathcal{O}}(\mathrm{poly}(H)\sqrt{SAT})$, where $S$ and $A$ are the sizes of the state and action spaces, respectively. This result improves upon the best-known regret bound by a factor of $\sqrt{S}$, bridging the gap between adversarial and stochastic MDPs and matching the minimax lower bound $\Omega(\sqrt{H^3SAT})$ as far as the dependencies on $S$, $A$, and $T$ are concerned. The proposed algorithm and analysis completely avoid the typical tool of occupancy measures; instead, policy optimization is performed using only dynamic programming and a black-box online linear optimization strategy run over estimated advantage functions, making the algorithm easy to implement. The analysis leverages two recent techniques: policy optimization based on online linear optimization strategies (Jonckheere et al., 2023) and a refined martingale analysis of the impact of estimating transition kernels on value functions (Zhang et al., 2023).
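The core step behind occupancy-measure-free policy optimization of this kind can be illustrated with a simple exponential-weights update on estimated advantages. This is only a sketch of the general scheme, not the paper's exact APO-MVP procedure; all names and shapes below are assumptions.

```python
import numpy as np

def exp_weights_policy_update(policy, advantages, lr):
    """One exponential-weights (mirror-descent-style) policy update.

    policy:     array (H, S, A) of action probabilities per stage/state
    advantages: array (H, S, A) of estimated advantage values
    lr:         learning rate of the online linear optimization strategy
    """
    logits = np.log(policy) + lr * advantages
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    new_policy = np.exp(logits)
    new_policy /= new_policy.sum(axis=-1, keepdims=True)
    return new_policy
```

Running this update per episode shifts probability mass toward actions with higher estimated advantage while keeping each state's distribution normalized.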


Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation

Levy, Orin, Cohen, Alon, Cassel, Asaf, Mansour, Yishay

arXiv.org Artificial Intelligence

We present the OMG-CMDP! algorithm for regret minimization in adversarial Contextual MDPs. The algorithm operates under the minimal assumptions of realizable function class and access to online least squares and log loss regression oracles. Our algorithm is efficient (assuming efficient online regression oracles), simple and robust to approximation errors. It enjoys an $\widetilde{O}(H^{2.5} \sqrt{ T|S||A| ( \mathcal{R}(\mathcal{O}) + H \log(\delta^{-1}) )})$ regret guarantee, with $T$ being the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon and $\mathcal{R}(\mathcal{O}) = \mathcal{R}(\mathcal{O}_{\mathrm{sq}}^\mathcal{F}) + \mathcal{R}(\mathcal{O}_{\mathrm{log}}^\mathcal{P})$ is the sum of the regression oracles' regret, used to approximate the context-dependent rewards and dynamics, respectively. To the best of our knowledge, our algorithm is the first efficient rate optimal regret minimization algorithm for adversarial CMDPs that operates under the minimal standard assumption of online function approximation.
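An online least-squares oracle of the kind assumed above can be instantiated, for illustration, by online ridge regression; this is just one possible oracle satisfying the predict/update interface, and the class and parameter names below are not from the paper.

```python
import numpy as np

class OnlineRidge:
    """Illustrative online least-squares regression oracle (ridge)."""

    def __init__(self, dim, reg=1.0):
        self.A = reg * np.eye(dim)   # regularized Gram matrix
        self.b = np.zeros(dim)       # accumulated feature-target products

    def predict(self, x):
        # Ridge solution w = A^{-1} b evaluated at feature vector x
        return x @ np.linalg.solve(self.A, self.b)

    def update(self, x, y):
        # Incorporate one observed (feature, target) pair
        self.A += np.outer(x, x)
        self.b += y * x
```

The algorithm itself only needs black-box access to such predict/update calls, which is what makes the "efficient regression oracles imply efficient algorithm" claim meaningful.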


TomoSAM: a 3D Slicer extension using SAM for tomography segmentation

Semeraro, Federico, Quintart, Alexandre, Izquierdo, Sergio Fraile, Ferguson, Joseph C.

arXiv.org Artificial Intelligence

Many deep learning techniques have been proposed to perform the segmentation task, but they usually require a large amount of labeled data to train, as well as considerable computational resources. Our work is motivated by the objective of modeling Thermal Protection Systems (TPS), in order to estimate their material properties and predict their response to the harsh environmental conditions experienced during atmospheric entry. The Porous Microstructure Analysis (PuMA) software [1] was developed to provide a robust and efficient framework for the digital characterization of 3D microstructures.


Refined Analysis of FPL for Adversarial Markov Decision Processes

Wang, Yuanhao, Dong, Kefan

arXiv.org Machine Learning

We consider the adversarial Markov Decision Process (MDP) problem, where the rewards for the MDP can be adversarially chosen, and the transition function can be either known or unknown. In both settings, Follow-the-Perturbed-Leader (FPL) based algorithms have been proposed in previous literature. However, the established regret bounds for FPL-based algorithms are worse than those of algorithms based on mirror descent. We improve the analysis of FPL-based algorithms in both settings, matching the current best regret bounds using faster and simpler algorithms.
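For reference, the basic FPL rule in the experts setting looks like the following; this is a generic textbook sketch, not the refined variant analyzed in the paper.

```python
import numpy as np

def fpl_choose(cum_rewards, eta, rng):
    """Follow-the-Perturbed-Leader: perturb cumulative rewards with
    i.i.d. exponential noise of scale 1/eta, then follow the expert
    that is best under the perturbed scores."""
    noise = rng.exponential(scale=1.0 / eta, size=len(cum_rewards))
    return int(np.argmax(cum_rewards + noise))
```

The perturbation makes the choice stable from round to round, which is what drives FPL's regret guarantees; larger `eta` means less noise and a closer approximation of plain follow-the-leader.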


Whistleblower: Google Partners with China on 'AI Manhattan Project' (Breitbart)

#artificialintelligence

Mansour asked Vorhies about Google's business engagements in China. "Google has gotten in trouble in the past for doing business with the Communist government of China," Mansour said. "And I wanted to ask you were those efforts ongoing when you were with the company? Can you give us any insight into that? Because it was quite troubling."


Unsupervised Domain Adaptation Based on Source-guided Discrepancy

Kuroki, Seiichi, Charoenphakdee, Nontawat, Bao, Han, Honda, Junya, Sato, Issei, Sugiyama, Masashi

arXiv.org Machine Learning

Unsupervised domain adaptation is the problem setting where data generating distributions in the source and target domains are different, and labels in the target domain are unavailable. One important question in unsupervised domain adaptation is how to measure the difference between the source and target domains. A previously proposed discrepancy that does not use the source domain labels requires high computational cost to estimate and may lead to a loose generalization error bound in the target domain. To mitigate these problems, we propose a novel discrepancy called source-guided discrepancy ($S$-disc), which exploits labels in the source domain. As a consequence, $S$-disc can be computed efficiently with a finite-sample convergence guarantee. In addition, we show that $S$-disc can provide a tighter generalization error bound than the one based on an existing discrepancy. Finally, we report experimental results that demonstrate the advantages of $S$-disc over the existing discrepancies.
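A plug-in estimate of a source-guided gap can be sketched as follows. This is a generic illustration of measuring domain loss gaps against a source-trained reference hypothesis, not the paper's exact definition of $S$-disc; all names here are assumptions.

```python
import numpy as np

def empirical_gap(hypotheses, h_source, X_src, X_tgt, loss):
    """Largest gap, over candidate hypotheses h, between the target-
    and source-domain losses of h measured against a source-trained
    reference hypothesis h_source (illustrative estimator only)."""
    gaps = []
    for h in hypotheses:
        src = loss(h(X_src), h_source(X_src)).mean()
        tgt = loss(h(X_tgt), h_source(X_tgt)).mean()
        gaps.append(tgt - src)
    return max(gaps)
```

When the two domains coincide, every gap vanishes, so such an estimate is zero; it grows as the domains disagree on how the candidate hypotheses deviate from the source reference.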


India is using facial-recognition to reunite missing children with their families

#artificialintelligence

In 2005, Danish brothers David and Christopher Mikkelsen met a young Afghan refugee called Mansour. Four months after fleeing Kabul and the Taliban with his parents and five siblings, Mansour became separated from them, ending up in Denmark with no idea what had happened to his family or where they were.