Cohen, Alon
Average reward reinforcement learning with unknown mixing times
Zahavy, Tom, Cohen, Alon, Kaplan, Haim, Mansour, Yishay
We derive and analyze learning algorithms for policy evaluation, apprenticeship learning, and policy gradient for average reward criteria. Existing algorithms explicitly require an upper bound on the mixing time. In contrast, we build on ideas from Markov chain theory and derive sampling algorithms that do not require such an upper bound. For these algorithms, we provide theoretical bounds on their sample-complexity and running time.
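As context for the average-reward criterion above, here is a minimal Python sketch of the most basic baseline: estimating the average reward of a fixed policy by running one long trajectory and averaging rewards. This is not the paper's algorithm (whose point is to avoid needing a known mixing-time bound); the transition matrix P and reward vector r are hypothetical illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state Markov reward process induced by a fixed policy.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])   # row-stochastic transition matrix
r = np.array([1.0, 0.0, 2.0])     # reward in each state

def average_reward_estimate(P, r, T=100_000, s0=0):
    """Estimate the long-run average reward by averaging rewards
    along a single long trajectory started at state s0."""
    s, total = s0, 0.0
    for _ in range(T):
        total += r[s]
        s = rng.choice(len(r), p=P[s])
    return total / T

print(average_reward_estimate(P, r))
```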
Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret
Cohen, Alon, Koren, Tomer, Mansour, Yishay
Optimal control theory dates back to the 1950s, and has been applied successfully to numerous real-world engineering problems (e.g., Bermúdez and Martinez, 1994; Chen and Islam, 2005; Lenhart and Workman, 2007; Geering, 2007). Classical results in control theory pertain to asymptotic convergence and stability of dynamical systems, and recently, there has been a renewed interest in such problems from a learning-theoretic perspective with a focus on finite-time convergence guarantees and computational tractability. Perhaps the most well-studied model in optimal control is Linear-Quadratic (LQ) control. In this model, both the state and the action are real-valued vectors. The dynamics of the environment are linear in both the state and action, and are perturbed by Gaussian noise; the cost is quadratic in the state and action vectors.
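A minimal Python sketch of the LQ model described above: linear dynamics perturbed by Gaussian noise, a quadratic cost, and a fixed linear policy u_t = K x_t. The matrices A, B, Q, R and the controller K are hypothetical illustrative values, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-dimensional LQ instance (illustrative values only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # state dynamics
B = np.array([[0.0], [0.1]])             # action dynamics
Q = np.eye(2)                            # state cost matrix
R = np.eye(1)                            # action cost matrix

def rollout(K, T=1000):
    """Simulate x_{t+1} = A x_t + B u_t + w_t with u_t = K x_t and
    w_t ~ N(0, I), and return the average quadratic cost."""
    x, cost = np.zeros(2), 0.0
    for _ in range(T):
        u = K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + rng.standard_normal(2)
    return cost / T

K = np.array([[-0.5, -1.0]])   # an arbitrary stabilizing linear controller
print(rollout(K))
```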
Learning and Generalization for Matching Problems
Cohen, Alon, Hassidim, Avinatan, Kaplan, Haim, Mansour, Yishay, Moran, Shay
We study a classic algorithmic problem through the lens of statistical learning. That is, we consider a matching problem where the input graph is sampled from some distribution. This distribution is unknown to the algorithm; however, an additional graph which is sampled from the same distribution is given during a training phase (preprocessing). More specifically, the algorithmic problem is to match $k$ out of $n$ items that arrive online to $d$ categories ($d\ll k \ll n$). Our goal is to design a two-stage online algorithm that, in the first stage, retains a small subset of items that contains an offline matching of maximum weight. We then compute this optimal matching in a second stage. The added statistical component is that before the online matching process begins, our algorithms learn from a training set consisting of another matching instance drawn from the same unknown distribution. Using this training set, we learn a policy that we apply during the online matching process. We consider a class of online policies that we term \emph{thresholds policies}. For this class, we derive uniform convergence results both for the number of retained items and the value of the optimal matching. We show that the number of retained items and the value of the offline optimal matching deviate from their expectation by $O(\sqrt{k})$. This requires the use of less-standard concentration inequalities (standard ones give deviations of $O(\sqrt{n})$). Furthermore, we design an algorithm that outputs the optimal offline solution with high probability while retaining only $O(k\log \log n)$ items in expectation.
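A minimal sketch of a thresholds policy in the sense described above: an arriving item is retained whenever its weight for some category clears that category's threshold. In the paper the thresholds would be learned from the training instance; the threshold values and the small item stream below are hypothetical.

```python
# Minimal thresholds-policy sketch: keep an item if its weight for some
# category exceeds that category's (learned) threshold.

def retain(item_weights, thresholds):
    """item_weights[c]: weight of matching this item to category c.
    thresholds[c]: retention threshold for category c."""
    return any(w >= thresholds[c] for c, w in item_weights.items())

thresholds = {"cat_a": 0.7, "cat_b": 0.5}          # hypothetical learned values
stream = [{"cat_a": 0.9, "cat_b": 0.1},
          {"cat_a": 0.2, "cat_b": 0.3},
          {"cat_a": 0.4, "cat_b": 0.8}]

retained = [item for item in stream if retain(item, thresholds)]
print(len(retained))   # items kept for the second-stage offline matching
```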
Incentivizing the Dynamic Workforce: Learning Contracts in the Gig-Economy
Cohen, Alon, Koren, Moran, Deligkas, Argyrios
In principal-agent models, a principal offers a contract to an agent to perform a certain task. The agent exerts a level of effort that maximizes her utility. The principal is oblivious to the agent's chosen level of effort, and conditions her wage only on possible outcomes. In this work, we consider a model in which the principal is unaware of the agent's utility and action space. She sequentially offers contracts to identical agents, and observes the resulting outcomes. We present an algorithm for learning the optimal contract under mild assumptions. We bound the number of samples needed for the principal to obtain a contract that is within $\epsilon$ of her optimal net profit for every $\epsilon>0$.
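For illustration only, a toy Python sketch of the sequential setting above: the principal tries contracts from a grid, observes outcomes from agents who best-respond privately, and keeps the contract with the highest empirical net profit. This naive grid search is a stand-in, not the paper's algorithm; the linear contract form and the `agent_outcome` black box are hypothetical.

```python
import random

random.seed(0)

def agent_outcome(alpha):
    """Hypothetical black box: given a contract paying the agent a share
    `alpha` of the outcome, the agent privately chooses an effort level
    and a random outcome is realized."""
    effort = min(1.0, 2 * alpha)               # hidden best response (toy)
    return 10.0 if random.random() < effort else 0.0

def estimate_net_profit(alpha, n_samples=2000):
    """Empirical estimate of the principal's net profit under contract alpha."""
    total = 0.0
    for _ in range(n_samples):
        total += (1 - alpha) * agent_outcome(alpha)   # principal keeps the rest
    return total / n_samples

grid = [i / 20 for i in range(21)]
best = max(grid, key=estimate_net_profit)
print("empirically best contract share:", best)
```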
Online Linear Quadratic Control
Cohen, Alon, Hassidim, Avinatan, Koren, Tomer, Lazic, Nevena, Mansour, Yishay, Talwar, Kunal
We study the problem of controlling linear time-invariant systems with known noisy dynamics and adversarially chosen quadratic losses. We present the first efficient online learning algorithms in this setting that guarantee $O(\sqrt{T})$ regret under mild assumptions, where $T$ is the time horizon. Our algorithms rely on a novel SDP relaxation for the steady-state distribution of the system. Crucially, and in contrast to previously proposed relaxations, the feasible solutions of our SDP all correspond to "strongly stable" policies that mix exponentially fast to a steady state.
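A minimal sketch of the steady-state quantity that the relaxation above reasons about (not the SDP itself): for a fixed strongly stable linear policy u = Kx, the steady-state state covariance solves a discrete Lyapunov equation, from which the steady-state expected quadratic cost follows. All matrices below are hypothetical illustrative values.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical LQ instance and a fixed strongly stable policy u = Kx.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
W = np.eye(2)                              # process-noise covariance
K = np.array([[-0.5, -1.0]])               # spectral radius of A + BK is < 1

A_cl = A + B @ K                           # closed-loop dynamics
# Steady-state covariance X solves X = A_cl X A_cl^T + W.
X = solve_discrete_lyapunov(A_cl, W)
steady_cost = np.trace(Q @ X) + np.trace(R @ K @ X @ K.T)
print(steady_cost)
```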
Planning and Learning with Stochastic Action Sets
Boutilier, Craig, Cohen, Alon, Daniely, Amit, Hassidim, Avinatan, Mansour, Yishay, Meshi, Ofer, Mladenov, Martin, Schuurmans, Dale
In many practical uses of reinforcement learning (RL) the set of actions available at a given state is a random variable, with realizations governed by an exogenous stochastic process. Somewhat surprisingly, the foundations for such sequential decision processes have been unaddressed. In this work, we formalize and investigate MDPs with stochastic action sets (SAS-MDPs) to provide these foundations. We show that optimal policies and value functions in this model have a structure that admits a compact representation. From an RL perspective, we show that Q-learning with sampled action sets is sound. In model-based settings, we consider two important special cases: when individual actions are available with independent probabilities; and a sampling-based model for unknown distributions. We develop poly-time value and policy iteration methods for both cases; and in the first, we offer a poly-time linear programming solution.
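A schematic Python sketch of Q-learning with sampled action sets as described above: the bootstrap maximum ranges only over the actions actually available at the next state. The environment interaction is omitted, and the toy transition at the end is hypothetical.

```python
import random
from collections import defaultdict

random.seed(0)

# Q-values default to 0 for unseen (state, action) pairs.
Q = defaultdict(float)
alpha, gamma = 0.1, 0.95

def q_update(s, a, r, s_next, avail_next):
    """One Q-learning step; avail_next is the sampled action set at the
    next state, so the max is taken only over available actions."""
    target = r + gamma * max(Q[(s_next, b)] for b in avail_next)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def epsilon_greedy(s, avail, eps=0.1):
    """Select an action from the currently available (sampled) set."""
    if random.random() < eps:
        return random.choice(sorted(avail))
    return max(avail, key=lambda b: Q[(s, b)])

# Toy usage with a single hypothetical transition.
q_update(s=0, a="left", r=1.0, s_next=1, avail_next={"up", "down"})
print(Q[(0, "left")])
```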
Online Learning with Feedback Graphs Without the Graphs
Cohen, Alon, Hazan, Tamir, Koren, Tomer
We study an online learning framework introduced by Mannor and Shamir (2011) in which the feedback is specified by a graph, in a setting where the graph may vary from round to round and is \emph{never fully revealed} to the learner. We show a large gap between the adversarial and the stochastic cases. In the adversarial case, we prove that even for dense feedback graphs, the learner cannot improve upon a trivial regret bound obtained by ignoring any additional feedback besides her own loss. In contrast, in the stochastic case we give an algorithm that achieves $\widetilde \Theta(\sqrt{\alpha T})$ regret over $T$ rounds, provided that the independence numbers of the hidden feedback graphs are at most $\alpha$. We also extend our results to a more general feedback model, in which the learner does not necessarily observe her own loss, and show that, even in simple cases, concealing the feedback graphs might render a learnable problem unlearnable.