AITopics | Learning Management

Collaborating Authors

Learning Management

News Overviews Instructional Materials AI-Alerts Classics

Optimization, Learning, and Games with Predictable Sequences

Neural Information Processing SystemsMar-13-2024, 23:18:25 GMT

We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror Prox algorithm for offline optimization, prove an extension to Hölder-smooth functions, and apply the results to saddle-point type problems. Next, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O((log T) T).

algorithm, optimization, sequence, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.34)

Add feedback

Nearly Optimal Algorithms for Private Online Learning in Full information and Bandit Settings

Neural Information Processing SystemsMar-13-2024, 20:16:45 GMT

We give differentially private algorithms for a large class of online learning algorithms, in both the full information and bandit settings. Our algorithms aim to minimize a convex loss function which is a sum of smaller convex loss terms, one for each data point. To design our algorithms, we modify the popular mirror descent approach, or rather a variant called follow the approximate leader. The technique leads to the first nonprivate algorithms for private online learning in the bandit setting. In the full information setting, our algorithms improve over the regret bounds of previous work (due to Dwork, Naor, Pitassi and Rothblum (2010) and Jain, Kothari and Thakurta (2012)). In many cases, our algorithms (in both settings) match the dependence on the input length, T, of the optimal nonprivate regret bounds up to logarithmic factors in T. Our algorithms require logarithmic space and update time.

algorithm, approximate leader, privacy, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > California (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.47)

Industry:

Education > Educational Setting > Online (0.83)
Information Technology (0.68)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Online Learning with Switching Costs and Other Adaptive Adversaries

Neural Information Processing SystemsMar-13-2024, 19:07:38 GMT

We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player's performance using a new notion of regret, also known as policy regret, which better captures the adversary's adaptiveness to the player's behavior. In a setting where losses are allowed to drift, we characterize --in a nearly complete manner-- the power of adaptive adversaries with bounded memories and switching costs.

adversary, bandit feedback, oblivious adversary, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.47)

Industry: Education > Educational Setting > Online (0.50)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.50)
Information Technology > Artificial Intelligence > Machine Learning (0.47)
Information Technology > Data Science > Data Mining > Big Data (0.47)

Add feedback

Adaptive Market Making via Online Learning

Neural Information Processing SystemsMar-13-2024, 18:52:43 GMT

We consider the design of strategies for market making in an exchange. A market maker generally seeks to profit from the difference between the buy and sell price of an asset, yet the market maker also takes exposure risk in the event of large price movements. Profit guarantees for market making strategies have typically required certain stochastic assumptions on the price fluctuations of the asset in question; for example, assuming a model in which the price process is mean reverting. We propose a class of "spread-based" market making strategies whose performance can be controlled even under worst-case (adversarial) settings. We prove structural properties of these strategies which allows us to design a master algorithm which obtains low regret relative to the best such strategy in hindsight. We run a set of experiments showing favorable performance on recent real-world stock price data.

algorithm, limit order, trader, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Michigan (0.04)

Industry:

Banking & Finance > Trading (1.00)
Education > Educational Setting > Online (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.41)

Add feedback

Dimension-Free Exponentiated Gradient

Neural Information Processing SystemsMar-13-2024, 17:23:58 GMT

I present a new online learning algorithm that extends the exponentiated gradient framework to infinite dimensional spaces.

algorithm, exp, regularizer, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Education > Educational Setting > Online (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.37)

Add feedback

The Pareto Regret Frontier

Neural Information Processing SystemsMar-13-2024, 17:23:07 GMT

Performance guarantees for online learning algorithms typically take the form of regret bounds, which express that the cumulative loss overhead compared to the best expert in hindsight is small. In the common case of large but structured expert sets we typically wish to keep the regret especially small compared to simple experts, at the cost of modest additional overhead compared to more complex others. We study which such regret trade-offs can be achieved, and how.

absolute loss, algorithm, trade-off, (15 more...)

Neural Information Processing Systems

Country:

Europe > Poland (0.04)
Oceania > Australia > Queensland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Industry: Education > Educational Setting > Online (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.35)

Add feedback

Online Learning in Episodic Markovian Decision Processes by Relative Entropy Policy Search

Neural Information Processing SystemsMar-13-2024, 17:03:16 GMT

We study the problem of online learning in finite episodic Markov decision processes (MDPs) where the loss function is allowed to change between episodes. The natural performance measure in this learning problem is the regret defined as the difference between the total loss of the best stationary policy and the total loss suffered by the learner. We assume that the learner is given access to a finite action space A and the state space X has a layered structure with L layers, so that state transitions are only possible between consecutive layers. We describe a variant of the recently proposed Relative Entropy Policy Search algorithm and show that its regret after T episodes is 2 L|X ||A|T log(|X ||A|/L) in the bandit setting and 2L T log(|X ||A|/L) in the full information setting, given that the learner has perfect knowledge of the transition probabilities of the underlying MDP. These guarantees largely improve previously known results under much milder assumptions and cannot be significantly improved under general assumptions.

algorithm, decision process, learner, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Hungary > Budapest > Budapest (0.04)
(3 more...)

Industry: Education > Educational Setting > Online (0.72)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.62)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)

Add feedback

Online Learning of Dynamic Parameters in Social Networks Alexander Rakhlin

Neural Information Processing SystemsMar-13-2024, 17:02:10 GMT

This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. We establish a tight bound on the rate of change of the underlying state, under which individuals can track the parameter with a bounded variance. Then, we characterize explicit expressions for the steady state mean-square deviation(MSD) of the estimates from the truth, per individual. We observe that only one of the estimators recovers the optimal MSD, which underscores the impact of the objective function decomposition on the learning quality. Finally, we provide an upper bound on the regret of the proposed methods, measured as an average of errors in estimating the parameter in a finite time.

agent, estimator, msd, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry:

Information Technology > Services (0.61)
Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)
(2 more...)

Add feedback

Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions

Neural Information Processing SystemsMar-13-2024, 16:30:24 GMT

We study the problem of online learning Markov Decision Processes (MDPs) when both the transition distributions and loss functions are chosen by an adversary. We present an algorithm that, under a mixing assumption, achieves O( T log |Π| + log |Π|) regret with respect to a comparison set of policies Π. The regret is independent of the size of the state and action spaces. When expectations over sample paths can be computed efficiently and the comparison set Π has polynomial size, this algorithm is efficient. We also consider the episodic adversarial online shortest path problem.

algorithm, loss function, shortest path problem, (12 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
Oceania > Australia > Queensland (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Education > Educational Setting > Online (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.62)

Add feedback

Online Learning with Costly Features and Labels

Neural Information Processing SystemsMar-13-2024, 15:22:49 GMT

This paper introduces the online probing problem: In each round, the learner is able to purchase the values of a subset of feature values. After the learner uses this information to come up with a prediction for the given round, he then has the option of paying to see the loss function that he is evaluated against. Either way, the learner pays for both the errors of his predictions and also whatever he chooses to observe, including the cost of observing the loss function for the given round and the cost of the observed features. We consider two variations of this problem, depending on whether the learner can observe the label for free or not. We provide algorithms and upper and lower bounds on the regret for both variants. We show that a positive cost for observing the label significantly increases the regret of the problem.

algorithm, learner, predictor, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.15)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Industry: Education > Educational Setting > Online (0.52)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.52)

Add feedback