Fairness is a highly subjective concept, and machine learning is no exception. We typically feel that the referees were "unfair" to our favorite team when it loses a close match, and that any outcome is perfectly "fair" when it goes our way. Since machine learning models cannot rely on subjectivity, we need an efficient way to quantify fairness. A lot of research has been done in this area, mostly framing fairness as an outcome optimization problem. Recently, Google AI research open-sourced the TensorFlow Constrained Optimization (TFCO) library, an optimization framework that can be used to optimize different objectives of a machine learning model, including fairness.
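One common way to quantify fairness is as a gap between outcome rates across groups. The sketch below (plain NumPy, not TFCO's actual API; the toy data and variable names are made up) computes a demographic-parity gap, the kind of rate difference that a constrained-optimization framework like TFCO can bound during training:

```python
import numpy as np

# Hypothetical toy data: binary model predictions and a binary group attribute.
preds = np.array([1, 0, 1, 1, 0, 1, 0, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Demographic parity compares positive-prediction rates across groups;
# the absolute gap between them is one common scalar fairness metric.
rate_a = preds[group == 0].mean()  # 0.75
rate_b = preds[group == 1].mean()  # 0.25
parity_gap = abs(rate_a - rate_b)
print(parity_gap)  # 0.5
```

A constraint such as `parity_gap <= 0.05` is exactly the sort of rate constraint that gets added to the training objective in the constrained-optimization framing.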
Defining the roadmaps for Artificial Intelligence applications for railway operations and network management

Applications are invited for a PhD studentship in innovative approaches in artificial intelligence for railway scheduling and operations, based in the Institute for Transport Studies at the University of Leeds. The position is an opportunity to conduct cutting-edge research at the intersection of railway scheduling and artificial intelligence techniques such as machine learning and neural networks. The overall objective of the PhD research project is to investigate the potential of Artificial Intelligence (AI) in the rail sector and to contribute to the definition of roadmaps for future research in operational intelligence and network management. In particular, the student will develop and compare different AI approaches, e.g. machine learning, deep learning, and reinforcement learning, for railway traffic planning and management. He or she will have the chance to investigate the use of AI for solving combinatorial optimization problems and for supporting optimization models, with a special focus on optimization models for railway operations and management.
Most product-development tasks are complex optimization problems. Design teams approach them iteratively, refining an initial best guess through rounds of engineering analysis, interpretation, and refinement. But each such iteration takes time and money, and teams may achieve only a handful of iterations within the development timeline. Because teams rarely have the opportunity to explore alternative solutions that depart significantly from their base-case assumptions, too often the final design is suboptimal. Today's technology offers an alternative.
Policy optimization is an effective reinforcement learning approach to solve continuous control tasks. Recent achievements have shown that alternating online and offline optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires accounting for the variance of the objective function estimate. In this paper, we propose a novel, model-free, policy search algorithm, POIS, applicable in both action-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation; then we define a surrogate objective function, which is optimized offline whenever a new batch of trajectories is collected.
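The trajectory reuse described above rests on importance sampling: reweighting samples drawn under one distribution to estimate expectations under another. A minimal sketch of the ordinary and self-normalized importance sampling estimators on a toy Gaussian problem (the distributions, constants, and names are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy example: estimate E_target[x], where samples come from a behaviour
# distribution N(0, 1) and the target is N(1, 1).
behaviour_mean, target_mean, sigma = 0.0, 1.0, 1.0
x = rng.normal(behaviour_mean, sigma, size=100_000)

def log_pdf(x, mu):
    # Unnormalized Gaussian log-density; constants cancel in the weight ratio.
    return -0.5 * ((x - mu) / sigma) ** 2

# Importance weights: target density over behaviour density.
w = np.exp(log_pdf(x, target_mean) - log_pdf(x, behaviour_mean))
is_estimate = np.mean(w * x)               # ordinary IS estimate
snis_estimate = np.sum(w * x) / np.sum(w)  # self-normalized variant
```

The variance of these estimators grows with the mismatch between the two distributions, which is exactly why an algorithm reusing old trajectories must decide, via a confidence bound, when the weights have become too unreliable and fresh data is needed.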
Deep learning presents notorious computational challenges. These challenges include, but are not limited to, the non-convexity of learning objectives and estimating the quantities needed for optimization algorithms, such as gradients. While we do not address the non-convexity, we present an optimization solution that exploits the so-far-unused "geometry" in the objective function in order to make the best use of the estimated gradients. Previous work attempted similar goals with preconditioned methods in the Euclidean space, such as L-BFGS, RMSprop, and AdaGrad. In stark contrast, our approach combines a non-Euclidean gradient method with preconditioning.
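For reference, a diagonally preconditioned update in the RMSprop style (one of the Euclidean baselines named above; this sketch is illustrative, not the paper's non-Euclidean method, and all constants are made up) looks like:

```python
import numpy as np

def rmsprop_step(theta, grad, cache, lr=0.01, decay=0.9, eps=1e-8):
    # Running estimate of per-coordinate squared gradient magnitude.
    cache = decay * cache + (1 - decay) * grad ** 2
    # Each coordinate's step is rescaled by that estimate: a diagonal
    # preconditioner adapted online from observed gradients.
    theta = theta - lr * grad / (np.sqrt(cache) + eps)
    return theta, cache

# Minimize f(theta) = 0.5 * theta @ A @ theta with badly scaled curvature.
A = np.diag([100.0, 1.0])
theta = np.array([1.0, 1.0])
cache = np.zeros(2)
for _ in range(500):
    grad = A @ theta
    theta, cache = rmsprop_step(theta, grad, cache)
loss = 0.5 * theta @ A @ theta
print(loss)
```

The per-coordinate rescaling is what lets the method make comparable progress along the well-conditioned and ill-conditioned directions despite the 100:1 curvature ratio.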
We present a new algorithmic approach for the task of finding a chordal Markov network structure that maximizes a given scoring function. The algorithm is based on branch and bound and integrates dynamic programming both for domain pruning and for obtaining strong bounds for search-space pruning. Empirically, we show that the approach dominates a recent integer programming approach (and thereby also a recent constraint optimization approach) for the problem in terms of running time. Papers published at the Neural Information Processing Systems Conference.
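The branch-and-bound principle itself can be sketched on a toy 0/1 maximization problem (this skeleton is purely illustrative; the paper's algorithm searches over chordal Markov network structures and derives its bounds via dynamic programming rather than the trivial bound used here):

```python
# Generic branch and bound for selecting a subset of scored items to
# maximize the total score. Each node of the search tree fixes one more
# include/exclude decision; subtrees whose optimistic upper bound cannot
# beat the incumbent are pruned.
def branch_and_bound(scores):
    best = 0.0  # incumbent: the empty selection scores 0

    def upper_bound(partial_sum, i):
        # Optimistic bound: assume every remaining positive score is taken.
        return partial_sum + sum(s for s in scores[i:] if s > 0)

    def recurse(i, partial_sum):
        nonlocal best
        if i == len(scores):
            best = max(best, partial_sum)
            return
        if upper_bound(partial_sum, i) <= best:
            return  # prune: this subtree cannot improve on the incumbent
        recurse(i + 1, partial_sum + scores[i])  # include item i
        recurse(i + 1, partial_sum)              # exclude item i

    recurse(0, 0.0)
    return best

print(branch_and_bound([3.0, -2.0, 5.0, -1.0]))  # 8.0
```

The quality of the structure-learning algorithm hinges on how tight the upper bound is: the dynamic-programming bounds mentioned in the abstract play the role of `upper_bound` here, pruning far more of the search space than a naive bound would.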
We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). OLP uses its experience so far to estimate the MDP. It chooses actions by optimistically maximizing estimated future rewards over a set of next-state transition probabilities that are close to the estimates: a computation that corresponds to solving linear programs. We show that the total expected reward obtained by OLP up to time $T$ is within $C(P)\log T$ of the reward obtained by the optimal policy, where $C(P)$ is an explicit, MDP-dependent constant. OLP is closely related to an algorithm proposed by Burnetas and Katehakis, with three key differences: OLP is simpler, it does not require knowledge of the supports of transition probabilities, and the proof of its regret bound is simpler; on the other hand, our regret bound is a constant factor larger than the regret of their algorithm.
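The optimistic inner step, maximizing expected value over transition vectors near the empirical estimate, can be illustrated with a toy example (a greedy closed form over an L1 ball, in the spirit of such optimistic algorithms rather than OLP's exact linear program; `eps` and the numbers below are made-up assumptions):

```python
import numpy as np

def optimistic_transitions(p_hat, v, eps):
    # Among probability vectors p with ||p - p_hat||_1 <= eps, maximize p . v.
    # The greedy solution shifts mass toward the highest-value state and
    # removes it from the lowest-value states first; it matches the LP optimum.
    p = p_hat.copy()
    best = int(np.argmax(v))
    p[best] = min(1.0, p_hat[best] + eps / 2.0)
    for j in np.argsort(v):  # lowest-value states first
        if j == best:
            continue
        excess = p.sum() - 1.0
        if excess <= 0:
            break
        p[j] = max(0.0, p[j] - excess)
    return p

p_hat = np.array([0.5, 0.3, 0.2])  # empirical transition estimate
v = np.array([0.0, 1.0, 5.0])      # estimated values of next states
p = optimistic_transitions(p_hat, v, eps=0.2)
```

The optimistic vector `p` achieves a strictly higher expected value than `p_hat` while staying within the confidence ball, which is what drives exploration toward plausibly rewarding transitions.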
The control of high-dimensional, continuous, non-linear systems is a key problem in reinforcement learning and control. Local, trajectory-based methods, using techniques such as Differential Dynamic Programming (DDP), are not directly subject to the curse of dimensionality, but generate only local controllers. In this paper, we introduce Receding Horizon DDP (RH-DDP), an extension to the classic DDP algorithm, which allows us to construct stable and robust controllers based on a library of local-control trajectories. We demonstrate the effectiveness of our approach on a series of high-dimensional control problems using a simulated multi-link swimming robot. These experiments show that our approach effectively circumvents dimensionality issues, and is capable of dealing with problems with (at least) 34 state and 14 action dimensions.
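The receding-horizon loop itself, re-solve a finite-horizon problem from the current state, apply only the first control, then re-plan, can be sketched on a toy scalar linear-quadratic system (illustrative of the receding-horizon principle only, not of DDP; all constants are made up):

```python
# Toy scalar linear system x' = a*x + b*u with quadratic stage cost
# q*x^2 + r*u^2, controlled by receding-horizon planning.
a, b, q, r = 1.0, 1.0, 1.0, 0.1

def first_gain(horizon):
    # Backward Riccati recursion for the finite-horizon LQR problem;
    # the gain from the last backward step applies to the first action.
    P = q
    for _ in range(horizon):
        k = a * b * P / (r + b * b * P)
        P = q + a * P * (a - b * k)
    return k

x = 5.0
for _ in range(20):
    u = -first_gain(horizon=10) * x  # re-plan from the current state
    x = a * x + b * u                # apply only the first control
print(abs(x))
```

For a linear system the re-planned gain is the same at every step, so the loop is redundant; for the non-linear systems targeted by RH-DDP, re-planning from the actual state is what confers robustness to model error and disturbances.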