AITopics

Li, Yanjun, Bresler, Yoram

Global Geometry of Multichannel Sparse Blind Deconvolution on the Sphere

Multichannel blind deconvolution is the problem of recovering an unknown signal $f$ and multiple unknown channels $x_i$ from convolutional measurements $y_i=x_i \circledast f$ ($i=1,2,\dots,N$). We consider the case where the $x_i$'s are sparse, and convolution with $f$ is invertible. Our nonconvex optimization formulation solves for a filter $h$ on the unit sphere that produces sparse output $y_i\circledast h$. Under some technical assumptions, we show that all local minima of the objective function correspond to the inverse filter of $f$ up to an inherent sign and shift ambiguity, and all saddle points have strictly negative curvatures. This geometric structure allows successful recovery of $f$ and $x_i$ using a simple manifold gradient descent algorithm with random initialization. Our theoretical findings are complemented by numerical experiments, which demonstrate superior performance of the proposed approach over the previous methods.

deconvolution, optimization problem, upstream oil & gas, (17 more...)

Country:

North America > United States > Illinois (0.14)
North America > Canada (0.14)

Industry:

Health & Medicine (0.94)
Energy > Oil & Gas > Upstream (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.38)

WARLOP, Romain, Lazaric, Alessandro, Mary, Jérémie

Fighting Boredom in Recommender Systems with Linear Reinforcement Learning

A common assumption in recommender systems (RS) is the existence of a best fixed recommendation strategy. Such strategy may be simple and work at the item level (e.g., in multi-armed bandit it is assumed one best fixed arm/item exists) or implement more sophisticated RS (e.g., the objective of A/B testing is to find the best fixed RS and execute it thereafter). We argue that this assumption is rarely verified in practice, as the recommendation process itself may impact the user’s preferences. For instance, a user may get bored by a strategy, while she may gain interest again, if enough time passed since the last time that strategy was used. In this case, a better approach consists in alternating different solutions at the right frequency to fully exploit their potential. In this paper, we first cast the problem as a Markov decision process, where the rewards are a linear function of the recent history of actions, and we show that a policy considering the long-term influence of the recommendations may outperform both fixed-action and contextual greedy policies. We then introduce an extension of the UCRL algorithm ( L IN UCRL ) to effectively balance exploration and exploitation in an unknown environment, and we derive a regret bound that is independent of the number of states. Finally, we empirically validate the model assumptions and the algorithm in a number of realistic scenarios.

artificial intelligence, optimal policy, upstream oil & gas, (21 more...)

Country:

North America > United States (0.29)
Europe > France (0.28)
North America > Canada (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Tukuljac, Helena Peic, Deleforge, Antoine, Gribonval, Remi

MULAN: A Blind and Off-Grid Method for Multichannel Echo Retrieval

This paper addresses the general problem of blind echo retrieval, i.e., given M sensors measuring in the discrete-time domain M mixtures of K delayed and attenuated copies of an unknown source signal, can the echo location and weights be recovered? This problem has broad applications in fields such as sonars, seismology, ultrasounds or room acoustics. It belongs to the broader class of blind channel identification problems, which have been intensively studied in signal processing. All existing methods proceed in two steps: (i) blind estimation of sparse discrete-time filters and (ii) echo information retrieval by peak picking. The precision of these methods is fundamentally limited by the rate at which the signals are sampled: estimated echo locations are necessary on-grid, and since true locations never match the sampling grid, the weight estimation precision is also strongly limited. This is the so-called basis-mismatch problem in compressed sensing. We propose a radically different approach to the problem, building on top of the framework of finite-rate-of-innovation sampling. The approach operates directly in the parameter-space of echo locations and weights, and enables near-exact blind and off-grid echo retrieval from discrete-time measurements. It is shown to outperform conventional methods by several orders of magnitudes in precision.

artificial intelligence, signal processing, upstream oil & gas, (17 more...)

Country:

Europe > France (0.28)
North America > Canada (0.14)
North America > United States (0.14)

Industry: Energy > Oil & Gas > Upstream (0.48)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Fruit, Ronan, Pirotta, Matteo, Lazaric, Alessandro

Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes

While designing the state space of an MDP, it is common to include states that are transient or not reachable by any policy (e.g., in mountain car, the product space of speed and position contains configurations that are not physically reachable). This results in weakly-communicating or multi-chain MDPs. In this paper, we introduce TUCRL, the first algorithm able to perform efficient exploration-exploitation in any finite Markov Decision Process (MDP) without requiring any form of prior knowledge. In particular, for any MDP with $S^c$ communicating states, $A$ actions and $\Gamma^c \leq S^c$ possible communicating next states, we derive a $O(D^c \sqrt{\Gamma^c S^c A T}) regret bound, where $D^c$ is the diameter (i.e., the length of the longest shortest path between any two states) of the communicating part of the MDP. This is in contrast with optimistic algorithms (e.g., UCRL, Optimistic PSRL) that suffer linear regret in weakly-communicating MDPs, as well as posterior sampling or regularised algorithms (e.g., REGAL), which require prior knowledge on the bias span of the optimal policy to bias the exploration to achieve sub-linear regret. We also prove that in weakly-communicating MDPs, no algorithm can ever achieve a logarithmic growth of the regret without first suffering a linear regret for a number of steps that is exponential in the parameters of the MDP. Finally, we report numerical simulations supporting our theoretical findings and showing how TUCRL overcomes the limitations of the state-of-the-art.

artificial intelligence, reinforcement learning, upstream oil & gas, (18 more...)

Country:

North America > United States > Virginia (0.14)
North America > Canada (0.14)

Industry: Energy > Oil & Gas > Upstream (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Liu, Tianyi, Li, Shiyang, Shi, Jianping, Zhou, Enlu, Zhao, Tuo

Asynchronous momentum stochastic gradient descent algorithms (Async-MSGD) have been widely used in distributed machine learning, e.g., training large collaborative filtering systems and deep neural networks. Due to current technical limit, however, establishing convergence properties of Async-MSGD for these highly complicated nonoconvex problems is generally infeasible. Therefore, we propose to analyze the algorithm through a simpler but nontrivial nonconvex problems --- streaming PCA. This allows us to make progress toward understanding Aync-MSGD and gaining new insights for more general problems. Specifically, by exploiting the diffusion approximation of stochastic optimization, we establish the asymptotic rate of convergence of Async-MSGD for streaming PCA. Our results indicate a fundamental tradeoff between asynchrony and momentum: To ensure convergence and acceleration through asynchrony, we have to reduce the momentum (compared with Sync-MSGD). To the best of our knowledge, this is the first theoretical attempt on understanding Async-MSGD for distributed nonconvex stochastic optimization. Numerical experiments on both streaming PCA and training deep neural networks are provided to support our findings for Async-MSGD.

convergence, deep learning, upstream oil & gas, (18 more...)

Country:

North America > United States (0.14)
North America > Canada (0.14)
Asia (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Data center cooling using model-predictive control

Lazic, Nevena, Boutilier, Craig, Lu, Tyler, Wong, Eehern, Roy, Binz, Ryu, MK, Imwalle, Greg

Despite the impressive recent advances in reinforcement learning (RL) algorithms, their deployment to real-world physical systems is often complicated by unexpected events, limited data, and the potential for expensive failures. In this paper, we describe an application of RL "in the wild" to the task of regulating temperatures and airflow inside a large-scale data center (DC). Adopting a data-driven, modelbased approach,we demonstrate that an RL agent with little prior knowledge is able to effectively and safely regulate conditions on a server floor after just a few hours of exploration, while improving operational efficiency relative to existing PID controllers.

controller, survey article, upstream oil & gas, (18 more...)

Country:

North America > United States (0.28)
North America > Canada (0.14)

Industry:

Information Technology > Services (0.86)
Energy > Oil & Gas > Upstream (0.65)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Fromm, Joshua, Patel, Shwetak, Philipose, Matthai

Heterogeneous Bitwidth Binarization in Convolutional Neural Networks

Recent work has shown that fast, compact low-bitwidth neural networks can be surprisingly accurate. These networks use homogeneous binarization: all parameters in each layer or (more commonly) the whole model have the same low bitwidth (e.g., 2 bits). However, modern hardware allows efficient designs where each arithmetic instruction can have a custom bitwidth, motivating heterogeneous binarization, where every parameter in the network may have a different bitwidth. In this paper, we show that it is feasible and useful to select bitwidths at the parameter granularity during training. For instance a heterogeneously quantized version of modern networks such as AlexNet and MobileNet, with the right mix of 1-, 2- and 3-bit parameters that average to just 1.4 bits can equal the accuracy of homogeneous 2-bit versions of these networks. Further, we provide analyses to show that the heterogeneously binarized systems yield FPGA- and ASIC-based implementations that are correspondingly more efficient in both circuit area and energy efficiency than their homogeneous counterparts.

accuracy, deep learning, upstream oil & gas, (19 more...)

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > Canada (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas > Upstream (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

Meta-Reinforcement Learning of Structured Exploration Strategies

Gupta, Abhishek, Mendonca, Russell, Liu, YuXuan, Abbeel, Pieter, Levine, Sergey

Exploration is a fundamental challenge in reinforcement learning (RL). Many current exploration methods for deep RL use task-agnostic objectives, such as information gain or bonuses based on state visitation. However, many practical applications of RL involve learning more than a single task, and prior tasks can be used to inform how exploration should be performed in new tasks. In this work, we study how prior tasks can inform an agent about how to explore effectively in new situations. We introduce a novel gradient-based fast adaptation algorithm - model agnostic exploration with structured noise (MAESN) - to learn exploration strategies fromprior experience. The prior experience is used both to initialize a policy and to acquire a latent exploration space that can inject structured stochasticity into a policy, producing exploration strategies that are informed by prior knowledge and are more effective than random action-space noise. We show that MAESN is more effective at learning exploration strategies when compared to prior meta-RL methods, RL without learned exploration strategies, and task-agnostic exploration methods. We evaluate our method on a variety of simulated tasks: locomotion with a wheeled robot, locomotion with a quadrupedal walker, and object manipulation.

artificial intelligence, exploration, upstream oil & gas, (18 more...)

Country:

North America > United States > California (0.14)
North America > Canada (0.14)
Europe > Germany (0.14)
Europe > Austria > Vienna (0.14)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Differentiable MPC for End-to-end Planning and Control

Amos, Brandon, Jimenez, Ivan, Sacks, Jacob, Boots, Byron, Kolter, J. Zico

We present foundations for using Model Predictive Control (MPC) as a differentiable policy class for reinforcement learning. This provides one way of leveraging and combining the advantages of model-free and model-based approaches. Specifically, we differentiate through MPC by using the KKT conditions of the convex approximation at a fixed point of the controller. Using this strategy, we are able to learn the cost and dynamics of a controller via end-to-end learning. Our experiments focus on imitation learning in the pendulum and cartpole domains, where we learn the cost and dynamics terms of an MPC policy class. We show that our MPC policies are significantly more data-efficient than a generic neural network and that our method is superior to traditional system identification in a setting where the expert is unrealizable.

arxiv preprint arxiv, deep learning, downstream oil & gas, (18 more...)

Country: North America > United States (0.14)

Industry: Energy > Oil & Gas > Downstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)