Industry
Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe
We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback. In this problem, the objective is to minimize a global loss function of all the actions, not necessarily a cumulative loss. This framework allows us to study a very general class of problems, with applications in statistics, machine learning, and other fields. To solve this problem, we analyze the Upper-Confidence Frank-Wolfe algorithm, inspired by techniques for bandits and convex optimization. We give theoretical guarantees for the performance of this algorithm over various classes of functions, and discuss the optimality of these results.
Deep Reinforcement Learning from Human Preferences
For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. Our approach separates learning the goal from learning the behavior to achieve it. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on about 0.1% of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any which have been previously learned from human feedback.
Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin
The past decade has seen a revolution in genomic technologies that enabled a flood of genome-wide profiling of chromatin marks. Recent literature tried to understand gene regulation by predicting gene expression from large-scale chromatin measurements. Two fundamental challenges exist for such learning tasks: (1) genome-wide chromatin signals are spatially structured, high-dimensional and highly modular; and (2) the core aim is to understand what the relevant factors are and how they work together. Previous studies either failed to model complex dependencies among input signals or relied on separate feature analysis to explain the decisions. This paper presents an attention-based deep learning approach, AttentiveChrome, that uses a unified architecture to model and to interpret dependencies among chromatin factors for controlling gene regulation. AttentiveChrome uses a hierarchy of multiple Long Short-Term Memory (LSTM) modules to encode the input signals and to model how various chromatin marks cooperate automatically. AttentiveChrome trains two levels of attention jointly with the target prediction, enabling it to attend differentially to relevant marks and to locate important positions per mark. We evaluate the model across 56 different cell types (tasks) in human. Not only is the proposed architecture more accurate, but its attention scores also provide a better interpretation than state-of-the-art feature visualization methods such as saliency maps.
Dynamic Revenue Sharing
Many online platforms act as intermediaries between a seller and a set of buyers. Examples of such settings include online retailers (such as eBay) selling items on behalf of sellers to buyers, or advertising exchanges (such as AdX) selling pageviews on behalf of publishers to advertisers. In such settings, revenue sharing is a central part of running the marketplace for the intermediary, and fixed-percentage revenue sharing schemes are often used to split the revenue between the platform and the sellers. In particular, such revenue sharing schemes require the platform to (i) take at most a constant fraction \alpha of the revenue from auctions and (ii) pay the seller at least the seller-declared opportunity cost c for each item sold. A straightforward way to satisfy the constraints is to set a reserve price at c / (1 - \alpha) for each item, but this is not the optimal solution for maximizing the profit of the intermediary.
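The reserve price c / (1 - \alpha) can be checked with a short worked example. This is a minimal sketch of the arithmetic in the abstract, not the paper's mechanism; the function names `reserve_price` and `split_revenue` are hypothetical, and it assumes the platform takes exactly the maximum fraction \alpha of each sale.

```python
def reserve_price(cost: float, alpha: float) -> float:
    """Smallest sale price at which the seller still receives `cost`
    after the platform takes a fraction `alpha` of the revenue."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    return cost / (1.0 - alpha)


def split_revenue(price: float, alpha: float) -> tuple[float, float]:
    """Return (platform_share, seller_share) for a fixed-percentage split.

    Assumption: the platform takes the full fraction alpha.
    """
    platform_share = alpha * price
    seller_share = price - platform_share
    return platform_share, seller_share


# Example: opportunity cost c = 6, platform fraction alpha = 0.25.
r = reserve_price(6.0, 0.25)             # 6 / 0.75
platform, seller = split_revenue(r, 0.25)
```

At any sale price p at or above this reserve, the seller's share (1 - \alpha) p is at least c, so both constraints hold; the abstract's point is that this simple rule is not optimal for maximizing the intermediary's profit.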
Regularized Modal Regression with Applications in Cognitive Impairment Prediction
Linear regression models have been successfully used for function estimation and model selection in high-dimensional data analysis. However, most existing methods are built on least squares with the mean square error (MSE) criterion, which is sensitive to outliers and whose performance may degrade under heavy-tailed noise. In this paper, we go beyond this criterion by investigating the regularized modal regression from a statistical learning viewpoint. A new regularized modal regression model is proposed for estimation and variable selection, which is robust to outliers, heavy-tailed noise, and skewed noise. On the theoretical side, we establish the approximation estimate for learning the conditional mode function, the sparsity analysis for variable selection, and the robustness characterization. On the application side, we apply our model to successfully improve the cognitive impairment prediction using the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort data.
Conservative Contextual Linear Bandits
Safety is a desirable property that can immensely increase the applicability of learning algorithms in real-world decision-making problems. It is much easier for a company to deploy an algorithm that is safe, i.e., guaranteed to perform at least as well as a baseline. In this paper, we study the issue of safety in contextual linear bandits that have application in many different fields including personalized ad recommendation in online marketing. We formulate a notion of safety for this class of algorithms. We develop a safe contextual linear bandit algorithm, called conservative linear UCB (CLUCB), that simultaneously minimizes its regret and satisfies the safety constraint, i.e., maintains its performance above a fixed percentage of the performance of a baseline strategy, uniformly over time. We prove an upper-bound on the regret of CLUCB and show that it can be decomposed into two terms: 1) an upper-bound for the regret of the standard linear UCB algorithm that grows with the time horizon and 2) a constant term that accounts for the loss of being conservative in order to satisfy the safety constraint. We empirically show that our algorithm is safe and validate our theoretical analysis.
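The safety constraint described above can be sketched as a simple check. This is an illustrative reading of the abstract, not the CLUCB algorithm itself: it assumes "performance" means cumulative reward and that the fixed percentage is written as (1 - alpha); the function name `is_conservative` and its parameters are hypothetical.

```python
from itertools import accumulate
from typing import Sequence


def is_conservative(
    rewards: Sequence[float],
    baseline_rewards: Sequence[float],
    alpha: float,
) -> bool:
    """Check that, at every round t, the learner's cumulative reward is at
    least (1 - alpha) times the baseline's cumulative reward.

    Assumption: both sequences cover the same rounds in the same order.
    """
    if len(rewards) != len(baseline_rewards):
        raise ValueError("reward sequences must have equal length")
    learner_totals = accumulate(rewards)
    baseline_totals = accumulate(baseline_rewards)
    return all(
        learner >= (1.0 - alpha) * baseline
        for learner, baseline in zip(learner_totals, baseline_totals)
    )
```

The constraint is checked at every prefix rather than only at the final round, which matches the "uniformly over time" requirement: a learner that explores heavily early on and catches up later would still fail the check.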
A meteor exploded over Ohio and Pennsylvania
A very loud bang accompanied the disintegrating space rock. Although loud, little of the meteor is expected to have survived the atmospheric entry. Residents across northeastern Ohio received a rude--or at least extremely unexpected--wake-up call this morning. According to the National Weather Service (NWS), the loud boom experienced across the region around 9 a.m. EDT on March 17 was most likely the result of a meteor disintegrating as it sped through Earth's atmosphere.
- North America > United States > Ohio (0.63)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
- Europe > Russia (0.05)
Watch: Iranians show daily life under air strikes and regime crackdown
The BBC has obtained footage and interviews from the Iranian capital Tehran which evoke a city of strained nerves, of constant waiting for the next air strike and relentless fear of the state security apparatus. The identities of the people in this report have been protected. While independent journalists still try to gather testimony that offers a credible alternative view, they run the risk of arrest, torture and possibly worse.
- Asia > Middle East > Iran > Tehran Province > Tehran (0.28)
- North America > Central America (0.15)
- Asia > Middle East > Lebanon > Beirut Governorate > Beirut (0.08)
- Leisure & Entertainment (1.00)
- Government > Military (1.00)
- Transportation > Infrastructure & Services > Airport (0.35)
GPT-5.4 mini brings some of the smarts of OpenAI's latest model to ChatGPT Free and Go users
The new model offers performance improvements in reasoning, multimodal understanding and more. When OpenAI released GPT-5.4 at the start of March, the company said the new model was designed primarily for professional work like programming and data analysis. Now OpenAI is launching GPT-5.4 mini and nano, and while it is once again highlighting the usefulness of these new systems for tasks like coding, one of the new models is available to Free and Go users. What's more, that model, GPT-5.4 mini, even offers performance that approaches GPT-5.4 in a handful of areas.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.69)
Dyson's New PencilWash Is Here
Dyson's Newest Wet Floor Cleaner Is Available as of Today. The debut follows the release of Dyson's newest robot vacuum and larger wet cleaner last week. Welcome to a new world of mopping options from Dyson. After announcing several new models last year at IFA Berlin, Dyson has begun rolling out its latest suite of vacuums and wet floor cleaners to the public. Last week, Dyson's newest robot vacuum, the Spot+Scrub Ai ($1,200), became available for purchase online, along with the Clean+Wash Hygiene ($500), one of the brand's new wet floor cleaners. The recently announced Dyson PencilWash ($350) is available as of today.
- North America > United States > California (0.05)
- Europe > Slovakia (0.05)
- Europe > Czechia (0.05)
- Retail (0.36)
- Information Technology > Services (0.31)