ab testing
Reinforcement Learning from Statistical Feedback: the Journey from AB Testing to ANT Testing
Han, Feiyang, Wei, Yimin, Liu, Zhaofeng, Qi, Yanxing
Reinforcement Learning from Human Feedback (RLHF) has played a crucial role in the success of large models such as ChatGPT. RLHF is a reinforcement learning framework that incorporates human feedback to improve learning effectiveness and performance. However, obtaining preference feedback manually is quite expensive in commercial applications. Statistical commercial indicators are usually more valuable, yet they are often ignored in RLHF, which leaves a gap between the commercial target and model training. In our research, we attempt to fill this gap by replacing human feedback with statistical business feedback, using AB testing, a well-established statistical method. We propose Reinforcement Learning from Statistical Feedback (RLSF) based on AB testing. Statistical inference methods are used to obtain preferences for training the reward network, which fine-tunes the pre-trained model in a reinforcement learning framework, achieving greater business value. Furthermore, we extend AB testing, with two selections at a single time point, to ANT testing, with multiple selections at different feedback time points. Moreover, we design numerical experiments to validate the effectiveness of our algorithm framework.
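As a rough illustration of how an AB test result could be turned into a preference label, here is a minimal sketch that assumes a binary conversion metric and a pooled two-proportion z-test. The abstract does not say which inference procedure the authors use, so the function names, the `alpha` threshold, and the choice of test are all assumptions made for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants share one conversion rate.

    Hypothetical helper: a standard pooled two-proportion z-test, not
    necessarily the inference method used in the RLSF paper.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (all successes or all failures): no evidence either way.
        return 0.0, 1.0
    z = (rate_b - rate_a) / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def preference_label(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Return "A" or "B" when the difference is significant, otherwise None."""
    z, p_value = two_proportion_z_test(conv_a, n_a, conv_b, n_b)
    if p_value >= alpha:
        return None  # Inconclusive test: emit no preference pair.
    return "B" if z > 0 else "A"
```

With 100 conversions out of 1000 for A and 150 out of 1000 for B, `preference_label` returns `"B"`; with identical counts it returns `None`. Labels of this kind could then play the role that human pairwise preferences play when training a reward network.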
Rapid and Scalable Bayesian AB Testing
Chennu, Srivas, Maher, Andrew, Pangerl, Christian, Prabanantham, Subash, Bae, Jae Hyeon, Martin, Jamie, Goswami, Bud
AB testing aids business operators in their decision making, and is considered the gold-standard method for learning from data to improve digital user experiences. However, there is usually a gap between the requirements of practitioners and the constraints imposed by the statistical hypothesis testing methodologies commonly used to analyse AB tests. These include the lack of statistical power in multivariate designs with many factors, correlations between these factors, the need for sequential testing to allow early stopping, and the inability to pool knowledge from past tests. Here, we propose a solution that applies hierarchical Bayesian estimation to address these limitations. In comparison to current sequential AB testing methodology, we increase statistical power by exploiting correlations between factors, enabling sequential testing and progressive early stopping without incurring excessive false positive risk. We also demonstrate how this methodology can be extended to extract composite global learnings from past AB tests in order to accelerate future tests. We underpin our work with a solid theoretical framework that articulates the value of hierarchical estimation. We demonstrate its utility using both numerical simulations and a large set of real-world AB tests. Together, these results highlight the practical value of our approach for statistical inference in the technology industry.
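To make the Bayesian framing concrete, the sketch below uses a plain conjugate Beta-Binomial model rather than the hierarchical model the abstract describes. The prior parameters stand in, very loosely, for knowledge pooled from past tests; the function name, parameters, and defaults are assumptions for illustration only.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b,
                   prior_alpha=1.0, prior_beta=1.0,
                   draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta posteriors.

    Hypothetical helper: each variant gets an independent
    Beta(prior_alpha + successes, prior_beta + failures) posterior.
    An informative prior (e.g. fitted to past tests) can be passed in
    place of the uniform default.
    """
    rng = random.Random(seed)
    a_alpha, a_beta = prior_alpha + conv_a, prior_beta + (n_a - conv_a)
    b_alpha, b_beta = prior_alpha + conv_b, prior_beta + (n_b - conv_b)
    wins = 0
    for _ in range(draws):
        # Draw one plausible conversion rate per variant and compare them.
        if rng.betavariate(b_alpha, b_beta) > rng.betavariate(a_alpha, a_beta):
            wins += 1
    return wins / draws
```

Because the output is a posterior probability, not a p-value, it can be recomputed as data arrives. Any stopping threshold applied to it would still need calibrating against false positive risk, which is the problem the paper's methodology targets.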
Microsoft is teaching computers to understand cause and effect
AI that analyzes data to help you make decisions is set to be an increasingly big part of business tools, and the systems that do that are getting smarter with a new approach to decision optimization that Microsoft is starting to make available. Machine learning is great at extracting patterns out of large amounts of data but not necessarily good at understanding those patterns, especially in terms of what causes them. A machine learning system might learn that people buy more ice cream in hot weather, but without a common sense understanding of the world, it's just as likely to suggest that if you want the weather to get warmer then you should buy more ice cream. Understanding why things happen helps humans make better decisions, like a doctor picking the best treatment or a business team looking at the results of AB testing to decide which price and packaging will sell more products. There are machine learning systems that deal with causality, but so far this has mostly been restricted to research that focuses on small-scale problems rather than practical, real-world systems because it's been hard to do. Deep learning, which is widely used for machine learning, needs a lot of training data, but humans can gather information and draw conclusions much more efficiently by asking questions, like a doctor asking about your symptoms, a teacher giving students a quiz, a financial advisor understanding whether a low risk or high risk investment is best for you, or a salesperson getting you to talk about what you need from a new car.
Antoine Blondeau's Sentient Technologies: AI For The Unknown Unknowns
Sentient also released a new solution to AB testing called Sentient Ascend. Right now, the conversion rate optimization (CRO) industry is based on AB testing, where you test a new design against an old design. "We decided we could transform this industry by completely dissolving the concept of AB testing, by thinking of the website no longer being a static property, but a dynamic property that is always evolving based on the way it interacts with your audience. We wanted to enable the marketer to make changes to quickly see market and conversion rate improvements. Our solution points to the fallacy of AB testing, where you come with a defined preconception of what to test against. We want to open the floodgates and show people that they don't have to limit themselves to a test. The number of possible combinations is so large that it's impossible to explore the surface exhaustively or comprehensively, except with an intelligent system that learns from every interaction, and understands from that learning what matters, what doesn't, and progressively builds solutions," says Blondeau.
20 lines of code that will beat A/B testing every time
There's no "Like" button, but if there was, consider it pressed. I've been working on a few algorithms for cross-item comparison, where, say, "click me" is blue and the banner is X size, and those are the different combinations I want to test. So basically I'm creating test sets over individual items, which proves more coherent with web design. In other words, I'm testing which style sheet / page design, still served at random and measured, gets the most time, clicks, navigation, etc., and pulling all of those factors into a scorecard. I know it seems like a bunch of work, but once you have the scripts it works for any page / site you build from then on, and having it automated saves so much time down the road. And talk about great stats to provide to your clients.
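One common alternative to a fixed-split test for this kind of "serve variants, measure, keep score" setup is an epsilon-greedy bandit. The sketch below is a minimal version of that idea and is not necessarily what the commenter or the article implements. The class name, the arm labels, and the single numeric reward (standing in for a scorecard value) are all assumptions.

```python
import random


class EpsilonGreedy:
    """Hypothetical epsilon-greedy selector over page variants ("arms")."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {arm: 0 for arm in self.arms}
        self.rewards = {arm: 0.0 for arm in self.arms}

    def mean_reward(self, arm):
        shows = self.shows[arm]
        # Unseen arms rank first so every variant is shown at least once.
        return self.rewards[arm] / shows if shows else float("inf")

    def choose(self):
        # Explore a random variant with probability epsilon, else exploit the best so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=self.mean_reward)

    def record(self, arm, reward):
        # reward could be a click (0 or 1) or any scorecard value.
        self.shows[arm] += 1
        self.rewards[arm] += reward
```

In use, call `choose()` when rendering a page and `record()` once the visitor's outcome is known. The `epsilon` parameter controls how much traffic keeps going to variants that currently look worse.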