Harnessing the Power of Interleaving and Counterfactual Evaluation for Airbnb Search Ranking
Zhang, Qing, Deng, Alex, Du, Michelle, Gao, Huiji, He, Liwei, Katariya, Sanjeev
arXiv.org Artificial Intelligence
Evaluation plays a crucial role in the development of ranking algorithms for search and recommender systems. It enables online platforms to build user-friendly features that drive commercial success steadily and effectively. The online environment is particularly conducive to causal inference techniques such as randomized controlled experiments (known as A/B tests), which are often harder to implement in fields like medicine and public policy. However, businesses face unique challenges in running effective A/B tests. Specifically, achieving sufficient statistical power for conversion-based metrics can be time-consuming, especially for significant purchases such as booking accommodations. While offline evaluations are quicker and more cost-effective, they often lack accuracy and are inadequate for selecting candidates for A/B tests. To address these challenges, we developed interleaving and counterfactual evaluation methods that enable rapid online assessment and identify the most promising candidates for A/B tests. Our approach not only increased the sensitivity of experiments by a factor of up to 100 (depending on the approach and metrics) compared to traditional A/B testing but also streamlined the experimental process. The practical insights gained from production use can also benefit organizations with similar interests.
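To make the interleaving idea concrete, here is a minimal sketch of team-draft interleaving, a well-known scheme from the interleaving literature. The abstract does not say which interleaving variant the authors use, so this is an illustrative assumption and not their method. The names `team_draft_interleave` and `preferred_ranker`, the listing IDs, and the use of bookings as the credit signal are all hypothetical.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Merge two rankings into one list; return (merged, item -> team)."""
    rng = rng or random.Random()
    merged, team_of = [], {}
    picks = {"A": 0, "B": 0}
    sources = {"A": ranking_a, "B": ranking_b}
    total = len(set(ranking_a) | set(ranking_b))
    while len(merged) < total:
        # The team with fewer picks goes next; ties are broken by a coin flip.
        if picks["A"] != picks["B"]:
            team = "A" if picks["A"] < picks["B"] else "B"
        else:
            team = rng.choice(["A", "B"])
        # Take that team's highest-ranked item that has not been shown yet.
        candidate = next((x for x in sources[team] if x not in team_of), None)
        if candidate is None:
            # This ranker is exhausted; let the other one fill the slot.
            team = "B" if team == "A" else "A"
            candidate = next(x for x in sources[team] if x not in team_of)
        merged.append(candidate)
        team_of[candidate] = team
        picks[team] += 1
    return merged, team_of


def preferred_ranker(team_of, converted_items):
    """Credit each converted item to the team that contributed it."""
    wins = {"A": 0, "B": 0}
    for item in converted_items:
        if item in team_of:
            wins[team_of[item]] += 1
    if wins["A"] == wins["B"]:
        return "tie"
    return "A" if wins["A"] > wins["B"] else "B"


# Hypothetical usage: two rankers order the same small set of listings.
merged, team_of = team_draft_interleave(
    ["h1", "h2", "h3"], ["h3", "h1", "h4"], random.Random(0)
)
winner = preferred_ranker(team_of, ["h4"])
```

Because every searcher sees results from both rankers in a single list, each search yields a within-user preference signal. This is the general reason interleaving tends to need far less traffic than a between-user A/B split, although the sensitivity gain quoted above is the authors' own measurement and is not reproduced by this sketch.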
Aug-4-2025
- Country:
  - Europe > United Kingdom
    - England > Cambridgeshire > Cambridge (0.04)
  - North America
    - Canada > Ontario
      - Toronto (0.05)
    - United States
      - California > San Francisco County
        - San Francisco (0.14)
      - New York > New York County
        - New York City (0.04)
      - Washington > King County
        - Seattle (0.04)
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - Strength High (1.00)
- Industry:
  - Consumer Products & Services > Hotels (0.43)
- Technology: