A APPENDIX A.1 Data Preparation

Neural Information Processing Systems

Labels in the Criteo dataset indicate whether the user clicked the item or not. Table 4 reports statistics of the datasets used; a smaller relative ranking indicates a more important cross-feature. Figure 7 presents the training curves of the architecture searched by our PROFIT alongside the human-designed baselines.


Appendix for TabNAS: Rejection Sampling for Neural Architecture Search on Tabular Datasets

Yang, Chengrun, Bender, Gabriel

Neural Information Processing Systems

Due to the high costs involved, many works have proposed methods to reduce the search cost. The first strategy is to reduce the time needed to evaluate each architecture seen during a search; the second is to reduce the number of architectures that need to be evaluated. Resource constraints are prevalent in deep learning, so finding architectures with outstanding performance and low cost is important to both NAS research and its applications.


Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems

Coleman, Benjamin, Kang, Wang-Cheng, Fahrbach, Matthew, Wang, Ruoxi, Hong, Lichan, Chi, Ed H., Cheng, Derek Zhiyuan

arXiv.org Artificial Intelligence

Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems. A typical model ingests hundreds of features with vocabularies on the order of millions to billions of tokens. The standard approach is to represent each feature value as a d-dimensional embedding, introducing hundreds of billions of parameters for extremely high-cardinality features. This bottleneck has led to substantial progress in alternative embedding algorithms. Many of these methods, however, make the assumption that each feature uses an independent embedding table. This work introduces a simple yet highly effective framework, Feature Multiplexing, where one single representation space is used across many different categorical features. Our theoretical and empirical analysis reveals that multiplexed embeddings can be decomposed into components from each constituent feature, allowing models to distinguish between features. We show that multiplexed representations lead to Pareto-optimal parameter-accuracy tradeoffs for three public benchmark datasets. Further, we propose a highly practical approach called Unified Embedding with three major benefits: simplified feature configuration, strong adaptation to dynamic data distributions, and compatibility with modern hardware. Unified embedding gives significant improvements in offline and online metrics compared to highly competitive baselines across five web-scale search, ads, and recommender systems, where it serves billions of users across the world in industry-leading products.
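The core idea of Feature Multiplexing can be sketched in a few lines: every categorical feature looks up the same shared embedding table, with the hash salted by a feature identifier so that collision patterns differ across features. The table size, embedding dimension, and blake2b-based hash below are illustrative assumptions, not the paper's configuration:

```python
import hashlib
import numpy as np

# Minimal sketch of Feature Multiplexing: many categorical features share
# ONE embedding table. TABLE_SIZE, EMBED_DIM, and the hash scheme are
# illustrative choices for this sketch, not the paper's actual setup.
TABLE_SIZE = 1024
EMBED_DIM = 8

rng = np.random.default_rng(0)
shared_table = rng.normal(size=(TABLE_SIZE, EMBED_DIM))

def multiplexed_lookup(feature_id: int, token: str) -> np.ndarray:
    """Map a (feature, token) pair into the single shared table.

    Salting the hash with the feature id gives each feature its own
    effective hash function, so collisions differ across features.
    """
    key = f"{feature_id}:{token}".encode()
    digest = hashlib.blake2b(key, digest_size=8).digest()
    slot = int.from_bytes(digest, "big") % TABLE_SIZE
    return shared_table[slot]

# The same token string used by two different features generally lands in
# different rows, yet the parameter count stays TABLE_SIZE * EMBED_DIM
# no matter how many features are multiplexed.
e_user = multiplexed_lookup(0, "token_42")
e_item = multiplexed_lookup(1, "token_42")
```

Because the table is shared, adding a new feature adds no parameters, which is one way to read the "simplified feature configuration" benefit claimed for Unified Embedding.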


TabNAS: Rejection Sampling for Neural Architecture Search on Tabular Datasets

Yang, Chengrun, Bender, Gabriel, Liu, Hanxiao, Kindermans, Pieter-Jan, Udell, Madeleine, Lu, Yifeng, Le, Quoc, Huang, Da

arXiv.org Artificial Intelligence

The best neural architecture for a given machine learning problem depends on many factors: not only the complexity and structure of the dataset, but also on resource constraints including latency, compute, energy consumption, etc. Neural architecture search (NAS) for tabular datasets is an important but under-explored problem. Previous NAS algorithms designed for image search spaces incorporate resource constraints directly into the reinforcement learning (RL) rewards. However, for NAS on tabular datasets, this protocol often discovers suboptimal architectures. This paper develops TabNAS, a new and more effective approach to handle resource constraints in tabular NAS using an RL controller motivated by the idea of rejection sampling. TabNAS immediately discards any architecture that violates the resource constraints without training or learning from that architecture. TabNAS uses a Monte-Carlo-based correction to the RL policy gradient update to account for this extra filtering step. Results on several tabular datasets demonstrate the superiority of TabNAS over previous reward-shaping methods: it finds better models that obey the constraints.
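A toy version of the rejection-sampling idea can be written down directly: sample from a softmax controller, discard over-budget architectures without training them, and take the policy-gradient step on the distribution renormalized over the feasible set. TabNAS estimates that feasible probability mass by Monte Carlo when the search space is too large to enumerate; this sketch, with invented layer sizes, costs, and reward, computes it exactly:

```python
import numpy as np

# Illustrative rejection-sampling NAS loop in the spirit of TabNAS (not
# the authors' code). Candidate widths, the cost proxy, the budget, and
# the log-width reward are all invented for this sketch.
rng = np.random.default_rng(0)
sizes = np.array([32, 64, 128, 256, 512])  # candidate hidden-layer widths
cost = sizes.astype(float)                  # proxy resource cost
budget = 256.0                              # resource constraint

logits = np.zeros(len(sizes))
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def reward(width):
    # Hypothetical quality proxy: wider helps, with diminishing returns.
    return np.log(width)

for _ in range(500):
    p = softmax(logits)
    i = rng.choice(len(sizes), p=p)
    if cost[i] > budget:
        continue  # rejected: never trained, no gradient step
    # Policy renormalized over the feasible set. TabNAS estimates this
    # feasible mass by Monte Carlo; the toy space is small enough to
    # compute it exactly here.
    q = np.where(cost <= budget, p, 0.0)
    q /= q.sum()
    # REINFORCE on log q(i): for a softmax policy, grad = e_i - q.
    grad = -q
    grad[i] += 1.0
    logits += lr * reward(sizes[i]) * grad

best = sizes[np.argmax(logits)]
```

Infeasible architectures never receive a gradient step, so the controller's probability mass drains away from them as the feasible logits grow.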


Beyond Point Estimate: Inferring Ensemble Prediction Variation from Neuron Activation Strength in Recommender Systems

Chen, Zhe, Wang, Yuyan, Lin, Dong, Cheng, Derek Zhiyuan, Hong, Lichan, Chi, Ed H., Cui, Claire

arXiv.org Machine Learning

Despite the impressive prediction performance of deep neural networks (DNNs) in various domains, it is now well known that a set of DNN models trained with the same model specification and the same data can produce very different prediction results. The ensemble method is a state-of-the-art benchmark for prediction uncertainty estimation. However, ensembles are expensive to train and serve for web-scale traffic. In this paper, we seek to advance the understanding of prediction variation estimated by the ensemble method. Through empirical experiments on two widely used recommender-system benchmark datasets, MovieLens and Criteo, we observe that prediction variations come from various randomness sources, including training data shuffling and random parameter initialization. By introducing more randomness into model training, we notice that the ensemble's mean predictions tend to be more accurate while the prediction variations tend to be higher. Moreover, we propose to infer prediction variation from neuron activation strength and demonstrate the strong predictive power of activation strength features. Our experimental results show that the average R-squared is as high as 0.56 on MovieLens and 0.81 on Criteo. Our method performs especially well when detecting the lowest and highest variation buckets, with 0.92 AUC and 0.89 AUC respectively. Our approach provides a simple way to estimate prediction variation, which opens up new opportunities for future work in many interesting areas (e.g., model-based reinforcement learning) without relying on serving expensive ensemble models.
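On fully synthetic data, the regression setup behind this idea reduces to a few lines: generate stand-in activation-strength features, a stand-in ensemble prediction variation linearly tied to them, and fit least squares. Every number here is invented, so the resulting R-squared says nothing about the paper's 0.56/0.81 results:

```python
import numpy as np

# Toy sketch (synthetic data, not the authors' pipeline): predict the
# prediction variation measured across an ensemble from per-example
# "activation strength" features of a single model, scored with R^2.
rng = np.random.default_rng(0)
n = 2000

acts = rng.normal(size=(n, 4))            # stand-in activation features
true_w = np.array([0.5, -0.3, 0.2, 0.1])  # invented ground-truth link
ens_var = acts @ true_w + 0.05 * rng.normal(size=n)  # stand-in target

# Simple train/test split and ordinary least squares.
X_train, X_test = acts[:1500], acts[1500:]
y_train, y_test = ens_var[:1500], ens_var[1500:]

w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
pred = X_test @ w

ss_res = float(((y_test - pred) ** 2).sum())
ss_tot = float(((y_test - y_test.mean()) ** 2).sum())
r2 = 1.0 - ss_res / ss_tot
```

The appeal of the approach is exactly this cheapness: once the regressor is fit, estimating variation requires only one forward pass of a single model rather than serving the whole ensemble.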


r/MachineLearning - [N] Laplace's Demon: A Seminar Series about Bayesian Machine Learning at Scale

#artificialintelligence

We have recently launched an ongoing online seminar series about Bayesian machine learning at scale. The intended audience includes machine learning practitioners and statisticians from academia and industry. Registration is now open for Jake Hofman's 17 June talk: "How visualizing inferential uncertainty can mislead readers about treatment effects in scientific results". Jake is a Senior Principal Researcher at Microsoft Research, New York. The talk is at 15.00 UTC this Wednesday, June 17; to see it in your local time zone, please go to the registration page.


Optimization Approaches for Counterfactual Risk Minimization with Continuous Actions

Zenati, Houssam, Bietti, Alberto, Martin, Matthieu, Diemert, Eustache, Mairal, Julien

arXiv.org Machine Learning

Counterfactual reasoning from logged data has become increasingly important for a large range of applications such as web advertising and healthcare. In this paper, we address the problem of counterfactual risk minimization for learning a stochastic policy with a continuous action space. Whereas previous works have mostly focused on deriving statistical estimators with importance sampling, we show that the optimization perspective is equally important for solving the resulting nonconvex optimization problems. Specifically, we demonstrate the benefits of proximal point algorithms and soft-clipping estimators, which are more amenable to gradient-based optimization than classical hard clipping. We propose multiple synthetic, yet realistic, evaluation setups, and we release a new large-scale dataset based on web advertising data for this problem, which crucially lacks public benchmarks.
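The contrast between hard and soft clipping can be illustrated with a one-dimensional Gaussian logging policy. The `M * tanh(w / M)` soft clip below is one smooth, bounded weight transform chosen for illustration; it is not necessarily the exact soft-clipping function of the paper, and the policies, costs, and clipping level are likewise invented:

```python
import numpy as np

# Sketch of clipped importance-weighted (IPS) off-policy risk estimation.
# Logging policy pi0 = N(0, 1); costs are synthetic; M is the clip level.
rng = np.random.default_rng(0)
n, M = 5000, 10.0

actions = rng.normal(0.0, 1.0, size=n)  # logged actions a ~ pi0
costs = (actions - 1.0) ** 2            # synthetic cost (lower is better)

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def ips(mu, clip):
    """Clipped IPS estimate of the risk of the target policy N(mu, 1)."""
    w = gaussian_pdf(actions, mu, 1.0) / gaussian_pdf(actions, 0.0, 1.0)
    return float(np.mean(costs * clip(w)))

hard = lambda w: np.minimum(w, M)       # classical hard clipping
soft = lambda w: M * np.tanh(w / M)     # smooth, bounded alternative

risk_hard = ips(1.0, hard)  # evaluate the target policy N(1, 1)
risk_soft = ips(1.0, soft)
```

Hard clipping has a zero gradient wherever `w > M`, whereas the smooth transform keeps a nonzero gradient everywhere, which is the kind of property that makes soft-clipped estimators friendlier to gradient-based policy optimization.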