 production system



production systems (MSPS), in which model selection is achieved by sequentially deploying a list of candidate models

Neural Information Processing Systems

We thank the reviewers for the in-depth reviews. We will first answer the comments shared by multiple reviewers and then answer the individual comments. We will add additional details about sparse GP, VI and binary observation in the supplement. A/B tests for a few weeks and then select the best model, which is the common scenario in industry. Model selection for a time-sensitive system is an interesting and open research question for future work.


CORGI: Efficient Pattern Matching With Quadratic Guarantees

Weitekamp, Daniel

arXiv.org Artificial Intelligence

Rule-based systems must solve complex matching problems within tight time constraints to be effective in real-time applications, such as planning and reactive control for AI agents, as well as low-latency relational database querying. Pattern-matching systems can encounter issues where exponential time and space are required to find matches for rules with many underconstrained variables, or which produce combinatorial intermediate partial matches (but are otherwise well-constrained). When online AI systems automatically generate rules from example-driven induction or code synthesis, they can easily produce worst-case matching patterns that slow or halt program execution by exceeding available memory. In our own work with cognitive systems that learn from example, we've found that aggressive forms of anti-unification-based generalization can easily produce these circumstances. To make these systems practical without hand-engineering constraints or succumbing to unpredictable failure modes, we introduce a new matching algorithm called CORGI (Collection-Oriented Relational Graph Iteration). Unlike RETE-based approaches, CORGI offers quadratic time and space guarantees for finding single satisficing matches, and the ability to iteratively stream subsequent matches without committing entire conflict sets to memory. CORGI differs from RETE in that it does not have a traditional $β$-memory for collecting partial matches. Instead, CORGI takes a two-step approach: a graph of grounded relations is built/maintained in a forward pass, and an iterator generates matches as needed by working backward through the graph. This approach eliminates the high-latency delays and memory overflows that can result from populating full conflict sets. In a performance evaluation, we demonstrate that CORGI significantly outperforms RETE implementations from SOAR and OPS5 on a simple combinatorial matching task.
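To make the idea of streaming matches concrete, here is a minimal Python sketch of lazy match generation. It is not the CORGI algorithm: it is a plain backtracking generator with no quadratic guarantee and no relation graph, and the names `iter_matches`, `patterns`, and `facts` are hypothetical. It only illustrates how a caller can take a single satisficing match, or pull further matches one at a time, without first materializing a full conflict set.

```python
def iter_matches(patterns, facts, binding=None):
    """Lazily yield variable bindings that satisfy every pattern.

    patterns: list of (relation, var_a, var_b) triples (illustrative format)
    facts:    dict mapping a relation name to a set of (x, y) tuples
    """
    binding = binding or {}
    if not patterns:
        # Every pattern is satisfied: emit one complete match.
        yield dict(binding)
        return
    (rel, a, b), rest = patterns[0], patterns[1:]
    # Sorted only to make the iteration order deterministic.
    for x, y in sorted(facts.get(rel, ())):
        new = dict(binding)
        # Bind each variable, rejecting the fact on any conflict.
        if new.setdefault(a, x) != x or new.setdefault(b, y) != y:
            continue
        yield from iter_matches(rest, facts, new)


facts = {"parent": {("ann", "bob"), ("bob", "cid"), ("bob", "dee")}}
patterns = [("parent", "X", "Y"), ("parent", "Y", "Z")]

# Take a single match; the remaining matches are never computed.
first = next(iter_matches(patterns, facts))
```

Because `iter_matches` is a generator, memory use is bounded by the recursion depth rather than by the number of matches, which is the property the abstract contrasts with populating full conflict sets.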






Outbound Modeling for Inventory Management

Savorgnan, Riccardo, Ghai, Udaya, Eisenach, Carson, Foster, Dean

arXiv.org Artificial Intelligence

We study the problem of forecasting the number of units fulfilled (or "drained") from each inventory warehouse to meet customer demand, along with the associated outbound shipping costs. The actual drain and shipping costs are determined by complex production systems that manage the planning and execution of customer order fulfillment, i.e., from where and how to ship a unit to be delivered to a customer. Accurately modeling these processes is critical for regional inventory planning, especially when using Reinforcement Learning (RL) to develop control policies. For the RL use case, a drain model is incorporated into a simulator to produce long rollouts, which we desire to be differentiable. While this transition can be recovered by simulating the calls to the internal software systems, such simulations are non-differentiable and too slow and costly to run within an RL training environment. Accordingly, we frame this as a probabilistic forecasting problem, modeling the joint distribution of outbound drain and shipping costs across all warehouses at each time period, conditioned on inventory positions and exogenous customer demand. To ensure robustness in an RL environment, the model must handle out-of-distribution scenarios that arise from off-policy trajectories. We propose a validation scheme that leverages production systems to evaluate the drain model on counterfactual inventory states induced by RL policies. Preliminary results demonstrate the model's accuracy within the in-distribution setting.


Maturity Framework for Enhancing Machine Learning Quality

Castelli, Angelantonio, Chouliaras, Georgios Christos, Goldenberg, Dmitri

arXiv.org Artificial Intelligence

With the rapid integration of Machine Learning (ML) in business applications and processes, it is crucial to ensure the quality, reliability and reproducibility of such systems. We suggest a methodical approach towards ML system quality assessment and introduce a structured Maturity framework for governance of ML. We emphasize the importance of quality in ML and the need for rigorous assessment, driven by issues in ML governance and gaps in existing frameworks. Our primary contribution is a comprehensive open-sourced quality assessment method, validated with empirical evidence, accompanied by a systematic maturity framework tailored to ML systems. Drawing from applied experience at Booking.com, we discuss challenges and lessons learned during large-scale adoption within organizations. The study presents empirical findings, highlighting quality improvement trends and showcasing business outcomes. The maturity framework for ML systems aims to become a valuable resource to reshape industry standards and enable a structural approach to improve ML maturity in any organization.


Review for NeurIPS paper: Model Selection for Production System via Automated Online Experiments

Neural Information Processing Systems

Summary and Contributions: The paper proposes a model selection algorithm called Model Selection with Automated Online Experiments (AOE) that is designed for use in production systems. In the problem statement, it is stated that the goal of the model selection problem is to select the model from a set of candidate models that maximises a metric of interest. It is assumed that the metric of interest can be expressed as the average immediate feedback from each of a model's predictions. AOE uses both historical log data and data collected from a small budget of online experiments to inform the choice of model. A distribution for the accumulative metric, or expected immediate feedback, is derived.
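The problem statement in this summary can be illustrated with a small Python sketch. It assumes the immediate feedback for each candidate model is already available as a list of numbers, and it simply picks the model with the highest average. The names `accumulative_metric`, `select_model`, and the toy feedback values are hypothetical. This is not AOE itself, which, per the summary, derives a distribution over the metric from historical logs and a small budget of online experiments.

```python
from statistics import mean


def accumulative_metric(feedback):
    """Average immediate feedback over a model's predictions."""
    return mean(feedback)


def select_model(feedback_by_model):
    """Return the candidate whose average immediate feedback is highest."""
    return max(
        feedback_by_model,
        key=lambda name: accumulative_metric(feedback_by_model[name]),
    )


# Toy binary feedback (e.g. click / no click) for two hypothetical candidates.
feedback_by_model = {
    "model_a": [1, 0, 0, 1],
    "model_b": [1, 1, 0, 1],
}
best = select_model(feedback_by_model)
```

In practice the feedback for a candidate that has not been deployed is unknown, which is why the reviewed paper estimates the metric rather than averaging observed values directly.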