production data
- North America > United States > California (0.04)
- North America > Puerto Rico (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (2 more...)
- Research Report > Experimental Study (1.00)
- Research Report > Promising Solution (0.67)
- Banking & Finance (1.00)
- Health & Medicine (0.93)
High-Fidelity And Complex Test Data Generation For Google SQL Code Generation Services
Kannan, Shivasankari, Chung, Yeounoh, Gondi, Amita, Swadell, Tristan, Ozcan, Fatma
The demand for high-fidelity test data is paramount in industrial settings where access to production data is largely restricted. Traditional data generation methods often fall short, struggling with low fidelity and with modeling the complex data structures and semantic relationships that are critical for testing SQL code generation services such as Natural Language to SQL (NL2SQL). In this paper, we address the critical need to generate syntactically correct, semantically relevant, high-fidelity mock data for complex data structures, including columns with nested structures that we frequently encounter in Google workloads. We highlight the limitations of existing approaches used in production, particularly their inability to handle large and complex data structures and their lack of semantically coherent test data, which leads to limited test coverage. We demonstrate that by leveraging Large Language Models (LLMs) and incorporating strategic pre- and post-processing steps, we can generate syntactically correct, semantically relevant, high-fidelity test data that adheres to complex structural constraints and maintains semantic integrity with respect to the SQL test targets (queries/functions). This approach supports comprehensive testing of complex SQL queries involving joins, aggregations, and even deeply nested subqueries, ensuring robust evaluation of SQL code generation services such as NL2SQL and SQL Code Assistant. Our results demonstrate the practical utility of LLM-based (Gemini) test data generation for industrial SQL code generation services, where generating high-fidelity test data is essential because production datasets are frequently unavailable or inaccessible for testing.
- Europe > United Kingdom (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
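The post-processing step the abstract describes can be illustrated with a minimal sketch: validating LLM-generated rows against a nested column schema before accepting them as test data. The schema representation, field names, and helper functions below are hypothetical illustrations, not the paper's actual pipeline.

```python
# Hypothetical post-processing check for LLM-generated mock data: accept only
# rows that conform to a nested (STRUCT/ARRAY) schema; reject the rest,
# e.g. for re-prompting. A sketch under assumed data shapes.
def matches_schema(value, schema):
    """Recursively check a generated value against a nested type schema."""
    if isinstance(schema, dict):      # STRUCT column: nested named fields
        return (isinstance(value, dict)
                and set(value) == set(schema)
                and all(matches_schema(value[k], schema[k]) for k in schema))
    if isinstance(schema, list):      # ARRAY column: homogeneous elements
        return isinstance(value, list) and all(
            matches_schema(v, schema[0]) for v in value)
    return isinstance(value, schema)  # leaf: a plain Python type

def filter_generated_rows(rows, schema):
    """Split generated rows into schema-conforming and rejected sets."""
    accepted = [r for r in rows if matches_schema(r, schema)]
    rejected = [r for r in rows if not matches_schema(r, schema)]
    return accepted, rejected

# Example: a table with a nested 'address' STRUCT and an ARRAY of tags.
SCHEMA = {"id": int, "address": {"city": str, "zip": str}, "tags": [str]}
```

Only the structural half of the paper's validation is shown; checking semantic integrity against the SQL test targets would require the queries themselves.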
Safe, Untrusted, "Proof-Carrying" AI Agents: toward the agentic lakehouse
Tagliabue, Jacopo, Greco, Ciro
Starting from this prototype, we conclude by outlining practical next steps for a full agentic lakehouse. The paper is organized as follows. After reviewing agent-friendly abstractions (Section II), we address key safety objections for high-stakes scenarios (Section III). Once safety is established, we describe a ReAct [12] loop built on these abstractions (Section IV). We put forward our working prototype as a feasibility demonstration of safe-by-design data agents, not as a full-fledged experimental benchmark. We believe that sharing working code is of great value to the community, especially in times of quickly shifting mental models. However, it is important to remember that our fundamental insights - programmability and safety - can be replicated independently of the chosen APIs. For these reasons, we believe our paper to be valuable to a wide range of practitioners: on one hand, those looking for a new mental map of this uncharted territory; on the other, those looking to be inspired by tinkering with existing implementations and inspecting systems working at scale.
- North America > United States (0.05)
- Europe > Germany (0.04)
- North America > United States > California (0.04)
- North America > Puerto Rico (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (2 more...)
- Research Report > Experimental Study (1.00)
- Research Report > Promising Solution (0.67)
- Banking & Finance (1.00)
- Health & Medicine (0.93)
Backprompting: Leveraging Synthetic Production Data for Health Advice Guardrails
Cheng, Kellen Tan, Gentile, Anna Lisa, DeLuca, Chad, Ren, Guang-Jie
The pervasiveness of large language models (LLMs) in enterprise settings has also brought forth significant risks associated with their usage. Guardrails technologies aim to mitigate this risk by filtering LLMs' input/output text through various detectors. However, developing and maintaining robust detectors faces many challenges, one of which is the difficulty of acquiring production-quality labeled data on real LLM outputs prior to deployment. In this work, we propose backprompting, a simple yet intuitive solution for generating production-like labeled data for health advice guardrails development. Furthermore, we pair our backprompting method with a sparse human-in-the-loop clustering technique to label the generated data. Our aim is to construct a parallel corpus roughly representative of the original dataset yet resembling real LLM output. We then infuse existing datasets with our synthetic examples to produce robust training data for our detector. We test our technique on one of the most difficult and nuanced guardrail tasks, the identification of health advice in LLM output, and demonstrate improvement over other solutions. Our detector outperforms GPT-4o by up to 3.73%, despite having 400x fewer parameters.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Dominican Republic (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (9 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
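The "sparse human-in-the-loop clustering" idea can be sketched as follows: cluster generated examples by embedding similarity, have a human label one exemplar per cluster, and propagate that label to the rest. The greedy clustering rule and the similarity threshold are assumptions for illustration, not the paper's exact algorithm.

```python
# Sketch of sparse human-in-the-loop labeling: one human label per cluster,
# propagated to all members. Greedy exemplar clustering is an assumed stand-in
# for whatever clustering the authors actually use.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def greedy_cluster(embeddings, threshold=0.9):
    """Assign each embedding to the first cluster whose exemplar is similar enough."""
    exemplars, assignment = [], []
    for emb in embeddings:
        for cid, ex in enumerate(exemplars):
            if cosine(emb, ex) >= threshold:
                assignment.append(cid)
                break
        else:
            exemplars.append(emb)              # new cluster; exemplar = this point
            assignment.append(len(exemplars) - 1)
    return exemplars, assignment

def propagate_labels(assignment, human_labels):
    """human_labels holds one label per exemplar -- the sparse human effort."""
    return [human_labels[cid] for cid in assignment]
```

The human cost scales with the number of clusters rather than the number of generated examples, which is the point of the technique.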
Session-Level Dynamic Ad Load Optimization using Offline Robust Reinforcement Learning
Liu, Tao, Xu, Qi, Shi, Wei, Hua, Zhigang, Yang, Shuang
Session-level dynamic ad load optimization aims to personalize the density and types of delivered advertisements in real time during a user's online session by dynamically balancing user experience quality and ad monetization. Traditional causal learning-based approaches struggle with key technical challenges, especially in handling confounding bias and distribution shifts. In this paper, we develop an offline deep Q-network (DQN)-based framework that effectively mitigates confounding bias in dynamic systems and demonstrates more than 80% offline gains compared to the best causal learning-based production baseline. Moreover, to improve the framework's robustness against unanticipated distribution shifts, we further enhance our framework with a novel offline robust dueling DQN approach. This approach achieves more stable rewards on multiple OpenAI-Gym datasets as perturbations increase, and provides an additional 5% offline gains on real-world ad delivery data. Deployed across multiple production systems, our approach has achieved outsized topline gains. Post-launch online A/B tests have shown double-digit improvements in the engagement-ad score trade-off efficiency, significantly enhancing our platform's capability to serve both consumers and advertisers.
- North America > Canada > Ontario > Toronto (0.05)
- North America > United States > California > Santa Clara County > Sunnyvale (0.05)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > New York > New York County > New York City (0.04)
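The dueling DQN variant mentioned above rests on a standard decomposition of the Q-function into a state value and per-action advantages. A minimal sketch of that combination step (generic dueling formulation; the paper's offline robust variant adds further machinery on top):

```python
# Dueling Q-value decomposition: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
# Subtracting the mean advantage keeps V and A identifiable.
def dueling_q_values(state_value, advantages):
    """Combine a scalar state value with per-action advantages."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]

def greedy_action(q_values):
    """Pick the action with the highest combined Q-value (e.g. an ad-load level)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In the session-level setting, each action would correspond to a choice of ad density or type for the next slot, but that mapping is an interpretation of the abstract, not something it states explicitly.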
Sequential Harmful Shift Detection Without Labels
Amoukou, Salim I., Bewley, Tom, Mishra, Saumitra, Lecue, Freddy, Magazzeni, Daniele, Veloso, Manuela
We introduce a novel approach for detecting distribution shifts that negatively impact the performance of machine learning models in continuous production environments, which requires no access to ground truth data labels. It builds upon the work of Podkopaev and Ramdas [2022], who address scenarios where labels are available for tracking model errors over time. Our solution extends this framework to work in the absence of labels, by employing a proxy for the true error. This proxy is derived using the predictions of a trained error estimator. Experiments show that our method has high power and false alarm control under various distribution shifts, including covariate and label shifts and natural shifts over geography and time.
- North America > United States > California (0.04)
- North America > Puerto Rico (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (2 more...)
- Research Report > Experimental Study (1.00)
- Research Report > Promising Solution (0.87)
- Banking & Finance (1.00)
- Health & Medicine (0.93)
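The label-free monitoring scheme above can be sketched in simplified form: feed the error estimator's proxy errors into a running mean and alarm when a concentration bound says the production error credibly exceeds the source error plus a tolerance. The Hoeffding-style bound below is an assumed simplification; the actual method builds on the confidence sequences of Podkopaev and Ramdas [2022].

```python
# Simplified label-free sequential shift detector: the true per-sample error
# is replaced by a trained error estimator's proxy, and an alarm fires when a
# Hoeffding-style lower confidence bound on the mean proxy error exceeds
# source_error + tolerance. A sketch, not the paper's exact test.
import math

class ProxyShiftDetector:
    def __init__(self, source_error, tolerance=0.05, delta=0.01):
        self.threshold = source_error + tolerance
        self.delta = delta                  # confidence level for the bound
        self.n = 0
        self.total = 0.0

    def update(self, proxy_error):
        """Feed one proxy error in [0, 1]; return True if a harmful shift is flagged."""
        self.n += 1
        self.total += proxy_error
        mean = self.total / self.n
        # Hoeffding deviation radius for [0, 1]-bounded variables
        radius = math.sqrt(math.log(1 / self.delta) / (2 * self.n))
        return mean - radius > self.threshold
```

A fixed Hoeffding radius at each step is not a valid anytime bound under continuous monitoring, which is precisely why the real method uses confidence sequences; the sketch only conveys the shape of the test.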
Generative AI-driven forecasting of oil production
Gandhi, Yash, Zheng, Kexin, Jha, Birendra, Nomura, Ken-ichi, Nakano, Aiichiro, Vashishta, Priya, Kalia, Rajiv K.
Forecasting oil production from oilfields with multiple wells is an important problem in petroleum and geothermal energy extraction, as well as in energy storage technologies. The accuracy of oil forecasts is a critical determinant of economic projections, hydrocarbon reserve estimation, construction of fluid processing facilities, and energy price fluctuations. Leveraging generative AI techniques, we model time series forecasting of oil and water production across four multi-well sites spanning four decades. Our goal is to effectively model uncertainties and make precise forecasts to inform decision-making processes at the field scale. We utilize an autoregressive model known as TimeGrad and a variant of the transformer architecture named Informer, tailored specifically for forecasting long-sequence time series data. Predictions from both TimeGrad and Informer closely align with the ground truth data. Overall, Informer stands out, demonstrating greater efficiency than TimeGrad in forecasting oil production rates across all sites.
- Energy > Oil & Gas > Upstream (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
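The uncertainty modeling the abstract emphasizes follows the general pattern of probabilistic autoregressive forecasting: roll out many sampled trajectories, then summarize them as quantile bands. The sketch below uses a toy AR(1) sampler in place of a learned model like TimeGrad, purely to illustrate that pattern; every parameter is an assumption.

```python
# Generic probabilistic autoregressive forecasting: Monte Carlo rollouts from
# a (here: toy AR(1)) one-step sampler, summarized as per-step quantile bands.
import random

def sample_trajectory(last_value, steps, phi=0.9, sigma=0.1, rng=random):
    """One Monte Carlo rollout of a toy AR(1) process standing in for the model."""
    traj, x = [], last_value
    for _ in range(steps):
        x = phi * x + rng.gauss(0.0, sigma)
        traj.append(x)
    return traj

def forecast_quantiles(last_value, steps, n_samples=200, q=(0.1, 0.5, 0.9)):
    """Roll out n_samples trajectories and report (low, median, high) per step."""
    rng = random.Random(0)                  # fixed seed for reproducibility
    rollouts = [sample_trajectory(last_value, steps, rng=rng)
                for _ in range(n_samples)]
    bands = []
    for t in range(steps):
        vals = sorted(r[t] for r in rollouts)
        bands.append(tuple(vals[int(p * (n_samples - 1))] for p in q))
    return bands
```

In TimeGrad the one-step sampler is a learned denoising diffusion model conditioned on an RNN state rather than a fixed AR(1) rule; the rollout-and-summarize structure is the shared idea.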
Trimming the Risk: Towards Reliable Continuous Training for Deep Learning Inspection Systems
Abbassi, Altaf Allah, Braiek, Houssem Ben, Khomh, Foutse, Reid, Thomas
The industry increasingly relies on deep learning (DL) technology for manufacturing inspections, which are challenging to automate with rule-based machine vision algorithms. DL-powered inspection systems derive defect patterns from labeled images, combining human-like agility with the consistency of a computerized system. However, finite labeled datasets often fail to encompass all natural variations, necessitating Continuous Training (CT) to regularly adjust the models with recent data. Effective CT requires fresh labeled samples from the original distribution; otherwise, self-generated labels can lead to silent performance degradation. To mitigate this risk, we develop a robust CT-based maintenance approach that updates DL models using reliable data selections through a two-stage filtering process. The initial stage filters out low-confidence predictions, as the model inherently discredits them. The second stage uses variational auto-encoders and histograms to generate image embeddings that capture latent and pixel characteristics, then rejects inputs with substantially shifted embeddings as drifted data with erroneous overconfidence. The original DL model is then fine-tuned on the filtered inputs while validating on a mixture of recent production and original datasets. This strategy mitigates catastrophic forgetting and ensures the model adapts effectively to new operational conditions. Evaluations on industrial inspection systems for popsicle stick prints and glass bottles, using critical real-world datasets, showed that less than 9% of erroneously self-labeled data is retained after filtering and used for fine-tuning, improving model performance on production data by up to 14% without compromising its results on the original validation data.
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Health & Medicine (1.00)
- Education (0.68)
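The two-stage selection above can be sketched as a simple filter: drop low-confidence predictions first, then drop inputs whose embeddings sit far from the original training distribution. The thresholds, the centroid-distance drift test, and all names below are illustrative assumptions; the paper's second stage uses variational auto-encoder and histogram embeddings rather than a single centroid.

```python
# Sketch of two-stage data selection for continuous training:
#   stage 1 -- discard predictions the model itself discredits (low confidence);
#   stage 2 -- discard inputs whose embeddings are substantially shifted from
#              the original distribution (drift with erroneous overconfidence).
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_for_finetuning(samples, conf_min=0.9, drift_max=1.0):
    """samples: iterable of (input_id, confidence, embedding) triples."""
    kept = []
    for input_id, conf, emb in samples:
        if conf < conf_min:                          # stage 1: low confidence
            continue
        if euclidean(emb, ORIGINAL_CENTROID) > drift_max:
            continue                                 # stage 2: shifted embedding
        kept.append(input_id)
    return kept

ORIGINAL_CENTROID = (0.0, 0.0)   # placeholder for the real reference statistics
```

Only inputs surviving both stages would feed the fine-tuning step, which the paper additionally validates against a mix of recent production and original data to guard against catastrophic forgetting.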
Fighting crime with Transformers: Empirical analysis of address parsing methods in payment data
Hammami, Haitham, Baligand, Louis, Petrovski, Bojan
In the financial industry, identifying the location of parties involved in payments is a major challenge in the context of various regulatory requirements. For this purpose, address parsing entails extracting fields such as street, postal code, or country from free-text message attributes. While payment processing platforms are updating their standards to more structured formats such as SWIFT with ISO 20022, address parsing remains essential for a considerable volume of messages. With the emergence of Transformers and generative Large Language Models (LLMs), we explore the performance of state-of-the-art solutions under the constraint of processing a vast amount of daily data. This paper also aims to show the need for training robust models capable of dealing with real-world noisy transactional data. Our results suggest that a carefully fine-tuned Transformer model using early stopping significantly outperforms other approaches. Nevertheless, generative LLMs demonstrate strong zero-shot performance and warrant further investigation.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
- North America > Canada > Ontario > Toronto (0.04)
- Research Report > New Finding (0.68)
- Research Report > Promising Solution (0.48)
- Law (0.88)
- Banking & Finance (0.86)
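The early stopping mentioned in the address-parsing abstract is a standard fine-tuning recipe: halt training once validation loss stops improving for a fixed number of evaluations. A minimal, generic helper (not the authors' exact setup):

```python
# Generic early-stopping helper for fine-tuning (e.g. a Transformer token
# classifier): stop after `patience` consecutive evaluations without a new
# best validation loss.
class EarlyStopping:
    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_rounds = 0

    def step(self, val_loss):
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss            # new best: reset the counter
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1            # no improvement this round
        return self.bad_rounds >= self.patience
```

The checkpoint corresponding to `self.best` would typically be the one deployed, which is what makes early stopping act as implicit regularization on noisy transactional data.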