Finding AI's low-hanging fruit


We are excited to bring Transform 2022 back in-person July 19 and virtually July 20 - 28. Join AI and data leaders for insightful talks and exciting networking opportunities. Delivering AI solutions from the test bed to production environments will probably be the key focus for the enterprise over the next year or longer. But despite the pressure to keep up with the competition, organizations should be cautious not to push AI too far, too fast. This often leads to two key problems. First, it pushes inadequate solutions into environments where they are quickly overwhelmed, which leads to failure, disillusionment, and mistrust from the user base that ultimately inhibits adoption. The AI industry is not helping matters with its steady stream of promises that its solutions offer complete digital autonomy and transformative experiences.

How AI is making real contributions (right now) to business models


Stories of how AI will benefit the enterprise are a dime a dozen these days. Applications in sales, marketing, payroll, and a host of other areas are legion. But as of yet, there is precious little talk about how, exactly, organizations are faring with their AI projects. Are they really delivering on these promises, and are there concrete examples of AI at work that can be emulated elsewhere?

Ploomber vs Kubeflow: Making MLOps Easier


In this short article, I'll try to capture the main differences between the MLOps tools Ploomber and Kubeflow. We'll cover some background on what Ploomber and Kubeflow Pipelines are, and why we need these tools to make our lives easier. We'll then look at the differences in three main areas. Let's start with a short explanation of what a common data/ML workflow looks like, why we even need orchestration, and how it will help you perform your job better and faster. Usually, when an organization has data and is looking to produce insights or predictions from it (to drive a business outcome), it brings in data scientists or ML engineers to explore the data, prepare it for consumption, and generate a model. These three assignments can then be unified into a data pipeline with corresponding tasks: getting the data, cleaning it, and training a model.
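The three-stage workflow described above can be sketched in plain Python. This is only an illustration of the concept of a pipeline with ordered, artifact-passing tasks; the function names are hypothetical and are not part of the Ploomber or Kubeflow APIs.

```python
# A minimal sketch of the workflow: get the data, clean it, train a model.

def get_data():
    # In practice this would pull from a database or object store.
    return [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (None, 1.0)]

def clean(rows):
    # Drop incomplete records so downstream tasks only see valid data.
    return [(x, y) for x, y in rows if x is not None]

def train(rows):
    # Fit a trivial least-squares slope (y ≈ w * x) as a stand-in model.
    num = sum(x * y for x, y in rows)
    den = sum(x * x for x, y in rows)
    return num / den

# Orchestration, at its core, is ordering these tasks and passing
# each task's output along as the next task's input.
model = train(clean(get_data()))
```

Tools like Ploomber and Kubeflow take over exactly this ordering: they track the dependencies between tasks, cache intermediate artifacts, and rerun only what changed.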

The Pitfalls of Using AI as the Input of Another AI


In the previous article, I briefly mentioned how using AI models sequentially is a nightmare. Whenever one AI is used as input for another, the individual errors of each model quickly add up to unacceptable levels or, simply put, catastrophic failure. Moreover, as you add more nodes to the chain, the problem gets exponentially worse. In this article, I expand on the matter, explaining the intuition for why sequential models fail and how we can remedy some of these issues. The following discussion is of paramount interest to anyone developing complex AI pipelines, such as using object detection to find objects of interest and then applying another model to those objects.
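The compounding effect is easy to see with a back-of-the-envelope calculation: if each stage in the chain is independently correct with probability p, the whole chain succeeds with probability roughly p to the power n. This simplified independence assumption is mine, not the article's, but it captures the intuition.

```python
# Why chained models fail: per-stage accuracy compounds multiplicatively.

def chain_accuracy(p: float, n: int) -> float:
    """End-to-end success rate of n independent stages, each correct with probability p."""
    return p ** n

for n in (1, 2, 3, 5):
    print(n, round(chain_accuracy(0.90, n), 3))
# Even three 90%-accurate stages already drop below 73% end to end.
```

This is why adding one more "pretty good" model to a pipeline can quietly push the overall system past an acceptable failure rate.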

5 Things to Consider When Operationalizing Your Machine Learning


Operationalizing machine learning models requires a different process than creating them. To make this transition successfully, you need to consider five critical areas. When machine learning teams start out, most of their work is done in a laboratory mode: they work through the process in a very manual yet scientific manner. They iteratively develop valuable machine learning models by forming a hypothesis, testing the model to confirm it, and adjusting to improve model behavior.

Machine learning: The AIOps system Azure uses to make the cloud reliable


Cloud services change all the time, whether it's adding new features or fixing bugs and security vulnerabilities; that's one of the big advantages over on-prem software. But every change is also an opportunity to introduce the bugs and regressions that are the main causes of reliability issues and cloud downtime. To avoid such issues, Azure uses a safe deployment process that rolls out updates in phases, running them on progressively larger rings of infrastructure and using continuous, AI-powered monitoring to detect any issues that were missed during development and testing. When Microsoft launched its Chaos Studio service for testing how workloads cope with unexpected faults last year, Azure CTO Mark Russinovich explained the safe deployment process. "We go through a canary cluster as part of our safe deployment, which is an internal Azure region where we've got synthetic tests and we've got internal workloads that actually test services before they go out. This is the first production environment that the code for a new service update reaches, so we want to make sure that we can validate it and get a good sense for the quality of it before we move it out and actually have it touch customers."
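The ring-based rollout described above can be sketched as a simple gating loop: the update is promoted through progressively larger rings, and each promotion is gated on monitoring signals. The ring names and the `healthy` check here are illustrative assumptions, not Azure's actual internals.

```python
# Hedged sketch of ring-based safe deployment: promote an update ring
# by ring, halting (and rolling back) as soon as monitoring flags a problem.

RINGS = ["canary", "pilot", "medium", "broad"]

def deploy(update, healthy):
    """Promote `update` through RINGS; stop and roll back on a bad signal."""
    reached = []
    for ring in RINGS:
        reached.append(ring)           # roll the update out to this ring
        if not healthy(ring, update):  # monitoring detects a regression
            reached.pop()              # roll it back from this ring
            return reached             # halt the rollout here
    return reached

# An update that regresses in the "pilot" ring never reaches the
# larger, customer-facing rings.
result = deploy("v2.1", healthy=lambda ring, update: ring != "pilot")
print(result)  # ['canary']
```

The value of the pattern is that the blast radius of a bad change is bounded by the size of the ring where monitoring first catches it.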

Reducing cloud waste by optimizing Kubernetes with machine learning


Applications are proliferating, cloud complexity is exploding, and Kubernetes is prevailing as the foundation for application deployment in the cloud. That sounds like an optimization task ripe for machine learning, and StormForge is acting on that.

Getting AI from the lab to production


The enterprise is eager to push AI out of the lab and into production environments, where it will hopefully usher in a new era of productivity and profitability. But this is not as easy as it seems, because it turns out that AI tends to behave much differently in the test bed than it does in the real world. Getting over this hump between the lab and actual applications is quickly emerging as the next major objective in the race to deploy AI. Since intelligent technology requires a steady flow of reliable data to function properly, a controlled environment is not necessarily the proving ground for AI that it is for traditional software.

5 Data Science Trends in the Next 5 Years


This field is large enough that it's impossible to cover in depth everything that could happen to it in the coming five years. Important trends that I foresee but won't cover here are specific applications of data science in particular domains, the integration of low-code/no-code tools into the tech stack, and other narrowly focused insights. Instead, this is a focus on the general, broad themes of change I see coming to stay in the next half-decade. This isn't an exhaustive list, but it does cover a lot of the issues faced in practice today. The title of Data Scientist has been a big issue for many in the industry, mainly because of the ambiguity around what the role entails and what the company needs. Although I believe job descriptions have largely become clearer and more concise, job profiles are only just starting to become standardized.

Machine Learning Model Development and Model Operations: Principles and Practices - KDnuggets


The use of machine learning (ML) has increased substantially in enterprise data analytics as a way to extract valuable insights from business data. Hence, it is very important to have an ecosystem to build, test, deploy, and maintain enterprise-grade machine learning models in production environments. ML model development involves acquiring data from multiple trusted sources, processing it to make it suitable for building the model, choosing an algorithm, building the model, computing performance metrics, and selecting the best-performing model. Model maintenance plays a critical role once the model is deployed into production: it means keeping the model up to date and relevant as the source data changes, since there is a risk of the model becoming outdated over time.
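The "compute performance metrics and choose the best-performing model" step described above can be sketched with toy callables standing in for trained estimators. The data, candidate names, and the accuracy metric are illustrative assumptions, not a specific library's API.

```python
# Minimal sketch of model selection: score each candidate on a
# validation set and keep the one with the best metric.

def accuracy(model, data):
    """Fraction of (x, y) pairs where the model's prediction equals y."""
    return sum(model(x) == y for x, y in data) / len(data)

# Toy validation set: the label is 1 when x is positive.
val = [(-2, 0), (-1, 0), (1, 1), (3, 1)]

candidates = {
    "always_one": lambda x: 1,
    "threshold":  lambda x: 1 if x > 0 else 0,
}

scores = {name: accuracy(m, val) for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])  # threshold 1.0
```

In production, the same comparison would be rerun periodically against fresh data, which is exactly the maintenance loop the article points to: a model that won yesterday can lose once the source data drifts.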