ape
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Nouvelle-Aquitaine > Gironde > Bordeaux (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
LangMark: A Multilingual Dataset for Automatic Post-Editing
Velazquez, Diego, Grace, Mikaela, Karageorgos, Konstantinos, Carin, Lawrence, Schliem, Aaron, Zaikis, Dimitrios, Wechsler, Roger
Automatic post-editing (APE) aims to correct errors in machine-translated text, enhancing translation quality, while reducing the need for human intervention. Despite advances in neural machine translation (NMT), the development of effective APE systems has been hindered by the lack of large-scale multilingual datasets specifically tailored to NMT outputs. To address this gap, we present and release LangMark, a new human-annotated multilingual APE dataset for English translation to seven languages: Brazilian Portuguese, French, German, Italian, Japanese, Russian, and Spanish. The dataset has 206,983 triplets, with each triplet consisting of a source segment, its NMT output, and a human post-edited translation. Annotated by expert human linguists, our dataset offers both linguistic diversity and scale. Leveraging this dataset, we empirically show that Large Language Models (LLMs) with few-shot prompting can effectively perform APE, improving upon leading commercial and even proprietary machine translation systems. We believe that this new resource will facilitate the future development and evaluation of APE systems.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Italy > Tuscany > Florence (0.04)
- Europe > Belgium (0.04)
- (12 more...)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Auto-Prompt Ensemble for LLM Judge
Li, Jiajie, Zhang, Huayi, Lin, Peng, Xiong, Jinjun, Xu, Wei
We present a novel framework that improves the reliability of LLM judges by selectively augmenting LLM with auxiliary evaluation dimensions. Existing LLM judges often miss crucial evaluation dimensions because they fail to recognize the implicit standards underlying human assessments. To address this challenge, we propose the Auto-Prompt Ensemble (APE), an adaptive framework that automatically learns evaluation dimensions from its failure cases. APE incorporates a confidence-based ensemble mechanism to decide when to adopt the judgments from additional evaluation dimensions through a novel confidence estimation approach called Collective Confidence. Extensive experiments demonstrate that APE improves the reliability of LLM Judge across diverse standard benchmarks. For instance, APE enhances GPT-4o agreement rate on Reward Bench from 87.2% to 90.5% in the zero-shot setting. Overall, APE provides a principled approach for LLM Judge to leverage test-time computation, and bridge the evaluation gap between human and LLM judges.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Nouvelle-Aquitaine > Gironde > Bordeaux (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
Language Modeling with Learned Meta-Tokens
Shah, Alok N., Gupta, Khush, Ramji, Keshav, Chaudhari, Pratik
While modern Transformer-based language models (LMs) have achieved major success in multi-task generalization, they often struggle to capture long-range dependencies within their context window. This work introduces a novel approach using meta-tokens, special tokens injected during pre-training, along with a dedicated meta-attention mechanism to guide LMs to use these tokens. We pre-train a language model with a modified GPT-2 architecture equipped with meta-attention in addition to causal multi-head attention, and study the impact of these tokens on a suite of synthetic tasks. We find that data-efficient language model pre-training on fewer than 100B tokens utilizing meta-tokens and our meta-attention mechanism achieves strong performance on these tasks after fine-tuning. We suggest that these gains arise due to the meta-tokens sharpening the positional encoding. This enables them to operate as trainable, content-based landmarks, implicitly compressing preceding context and "caching" it in the meta-token. At inference-time, the meta-token points to relevant context, facilitating length generalization up to 2$\times$ its context window, even after extension with YaRN. We provide further evidence of these behaviors by visualizing model internals to study the residual stream, and assessing the compression quality by information-theoretic analysis on the rate-distortion tradeoff. Our findings suggest that pre-training LMs with meta-tokens offers a simple, data-efficient method to enhance long-context language modeling performance, while introducing new insights into the nature of their behavior towards length generalization.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > Pennsylvania (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
RadiomicsRetrieval: A Customizable Framework for Medical Image Retrieval Using Radiomics Features
Na, Inye, Rue, Nejung, Chung, Jiwon, Park, Hyunjin
Medical image retrieval is a valuable field for supporting clinical decision-making, yet current methods primarily support 2D images and require fully annotated queries, limiting clinical flexibility. To address this, we propose RadiomicsRetrieval, a 3D content-based retrieval framework bridging handcrafted radiomics descriptors with deep learning-based embeddings at the tumor level . Unlike existing 2D approaches, RadiomicsRetrieval fully exploits volumetric data to leverage richer spatial context in medical images. We employ a promptable segmentation model (e.g., SAM) to derive tumor-specific image embeddings, which are aligned with radiomics features extracted from the same tumor via contrastive learning. These representations are further enriched by anatomical positional embedding (APE). As a result, RadiomicsRe-trieval enables flexible querying based on shape, location, or partial feature sets. Extensive experiments on both lung CT and brain MRI public datasets demonstrate that radiomics features significantly enhance retrieval specificity, while APE provides global anatomical context essential for location-based searches. Notably, our framework requires only minimal user prompts (e.g., a single point), minimizing segmentation overhead and supporting diverse clinical scenarios. The capability to query using either image embeddings or selected radiomics attributes highlights its adaptability, potentially benefiting diagnosis, treatment planning, and research on large-scale medical imaging repositories.
- Asia > South Korea > Gyeonggi-do > Suwon (0.05)
- Europe > Netherlands > Drenthe > Assen (0.04)
- Europe > Hungary (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
APE: Selective Fine-tuning with Acceptance Criteria for Language Model Adaptation
Adapting large pre-trained language models to specific tasks requires balancing performance improvement with preservation of learned capabilities. Standard fine-tuning approaches optimize a single objective function through gradient descent, often leading to catastrophic forgetting [16] or instability in learned representations. Parameter-efficient methods like LoRA [11] constrain modifications to low-dimensional subspaces but limit adaptation scope. We propose Adjacent Possible Exploration (APE), a selective fine-tuning approach that explores multiple parameter modification directions while implementing acceptance criteria to maintain model stability. The method draws conceptual inspiration from evolutionary optimization principles, particularly the biological constraint that viable changes must preserve essential system properties while enabling incremental improvement. APE operates by generating multiple candidate parameter updates through fine-tuning on randomly sampled data subsets, then selecting only those updates that exceed a performance improvement threshold. This creates a filtered optimization process that systematically explores beneficial parameter modifications while rejecting changes that fall within noise levels or potentially destabilize learned representations. Our key contributions include: (1) A practical algorithm for selective fine-tuning that balances exploration and stability, (2) Empirical validation showing superior performance compared to standard adaptation methods, and (3) Analysis of why selective acceptance of parameter modifications leads to more robust model adaptation. 1 The approach demonstrates that systematic exploration of parameter space through filtered selection can achieve better adaptation results than unconstrained optimization, providing a principled framework for controlled model modification that maintains stability while enabling significant performance improvements.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
Yang, Xinyu, Chen, Tianqi, Chen, Beidi
Recent advances in context-augmented generation (CAG) techniques, particularly retrieval-augmented generation (RAG) (Gupta et al., 2024; Gao et al., 2023) and in-context learning (ICL) (Dong et al., 2022; Wei et al., 2022), have been widely adopted in large language models (LLMs) (Dubey et al., 2024; Achiam et al., 2023), improving their ability to generalize to unseen tasks with contextual information, as demonstrated in Figure 1 (top). These techniques employ a sequential encoding process to ground LLM inputs with knowledge from external sources: concatenating the retrieved texts into one sequence, and encoding the sequence into key-value (KV) states as the context for subsequent queries. While this new, significantly longer input improves performance, the increased latency in context prefilling becomes a bottleneck in tasks that require long inputs but generate short outputs (Bai et al., 2023; Agarwal et al., 2024; Jiang et al., 2024b). For example, prefilling a 128K context takes 17 seconds, whereas generating 256 tokens requires only 6 seconds. This discrepancy leaves significant room to improve the practical efficiency of CAG systems in real-world deployments (Liu, 2022; Chase, 2022).
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Native Fortran Implementation of TensorFlow-Trained Deep and Bayesian Neural Networks
Furlong, Aidan, Zhao, Xingang, Salko, Bob, Wu, Xu
Over the past decade, the investigation of machine learning (ML) within the field of nuclear engineering has grown significantly. With many approaches reaching maturity, the next phase of investigation will determine the feasibility and usefulness of ML model implementation in a production setting. Several of the codes used for reactor design and assessment are primarily written in the Fortran language, which is not immediately compatible with TensorFlow-trained ML models. This study presents a framework for implementing deep neural networks (DNNs) and Bayesian neural networks (BNNs) in Fortran, allowing for native execution without TensorFlow's C API, Python runtime, or ONNX conversion. Designed for ease of use and computational efficiency, the framework can be implemented in any Fortran code, supporting iterative solvers and UQ via ensembles or BNNs. Verification was performed using a two-input, one-output test case composed of a noisy sinusoid to compare Fortran-based predictions to those from TensorFlow. The DNN predictions showed negligible differences and achieved a 19.6x speedup, whereas the BNN predictions exhibited minor disagreement, plausibly due to differences in random number generation. An 8.0x speedup was noted for BNN inference. The approach was then further verified on a nuclear-relevant problem predicting critical heat flux (CHF), which demonstrated similar behavior along with significant computational gains. Discussion regarding the framework's successful integration into the CTF thermal-hydraulics code is also included, outlining its practical usefulness. Overall, this framework was shown to be effective at implementing both DNN and BNN model inference within Fortran, allowing for the continued study of ML-based methods in real-world nuclear applications.
- North America > United States > Tennessee > Knox County > Knoxville (0.14)
- North America > United States > Tennessee > Anderson County > Oak Ridge (0.05)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.05)
- (3 more...)
- Energy (1.00)
- Government > Regional Government > North America Government > United States Government (0.95)
A Comprehensive Forecasting Framework based on Multi-Stage Hierarchical Forecasting Reconciliation and Adjustment
Yang, Zhengchao, Ghosh, Mithun, Saha, Anish, Xu, Dong, Shmakov, Konstantin, Lee, Kuang-chih
Ads demand forecasting for Walmart's ad products plays a critical role in enabling effective resource planning, allocation, and management of ads performance. In this paper, we introduce a comprehensive demand forecasting system that tackles hierarchical time series forecasting in business settings. Though traditional hierarchical reconciliation methods ensure forecasting coherence, they often trade off accuracy for coherence especially at lower levels and fail to capture the seasonality unique to each time-series in the hierarchy. Thus, we propose a novel framework "Multi-Stage Hierarchical Forecasting Reconciliation and Adjustment (Multi-Stage HiFoReAd)" to address the challenges of preserving seasonality, ensuring coherence, and improving accuracy. Our system first utilizes diverse models, ensembled through Bayesian Optimization (BO), achieving base forecasts. The generated base forecasts are then passed into the Multi-Stage HiFoReAd framework. The initial stage refines the hierarchy using Top-Down forecasts and "harmonic alignment." The second stage aligns the higher levels' forecasts using MinTrace algorithm, following which the last two levels undergo "harmonic alignment" and "stratified scaling", to eventually achieve accurate and coherent forecasts across the whole hierarchy. Our experiments on Walmart's internal Ads-demand dataset and 3 other public datasets, each with 4 hierarchical levels, demonstrate that the average Absolute Percentage Error from the cross-validation sets improve from 3% to 40% across levels against BO-ensemble of models (LGBM, MSTL+ETS, Prophet) as well as from 1.2% to 92.9% against State-Of-The-Art models. In addition, the forecasts at all hierarchical levels are proved to be coherent. The proposed framework has been deployed and leveraged by Walmart's ads, sales and operations teams to track future demands, make informed decisions and plan resources.
- North America > United States > California > Santa Clara County > Sunnyvale (0.05)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- Oceania > Australia > Queensland (0.04)
- (3 more...)