Goto

Collaborating Authors

 Fukuoka



Language Model Tokenizers Introduce Unfairness Between Languages

Neural Information Processing Systems

Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tok-enization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.




Regime-Adaptive Bayesian Optimization via Dirichlet Process Mixtures of Gaussian Processes

Zhang, Yan, Liu, Xuefeng, Chen, Sipeng, Ranftl, Sascha, Liu, Chong, Li, Shibo

arXiv.org Machine Learning

Standard Bayesian Optimization (BO) assumes uniform smoothness across the search space an assumption violated in multi-regime problems such as molecular conformation search through distinct energy basins or drug discovery across heterogeneous molecular scaffolds. A single GP either oversmooths sharp transitions or hallucinates noise in smooth regions, yielding miscalibrated uncertainty. We propose RAMBO, a Dirichlet Process Mixture of Gaussian Processes that automatically discovers latent regimes during optimization, each modeled by an independent GP with locally-optimized hyperparameters. We derive collapsed Gibbs sampling that analytically marginalizes latent functions for efficient inference, and introduce adaptive concentration parameter scheduling for coarse-to-fine regime discovery. Our acquisition functions decompose uncertainty into intra-regime and inter-regime components. Experiments on synthetic benchmarks and real-world applications, including molecular conformer optimization, virtual screening for drug discovery, and fusion reactor design, demonstrate consistent improvements over state-of-the-art baselines on multi-regime objectives.


Identification and Estimation under Multiple Versions of Treatment: Mixture-of-Experts Approach

Yoshikawa, Kohei, Kawano, Shuichi

arXiv.org Machine Learning

Identification and Estimation under Multiple Versions of Treatment: Mixture-of-Experts Approach Kohei Y oshikawa Shuichi Kawano January 5, 2026 Abstract The Stable Unit Treatment Value Assumption (SUTV A) includes the condition that there are no multiple versions of treatment in causal inference. Though we could not control the implementation of treatment in observational studies, multiple versions may exist in the treatment. It has been pointed out that ignoring such multiple versions of treatment can lead to biased estimates of causal effects, but a causal inference framework that explicitly deals with the unbiased identification and estimation of version-specific causal effects has not been fully developed yet. Thus, obtaining a deeper understanding for mechanisms of the complex treatments is difficult. In this paper, we introduce the Mixture-of-Experts framework into causal inference and develop a methodology for estimating the causal effects of latent versions. This approach enables explicit estimation of version-specific causal effects even if the versions are not observed. Numerical experiments demonstrate the effectiveness of the proposed method. Keywords causal inference multiple versions of treatment compound treatments mixture-of-experts EM algorithm 1 Introduction In the theory of causal inference, a fundamental starting point is the potential outcomes framework since Rubin (1980), whose core assumption is the Stable Unit Treatment Value Assumption (SUTV A).


Data-Driven Global Sensitivity Analysis for Engineering Design Based on Individual Conditional Expectations

Palar, Pramudita Satria, Saves, Paul, Regis, Rommel G., Shimoyama, Koji, Obayashi, Shigeru, Verstaevel, Nicolas, Morlier, Joseph

arXiv.org Machine Learning

Explainable machine learning techniques have gained increasing attention in engineering applications, especially in aerospace design and analysis, where understanding how input variables influence data-driven models is essential. Partial Dependence Plots (PDPs) are widely used for interpreting black-box models by showing the average effect of an input variable on the prediction. However, their global sensitivity metric can be misleading when strong interactions are present, as averaging tends to obscure interaction effects. To address this limitation, we propose a global sensitivity metric based on Individual Conditional Expectation (ICE) curves. The method computes the expected feature importance across ICE curves, along with their standard deviation, to more effectively capture the influence of interactions. We provide a mathematical proof demonstrating that the PDP-based sensitivity is a lower bound of the proposed ICE-based metric under truncated orthogonal polynomial expansion. In addition, we introduce an ICE-based correlation value to quantify how interactions modify the relationship between inputs and the output. Comparative evaluations were performed on three cases: a 5-variable analytical function, a 5-variable wind-turbine fatigue problem, and a 9-variable airfoil aerodynamics case, where ICE-based sensitivity was benchmarked against PDP, SHapley Additive exPlanations (SHAP), and Sobol' indices. The results show that ICE-based feature importance provides richer insights than the traditional PDP-based approach, while visual interpretations from PDP, ICE, and SHAP complement one another by offering multiple perspectives.


Moving object detection from multi-depth images with an attention-enhanced CNN

Shibukawa, Masato, Yoshida, Fumi, Yanagisawa, Toshifumi, Ito, Takashi, Kurosaki, Hirohisa, Yoshikawa, Makoto, Kamiya, Kohki, Jiang, Ji-an, Fraser, Wesley, Kavelaars, JJ, Benecchi, Susan, Verbiscer, Anne, Hatakeyama, Akira, O, Hosei, Ozaki, Naoya

arXiv.org Artificial Intelligence

One of the greatest challenges for detecting moving objects in the solar system from wide-field survey data is determining whether a signal indicates a true object or is due to some other source, like noise. Object verification has relied heavily on human eyes, which usually results in significant labor costs. In order to address this limitation and reduce the reliance on manual intervention, we propose a multi-input convolutional neural network integrated with a convolutional block attention module. This method is specifically tailored to enhance the moving object detection system that we have developed and used previously. The current method introduces two innovations. This first one is a multi-input architecture that processes multiple stacked images simultaneously. The second is the incorporation of the convolutional block attention module which enables the model to focus on essential features in both spatial and channel dimensions. These advancements facilitate efficient learning from multiple inputs, leading to more robust detection of moving objects. The performance of the model is evaluated on a dataset consisting of approximately 2,000 observational images. We achieved an accuracy of nearly 99% with AUC (an Area Under the Curve) of >0.99. These metrics indicate that the proposed model achieves excellent classification performance. By adjusting the threshold for object detection, the new model reduces the human workload by more than 99% compared to manual verification.


When LLMs Can't Help: Real-World Evaluation of LLMs in Nutrition

Li, Karen Jia-Hui, Balloccu, Simone, Dusek, Ondrej, Reiter, Ehud

arXiv.org Artificial Intelligence

The increasing trust in large language models (LLMs), especially in the form of chatbots, is often undermined by the lack of their extrinsic evaluation. This holds particularly true in nutrition, where randomised controlled trials (RCTs) are the gold standard, and experts demand them for evidence-based deployment. LLMs have shown promising results in this field, but these are limited to intrinsic setups. We address this gap by running the first RCT involving LLMs for nutrition. We augment a rule-based chatbot with two LLM-based features: (1) message rephrasing for conversational variety and engagement, and (2) nutritional counselling through a fine-tuned model. In our seven-week RCT (n=81), we compare chatbot variants with and without LLM integration. We measure effects on dietary outcome, emotional well-being, and engagement. Despite our LLM-based features performing well in intrinsic evaluation, we find that they did not yield consistent benefits in real-world deployment. These results highlight critical gaps between intrinsic evaluations and real-world impact, emphasising the need for interdisciplinary, human-centred approaches.\footnote{We provide all of our code and results at: \\ \href{https://github.com/saeshyra/diet-chatbot-trial}{https://github.com/saeshyra/diet-chatbot-trial}}


Evaluating Magic Leap 2 Tool Tracking for AR Sensor Guidance in Industrial Inspections

Masuhr, Christian, Koch, Julian, Schüppstuhl, Thorsten

arXiv.org Artificial Intelligence

Rigorous evaluation of commercial Augmented Reality (AR) hardware is crucial, yet public benchmarks for tool tracking on modern Head-Mounted Displays (HMDs) are limited. This paper addresses this gap by systematically assessing the Magic Leap 2 (ML2) controllers tracking performance. Using a robotic arm for repeatable motion (EN ISO 9283) and an optical tracking system as ground truth, our protocol evaluates static and dynamic performance under various conditions, including realistic paths from a hydrogen leak inspection use case. The results provide a quantitative baseline of the ML2 controller's accuracy and repeatability and present a robust, transferable evaluation methodology. The findings provide a basis to assess the controllers suitability for the inspection use case and similar industrial sensor-based AR guidance tasks.