Goto

Collaborating Authors

 Chuangsuwanich, Ekapol


Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments

arXiv.org Artificial Intelligence

Multi-step reasoning is essential for large language models (LLMs), yet multilingual performance remains challenging. While Chain-of-Thought (CoT) prompting improves reasoning, it struggles with non-English languages due to the entanglement of reasoning and execution. Program-of-Thought (PoT) prompting separates reasoning from execution, offering a promising alternative but shifting the challenge to generating programs from non-English questions. We propose a framework to evaluate PoT by separating multilingual reasoning from code execution to examine (i) the impact of fine-tuning on question-reasoning alignment and (ii) how reasoning quality affects answer correctness. Our findings demonstrate that PoT fine-tuning substantially enhances multilingual reasoning, outperforming CoT fine-tuned models. We further demonstrate a strong correlation between reasoning quality (measured through code quality) and answer accuracy, highlighting its potential as a test-time performance improvement heuristic.


Mitigating Language Bias in Cross-Lingual Job Retrieval: A Recruitment Platform Perspective

arXiv.org Artificial Intelligence

Understanding the textual components of resumes and job postings is critical for improving job-matching accuracy and optimizing job search systems in online recruitment platforms. However, existing works primarily focus on analyzing individual components within this information, requiring multiple specialized tools to analyze each aspect. Such disjointed methods could potentially hinder overall generalizability in recruitment-related text processing. Therefore, we propose a unified sentence encoder that utilized multi-task dual-encoder framework for jointly learning multiple component into the unified sentence encoder. The results show that our method outperforms other state-of-the-art models, despite its smaller model size. Moreover, we propose a novel metric, Language Bias Kullback-Leibler Divergence (LBKL), to evaluate language bias in the encoder, demonstrating significant bias reduction and superior cross-lingual performance.


Learning Job Title Representation from Job Description Aggregation Network

arXiv.org Artificial Intelligence

Learning job title representation is a vital process for developing automatic human resource tools. To do so, existing methods primarily rely on learning the title representation through skills extracted from the job description, neglecting the rich and diverse content within. Thus, we propose an alternative framework for learning job titles through their respective job description (JD) and utilize a Job Description Aggregator component to handle the lengthy description and bidirectional contrastive loss to account for the bidirectional relationship between the job title and its description. We evaluated the performance of our method on both in-domain and out-of-domain settings, achieving a superior performance over the skill-based approach.


Thunder : Unified Regression-Diffusion Speech Enhancement with a Single Reverse Step using Brownian Bridge

arXiv.org Artificial Intelligence

Diffusion-based speech enhancement has shown promising results, but can suffer from a slower inference time. Initializing the diffusion process with the enhanced audio generated by a regression-based model can be used to reduce the computational steps required. However, these approaches often necessitate a regression model, further increasing the system's complexity. We propose Thunder, a unified regression-diffusion model that utilizes the Brownian bridge process which can allow the model to act in both modes. The regression mode can be accessed by setting the diffusion time step closed to 1. However, the standard score-based diffusion modeling does not perform well in this setup due to gradient instability. To mitigate this problem, we modify the diffusion model to predict the clean speech instead of the score function, achieving competitive performance with a more compact model size and fewer reverse steps.


MrRank: Improving Question Answering Retrieval System through Multi-Result Ranking Model

arXiv.org Artificial Intelligence

Large Language Models (LLMs) often struggle with hallucinations and outdated information. To address this, Information Retrieval (IR) systems can be employed to augment LLMs with up-to-date knowledge. However, existing IR techniques contain deficiencies, posing a performance bottleneck. Given the extensive array of IR systems, combining diverse approaches presents a viable strategy. Nevertheless, prior attempts have yielded restricted efficacy. In this work, we propose an approach that leverages learning-to-rank techniques to combine heterogeneous IR systems. We demonstrate the method on two Retrieval Question Answering (ReQA) tasks. Our empirical findings exhibit a significant performance enhancement, outperforming previous approaches and achieving state-of-the-art results on ReQA SQuAD.


WangchanLion and WangchanX MRC Eval

arXiv.org Artificial Intelligence

This technical report describes the development of WangchanLion, an instruction fine-tuned model focusing on Machine Reading Comprehension (MRC) in the Thai language. Our model is based on SEA-LION and a collection of instruction following datasets. To promote open research and reproducibility, we publicly release all training data, code, and the final model weights under the Apache-2 license. To assess the contextual understanding capability, we conducted extensive experimental studies using two Thai MRC datasets, XQuAD and Iapp_wiki_qa_squad. Experimental results demonstrate the model's ability to comprehend the context and produce an answer faithful to the reference one in 0-shot and 1-shot settings. In addition, our evaluation goes beyond the traditional MRC. We propose a new evaluation scheme assessing the answer's correctness, helpfulness, conciseness, and contextuality. Our code is available publicly at https://github.com/vistec-AI/WangchanLion.


An Efficient Self-Supervised Cross-View Training For Sentence Embedding

arXiv.org Artificial Intelligence

Self-supervised sentence representation learning is the task of constructing an embedding space for sentences without relying on human annotation efforts. One straightforward approach is to finetune a pretrained language model (PLM) with a representation learning method such as contrastive learning. While this approach achieves impressive performance on larger PLMs, the performance rapidly degrades as the number of parameters decreases. In this paper, we propose a framework called Self-supervised Cross-View Training (SCT) to narrow the performance gap between large and small PLMs. To evaluate the effectiveness of SCT, we compare it to 5 baseline and state-of-the-art competitors on seven Semantic Textual Similarity (STS) benchmarks using 5 PLMs with the number of parameters ranging from 4M to 340M. The experimental results show that STC outperforms the competitors for PLMs with less than 100M parameters in 18 of 21 cases.


Typo-Robust Representation Learning for Dense Retrieval

arXiv.org Artificial Intelligence

Dense retrieval is a basic building block of information retrieval applications. One of the main challenges of dense retrieval in real-world settings is the handling of queries containing misspelled words. A popular approach for handling misspelled queries is minimizing the representations discrepancy between misspelled queries and their pristine ones. Unlike the existing approaches, which only focus on the alignment between misspelled and pristine queries, our method also improves the contrast between each misspelled query and its surrounding queries. To assess the effectiveness of our proposed method, we compare it against the existing competitors using two benchmark datasets and two base encoders. Our method outperforms the competitors in all cases with misspelled queries. Our code and models are available at https://github. com/panuthept/DST-DenseRetrieval.


A comparative study for interpreting deep learning prediction of the Parkinson's disease diagnosis from SPECT imaging

arXiv.org Machine Learning

The application of deep learning to single-photon emission computed tomography (SPECT) imaging in Parkinson's disease shows effectively high diagnosis accuracy. However, difficulties in model interpretation were occurred due to the complexity of the deep learning model. Although several interpretation methods were created to show the attention map that contains important features of the input data, it is still uncertain whether these methods can be applied in PD diagnosis. Four different models of the deep learning approach based on 3-dimensional convolution neural network (3D-CNN) of well-established architectures have been trained with an accuracy up to 95-96% in classification performance. These four models have been used as the comparative study for well-known interpretation methods. Generally, radiologists interpret SPECT images by confirming the shape of the I123-Ioflupane uptake in the striatal nuclei. To evaluate the interpretation performance, the segmented striatal nuclei of SPECT images are chosen as the ground truth. Results suggest that guided backpropagation and SHAP which were developed recently, provided the best interpretation performance. Guided backpropagation has the best performance to generate the attention map that focuses on the location of striatal nuclei. On the other hand, SHAP surpasses other methods in suggesting the change of the striatal nucleus uptake shape from healthy to PD subjects. Results from both methods confirm that 3D-CNN focuses on the striatal nuclei in the same way as the radiologist, and both methods should be suggested to increase the credibility of the model.