AITopics | Manakul, Potsawee

Collaborating Authors

Manakul, Potsawee

Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments

Payoungkhamdee, Patomporn, Tuchinda, Pume, Baek, Jinheon, Cahyawijaya, Samuel, Udomcharoenchaikit, Can, Manakul, Potsawee, Limkonchotiwat, Peerat, Chuangsuwanich, Ekapol, Nutanong, Sarana

arXiv.org Artificial Intelligence, Feb-25-2025

Multi-step reasoning is essential for large language models (LLMs), yet multilingual performance remains challenging. While Chain-of-Thought (CoT) prompting improves reasoning, it struggles with non-English languages due to the entanglement of reasoning and execution. Program-of-Thought (PoT) prompting separates reasoning from execution, offering a promising alternative but shifting the challenge to generating programs from non-English questions. We propose a framework to evaluate PoT by separating multilingual reasoning from code execution to examine (i) the impact of fine-tuning on question-reasoning alignment and (ii) how reasoning quality affects answer correctness. Our findings demonstrate that PoT fine-tuning substantially enhances multilingual reasoning, outperforming CoT fine-tuned models. We further demonstrate a strong correlation between reasoning quality (measured through code quality) and answer accuracy, highlighting its potential as a test-time performance improvement heuristic.
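The separation of reasoning from execution described in this abstract can be illustrated with a minimal sketch. It assumes the model's output is a short Python program that binds its result to a variable named `answer`; that convention, the example program, and the helper `execute_program` are hypothetical and are not taken from the paper.

```python
def execute_program(program: str) -> object:
    """Run a generated program and return the value it binds to `answer`.

    Note: emptying __builtins__ only limits what the snippet can call;
    it is NOT a security sandbox and must not be used on untrusted code.
    """
    namespace: dict = {}
    exec(program, {"__builtins__": {}}, namespace)
    return namespace.get("answer")


# Hypothetical program a model might emit for a word problem
# (the reasoning step); the interpreter does the arithmetic
# (the execution step).
generated_program = """
apples_per_basket = 12
baskets = 5
eaten = 7
answer = apples_per_basket * baskets - eaten
"""

result = execute_program(generated_program)
print(result)
```

Because the arithmetic is delegated to the interpreter, an error in the final answer can be traced to the generated program rather than to the calculation itself.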

artificial intelligence, large language model, natural language, (18 more...)

arXiv:2502.17956

Country:

North America > United States (0.28)
North America > Mexico > Mexico City (0.14)
Europe > Middle East > Malta (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.33)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Mind the Gap! Static and Interactive Evaluations of Large Audio Models

Li, Minzhi, Held, William Barr, Ryan, Michael J, Pipatanakul, Kunat, Manakul, Potsawee, Zhu, Hao, Yang, Diyi

arXiv.org Artificial Intelligence, Feb-21-2025

As AI chatbots become ubiquitous, voice interaction presents a compelling way to enable rapid, high-bandwidth communication for both semantic and social signals. This has driven research into Large Audio Models (LAMs) to power voice-native experiences. However, aligning LAM development with user goals requires a clear understanding of user needs and preferences to establish reliable progress metrics. This study addresses these challenges by introducing an interactive approach to evaluate LAMs and collecting 7,500 LAM interactions from 484 participants. Through topic modeling of user queries, we identify primary use cases for audio interfaces. We then analyze user preference rankings and qualitative feedback to determine which models best align with user needs. Finally, we evaluate how static benchmarks predict interactive performance: our analysis reveals no individual benchmark strongly correlates with interactive results ($\tau \leq 0.33$ for all benchmarks). While combining multiple coarse-grained features yields modest predictive power ($R^2 = 0.30$), only two out of twenty datasets on spoken question answering and age prediction show significantly positive correlations. This suggests a clear need to develop LAM evaluations that better correlate with user preferences.
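The kind of comparison reported here, a rank correlation between static benchmark scores and interactive results, can be sketched as follows. This assumes $\tau$ denotes Kendall's rank correlation, which the abstract does not state explicitly; the tau-a variant is used for simplicity, and all scores below are made-up placeholders rather than data from the study.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-model scores: one static benchmark vs. interactive preference.
static_scores = [0.62, 0.71, 0.55, 0.80]
interactive_scores = [0.40, 0.35, 0.30, 0.50]

tau = kendall_tau(static_scores, interactive_scores)
print(f"tau = {tau:.3f}")
```

A value near 1 would mean the benchmark orders models the same way users do; values near 0 indicate that the benchmark carries little information about interactive preference.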

artificial intelligence, arxiv preprint arxiv, natural language, (15 more...)

arXiv:2502.15919

Country:

North America > United States (0.14)
Europe > United Kingdom > England (0.14)
Europe > Italy (0.14)
Asia > China (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)

Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging - An Open Recipe

Pipatanakul, Kunat, Taveekitworachai, Pittawat, Manakul, Potsawee, Tharnpipitchai, Kasima

arXiv.org Artificial Intelligence, Feb-17-2025

This paper investigates data selection and model merging methodologies aimed at incorporating advanced reasoning capabilities such as those of DeepSeek R1 into language-specific large language models (LLMs), with a particular focus on the Thai LLM. Our goal is to enhance the reasoning capabilities of language-specific LLMs while maintaining their target language abilities. DeepSeek R1 excels in reasoning but primarily benefits high-resource languages such as English and Chinese. However, low-resource languages remain underserved due to the dominance of English-centric training data and model optimizations, which limit performance in these languages. This limitation results in unreliable code-switching and diminished effectiveness on tasks in low-resource languages. Meanwhile, local and regional LLM initiatives have attempted to bridge this gap by developing language-specific LLMs that focus on improving local linguistic fidelity. This work releases the data, merge configurations, and model weights to promote the advancement of language-specific LLM initiatives. Recent advancements in large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks, particularly through innovations in scaling at test time and specialized training paradigms (DeepSeek-AI et al., 2025).
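As a rough illustration of what model merging means in general, the sketch below linearly interpolates two sets of weights. The abstract does not say which merge method the authors use, so linear interpolation is an assumption chosen for simplicity; the function `merge_weights`, the parameter `alpha`, and the toy weights are all hypothetical.

```python
def merge_weights(model_a, model_b, alpha):
    """Linearly interpolate two weight dictionaries.

    alpha = 1.0 returns model_a's weights, alpha = 0.0 returns model_b's.
    Both models must share the same parameter names and shapes.
    """
    if model_a.keys() != model_b.keys():
        raise ValueError("models must have identical parameter names")
    merged = {}
    for name in model_a:
        a, b = model_a[name], model_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [alpha * wa + (1 - alpha) * wb for wa, wb in zip(a, b)]
    return merged


# Toy stand-ins for a language-specific model and a reasoning model.
language_model = {"layer.weight": [0.0, 2.0]}
reasoning_model = {"layer.weight": [1.0, 4.0]}

merged = merge_weights(language_model, reasoning_model, alpha=0.5)
print(merged)
```

In practice the weights would be tensors from two checkpoints that share an architecture, and the mixing ratio could differ per layer; this sketch only conveys the basic arithmetic.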

large language model, machine learning, natural language, (19 more...)

arXiv:2502.09056

Country: Asia (1.00)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Typhoon T1: An Open Thai Reasoning Model

Taveekitworachai, Pittawat, Manakul, Potsawee, Tharnpipitchai, Kasima, Pipatanakul, Kunat

arXiv.org Artificial Intelligence, Feb-13-2025

This paper introduces Typhoon T1, an open effort to develop an open Thai reasoning model. A reasoning model is a relatively new type of generative model built on top of large language models (LLMs). A reasoning model generates a long chain of thought before arriving at a final answer, an approach found to improve performance on complex tasks. However, details on developing such a model are limited, especially for reasoning models that can generate traces in a low-resource language. Typhoon T1 presents an open effort that dives into the details of developing a reasoning model in a more cost-effective way by leveraging supervised fine-tuning using open datasets, instead of reinforcement learning. This paper shares the details about synthetic data generation and training, as well as our dataset and model weights. Additionally, we provide insights gained from developing a reasoning model that generalizes across domains and is capable of generating reasoning traces in a low-resource language, using Thai as an example. We hope this open effort provides a foundation for further research in this field.

large language model, machine learning, natural language, (21 more...)

arXiv:2502.09042

Country:

Asia > Middle East (0.46)
North America > Mexico (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models

Pipatanakul, Kunat, Manakul, Potsawee, Nitarach, Natapong, Sirichotedumrong, Warit, Nonesung, Surapon, Jaknamon, Teetouch, Pengpun, Parinthapat, Taveekitworachai, Pittawat, Na-Thalang, Adisai, Sripaisarnmongkol, Sittipong, Jirayoot, Krisanapong, Tharnpipitchai, Kasima

arXiv.org Artificial Intelligence, Dec-19-2024

This paper introduces Typhoon 2, a series of text and multimodal large language models optimized for the Thai language. The series includes models for text, vision, and audio. Typhoon2-Text builds on state-of-the-art open models, such as Llama 3 and Qwen2, and we perform continual pre-training on a mixture of English and Thai data. We employ post-training techniques to enhance Thai language performance while preserving the base models' original capabilities. We release text models across a range of sizes, from 1 to 70 billion parameters, available in both base and instruction-tuned variants. To guardrail text generation, we release Typhoon2-Safety, a classifier enhanced for Thai cultures and language. Typhoon2-Vision improves Thai document understanding while retaining general visual capabilities, such as image captioning. Typhoon2-Audio introduces an end-to-end speech-to-speech model architecture capable of processing audio, speech, and text inputs and generating both text and speech outputs.

large language model, machine learning, natural language, (18 more...)

arXiv:2412.13702

Country:

North America > United States (0.46)
Asia > Thailand (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Education (1.00)
Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SkillAggregation: Reference-free LLM-Dependent Aggregation

Sun, Guangzhi, Kagrecha, Anmol, Manakul, Potsawee, Woodland, Phil, Gales, Mark

arXiv.org Artificial Intelligence, Oct-14-2024

Large Language Models (LLMs) are increasingly used to assess NLP tasks due to their ability to generate human-like judgments. Single LLMs were used initially; however, recent work suggests that using multiple LLMs as judges yields improved performance. An important step in exploiting multiple judgments is the combination stage, aggregation. Existing methods in NLP either assign equal weight to all LLM judgments or are designed for specific tasks such as hallucination detection. This work focuses on aggregating predictions from multiple systems where no reference labels are available. A new method called SkillAggregation is proposed, which learns to combine estimates from LLM judges without needing additional data or ground truth. It extends the Crowdlayer aggregation method, developed for image classification, to exploit the judge estimates during inference. The approach is compared to a range of standard aggregation methods on HaluEval-Dialogue, TruthfulQA and Chatbot Arena tasks. SkillAggregation outperforms Crowdlayer on all tasks, and yields the best performance over all approaches on the majority of tasks.

large language model, machine learning, natural language, (18 more...)

arXiv:2410.10215

Country:

Europe (1.00)
North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models

Sun, Guangzhi, Manakul, Potsawee, Liusie, Adian, Pipatanakul, Kunat, Zhang, Chao, Woodland, Phil, Gales, Mark

arXiv.org Artificial Intelligence, May-22-2024

Multimodal foundation models are prone to hallucination, generating outputs that either contradict the input or are not grounded by factual information. Given the diversity in architectures, training data and instruction tuning techniques, there can be large variations in systems' susceptibility to hallucinations. To assess system hallucination robustness, hallucination ranking approaches have been developed for specific tasks such as image captioning, question answering, summarization, or biography generation. However, these approaches typically compare model outputs to gold-standard references or labels, limiting hallucination benchmarking for new domains. This work proposes "CrossCheckGPT", a reference-free universal hallucination ranking for multimodal foundation models. The core idea of CrossCheckGPT is that the same hallucinated content is unlikely to be generated by different independent systems, hence cross-system consistency can provide meaningful and accurate hallucination assessment scores. CrossCheckGPT can be applied to any model or task, provided that the information consistency between outputs can be measured through an appropriate distance metric. Focusing on multimodal large language models that generate text, we explore two information consistency measures: CrossCheck-explicit and CrossCheck-implicit. We showcase the applicability of our method for hallucination ranking across various modalities, namely the text, image, and audio-visual domains. Further, we propose the first audio-visual hallucination benchmark, "AVHalluBench", and illustrate the effectiveness of CrossCheckGPT, achieving correlations of 98% and 89% with human judgements on MHaluBench and AVHalluBench, respectively.

large language model, machine learning, natural language, (21 more...)

arXiv:2405.13684

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Louisiana (0.14)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Typhoon: Thai Large Language Models

Pipatanakul, Kunat, Jirabovonvisut, Phatrasek, Manakul, Potsawee, Sripaisarnmongkol, Sittipong, Patomwong, Ruangsak, Chokchainant, Pathomporn, Tharnpipitchai, Kasima

arXiv.org Artificial Intelligence, Dec-21-2023

Typhoon is a series of Thai large language models (LLMs) developed specifically for the Thai language. This technical report presents challenges and insights in developing Thai LLMs, including data preparation, pretraining, instruction-tuning, and evaluation. As one of the challenges of low-resource languages is the amount of pretraining data, we apply continual training to transfer existing world knowledge from a strong LLM. To evaluate the Thai knowledge encapsulated in each model from the pretraining stage, we develop ThaiExam, a benchmark based on examinations for high-school students and investment professionals in Thailand. In addition, we fine-tune Typhoon to follow Thai instructions, and we evaluate instruction-tuned models on Thai instruction datasets as well as translation, summarization, and question-answering tasks. Experimental results on a suite of Thai benchmarks show that Typhoon outperforms all open-source Thai language models, and its performance is on par with GPT-3.5 in Thai while having only 7 billion parameters and being 2.62 times more efficient in tokenizing Thai text.

large language model, machine learning, natural language, (18 more...)

arXiv:2312.13951

Country:

Europe (0.93)
Asia > Thailand (0.35)
North America > United States > Pennsylvania (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.82)

Industry: Education > Educational Setting > K-12 Education > Secondary School (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

Manakul, Potsawee, Liusie, Adian, Gales, Mark J. F.

arXiv.org Artificial Intelligence, Oct-11-2023

Generative Large Language Models (LLMs) such as GPT-3 are capable of generating highly fluent responses to a wide variety of user prompts. However, LLMs are known to hallucinate facts and make non-factual statements which can undermine trust in their output. Existing fact-checking approaches either require access to the output probability distribution (which may not be available for systems such as ChatGPT) or external databases that are interfaced via separate, often complex, modules. In this work, we propose "SelfCheckGPT", a simple sampling-based approach that can be used to fact-check the responses of black-box models in a zero-resource fashion, i.e. without an external database. SelfCheckGPT leverages the simple idea that if an LLM has knowledge of a given concept, sampled responses are likely to be similar and contain consistent facts. However, for hallucinated facts, stochastically sampled responses are likely to diverge and contradict one another. We investigate this approach by using GPT-3 to generate passages about individuals from the WikiBio dataset, and manually annotate the factuality of the generated passages. We demonstrate that SelfCheckGPT can: i) detect non-factual and factual sentences; and ii) rank passages in terms of factuality. We compare our approach to several baselines and show that our approach has considerably higher AUC-PR scores in sentence-level hallucination detection and higher correlation scores in passage-level factuality assessment compared to grey-box methods.
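The sampling-based idea in this abstract, that supported facts recur across sampled responses while hallucinated ones do not, can be sketched with a deliberately crude consistency measure. The word-overlap score below is a stand-in chosen for brevity and is not one of the scoring methods from the paper; the function names and the sample responses are hypothetical.

```python
import re


def words(text):
    """Lowercased set of alphanumeric tokens in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def inconsistency_score(sentence, samples):
    """Return a score in [0, 1]; higher means less support from the samples.

    Support for one sample is the fraction of the sentence's words that
    also appear in that sample; the score is 1 minus the mean support.
    """
    sentence_words = words(sentence)
    if not sentence_words or not samples:
        raise ValueError("need a non-empty sentence and at least one sample")
    supports = [
        len(sentence_words & words(sample)) / len(sentence_words)
        for sample in samples
    ]
    return 1.0 - sum(supports) / len(supports)


# Hypothetical stochastic samples drawn from the same model for one prompt.
samples = [
    "Ada Lovelace was born in London in 1815.",
    "Born in London, Ada Lovelace was a mathematician.",
    "Ada Lovelace, born in London, wrote about the Analytical Engine.",
]

score_consistent = inconsistency_score("Ada Lovelace was born in London.", samples)
score_divergent = inconsistency_score("Ada Lovelace was born in Paris.", samples)
print(score_consistent, score_divergent)
```

The sentence that disagrees with the samples receives the higher score. A practical system would replace word overlap with a stronger semantic comparison, since overlap cannot tell a paraphrase from a contradiction.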

large language model, machine learning, natural language, (4 more...)

arXiv:2303.08896

Genre: Research Report (0.40)

Industry: Transportation > Air (0.60)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Can Generative Large Language Models Perform ASR Error Correction?

Ma, Rao, Qian, Mengjie, Manakul, Potsawee, Gales, Mark, Knill, Kate

arXiv.org Artificial Intelligence, Sep-29-2023

ASR error correction is an interesting option for post-processing speech recognition system outputs. These error correction models are usually trained in a supervised fashion using the decoding results of a target ASR system. This approach can be computationally intensive and the model is tuned to a specific ASR system. Recently, generative large language models (LLMs) have been applied to a wide range of natural language processing tasks, as they can operate in a zero-shot or few-shot fashion. In this paper we investigate using ChatGPT, a generative LLM, for ASR error correction. Based on the ASR N-best output, we propose both unconstrained approaches and constrained approaches, in which a member of the N-best list is selected. Additionally, zero-shot and 1-shot settings are evaluated. Experiments show that this generative LLM approach can yield performance gains for two different state-of-the-art ASR architectures, transducer and attention-encoder-decoder based, and multiple test sets.
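The difference between the unconstrained and constrained settings can be sketched as two prompt styles over the same N-best list. The prompt wording, the helper names, and the example hypotheses are hypothetical; the abstract does not give the actual prompts, and no LLM is called here.

```python
def build_prompt(nbest, constrained):
    """Format an N-best list as a prompt for either setting."""
    numbered = [f"{i}. {hyp}" for i, hyp in enumerate(nbest, start=1)]
    if constrained:
        task = "Reply with the number of the most likely correct hypothesis."
    else:
        task = "Reply with the corrected transcription."
    return "ASR N-best hypotheses:\n" + "\n".join(numbered) + "\n" + task


def parse_selection(reply, nbest):
    """Map a constrained reply such as '2' back to a member of the N-best list."""
    index = int(reply.strip()) - 1
    if not 0 <= index < len(nbest):
        raise ValueError("selection outside the N-best list")
    return nbest[index]


# Hypothetical 3-best output from an ASR system.
nbest = [
    "i scream for ice cream",
    "ice cream for ice cream",
    "i scream for i scream",
]

prompt = build_prompt(nbest, constrained=True)
chosen = parse_selection(" 1 ", nbest)
print(prompt)
print(chosen)
```

In the constrained setting the output is guaranteed to be one of the ASR system's own hypotheses, whereas the unconstrained setting lets the model produce new text that is not in the list.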

large language model, machine learning, natural language, (17 more...)

arXiv:2307.04172

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Spain (0.14)
Europe > Germany (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
