AITopics | Tang, Zilu

Collaborating Authors

Tang, Zilu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Multi-Labeled Dataset for Indonesian Discourse: Examining Toxicity, Polarization, and Demographics Information

Susanto, Lucky, Wijanarko, Musa, Pratama, Prasetia, Tang, Zilu, Akyas, Fariz, Hong, Traci, Idris, Ika, Aji, Alham, Wijaya, Derry

arXiv.org Artificial IntelligenceMar-1-2025

Polarization is defined as divisive opinions held by two or more groups on substantive issues. As the world's third-largest democracy, Indonesia faces growing concerns about the interplay between political polarization and online toxicity, which is often directed at vulnerable minority groups. Despite the importance of this issue, previous NLP research has not fully explored the relationship between toxicity and polarization. To bridge this gap, we present a novel multi-label Indonesian dataset that incorporates toxicity, polarization, and annotator demographic information. Benchmarking this dataset using BERT-base models and large language models (LLMs) shows that polarization information enhances toxicity classification, and vice versa. Furthermore, providing demographic information significantly improves the performance of polarization classification.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2503.00417

Country:

Asia > Indonesia (0.49)
North America > United States > California (0.14)
Europe > Spain > Galicia (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Government (0.93)
Media > News (0.68)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization

Tang, Zilu, Chatterjee, Rajen, Garg, Sarthak

arXiv.org Artificial IntelligenceJan-28-2025

Machine Translation (MT) is undergoing a paradigm shift, with systems based on fine-tuned large language models (LLM) becoming increasingly competitive with traditional encoder-decoder models trained specifically for translation tasks. However, LLM-based systems are at a higher risk of generating hallucinations, which can severely undermine user's trust and safety. Most prior research on hallucination mitigation focuses on traditional MT models, with solutions that involve post-hoc mitigation - detecting hallucinated translations and re-translating them. While effective, this approach introduces additional complexity in deploying extra tools in production and also increases latency. To address these limitations, we propose a method that intrinsically learns to mitigate hallucinations during the model training phase. Specifically, we introduce a data creation framework to generate hallucination focused preference datasets. Fine-tuning LLMs on these preference datasets reduces the hallucination rate by an average of 96% across five language pairs, while preserving overall translation quality. In a zero-shot setting our approach reduces hallucinations by 89% on an average across three unseen target languages.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.17295

Country:

Europe (1.00)
Asia > India (0.68)
Asia > Middle East > UAE (0.14)
(2 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Could We Have Had Better Multilingual LLMs If English Was Not the Central Language?

Diandaru, Ryandito, Susanto, Lucky, Tang, Zilu, Purwarianti, Ayu, Wijaya, Derry

arXiv.org Artificial IntelligenceApr-5-2024

Large Language Models (LLMs) demonstrate strong machine translation capabilities on languages they are trained on. However, the impact of factors beyond training data size on translation performance remains a topic of debate, especially concerning languages not directly encountered during training. Our study delves into Llama2's translation capabilities. By modeling a linear relationship between linguistic feature distances and machine translation scores, we ask ourselves if there are potentially better central languages for LLMs other than English. Our experiments show that the 7B Llama2 model yields above 10 BLEU when translating into all languages it has seen, which rarely happens for languages it has not seen. Most translation improvements into unseen languages come from scaling up the model size rather than instruction tuning or increasing shot count. Furthermore, our correlation analysis reveals that syntactic similarity is not the only linguistic factor that strongly correlates with machine translation scores. Interestingly, we discovered that under specific circumstances, some languages (e.g. Swedish, Catalan), despite having significantly less training data, exhibit comparable correlation levels to English. These insights challenge the prevailing landscape of LLMs, suggesting that models centered around languages other than English could provide a more efficient foundation for multilingual applications.

large language model, machine learning, translation, (18 more...)

arXiv.org Artificial Intelligence

2402.13917

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Pennsylvania (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations

Tang, Zilu, Agarwal, Mayank, Shypula, Alex, Wang, Bailin, Wijaya, Derry, Chen, Jie, Kim, Yoon

arXiv.org Artificial IntelligenceNov-12-2023

This work explores the use of self-generated natural language explanations as an intermediate step for code-to-code translation with language models. Across three types of explanations and 19 programming languages constructed from the MultiPL-E dataset, we find the explanations to be particularly effective in the zero-shot case, improving performance by 12% on average. Improvements with natural language explanations are particularly pronounced on difficult programs. We release our dataset, code, and canonical solutions in all 19 languages.

artificial intelligence, natural language, self-generated explanation, (2 more...)

arXiv.org Artificial Intelligence

2311.0707

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Discrete Word Embedding for Logical Natural Language Understanding

Asai, Masataro, Tang, Zilu

arXiv.org Artificial IntelligenceAug-26-2020

In this paper, we propose an unsupervised neural model for learning a discrete embedding of words. While being discrete, our embedding supports vector arithmetic operations similar to continuous embeddings by interpreting each word as a set of propositional statements describing a rule. The formulation of our vector arithmetic closely reflects the logical structure originating from the symbolic sequential decision making formalism (classical/STRIPS planning). Contrary to the conventional wisdom that discrete representation cannot perform well due to the lack of ability to capture the uncertainty, our representation is competitive against the continuous representations in several downstream tasks. We demonstrate that our embedding is directly compatible with the symbolic, classical planning solvers by performing a "paraphrasing" task. Due to the discrete/logical decision making in classical algorithms with deterministic (non-probabilistic) completeness, and also because it does not require additional training on the paraphrasing dataset, our system can negatively answer a paraphrasing query (inexistence of solutions), and can answer that only some approximate solutions exist -- A feature that is missing in the recent, huge, purely neural language models such as GPT-3.

deep learning, neural network, proc, (20 more...)

arXiv.org Artificial Intelligence

2008.11649

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Automobiles & Trucks > Manufacturer (0.68)
Transportation > Ground (0.46)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback