UNCLE: Benchmarking Uncertainty Expressions in Long-Form Generation

Yang, Ruihan, Zhang, Caiqi, Zhang, Zhisong, Huang, Xinting, Yu, Dong, Collier, Nigel, Yang, Deqing

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are prone to hallucination, particularly in long-form generations. A promising direction to mitigate hallucination is to teach LLMs to express uncertainty explicitly when they lack sufficient knowledge. However, existing work lacks direct and fair evaluation of LLMs' ability to express uncertainty effectively in long-form generation. To address this gap, we first introduce UNCLE, a benchmark designed to evaluate uncertainty expression in both long- and short-form question answering (QA). UNCLE covers five domains and includes more than 1,000 entities, each with paired short- and long-form QA items. Our dataset is the first to directly link short- and long-form QA through aligned questions and gold-standard answers. Along with UNCLE, we propose a suite of new metrics to assess the models' capabilities to selectively express uncertainty. We then demonstrate that current models fail to convey uncertainty appropriately in long-form generation. We further explore both prompt-based and training-based methods to improve models' performance, with the training-based methods yielding greater gains. Further analysis of alignment gaps between short- and long-form uncertainty expression highlights promising directions for future research using UNCLE.



ViMRHP: A Vietnamese Benchmark Dataset for Multimodal Review Helpfulness Prediction via Human-AI Collaborative Annotation

Nguyen, Truc Mai-Thanh, Nguyen, Dat Minh, Luu, Son T., Van Nguyen, Kiet

arXiv.org Artificial Intelligence

Multimodal Review Helpfulness Prediction (MRHP) is an essential task in recommender systems, particularly in E-commerce platforms. Determining the helpfulness of user-generated reviews enhances user experience and improves consumer decision-making. However, existing datasets focus predominantly on English and Indonesian, resulting in a lack of linguistic diversity, especially for low-resource languages such as Vietnamese. In this paper, we introduce ViMRHP (Vietnamese Multimodal Review Helpfulness Prediction), a large-scale benchmark dataset for the MRHP task in Vietnamese. This dataset covers four domains, including 2K products with 46K reviews. Constructing a large-scale dataset, however, requires considerable time and cost. To optimize the annotation process, we leverage AI to assist annotators in constructing the ViMRHP dataset. With AI assistance, annotation time is reduced (from 90 to 120 seconds per task down to 20 to 40 seconds per task) while maintaining data quality and lowering overall costs by approximately 65%. However, AI-generated annotations still have limitations in complex annotation tasks, which we further examine through a detailed performance analysis. In our experiment on ViMRHP, we evaluate baseline models on human-verified and AI-generated annotations to assess their quality differences. The ViMRHP dataset is publicly available at https://github.com/trng28/ViMRHP


ObjectMover: Generative Object Movement with Video Prior

Yu, Xin, Wang, Tianyu, Kim, Soo Ye, Guerrero, Paul, Chen, Xi, Liu, Qing, Lin, Zhe, Qi, Xiaojuan

arXiv.org Artificial Intelligence

Simple as it seems, moving an object to another location within an image is, in fact, a challenging image-editing task that requires re-harmonizing the lighting, adjusting the pose based on perspective, accurately filling occluded regions, and ensuring coherent synchronization of shadows and reflections while maintaining the object identity. In this paper, we present ObjectMover, a generative model that can perform object movement in highly challenging scenes. Our key insight is that we model this task as a sequence-to-sequence problem and fine-tune a video generation model to leverage its knowledge of consistent object generation across video frames. We show that with this approach, our model is able to adjust to complex real-world scenarios, handling extreme lighting harmonization and object effect movement. As large-scale data for object movement are unavailable, we construct a data generation pipeline using a modern game engine to synthesize high-quality data pairs. We further propose a multi-task learning strategy that enables training on real-world video data to improve the model generalization. Through extensive experiments, we demonstrate that ObjectMover achieves outstanding results and adapts well to real-world scenarios.


Towards Anthropomorphic Conversational AI Part I: A Practical Framework

Wei, Fei, Li, Yaliang, Ding, Bolin

arXiv.org Artificial Intelligence

Large language models (LLMs), due to their advanced natural language capabilities, have seen significant success in applications where the user interface is usually a conversational artificial intelligence (AI) agent that engages the user through multi-round conversations. However, many scenarios require the agents to exhibit stronger social and conversational intelligence and demonstrate more human-like (anthropomorphic) reactions. Foundational LLMs have yet to fully address this aspect, so a single call to a foundational model might be insufficient. To bridge this gap, we propose a two-stage solution. In this work, we focus on the first stage, introducing a multi-module framework designed to replicate the key aspects of human intelligence involved in conversations. This framework comprises thinking modules for reasoning, resource modules for managing knowledge and external information, and response modules for generating contextually appropriate interactions. With all the modules cooperating, the framework would empower the agents to provide a better human-like conversation experience. In the second stage of our approach, these conversational data, after filtering and labeling, can serve as training and testing data for reinforcement learning, enabling AI to better capture human preferences. This stage is left for future work. In our experiments, volunteers engaged in over 3000 rounds of conversation with the same AI character, powered either by a standalone LLM or by our framework, which integrates the same LLM. A separate group of evaluators rated the conversation samples, revealing that our framework significantly enhanced the social and conversational intelligence, even without fine-tuning the LLM.


A Hate Speech Moderated Chat Application: Use Case for GDPR and DSA Compliance

Fillies, Jan, Mitsikas, Theodoros, Schäfermeier, Ralph, Paschke, Adrian

arXiv.org Artificial Intelligence

The detection of hate speech or toxic content online is a complex and sensitive issue. While the identification itself is highly dependent on the context of the situation, sensitive personal attributes such as age, language, and nationality are rarely available due to privacy concerns. Additionally, platforms struggle with a wide range of local jurisdictions regarding online hate speech and the evaluation of content based on their internal ethical norms. This research presents a novel approach that demonstrates a GDPR-compliant application capable of implementing legal and ethical reasoning into the content moderation process. The application increases the explainability of moderation decisions by utilizing user information. Two use cases fundamental to online communication are presented and implemented using technologies such as GPT-3.5, Solid Pods, and the rule language Prova. The first use case demonstrates the scenario of a platform aiming to protect adolescents from potentially harmful content by limiting the ability to post certain content when minors are present. The second use case aims to identify and counter problematic statements online by providing counter hate speech. The counter hate speech is generated using personal attributes to appeal to the user. This research lays the groundwork for future DSA compliance of online platforms. The work proposes a novel approach to reason within different legal and ethical definitions of hate speech and to plan fitting counter hate speech. Overall, the platform provides tailored protection to users and a more explainable and individualized response. The hate speech detection service, the chat platform, and the reasoning in Prova are discussed, and the potential benefits for content moderation and algorithmic hate speech detection are outlined. A selection of important aspects for DSA compliance is outlined.


Cloud Data Warehouse / Business Intelligence Engineer at Eurofins - Bucharest, Romania

#artificialintelligence

You may not know our name but we can guarantee you know our work – all we do has a positive impact on life, health and the environment. Eurofins is by your side every day, from the food you eat to the medicines you rely on. We work with the biggest companies in the world, making sure the products they supply are safe, their ingredients are authentic and labelling is accurate. Ours is a fast-paced, growing environment, and we are looking for natural-born leaders who inspire passion in unique individuals and are not afraid to take risks in order to achieve goals. Life at Eurofins is a meritocracy, where people are empowered to make decisions and are rewarded for their success.


What to look out for at AI & Big Data Expo EU and NA: JPMorgan, Danone, and more - AI News

#artificialintelligence

The road to maturity for any technology in the enterprise is long and arduous. Take data and analytics platforms as an example. Data from 451 Research's Voice of the Enterprise series in March found a third of companies surveyed had yet to fully embrace a data-driven approach to strategic decision making. If that is the case, whither artificial intelligence? Writing for Enterprise Talk earlier this month, Swapnil Mishra notes that businesses are still in the 'AI adolescence' phase, citing research from Accenture which found 63% of 1,200 companies polled were still experimenting with projects.


Trustworthy AI: Operationalizing AI Models with Governance – Part 1

#artificialintelligence

Editor's note: Sourav Mazumder is a speaker for ODSC West 2021. Be sure to check out his talk, "Operationalization of Models Developed and Deployed in Heterogeneous Platforms," for more info on trustworthy AI there. Artificial intelligence (AI) is already having a significant impact on the development of humanity. For enterprises, the use of AI is not an option anymore. However, the core of AI relies on the use of data samples/examples to train a system/machine using algorithms so that it can behave intelligently like a human.