Personal
Leveraging Large Language Models to Power Chatbots for Collecting User Self-Reported Data
Wei, Jing, Kim, Sungdong, Jung, Hyunhoon, Kim, Young-Ho
Large language models (LLMs) provide a new way to build chatbots by accepting natural language prompts. Yet, it is unclear how to design prompts to power chatbots to carry on naturalistic conversations while pursuing a given goal, such as collecting self-report data from users. We explore what design factors of prompts can help steer chatbots to talk naturally and collect data reliably. To this aim, we formulated four prompt designs with different structures and personas. Through an online study (N = 48) where participants conversed with chatbots driven by different designs of prompts, we assessed how prompt designs and conversation topics affected the conversation flows and users' perceptions of chatbots. Our chatbots covered 79% of the desired information slots during conversations, and the designs of prompts and topics significantly influenced the conversation flows and the data collection performance. We discuss the opportunities and challenges of building chatbots with LLMs.
Can large language models generate salient negative statements?
Arnaout, Hiba, Razniewski, Simon
We examine the ability of large language models (LLMs) to generate salient (interesting) negative statements about real-world entities; an emerging research topic of the last few years. We probe the LLMs using zero- and k-shot unconstrained probes, and compare with traditional methods for negation generation, i.e., pattern-based textual extractions and knowledge-graph-based inferences, as well as crowdsourced gold statements. We measure the correctness and salience of the generated lists about subjects from different domains. Our evaluation shows that guided probes do in fact improve the quality of generated negatives, compared to the zero-shot variant. Nevertheless, using both prompts, LLMs still struggle with the notion of factuality of negatives, frequently generating many ambiguous statements, or statements with negative keywords but a positive meaning.
"It's a Fair Game'', or Is It? Examining How Users Navigate Disclosure Risks and Benefits When Using LLM-Based Conversational Agents
Zhang, Zhiping, Jia, Michelle, Hao-Ping, null, Lee, null, Yao, Bingsheng, Das, Sauvik, Lerner, Ada, Wang, Dakuo, Li, Tianshi
The widespread use of Large Language Model (LLM)-based conversational agents (CAs), especially in high-stakes domains, raises many privacy concerns. Building ethical LLM-based CAs that respect user privacy requires an in-depth understanding of the privacy risks that concern users the most. However, existing research, primarily model-centered, does not provide insight into users' perspectives. To bridge this gap, we analyzed sensitive disclosures in real-world ChatGPT conversations and conducted semi-structured interviews with 19 LLM-based CA users. We found that users are constantly faced with trade-offs between privacy, utility, and convenience when using LLM-based CAs. However, users' erroneous mental models and the dark patterns in system design limited their awareness and comprehension of the privacy risks. Additionally, the human-like interactions encouraged more sensitive disclosures, which complicated users' ability to navigate the trade-offs. We discuss practical design guidelines and the needs for paradigmatic shifts to protect the privacy of LLM-based CA users.
A Survey on Privacy in Graph Neural Networks: Attacks, Preservation, and Applications
Zhang, Yi, Zhao, Yuying, Li, Zhaoqing, Cheng, Xueqi, Wang, Yu, Kotevska, Olivera, Yu, Philip S., Derr, Tyler
Privacy attack is a popular and well-developed topic in various fields such as social network analysis, healthcare, finance, system, etc. [88], [89], [90]. During recent years, the surge of machine learning has provided powerful tools to solve many practical problems. However, data-driven approaches also threaten users' privacy due to the associated risks of data leakage and inference [85]. Consequently, a substantial amount of work has been devoted to investigate the vulnerabilities of ML models and the risks of privacy leakage [47]. A branch of privacy research is to develop privacy attack models, which has received much attention during the past few years. However, attack models with respect to GNNs have only been explored very recently because GNN techniques are relatively new compared with CNN/transformers in image/natural language processing(NLP) domains, and the irregular graph structure poses unique challenges to transfer existing attack techniques that are well-established in other domains. In this section, we summarize papers that have developed attack models specifically targeting GNNs. Figure 1: Illustrations of the four categories of privacy attack We classify the privacy attack models on GNN into models on graphs: a) Model extraction attacks (MEA); b) four categories (which are visualized in Figure 4): a) model Graph structure reconstruction (GSR); c) Attribute inference extraction attack (MEA), b) graph structure reconstruction attacks (AIA); and d) Membership inference attacks (MIA).
Sen. Richard Blumenthal Defends His Controversial Bill Regulating Social Media for Kids
For a while now, Washington has been wrestling with two big forces shaping technology: social media and artificial intelligence. Who should do it--and how? Currently, Congress is considering a bill that would regulate how social media companies treat minors: the Kids Online Safety Act. Although it has bipartisan support, KOSA is not without controversy. Several critics have called it "government censorship." One group, the Electronic Frontier Foundation, says it is "one of the most dangerous bills in years."
Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective
Xie, Laixin, Ouyang, Yang, Chen, Longfei, Wu, Ziming, Li, Quan
Missing data can pose a challenge for machine learning (ML) modeling. To address this, current approaches are categorized into feature imputation and label prediction and are primarily focused on handling missing data to enhance ML performance. These approaches rely on the observed data to estimate the missing values and therefore encounter three main shortcomings in imputation, including the need for different imputation methods for various missing data mechanisms, heavy dependence on the assumption of data distribution, and potential introduction of bias. This study proposes a Contrastive Learning (CL) framework to model observed data with missing values, where the ML model learns the similarity between an incomplete sample and its complete counterpart and the dissimilarity between other samples. Our proposed approach demonstrates the advantages of CL without requiring any imputation. To enhance interpretability, we introduce CIVis, a visual analytics system that incorporates interpretable techniques to visualize the learning process and diagnose the model status. Users can leverage their domain knowledge through interactive sampling to identify negative and positive pairs in CL. The output of CIVis is an optimized model that takes specified features and predicts downstream tasks. We provide two usage scenarios in regression and classification tasks and conduct quantitative experiments, expert interviews, and a qualitative user study to demonstrate the effectiveness of our approach. In short, this study offers a valuable contribution to addressing the challenges associated with ML modeling in the presence of missing data by providing a practical solution that achieves high predictive accuracy and model interpretability.
Safe and Accelerated Deep Reinforcement Learning-based O-RAN Slicing: A Hybrid Transfer Learning Approach
Nagib, Ahmad M., Abou-Zeid, Hatem, Hassanein, Hossam S.
The open radio access network (O-RAN) architecture supports intelligent network control algorithms as one of its core capabilities. Data-driven applications incorporate such algorithms to optimize radio access network (RAN) functions via RAN intelligent controllers (RICs). Deep reinforcement learning (DRL) algorithms are among the main approaches adopted in the O-RAN literature to solve dynamic radio resource management problems. However, despite the benefits introduced by the O-RAN RICs, the practical adoption of DRL algorithms in real network deployments falls behind. This is primarily due to the slow convergence and unstable performance exhibited by DRL agents upon deployment and when encountering previously unseen network conditions. In this paper, we address these challenges by proposing transfer learning (TL) as a core component of the training and deployment workflows for the DRL-based closed-loop control of O-RAN functionalities. To this end, we propose and design a hybrid TL-aided approach that leverages the advantages of both policy reuse and distillation TL methods to provide safe and accelerated convergence in DRL-based O-RAN slicing. We conduct a thorough experiment that accommodates multiple services, including real VR gaming traffic to reflect practical scenarios of O-RAN slicing. We also propose and implement policy reuse and distillation-aided DRL and non-TL-aided DRL as three separate baselines. The proposed hybrid approach shows at least: 7.7% and 20.7% improvements in the average initial reward value and the percentage of converged scenarios, and a 64.6% decrease in reward variance while maintaining fast convergence and enhancing the generalizability compared with the baselines.
BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models
Leong, Wei Qi, Ngui, Jian Gang, Susanto, Yosephine, Rengarajan, Hamsawardhini, Sarveswaran, Kengatharaiyer, Tjhi, William Chandra
The rapid development of Large Language Models (LLMs) and the emergence of novel abilities with scale have necessitated the construction of holistic, diverse and challenging benchmarks such as HELM and BIG-bench. However, at the moment, most of these benchmarks focus only on performance in English and evaluations that include Southeast Asian (SEA) languages are few in number. We therefore propose BHASA, a holistic linguistic and cultural evaluation suite for LLMs in SEA languages. It comprises three components: (1) a NLP benchmark covering eight tasks across Natural Language Understanding (NLU), Generation (NLG) and Reasoning (NLR) tasks, (2) LINDSEA, a linguistic diagnostic toolkit that spans the gamut of linguistic phenomena including syntax, semantics and pragmatics, and (3) a cultural diagnostics dataset that probes for both cultural representation and sensitivity. For this preliminary effort, we implement the NLP benchmark only for Indonesian, Vietnamese, Thai and Tamil, and we only include Indonesian and Tamil for LINDSEA and the cultural diagnostics dataset. As GPT-4 is purportedly one of the best-performing multilingual LLMs at the moment, we use it as a yardstick to gauge the capabilities of LLMs in the context of SEA languages. Our initial experiments on GPT-4 with BHASA find it lacking in various aspects of linguistic capabilities, cultural representation and sensitivity in the targeted SEA languages. BHASA is a work in progress and will continue to be improved and expanded in the future.
Obituary That Called Late NBA Player 'Useless' Sparks Firestorm
Social media users hurled criticism at Microsoft this week for what many thought was an AI-generated obituary for NBA player Brandon Hunter on its website MSN. The controversy began after the obituary -- which had a headline that read "Brandon Hunter useless at 42" written by "Editor" -- appeared on the Microsoft-owned platform after Hunter's death on Tuesday. The obituary goes on to refer to the former Boston Celtics and Orlando Magic player having been "handed away on the age of 42" and claimed he "performed in 67 video games over two seasons and achieved a career-high of 17 factors in a recreation in opposition to the Milwaukee Bucks in 2004." The post appeared to follow a similar format to a story on TMZ Sports, Futurism noted, "albeit with altered punctuation and a use of synonyms so liberal that the result is essentially incomprehensible." You can compare both the obituary containing the error and the TMZ Sports story here.
Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata
Zhang, Bohui, Reklos, Ioannis, Jain, Nitisha, Peñuela, Albert Meroño, Simperl, Elena
In this work, we explore the use of Large Language Models (LLMs) for knowledge engineering tasks in the context of the ISWC 2023 LM-KBC Challenge. For this task, given subject and relation pairs sourced from Wikidata, we utilize pre-trained LLMs to produce the relevant objects in string format and link them to their respective Wikidata QIDs. We developed a pipeline using LLMs for Knowledge Engineering (LLMKE), combining knowledge probing and Wikidata entity mapping. The method achieved a macro-averaged F1-score of 0.701 across the properties, with the scores varying from 1.00 to 0.328. These results demonstrate that the knowledge of LLMs varies significantly depending on the domain and that further experimentation is required to determine the circumstances under which LLMs can be used for automatic Knowledge Base (e.g., Wikidata) completion and correction. The investigation of the results also suggests the promising contribution of LLMs in collaborative knowledge engineering. LLMKE won Track 2 of the challenge. The implementation is available at https://github.com/bohuizhang/LLMKE.