Personal
Adversarial Bandits with Multi-User Delayed Feedback: Theory and Application
Li, Yandi, Guo, Jianxiong, Li, Yupeng, Wang, Tian, Jia, Weijia
Multi-armed bandit (MAB) models have attracted significant research attention due to their applicability and effectiveness in various real-world scenarios such as resource allocation, online advertising, and dynamic pricing. As an important branch, adversarial MAB problems with delayed feedback have recently been studied by many researchers: a conceptual adversary strategically selects the reward distributions associated with each arm to challenge the learning algorithm, and the agent experiences a delay between taking an action and receiving the corresponding reward feedback. However, existing models restrict the feedback to be generated from only one user, which makes them inapplicable to the prevailing scenarios of multiple users (e.g., ad recommendation for a group of users). In this paper, we consider delayed feedback that comes from multiple users and is unrestricted in its internal distribution. Moreover, the feedback delay is arbitrary and unknown to the player in advance. Also, for different users in a round, no latent correlation is assumed among the feedback delays. Thus, we formulate an adversarial MAB problem with multi-user delayed feedback and design a modified EXP3 algorithm, MUD-EXP3, which makes a decision at each round by considering the importance-weighted estimator of the received feedback from different users. Given the terminal round index $T$, the number of users $M$, the number of arms $N$, and the upper bound on delay $d_{max}$, we prove a regret of $\mathcal{O}(\sqrt{TM^2\ln{N}(N\mathrm{e}+4d_{max})})$. Furthermore, for the more common case of unknown $T$, an adaptive algorithm, AMUD-EXP3, is proposed with a regret that is sublinear in $T$. Finally, extensive experiments are conducted to demonstrate the correctness and effectiveness of our algorithms.
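The idea of combining EXP3 with importance-weighted estimates of delayed, multi-user feedback can be illustrated with a minimal sketch. This is not the authors' MUD-EXP3: the function name `exp3_delayed`, the callbacks `reward_fn` and `delay_fn`, the fixed learning rate `eta`, and the choice to drop feedback that arrives after the horizon are all assumptions made for illustration, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def exp3_delayed(num_arms, num_users, horizon, eta, reward_fn, delay_fn, seed=0):
    """EXP3-style loop in which each user's reward arrives after its own delay.

    reward_fn(t, arm, user) -> reward in [0, 1]   (hypothetical callback)
    delay_fn(t, user)       -> non-negative int   (hypothetical callback)
    """
    rng = random.Random(seed)
    log_weights = [0.0] * num_arms
    pending = {}  # arrival round -> list of (arm, probability at play time, reward)
    total_reward = 0.0
    for t in range(horizon):
        # Sampling distribution: softmax of the log-weights, shifted for stability.
        peak = max(log_weights)
        unnorm = [math.exp(w - peak) for w in log_weights]
        norm = sum(unnorm)
        probs = [u / norm for u in unnorm]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        # Every user produces feedback for the chosen arm, each with its own delay.
        for user in range(num_users):
            reward = reward_fn(t, arm, user)
            total_reward += reward
            arrival = t + delay_fn(t, user)
            pending.setdefault(arrival, []).append((arm, probs[arm], reward))
        # Apply the feedback arriving now, importance-weighted by the
        # probability with which the arm was originally played.
        for played_arm, prob, reward in pending.pop(t, []):
            log_weights[played_arm] += eta * reward / prob
    return log_weights, total_reward


# Toy usage: arm 1 always pays 1, all feedback is delayed by two rounds.
weights, total = exp3_delayed(
    num_arms=3, num_users=2, horizon=200, eta=0.05,
    reward_fn=lambda t, arm, user: 1.0 if arm == 1 else 0.0,
    delay_fn=lambda t, user: 2,
)
```

Dividing each reward by the probability recorded at play time keeps the estimate unbiased even though the weights have moved on by the time the feedback arrives.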
Negotiating with LLMS: Prompt Hacks, Skill Gaps, and Reasoning Deficits
Schneider, Johannes, Haag, Steffi, Kruse, Leona Chandra
Large language models (LLMs) like ChatGPT have reached the 100 million user barrier in record time and might increasingly enter all areas of our life, leading to a diverse set of interactions between those Artificial Intelligence models and humans. While many studies have discussed governance and regulations deductively from first-order principles, few studies provide an inductive, data-driven lens based on observing dialogues between humans and LLMs, especially when it comes to non-collaborative, competitive situations that have the potential to pose a serious threat to people. In this work, we conduct a user study engaging over 40 individuals across all age groups in price negotiations with an LLM. We explore how people interact with an LLM, investigating differences in negotiation outcomes and strategies. Furthermore, we highlight shortcomings of LLMs with respect to their reasoning capabilities and, in turn, their susceptibility to prompt hacking, which aims to manipulate the LLM into making agreements that are against its instructions or beyond any rationality. We also show that the negotiated prices humans manage to achieve span a broad range, which points to a literacy gap in effectively interacting with LLMs.
Gender inference: can chatGPT outperform common commercial tools?
Alexopoulos, Michelle, Lyons, Kelly, Mahetaji, Kaushar, Barnes, Marcus Emmanuel, Gutwillinger, Rogan
An increasing number of studies use gender information to understand phenomena such as gender bias, inequity in access and participation, or the impact of the Covid pandemic response. Unfortunately, most datasets do not include self-reported gender information, making it necessary for researchers to infer gender from other information, such as names or names and country information. An important limitation of these tools is that they fail to appropriately capture the fact that gender exists on a non-binary scale; however, it remains important to evaluate and compare how well these tools perform in a variety of contexts. In this paper, we compare the performance of a generative Artificial Intelligence (AI) tool, ChatGPT, with three commercially available list-based and machine learning-based gender inference tools (Namsor, Gender-API, and genderize.io) on a unique dataset. Specifically, we use a large Olympic athlete dataset and report how variations in the input (e.g., first name and first and last name, with and without country information) impact the accuracy of their predictions. We report results for the full set, as well as for the subsets: medal versus non-medal winners, athletes from the largest English-speaking countries, and athletes from East Asia. On these sets, we find that Namsor is the best traditional commercially available tool. However, ChatGPT performs at least as well as Namsor and often outperforms it, especially for the female sample when country and/or last name information is available. All tools perform better on medalists versus non-medalists and on names from English-speaking countries. Although not designed for this purpose, ChatGPT may be a cost-effective tool for gender prediction. In the future, it might even be possible for ChatGPT or other large-scale language models to better identify self-reported gender rather than report gender on a binary scale.
How Strong a Kick Should be to Topple Northeastern's Tumbling Robot?
Salagame, Adarsh, Bhattachan, Neha, Caetano, Andre, McCarthy, Ian, Noyes, Henry, Petersen, Brandon, Qiu, Alexander, Schroeter, Matthew, Smithwick, Nolan, Sroka, Konrad, Widjaja, Jason, Bohra, Yash, Venkatesh, Kaushik, Gangaraju, Kruthika, Ghanem, Paul, Mandralis, Ioannis, Sihite, Eric, Kalantari, Arash, Ramezani, Alireza
Rough terrain locomotion has remained one of the most challenging mobility questions. In 2022, NASA's Innovative Advanced Concepts (NIAC) Program invited US academic institutions to participate in NASA's Breakthrough, Innovative & Game-changing (BIG) Idea competition by proposing novel mobility systems that can negotiate extremely rough terrain such as bumpy lunar craters. In this competition, Northeastern University won NASA's top Artemis Award by proposing an articulated robot tumbler called COBRA (Crater Observing Bio-inspired Rolling Articulator). This report briefly explains the underlying principles that made COBRA successful in competing with other concepts ranging from cable-driven to multilegged designs from six other participating US institutions.
Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions
Cao, Shulin, Zhang, Jiajie, Shi, Jiaxin, Lv, Xin, Yao, Zijun, Tian, Qi, Li, Juanzi, Hou, Lei
Large language models (LLMs) are capable of answering knowledge-intensive complex questions with chain-of-thought (CoT) reasoning. However, they tend to generate factually incorrect reasoning steps when the required knowledge is not available or up-to-date in models' parameters. Recent works turn to retrieving external knowledge to augment CoT reasoning. Despite being promising, these chain-based methods suffer from: 1) Negative retrieval. Unnecessary or incorrect retrieval may mislead the reasoning; 2) Limited sight. Lacking the ability to look backward or forward, a local error in one step will propagate along the chain. In this paper, we propose a novel approach: Probabilistic Tree-of-thought Reasoning (ProbTree). First, LLMs translate a complex question into a query tree, in which each non-root node denotes a sub-question of its parent node. Then, probabilistic reasoning is conducted over the tree, by solving questions from leaf to root considering the confidence of both question decomposing and answering. During reasoning, for leaf nodes, LLMs choose a more confident answer from Closed-book QA that employs parametric knowledge and Open-book QA that employs retrieved external knowledge, thus eliminating the negative retrieval problem. For non-leaf nodes, with the hierarchical structure, LLMs have broader sights and are able to globally reason with the information from child nodes, thus recovering from local errors. The experiments on three Complex QA datasets under the open-domain setting show that our approach outperforms SOTA methods significantly, demonstrating the effect of probabilistic tree-of-thought reasoning.
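The leaf-to-root procedure described above can be sketched in a few lines. This is an illustrative reading of the abstract, not the ProbTree implementation: the `Node` fields, the `aggregate` callback (standing in for an LLM call that reasons over child answers), and the choice to score the child-based answer by summing log-probabilities are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Answer = Tuple[str, float]  # (answer text, log-probability used as confidence)


@dataclass
class Node:
    question: str
    closed_book: Answer  # answer from parametric knowledge
    open_book: Answer    # answer from retrieved external knowledge
    children: List["Node"] = field(default_factory=list)
    decomposition_logprob: float = 0.0  # confidence in the question decomposition


def solve(node: Node, aggregate: Callable[[Node, List[Answer]], Answer]) -> Answer:
    """Solve the query tree from the leaves up, keeping the most confident answer."""
    candidates = [node.closed_book, node.open_book]
    if node.children:
        child_answers = [solve(child, aggregate) for child in node.children]
        text, logprob = aggregate(node, child_answers)
        # Assumed scoring: answer confidence plus decomposition and child confidences.
        score = logprob + node.decomposition_logprob + sum(c for _, c in child_answers)
        candidates.append((text, score))
    return max(candidates, key=lambda answer: answer[1])


# Toy usage with made-up answers and confidences.
leaf_a = Node("sub-question A", closed_book=("Paris", -0.1), open_book=("Lyon", -2.0))
leaf_b = Node("sub-question B", closed_book=("1889", -1.5), open_book=("1887", -0.3))
root = Node("complex question", ("unknown", -3.0), ("guess", -2.5),
            children=[leaf_a, leaf_b], decomposition_logprob=-0.2)
join = lambda node, answers: (" / ".join(text for text, _ in answers), -0.1)
best = solve(root, join)
```

Because a leaf keeps whichever of the closed-book and open-book answers is more confident, a misleading retrieval is simply outvoted, which is the behavior the abstract attributes to the method.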
Mods Are Asleep. Quick, Everyone Release AI Products
The turmoil at OpenAI over the past five days has captivated the tech industry and kept entrepreneurs, journalists, and anyone who still has an X account glued to their timelines for the latest emoji updates and lower-case missives. In the meantime, some of the most prominent AI companies--including OpenAI--continued to do what Silicon Valley is known for: Drop new products. The unexpected firing of Sam Altman, OpenAI's CEO, was followed by an avalanche of new AI features from competitors, including Anthropic and Stable Diffusion. On Tuesday afternoon, in the midst of turmoil, OpenAI rolled out ChatGPT with voice capabilities for free to all users. OpenAI had pre-released this in late September, but only for paid users.
The OpenAI meltdown will only accelerate the artificial intelligence race
Sarah Kreps
In November 2022, OpenAI launched ChatGPT, a consumer-facing artificial intelligence tool that could hold a conversation with users, answer questions, and generate anything from poems to computer code to health advice. The initial technology was not perfect – it would sometimes "hallucinate", producing convincing but inaccurate information – but its potential generated enormous attention. A year later, ChatGPT's popularity has continued, with 100 million people using it on a weekly basis, and over 92% of Fortune 500 companies and several competitor firms looking to cash in or improve on the technology. But that's not why ChatGPT's creator, OpenAI, was in the news this week. Instead, OpenAI was the center of a fierce philosophical debate about what it means to develop artificial general intelligence for the benefit of humanity. To understand the current debate and its stakes requires going back to OpenAI's founding in December 2015.
Studying Artist Sentiments around AI-generated Artwork
Ali, Safinah, Breazeal, Cynthia
Art created using generative Artificial Intelligence has taken the world by storm and generated excitement for many digital creators and technologists. However, the reception and reaction from artists have been mixed. Concerns about plagiarizing their artworks and styles for datasets and uncertainty around the future of digital art sparked movements in artist communities shunning the use of AI for generating art and protecting artists' rights. Collaborating with these tools for novel creative use cases also sparked hope from some creators. Artists are an integral stakeholder in the rapidly evolving digital creativity industry, and understanding their concerns and hopes informs responsible development and use of creativity support tools. In this work, we study artists' sentiments about AI-generated art. We interviewed 7 artists and analyzed public posts from artists on social media platforms Reddit, Twitter and Artstation. We report artists' main concerns and hopes around AI-generated artwork, informing a way forward for inclusive development of these tools.
Language Model Inversion
Morris, John X., Zhao, Wenting, Chiu, Justin T., Shmatikov, Vitaly, Rush, Alexander M.
Language models produce a distribution over the next token; can we use this to recover the prompt tokens? We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of information about the preceding text. Often we can recover the text in cases where it is hidden from the user, motivating a method for recovering unknown prompts given only the model's current distribution output. We consider a variety of model access scenarios, and show how even without predictions for every token in the vocabulary we can recover the probability vector through search. On Llama-2 7b, our inversion method reconstructs prompts with a BLEU of 59 and token-level F1 of 78 and recovers 27% of prompts exactly. Language models are autoregressive, outputting the probability of each next token in a sequence conditioned on the preceding text. This distribution is used to generate future tokens in the sequence. Can this distribution also be used to reconstruct the prompt? In most contexts, this question is pointless, since we have already conditioned on this information. However, increasingly language models are being offered "as a service" where the user may have access to the outputs, but not all of the true prompt. In this context, it may be of interest to users to know the prompt and, perhaps, for the service provider to protect it. This goal has been the focus of "jailbreaking" approaches that attempt to use the forward text generation of the model to reveal the prompt. We formalize this problem of prompt reconstruction as language model inversion, recovering the input prompt conditioned on the language model's next-token probabilities. Interestingly, work in computer vision has shown that probability predictions of image classifiers retain a surprising amount of detail (Dosovitskiy & Brox, 2016), so it is plausible that this also holds for language models.
We propose an architecture that predicts prompts by "unrolling" the distribution vector into a sequence that can be processed effectively by a pretrained encoder-decoder language model. This method shows for the first time that language model predictions are mostly invertible: in many cases, we are able to recover very similar inputs to the original, sometimes getting the input text back exactly.
DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency
Zhang, Zhe, Wu, Gaochang, Zhang, Jing, Shen, Chunhua, Tao, Dacheng, Chai, Tianyou
Video semantic segmentation is a pivotal aspect of video representation learning. However, significant domain shifts present a challenge in effectively learning invariant spatio-temporal features across the labeled source domain and unlabeled target domain for video semantic segmentation. To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, which incorporates a bidirectional multi-level spatio-temporal fusion module and a category-aware spatio-temporal feature alignment module to facilitate consistent learning for domain-invariant features. Firstly, we perform bidirectional spatio-temporal fusion at the image sequence level and shallow feature level, leading to the construction of two fused intermediate video domains. This prompts the video semantic segmentation model to consistently learn spatio-temporal features of shared patch sequences which are influenced by domain-specific contexts, thereby mitigating the feature gap between the source and target domain. Secondly, we propose a category-aware feature alignment module to promote the consistency of spatio-temporal features, facilitating adaptation to the target domain. Specifically, we adaptively aggregate the domain-specific deep features of each category along spatio-temporal dimensions, which are further constrained to achieve cross-domain intra-class feature alignment and inter-class feature separation. Extensive experiments demonstrate the effectiveness of our method, which achieves state-of-the-art mIoU on multiple challenging benchmarks. Furthermore, we extend the proposed DA-STC to the image domain, where it also exhibits superior performance for domain adaptive semantic segmentation. The source code and models will be made available at https://github.com/ZHE-SAPI/DA-STC.
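For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of how mean intersection-over-union is commonly computed from predicted and ground-truth labels. It is not taken from the DA-STC code; the function name `mean_iou`, the flat label lists, and the convention of skipping classes absent from both prediction and ground truth are assumptions, and benchmark toolkits may differ in such details.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes, from flat lists of integer labels."""
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:  # assumed convention: ignore classes absent from both inputs
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy usage: four pixels, two classes.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```

In the toy call, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.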