Reverse Engineering Human Preferences with Reinforcement Learning

arXiv.org Artificial Intelligence

The capabilities of Large Language Models (LLMs) are routinely evaluated by other LLMs trained to predict human preferences. This framework--known as LLM-as-a-judge--is highly scalable and relatively low cost. However, it is also vulnerable to malicious exploitation, as LLM responses can be tuned to overfit the preferences of the judge. Previous work shows that the answers generated by a candidate-LLM can be edited post hoc to maximise the score assigned to them by a judge-LLM. In this study, we adopt a different approach and use the signal provided by judge-LLMs as a reward to adversarially tune models that generate text preambles designed to boost downstream performance. We find that frozen LLMs pipelined with these models attain higher LLM-evaluation scores than existing frameworks. Crucially, unlike other frameworks which intervene directly on the model's response, our method is virtually undetectable. We also demonstrate that the effectiveness of the tuned preamble generator transfers when the candidate-LLM and the judge-LLM are replaced with models that are not used during training. These findings raise important questions about the design of more reliable LLM-as-a-judge evaluation settings. They also demonstrate that human preferences can be reverse engineered effectively, by pipelining LLMs to optimise upstream preambles via reinforcement learning--an approach that could find future applications in diverse tasks and domains beyond adversarial attacks.


Review for NeurIPS paper: Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference

Neural Information Processing Systems

Additional Feedback: This paper proposes to use Bayesian estimates of fairness metrics. It combines this with Bayesian calibration models (one for each protected attribute value in this particular case) in order to use unlabelled data. In light of existing work (Foulds et al., 2019) on Bayesian modelling of fairness, the contribution is rather minor and is limited to the case where we have unlabelled data. The approach the authors use, as it is based on calibration, seems limited to rather specific notions of fairness where Bayesian calibration can be usefully applied. Although in l. 64 the definition of calibration is correct, in l. 105-107 you write that s_j = P_M(y_j = 1 | s_j). Since j is a specific example, there should not be any randomness here.
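As a rough illustration of what a Bayesian estimate of a fairness metric can look like, the sketch below places a Beta posterior on per-group accuracy and samples the difference between two groups. It covers only the simple labelled-data case and is not the calibration-based method of the reviewed paper; the counts, the uniform Beta(1, 1) prior, and the helper name `posterior_accuracy_samples` are all assumptions made for the example.

```python
import random


def posterior_accuracy_samples(correct, total, n_samples, rng, prior=(1.0, 1.0)):
    """Draw samples from the Beta posterior over a group's accuracy.

    With a Beta(a, b) prior and `correct` successes out of `total` labelled
    examples, the posterior is Beta(a + correct, b + total - correct).
    """
    a = prior[0] + correct
    b = prior[1] + (total - correct)
    return [rng.betavariate(a, b) for _ in range(n_samples)]


rng = random.Random(0)

# Hypothetical labelled counts for two protected-attribute groups.
group_a = posterior_accuracy_samples(correct=45, total=50, n_samples=20000, rng=rng)
group_b = posterior_accuracy_samples(correct=35, total=50, n_samples=20000, rng=rng)

# Posterior over the accuracy gap between the groups.
diffs = sorted(a - b for a, b in zip(group_a, group_b))
mean_diff = sum(diffs) / len(diffs)
lower = diffs[int(0.025 * len(diffs))]   # 2.5th percentile
upper = diffs[int(0.975 * len(diffs))]   # 97.5th percentile
prob_a_better = sum(d > 0 for d in diffs) / len(diffs)

print(f"mean accuracy gap: {mean_diff:.3f}")
print(f"95% credible interval: [{lower:.3f}, {upper:.3f}]")
print(f"P(group A more accurate than group B): {prob_a_better:.3f}")
```

The credible interval shows how much uncertainty remains with only 50 labelled examples per group, which is the kind of question the paper's title raises.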


Conversational Prompt Engineering

arXiv.org Artificial Intelligence

Prompts are how humans communicate with LLMs. Informative prompts are essential for guiding LLMs to produce the desired output. However, prompt engineering is often tedious and time-consuming, requiring significant expertise, limiting its widespread use. We propose Conversational Prompt Engineering (CPE), a user-friendly tool that helps users create personalized prompts for their specific tasks. CPE uses a chat model to briefly interact with users, helping them articulate their output preferences and integrating these into the prompt. The process includes two main stages: first, the model uses user-provided unlabeled data to generate data-driven questions and utilize user responses to shape the initial instruction. Then, the model shares the outputs generated by the instruction and uses user feedback to further refine the instruction and the outputs. The final result is a few-shot prompt, where the outputs approved by the user serve as few-shot examples. A user study on summarization tasks demonstrates the value of CPE in creating personalized, high-performing prompts. The results suggest that the zero-shot prompt obtained is comparable to its - much longer - few-shot counterpart, indicating significant savings in scenarios involving repetitive tasks with large text volumes.


To Stop AI Killing Us All, First Regulate Deepfakes, Says Researcher Connor Leahy

TIME - Tech

Connor Leahy remembers the time he first realized AI was going to kill us all. It was 2019, and OpenAI's GPT-2 had just come out. Leahy downloaded the nascent large language model to his laptop, and took it along to a hackathon at the Technical University of Munich, where he was studying. In a tiny, cramped room, sitting on a couch surrounded by four friends, he booted up the AI system. Even though it could barely string coherent sentences together, Leahy identified in GPT-2 something that had been missing from every other AI model up until that point.


Designing trustworthy and transparent AI systems using assessment tools

#artificialintelligence

The hype around ChatGPT has brought the topic of artificial intelligence and its impressive potential to the fore. At the same time, ensuring the quality and maintaining control of AI systems are becoming increasingly important--especially when these systems take on responsible tasks. After all, the chat-bot's results are based on huge amounts of text data from the internet. That said, systems like ChatGPT only compute the most likely answer to a question and output it as a fact. Researchers from the Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS will be showcasing various assessment tools and processes that can be used to systematically examine AI systems for weaknesses throughout their life cycle and safeguard against AI risks at the Hannover Messe 2023 from April 17 to 21 (at the joint Fraunhofer booth A12 in Hall 16).


How to Check if a Classification Model is Overfitted using scikit-learn

#artificialintelligence

One of the hardest problems when dealing with machine learning algorithms is evaluating whether the trained model performs well on unseen samples. For example, a model may behave very well on a given dataset but fail to predict the correct values once deployed. This discrepancy between performance on the training and testing data can have several causes. One of the most common is overfitting. A model that fits the training set well but the testing set poorly is said to be overfit to the training set, and a model that fits both sets poorly is said to be underfit.
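The train-versus-test comparison described above can be sketched without any library. The example below is an assumption-laden toy, not the article's own code: it builds noisy one-dimensional data, uses a hand-written 1-nearest-neighbour classifier (which memorises the training set), and reports the gap between training and testing accuracy. In scikit-learn the same check is usually done with `train_test_split` and the estimator's `score` method.

```python
import random


def make_data(n, rng, noise=0.2):
    """Hypothetical toy data: label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # label noise that a flexible model can memorise
        data.append((x, y))
    return data


def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def accuracy(train, data):
    """Fraction of examples in `data` that the 1-NN model labels correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in data)
    return correct / len(data)


rng = random.Random(0)
train = make_data(200, rng)
test = make_data(200, rng)

train_acc = accuracy(train, train)  # each point is its own nearest neighbour
test_acc = accuracy(train, test)
gap = train_acc - test_acc

print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
print(f"gap:            {gap:.2f}")
```

A large positive gap suggests overfitting, while low accuracy on both sets would point to underfitting instead.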


Machine learning: Types (part-3)

#artificialintelligence

Inference refers to reaching an outcome or decision. There are different paradigms for inference that may be used as a framework for understanding how some machine learning algorithms work or how some learning problems may be approached. Some examples of approaches to learning are inductive, deductive, and transductive learning and inference. Inductive learning involves using evidence to determine the outcome. Inductive reasoning refers to using specific cases to determine general outcomes, that is, reasoning from the specific to the general.


Internal Audit Applications of AI: It Doesn't Have to Be Complicated to Be Effective - The Protiviti View

#artificialintelligence

For many internal auditors, artificial intelligence (AI) may seem like a daunting topic to tackle, but that shouldn't stop them from considering how they can apply it to their work. Tools exist that can provide auditors with powerful, straightforward techniques to enhance their work. With an increased focus and urgency around the use of data to support internal audit activities, the time for next-generation pursuits, such as use of AI, is now. Following up on a previous blog post discussing the basics of AI for auditors, here we offer our thoughts on how internal audit organizations can get started with AI methods, such as machine learning (ML), to increase efficiency and coverage, better assign resources to areas that matter most, deliver more insight and even help identify leading indicators of risk. We also offer a specific example of ML applied to internal audit.

Machine Learning Doesn't Have to Be Complex

ML is an application of AI in which the system itself is designed with the ability to learn and improve from experience.


14 Different Types of Learning in Machine Learning

#artificialintelligence

The use of an environment means that there is no fixed training dataset, rather a goal or set of goals that an agent is required to achieve, actions they may perform, and feedback about performance toward the goal. Some machine learning algorithms do not just experience a fixed dataset. For example, reinforcement learning algorithms interact with an environment, so there is a feedback loop between the learning system and its experiences.
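The feedback loop between an agent and its environment can be sketched with a two-armed bandit. This is a minimal illustration under stated assumptions, not a method from the article: the class `TwoArmedBandit`, the reward probabilities, and the epsilon-greedy rule are all chosen for the example.

```python
import random


class TwoArmedBandit:
    """Hypothetical environment: each arm pays 1.0 with a fixed probability."""

    def __init__(self, reward_probs, rng):
        self.reward_probs = reward_probs
        self.rng = rng

    def step(self, action):
        # Feedback about performance: a stochastic reward for the chosen arm.
        return 1.0 if self.rng.random() < self.reward_probs[action] else 0.0


def run_epsilon_greedy(env, n_arms, steps, epsilon, rng):
    """Learn action values purely from interaction, with no fixed dataset."""
    values = [0.0] * n_arms  # running estimate of each arm's mean reward
    counts = [0] * n_arms    # how many times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)  # explore
        else:
            action = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = env.step(action)
        counts[action] += 1
        # Incremental mean update from the observed reward.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


rng = random.Random(0)
env = TwoArmedBandit(reward_probs=[0.2, 0.8], rng=rng)
values, counts = run_epsilon_greedy(env, n_arms=2, steps=2000, epsilon=0.1, rng=rng)

print("estimated values:", [round(v, 2) for v in values])
print("pull counts:", counts)
```

The data the agent learns from depends on its own earlier choices, which is what distinguishes this setting from training on a fixed dataset.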


Annotated Guidelines and Building Reference Corpus for Myanmar-English Word Alignment

arXiv.org Artificial Intelligence

Reference corpus for word alignment is an important resource for developing and evaluating word alignment methods. For Myanmar-English language pairs, there is no reference corpus to evaluate the word alignment tasks. Therefore, we created the guidelines for Myanmar-English word alignment annotation between two languages over contrastive learning and built the Myanmar-English reference corpus consisting of verified alignments from Myanmar ALT of the Asian Language Treebank (ALT). This reference corpus contains confident labels sure (S) and possible (P) for word alignments which are used to test for the purpose of evaluation of the word alignments tasks. We discuss the most linking ambiguities to define consistent and systematic instructions to align manual words. We evaluated the results of annotators agreement using our reference corpus in terms of alignment error rate (AER) in word alignment tasks and discuss the words relationships in terms of BLEU scores. A bilingual corpus aligned at the level of sentences or words is a precious resource for developing machine translation systems. Word alignment is a fundamental step in extracting translation information from bilingual corpus and determines which words and phrases are translations of each other in the original and translated sentence. In most translation systems, translational correspondences are rather complex; for a language pair such as Myanmar and English that belong to the different word order languages.