AITopics

2403.04931

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(19 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(6 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(10 more...)

arXiv.org Artificial IntelligenceJun-26-2024

Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization

Tseng, Yu-Min, Huang, Yu-Chao, Hsiao, Teng-Yun, Chen, Wei-Lin, Huang, Chao-Wei, Meng, Yu, Chen, Yun-Nung

The concept of persona, originally adopted in dialogue literature, has re-surged as a promising framework for tailoring large language models (LLMs) to specific context (e.g., personalized search, LLM-as-a-judge). However, the growing research on leveraging persona in LLMs is relatively disorganized and lacks a systematic taxonomy. To close the gap, we present a comprehensive survey to categorize the current state of the field. We identify two lines of research, namely (1) LLM Role-Playing, where personas are assigned to LLMs, and (2) LLM Personalization, where LLMs take care of user personas. Additionally, we introduce existing methods for LLM personality evaluation. To the best of our knowledge, we present the first survey for role-playing and personalization in LLMs under the unified view of persona. We continuously maintain a paper collection to foster future endeavors: https://github.com/MiuLab/PersonaLLM-Survey

language model, llm, preprint arxiv, (14 more...)

2406.01171

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > Canada > Ontario > Toronto (0.04)
Europe > Middle East > Malta (0.04)
(11 more...)

Genre: Overview (1.00)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.93)
Leisure & Entertainment > Games > Computer Games (0.93)
Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Correia, Alvaro H. C., Massoli, Fabio Valerio, Louizos, Christos, Behboodi, Arash

An Information Theoretic Perspective on Conformal Prediction

arXiv.org Machine LearningJun-26-2024

Conformal Prediction (CP) is a distribution-free uncertainty estimation framework that constructs prediction sets guaranteed to contain the true answer with a user-specified probability. Intuitively, the size of the prediction set encodes a general notion of uncertainty, with larger sets associated with higher degrees of uncertainty. In this work, we leverage information theory to connect conformal prediction to other notions of uncertainty. More precisely, we prove three different ways to upper bound the intrinsic uncertainty, as described by the conditional entropy of the target variable given the inputs, by combining CP with information theoretical inequalities. Moreover, we demonstrate two direct and useful applications of such connection between conformal prediction and information theory: (i) more principled and effective conformal training objectives that generalize previous approaches and enable end-to-end training of machine learning models from scratch, and (ii) a natural mechanism to incorporate side information into conformal prediction. We empirically validate both applications in centralized and federated learning settings, showing our theoretical results translate to lower inefficiency (average prediction set size) for popular CP methods.

conformal prediction, inequality, prediction, (15 more...)

arXiv.org Machine Learning

2405.0214

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Japan (0.04)
(7 more...)

Genre:

Overview (0.92)
Research Report > New Finding (0.46)
Instructional Material > Course Syllabus & Notes (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Zammit-Mangion, Andrew, Sainsbury-Dale, Matthew, Huser, Raphaël

Neural Methods for Amortised Inference

arXiv.org Machine LearningJun-26-2024

Simulation-based methods for statistical inference have evolved dramatically over the past 50 years, keeping pace with technological advancements. The field is undergoing a new revolution as it embraces the representational capacity of neural networks, optimisation libraries and graphics processing units for learning complex mappings between data and inferential targets. The resulting tools are amortised, in the sense that they allow rapid inference through fast feedforward operations. In this article we review recent progress in the context of point estimation, approximate Bayesian inference, summary-statistic construction, and likelihood approximation. We also cover software, and include a simple illustration to showcase the wide array of tools available for amortised inference and the benefits they offer over Markov chain Monte Carlo methods. The article concludes with an overview of relevant topics and an outlook on future research directions.

inference, neural network, posterior distribution, (15 more...)

arXiv.org Machine Learning

2404.12484

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(11 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Atanasov, Alexander, Zavatone-Veth, Jacob A., Pehlevan, Cengiz

Scaling and renormalization in high-dimensional regression

arXiv.org Machine LearningJun-26-2024

This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models using the basic tools of random matrix theory and free probability. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning. Analytic formulas for the training and generalization errors are obtained in a few lines of algebra directly from the properties of the $S$-transform of free probability. This allows for a straightforward identification of the sources of power-law scaling in model performance. We compute the generalization error of a broad class of random feature models. We find that in all models, the $S$-transform corresponds to the train-test generalization gap, and yields an analogue of the generalized-cross-validation estimator. Using these techniques, we derive fine-grained bias-variance decompositions for a very general class of random feature models with structured covariates. These novel results allow us to discover a scaling regime for random feature models where the variance due to the features limits performance in the overparameterized setting. We also demonstrate how anisotropic weight structure in random feature models can limit performance and lead to nontrivial exponents for finite-width corrections in the overparameterized setting. Our results extend and provide a unifying perspective on earlier models of neural scaling laws.

df 1, matrix, random feature model, (14 more...)

arXiv.org Machine Learning

2405.00592

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Koshkin, Roman, Sudoh, Katsuhito, Nakamura, Satoshi

LLMs Are Zero-Shot Context-Aware Simultaneous Translators

The advent of transformers has fueled progress in machine translation. More recently large language models (LLMs) have come to the spotlight thanks to their generality and strong performance in a wide range of language tasks, including translation. Here we show that open-source LLMs perform on par with or better than some state-of-the-art baselines in simultaneous machine translation (SiMT) tasks, zero-shot. We also demonstrate that injection of minimal background information, which is easy with an LLM, brings further performance gains, especially on challenging technical subject-matter. This highlights LLMs' potential for building next generation of massively multilingual, context-aware and terminologically accurate SiMT systems that require no resource-intensive training or fine-tuning.

large language model, machine learning, translation, (20 more...)

2406.13476

Country:

Europe (1.00)
Asia > Middle East (0.93)
North America > United States (0.93)

Genre: Overview (1.00)

Industry:

Government (1.00)
Energy > Oil & Gas (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Kepel, Daan, Valogianni, Konstantina

Autonomous Prompt Engineering in Large Language Models

Prompt engineering is a crucial yet challenging task for optimizing the performance of large language models (LLMs) on customized tasks. This pioneering research introduces the Automatic Prompt Engineering Toolbox (APET), which enables GPT-4 to autonomously apply prompt engineering techniques. By leveraging sophisticated strategies such as Expert Prompting, Chain of Thought, and Tree of Thoughts, APET empowers GPT-4 to dynamically optimize prompts, resulting in substantial improvements in tasks like Word Sorting (4.4% increase) and Geometric Shapes (6.8% increase). Despite encountering challenges in complex tasks such as Checkmate in One (-14.8%), these findings demonstrate the transformative potential of APET in automating complex prompt optimization processes without the use of external data. Overall, this research represents a significant leap in AI development, presenting a robust framework for future innovations in autonomous AI systems and highlighting the ability of GPT-4 to bring prompt engineering theory to practice. It establishes a foundation for enhancing performance in complex task performance and broadening the practical applications of these techniques in real-world scenarios.

arxiv, gpt-4, llm, (11 more...)

2407.11

Country: Asia > Indonesia > Bali (0.04)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Overview (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models

Jin, Haibo, Hu, Leyang, Li, Xinuo, Zhang, Peiyan, Chen, Chonghan, Zhuang, Jun, Wang, Haohan

The rapid evolution of artificial intelligence (AI) through developments in Large Language Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements across various technological domains. While these models enhance capabilities in natural language processing and visual interactive tasks, their growing adoption raises critical concerns regarding security and ethical alignment. This survey provides an extensive review of the emerging field of jailbreaking--deliberately circumventing the ethical and operational boundaries of LLMs and VLMs--and the consequent development of defense mechanisms. Our study categorizes jailbreaks into seven distinct types and elaborates on defense strategies that address these vulnerabilities. Through this comprehensive examination, we identify research gaps and propose directions for future studies to enhance the security frameworks of LLMs and VLMs. Our findings underscore the necessity for a unified perspective that integrates both jailbreak strategies and defensive solutions to foster a robust, secure, and reliable environment for the next generation of language models. More details can be found on our website: \url{https://chonghan-chen.com/llm-jailbreak-zoo-survey/}.

arxiv preprint arxiv, language model, llm, (13 more...)

2407.01599

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > Idaho > Ada County > Boise (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.85)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)
Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Nandam, Srinivasa Rao, Atito, Sara, Feng, Zhenhua, Kittler, Josef, Awais, Muhammad

Pseudo Labelling for Enhanced Masked Autoencoders

Masked Image Modeling (MIM)-based models, such as SdAE, CAE, GreenMIM, and MixAE, have explored different strategies to enhance the performance of Masked Autoencoders (MAE) by modifying prediction, loss functions, or incorporating additional architectural components. In this paper, we propose an enhanced approach that boosts MAE performance by integrating pseudo labelling for both class and data tokens, alongside replacing the traditional pixel-level reconstruction with token-level reconstruction. This strategy uses cluster assignments as pseudo labels to promote instance-level discrimination within the network, while token reconstruction requires generation of discrete tokens encapturing local context. The targets for pseudo labelling and reconstruction needs to be generated by a teacher network. To disentangle the generation of target pseudo labels and the reconstruction of the token features, we decouple the teacher into two distinct models, where one serves as a labelling teacher and the other as a reconstruction teacher. This separation proves empirically superior to a single teacher, while having negligible impact on throughput and memory consumption. Incorporating pseudo-labelling as an auxiliary task has demonstrated notable improvements in ImageNet-1K and other downstream tasks, including classification, semantic segmentation, and detection.

pretext task, pseudo, reconstruction, (12 more...)

2406.1745

Country: Europe > United Kingdom > England > Surrey > Guildford (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

MT2ST: Adaptive Multi-Task to Single-Task Learning

Liu, Dong, Jiang, Meng

The conventional training approaches often face challenges in balancing the breadth of multi-task learning (MTL) with the depth of single-task learning (STL). To address this issue, we introduce the Multi-Task to Single-Task (MT2ST) framework, a groundbreaking approach that can combine the generalizability of MTL with the precision of STL. Our work include two strategies: 'Diminish' and 'Switch'. 'Diminish' Strategy will gradually reduce the influence of auxiliary tasks, while the 'Switch' strategy involves a shift from multi-tasking to single-tasking at a specific timepoint at the training process. In this paper, we propose the Multi-Task to Single-Task (MT2ST) framework, a novel approach that significantly enhances the efficiency and accuracy of word embedding training while concurrently addressing prevalent issues such as overfitting. Our empirical studies demonstrate that MT2ST can reduce training time by 67\% when contrasted with single-task learning approaches, and by 13\% compared to traditional multi-task learning methods. These findings underscore MT2ST's potential to be a powerful tools for word embedding training acceleration.

learning, mt2st, multi-task learning, (13 more...)

2406.18038

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(2 more...)

Genre:

Research Report > Promising Solution (0.54)
Overview > Innovation (0.54)
Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)