Collaborating Authors

 Ma, Siyuan


Patients Speak, AI Listens: LLM-based Analysis of Online Reviews Uncovers Key Drivers for Urgent Care Satisfaction

arXiv.org Artificial Intelligence

Investigating the public experience of urgent care facilities is essential for promoting community healthcare development. Traditional survey methods often fall short due to limited scope, time, and spatial coverage. Crowdsourcing through online reviews or social media offers a valuable approach to gaining such insights. With recent advancements in large language models (LLMs), extracting nuanced perceptions from reviews has become feasible. This study collects Google Maps reviews across the DMV and Florida areas and conducts prompt engineering with the GPT model to analyze the aspect-based sentiment of urgent care. We first analyze the geospatial patterns of various aspects, including interpersonal factors, operational efficiency, technical quality, finances, and facilities. Next, we determine Census Block Group (CBG)-level characteristics underpinning differences in public perception, including population density, median income, GINI Index, rent-to-income ratio, household below poverty rate, no insurance rate, and unemployment rate. Our results show that interpersonal factors and operational efficiency emerge as the strongest determinants of patient satisfaction in urgent care, while technical quality, finances, and facilities show no significant independent effects when adjusted for in multivariate models. Among socioeconomic and demographic factors, only population density demonstrates a significant but modest association with patient ratings, while the remaining factors exhibit no significant correlations. Overall, this study highlights the potential of crowdsourcing to uncover the key factors that matter to residents and provide valuable insights for stakeholders to improve public satisfaction with urgent care.
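As a concrete illustration of the aspect-based sentiment step described above, here is a minimal, hedged Python sketch that prompts a GPT chat model to label each aspect of a single review. The prompt wording, the model name, the JSON schema, and the `aspect_sentiment` helper are illustrative assumptions, not the study's actual pipeline.

```python
# Hypothetical sketch of aspect-based sentiment extraction from one review.
import json
from openai import OpenAI

ASPECTS = ["interpersonal factors", "operational efficiency",
           "technical quality", "finances", "facilities"]

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def aspect_sentiment(review_text: str) -> dict:
    prompt = (
        "For the urgent care review below, label each aspect as "
        "positive, negative, neutral, or not mentioned. "
        f"Aspects: {', '.join(ASPECTS)}. "
        "Answer as a JSON object mapping each aspect to its label.\n\n"
        f"Review: {review_text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name, not necessarily the study's GPT model
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

# e.g. aspect_sentiment("Staff were kind, but we waited two hours to be seen.")
```

The per-review labels would then be aggregated by facility and joined to CBG-level covariates for the regression analysis described in the abstract.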


Fast training of large kernel models with delayed projections

arXiv.org Machine Learning

Classical kernel machines have historically faced significant challenges in scaling to large datasets and model sizes, a key ingredient that has driven the success of neural networks. In this paper, we present a new methodology for building kernel machines that can scale efficiently with both data size and model size. Our algorithm introduces delayed projections to Preconditioned Stochastic Gradient Descent (PSGD), allowing the training of much larger models than was previously feasible and pushing the practical limits of kernel-based learning. Kernel machines have also served as the foundation for understanding many significant phenomena in machine learning; despite these advantages, their scalability has remained a persistent challenge, particularly when applied to large datasets, and addressing this limitation is critical for expanding the utility of kernel-based techniques in modern machine learning applications. Recent methods leverage the Nyström Approximation (NA) in combination with other strategies to enhance performance: ASkotch combines it with block coordinate descent, whereas Falkon combines it with the Conjugate Gradient method. However, these strategies are limited by model size due to memory constraints.
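To make the idea of delayed projections concrete, the following is a minimal NumPy sketch, not the authors' implementation, and it omits the preconditioning: each SGD step adds coefficients on the current batch points, and only every `proj_every` steps is that accumulated mass projected back onto a fixed set of centers `Z` by a small kernel least-squares solve. The kernel, step size, projection frequency, and center set are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def train_delayed_projection(X, y, Z, lr=0.1, proj_every=20, epochs=2, batch=64, seed=0):
    rng = np.random.default_rng(seed)
    alpha = np.zeros(len(Z))           # coefficients on the fixed centers Z
    extra_pts, extra_cf = [], []       # un-projected coefficients on recent batch points
    for _ in range(epochs):
        for step, idx in enumerate(
                np.array_split(rng.permutation(len(X)), max(1, len(X) // batch))):
            Xb, yb = X[idx], y[idx]
            pred = gaussian_kernel(Xb, Z) @ alpha
            if extra_pts:
                pred += gaussian_kernel(Xb, np.vstack(extra_pts)) @ np.concatenate(extra_cf)
            resid = pred - yb
            extra_pts.append(Xb)                    # the SGD step adds coefficients on the batch
            extra_cf.append(-lr * resid / len(idx))
            if (step + 1) % proj_every == 0:        # delayed projection back onto span{K(., Z)}
                P = np.vstack(extra_pts)
                c = np.concatenate(extra_cf)
                Kzz = gaussian_kernel(Z, Z)
                Kzp = gaussian_kernel(Z, P)
                alpha += np.linalg.solve(Kzz + 1e-6 * np.eye(len(Z)), Kzp @ c)
                extra_pts, extra_cf = [], []
    # any leftover un-projected mass is dropped in this simplified sketch
    return alpha
```

Batching the projections is what keeps the memory footprint tied to the center set rather than to every visited training point, which is the scaling benefit the abstract refers to.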


Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character

arXiv.org Artificial Intelligence

With the advent and widespread deployment of Multimodal Large Language Models (MLLMs), ensuring their safety has become increasingly critical. Achieving this objective requires proactively discovering vulnerabilities in MLLMs by exploring attack methods. Thus, structure-based jailbreak attacks, where harmful semantic content is embedded within images, have been proposed to mislead the models. However, previous structure-based jailbreak methods mainly focus on transforming the format of malicious queries, such as converting harmful content into images through typography, which lacks sufficient jailbreak effectiveness and generalizability. To address these limitations, we first introduce the concept of "Role-play" into MLLM jailbreak attacks and propose a novel and effective method called Visual Role-play (VRP). Specifically, VRP leverages Large Language Models to generate detailed descriptions of high-risk characters and create corresponding images based on the descriptions. When paired with benign role-play instruction texts, these high-risk character images effectively mislead MLLMs into generating malicious responses by enacting characters with negative attributes. We further extend our VRP method into a universal setup to demonstrate its generalizability. Extensive experiments on popular benchmarks show that VRP outperforms the strongest baselines, Query relevant and FigStep, by an average Attack Success Rate (ASR) margin of 14.3% across all models.


Understanding News Creation Intents: Frame, Dataset, and Method

arXiv.org Artificial Intelligence

Amid disruptive changes in the media economy and the proliferation of alternative news media outlets, news intent has progressively deviated from ethical standards that serve the public interest. News intent refers to the purpose or intention behind the creation of a news article. While the significance of research on news intent has been widely acknowledged, the absence of a systematic news intent understanding framework hinders further exploration of news intent and its downstream applications. To bridge this gap, we propose the News INTent (NINT) frame, the first component-aware formalism for understanding news creation intent, grounded in research in philosophy, psychology, and cognitive science. Within this frame, we define the news intent identification task and provide a benchmark dataset with fine-grained labels along with an efficient benchmark method. Experiments demonstrate that NINT is beneficial in both the intent identification task and downstream tasks that demand a profound understanding of news. This work marks a foundational step towards a more systematic exploration of news creation intents.


Summary of ChatGPT-Related Research and Perspective Towards the Future of Large Language Models

arXiv.org Artificial Intelligence

This paper presents a comprehensive survey of ChatGPT-related (GPT-3.5 and GPT-4) research, state-of-the-art large language models (LLMs) from the GPT series, and their prospective applications across diverse domains. Indeed, key innovations such as large-scale pre-training that captures knowledge across the entire World Wide Web, instruction fine-tuning, and Reinforcement Learning from Human Feedback (RLHF) have played significant roles in enhancing LLMs' adaptability and performance. We performed an in-depth analysis of 194 relevant papers on arXiv, encompassing trend analysis, word cloud representation, and distribution analysis across various application domains. The findings reveal a significant and increasing interest in ChatGPT-related research, predominantly centered on direct natural language processing applications, while also demonstrating considerable potential in areas ranging from education and history to mathematics, medicine, and physics. This study endeavors to furnish insights into ChatGPT's capabilities, potential implications, and ethical concerns, and to offer direction for future advancements in this field.


Learning the joint distribution of two sequences using little or no paired data

arXiv.org Artificial Intelligence

A classical ASR approach treats the process of generating speech as a noisy channel. In this framing, text is drawn from some distribution and statistically transformed into speech audio; the speech recognition task is then to invert this generative model to infer the text most likely to have given rise to a given speech waveform. This generative model of speech was historically successful (Baker, 1975; Jelinek, 1976; Rabiner, 1989), but has been superseded in modern discriminative systems by directly modeling the conditional distribution of text, given speech (Graves et al., 2006; Amodei et al., 2016). The direct approach has the advantage of allowing limited modeling power to be solely devoted to the task of interest, whereas the generative one can be extremely sensitive to faulty assumptions in the speech audio model despite the fact that this is not the primary object of interest. However, the generative approach allows learning in a principled way from untranscribed speech audio, something fundamentally impossible in the direct approach. We present a noisy channel generative model of two sequences, for example text and speech, which enables uncovering the association between the two modalities when limited paired data is available. To address the intractability of the exact model under a realistic data setup, we propose a variational inference approximation. To train this variational model with categorical data, we propose a KL encoder loss approach which has connections to the wake-sleep algorithm. Identifying the joint or conditional distributions by only observing unpaired samples from the marginals is only possible under certain conditions in the data distribution, and we discuss under what type of conditional independence assumptions this might be possible.
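The core objects described above can be written down compactly. The sketch below uses assumed notation (x for text, y for speech, q for the variational posterior); it states only the noisy-channel factorization and the standard evidence lower bound used when only unpaired speech is observed, not the paper's exact objective or the KL encoder loss.

```latex
% Minimal sketch in assumed notation, not the paper's full objective.
\begin{align}
  p_\theta(x, y) &= p_\theta(x)\, p_\theta(y \mid x)
    && \text{noisy channel: text } x \text{ generates speech } y \\
  \log p_\theta(y) &\ge
    \mathbb{E}_{q_\phi(x \mid y)}\!\bigl[\log p_\theta(x) + \log p_\theta(y \mid x)
      - \log q_\phi(x \mid y)\bigr]
    && \text{ELBO for unpaired speech } y
\end{align}
```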


Reconciling modern machine learning and the bias-variance trade-off

arXiv.org Machine Learning

The question of generalization in machine learning---how algorithms are able to learn predictors from a training sample to make accurate predictions out-of-sample---is revisited in light of the recent breakthroughs in modern machine learning technology. The classical approach to understanding generalization is based on bias-variance trade-offs, where model complexity is carefully calibrated so that the fit on the training sample reflects performance out-of-sample. However, it is now common practice to fit highly complex models like deep neural networks to data with (nearly) zero training error, and yet these interpolating predictors are observed to have good out-of-sample accuracy even for noisy data. How can the classical understanding of generalization be reconciled with these observations from modern machine learning practice? In this paper, we bridge the two regimes by exhibiting a new "double descent" risk curve that extends the traditional U-shaped bias-variance curve beyond the point of interpolation. Specifically, the curve shows that as soon as the model complexity is high enough to achieve interpolation on the training sample---a point that we call the "interpolation threshold"---the risk of suitably chosen interpolating predictors from these models can, in fact, be decreasing as the model complexity increases, often below the risk achieved using non-interpolating models. The double descent risk curve is demonstrated for a broad range of models, including neural networks and random forests, and a mechanism for producing this behavior is posited.
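A hedged sketch of the kind of random-feature experiment that typically reproduces the double descent curve described above: the test error of the minimum-norm least-squares fit is tracked as the number of random ReLU features sweeps through the interpolation threshold (number of features comparable to the number of training samples). The dataset, noise level, and feature counts below are illustrative choices, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d, noise = 200, 2000, 10, 0.5

def make_data(n):
    X = rng.standard_normal((n, d))
    y = np.sin(X[:, 0]) + noise * rng.standard_normal(n)
    return X, y

Xtr, ytr = make_data(n_train)
Xte, yte = make_data(n_test)

for n_feat in [10, 50, 100, 190, 200, 210, 400, 1000, 4000]:
    W = rng.standard_normal((d, n_feat))                      # fixed random first-layer weights
    Ftr, Fte = np.maximum(Xtr @ W, 0), np.maximum(Xte @ W, 0)  # random ReLU features
    coef, *_ = np.linalg.lstsq(Ftr, ytr, rcond=None)           # minimum-norm least-squares solution
    test_mse = np.mean((Fte @ coef - yte) ** 2)
    print(f"{n_feat:5d} features  test MSE = {test_mse:.3f}")
```

In such runs the test error usually peaks near n_feat = n_train (the interpolation threshold) and then decreases again as the model is made more complex, matching the shape of the double descent curve.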


On exponential convergence of SGD in non-convex over-parametrized learning

arXiv.org Machine Learning

Stochastic Gradient Descent and its variants have become a staple of the algorithmic foundations of machine learning. Yet many of its properties are not fully understood, particularly in non-convex settings common in modern practice. In this note, we study convergence of Stochastic Gradient Descent (SGD) for the class of functions satisfying the Polyak-Lojasiewicz (PL) condition. This class contains all strongly-convex functions as well as a broad range of non-convex functions including those used in machine learning applications (see the discussion below). The primary purpose of this note is to show that in the interpolation setting (common in modern overparametrized machine learning and studied in our previous work [8]) SGD with fixed step size has exponential convergence for the functions satisfying the PL condition. To the best of our knowledge, this is the first such exponential convergence result for a class of non-convex functions. Below, we discuss and highlight a number of aspects of the PL condition which differentiate it from the convex setting and make it more relevant to the practice and requirements of many machine learning problems.
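For reference, a sketch of the statement discussed above, written with generic constants; the precise step-size condition and rate constant are given in the paper and are not reproduced here.

```latex
% PL condition and the form of the exponential (linear) convergence guarantee;
% c > 0 is a generic constant standing in for the paper's exact expression.
\begin{align}
  \tfrac{1}{2}\,\|\nabla f(w)\|^2 &\ge \mu\,\bigl(f(w) - f^*\bigr)
    && \text{PL condition with parameter } \mu > 0 \\
  \mathbb{E}\!\left[f(w_t) - f^*\right] &\le (1 - c\,\mu\,\eta)^{\,t}\,\bigl(f(w_0) - f^*\bigr)
    && \text{SGD with fixed step size } \eta \text{ in the interpolation setting}
\end{align}
```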


Kernel Machines Beat Deep Neural Networks on Mask-based Single-channel Speech Enhancement

arXiv.org Machine Learning

We apply a fast kernel method to mask-based single-channel speech enhancement. Specifically, our method solves a kernel regression problem associated with a non-smooth kernel function (the exponential power kernel) using a highly efficient iterative method (EigenPro). Due to the simplicity of this method, its hyper-parameters, such as the kernel bandwidth, can be automatically and efficiently selected using line search on subsamples of the training data. We observe an empirical correlation between the regression loss (mean square error) and standard metrics for speech enhancement. This observation justifies our training target and motivates us to achieve lower regression loss by training a separate kernel model per frequency subband. We compare our method with state-of-the-art deep neural networks on mask-based speech enhancement using the HINT and TIMIT datasets. Experimental results show that our kernel method consistently outperforms deep neural networks while requiring less training time.
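As a minimal sketch of the per-subband regression setup, not the paper's EigenPro solver, the following fits kernel ridge regression with an exponential power kernel independently for each frequency subband to predict a time-frequency mask. The bandwidth, the exponent gamma, the ridge term, and the helper names are assumptions.

```python
import numpy as np

def exp_power_kernel(A, B, bandwidth=2.0, gamma=1.0):
    # k(a, b) = exp(-(||a - b|| / bandwidth) ** gamma); gamma = 1 gives the Laplacian kernel.
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
    return np.exp(-(d / bandwidth) ** gamma)

def fit_subband(features, mask, ridge=1e-3):
    """features: (frames, dim) noisy-speech features; mask: (frames,) target mask values."""
    K = exp_power_kernel(features, features)
    return np.linalg.solve(K + ridge * np.eye(len(K)), mask)

def predict_subband(alpha, train_features, new_features):
    return exp_power_kernel(new_features, train_features) @ alpha

# One independent model per frequency subband, as described above:
# alphas = [fit_subband(F[b], M[b]) for b in range(n_subbands)]
```

The bandwidth could then be tuned by the kind of line search over subsampled training data mentioned in the abstract, using the regression loss as the selection criterion.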


Learning kernels that adapt to GPU

arXiv.org Machine Learning

In recent years machine learning methods that nearly interpolate the data have achieved remarkable success. In many settings achieving near-zero training error leads to excellent test results. In this work we show how the mathematical and conceptual simplicity of interpolation can be harnessed to construct a framework for very efficient, scalable and accurate kernel machines. Our main innovation is in constructing kernel machines that output solutions mathematically equivalent to those obtained using standard kernels, yet capable of fully utilizing the available computing power of a parallel computational resource, such as a GPU. Such utilization is key to strong performance, since much of the capability of the computational resource is wasted by standard iterative methods. The computational resource and data adaptivity of our learned kernels is based on theoretical convergence bounds. The resulting algorithm, which we call EigenPro 2.0, is accurate, principled and very fast. For example, using a single GPU, training on ImageNet with $1.3\times 10^6$ data points and $1000$ labels takes under an hour, while smaller datasets, such as MNIST, take seconds. Moreover, as the parameters are chosen analytically, based on the theory, little tuning beyond selecting the kernel and kernel parameter is needed, further facilitating the practical use of these methods.
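The sketch below is a hedged, full-batch simplification of the EigenPro-style idea described above, not the EigenPro 2.0 algorithm itself: the top eigendirections of the kernel matrix are damped in the update so that a much larger step size remains stable. The real method is stochastic, estimates the preconditioner from a subsample, and chooses q and the step size analytically; the values here are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def eigenpro_like_fit(X, y, q=20, steps=200, bandwidth=1.0):
    n = len(X)
    K = gaussian_kernel(X, X, bandwidth)
    vals, vecs = np.linalg.eigh(K / n)              # ascending eigendecomposition
    vals, vecs = vals[::-1], vecs[:, ::-1]          # reorder to descending
    E, lam, lam_next = vecs[:, :q], vals[:q], vals[q]
    eta = 0.9 / lam_next                            # far larger than the unpreconditioned 0.9 / vals[0]
    alpha = np.zeros(n)
    for _ in range(steps):
        g = (K @ alpha - y) / n                     # gradient of the least-squares objective
        g = g - E @ ((1.0 - lam_next / lam) * (E.T @ g))   # damp the top-q eigendirections
        alpha -= eta * g
    return alpha                                    # predictor: f(x) = sum_i alpha_i K(x, x_i)
```

Damping the top spectrum is what lets the step size scale with the (q+1)-th eigenvalue instead of the largest one, which is the source of the speedups the abstract reports.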