AITopics | part 4

part 4 Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Local Convergence of Approximate Newton Method for Two Layer Nonlinear Regression

Li, Zhihang, Song, Zhao, Wang, Zifan, Yin, Junze

arXiv.org Artificial Intelligence Nov-26-2023

There have been significant advancements made by large language models (LLMs) in various aspects of our daily lives. LLMs serve as a transformative force in natural language processing, finding applications in text generation, translation, sentiment analysis, and question-answering. The accomplishments of LLMs have led to a substantial increase in research efforts in this domain. One specific two-layer regression problem has been well-studied in prior works, where the first layer is activated by a ReLU unit, and the second layer is activated by a softmax unit. While previous works provide a solid analysis of building a two-layer regression, there is still a gap in the analysis of constructing regression problems with more than two layers. In this paper, we take a crucial step toward addressing this problem: we provide an analysis of a two-layer regression problem. In contrast to previous works, our first layer is activated by a softmax unit. This sets the stage for future analyses of creating more activation functions based on the softmax function. Rearranging the softmax function leads to significantly different analyses. Our main results involve analyzing the convergence properties of an approximate Newton method used to minimize the regularized training loss. We prove that the Hessian matrix of the loss function is positive definite and Lipschitz continuous under certain assumptions. This enables us to establish local convergence guarantees for the proposed training algorithm. Specifically, with an appropriate initialization and after $O(\log(1/\epsilon))$ iterations, our algorithm can find an $\epsilon$-approximate minimizer of the training loss with high probability. Each iteration requires approximately $O(\mathrm{nnz}(C) + d^\omega)$ time, where $d$ is the model size, $C$ is the input matrix, and $\omega < 2.374$ is the matrix multiplication exponent.
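The convergence claim in the abstract above can be illustrated with a small sketch. It does not use the paper's two-layer objective: the one-dimensional regularized loss, the `delta` factor that stands in for an approximate Hessian, and all function names are assumptions made for illustration only. The point is that when the Hessian is positive definite and only approximated within a constant factor, the gradient shrinks geometrically, so the iteration count grows like $\log(1/\epsilon)$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(x, lam):
    # Gradient of the toy loss f(x) = log(1 + e^x) - 0.3 x + (lam / 2) x^2
    return sigmoid(x) - 0.3 + lam * x

def hess(x, lam):
    s = sigmoid(x)
    return s * (1.0 - s) + lam  # strictly positive because lam > 0

def approx_newton(x0, lam=0.1, delta=0.2, eps=1e-10, max_iter=100):
    """Newton steps with the Hessian overestimated by a factor (1 + delta)."""
    x, iters = x0, 0
    while abs(grad(x, lam)) > eps and iters < max_iter:
        h_tilde = (1.0 + delta) * hess(x, lam)  # stand-in for an approximate Hessian
        x -= grad(x, lam) / h_tilde
        iters += 1
    return x, iters

x_star, n_iters = approx_newton(x0=0.0)
print(x_star, n_iters)
```

The regularization weight `lam` plays the role of the regularizer in the training loss: it keeps the second derivative bounded away from zero, which is what makes each approximate step a contraction near the minimizer.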

diag, second step follow, step follow, (14 more...)

arXiv.org Artificial Intelligence

2311.1539

Country:

North America > United States > Virginia (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre:

Workflow (0.70)
Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.64)

A Theoretical Insight into Attack and Defense of Gradient Leakage in Transformer

Li, Chenyang, Song, Zhao, Wang, Weixin, Yang, Chiwun

arXiv.org Artificial Intelligence Nov-22-2023

The Deep Leakage from Gradient (DLG) attack has emerged as a prevalent and highly effective method for extracting sensitive training data by inspecting exchanged gradients. This approach poses a substantial threat to the privacy of individuals and organizations alike. This research presents a comprehensive analysis of the gradient leakage method when applied specifically to transformer-based models. Through meticulous examination, we showcase the capability to accurately recover data solely from gradients and rigorously investigate the conditions under which gradient attacks can be executed, providing compelling evidence. Furthermore, we reevaluate the approach of introducing additional noise on gradients as a protective measure against gradient attacks. To address this, we outline a theoretical proof that analyzes the associated privacy costs within the framework of differential privacy. Additionally, we affirm the convergence of the Stochastic Gradient Descent (SGD) algorithm under perturbed gradients. The primary objective of this study is to augment the understanding of gradient leakage attack and defense strategies while actively contributing to the development of privacy-preserving techniques specifically tailored for transformer-based models. By shedding light on the vulnerabilities and countermeasures associated with gradient leakage, this research aims to foster advancements in safeguarding sensitive data and upholding privacy in the context of transformer-based models.
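The defense discussed in the abstract above, adding noise to gradients while still expecting SGD to converge, can be sketched as follows. This is the generic clip-then-add-Gaussian-noise recipe on a toy quadratic, not the paper's construction; the objective, the clipping bound `c`, the noise multiplier `sigma`, and the function names are all assumptions for illustration.

```python
import math
import random

def clip(g, c):
    """Scale gradient g so that its Euclidean norm is at most c."""
    norm = math.sqrt(sum(v * v for v in g))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [v * scale for v in g]

def noisy_sgd(target, steps=200, lr=0.1, c=1.0, sigma=0.0, seed=0):
    """SGD on f(w) = 0.5 * ||w - target||^2 with clipped, Gaussian-perturbed gradients."""
    rng = random.Random(seed)
    w = [0.0] * len(target)
    for _ in range(steps):
        g = clip([wi - ti for wi, ti in zip(w, target)], c)
        # Noise scaled by the clipping bound hides the exact gradient from an observer.
        g = [gi + rng.gauss(0.0, sigma * c) for gi in g]
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

w_clean = noisy_sgd([1.0, -2.0], sigma=0.0)  # no noise: plain clipped SGD
w_noisy = noisy_sgd([1.0, -2.0], sigma=0.5)  # perturbed gradients
print(w_clean, w_noisy)
```

With `sigma=0.0` the iterates settle on the target; with noise they fluctuate around it, which is the trade-off between privacy cost and accuracy that the abstract refers to.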

diag, exp, step follow, (12 more...)

arXiv.org Artificial Intelligence

2311.13624

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.13)
North America > United States > Virginia (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Fujian Province > Fuzhou (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.92)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Unified Scheme of ResNet and Softmax

Song, Zhao, Wang, Weixin, Yin, Junze

arXiv.org Machine Learning Sep-23-2023

Large language models (LLMs) have brought significant changes to human society. Softmax regression and residual neural networks (ResNet) are two important techniques in deep learning: they not only serve as significant theoretical components supporting the functionality of LLMs but also are related to many other machine learning and theoretical computer science fields, including but not limited to image classification, object detection, semantic segmentation, and tensors. Previous research works studied these two concepts separately. In this paper, we provide a theoretical analysis of the regression problem: $\| \langle \exp(Ax) + A x , {\bf 1}_n \rangle^{-1} ( \exp(Ax) + Ax ) - b \|_2^2$, where $A$ is a matrix in $\mathbb{R}^{n \times d}$, $b$ is a vector in $\mathbb{R}^n$, and ${\bf 1}_n$ is the $n$-dimensional vector whose entries are all $1$. This regression problem is a unified scheme that combines softmax regression and ResNet, which has never been done before. We derive the gradient, Hessian, and Lipschitz properties of the loss function. The Hessian is shown to be positive semidefinite, and its structure is characterized as the sum of a low-rank matrix and a diagonal matrix. This enables an efficient approximate Newton method. As a result, this unified scheme helps to connect two fields previously thought to be unrelated and provides novel insight into the loss landscape and optimization for emerging over-parameterized neural networks, which is meaningful for future research in deep learning models.
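To make the objective in the abstract above concrete, here is a minimal sketch that evaluates $\| \langle u, {\bf 1}_n \rangle^{-1} u - b \|_2^2$ with $u = \exp(Ax) + Ax$ on plain Python lists. The function name and the sample inputs are assumptions for illustration; nothing here reproduces the paper's gradient or Hessian analysis.

```python
import math

def unified_loss(A, x, b):
    """|| <u, 1_n>^{-1} u - b ||_2^2 with u = exp(Ax) + Ax."""
    Ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    u = [math.exp(v) + v for v in Ax]  # softmax-style term plus residual term
    total = sum(u)                     # <u, 1_n>; must be nonzero
    return sum((u_i / total - b_i) ** 2 for u_i, b_i in zip(u, b))

# Hypothetical inputs: n = 3 rows, d = 2 columns.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0 / 3.0] * 3
loss_at_zero = unified_loss(A, [0.0, 0.0], b)
loss_elsewhere = unified_loss(A, [1.0, 0.0], b)
print(loss_at_zero, loss_elsewhere)
```

At $x = 0$ every entry of $u$ equals 1, so the normalized vector is uniform and matches the uniform target `b`; moving `x` away from zero makes the normalized vector non-uniform and the loss positive.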

large language model, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2309.13482

Country:

North America > United States > Virginia (0.04)
Oceania > Australia > Western Australia > Perth (0.04)
Europe > Portugal (0.04)

Genre:

Research Report (0.64)
Workflow (0.52)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time

Gao, Yeqi, Song, Zhao, Wang, Weixin, Yin, Junze

arXiv.org Machine Learning Sep-14-2023

Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee for the one-layer attention network objective function $L(X,Y) = \sum_{j_0 = 1}^n \sum_{i_0 = 1}^d ( \langle \langle \exp( \mathsf{A}_{j_0} x ) , {\bf 1}_n \rangle^{-1} \exp( \mathsf{A}_{j_0} x ), A_{3} Y_{*,i_0} \rangle - b_{j_0,i_0} )^2$. Here $\mathsf{A} \in \mathbb{R}^{n^2 \times d^2}$ is the Kronecker product of $A_1 \in \mathbb{R}^{n \times d}$ and $A_2 \in \mathbb{R}^{n \times d}$, $A_3$ is a matrix in $\mathbb{R}^{n \times d}$, and $\mathsf{A}_{j_0} \in \mathbb{R}^{n \times d^2}$ is the $j_0$-th block of $\mathsf{A}$. The matrices $X, Y \in \mathbb{R}^{d \times d}$ are the variables we want to learn. $B$ is a matrix in $\mathbb{R}^{n \times d}$, $b_{j_0,i_0} \in \mathbb{R}$ is the entry in the $j_0$-th row and $i_0$-th column of $B$, $Y_{*,i_0} \in \mathbb{R}^d$ is the $i_0$-th column vector of $Y$, and $x \in \mathbb{R}^{d^2}$ is the vectorization of $X$. In a multi-layer LLM network, the matrix $B \in \mathbb{R}^{n \times d}$ can be viewed as the output of a layer, and $A_1= A_2 = A_3 \in \mathbb{R}^{n \times d}$ can be viewed as the input of a layer. The matrix version of $x$ can be viewed as $QK^\top$ and $Y$ can be viewed as $V$. We provide an iterative greedy algorithm to train the loss function $L(X,Y)$ up to $\epsilon$ that runs in $\widetilde{O}( ({\cal T}_{\mathrm{mat}}(n,n,d) + {\cal T}_{\mathrm{mat}}(n,d,d) + d^{2\omega}) \log(1/\epsilon) )$ time. Here ${\cal T}_{\mathrm{mat}}(a,b,c)$ denotes the time of multiplying an $a \times b$ matrix by a $b \times c$ matrix, and $\omega\approx 2.37$ denotes the exponent of matrix multiplication.
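The objective $L(X,Y)$ above can be evaluated directly in matrix form. The sketch below assumes a row-major vectorization of $X$, under which $\mathsf{A}_{j_0} x$ equals row $j_0$ of $A_1 X A_2^\top$; that convention, the helper names, and the tiny inputs are assumptions for illustration and are not taken from the paper. It only computes the loss and does not implement the paper's training algorithm.

```python
import math

def matmul(P, Q):
    """Plain list-of-lists matrix product."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def attention_loss(A1, A2, A3, X, Y, B):
    """Sum over (j0, i0) of (<softmax(row j0 of A1 X A2^T), column i0 of A3 Y> - B[j0][i0])^2."""
    A2_T = [list(col) for col in zip(*A2)]
    scores = matmul(matmul(A1, X), A2_T)  # n x n, plays the role of Q K^T
    V = matmul(A3, Y)                     # n x d, plays the role of the value matrix
    loss = 0.0
    for j0, row in enumerate(scores):
        m = max(row)
        e = [math.exp(s - m) for s in row]  # shift by the max for numerical stability
        z = sum(e)                          # <exp(.), 1_n>
        probs = [v / z for v in e]
        for i0 in range(len(V[0])):
            pred = sum(p * V[k][i0] for k, p in enumerate(probs))
            loss += (pred - B[j0][i0]) ** 2
    return loss

# Hypothetical inputs with n = 2, d = 1 and A1 = A2 = A3.
A = [[1.0], [2.0]]
loss_fit = attention_loss(A, A, A, [[0.0]], [[1.0]], [[1.5], [1.5]])
loss_off = attention_loss(A, A, A, [[0.0]], [[1.0]], [[1.5], [2.5]])
print(loss_fit, loss_off)
```

With $X = 0$ every attention row is uniform, so each prediction is the average of the rows of $A_3 Y$, here 1.5.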

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Machine Learning

2309.07418

Country:

North America > United States > Virginia (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
Asia > Middle East > Yemen > Amran Governorate > Amran (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Create & Sell AI Art -- Part 4. So putting everything together, he is…

#artificialintelligence Dec-13-2022, 22:36:46 GMT

So putting everything together, he is going to create a design and put it on Etsy. Specifically, he is going to sell wall art. For a starting point, he browsed the website Unsplash and picked a reference image. First, he converted the image to text, a step known as image captioning.

create & sell ai art, part 4

#artificialintelligence

Technology: Information Technology > Artificial Intelligence (0.89)

ABC of Deep Learning (Part 4 of 5)

#artificialintelligence Oct-22-2022, 08:52:33 GMT

Network Architecture refers to the overall neural network structure, including the number of layers, the number of neurons in each layer, and the connections between them. There is not a single best network architecture for all problems; the best architecture depends on the specific problem being solved. This is also a very active area of research, with new architectures being constantly proposed. Some wildly successful architectures have been proposed in recent years, such as convolutional neural networks and recurrent neural networks. The most advanced architectures are usually based on a combination of these basic types.

architecture, neural network, sequence, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Computer Systems Performance Modeling And Evaluation - Part 4

#artificialintelligence May-26-2022, 00:03:47 GMT

The QUIPS metric was developed in conjunction with the HINT benchmark program. It defines the quality of the solution as a user's final goal. The quality is rigorously defined on the basis of the mathematical characteristics of the problem being solved. Dividing the measure of solution quality by the time taken to achieve that level of quality produces QUIPS. It has several of the characteristics of a good performance metric. The mathematically precise definition of quality for the defined problem makes this metric insensitive to outside influences (characteristic 6) and entirely self-consistent when it is ported to different machines (characteristic 5).
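The definition above reduces to a single division, sketched here. The function name and the sample measurements are hypothetical; how HINT actually measures solution quality is not covered by this excerpt and is not modeled.

```python
def quips(quality, seconds):
    """Solution quality divided by the time taken to reach that quality."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return quality / seconds

# Hypothetical measurements: the same quality level reached on two machines.
machine_a = quips(quality=1200.0, seconds=4.0)
machine_b = quips(quality=1200.0, seconds=6.0)
print(machine_a, machine_b)
```

Because quality is fixed by the problem definition, the two results differ only through elapsed time, which is what makes the figure comparable across machines.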

computer system performance modeling, performance metric, system performance modeling and evaluation, (2 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence (0.40)

Text Summarization, Part 4 -- Twitter bot for Automatic Summarization of Paper Abstracts

#artificialintelligence Mar-25-2022, 01:15:33 GMT

The last three chapters (chapter 1, chapter 2 and chapter 3) were dedicated to the theoretical aspect and the underlying structure of various Text Summarization methods. This chapter shows a concrete…

automatic summarization, summarization, twitter bot, (10 more...)

#artificialintelligence

Genre: Research Report (0.32)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

APOSTLE TALK - Future News Now! : THERE'S MORE THAN ARTIFICIAL INTELLIGENCE – PART 4

#artificialintelligence May-12-2021, 06:13:44 GMT

Scientists have developed software that can look minutes into the future. Gravity in this galaxy [even outside our solar system] behaves as predicted by Albert Einstein's general theory of relativity, confirming the theory's validity on galactic scales. FYI: Our Sun is just ONE STAR among the hundreds of billions of stars in our Milky Way Galaxy. The technology is being built into the official postal system in countries like Mongolia, Ivory Coast and Nigeria, and Mercedes, an investor, is incorporating "What3Words" navigation into its cars. FYI: Each 10-foot-square patch of Planet Earth is labeled with three words -- 57 trillion squares altogether.

apostle talk, information technology, interface, (10 more...)

#artificialintelligence

Country:

Asia > Mongolia (0.25)
Africa > Nigeria (0.25)
Africa > Côte d'Ivoire (0.25)
North America > United States > California (0.05)

Genre: Instructional Material (0.31)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (0.74)

Technology: Information Technology > Artificial Intelligence > Cognitive Science (0.30)

Exotic Programming Ideas: Part 4 (Datalog)

#artificialintelligence Dec-5-2020, 14:10:14 GMT

Continuing on in our series on exotic programming ideas, we're going to explore the topic of logic programming and a particular form known as datalog. Datalog is executed by a query processor that, given these two inputs, finds all instances of facts implied by both the database and the rules. For our examples, we're going to be coding in the Souffle language. The namesake of the language is an acronym for the Systematic, Ontological, Undiscovered Fact Finding Logic Engine. Souffle is a minimalist datalog system designed for complex queries over large data sets, such as those encountered in the context of doing static program analysis over large codebases.
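The query-processor idea above, deriving every fact implied by a database of facts and a set of rules, can be sketched as a naive fixpoint loop. This is written in Python rather than Souffle, the two rules in the docstring are in generic Datalog notation rather than checked Souffle syntax, and the `edge`/`path` relations are a hypothetical example that does not come from the article.

```python
def transitive_closure(edges):
    """Naive bottom-up evaluation of:
         path(x, y) :- edge(x, y).
         path(x, z) :- path(x, y), edge(y, z).
    """
    path = set(edges)  # first rule: every edge is a path
    while True:
        # Second rule: join the current path facts with edge on the shared variable y.
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_facts = derived - path
        if not new_facts:
            return path  # fixpoint: no rule produces anything new
        path |= new_facts

edges = {("a", "b"), ("b", "c"), ("c", "d")}
paths = transitive_closure(edges)
print(sorted(paths))
```

The loop stops once an iteration adds no new facts. A production engine would avoid re-deriving known facts on every pass, but the derived set is the same.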

datalog, relation, soufflé, (16 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.36)
Information Technology > Software > Programming Languages (0.30)
