part 3
Capturing Sparks of Abstraction for the ARC Challenge
Excellent progress has been made recently in solving ARC Challenge problems. However, it seems that new techniques may be required to push beyond 60% accuracy. Even commercial Large Language Models (LLMs) struggle to 'understand' many of the problems (when given the input and output grids), which makes discovering solutions by LLM-led program search somewhat futile. In this work, LLM 'understanding' is attempted from a stronger starting position: an LLM is given complete solutions to tasks in code, and is then asked to explain how the task is being solved at various levels of abstraction. Specifically, the LLM was given code solutions implemented in arc-dsl-llm (an LLM-legible version of Hodel's arc-dsl) to obtain: (a) commented code; (b) code refactored into reusable functional chunks; (c) problem solution steps; and (d) high-level problem-solving tactics. We demonstrate that 'Sparks of Abstraction' can be extracted from the LLM output - in a form that could be used in downstream tasks with Local LLMs eligible to enter the ARC Prize. Both the arc-dsl-llm DSL framework (with the re-engineered solutions) and the Gemini LLM-generated data (along with the generation code) are made Open Source.
- Workflow (0.68)
- Research Report (0.65)
Quantum Speedup for Spectral Approximation of Kronecker Products
Gao, Yeqi, Song, Zhao, Zhang, Ruizhe
Given its widespread application in machine learning and optimization, the Kronecker product emerges as a pivotal linear algebra operator. However, its computational demands render it an expensive operation, making its spectral approximation costly with traditional classical algorithms. Existing classical methods for spectral approximation exhibit a linear dependency on the matrix dimension denoted by $n$, considering matrices of size $A_1 \in \mathbb{R}^{n \times d}$ and $A_2 \in \mathbb{R}^{n \times d}$. Our work introduces an innovative approach to efficiently address the spectral approximation of the Kronecker product $A_1 \otimes A_2$ using quantum methods. By treating matrices as quantum states, our proposed method significantly reduces the time complexity of spectral approximation to $O_{d,\epsilon}(\sqrt{n})$.
- Overview (0.87)
- Research Report > Promising Solution (0.34)
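To make the cost that motivates the abstract above concrete, here is a short numpy sketch (sizes chosen purely for illustration): forming $A_1 \otimes A_2$ explicitly squares the row count, while its Gram matrix factors exactly without ever materializing the product.

```python
import numpy as np

# Illustrative sizes; the paper considers A1, A2 in R^{n x d}.
n, d = 100, 3
rng = np.random.default_rng(0)
A1 = rng.standard_normal((n, d))
A2 = rng.standard_normal((n, d))

# The Kronecker product has n^2 rows and d^2 columns, so forming it
# explicitly costs Theta(n^2 d^2) memory -- the blowup that motivates
# sketching / spectral approximation.
K = np.kron(A1, A2)
assert K.shape == (n * n, d * d)

# The Gram matrix of K factors via the mixed-product property:
#   K^T K = (A1 (x) A2)^T (A1 (x) A2) = (A1^T A1) (x) (A2^T A2)
G_direct = K.T @ K
G_factored = np.kron(A1.T @ A1, A2.T @ A2)
```

A spectral approximation seeks a much smaller matrix $S$ with $S^\top S \approx (1 \pm \epsilon)\, K^\top K$ in the spectral sense; the identity above is why structure-aware methods can avoid touching all $n^2$ rows.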
Local Convergence of Approximate Newton Method for Two Layer Nonlinear Regression
Li, Zhihang, Song, Zhao, Wang, Zifan, Yin, Junze
There have been significant advancements made by large language models (LLMs) in various aspects of our daily lives. LLMs serve as a transformative force in natural language processing, finding applications in text generation, translation, sentiment analysis, and question-answering. The accomplishments of LLMs have led to a substantial increase in research efforts in this domain. One specific two-layer regression problem has been well-studied in prior works, where the first layer is activated by a ReLU unit, and the second layer is activated by a softmax unit. While previous works provide a solid analysis of building a two-layer regression, there is still a gap in the analysis of constructing regression problems with more than two layers. In this paper, we take a crucial step toward addressing this problem: we provide an analysis of a two-layer regression problem. In contrast to previous works, our first layer is activated by a softmax unit. This sets the stage for future analyses of creating more activation functions based on the softmax function. Rearranging the softmax function leads to significantly different analyses. Our main results involve analyzing the convergence properties of an approximate Newton method used to minimize the regularized training loss. We prove that the Hessian of the loss function is positive definite and Lipschitz continuous under certain assumptions. This enables us to establish local convergence guarantees for the proposed training algorithm. Specifically, with an appropriate initialization and after $O(\log(1/\epsilon))$ iterations, our algorithm can find an $\epsilon$-approximate minimizer of the training loss with high probability. Each iteration requires approximately $O(\mathrm{nnz}(C) + d^\omega)$ time, where $d$ is the model size, $C$ is the input matrix, and $\omega < 2.374$ is the matrix multiplication exponent.
- Workflow (0.70)
- Research Report (0.49)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.64)
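As a hedged illustration of the kind of guarantee the abstract describes (not the paper's softmax setting), the sketch below runs an exact Newton iteration on an $\ell_2$-regularized logistic loss, whose Hessian is likewise positive definite; a handful of iterations suffice to reach a near-stationary point, mirroring the $O(\log(1/\epsilon))$ local-convergence behaviour.

```python
import numpy as np

# Toy problem: l2-regularized logistic regression. All sizes and data
# are illustrative; the paper analyzes a softmax-activated first layer.
rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))   # labels in {-1, +1}
t = (y + 1) / 2                           # same labels in {0, 1}
lam = 1.0                                 # regularizer keeps the Hessian PD

def grad_hess(w):
    p = 1.0 / (1.0 + np.exp(-X @ w))      # sigmoid probabilities
    g = X.T @ (p - t) + lam * w           # gradient of regularized loss
    S = p * (1.0 - p)                     # per-sample curvature
    H = X.T @ (S[:, None] * X) + lam * np.eye(d)
    return g, H

w = np.zeros(d)
for _ in range(10):                       # locally, O(log(1/eps)) steps
    g, H = grad_hess(w)
    w = w - np.linalg.solve(H, g)         # Newton update
```

An approximate Newton method replaces the exact solve above with a cheaper sketched or sampled Hessian; the positive-definiteness and Lipschitz conditions are what make that substitution safe.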
A Theoretical Insight into Attack and Defense of Gradient Leakage in Transformer
Li, Chenyang, Song, Zhao, Wang, Weixin, Yang, Chiwun
The Deep Leakage from Gradient (DLG) attack has emerged as a prevalent and highly effective method for extracting sensitive training data by inspecting exchanged gradients. This approach poses a substantial threat to the privacy of individuals and organizations alike. This research presents a comprehensive analysis of the gradient leakage method when applied specifically to transformer-based models. Through meticulous examination, we showcase the capability to accurately recover data solely from gradients and rigorously investigate the conditions under which gradient attacks can be executed, providing compelling evidence. Furthermore, we reevaluate the approach of introducing additional noise on gradients as a protective measure against gradient attacks. To address this, we outline a theoretical proof that analyzes the associated privacy costs within the framework of differential privacy. Additionally, we affirm the convergence of the Stochastic Gradient Descent (SGD) algorithm under perturbed gradients. The primary objective of this study is to augment the understanding of gradient leakage attacks and defense strategies while actively contributing to the development of privacy-preserving techniques specifically tailored for transformer-based models. By shedding light on the vulnerabilities and countermeasures associated with gradient leakage, this research aims to foster advancements in safeguarding sensitive data and upholding privacy in the context of transformer-based models.
- Overview (1.00)
- Research Report > New Finding (0.92)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
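The leakage phenomenon behind the paper above is easy to demonstrate in the simplest possible case (a single linear layer and one training example, not the paper's transformer setting): the weight gradient is rank one, so the private input is recoverable from the shared gradient up to sign and scale.

```python
import numpy as np

# For y = W x on a single example, the weight gradient is
#   dL/dW = delta x^T  (delta = dL/dy),
# a rank-1 matrix whose right-singular vector is x / ||x||.
rng = np.random.default_rng(2)
d_in, d_out = 8, 4
W = rng.standard_normal((d_out, d_in))
x = rng.standard_normal(d_in)           # the "private" training input
y_target = rng.standard_normal(d_out)

delta = W @ x - y_target                # dL/dy for squared loss
grad_W = np.outer(delta, x)             # the gradient an attacker observes

# Attack: read x off the top right-singular vector of the gradient.
_, _, Vt = np.linalg.svd(grad_W)
x_rec = Vt[0]                           # equals +/- x / ||x||
x_unit = x / np.linalg.norm(x)
```

Batching, nonlinearities, and deeper architectures break this closed form, which is why attacks on transformers (and the noise-based defenses analyzed above) require a more careful treatment.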
Physics of Language Models: Part 3.2, Knowledge Manipulation
Allen-Zhu, Zeyuan, Li, Yuanzhi
Language models can store vast amounts of factual knowledge, but their ability to use this knowledge for logical reasoning remains questionable. This paper explores a language model's ability to manipulate its stored knowledge during inference. We focus on four manipulation types: retrieval (e.g., "What is person A's attribute X?"), classification (e.g., "Is A's attribute X even or odd?"), comparison (e.g., "Is A greater than B in attribute X?"), and inverse search (e.g., "Which person's attribute X equals T?"). We observe that pre-trained language models like GPT2/3/4 excel in knowledge retrieval but struggle with simple classification or comparison tasks unless Chain of Thoughts (CoTs) are employed during both training and inference. They also perform poorly in inverse knowledge search, irrespective of the prompts. Our primary contribution is a synthetic dataset for a controlled experiment that confirms these inherent weaknesses: a language model cannot efficiently manipulate knowledge from pre-training data, even when such knowledge is perfectly stored and fully extractable in the models, and despite adequate instruct fine-tuning.
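The four manipulation types can be stated concretely over a toy fact table (the names and values below are invented for illustration); what is trivial as code is exactly what the paper finds hard for models relying on stored knowledge alone.

```python
# Toy "knowledge base": attribute X = birth year. All entries invented.
facts = {"Anna": 1992, "Ben": 1987, "Cara": 2001}

retrieval = facts["Anna"]                    # "What is Anna's attribute X?"
classification = facts["Anna"] % 2 == 0      # "Is Anna's X even or odd?"
comparison = facts["Anna"] > facts["Ben"]    # "Is Anna greater than Ben in X?"
inverse = [p for p, v in facts.items() if v == 2001]  # "Whose X equals 2001?"
```

Retrieval reads a stored value directly; the other three require computing *on* the value, which is where the paper observes failures without Chain of Thought.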
How to Train Time Series Forecasting Faster using Ray, part 3 of 3
Even in the current age of Generative AI (Stable Diffusion, ChatGPT) and LLMs (large language models), Time Series Forecasting is still a fundamental part of running any business that depends on a supply chain or resources. One thing all these use cases have in common is training many models on different segments of data. Training, tuning, and deploying thousands of machine learning models in parallel using distributed computing can be a challenging task! Typical time series modeling software is not distributed by itself. This blog shares my tips for getting started converting your forecasting workloads to distributed computing.
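The "one model per data segment" pattern the post distributes with Ray has a simple shape. As an illustrative stand-in (a stdlib thread pool plays the role of the workers here; with Ray, `train_segment` would be an `@ray.remote` task and `pool.map` would become `ray.get` over submitted tasks), it looks like this:

```python
from concurrent.futures import ThreadPoolExecutor

def train_segment(segment):
    """Fit one model per data segment. Toy 'model': forecast the next
    value as the mean of the series. All data below is made up."""
    name, series = segment
    forecast = sum(series) / len(series)
    return name, forecast

segments = {
    "store_1": [10, 12, 11, 13],
    "store_2": [5, 7, 6, 8],
}

# Each segment trains independently, so the work parallelizes trivially.
with ThreadPoolExecutor() as pool:
    models = dict(pool.map(train_segment, segments.items()))
```

Because the segments share nothing, scaling from a local pool to a Ray cluster changes only where the tasks run, not the structure of the code.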
Continuations by Albert Wenger : Thinking About AI: Part 3 - Existential Risk...
Now we are getting to the biggest and weirdest risk of AI: a superintelligence emerging and wiping out humanity in pursuit of its own goals. To a lot of people this seems like a totally absurd idea, held only by a tiny fringe of people who appear weird and borderline culty. It seems so far out there, and also so huge, that most people wind up dismissing it and/or forgetting about it shortly after hearing it. There is a big similarity here to the climate crisis, where the more extreme views are widely dismissed. In case you have not encountered the argument yet, let me give a very brief summary (Nick Bostrom has an entire book on the topic and Eliezer Yudkowsky has been blogging about it for two decades, so this will be super compressed by comparison): A superintelligence, when it emerges, will pursue its own set of goals.
Optimize AI/ML workloads for sustainability: Part 3, deployment and monitoring
We're celebrating Earth Day 2022 from 4/22 through 4/29 with posts that highlight how to build, maintain, and refine your workloads for sustainability. AWS estimates that inference (the process of using a trained machine learning [ML] algorithm to make a prediction) makes up 90 percent of the cost of an ML model. Given that with AWS you pay for what you use, we estimate that inference also generally accounts for most of the resource usage within an ML lifecycle. In Part 3, our final piece in the series, we show you how to reduce the environmental impact of your ML workload once your model is in production. If you missed the first parts of this series, in Part 1 we showed you how to examine your workload to help you 1) evaluate the impact of your workload, 2) identify alternatives to training your own model, and 3) optimize data processing.
The Spatial Web is Coming -- Part 3
Enter The Spatial Web Foundation and VERSES Technologies, a next-gen AI company that is literally laying the foundation for the Spatial Web Protocol by establishing and defining an entirely new computing technology stack comprised of three tiers: Interface, Logic & Data. VERSES has created the Hyperspace Transaction Protocol (HSTP), using Hyperspace Modeling Language (HSML), as the foundation for a common networked terminal, to bring all the interface tier components together in order to facilitate an indexed and searchable Spatial Web Browser of every person, place or thing, both real and digital. As Dan Mapes of VERSES points out, "HTML lets you program a web page -- HSML lets you program a web space." The Logic Tier enables the parsing of this huge amount of new spatial & UX data through cognitive computing methods, powered by VERSES' flagship contextual computing AI Operating System, COSM. VERSES is blockchain agnostic, which means you can use multiple chains and even operate a hybrid data layer using both DLT technologies and the cloud.
ABC of Deep Learning (Part 3 of 5)
As noted previously, the gradient descent algorithm is an optimization technique used to find the weights and bias values that minimize a cost function. The backpropagation algorithm trains neural networks by using gradient descent to minimize the cost function, and it does so quickly and efficiently. Before explaining the backpropagation algorithm, it's crucial to describe the equation behind any artificial neural network: a neural network can be represented by a composition of multivariate functions.
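A minimal sketch of these ideas: a one-hidden-layer network $f(x) = w_2 \cdot \tanh(W_1 x + b_1) + b_2$ trained with hand-written backpropagation (the chain rule applied to the composed layer functions) and plain gradient descent. All sizes, data, and the toy target are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
y = X[:, 0] * X[:, 1]                      # toy regression target

W1 = 0.5 * rng.standard_normal((2, 8))     # first-layer weights
b1 = np.zeros(8)
w2 = 0.5 * rng.standard_normal(8)          # output-layer weights
b2 = 0.0
lr = 0.1                                   # gradient-descent step size

def loss():
    h = np.tanh(X @ W1 + b1)
    return 0.5 * np.mean((h @ w2 + b2 - y) ** 2)

loss0 = loss()                             # cost before training
for _ in range(500):
    # forward pass: evaluate the composition of functions
    h = np.tanh(X @ W1 + b1)               # hidden activations
    err = h @ w2 + b2 - y                  # dL/dpred for 0.5*MSE
    # backward pass: chain rule, output layer first
    gw2 = h.T @ err / len(X)
    gb2 = err.mean()
    dz = np.outer(err, w2) * (1.0 - h**2)  # tanh'(z) = 1 - tanh(z)^2
    gW1 = X.T @ dz / len(X)
    gb1 = dz.mean(axis=0)
    # gradient descent step on every parameter
    W1 -= lr * gW1; b1 -= lr * gb1
    w2 -= lr * gw2; b2 -= lr * gb2
```

The backward pass is nothing more than differentiating the composed functions from the output inward; after training, the cost is strictly lower than before.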