Banijamali, Ershad
QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models
Zhou, Jiajun, Yang, Yifan, Zhen, Kai, Liu, Ziyue, Zhao, Yequan, Banijamali, Ershad, Mouchtaris, Athanasios, Wong, Ngai, Zhang, Zheng
Large Language Models (LLMs) are often quantized to lower precision to reduce the memory cost and latency of inference. However, quantization often degrades model performance, so fine-tuning is required for various downstream tasks. Traditional fine-tuning methods such as stochastic gradient descent and Adam require backpropagation, which is error-prone in low-precision settings. To overcome this limitation, we propose the Quantized Zeroth-Order (QuZO) framework, designed specifically for fine-tuning LLMs through low-precision (e.g., 4- or 8-bit) forward passes. Our method avoids the error-prone low-precision straight-through estimator and uses optimized stochastic rounding to mitigate the increased bias. QuZO simplifies the training process while achieving results comparable to first-order methods in ${\rm FP}8$ training and superior accuracy in ${\rm INT}8$ and ${\rm INT}4$ training. Experiments demonstrate that, under low-bit training, QuZO achieves performance comparable to MeZO optimization on GLUE, multiple-choice, and generation tasks, while reducing memory cost by $2.94\times$ in LLaMA2-7B fine-tuning compared to quantized first-order methods.
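For intuition, below is a minimal PyTorch sketch of the MeZO-style zeroth-order (SPSA) update that backpropagation-free fine-tuning of this kind builds on, using only two forward passes per step, together with an illustrative `stochastic_round` helper. The function names, hyperparameters, and quantization scheme are assumptions for illustration, not the QuZO implementation.

```python
# Sketch of a MeZO-style zeroth-order step (two forward passes, no backprop),
# plus an unbiased stochastic-rounding helper of the kind quantized training
# relies on. Names and hyperparameters are illustrative, not from the paper.
import torch

def stochastic_round(x, scale):
    """Round x to the grid {k * scale}: round up with probability equal to
    the fractional part, so the rounding is unbiased in expectation."""
    y = x / scale
    lo = torch.floor(y)
    return (lo + (torch.rand_like(y) < (y - lo)).float()) * scale

@torch.no_grad()
def zo_step(model, loss_fn, batch, lr=1e-6, eps=1e-3, seed=0):
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(sign):
        torch.manual_seed(seed)          # replay the same z without storing it
        for p in params:
            p.add_(sign * eps * torch.randn_like(p))

    perturb(+1); loss_pos = loss_fn(model, batch)   # forward at w + eps*z
    perturb(-2); loss_neg = loss_fn(model, batch)   # forward at w - eps*z
    perturb(+1)                                     # restore original weights
    grad_proj = (loss_pos - loss_neg) / (2 * eps)   # scalar projected gradient
    torch.manual_seed(seed)
    for p in params:
        p.sub_(lr * grad_proj * torch.randn_like(p))  # w <- w - lr*g*z
```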
CALICO: Conversational Agent Localization via Synthetic Data Generation
Rosenbaum, Andy, Kharazmi, Pegah, Banijamali, Ershad, Zeng, Lu, DiPersio, Christopher, Wei, Pan, Oz, Gokmen, Chung, Clement, Owczarzak, Karolina, Triefenbach, Fabian, Hamza, Wael
We present CALICO, a method to fine-tune Large Language Models (LLMs) to localize conversational agent training data from one language to another. For slots (named entities), CALICO supports three operations: verbatim copy, literal translation, and localization, i.e., generating slot values that are more appropriate in the target language, such as city and airport names located in countries where the language is spoken. Furthermore, we design an iterative filtering mechanism to discard noisy generated samples, which we show boosts the performance of the downstream conversational agent. To demonstrate the effectiveness of CALICO, we build and release a new human-localized (HL) version of the MultiATIS++ travel information test set in 8 languages. Compared to the original human-translated (HT) version of the test set, we show that our new HL version is more challenging. We also show that CALICO outperforms the state-of-the-art LINGUIST method (which relies on literal slot translation out of context) both in the HT case, where CALICO generates more accurate slot translations, and in the HL case, where CALICO generates localized slots that are closer to the HL test set.
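As a rough illustration of the iterative filtering idea in the abstract above, the sketch below alternates between generating localized candidates, keeping only those the current downstream agent handles with high confidence, and retraining the agent on the filtered pool. The callables and the confidence threshold are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical iterative filtering loop: the model-specific pieces are passed
# in as callables, so the control flow is the only thing this sketch asserts.
def iterative_filter(seed_data, generate, train_agent, confidence,
                     rounds=3, threshold=0.9):
    kept = []
    agent = train_agent(seed_data)               # baseline downstream agent
    for _ in range(rounds):
        candidates = generate(seed_data)         # LLM-localized samples
        # Discard samples the current agent parses with low confidence.
        kept += [s for s in candidates if confidence(agent, s) >= threshold]
        agent = train_agent(seed_data + kept)    # retrain on filtered data
    return kept
```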
Deep Variational Sufficient Dimensionality Reduction
Banijamali, Ershad, Karimi, Amir-Hossein, Ghodsi, Ali
We consider the problem of sufficient dimensionality reduction (SDR), where a high-dimensional observation is transformed to a low-dimensional subspace in which the information that the observations carry about the label variable is preserved. We propose DVSDR, a deep variational approach for sufficient dimensionality reduction. The deep structure in our model has a bottleneck that represents the low-dimensional embedding of the data. We explain the SDR problem using graphical models and use the framework of variational autoencoders to maximize a lower bound on the log-likelihood of the joint distribution of the observation and label. We show that this maximization problem can be interpreted as solving the SDR problem. DVSDR can be easily adapted to the semi-supervised learning setting. In our experiments, we show that DVSDR performs competitively on classification tasks while being able to generate novel data samples.
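For concreteness, a standard variational lower bound of this kind, assuming a single latent bottleneck $z$ with encoder $q_\phi(z \mid x)$ and decoders $p_\theta(x \mid z)$ and $p_\theta(y \mid z)$ (the exact factorization used by DVSDR may differ), is

$$\log p_\theta(x, y) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) + \log p_\theta(y \mid z) \right] \;-\; \mathrm{KL}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right),$$

so maximizing the bound trains the bottleneck $z$ to reconstruct the observation $x$ while remaining predictive of the label $y$.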
Optimizing over a Restricted Policy Class in Markov Decision Processes
Banijamali, Ershad, Abbasi-Yadkori, Yasin, Ghavamzadeh, Mohammad, Vlassis, Nikos
We address the problem of finding an optimal policy in a Markov decision process under a restricted policy class defined by the convex hull of a set of base policies. This problem is of great interest in applications where a number of reasonably good (or safe) policies are already known and we are only interested in optimizing within their convex hull. We show that this problem is NP-hard to solve exactly as well as to approximate to arbitrary accuracy. However, under a condition akin to the occupancy measures of the base policies having large overlap, we show that there exists an efficient algorithm that finds a policy almost as good as the best convex combination of the base policies. The running time of the proposed algorithm is linear in the number of states and polynomial in the number of base policies. In practice, we demonstrate an efficient implementation for problems with large state spaces. Compared to traditional policy gradient methods, the proposed approach has the advantage that, apart from the computation of the occupancy measures of some base policies, the iterative method need not interact with the environment during the optimization process. This is especially important in complex systems, where estimating the value of a policy can be a time-consuming process.
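To make the optimization problem concrete, one natural reading of the restricted class is a state-wise mixture of the base policies $\pi_1, \dots, \pi_K$:

$$\Pi = \Big\{ \pi_\alpha : \pi_\alpha(a \mid s) = \sum_{i=1}^{K} \alpha_i\, \pi_i(a \mid s), \;\; \alpha \in \Delta_K \Big\}, \qquad \max_{\alpha \in \Delta_K} J(\pi_\alpha),$$

where $\Delta_K$ is the probability simplex and $J$ is the expected return. Note that the occupancy measure of $\pi_\alpha$ is not linear in $\alpha$, so $J(\pi_\alpha)$ is in general non-concave in $\alpha$, which is consistent with the hardness result stated above; the paper's exact formulation may differ in detail.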
JADE: Joint Autoencoders for Dis-Entanglement
Banijamali, Ershad, Karimi, Amir-Hossein, Wong, Alexander, Ghodsi, Ali
The problem of feature disentanglement has been explored in the literature for the purposes of image and video processing and text analysis. State-of-the-art methods for disentangling feature representations rely on the presence of many labeled samples. In this work, we present a novel method for disentangling factors of variation in data-scarce regimes. Specifically, we explore the application of feature disentangling to the problem of supervised classification in a setting where few labeled samples exist and there are no unlabeled samples for use in unsupervised training. Instead, a similar dataset exists that shares at least one direction of variation with the sample-constrained dataset. We train our model end-to-end using the framework of variational autoencoders and experimentally demonstrate that using an auxiliary dataset with similar variation factors contributes positively to classification performance, yielding results competitive with the state-of-the-art in unsupervised learning.
Fast Spectral Clustering Using Autoencoders and Landmarks
Banijamali, Ershad, Ghodsi, Ali
In this paper, we introduce an algorithm for performing spectral clustering efficiently. Spectral clustering is a powerful clustering algorithm that suffers from high computational complexity due to its eigendecomposition step. In this work, we first build the adjacency matrix of the graph corresponding to the dataset. To build this matrix, we only consider a limited number of points, called landmarks, and compute the similarity of all data points with the landmarks. Then, we present a definition of the Laplacian matrix of the graph that enables us to perform the eigendecomposition efficiently using a deep autoencoder. The overall complexity of the algorithm for the eigendecomposition is $O(np)$, where $n$ is the number of data points and $p$ is the number of landmarks. Finally, we evaluate the performance of the algorithm in different experiments.
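As a rough illustration of the landmark idea, the sketch below builds the $n \times p$ point-to-landmark affinity matrix in $O(np)$ and extracts a spectral embedding from it; for brevity, the deep-autoencoder eigendecomposition step is replaced here by a truncated SVD, and the Gaussian-kernel bandwidth and k-means landmark selection are assumptions, not details from the paper.

```python
# Landmark-based spectral clustering sketch: affinities are computed only
# between the n points and p landmarks, avoiding the full n x n graph.
import numpy as np
from sklearn.cluster import KMeans

def landmark_spectral_clustering(X, n_clusters, n_landmarks=100, sigma=1.0):
    # 1. Choose landmarks as k-means centers (random sampling also works).
    landmarks = KMeans(n_clusters=n_landmarks, n_init=4).fit(X).cluster_centers_
    # 2. n x p Gaussian affinities between points and landmarks.
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=-1)
    Z = np.exp(-d2 / (2.0 * sigma ** 2))
    Z /= Z.sum(axis=1, keepdims=True)          # row-normalize
    # 3. Spectral embedding from the left singular vectors of Z
    #    (stand-in for the paper's autoencoder-based eigendecomposition).
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    embedding = U[:, :n_clusters]
    # 4. Cluster the embedded points.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedding)
```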
Generative Mixture of Networks
Banijamali, Ershad, Ghodsi, Ali, Poupart, Pascal
A generative model based on training deep architectures is proposed. The model consists of $K$ networks that are trained together to learn the underlying distribution of a given dataset. The process starts with dividing the input data into $K$ clusters and feeding each of them into a separate network. After a few iterations of training the networks separately, we use an EM-like algorithm to train the networks together and update the clusters of the data. We call this model Mixture of Networks. The proposed model is a platform that can be used with any deep structure and trained with any conventional objective function for distribution modeling. As the components of the model are neural networks, it is highly capable of characterizing complicated data distributions as well as clustering data. We apply the algorithm to the MNIST handwritten digits and Yale face datasets. We also demonstrate the clustering ability of the model on some real-world and toy examples.
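A minimal sketch of the EM-like alternation described above appears below: each sample is reassigned to the network that currently models it best (E-step), and each network is then updated on its own cluster (M-step). The `nll` callable (per-sample negative log-likelihood) and the optimizer settings are placeholders; the paper's actual architectures and objective may differ.

```python
# EM-like training of K generative networks on a shared dataset.
import torch

def train_mixture(X, nets, nll, epochs=10, lr=1e-3):
    opts = [torch.optim.Adam(net.parameters(), lr=lr) for net in nets]
    for _ in range(epochs):
        # E-step: assign each sample to the network with lowest per-sample NLL.
        with torch.no_grad():
            scores = torch.stack([nll(net, X) for net in nets])  # (K, n)
            assign = scores.argmin(dim=0)                        # (n,)
        # M-step: update each network on the samples assigned to it.
        for k, (net, opt) in enumerate(zip(nets, opts)):
            batch = X[assign == k]
            if len(batch) == 0:
                continue                 # skip empty clusters this round
            opt.zero_grad()
            nll(net, batch).mean().backward()
            opt.step()
```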