AITopics | Kisilev, Pavel

Collaborating Authors

Kisilev, Pavel

Rethinking Data: Towards Better Performing Domain-Specific Small Language Models

Nazarov, Boris, Frolova, Darya, Lubarsky, Yackov, Gaissinski, Alexei, Kisilev, Pavel

arXiv.org Artificial Intelligence, Mar-3-2025

Fine-tuning Large Language Models (LLMs) on domain-specific data for downstream tasks has shown significant promise. However, commercial use of such LLMs is limited by the high computational cost of deploying them at scale. Small Language Models (LMs), on the other hand, are much more cost-effective but perform worse in a similar setup. This paper presents our approach to fine-tuning a small LM that reaches high accuracy on a multiple-choice question answering task. We achieve this by improving data quality at each stage of the LM training pipeline. In particular, we start with data structuring, which extracts compact, semantically meaningful text chunks for use by a retriever. This allows the LM to digest knowledge more efficiently. Next, we improve the retrieved context by training a lightweight Chunk Re-Ranker (CRR) that produces more accurate relative relevance scores for chunks. Finally, we improve the model's generalization ability by merging models fine-tuned with different parameters on different data subsets. We present detailed descriptions of each procedure, together with experimental findings that show the improvement contributed by each of the proposed techniques.

accuracy, large language model, natural language

arXiv:2503.01464

Genre: Research Report (1.00)

Industry:

Telecommunications (0.47)
Education (0.36)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
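
The abstract's final step merges models fine-tuned with different parameters on different data subsets, but it does not say how the merge is computed. As a minimal sketch, assuming simple (optionally weighted) parameter averaging, which is one common merging technique and not necessarily the authors' scheme, the step could look like this. The function merge_by_averaging and the flat Params representation are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical flat representation: parameter name -> list of floats.
Params = Dict[str, List[float]]


def merge_by_averaging(models: List[Params],
                       weights: Optional[List[float]] = None) -> Params:
    """Merge fine-tuned models by (weighted) averaging of their parameters.

    Assumes every model shares the same architecture, i.e. the same
    parameter names and the same number of values per parameter.
    """
    if not models:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0] * len(models)  # uniform average
    if len(weights) != len(models):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    norm = [w / total for w in weights]  # normalize so the weights sum to 1

    merged: Params = {}
    for name in models[0]:
        tensors = [m[name] for m in models]
        size = len(tensors[0])
        if any(len(t) != size for t in tensors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise convex combination of the corresponding parameters.
        merged[name] = [sum(w * t[i] for w, t in zip(norm, tensors))
                        for i in range(size)]
    return merged
```

For example, averaging {"w": [1.0, 2.0]} with {"w": [3.0, 6.0]} under uniform weights yields {"w": [2.0, 4.0]}. Plain averaging is generally only meaningful when the models share an architecture and start from a common pretrained checkpoint.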

Adaptive Consensus Gradients Aggregation for Scaled Distributed Training

Choukroun, Yoni, Azoulay, Shlomi, Kisilev, Pavel

arXiv.org Artificial Intelligence, Nov-6-2024

Distributed machine learning has recently become a critical paradigm for training large models on vast datasets. We examine the stochastic optimization problem for deep learning in synchronous parallel computing environments under communication constraints. While averaging distributed gradients is the most widely used method for gradient estimation, whether it is the optimal strategy remains an open question. In this work, we analyze the distributed gradient aggregation process through the lens of subspace optimization. By formulating aggregation as an objective-aware subspace optimization problem, we derive an efficient weighting scheme for the gradients, guided by the subspace coefficients. We further introduce subspace momentum to accelerate convergence while keeping the aggregation statistically unbiased. Our method improves on ubiquitous gradient averaging on multiple MLPerf tasks while remaining highly efficient in both communication and computation.

artificial intelligence, machine learning, optimization

arXiv:2411.03742

Country:

Europe > Netherlands (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
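
To make the contrast between plain averaging and objective-aware weighting concrete, the sketch below assumes a quadratic objective f(x) = 0.5 x^T A x - b^T x, for which the best per-worker coefficients over the subspace spanned by the worker gradients have a closed form. It illustrates the subspace view only: the paper's efficient coefficient estimation and its subspace momentum are not reproduced, and subspace_aggregate and average_step are hypothetical names.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in A]


def solve(M: Matrix, r: Vector) -> Vector:
    """Solve M z = r by Gaussian elimination with partial pivoting."""
    n = len(r)
    aug = [row[:] + [ri] for row, ri in zip(M, r)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def subspace_aggregate(A: Matrix, b: Vector, x: Vector,
                       worker_grads: List[Vector],
                       ridge: float = 1e-12) -> Tuple[Vector, Vector]:
    """Pick per-worker weights alpha minimizing f(x - sum_i alpha_i g_i).

    f(x) = 0.5 x^T A x - b^T x is an assumed quadratic model with A
    symmetric positive definite, so the minimization over alpha is exact.
    """
    full_grad = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    k = len(worker_grads)
    Ag = [matvec(A, g) for g in worker_grads]
    # Optimality in alpha: (G^T A G) alpha = G^T grad f(x); the small ridge
    # keeps the k x k system solvable when worker gradients are collinear.
    M = [[dot(worker_grads[i], Ag[j]) + (ridge if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    r = [dot(g, full_grad) for g in worker_grads]
    alphas = solve(M, r)
    new_x = [xi - sum(a * g[idx] for a, g in zip(alphas, worker_grads))
             for idx, xi in enumerate(x)]
    return alphas, new_x


def average_step(x: Vector, worker_grads: List[Vector], lr: float) -> Vector:
    """Baseline: plain gradient averaging followed by an SGD step."""
    k = len(worker_grads)
    return [xi - lr * sum(g[idx] for g in worker_grads) / k
            for idx, xi in enumerate(x)]
```

With A = I, b = 0, x = [1, 1] and two workers reporting [1, 0] and [0, 1], the subspace weights come out close to [1, 1] and the step lands near the minimizer [0, 0], whereas averaging with a unit learning rate stops at [0.5, 0.5].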

Primal-Dual Sequential Subspace Optimization for Saddle-point Problems

Choukroun, Yoni, Zibulevsky, Michael, Kisilev, Pavel

arXiv.org Machine Learning, Aug-20-2020

We introduce a new sequential subspace optimization method for large-scale saddle-point problems. It iteratively solves a sequence of auxiliary saddle-point problems in low-dimensional subspaces spanned by directions derived from first-order information over both the primal and the dual variables. Proximal regularization is further used to stabilize the optimization process. Experimental results demonstrate significantly better convergence than popular first-order methods. We analyze how the choice of subspace influences the convergence of the algorithm, and assess its performance in various deterministic optimization scenarios, such as bilinear games, ADMM-based constrained optimization, and generative adversarial networks.

artificial intelligence, optimization, optimization problem

arXiv:2008.09149

Country: North America > United States > New York (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
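
The core of the method is the small auxiliary saddle-point problem solved at each iteration. As a sketch, assuming a bilinear objective f(x, y) = x^T B y and a proximal term (rho/2)||dx||^2 - (rho/2)||dy||^2 on the step, the auxiliary problem over primal coefficients alpha and dual coefficients beta reduces to one linear system from its first-order conditions. The direction sets P and Q are supplied by the caller (for example, current partial gradients and previous steps); subspace_saddle_step is a hypothetical name, and the paper's exact choice of directions and regularization may differ.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def solve(M: Matrix, r: Vector) -> Vector:
    """Solve M z = r by Gaussian elimination with partial pivoting."""
    n = len(r)
    aug = [row[:] + [ri] for row, ri in zip(M, r)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def subspace_saddle_step(B: Matrix, x: Vector, y: Vector,
                         P: List[Vector], Q: List[Vector],
                         rho: float = 1.0,
                         ridge: float = 1e-12) -> Tuple[Vector, Vector]:
    """Solve min over alpha, max over beta of f(x + P alpha, y + Q beta)
    + (rho/2)||P alpha||^2 - (rho/2)||Q beta||^2 for f(x, y) = x^T B y.
    """
    k, l = len(P), len(Q)
    BQ = [matvec(B, q) for q in Q]        # B q_j
    By = matvec(B, y)                     # partial gradient in x
    BTx = matvec(transpose(B), x)         # partial gradient in y
    M = [[0.0] * (k + l) for _ in range(k + l)]
    r = [0.0] * (k + l)
    # Rows for d/d alpha_i: rho (P^T P alpha)_i + (P^T B Q beta)_i = -p_i^T B y
    for i in range(k):
        for j in range(k):
            M[i][j] = rho * dot(P[i], P[j]) + (ridge if i == j else 0.0)
        for j in range(l):
            M[i][k + j] = dot(P[i], BQ[j])
        r[i] = -dot(P[i], By)
    # Rows for d/d beta_i: (Q^T B^T P alpha)_i - rho (Q^T Q beta)_i = -q_i^T B^T x
    for i in range(l):
        for j in range(k):
            M[k + i][j] = dot(P[j], BQ[i])  # q_i^T B^T p_j == p_j^T B q_i
        for j in range(l):
            M[k + i][k + j] = -(rho * dot(Q[i], Q[j]) + (ridge if i == j else 0.0))
        r[k + i] = -dot(Q[i], BTx)
    z = solve(M, r)
    alpha, beta = z[:k], z[k:]
    new_x = [xi + sum(a * p[t] for a, p in zip(alpha, P)) for t, xi in enumerate(x)]
    new_y = [yi + sum(c * q[t] for c, q in zip(beta, Q)) for t, yi in enumerate(y)]
    return new_x, new_y
```

When P and Q span the whole space, the step reduces to an ordinary proximal-point step; for B = I, x = [1, 0], y = [0, 1] and rho = 1 it returns x close to [0.5, -0.5] and y close to [0.5, 0.5], which gives an easy check of the linear system.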

Low-bit Quantization of Neural Networks for Efficient Inference

Choukroun, Yoni, Kravchik, Eli, Kisilev, Pavel

arXiv.org Machine Learning, Feb-18-2019

Recent breakthrough methods in machine learning make use of increasingly large deep neural networks. The gains in performance have come at the cost of a substantial increase in computation and storage, making real-time implementation on limited hardware a very challenging task. One popular approach to this challenge is to perform low-bit precision computations via neural network quantization. However, aggressive quantization generally incurs a severe accuracy penalty and usually requires retraining the network or resorting to higher-bit precision. In this paper, we formalize linear quantization as a Minimum Mean Squared Error (MMSE) problem for both weights and activations. This allows low-bit precision inference without the need for full network retraining. The main contributions of our approach are the optimization of the constrained MSE problem at each layer of the network, the hardware-aware partitioning of the neural network parameters, and the use of multiple low-precision quantized tensors for poorly approximated layers. The proposed approach allows, for the first time, linear 4-bit integer (INT4) quantization for deploying pretrained models on limited hardware resources.

deep learning, neural network, quantization

arXiv:1902.06822

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
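
The abstract frames linear quantization as an MMSE problem and mentions using multiple quantized tensors for poorly approximated layers. A minimal sketch, assuming per-tensor symmetric INT4 quantization and a simple grid search over clipping thresholds (the paper's constrained solver and hardware-aware partitioning are not reproduced), is below. The residual pass in two_term_quantize is one plausible reading of the multiple-tensor idea, and all function names are hypothetical.

```python
from typing import List, Tuple


def quantize(values: List[float], scale: float, bits: int = 4) -> List[int]:
    """Symmetric linear quantization to integers in [-qmax, qmax].

    qmax = 2**(bits-1) - 1, i.e. [-7, 7] for 4 bits; dropping the extra
    negative level -8 to keep the grid symmetric is an assumption.
    """
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mmse_scale(values: List[float], bits: int = 4, grid: int = 100) -> float:
    """Grid-search the clipping threshold that minimizes the quantization MSE."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 1.0  # all-zero tensor: any scale reproduces it exactly
    best_scale, best_err = max_abs / qmax, float("inf")
    for step in range(1, grid + 1):
        scale = (max_abs * step / grid) / qmax  # clip at a fraction of max |v|
        deq = [q * scale for q in quantize(values, scale, bits)]
        err = mse(values, deq)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale


def two_term_quantize(values: List[float], bits: int = 4
                      ) -> Tuple[Tuple[float, List[int]], Tuple[float, List[int]]]:
    """Approximate values by s1*q1 + s2*q2, quantizing the first-pass residual."""
    s1 = mmse_scale(values, bits)
    q1 = quantize(values, s1, bits)
    residual = [v - s1 * q for v, q in zip(values, q1)]
    s2 = mmse_scale(residual, bits)
    q2 = quantize(residual, s2, bits)
    return (s1, q1), (s2, q2)
```

Clipping below the largest magnitude trades a larger error on outliers for a finer grid on the bulk of the values. Because the search grid includes the no-clipping threshold, it can never do worse than plain max-abs scaling on the same tensor.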

Blind Source Separation via Multinode Sparse Representation

Zibulevsky, Michael, Kisilev, Pavel, Zeevi, Yehoshua Y., Pearlmutter, Barak A.

Neural Information Processing Systems, Dec-31-2002

We consider the problem of blind source separation from a set of instantaneous linear mixtures, where the mixing matrix is unknown. It was recently discovered that exploiting the sparsity of the sources in an appropriate representation, according to some signal dictionary, dramatically improves the quality of separation. In this work we use the property of multiscale transforms, such as wavelets or wavelet packets, to decompose signals into sets of local features with various degrees of sparsity. We use this intrinsic property to select the best (sparsest) subsets of features for further separation. The performance of the algorithm is verified on noise-free and noisy data. Experiments with simulated signals, musical sounds, and images demonstrate a significant improvement in separation quality over previously reported results.

artificial intelligence, coefficient, data quality

Country:

Asia > Middle East > Israel (0.29)
North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Quality > Data Transformation (0.93)
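
The feature-selection step described in the abstract can be sketched with a Haar wavelet-packet tree: decompose each mixture, score every node by how sparse its coefficients are, and keep the sparsest node for the subsequent separation. This is a sketch under assumptions: the Haar basis and the normalized l1/l2 sparsity cost are stand-ins chosen for brevity, the separation step itself (estimating the mixing matrix from the selected coefficients) is omitted, and all names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def haar_split(x: List[float]) -> Tuple[List[float], List[float]]:
    """One orthonormal Haar analysis step: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) * s for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) * s for i in range(half)]
    return approx, detail


def wavelet_packet(x: List[float], depth: int) -> Dict[str, List[float]]:
    """Full Haar wavelet-packet tree; keys are paths such as 'a', 'ad', 'dda'."""
    if len(x) % (2 ** depth):
        raise ValueError("signal length must be divisible by 2**depth")
    nodes = {"": list(x)}  # '' is the undecomposed signal
    frontier = [""]
    for _ in range(depth):
        children = []
        for path in frontier:
            nodes[path + "a"], nodes[path + "d"] = haar_split(nodes[path])
            children += [path + "a", path + "d"]
        frontier = children
    return nodes


def sparsity_cost(c: List[float]) -> float:
    """l1 / (l2 * sqrt(N)), in [1/sqrt(N), 1]; smaller means sparser."""
    l2 = math.sqrt(sum(v * v for v in c))
    if l2 == 0.0:
        return math.inf  # an all-zero node carries nothing to separate
    return sum(abs(v) for v in c) / (l2 * math.sqrt(len(c)))


def sparsest_node(mixtures: List[List[float]], depth: int) -> Tuple[str, float]:
    """Pick the packet node whose coefficients are sparsest across all mixtures."""
    trees = [wavelet_packet(m, depth) for m in mixtures]
    best_path, best_cost = "", math.inf
    for path in trees[0]:
        cost = sum(sparsity_cost(tree[path]) for tree in trees)
        if cost < best_cost:
            best_path, best_cost = path, cost
    return best_path, best_cost
```

For a mixture that is constant except for a single sign flip, such as [1, 1, 1, 1, 1, 1, 1, -1], the level-one detail node "d" holds a single nonzero coefficient and is selected. Because the cost is scale invariant, mixtures that differ only by a gain score identically.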
