Vishniakov, Kirill
Gene42: Long-Range Genomic Foundation Model With Dense Attention
Vishniakov, Kirill, Amor, Boulbaba Ben, Tekin, Engin, ElNaker, Nancy A., Viswanathan, Karthik, Medvedev, Aleksandr, Singh, Aahan, Nadeem, Maryam, Sayeed, Mohammad Amaan, Kanithi, Praveenkumar, Magalhaes, Tiago, Vassilieva, Natalia, Mahapatra, Dwarikanath, Pimentel, Marco, Khan, Shadab
We introduce Gene42, a novel family of Genomic Foundation Models (GFMs) designed to handle context lengths of up to 192,000 base pairs (bp) at single-nucleotide resolution. Gene42 models utilize a decoder-only (LLaMA-style) architecture with a dense self-attention mechanism. Initially trained on fixed-length sequences of 4,096 bp, our models underwent continuous pretraining to extend the context length to 192,000 bp. This iterative extension allowed for the comprehensive processing of large-scale genomic data and the capture of intricate patterns and dependencies within the human genome. Gene42 is the first dense-attention model capable of handling such long context lengths in genomics, challenging state-space models, which often rely on convolutional operators among other mechanisms. Our pretrained models exhibit notably low perplexity values and high reconstruction accuracy, highlighting their strong ability to model genomic data. Extensive experiments on various genomic benchmarks have demonstrated state-of-the-art performance across multiple tasks, including biotype classification, regulatory region identification, chromatin profiling prediction, variant pathogenicity prediction, and species classification. The models are publicly available at huggingface.co/inceptionai.
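As an illustration of how such a decoder-only genomic model could be queried, the sketch below scores a toy DNA sequence with a causal language model through the Hugging Face transformers API and reports its perplexity. The checkpoint id is a placeholder (only the organization page huggingface.co/inceptionai is given above), and the tokenizer settings are assumptions rather than the released configuration.

    # Minimal sketch: perplexity of a DNA sequence under a causal genomic LM.
    # "inceptionai/Gene42" is a placeholder id, not a confirmed checkpoint name.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "inceptionai/Gene42"  # hypothetical; see huggingface.co/inceptionai
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    model.eval()

    sequence = "ACGT" * 1024  # 4,096 bp toy input at single-nucleotide resolution
    inputs = tokenizer(sequence, return_tensors="pt")

    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])  # standard causal LM loss

    print(f"perplexity: {torch.exp(out.loss).item():.3f}")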
ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy
Vishniakov, Kirill, Shen, Zhiqiang, Liu, Zhuang
Modern computer vision offers a great variety of models to practitioners, and selecting the right one for a specific application can be challenging. Conventionally, competing model architectures and training protocols are compared by their classification accuracy on ImageNet. However, this single metric does not fully capture performance nuances critical for specialized tasks. In this work, we conduct an in-depth comparative analysis of model behaviors beyond ImageNet accuracy, for both ConvNet and Vision Transformer architectures, each across supervised and CLIP training paradigms. Although our selected models have similar ImageNet accuracies and compute requirements, we find that they differ in many other aspects: types of mistakes, output calibration, transferability, and feature invariance, among others. This diversity in model characteristics, not captured by traditional metrics, highlights the need for more nuanced analysis when choosing among different models. Our code is available at https://github.com/kirill-vish/Beyond-INet.
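One concrete example of an analysis beyond top-1 accuracy is output calibration. The sketch below computes the expected calibration error (ECE) from per-sample confidences and correctness flags; the 15-bin setting and the toy numbers are illustrative assumptions, not values taken from the paper.

    # Minimal sketch: expected calibration error (ECE) as one metric beyond accuracy.
    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=15):
        """Confidence-vs-accuracy gap, averaged over bins and weighted by bin size."""
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (confidences > lo) & (confidences <= hi)
            if in_bin.any():
                gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
                ece += in_bin.mean() * gap
        return ece

    # Two models can share the same accuracy (3/4) yet differ in calibration.
    print(expected_calibration_error([0.70, 0.80, 0.75, 0.60], [1, 1, 1, 0]))
    print(expected_calibration_error([0.99, 0.99, 0.99, 0.99], [1, 1, 1, 0]))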
Initializing Models with Larger Ones
Xu, Zhiqiu, Chen, Yanjie, Vishniakov, Kirill, Yin, Yida, Shen, Zhiqiang, Darrell, Trevor, Liu, Lingjie, Liu, Zhuang
Weight initialization plays an important role in neural network training. Widely used initialization methods are proposed and evaluated for networks that are trained from scratch. However, the growing number of pretrained models now offers new opportunities for tackling this classical problem of weight initialization. In this work, we introduce weight selection, a method for initializing smaller models by selecting a subset of weights from a pretrained larger model. This enables the transfer of knowledge from pretrained weights to smaller models. Our experiments demonstrate that weight selection can significantly enhance the performance of small models and reduce their training time. Notably, it can also be used together with knowledge distillation. Weight selection offers a new approach to leverage the power of pretrained models in resource-constrained settings, and we hope it can be a useful tool for training small models in the large-model era.
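A minimal sketch of the weight-selection idea follows: a smaller model is initialized by copying a subset of a larger pretrained model's parameters. The leading-slice selection rule and the toy MLPs below are illustrative assumptions; the paper's exact selection scheme may differ.

    # Minimal sketch: initialize a small model from a larger pretrained one by
    # copying a subset of each weight tensor (here, the leading slice per dimension).
    import torch.nn as nn

    def init_from_larger(small: nn.Module, large: nn.Module) -> None:
        small_sd, large_sd = small.state_dict(), large.state_dict()
        for name, w_small in small_sd.items():
            if name in large_sd:
                w_large = large_sd[name]
                index = tuple(slice(0, s) for s in w_small.shape)  # subset of weights
                small_sd[name] = w_large[index].clone()
        small.load_state_dict(small_sd)

    # Toy example: same layer names, smaller hidden width.
    large = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))
    small = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
    init_from_larger(small, large)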
MixMask: Revisiting Masking Strategy for Siamese ConvNets
Vishniakov, Kirill, Xing, Eric, Shen, Zhiqiang
Recent advances in self-supervised learning have integrated Masked Image Modeling (MIM) and Siamese Networks into a unified framework that leverages the benefits of both techniques. However, several issues remain unaddressed when applying conventional erase-based masking with Siamese ConvNets. These include (I) the inability to drop uninformative masked regions in ConvNets, as they process data continuously, resulting in low training efficiency compared to ViT models; and (II) the mismatch between erase-based masking and the contrastive-based objective in Siamese ConvNets, which differs from the MIM approach. In this paper, we propose a filling-based masking strategy called MixMask to prevent the information loss caused by the randomly erased image regions of vanilla masking. Furthermore, we introduce a flexible loss function design that accounts for the change in semantic distance between two mixed views, adapting the integrated architecture and preventing mismatches between the transformed input and the objective in Masked Siamese ConvNets (MSCN). We conducted extensive experiments on various datasets, including CIFAR-100, Tiny-ImageNet, and ImageNet-1K. The results demonstrate that our proposed framework achieves superior accuracy on linear probing, semi-supervised, and supervised finetuning, outperforming the state-of-the-art MSCN by a significant margin. Additionally, we demonstrate the superiority of our approach in object detection and segmentation tasks. Our source code is available at https://github.com/LightnessOfBeing/MixMask.
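The sketch below illustrates the filling-based masking idea: instead of erasing masked patches, they are filled with content from a second image to form a mixed view. The patch size and mask ratio are illustrative assumptions, and the loss re-weighting between mixed views described above is not shown.

    # Minimal sketch: fill masked patches of one image with another image's content.
    import torch

    def mix_mask(img_a, img_b, patch=32, ratio=0.5):
        """Mix two (C, H, W) images with a block-wise binary mask; 1 selects img_b."""
        _, h, w = img_a.shape
        grid = (torch.rand(h // patch, w // patch) < ratio).float()
        mask = grid.repeat_interleave(patch, 0).repeat_interleave(patch, 1)
        mixed = img_a * (1.0 - mask) + img_b * mask
        return mixed, mask

    img_a, img_b = torch.rand(3, 224, 224), torch.rand(3, 224, 224)
    mixed_view, mask = mix_mask(img_a, img_b)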