Collaborating Authors

Nikolenko, Sergey


Toolken+: Improving LLM Tool Usage with Reranking and a Reject Option

arXiv.org Artificial Intelligence

The recently proposed ToolkenGPT tool learning paradigm demonstrates promising performance but suffers from two major issues: first, it cannot benefit from tool documentation, and second, it often makes mistakes about whether to use a tool at all. We introduce Toolken+, which mitigates the first problem by reranking the top $k$ tools selected by ToolkenGPT and the second problem with a special "Reject" option: the model generates a regular vocabulary token whenever "Reject" is ranked first. We demonstrate the effectiveness of Toolken+ on multistep numerical reasoning and tool selection tasks.
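A minimal sketch of how the two-stage selection could look in code; all names here (`toolken_scores`, `rerank_score`, the `REJECT` sentinel) are illustrative assumptions, not the paper's API:

```python
REJECT = "<reject>"  # sentinel: fall back to ordinary vocabulary generation

def select_tool(context, toolken_scores, rerank_score, k=5):
    """Pick a tool for the current step, or None to emit a vocabulary token.

    toolken_scores: ToolkenGPT-style scores for every tool token;
    rerank_score(context, candidate): a reranker that can read tool docs.
    """
    # Stage 1: take the top-k tools proposed by the toolken head.
    top_k = sorted(toolken_scores, key=toolken_scores.get, reverse=True)[:k]
    # Stage 2: rerank the candidates together with the reject option.
    best = max(top_k + [REJECT], key=lambda c: rerank_score(context, c))
    return None if best == REJECT else best

# Toy usage with a reranker that just looks up precomputed scores.
scores = {"calculator": 0.9, "search": 0.4, "calendar": 0.2}
rerank = lambda ctx, c: {"calculator": 1.0, REJECT: 0.5}.get(c, 0.0)
print(select_tool("What is 2+2?", scores, rerank))  # -> "calculator"
```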


Robust AI-Generated Text Detection by Restricted Embeddings

arXiv.org Artificial Intelligence

The growing amount and quality of AI-generated texts make detecting such content more difficult. In most real-world scenarios, the domain (style and topic) of generated data and the generator model are not known in advance. In this work, we focus on the robustness of classifier-based detectors of AI-generated text, namely their ability to transfer to unseen generators or semantic domains. We investigate the geometry of the embedding space of Transformer-based text encoders and show that clearing out harmful linear subspaces helps to train a robust classifier that ignores domain-specific spurious features. We investigate several subspace decomposition and feature selection strategies and achieve significant improvements over state-of-the-art methods in cross-domain and cross-generator transfer. Our best approaches for head-wise and coordinate-based subspace removal increase the mean out-of-distribution (OOD) classification score by up to 9% and 14% in particular setups for RoBERTa and BERT embeddings, respectively. We release our code and data: https://github.com/SilverSolver/RobustATD
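As a rough illustration of the subspace-removal idea (not the paper's head-wise or coordinate-based procedures), the sketch below projects embeddings onto the orthogonal complement of the directions spanned by per-domain mean offsets before training the detector; all data here is synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def remove_domain_subspace(X, domains, n_dirs=3):
    """Project out the linear subspace spanned by per-domain mean offsets."""
    mus = np.stack([X[domains == d].mean(0) for d in np.unique(domains)])
    Q, _ = np.linalg.qr((mus - mus.mean(0)).T)  # orthonormal basis
    Q = Q[:, :n_dirs]
    return X - X @ Q @ Q.T  # keep only the domain-invariant component

# X: (n, dim) text embeddings; y: human/AI labels; domains: topic/style ids
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))
y = rng.integers(0, 2, 200)
domains = rng.integers(0, 4, 200)
clf = LogisticRegression(max_iter=1000).fit(remove_domain_subspace(X, domains), y)
```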


Neural Click Models for Recommender Systems

arXiv.org Artificial Intelligence

We develop and evaluate neural architectures that model user behavior in recommender systems (RS); they are inspired by click models for Web search but go beyond standard click models. The proposed architectures include recurrent networks, Transformer-based models that alleviate the quadratic complexity of self-attention, and adversarial and hierarchical architectures. Our models outperform baselines on the ContentWise and RL4RS datasets and can be used in RS simulators to model user responses for RS evaluation and pretraining.
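For concreteness, here is a minimal recurrent click model in the spirit of the description above (an illustrative architecture, not necessarily the paper's exact one): it consumes a sequence of recommended item ids and predicts a click probability at every step.

```python
import torch
import torch.nn as nn

class RNNClickModel(nn.Module):
    def __init__(self, n_items, dim=64):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, item_ids):  # item_ids: (batch, seq_len)
        h, _ = self.rnn(self.item_emb(item_ids))
        return torch.sigmoid(self.head(h)).squeeze(-1)  # (batch, seq_len)

model = RNNClickModel(n_items=10_000)
clicks = model(torch.randint(0, 10_000, (8, 20)))  # per-step click probabilities
```

Such a model would typically be trained with binary cross-entropy against logged click/skip feedback and can then serve as the user-response component of an RS simulator.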


$\nabla^2$DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials

arXiv.org Machine Learning

Methods of computational quantum chemistry provide accurate approximations of molecular properties crucial for computer-aided drug discovery and other areas of chemical science. However, high computational complexity limits the scalability of their applications. Neural network potentials (NNPs) are a promising alternative to quantum chemistry methods, but they require large and diverse datasets for training. This work presents a new dataset and benchmark called $\nabla^2$DFT that is based on nablaDFT. It contains twice as many molecular structures, three times more conformations, new data types and tasks, and state-of-the-art models. The dataset includes energies, forces, 17 molecular properties, Hamiltonian and overlap matrices, and a wavefunction object. All calculations were performed at the DFT level ($\omega$B97X-D/def2-SVP) for each conformation. Moreover, $\nabla^2$DFT is the first dataset that contains relaxation trajectories for a substantial number of drug-like molecules. We also introduce a novel benchmark for evaluating NNPs in molecular property prediction, Hamiltonian prediction, and conformational optimization tasks. Finally, we propose an extendable framework for training NNPs and implement 10 models within it.
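Since the dataset provides both energies and forces, NNPs trained on it would typically use the standard combined objective where forces are the negative gradient of the predicted energy with respect to atomic positions. A minimal sketch (the `model(z, pos) -> energies` interface and the toy model are assumptions, not the framework's actual API):

```python
import torch

def energy_force_loss(model, z, pos, e_ref, f_ref, w_f=10.0):
    """Energy + force matching; forces f = -dE/dpos via autograd."""
    pos = pos.requires_grad_(True)
    e_pred = model(z, pos)  # (batch,) molecular energies
    f_pred = -torch.autograd.grad(e_pred.sum(), pos, create_graph=True)[0]
    return (e_pred - e_ref).pow(2).mean() + w_f * (f_pred - f_ref).pow(2).mean()

class ToyNNP(torch.nn.Module):
    def forward(self, z, pos):  # placeholder energy, differentiable in pos
        return pos.pow(2).sum(dim=(1, 2))

z = torch.zeros(4, 10, dtype=torch.long)  # atom types: (batch, atoms)
pos = torch.randn(4, 10, 3)               # coordinates
loss = energy_force_loss(ToyNNP(), z, pos, torch.zeros(4), torch.zeros(4, 10, 3))
```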


ImplicitSLIM and How it Improves Embedding-based Collaborative Filtering

arXiv.org Artificial Intelligence

Sparse linear methods (SLIM) and their variations show outstanding performance, but they are memory-intensive and hard to scale. ImplicitSLIM improves embedding-based models by extracting embeddings from SLIM-like models in a computationally cheap and memory-efficient way, without explicitly learning heavy SLIM-like models. We show that ImplicitSLIM improves performance and speeds up convergence for both state-of-the-art and classical collaborative filtering methods. Learnable embeddings are a core part of many collaborative filtering (CF) models, and in this work we propose an approach able to improve a wide variety of CF models with learnable embeddings. Item-item methods, including kNN-based approaches (Sarwar et al., 2001) and sparse linear methods (SLIM) (Ning & Karypis, 2011), make predictions based on item-item similarity. Previous research shows that the item-item weight matrix learned by SLIM-like models can become part of other collaborative filtering models; e.g., RecWalk uses it as a transition probability matrix (Nikolakopoulos & Karypis, 2019). In this work, we reuse the item-item weight matrix to enrich embedding-based models with information about item-item interactions. Another motivation for our approach stems from nonlinear dimensionality reduction methods (e.g., VAEs) applied to collaborative filtering (Shenbin et al., 2020). We consider a group of manifold learning methods that aim to preserve the structure of the data in the embedding space, that is, they force embeddings of similar objects to be similar.
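The following sketch conveys only the motivating idea in miniature, not ImplicitSLIM's actual closed-form procedure: an item-item weight matrix `W` (e.g., from a SLIM-like model) is reused to pull embeddings of similar items together.

```python
import numpy as np

def enrich_embeddings(E, W, alpha=0.5):
    """Blend each item's embedding with a W-weighted average of its neighbors."""
    W_norm = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    return (1 - alpha) * E + alpha * W_norm @ E

rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 64))            # learnable item embeddings
W = np.abs(rng.normal(size=(1000, 1000)))  # stand-in for a SLIM weight matrix
E_enriched = enrich_embeddings(E, W)
```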


AI-generated text boundary detection with RoFT

arXiv.org Artificial Intelligence

Due to the rapid development of large language models, people increasingly often encounter texts that may start out as written by a human but continue as machine-generated. Detecting the boundary between human-written and machine-generated parts of such texts is a challenging problem that has not received much attention in the literature. We attempt to bridge this gap and examine several ways to adapt state-of-the-art artificial text detection classifiers to the boundary detection setting. We push all detectors to their limits, using the Real or Fake text benchmark that contains short texts on several topics and includes generations of various language models. We use this diversity to examine in depth the robustness of all detectors in cross-domain and cross-model settings, providing baselines and insights for future research. In particular, we find that perplexity-based approaches to boundary detection tend to be more robust to the peculiarities of domain-specific data than supervised fine-tuning of the RoBERTa model; we also identify which features of the text confuse boundary detection algorithms and negatively influence their performance in cross-domain settings.
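A perplexity-based boundary detector can be sketched as follows (illustrative only; computing the per-token losses with an actual language model is assumed to happen elsewhere): machine-generated continuations tend to have lower LM loss, so we scan for the split point that best separates the two regimes.

```python
import numpy as np

def detect_boundary(token_losses, min_len=5):
    """Return the index that maximizes the prefix-vs-suffix mean loss gap."""
    losses = np.asarray(token_losses, dtype=float)
    best_t, best_gap = None, -np.inf
    for t in range(min_len, len(losses) - min_len + 1):
        gap = losses[:t].mean() - losses[t:].mean()  # human prefix: higher loss
        if gap > best_gap:
            best_t, best_gap = t, gap
    return best_t  # estimated index of the first machine-generated token

losses = [5.1, 4.8, 5.3, 4.9, 5.0, 5.2, 2.1, 1.9, 2.0, 1.8, 2.2, 2.0]
print(detect_boundary(losses))  # -> 6
```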


Early Warning Prediction with Automatic Labeling in Epilepsy Patients

arXiv.org Artificial Intelligence

Early warning is crucial for the safety and well-being of epilepsy patients, in particular to prevent or minimize the severity of seizures. Using the patients' EEG data, we propose a meta-learning framework to improve the prediction of early ictal signals. The proposed bi-level optimization framework can automatically label noisy data at the early ictal stage and optimize the training accuracy of the backbone model. To validate our approach, we conduct a series of experiments to predict seizure onset in various long-term windows, with LSTM and ResNet implemented as baseline models. Our study demonstrates that not only is the ictal prediction accuracy obtained by meta-learning significantly improved, but the resulting model also captures intrinsic patterns of the noisy data that a single backbone model could not learn. As a result, the predicted probability generated by the meta network serves as a highly effective early warning indicator.
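The sketch below is a heavily simplified, first-order caricature of such a bi-level scheme (illustrative only, not the paper's algorithm): a small meta network learns from a trusted, cleanly labeled batch and supplies soft labels for the noisy early-ictal windows on which the backbone is trained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
meta_net = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 1))
opt_b = torch.optim.Adam(backbone.parameters(), lr=1e-3)
opt_m = torch.optim.Adam(meta_net.parameters(), lr=1e-3)
bce = F.binary_cross_entropy_with_logits

def train_step(x_noisy, x_clean, y_clean):
    # Outer-level stand-in: fit the meta network on the trusted batch.
    opt_m.zero_grad()
    bce(meta_net(x_clean).squeeze(-1), y_clean).backward()
    opt_m.step()
    # Inner level: train the backbone on meta-generated soft labels.
    with torch.no_grad():
        y_soft = torch.sigmoid(meta_net(x_noisy)).squeeze(-1)
    opt_b.zero_grad()
    bce(backbone(x_noisy).squeeze(-1), y_soft).backward()
    opt_b.step()

# Toy shapes: 128-dim features extracted from EEG windows (hypothetical).
train_step(torch.randn(32, 128), torch.randn(8, 128),
           torch.randint(0, 2, (8,)).float())
```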


Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule

arXiv.org Artificial Intelligence

Progress in neural grammatical error correction (GEC) is hindered by the lack of annotated training data. Sufficient amounts of high-quality manually annotated data are not available, so recent research has relied on generating synthetic data, pretraining on it, and then fine-tuning on real datasets; performance gains have been achieved either by ensembling or by using huge pretrained models such as XXL-T5 as the backbone. In this work, we explore an orthogonal direction: how to use available data more efficiently. First, we propose auxiliary tasks that exploit the alignment between the original and corrected sentences, such as predicting a sequence of corrections. We formulate each task as a sequence-to-sequence problem and perform multi-task training. Second, we discover that the order of datasets used for training, and even individual instances within a dataset, may have important effects on the final performance, so we set out to find the best training schedule. Together, these two ideas lead to significant improvements, producing results that improve the state of the art with much smaller models; in particular, we outperform the best models based on T5-XXL (11B parameters) with a BART-based model (400M parameters).
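One way to picture the auxiliary-task setup (the task prefixes and target format below are illustrative assumptions, not the paper's exact scheme) is to emit, for each sentence pair, both the main correction example and an "edit sequence" example derived from the token alignment:

```python
import difflib

def edit_sequence(src: str, tgt: str) -> str:
    """Derive a flat sequence of corrections from a token-level alignment."""
    s, t = src.split(), tgt.split()
    ops = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=s, b=t).get_opcodes():
        if tag != "equal":
            ops.append(f"{tag} '{' '.join(s[i1:i2])}' -> '{' '.join(t[j1:j2])}'")
    return " ; ".join(ops) or "keep"

def make_examples(src: str, tgt: str):
    return [
        ("correct: " + src, tgt),                    # main GEC task
        ("edits: " + src, edit_sequence(src, tgt)),  # auxiliary task
    ]

print(make_examples("He go to school yesterday .",
                    "He went to school yesterday ."))
```

Both examples are fed to the same seq2seq model, so the auxiliary task shares parameters with the main correction task.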


Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts

arXiv.org Artificial Intelligence

The rapidly increasing quality of AI-generated content makes it difficult to distinguish between human-written and AI-generated texts, which may lead to undesirable consequences for society. Therefore, it becomes increasingly important to study the properties of human texts that are invariant over different text domains and varying proficiency of human writers, can be easily calculated for any language, and can robustly separate natural and AI-generated texts regardless of the generation model and sampling method. In this work, we propose such an invariant for human-written texts: the intrinsic dimensionality of the manifold underlying the set of embeddings of a given text sample. We show that the average intrinsic dimensionality of fluent texts in a natural language hovers around the value $9$ for several alphabet-based languages and around $7$ for Chinese, while the average intrinsic dimensionality of AI-generated texts for each language is $\approx 1.5$ lower, with a clear statistical separation between the human-generated and AI-generated distributions. This property allows us to build a score-based artificial text detector. The proposed detector's accuracy is stable over text domains, generator models, and human writer proficiency levels, outperforming SOTA detectors by a significant margin in model-agnostic and cross-domain scenarios.
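To make the detector concrete, the sketch below estimates intrinsic dimension with the TwoNN maximum-likelihood estimator (Facco et al., 2017) as a stand-in; the paper's own estimator may differ, and the threshold value is an assumption for illustration:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_dimension(X):
    """MLE intrinsic dimension from ratios of 2nd to 1st neighbor distances."""
    dist, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    mu = dist[:, 2] / dist[:, 1]  # column 0 is the point itself
    return len(X) / np.log(mu).sum()

def looks_human(token_embeddings, threshold=8.0):
    # Human texts cluster around ID ~9; AI-generated ones are ~1.5 lower.
    return twonn_dimension(token_embeddings) > threshold

# Sanity check: points on a 3-dim linear manifold embedded in R^64.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 64))
print(round(twonn_dimension(X), 1))  # close to 3
```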


Machine Learning for SAT: Restricted Heuristics and New Graph Representations

arXiv.org Artificial Intelligence

Boolean satisfiability (SAT) is a fundamental NP-complete problem with many applications, including automated planning and scheduling. To solve large instances, SAT solvers have to rely on heuristics, e.g., for choosing a branching variable in DPLL and CDCL solvers. Such heuristics can be improved with machine learning (ML) models; the models can reduce the number of solver steps but usually worsen the running time because useful models are relatively large and slow. We suggest making a few initial steps with a trained ML model and then releasing control to classical heuristics; this simplifies the cold start of SAT solving and can decrease both the number of steps and the overall runtime, but it requires a separate decision about when to release control to the solver. Moreover, we introduce a modification of Graph-Q-SAT tailored to SAT problems converted from other domains, e.g., open shop scheduling problems. We validate the feasibility of our approach on random and industrial SAT problems.
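The control-release strategy can be sketched as a branching-variable picker (all interfaces here are hypothetical; a real CDCL solver would expose this through callbacks): the trained model handles the first few decisions to fix the cold start, after which the classical heuristic, e.g., VSIDS activity scores, takes over.

```python
def pick_branching_variable(unassigned, ml_scores, vsids_activity,
                            step, release_after=10):
    """unassigned: candidate variables; both score maps: variable -> score."""
    scores = ml_scores if step < release_after else vsids_activity
    return max(unassigned, key=lambda v: scores.get(v, 0.0))

# Toy usage: the model drives early decisions, VSIDS drives later ones.
print(pick_branching_variable({1, 2, 3}, {1: 0.9, 2: 0.1}, {3: 5.0}, step=0))   # 1
print(pick_branching_variable({1, 2, 3}, {1: 0.9, 2: 0.1}, {3: 5.0}, step=20))  # 3
```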