AITopics | Tomsk

Not long ago, probabilistic autoregressive language models were simply models that assign probabilities to sequences of words (Bahl et al., 1983); now they are the cornerstone of any task in computational linguistics, through prompting (Sanh et al., 2022) or fine-tuning (Radford et al., 2018). As such models have been successfully commercialized, the number of their practical applications is growing rapidly, as is the number of papers considering various aspects of their use. It is all the more surprising that the statistical properties of the output sequences produced by such models have been studied relatively little. We aim to fill this gap somewhat and empirically demonstrate that, depending on the temperature parameter, LLMs can generate text that, from the point of view of autocorrelation analysis, can be classified as solid (periodic phase), critical (autocorrelations decay according to a power law) or gas (amorphous phase). Our main contributions are the following:

1. We clearly identify three phases of LLM-generated texts: periodic, critical and amorphous.
2. We show through computational experiments that, for LLM-generated texts, there is a phase transition from the ordered to the amorphous state at about the same temperatures, between 0.7 and 1, for different LLMs.
3. We show that, in the amorphous state, long-range autocorrelations decay exponentially, independently of the generation temperature, for different LLMs.
4. We show that, for temperatures between 0.7 and 1, autocorrelations exhibit power-law decay at medium distances of up to 2000 words, implying isles of connectivity of these sizes.

We go on to introduce the key concepts.
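As a rough illustration of the kind of autocorrelation analysis this abstract refers to, the sketch below estimates the normalized sample autocorrelation of a numeric sequence at several lags. The abstract does not say how words are mapped to numbers or which estimator the authors use, so the function name `autocorrelation`, the numeric encoding, and the toy inputs are all assumptions made for illustration.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Normalized sample autocorrelation of a 1-D sequence for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the mean before correlating
    total = np.dot(x, x)                  # lag-0 term, used for normalization
    if total == 0.0:
        raise ValueError("constant sequence has undefined autocorrelation")
    n = len(x)
    return np.array(
        [np.dot(x[: n - lag], x[lag:]) / total for lag in range(max_lag + 1)]
    )


# Toy inputs (assumptions, not data from the paper):
# a strictly periodic sequence and an uncorrelated noise sequence.
periodic = np.tile([0.0, 1.0], 500)
rng = np.random.default_rng(0)
noise = rng.normal(size=1000)

acf_periodic = autocorrelation(periodic, 4)   # oscillates between about -1 and +1
acf_noise = autocorrelation(noise, 4)         # close to 0 for every lag > 0
```

To separate power-law from exponential decay, one would typically compare how well a straight line fits the autocorrelation on log-log versus log-linear axes; that step is omitted here because the abstract does not describe the fitting procedure.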

computational linguistic, conference paper, correlation, (11 more...)

arXiv.org Artificial Intelligence

2503.0633

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
Asia > Russia > Siberian Federal District > Tomsk Oblast > Tomsk (0.04)
(5 more...)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Spectral k-Support Norm Regularization

Andrew M. McDonald, Massimiliano Pontil, Dimitris Stamos

Neural Information Processing Systems | Feb-9-2025, 20:04:55 GMT

The k-support norm has successfully been applied to sparse vector prediction problems. We observe that it belongs to a wider class of norms, which we call the box-norms. Within this framework we derive an efficient algorithm to compute the proximity operator of the squared norm, improving upon the original method for the k-support norm. We extend the norms from the vector to the matrix setting and we introduce the spectral k-support norm. We study its properties and show that it is closely related to the multitask learning cluster norm. We apply the norms to real and synthetic matrix completion datasets. Our findings indicate that spectral k-support norm regularization gives state of the art performance, consistently improving over trace norm regularization and the matrix elastic net.

artificial intelligence, k-support norm, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Russia > Siberian Federal District > Tomsk Oblast > Tomsk (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Patent Figure Classification using Large Vision-language Models

Awale, Sushil, Müller-Budack, Eric, Ewerth, Ralph

arXiv.org Artificial Intelligence | Jan-22-2025

Patent figure classification facilitates faceted search in patent retrieval systems, enabling efficient prior art search. Existing approaches have explored patent figure classification for only a single aspect and for aspects with a limited number of concepts. In recent years, large vision-language models (LVLMs) have shown tremendous performance across numerous computer vision downstream tasks; however, they remain unexplored for patent figure classification. Our work explores the efficacy of LVLMs in patent figure visual question answering (VQA) and classification, focusing on zero-shot and few-shot learning scenarios. For this purpose, we introduce new datasets, PatFigVQA and PatFigCLS, for fine-tuning and evaluation regarding multiple aspects of patent figures (i.e., type, projection, patent class, and objects). For computationally efficient handling of a large number of classes using LVLMs, we propose a novel tournament-style classification strategy that leverages a series of multiple-choice questions. Experimental results and comparisons of multiple classification approaches based on LVLMs and Convolutional Neural Networks (CNNs) in few-shot settings show the feasibility of the proposed approaches.
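One plausible reading of a tournament-style strategy built from multiple-choice questions is sketched below: the label set is split into small groups, one question is asked per group, and the winners advance until a single label remains. The abstract gives no details, so the function `tournament_classify`, the `choose` callback (standing in for a model call), the group size, and the toy labels and scores are all hypothetical.

```python
from typing import Callable, List, Sequence


def tournament_classify(
    labels: Sequence[str],
    choose: Callable[[List[str]], str],
    group_size: int = 4,
) -> str:
    """Reduce a large label set to one label via rounds of small multiple-choice questions.

    `choose` is a hypothetical stand-in for asking a model one multiple-choice
    question; it must return one of the options it was given.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    candidates = list(labels)
    while len(candidates) > 1:
        winners = []
        for start in range(0, len(candidates), group_size):
            group = candidates[start : start + group_size]
            # A group of one advances without asking a question.
            winners.append(group[0] if len(group) == 1 else choose(group))
        candidates = winners
    return candidates[0]


# Toy stand-in for the model: picks the option with the highest made-up score.
toy_scores = {"graph": 0.1, "flowchart": 0.9, "table": 0.3, "photo": 0.2, "drawing": 0.5}
questions_asked = []


def pick_best(options: List[str]) -> str:
    questions_asked.append(list(options))
    return max(options, key=toy_scores.get)


predicted = tournament_classify(list(toy_scores), pick_best, group_size=2)
```

With a group size of m, each question shows the model at most m options, so the prompt stays short even when the label set is large.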

classification, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2501.12751

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Europe > Austria > Vienna (0.14)
(19 more...)

Genre: Research Report (0.64)

Industry: Law > Intellectual Property & Technology Law (0.94)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Can Large Language Model Predict Employee Attrition?

Ma, Xiaoye, Liu, Weiheng, Zhao, Changyi, Tukhvatulina, Liliya R.

arXiv.org Artificial Intelligence | Nov-2-2024

Employee attrition poses significant costs for organizations, with traditional statistical prediction methods often struggling to capture modern workforce complexities. Machine learning (ML) advancements offer more scalable and accurate solutions, but large language models (LLMs) introduce new potential in human resource management by interpreting nuanced employee communication and detecting subtle turnover cues. This study leverages the IBM HR Analytics Attrition dataset to compare the predictive accuracy and interpretability of a fine-tuned GPT-3.5 model against traditional ML classifiers, including Logistic Regression, k-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree, Random Forest, AdaBoost, and XGBoost. While traditional models are easier to use and interpret, LLMs can reveal deeper patterns in employee behavior. Our findings show that the fine-tuned GPT-3.5 model outperforms traditional methods with a precision of 0.91, recall of 0.94, and an F1-score of 0.92, while the best traditional model, SVM, achieved an F1-score of 0.82, with Random Forest and XGBoost reaching 0.80. These results highlight GPT-3.5's ability to capture complex patterns in attrition risk, offering organizations improved insights for retention strategies and underscoring the value of LLMs in HR applications.
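The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the helper name `f1_from_precision_recall` is just an illustrative choice.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the fine-tuned GPT-3.5 model.
f1 = f1_from_precision_recall(0.91, 0.94)
print(round(f1, 2))  # prints 0.92, matching the reported F1-score
```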

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2411.01353

Country:

Asia > Russia > Siberian Federal District > Tomsk Oblast > Tomsk (0.05)
Europe > Russia (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (0.49)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Faster Sampling from Log-Concave Densities over Polytopes via Efficient Linear Solvers

Mangoubi, Oren, Vishnoi, Nisheeth K.

arXiv.org Machine Learning | Sep-6-2024

We present a nearly-optimal implementation of this Markov chain with per-step complexity which is roughly the number of non-zero entries of A while the number of Markov chain steps remains the same. The key technical ingredients are 1) to show that the matrices that arise in this Dikin walk change slowly, 2) to deploy efficient linear solvers that can leverage this slow change to speed up matrix inversion by using information computed in previous steps, and 3) to speed up the computation of the determinantal term in the Metropolis filter step via a randomized Taylor series-based estimator. This result directly improves the runtime for applications that involve sampling from Gibbs distributions constrained to polytopes that arise in Bayesian statistics and private optimization.

algorithm, arithmetic operation, markov chain, (14 more...)

arXiv.org Machine Learning

2409.0432

Country:

Europe > Denmark > Capital Region > Copenhagen (0.04)
Asia > Russia > Siberian Federal District > Tomsk Oblast > Tomsk (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)

Objective Features Extracted from Motor Activity Time Series for Food Addiction Analysis Using Machine Learning

Borisenkov, Mikhail, Velichko, Andrei, Belyaev, Maksim, Korzun, Dmitry, Tserne, Tatyana, Bakutova, Larisa, Gubin, Denis

arXiv.org Artificial Intelligence | Aug-30-2024

This study investigates machine learning algorithms to identify objective features for diagnosing food addiction (FA) and assessing confirmed symptoms (SC). Data were collected from 81 participants (mean age: 21.5 years, range: 18-61 years, women: 77.8%) whose FA and SC were measured using the Yale Food Addiction Scale (YFAS). Participants provided demographic and anthropometric data, completed the YFAS, the Zung Self-Rating Depression Scale, and the Dutch Eating Behavior Questionnaire, and wore an actimeter on the non-dominant wrist for a week to record motor activity. Analysis of the actimetric data identified significant statistical and entropy-based features that accurately predicted FA and SC using ML. The Matthews correlation coefficient (MCC) was the primary metric. Activity-related features were more effective for FA prediction (MCC=0.88) than rest-related features (MCC=0.68). For SC, activity segments yielded MCC=0.47, rest segments MCC=0.38, and their combination MCC=0.51. Significant correlations were also found between actimetric features related to FA, emotional, and restrained eating behaviors, supporting the model's validity. Our results support the concept of a human bionic suite composed of IoT devices and ML sensors, which implements health digital assistance with real-time monitoring and analysis of physiological indicators related to FA and SC.
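For readers unfamiliar with the primary metric, the sketch below computes the Matthews correlation coefficient from the four cells of a binary confusion matrix. The counts are hypothetical and are not taken from the study, and returning 0.0 when the denominator vanishes is a common convention, not something the abstract specifies.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: report 0.0 when a whole row or column is empty.
        return 0.0
    return numerator / denominator


# Hypothetical counts, for illustration only (not data from the study).
mcc_example = mcc_from_counts(tp=40, tn=35, fp=3, fn=3)
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it uses all four cells, which makes it more informative than accuracy on imbalanced samples.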

actimetric feature, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2409.0031

Country:

Europe > Russia > Northwestern Federal District > Komi Republic > Syktyvkar (0.05)
Asia > Russia > Ural Federal District > Tyumen Oblast > Tyumen (0.05)
Europe > Russia > Northwestern Federal District > Republic of Karelia > Petrozavodsk (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Health & Medicine > Diagnostic Medicine > Imaging (0.67)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

The radius of statistical efficiency

Cutler, Joshua, Díaz, Mateo, Drusvyatskiy, Dmitriy

arXiv.org Machine Learning | May-15-2024

Classical results in asymptotic statistics show that the Fisher information matrix controls the difficulty of estimating a statistical model from observed data. In this work, we introduce a companion measure of robustness of an estimation problem: the radius of statistical efficiency (RSE) is the size of the smallest perturbation to the problem data that renders the Fisher information matrix singular. We compute RSE up to numerical constants for a variety of test bed problems, including principal component analysis, generalized linear models, phase retrieval, bilinear sensing, and matrix completion. In all cases, the RSE quantifies the compatibility between the covariance of the population data and the latent model parameter. Interestingly, we observe a precise reciprocal relationship between RSE and the intrinsic complexity/sensitivity of the problem instance, paralleling the classical Eckart-Young theorem in numerical analysis.

inequality, matrix, rse, (15 more...)

arXiv.org Machine Learning

2405.09676

Country:

North America > United States > Washington > King County > Seattle (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Maryland > Baltimore (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Spectral k-Support Norm Regularization

Neural Information Processing Systems | Mar-13-2024, 12:44:20 GMT

The k-support norm has successfully been applied to sparse vector prediction problems. We observe that it belongs to a wider class of norms, which we call the box-norms. Within this framework we derive an efficient algorithm to compute the proximity operator of the squared norm, improving upon the original method for the k-support norm. We extend the norms from the vector to the matrix setting and we introduce the spectral k-support norm. We study its properties and show that it is closely related to the multitask learning cluster norm. We apply the norms to real and synthetic matrix completion datasets. Our findings indicate that spectral k-support norm regularization gives state of the art performance, consistently improving over trace norm regularization and the matrix elastic net.

k-support norm, proximity operator, spectral k-support norm, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Russia > Siberian Federal District > Tomsk Oblast > Tomsk (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching

Ito, Akira, Yamada, Masanori, Kumagai, Atsutoshi

arXiv.org Artificial Intelligence | Feb-6-2024

Recently, Ainsworth et al. showed that using weight matching (WM) to minimize the $L_2$ distance in a permutation search of model parameters effectively identifies permutations that satisfy linear mode connectivity (LMC), in which the loss along a linear path between two independently trained models with different seeds remains nearly constant. This paper provides a theoretical analysis of LMC using WM, which is crucial for understanding stochastic gradient descent's effectiveness and its application in areas like model merging. We first experimentally and theoretically show that permutations found by WM do not significantly reduce the $L_2$ distance between two models and the occurrence of LMC is not merely due to distance reduction by WM in itself. We then provide theoretical insights showing that permutations can change the directions of the singular vectors, but not the singular values, of the weight matrices in each layer. This finding shows that permutations found by WM mainly align the directions of singular vectors associated with large singular values across models. This alignment brings the singular vectors with large singular values, which determine the model functionality, closer between pre-merged and post-merged models, so that the post-merged model retains functionality similar to the pre-merged models, making it easy to satisfy LMC. Finally, we analyze the difference between WM and straight-through estimator (STE), a dataset-dependent permutation search method, and show that WM outperforms STE, especially when merging three or more models.
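The claim that permutations change the directions of singular vectors but not the singular values can be checked numerically on a single matrix. The sketch below is only that check, not the paper's weight-matching procedure; the random matrix `W`, its shape, and the permutation are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))        # hypothetical weight matrix (6 output units)
perm = rng.permutation(6)          # hypothetical reordering of the output units
P = np.eye(6)[perm]                # corresponding permutation matrix

W_perm = P @ W                     # rows of W reordered according to perm

# Singular values are unchanged because P is orthogonal.
s_before = np.linalg.svd(W, compute_uv=False)
s_after = np.linalg.svd(W_perm, compute_uv=False)

# Left singular vectors are permuted in the same way as the rows (up to sign).
U, _, _ = np.linalg.svd(W, full_matrices=False)
U_perm, _, _ = np.linalg.svd(W_perm, full_matrices=False)

same_values = np.allclose(s_before, s_after)
same_vectors_up_to_sign = np.allclose(np.abs(U_perm), np.abs(P @ U))
```

Since P is orthogonal, if W = U S V^T then P W = (P U) S V^T, which is itself a valid singular value decomposition with the same S.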

permutation, singular value, singular vector, (13 more...)

arXiv.org Artificial Intelligence

2402.04051

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Comparison of parameters of vowel sounds of Russian and English languages

Fedoseev, V. I., Konev, A. A., Yakimuk, A. Yu.

arXiv.org Artificial Intelligence | Jan-26-2024

In multilingual speech recognition systems, the language is often not known in advance, even though the signal has already been received and is being processed. Such cases call for a generalized model that can respond to phonetic differences and, depending on them, correctly recognize speech in the desired language. To build such a model, it is necessary to set the values of phonetic parameters and then compare similar sounds, establishing significant differences.

clear resemblance, frequency, main tone, (14 more...)

arXiv.org Artificial Intelligence

2401.1489

Country: