Tomsk
States of LLM-generated Texts and Phase Transitions between them
While not long ago probabilistic autoregressive language models were just models that assign probabilities to sequences of words (Bahl et al., 1983), now they are the cornerstone of any task in computational linguistics through prompting (Sanh et al., 2022) or fine-tuning (Radford et al., 2018). Such models being successfully commercialized, the number of practical applications of these models is rapidly growing, as is the number of papers considering various aspects of the use of probabilistic autoregressive language models. It is all the more surprising that the statistical properties of the output sequences produced by such models have been studied relatively little. We aim to fill this gap somewhat and empirically demonstrate that, depending on the temperature parameter, LLMs can generate text that can be classified as solid (periodic phase), critical state (that has autocorrelations decay according to the power law) or gas (amorphous phase) from the point of view of autocorrelation analysis. Our main contributions are the following: 1. We clearly identify three phases of LLM-generated texts - periodic, critical and amorphous 2. We show through computational experiments that for LLM-generated texts, there is a phase transition from ordered to amorphous state at about the same temperatures between 0.7 and 1, for different LLMs 3. We show that for amorphous state, long-range autocorrelations decay follows the exponential law independently from the generation temperature, for different LLMs 4. We show that for temperatures between 0.7 and 1 autocorrelations exhibit power law decay on medium distances of up to 2000 words, implying isles of connectivity of these sizes. We go on to introduce the key concepts.
Spectral k-Support Norm Regularization
Andrew M. McDonald, Massimiliano Pontil, Dimitris Stamos
The k-support norm has successfully been applied to sparse vector prediction problems. We observe that it belongs to a wider class of norms, which we call the box-norms. Within this framework we derive an efficient algorithm to compute the proximity operator of the squared norm, improving upon the original method for the k-support norm. We extend the norms from the vector to the matrix setting and we introduce the spectral k-support norm. We study its properties and show that it is closely related to the multitask learning cluster norm. We apply the norms to real and synthetic matrix completion datasets. Our findings indicate that spectral k-support norm regularization gives state of the art performance, consistently improving over trace norm regularization and the matrix elastic net.
Patent Figure Classification using Large Vision-language Models
Awale, Sushil, Müller-Budack, Eric, Ewerth, Ralph
Patent figure classification facilitates faceted search in patent retrieval systems, enabling efficient prior art search. Existing approaches have explored patent figure classification for only a single aspect and for aspects with a limited number of concepts. In recent years, large vision-language models (LVLMs) have shown tremendous performance across numerous computer vision downstream tasks, however, they remain unexplored for patent figure classification. Our work explores the efficacy of LVLMs in patent figure visual question answering (VQA) and classification, focusing on zero-shot and few-shot learning scenarios. For this purpose, we introduce new datasets, PatFigVQA and PatFigCLS, for fine-tuning and evaluation regarding multiple aspects of patent figures~(i.e., type, projection, patent class, and objects). For a computational-effective handling of a large number of classes using LVLM, we propose a novel tournament-style classification strategy that leverages a series of multiple-choice questions. Experimental results and comparisons of multiple classification approaches based on LVLMs and Convolutional Neural Networks (CNNs) in few-shot settings show the feasibility of the proposed approaches.
Can Large Language Model Predict Employee Attrition?
Ma, Xiaoye, Liu, Weiheng, Zhao, Changyi, Tukhvatulina, Liliya R.
Employee attrition poses significant costs for organizations, with traditional statistical prediction methods often struggling to capture modern workforce complexities. Machine learning (ML) advancements offer more scalable and accurate solutions, but large language models (LLMs) introduce new potential in human resource management by interpreting nuanced employee communication and detecting subtle turnover cues. This study leverages the IBM HR Analytics Attrition dataset to compare the predictive accuracy and interpretability of a fine-tuned GPT-3.5 model against traditional ML classifiers, including Logistic Regression, k-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree, Random Forest, AdaBoost, and XGBoost. While traditional models are easier to use and interpret, LLMs can reveal deeper patterns in employee behavior. Our findings show that the fine-tuned GPT-3.5 model outperforms traditional methods with a precision of 0.91, recall of 0.94, and an F1-score of 0.92, while the best traditional model, SVM, achieved an F1-score of 0.82, with Random Forest and XGBoost reaching 0.80. These results highlight GPT-3.5's ability to capture complex patterns in attrition risk, offering organizations improved insights for retention strategies and underscoring the value of LLMs in HR applications.
Faster Sampling from Log-Concave Densities over Polytopes via Efficient Linear Solvers
Mangoubi, Oren, Vishnoi, Nisheeth K.
We present a nearly-optimal implementation of this Markov chain with per-step complexity which is roughly the number of non-zero entries of A while the number of Markov chain steps remains the same. The key technical ingredients are 1) to show that the matrices that arise in this Dikin walk change slowly, 2) to deploy efficient linear solvers that can leverage this slow change to speed up matrix inversion by using information computed in previous steps, and 3) to speed up the computation of the determinantal term in the Metropolis filter step via a randomized Taylor series-based estimator. This result directly improves the runtime for applications that involve sampling from Gibbs distributions constrained to polytopes that arise in Bayesian statistics and private optimization.
Objective Features Extracted from Motor Activity Time Series for Food Addiction Analysis Using Machine Learning
Borisenkov, Mikhail, Velichko, Andrei, Belyaev, Maksim, Korzun, Dmitry, Tserne, Tatyana, Bakutova, Larisa, Gubin, Denis
This study investigates machine learning algorithms to identify objective features for diagnosing food addiction (FA) and assessing confirmed symptoms (SC). Data were collected from 81 participants (mean age: 21.5 years, range: 18-61 years, women: 77.8%) whose FA and SC were measured using the Yale Food Addiction Scale (YFAS). Participants provided demographic and anthropometric data, completed the YFAS, the Zung Self-Rating Depression Scale, and the Dutch Eating Behavior Questionnaire, and wore an actimeter on the non-dominant wrist for a week to record motor activity. Analysis of the actimetric data identified significant statistical and entropy-based features that accurately predicted FA and SC using ML. The Matthews correlation coefficient (MCC) was the primary metric. Activity-related features were more effective for FA prediction (MCC=0.88) than rest-related features (MCC=0.68). For SC, activity segments yielded MCC=0.47, rest segments MCC=0.38, and their combination MCC=0.51. Significant correlations were also found between actimetric features related to FA, emotional, and restrained eating behaviors, supporting the model's validity. Our results support the concept of a human bionic suite composed of IoT devices and ML sensors, which implements health digital assistance with real-time monitoring and analysis of physiological indicators related to FA and SC.
The radius of statistical efficiency
Cutler, Joshua, Díaz, Mateo, Drusvyatskiy, Dmitriy
Classical results in asymptotic statistics show that the Fisher information matrix controls the difficulty of estimating a statistical model from observed data. In this work, we introduce a companion measure of robustness of an estimation problem: the radius of statistical efficiency (RSE) is the size of the smallest perturbation to the problem data that renders the Fisher information matrix singular. We compute RSE up to numerical constants for a variety of test bed problems, including principal component analysis, generalized linear models, phase retrieval, bilinear sensing, and matrix completion. In all cases, the RSE quantifies the compatibility between the covariance of the population data and the latent model parameter. Interestingly, we observe a precise reciprocal relationship between RSE and the intrinsic complexity/sensitivity of the problem instance, paralleling the classical Eckart-Young theorem in numerical analysis.
Spectral k-Support Norm Regularization
The k-support norm has successfully been applied to sparse vector prediction problems. We observe that it belongs to a wider class of norms, which we call the box-norms. Within this framework we derive an efficient algorithm to compute the proximity operator of the squared norm, improving upon the original method for the k-support norm. We extend the norms from the vector to the matrix setting and we introduce the spectral k-support norm. We study its properties and show that it is closely related to the multitask learning cluster norm. We apply the norms to real and synthetic matrix completion datasets. Our findings indicate that spectral k-support norm regularization gives state of the art performance, consistently improving over trace norm regularization and the matrix elastic net.
Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching
Ito, Akira, Yamada, Masanori, Kumagai, Atsutoshi
Recently, Ainsworth et al. showed that using weight matching (WM) to minimize the $L_2$ distance in a permutation search of model parameters effectively identifies permutations that satisfy linear mode connectivity (LMC), in which the loss along a linear path between two independently trained models with different seeds remains nearly constant. This paper provides a theoretical analysis of LMC using WM, which is crucial for understanding stochastic gradient descent's effectiveness and its application in areas like model merging. We first experimentally and theoretically show that permutations found by WM do not significantly reduce the $L_2$ distance between two models and the occurrence of LMC is not merely due to distance reduction by WM in itself. We then provide theoretical insights showing that permutations can change the directions of the singular vectors, but not the singular values, of the weight matrices in each layer. This finding shows that permutations found by WM mainly align the directions of singular vectors associated with large singular values across models. This alignment brings the singular vectors with large singular values, which determine the model functionality, closer between pre-merged and post-merged models, so that the post-merged model retains functionality similar to the pre-merged models, making it easy to satisfy LMC. Finally, we analyze the difference between WM and straight-through estimator (STE), a dataset-dependent permutation search method, and show that WM outperforms STE, especially when merging three or more models.
Comparison of parameters of vowel sounds of russian and english languages
Fedoseev, V. I., Konev, A. A., Yakimuk, A. Yu.
In multilingual speech recognition systems, a situation can often arise when the language is not known in advance, but the signal has already been received and is being processed. For such cases, some generalized model is needed that will be able to respond to phonetic differences and, depending on them, correctly recog-nize speech in the desired language. To build such a model, it is necessary to set the values of phonetic parameters, and then compare similar sounds, establishing significant differences.