Yerevan
Synthetic Data for any Differentiable Target
Thrush, Tristan, Park, Sung Min, Brunborg, Herman, Bailey, Luke, Roed, Marcel, Band, Neil, Potts, Christopher, Hashimoto, Tatsunori
What are the limits of controlling language models via synthetic training data? We develop a reinforcement learning (RL) primitive, the Dataset Policy Gradient (DPG), which can precisely optimize synthetic data generators to produce a dataset of targeted examples. When used for supervised fine-tuning (SFT) of a target model, these examples cause the target model to do well on a differentiable metric of our choice. Our approach achieves this by computing exact data-attribution scores via higher-order gradients and using those scores as policy gradient rewards. We prove that this procedure closely approximates the true, intractable gradient for the synthetic data generator. To illustrate the potential of DPG, we show that, using only SFT on generated examples, we can cause the target model's LM head weights to (1) embed a QR code, (2) embed the pattern $\texttt{67}$, and (3) have lower $\ell^2$ norm. We additionally show that we can cause the generator to (4) rephrase inputs in a new language and (5) produce a specific UUID, even though neither of these objectives is conveyed in the generator's input prompts. These findings suggest that DPG is a powerful and flexible technique for shaping model properties using only synthetic training examples.
- Asia > Armenia > Yerevan > Yerevan (0.05)
- Africa > Senegal > Dakar Region > Dakar (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (4 more...)
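The core loop described in the abstract above can be illustrated with a toy sketch. Everything here is an illustrative assumption, not the paper's actual setup: the "target model" is a single scalar weight fine-tuned with a squared loss, the metric is a simple quadratic, and the "generator" is a categorical policy over a fixed pool of candidate examples. The reward uses a first-order gradient dot product as a stand-in for the exact higher-order data attribution the paper describes.

```python
import math
import random

random.seed(0)
pool = [0.0, 1.0, 2.0, 3.0, 4.0]   # candidate training examples
logits = [0.0] * len(pool)         # generator parameters
w = 0.0                            # target model weight; SFT loss is (w - x)^2
lr_sft, lr_pg = 0.1, 0.2           # metric to maximize: m(w) = -(w - 3.0)^2

def softmax(zs):
    top = max(zs)
    exps = [math.exp(z - top) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

for _ in range(300):
    probs = softmax(logits)
    i = random.choices(range(len(pool)), weights=probs)[0]
    x = pool[i]
    grad_loss = 2.0 * (w - x)        # d/dw of the SFT loss (w - x)^2
    grad_metric = -2.0 * (w - 3.0)   # d/dw of the metric m(w)
    # Predicted metric change after one SFT step on x: dm ~ -lr * grad_m * grad_loss.
    # This first-order score is the reward (the paper uses exact higher-order gradients).
    reward = -lr_sft * grad_metric * grad_loss
    # REINFORCE update for the generator: grad of log pi(i) scaled by the reward.
    for j in range(len(pool)):
        indicator = 1.0 if j == i else 0.0
        logits[j] += lr_pg * reward * (indicator - probs[j])
    # SFT step on the sampled example updates the target model.
    w -= lr_sft * grad_loss
# After training, w should hover near the metric optimum (3.0).
```

The generator is rewarded whenever an SFT step on its sampled example would move the target model's metric up, so the policy learns to emit examples that steer the target model without the objective ever appearing in the generator's "prompt".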
Analysis of the TAIGA-HiSCORE Data Using the Latent Space of Autoencoders
Dubenskaya, Yu. Yu., Polyakov, S. P., Kryukov, A. P., Demichev, A. P., Gres, E. O., Postnikov, E. B., Razumov, A. Yu., Volchugov, P. A., Zhurov, D. P.
The aim of extensive air shower (EAS) analysis is to reconstruct the physical parameters of the primary particle that initiated the shower. The TAIGA experiment is a hybrid detector system that combines several imaging atmospheric Cherenkov telescopes (IACTs) and an array of non-imaging Cherenkov detectors (TAIGA-HiSCORE) for EAS detection. Because the signals recorded by different detector types differ in physical nature, the direct merging of data is infeasible, which complicates multimodal analysis. Currently, to analyze data from the IACTs and TAIGA-HiSCORE, a set of auxiliary parameters specific to each detector type is calculated from the recorded signals. These parameters are chosen empirically, so there is no certainty that they retain all important information and are best suited for the respective problems. We propose to use autoencoders (AE) for the analysis of TAIGA experimental data and replace the conventionally used auxiliary parameters with the parameters of the AE latent space. The advantage of the AE latent space parameters is that they preserve essential physics from experimental data without prior assumptions. This approach also holds potential for enabling seamless integration of heterogeneous IACT and HiSCORE data through a joint latent space. To reconstruct the parameters of the primary particle of the EAS from the latent space of the AE, a separate artificial neural network is used. In this paper, the proposed approach is used to reconstruct the energy of the EAS primary particles based on Monte Carlo simulation data for TAIGA-HiSCORE. The dependence of the energy determination accuracy on the dimensionality of the latent space is analyzed, and these results are also compared with the results obtained by the conventional technique. It is shown that when using the AE latent space, the energy of the primary particle is reconstructed with satisfactory accuracy.
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.06)
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > Russia > Siberian Federal District > Irkutsk Oblast > Irkutsk (0.04)
- Asia > Armenia > Yerevan > Yerevan (0.04)
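The two-stage pipeline above (autoencoder compression, then a separate regressor on the latent code) can be sketched minimally. The data, dimensions, and models here are toy assumptions, not the TAIGA-HiSCORE setup: a linear autoencoder compresses 2-D "signals" to a 1-D latent, and a single least-squares coefficient plays the role of the separate energy-reconstruction network.

```python
# Stand-in "true energies" and intrinsically one-dimensional 2-D "signals".
energies = [0.05 * k for k in range(1, 11)]
data = [(e, 2.0 * e) for e in energies]

v = [0.5, 0.5]   # encoder weights: latent z = v . x
u = [0.5, 0.5]   # decoder weights: reconstruction x_hat = u * z

def recon_loss():
    total = 0.0
    for x in data:
        z = v[0] * x[0] + v[1] * x[1]
        total += sum((x[i] - u[i] * z) ** 2 for i in range(2))
    return total

loss_before = recon_loss()
for _ in range(1000):                         # plain gradient descent on the AE
    gv, gu = [0.0, 0.0], [0.0, 0.0]
    for x in data:
        z = v[0] * x[0] + v[1] * x[1]
        for i in range(2):
            err = x[i] - u[i] * z
            gu[i] += -2.0 * err * z           # d(loss)/d(u_i)
            for j in range(2):
                gv[j] += -2.0 * err * u[i] * x[j]   # d(loss)/d(v_j)
    for j in range(2):
        v[j] -= 0.01 * gv[j]
        u[j] -= 0.01 * gu[j]
loss_after = recon_loss()

# Stage 2: a separate regressor (here, one least-squares coefficient) maps the
# latent code back to energy.
zs = [v[0] * x[0] + v[1] * x[1] for x in data]
c = sum(z * e for z, e in zip(zs, energies)) / sum(z * z for z in zs)
mae = sum(abs(c * z - e) for z, e in zip(zs, energies)) / len(data)
```

Because the latent code retains the signal's one intrinsic degree of freedom, the downstream regressor can recover the energy from it; the real pipeline replaces both stages with neural networks trained on Monte Carlo data.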
GeoCrossBench: Cross-Band Generalization for Remote Sensing
Tamazyan, Hakob, Vanyan, Ani, Barseghyan, Alvard, Khosrovyan, Anna, Shelhamer, Evan, Khachatrian, Hrant
The number and diversity of remote sensing satellites grow over time, while the vast majority of labeled data comes from older satellites. As the foundation models for Earth observation scale up, the cost of (re-)training to support new satellites grows too, so the generalization capabilities of the models towards new satellites become increasingly important. In this work we introduce GeoCrossBench, an extension of the popular GeoBench benchmark with a new evaluation protocol: it tests the in-distribution performance; generalization to satellites with no band overlap; and generalization to satellites with additional bands with respect to the training set. We also develop a self-supervised extension of ChannelViT, ChiViT, to improve its cross-satellite performance. First, we show that even the best foundation models for remote sensing (DOFA, TerraFM) do not outperform general purpose models like DINOv3 in the in-distribution setting. Second, when generalizing to new satellites with no band overlap, all models suffer a 2-4x drop in performance, and ChiViT significantly outperforms the runner-up DINOv3. Third, the performance of all tested models drops on average by 5-25% when given additional bands at test time. Finally, we show that fine-tuning just the last linear layer of these models using oracle labels from all bands can achieve relatively consistent performance across all satellites, highlighting that the benchmark is far from being saturated. We publicly release the code and the datasets to encourage the development of more future-proof remote sensing models with stronger cross-satellite generalization.
- Asia > Armenia > Yerevan > Yerevan (0.04)
- North America > United States > Colorado (0.04)
- North America > Canada > British Columbia (0.04)
- (3 more...)
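The per-channel tokenization idea behind ChannelViT (and its extension ChiViT) can be sketched as follows. Shapes and band names here are illustrative, not the actual implementation: each spectral band is patchified separately and tagged with its band identity, so the same model can ingest satellites with different band sets simply by producing more, or differently tagged, tokens.

```python
def channel_tokens(image, patch_size, band_ids):
    """image: [C][H][W] nested lists; returns (band_id, flat_patch) tokens."""
    tokens = []
    for band, plane in zip(band_ids, image):
        h, w = len(plane), len(plane[0])
        for r in range(0, h, patch_size):
            for c in range(0, w, patch_size):
                flat = [plane[r + dr][c + dc]
                        for dr in range(patch_size)
                        for dc in range(patch_size)]
                tokens.append((band, flat))
    return tokens

# Token count scales with the number of bands: a 3-band and a 5-band image
# go through the same code path with no architecture change.
rgb = [[[0] * 4 for _ in range(4)] for _ in range(3)]
multispectral = [[[0] * 4 for _ in range(4)] for _ in range(5)]
```

This is why cross-band generalization is even testable: bands missing at test time just remove tokens, and extra bands just add them, rather than breaking a fixed input-channel convolution.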
Towards Piece-by-Piece Explanations for Chess Positions with SHAP
Contemporary chess engines offer precise yet opaque evaluations, typically expressed as centipawn scores. While effective for decision-making, these outputs obscure the underlying contributions of individual pieces or patterns. In this paper, we explore adapting SHAP (SHapley Additive exPlanations) to the domain of chess analysis, aiming to attribute a chess engine's evaluation to specific pieces on the board. By treating pieces as features and systematically ablating them, we compute additive, per-piece contributions that explain the engine's output in a locally faithful and human-interpretable manner. This method draws inspiration from classical chess pedagogy, where players assess positions by mentally removing pieces, and grounds it in modern explainable AI techniques. Our approach opens new possibilities for visualization, human training, and engine comparison. We release accompanying code and data to foster future research in interpretable chess AI.
- Europe > Switzerland (0.04)
- Europe > Netherlands (0.04)
- Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
- (3 more...)
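The piece-ablation idea above reduces to computing Shapley values over piece subsets. The sketch below uses a deliberately simple, hypothetical evaluation function (material plus a toy bishop-pair bonus) in place of a real engine call, and exact enumeration over orderings, which is only feasible for a few pieces; the piece names are placeholders.

```python
import itertools

pieces = {"Ra1": 5.0, "Bc1": 3.0, "Bf1": 3.0}   # toy material values

def evaluate(subset):
    # Hypothetical engine evaluation of the position restricted to `subset`.
    score = sum(pieces[p] for p in subset)
    if "Bc1" in subset and "Bf1" in subset:
        score += 0.5   # bishop-pair bonus: a deliberately non-additive term
    return score

def shapley(pieces, evaluate):
    names = list(pieces)
    phi = {p: 0.0 for p in names}
    perms = list(itertools.permutations(names))
    for order in perms:
        seen = set()
        for p in order:
            before = evaluate(seen)
            seen.add(p)
            phi[p] += evaluate(seen) - before   # marginal contribution
    return {p: total / len(perms) for p, total in phi.items()}

phi = shapley(pieces, evaluate)
```

The non-additive bonus is split evenly between the two bishops (0.25 each), and by the efficiency property the per-piece contributions sum exactly to the full-board evaluation, which is what makes the explanation additive and locally faithful.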
Can Small and Reasoning Large Language Models Score Journal Articles for Research Quality and Do Averaging and Few-shot Help?
Thelwall, Mike, Mohammadi, Ehsan
Assessing published academic journal articles is a common task for evaluations of departments and individuals. Whilst it is sometimes supported by citation data, Large Language Models (LLMs) may give more useful indications of article quality. Evidence of this capability exists for two of the largest LLM families, ChatGPT and Gemini, and the medium sized LLM Gemma3 27b, but it is unclear whether smaller LLMs and reasoning models have similar abilities. This is important because larger models may be slow and impractical in some situations, and reasoning models may perform differently. Four relevant questions are addressed with Gemma3 variants, Llama4 Scout, Qwen3, Magistral Small and DeepSeek R1, on a dataset of 2,780 medical, health and life science papers in 6 fields, with two different gold standards, one novel. The results suggest that smaller (open weights) and reasoning LLMs have similar performance to ChatGPT 4o-mini and Gemini 2.0 Flash, but that 1b parameters may often be too few, and 4b sometimes. Moreover, averaging scores from multiple identical queries seems to be a universally successful strategy, and few-shot prompts (four examples) tended to help, but the evidence was equivocal. Reasoning models did not have a clear advantage. Overall, the results show, for the first time, that smaller LLMs (>4b parameters), including reasoning models, have a substantial capability to score journal articles for research quality, especially if score averaging is used.
- North America > United States > South Carolina > Richland County > Columbia (0.04)
- Europe > United Kingdom > England > South Yorkshire > Sheffield (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Armenia > Yerevan > Yerevan (0.04)
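The score-averaging strategy found to help above is easy to sketch: query the scorer several times per article and average. The scorer below is simulated with Gaussian noise around a hidden "true" quality; the noise model and score range are assumptions for illustration only.

```python
import random
import statistics

random.seed(42)
true_scores = [random.uniform(1.0, 4.0) for _ in range(200)]  # hidden quality

def noisy_llm_score(truth):
    # Stand-in for one LLM quality-scoring query with stochastic output.
    return truth + random.gauss(0.0, 1.0)

# One query per article vs. the mean of five identical queries per article.
single = [noisy_llm_score(t) for t in true_scores]
averaged = [statistics.mean(noisy_llm_score(t) for _ in range(5))
            for t in true_scores]

mae_single = statistics.mean(abs(s - t) for s, t in zip(single, true_scores))
mae_avg = statistics.mean(abs(s - t) for s, t in zip(averaged, true_scores))
```

Averaging k independent noisy scores shrinks the noise standard deviation by a factor of sqrt(k), which is why the paper finds it to be a near-universal win when query cost permits.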
Capability of using normalizing flows for extracting rare gamma events in the TAIGA experiment
Kryukov, A. P., Razumov, A. Yu., Demichev, A. P., Dubenskaya, J. J., Gres, E. O., Polyakov, S. P., Postnikov, E. B., Volchugov, P. A., Zhurov, D. P.
The objective of this work is to develop a method for detecting rare gamma quanta against the background of charged particles in fluxes from astrophysical sources, using a deep learning approach based on normalizing flows and designed for anomaly detection. It is shown that the suggested method has potential for gamma detection. The method was tested on simulated data from the TAIGA-IACT experiment. The quantitative performance indicators obtained are still inferior to those of other approaches, so possible ways to improve the implementation of the method are proposed.
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.07)
- Asia > Russia > Siberian Federal District > Irkutsk Oblast > Irkutsk (0.05)
- Asia > Armenia > Yerevan > Yerevan (0.04)
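The likelihood-based anomaly detection idea above can be sketched minimally: fit a density model to background (hadron-like) events and flag low-likelihood events as gamma candidates. For brevity the "flow" here is a single affine transform to a standard normal base, which is equivalent to fitting a Gaussian; a real normalizing flow stacks learned invertible layers. All event values are synthetic.

```python
import math
import random
import statistics

random.seed(1)
background = [random.gauss(10.0, 2.0) for _ in range(1000)]  # synthetic events
mu = statistics.mean(background)
sigma = statistics.pstdev(background)

def log_likelihood(x):
    z = (x - mu) / sigma                       # affine "flow" to the base space
    base = -0.5 * z * z - 0.5 * math.log(2 * math.pi)
    return base - math.log(sigma)              # change-of-variables correction

# Low log-likelihood under the background model flags a gamma candidate.
typical = log_likelihood(10.0)
anomalous = log_likelihood(25.0)               # far outside the background bulk
```

The same recipe carries over directly when the affine map is replaced by a trained multi-layer flow over IACT image features: the anomaly score is still the exact log-likelihood given by the change-of-variables formula.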
Mellum: Production-Grade in-IDE Contextual Code Completion with Multi-File Project Understanding
Pavlichenko, Nikita, Nazarov, Iurii, Dolgov, Ivan, Garanina, Ekaterina, Ustalov, Dmitry, Bondyrev, Ivan, Lysaniuk, Kseniia, Vu, Evgeniia, Chekmenev, Kirill, Shtok, Joseph, Golubev, Yaroslav, Semenkin, Anton, Sazanovich, Uladzislau
We present the Mellum models family, open-weight code completion models designed for interactive use in JetBrains IDEs. Mellums have 4B parameters, adopt a Llama-style architecture, and are pre-trained on ~4T tokens of permissively licensed, multi-language code. Our studies show that (i) careful data curation and staged training significantly improve the model's quality, (ii) editor-critical capabilities such as context packing are necessary for high-quality suggestions, and (iii) a compact, task-focused model can meet the cost and latency constraints of interactive completion. In the paper, we describe an end-to-end industrial pipeline for producing contextualized in-editor completion: disciplined data governance, multi-stage training that includes fill-in-the-middle and project context via supervised fine-tuning, and alignment via direct preference optimization using feedback from real-world scenarios. Our quality evaluations include both large-scale offline benchmarks and online telemetry from production deployments in JetBrains IDEs. Mellums are released under the Apache-2.0 license on HuggingFace, with a public model card providing a reproducible reference for practitioners. Our experience offers a pragmatic blueprint for taking a focused, open model from a research prototype to at-scale production serving hundreds of thousands of users.
- North America > United States > District of Columbia > Washington (0.05)
- Europe > Serbia > Central Serbia > Belgrade (0.04)
- Europe > Germany > Berlin (0.04)
- (6 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
- Information Technology > Data Science > Data Quality (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
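The fill-in-the-middle (FIM) training mentioned above is what lets a completion model condition on code both before and after the cursor. A sketch of prompt construction follows; the sentinel strings are placeholders, since the actual special tokens are model-specific and defined by Mellum's tokenizer, and `context` merely stands in for the packed multi-file project context described in the paper.

```python
# Placeholder sentinels; real models define these as dedicated special tokens.
PRE, SUF, MID = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def build_fim_prompt(prefix, suffix, context=""):
    """PSM ordering: the model sees prefix and suffix, then generates the middle.
    `context` stands in for packed multi-file project context."""
    return f"{context}{PRE}{prefix}{SUF}{suffix}{MID}"

prompt = build_fim_prompt(
    "def add(a, b):\n    return ",
    "\n\nprint(add(1, 2))",
)
```

At inference time the IDE supplies the text around the caret as prefix and suffix, and everything the model emits after the middle sentinel becomes the completion shown to the user.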
Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models
Simbeck, Katharina, Mahran, Mariam
Despite growing research on bias in large language models (LLMs), most work has focused on gender and race, with little attention to religious identity. This paper explores how religion is internally represented in LLMs and how it intersects with concepts of violence and geography. Using mechanistic interpretability and Sparse Autoencoders (SAEs) via the Neuronpedia API, we analyze latent feature activations across five models. We measure overlap between religion- and violence-related prompts and probe semantic patterns in activation contexts. While all five religions show comparable internal cohesion, Islam is more frequently linked to features associated with violent language. In contrast, geographic associations largely reflect real-world religious demographics, revealing how models embed both factual distributions and cultural stereotypes. These findings highlight the value of structural analysis in auditing not just outputs but also internal representations that shape model behavior.
- North America > United States > New York > New York County > New York City (0.28)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.14)
- (225 more...)
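The overlap measurement described above can be sketched with a set-overlap metric over top activating SAE features. The feature IDs below are made up, standing in for what the Neuronpedia API would return for each prompt category; the paper's actual overlap statistic may differ, so treat Jaccard here as one reasonable choice.

```python
def jaccard(features_a, features_b):
    # Overlap between two sets of top activating SAE feature IDs.
    a, b = set(features_a), set(features_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical top-feature IDs for two prompt categories.
religion_features = [101, 205, 333, 404, 512]
violence_features = [205, 333, 777, 888, 512]

overlap = jaccard(religion_features, violence_features)
```

Comparing this overlap score across religions, with the same violence-related prompt set, is what surfaces the asymmetry the paper reports in internal representations rather than in generated text.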
DSPC: Dual-Stage Progressive Compression Framework for Efficient Long-Context Reasoning
Gao, Yaxin, Lu, Yao, Zhang, Zongfei, Nie, Jiaqi, Yu, Shanqing, Xuan, Qi
Large language models (LLMs) have achieved remarkable success in many natural language processing (NLP) tasks. To achieve more accurate output, the prompts used to drive LLMs have become increasingly longer, which incurs higher computational costs. To address this prompt inflation problem, prompt compression has been proposed. However, most existing methods require training a small auxiliary model for compression, incurring a significant amount of additional computation. To avoid this, we propose a two-stage, training-free approach, called Dual-Stage Progressive Compression (DSPC). In the coarse-grained stage, semantic-related sentence filtering removes sentences with low semantic value based on TF-IDF. In the fine-grained stage, token importance is assessed using attention contribution, cross-model loss difference, and positional importance, enabling the pruning of low-utility tokens while preserving semantics. We validate DSPC on LLaMA-3.1-8B-Instruct and GPT-3.5-Turbo under a constrained token budget and observe consistent improvements. For instance, in the FewShot task of the LongBench dataset, DSPC achieves a score of 49.17 while using 3x fewer tokens, outperforming the best state-of-the-art baseline, LongLLMLingua, by 7.76.
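The coarse-grained stage described above can be sketched as TF-IDF sentence filtering: score each sentence by its TF-IDF term weights (treating sentences as documents) and keep the highest-scoring fraction. The whitespace tokenization, scoring formula, and keep ratio are simplifying assumptions; the fine-grained token-pruning stage, which needs model attention and loss signals, is not shown.

```python
import math

def tfidf_filter(sentences, keep_ratio=0.5):
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    df = {}                                   # document frequency per term
    for terms in docs:
        for t in set(terms):
            df[t] = df.get(t, 0) + 1

    def score(terms):
        if not terms:
            return 0.0
        # Sum of tf * idf over the sentence's unique terms.
        return sum((terms.count(t) / len(terms)) * math.log(n / df[t])
                   for t in set(terms))

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)
    keep = sorted(ranked[: max(1, int(n * keep_ratio))])
    return [sentences[i] for i in keep]       # original order preserved

sentences = [
    "the model is the model",
    "quantum entanglement drives the protocol",
    "the the the the",
    "results improve compression ratios significantly",
]
kept = tfidf_filter(sentences)
```

Sentences dominated by terms common across the prompt (low idf) score low and are dropped first, which is how the stage sheds low-semantic-value content without any trained compressor.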
A dating app, a niqab and a 9mm gun - how a US woman was hired to end a UK family feud
Betro initially fled the scene but returned by taxi just after midnight and fired three shots at the family home. By 13:30 BST, she was at Manchester Airport and flew to the US, prosecutors said. Days later, Nazir followed and according to Betro, the pair rented a car and drove to Seattle "just for a road trip" with stops at an amusement park, Area 51 in Nevada, Los Angeles and San Francisco. She told jurors she did not know there had been a shooting in Measham Grove and Nazir had not mentioned it during his time in the States. The investigation to find Betro and bring her co-conspirators to justice not only spanned several years but was hampered by the pandemic and involved the FBI, National Crime Agency and two UK police forces.
- Europe > United Kingdom (0.86)
- North America > United States > Nevada (0.30)
- North America > United States > California > San Francisco County > San Francisco (0.30)
- (2 more...)