Analysis of the TAIGA-HiSCORE Data Using the Latent Space of Autoencoders
Dubenskaya, Yu. Yu., Polyakov, S. P., Kryukov, A. P., Demichev, A. P., Gres, E. O., Postnikov, E. B., Razumov, A. Yu., Volchugov, P. A., Zhurov, D. P.
The aim of extensive air shower (EAS) analysis is to reconstruct the physical parameters of the primary particle that initiated the shower. The TAIGA experiment is a hybrid detector system that combines several imaging atmospheric Cherenkov telescopes (IACTs) and an array of non-imaging Cherenkov detectors (TAIGA-HiSCORE) for EAS detection. Because the signals recorded by different detector types differ in physical nature, directly merging the data is infeasible, which complicates multimodal analysis. Currently, to analyze data from the IACTs and TAIGA-HiSCORE, a set of auxiliary parameters specific to each detector type is calculated from the recorded signals. These parameters are chosen empirically, so there is no certainty that they retain all important information and are best suited for the respective problems. We propose to use autoencoders (AE) for the analysis of TAIGA experimental data and replace the conventionally used auxiliary parameters with the parameters of the AE latent space. The advantage of the AE latent space parameters is that they preserve essential physics from experimental data without prior assumptions. This approach also holds potential for enabling seamless integration of heterogeneous IACT and HiSCORE data through a joint latent space. To reconstruct the parameters of the primary particle of the EAS from the latent space of the AE, a separate artificial neural network is used. In this paper, the proposed approach is used to reconstruct the energy of the EAS primary particles based on Monte Carlo simulation data for TAIGA-HiSCORE. The dependence of the energy determination accuracy on the dimensionality of the latent space is analyzed, and these results are also compared with the results obtained by the conventional technique. It is shown that when using the AE latent space, the energy of the primary particle is reconstructed with satisfactory accuracy.
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.06)
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > Russia > Siberian Federal District > Irkutsk Oblast > Irkutsk (0.04)
- Asia > Armenia > Yerevan > Yerevan (0.04)
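The two-stage pipeline the abstract describes — encode detector signals into a compact latent vector with the AE encoder, then regress the primary energy with a separate network — can be sketched as follows. This is a minimal illustration of the data flow only: the array sizes, the random linear maps standing in for trained networks, and the `encode`/`regress_energy` helpers are all illustrative placeholders, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for recorded HiSCORE signals: 64 detector channels per event.
n_events, n_channels, latent_dim = 100, 64, 8
signals = rng.normal(size=(n_events, n_channels))

# Encoder half of a (pre-trained) autoencoder: signals -> latent vector.
# A single random linear map stands in for the trained network here.
W_enc = rng.normal(size=(n_channels, latent_dim))

def encode(x):
    return np.tanh(x @ W_enc)

# Separate regressor mapping the latent space to primary energy (one scalar).
W_reg = rng.normal(size=(latent_dim, 1))

def regress_energy(z):
    return (z @ W_reg).ravel()

latents = encode(signals)            # (n_events, latent_dim)
energies = regress_energy(latents)   # (n_events,)
print(latents.shape, energies.shape)
```

The key design point is the separation: the autoencoder is trained only to reconstruct the signals, so the latent vector retains detector information without hand-picked auxiliary parameters, and the small regressor is then trained on top of the frozen latent space.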
GeoCrossBench: Cross-Band Generalization for Remote Sensing
Tamazyan, Hakob, Vanyan, Ani, Barseghyan, Alvard, Khosrovyan, Anna, Shelhamer, Evan, Khachatrian, Hrant
The number and diversity of remote sensing satellites grow over time, while the vast majority of labeled data comes from older satellites. As the foundation models for Earth observation scale up, the cost of (re-)training to support new satellites grows too, so the generalization capabilities of the models towards new satellites become increasingly important. In this work we introduce GeoCrossBench, an extension of the popular GeoBench benchmark with a new evaluation protocol: it tests the in-distribution performance; generalization to satellites with no band overlap; and generalization to satellites with additional bands with respect to the training set. We also develop a self-supervised extension of ChannelViT, ChiViT, to improve its cross-satellite performance. First, we show that even the best foundation models for remote sensing (DOFA, TerraFM) do not outperform general purpose models like DINOv3 in the in-distribution setting. Second, when generalizing to new satellites with no band overlap, all models suffer a 2-4x drop in performance, and ChiViT significantly outperforms the runner-up DINOv3. Third, the performance of all tested models drops on average by 5-25% when given additional bands during test time. Finally, we show that fine-tuning just the last linear layer of these models using oracle labels from all bands can get relatively consistent performance across all satellites, highlighting that the benchmark is far from being saturated. We publicly release the code and the datasets to encourage the development of more future-proof remote sensing models with stronger cross-satellite generalization.
- Asia > Armenia > Yerevan > Yerevan (0.04)
- North America > United States > Colorado (0.04)
- North America > Canada > British Columbia (0.04)
- (3 more...)
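The cross-band setting above requires a model that accepts an arbitrary subset of spectral bands. The channel-as-token idea behind ChannelViT can be sketched as a shared patch embedding applied to each band separately, plus a per-band embedding vector, so the token sequence simply grows with the number of bands. The shapes and embedding maps below are illustrative placeholders, not the actual ChiViT implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

patch_dim, embed_dim, n_patches = 16, 32, 49

# One shared patch-embedding matrix reused for every spectral band,
# plus a learned embedding per known band (here 10 possible bands).
W_patch = rng.normal(size=(patch_dim, embed_dim))
band_embed = rng.normal(size=(10, embed_dim))

def tokenize(image_patches, band_ids):
    """image_patches: (n_bands, n_patches, patch_dim) -> (n_bands * n_patches, embed_dim)."""
    tokens = []
    for band, band_id in zip(image_patches, band_ids):
        tokens.append(band @ W_patch + band_embed[band_id])
    return np.concatenate(tokens, axis=0)

# The same model tokenizes a 4-band and a 7-band image: only sequence length changes.
four_band = tokenize(rng.normal(size=(4, n_patches, patch_dim)), [0, 1, 2, 3])
seven_band = tokenize(rng.normal(size=(7, n_patches, patch_dim)), [0, 1, 2, 3, 4, 5, 6])
print(four_band.shape, seven_band.shape)
```

Because the patch embedding is shared across bands, a satellite with unseen band counts changes only the token count fed to the transformer, which is what makes cross-satellite evaluation possible at all.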
Towards Piece-by-Piece Explanations for Chess Positions with SHAP
Contemporary chess engines offer precise yet opaque evaluations, typically expressed as centipawn scores. While effective for decision-making, these outputs obscure the underlying contributions of individual pieces or patterns. In this paper, we explore adapting SHAP (SHapley Additive exPlanations) to the domain of chess analysis, aiming to attribute a chess engine's evaluation to specific pieces on the board. By treating pieces as features and systematically ablating them, we compute additive, per-piece contributions that explain the engine's output in a locally faithful and human-interpretable manner. This method draws inspiration from classical chess pedagogy, where players assess positions by mentally removing pieces, and grounds it in modern explainable AI techniques. Our approach opens new possibilities for visualization, human training, and engine comparison. We release accompanying code and data to foster future research in interpretable chess AI.
- Europe > Switzerland (0.04)
- Europe > Netherlands (0.04)
- Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
- (3 more...)
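The ablation idea in the abstract above — treat each piece as a feature, remove subsets of pieces, and average the marginal contributions — is exactly the Shapley value computation. A minimal sketch with a toy material-count evaluation standing in for a real engine (the `evaluate` function and the piece set are illustrative, not the paper's setup):

```python
from itertools import combinations
from math import factorial

# Toy "engine": the evaluation of a position is just summed material (centipawns).
PIECES = {"white_queen": 900, "white_rook": 500, "black_knight": -300}

def evaluate(present):
    """Stand-in for an engine score of the position restricted to `present` pieces."""
    return sum(PIECES[p] for p in present)

def shapley(pieces, value):
    """Exact Shapley values: weighted average marginal contribution over all subsets."""
    n = len(pieces)
    phi = {p: 0.0 for p in pieces}
    for p in pieces:
        others = [q for q in pieces if q != p]
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += weight * (value(set(subset) | {p}) - value(set(subset)))
    return phi

phi = shapley(list(PIECES), evaluate)
# Efficiency property: per-piece contributions sum to the full-board evaluation.
print(phi, sum(phi.values()))
```

For this additive toy evaluation each piece's Shapley value equals its material value; with a real engine the per-piece contributions instead capture positional interactions, while the efficiency property (contributions summing to the engine's score) still holds.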
Can Small and Reasoning Large Language Models Score Journal Articles for Research Quality and Do Averaging and Few-shot Help?
Thelwall, Mike, Mohammadi, Ehsan
Assessing published academic journal articles is a common task for evaluations of departments and individuals. Whilst it is sometimes supported by citation data, Large Language Models (LLMs) may give more useful indications of article quality. Evidence of this capability exists for two of the largest LLM families, ChatGPT and Gemini, and the medium-sized LLM Gemma3 27b, but it is unclear whether smaller LLMs and reasoning models have similar abilities. This is important because larger models may be slow and impractical in some situations, and reasoning models may perform differently. Four relevant questions are addressed with Gemma3 variants, Llama4 Scout, Qwen3, Magistral Small and DeepSeek R1, on a dataset of 2,780 medical, health and life science papers in 6 fields, with two different gold standards, one of which is novel. The results suggest that smaller (open weights) and reasoning LLMs have similar performance to ChatGPT 4o-mini and Gemini 2.0 Flash, but that 1b parameters may often, and 4b sometimes, be too few. Moreover, averaging scores from multiple identical queries seems to be a universally successful strategy, and few-shot prompts (four examples) tended to help but the evidence was equivocal. Reasoning models did not have a clear advantage. Overall, the results show, for the first time, that smaller LLMs >4b, including reasoning models, have a substantial capability to score journal articles for research quality, especially if score averaging is used.
- North America > United States > South Carolina > Richland County > Columbia (0.04)
- Europe > United Kingdom > England > South Yorkshire > Sheffield (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Armenia > Yerevan > Yerevan (0.04)
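The score-averaging strategy the abstract highlights is straightforward to implement: send the same query several times and use the mean score, which damps query-to-query noise. A minimal sketch; the `query_llm` stub below simulates noisy 1-5 quality scores rather than calling a real model, and `TRUE_QUALITY` is a made-up gold-standard value.

```python
import random
import statistics

random.seed(42)

TRUE_QUALITY = 3.0  # hypothetical gold-standard score on a 1-5 scale

def query_llm(article_text):
    """Stub for one LLM quality-scoring call: true score plus per-query noise."""
    return max(1, min(5, round(TRUE_QUALITY + random.gauss(0, 1))))

def averaged_score(article_text, n_queries=30):
    """Average of n identical queries: standard error shrinks as 1/sqrt(n)."""
    return statistics.mean(query_llm(article_text) for _ in range(n_queries))

single = query_llm("some article text")
averaged = averaged_score("some article text")
print(single, round(averaged, 2))
```

A single query returns a coarse integer score, while the average over repeated identical queries is a much more stable estimate — which is consistent with the abstract's finding that averaging is a universally successful strategy.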
Capability of Using Normalizing Flows for Extracting Rare Gamma Events in the TAIGA Experiment
Kryukov, A. P., Razumov, A. Yu., Demichev, A. P., Dubenskaya, J. J., Gres, E. O., Polyakov, S. P., Postnikov, E. B., Volchugov, P. A., Zhurov, D. P.
The objective of this work is to develop a method for detecting rare gamma quanta against the background of charged particles in the fluxes from sources in the Universe, using a deep learning method based on normalizing flows and designed for anomaly detection. It is shown that the suggested method has potential for gamma detection. The method was tested on model data from the TAIGA-IACT experiment. The obtained quantitative performance indicators are still inferior to other approaches, and therefore possible ways to improve the implementation of the method are proposed.
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.07)
- Asia > Russia > Siberian Federal District > Irkutsk Oblast > Irkutsk (0.05)
- Asia > Armenia > Yerevan > Yerevan (0.04)
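The anomaly-detection principle behind a flow-based approach like the one above is that a normalizing flow fit to background (hadron) events assigns them high likelihood, so rare gamma events show up as low-likelihood outliers. A minimal one-dimensional sketch with a single affine flow layer; the feature, the fitted parameters, and the example values are illustrative, and a real flow stacks many learned invertible layers.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Background" training data: one image-derived feature of hadron events.
background = rng.normal(loc=10.0, scale=2.0, size=5000)

# One affine flow layer z = (x - shift) / scale mapping the data to a standard normal.
shift, scale = background.mean(), background.std()

def log_likelihood(x):
    """Change of variables: log N(z; 0, 1) + log|dz/dx|, with dz/dx = 1/scale."""
    z = (x - shift) / scale
    return -0.5 * (z**2 + np.log(2 * np.pi)) - np.log(scale)

# A typical background event scores much higher than an out-of-distribution one.
ll_background = log_likelihood(10.0)
ll_anomaly = log_likelihood(25.0)
print(ll_background, ll_anomaly)
```

Events whose log-likelihood under the background-trained flow falls below a threshold (tuned on simulations) are then flagged as gamma candidates.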
Mellum: Production-Grade in-IDE Contextual Code Completion with Multi-File Project Understanding
Pavlichenko, Nikita, Nazarov, Iurii, Dolgov, Ivan, Garanina, Ekaterina, Ustalov, Dmitry, Bondyrev, Ivan, Lysaniuk, Kseniia, Vu, Evgeniia, Chekmenev, Kirill, Shtok, Joseph, Golubev, Yaroslav, Semenkin, Anton, Sazanovich, Uladzislau
We present the Mellum models family, open-weight code completion models designed for interactive use in JetBrains IDEs. Mellums have 4B parameters, adopt a Llama-style architecture, and are pre-trained on ~4T tokens of permissively licensed, multi-language code. Our studies show that (i) careful data curation and staged training significantly improve the model's quality, (ii) editor-critical capabilities such as context packing are necessary for high-quality suggestions, and (iii) a compact, task-focused model can meet the cost and latency constraints of interactive completion. In the paper, we describe an end-to-end industrial pipeline for producing contextualized in-editor completion: disciplined data governance, multi-stage training that includes fill-in-the-middle and project context via supervised fine-tuning, and alignment via direct preference optimization using feedback from real-world scenarios. Our quality evaluations include both large-scale offline benchmarks and online telemetry from production deployments in JetBrains IDEs. Mellums are released under the Apache-2.0 license on HuggingFace, with a public model card providing a reproducible reference for practitioners. Our experience offers a pragmatic blueprint for taking a focused, open model from a research prototype to at-scale production for hundreds of thousands of users.
- North America > United States > District of Columbia > Washington (0.05)
- Europe > Serbia > Central Serbia > Belgrade (0.04)
- Europe > Germany > Berlin (0.04)
- (6 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
- Information Technology > Data Science > Data Quality (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models
Simbeck, Katharina, Mahran, Mariam
Despite growing research on bias in large language models (LLMs), most work has focused on gender and race, with little attention to religious identity. This paper explores how religion is internally represented in LLMs and how it intersects with concepts of violence and geography. Using mechanistic interpretability and Sparse Autoencoders (SAEs) via the Neuronpedia API, we analyze latent feature activations across five models. We measure overlap between religion- and violence-related prompts and probe semantic patterns in activation contexts. While all five religions show comparable internal cohesion, Islam is more frequently linked to features associated with violent language. In contrast, geographic associations largely reflect real-world religious demographics, revealing how models embed both factual distributions and cultural stereotypes. These findings highlight the value of structural analysis in auditing not just outputs but also internal representations that shape model behavior.
- North America > United States > New York > New York County > New York City (0.28)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.14)
- (225 more...)
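The overlap measurement between religion- and violence-related prompts described above can be sketched as a Jaccard index over the sets of SAE features that activate for each prompt group. The feature indices below are made-up placeholders, not real Neuronpedia features; a real analysis would pull per-prompt activations from the Neuronpedia API and threshold them.

```python
# Hypothetical sets of SAE feature indices that fire above threshold
# for two prompt groups (placeholder values, not real Neuronpedia features).
religion_features = {12, 45, 101, 230, 512}
violence_features = {45, 230, 777, 901}

def jaccard(a, b):
    """Overlap of two active-feature sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)

overlap = jaccard(religion_features, violence_features)
print(round(overlap, 3))  # 2 shared features out of 7 distinct
```

Comparing this overlap score across religions is one simple way to quantify the kind of asymmetry the abstract reports, where one religion's prompts share more active features with violence-related prompts than others do.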
DSPC: Dual-Stage Progressive Compression Framework for Efficient Long-Context Reasoning
Gao, Yaxin, Lu, Yao, Zhang, Zongfei, Nie, Jiaqi, Yu, Shanqing, Xuan, Qi
Large language models (LLMs) have achieved remarkable success in many natural language processing (NLP) tasks. To achieve more accurate output, the prompts used to drive LLMs have become increasingly longer, which incurs higher computational costs. To address this prompt inflation problem, prompt compression has been proposed. However, most existing methods require training a small auxiliary model for compression, incurring a significant amount of additional computation. To avoid this, we propose a two-stage, training-free approach, called Dual-Stage Progressive Compression (DSPC). In the coarse-grained stage, semantic-related sentence filtering removes sentences with low semantic value based on TF-IDF. In the fine-grained stage, token importance is assessed using attention contribution, cross-model loss difference, and positional importance, enabling the pruning of low-utility tokens while preserving semantics. We validate DSPC on LLaMA-3.1-8B-Instruct and GPT-3.5-Turbo under a constrained token budget and observe consistent improvements. For instance, on the FewShot task of the Longbench dataset, DSPC achieves a score of 49.17 while using 3x fewer tokens, outperforming the best state-of-the-art baseline LongLLMLingua by 7.76.
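The coarse-grained stage — score each sentence by its TF-IDF mass and drop the lowest-scoring ones — needs no auxiliary model. The sketch below is an illustrative reading of the abstract, not the paper's exact formulation: it treats each sentence as a document, scores it by summed TF-IDF weight, and keeps the top fraction in original order.

```python
import math
import re
from collections import Counter

def tfidf_filter(text, keep_ratio=0.5):
    """Coarse-grained compression: keep the sentences with the highest TF-IDF mass."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    docs = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(docs)
    # Document frequency of each term across sentences.
    df = Counter(term for doc in docs for term in set(doc))

    def score(doc):
        tf = Counter(doc)
        return sum(tf[t] * math.log(n / df[t]) for t in tf)

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)
    keep = sorted(ranked[: max(1, int(n * keep_ratio))])  # preserve original order
    return " ".join(sentences[i] for i in keep)

text = ("Prompt compression matters. The weather is nice. "
        "Token pruning keeps attention-heavy tokens. It is nice today.")
compressed = tfidf_filter(text, keep_ratio=0.5)
print(compressed)
```

Sentences dominated by common, low-information words receive low TF-IDF mass and are filtered first; the fine-grained token-level stage would then operate on what survives.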
Can Smaller Large Language Models Evaluate Research Quality?
Research evaluation is a common and important task for academics and managers, and it is often supported by citation-based indicators (Hicks et al., 2015; Moed, 2005; Mukherjee, 2022). With the increasingly widespread use of Artificial Intelligence (AI) in research (Mohammadi et al., 2025), it is important to check whether it can save expert time through support of the research evaluation task. ChatGPT research quality score estimates for journal articles are recent alternatives to citations as quantitative indicators to support evaluations (Kousha & Thelwall, 2025). Their value lies in their positive correlation with expert judgement in all or nearly all fields, and at a slightly higher rate than for citation-based indicators (Thelwall, 2025abc). Despite some systematic biases or disparities (Thelwall & Kurt, 2025), this property means that they are helpful when expert judgement fails, such as for areas outside of the assessor's expertise, as a cross-check for bias, and for evaluations where assessment expertise is unavailable or too expensive for the value of the task (Thelwall, 2025d). Whilst a positive correlation with expert judgement has been established for three of the largest Large Language Models (LLMs) in 2025, ChatGPT 4o, ChatGPT 4o-mini, and Google Gemini Flash 1.5 (Thelwall, 2025ac), these are all cloud-based services and may be too expensive or not private enough for some research evaluation purposes (Nowak et al., 2025). Moreover, cloud-based services can be withdrawn, updated, or made more costly, so research evaluation procedures may not be able to rely on them. Thus, there is a need to test whether any smaller "open weights" LLMs (Sowe et al., 2024) that can be downloaded and used offline have a capability to estimate research quality.
- Europe > United Kingdom (0.14)
- Europe > Netherlands > South Holland > Leiden (0.04)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
- (2 more...)
Multiscale geometrical and topological learning in the analysis of soft matter collective dynamics
Orlova, Tetiana, Solis, Amaranta Membrillo, Sohn, Hayley R. O., Madeleine, Tristan, D'Alessandro, Giampaolo, Smalyukh, Ivan I., Kaczmarek, Malgosia, Brodzki, Jacek
Understanding the behavior and evolution of a dynamical many-body system by analyzing patterns in their experimentally captured images is a promising method relevant for a variety of living and non-living self-assembled systems. The arrays of moving liquid crystal skyrmions studied here are a representative example of hierarchically organized materials that exhibit complex spatiotemporal dynamics driven by multiscale processes. Joint geometric and topological data analysis (TDA) offers a powerful framework for investigating such systems by capturing the underlying structure of the data at multiple scales. In the TDA approach, we introduce the $\Psi$-function, a robust numerical topological descriptor related to both the spatiotemporal changes in the size and shape of individual topological solitons and the emergence of regions with their different spatial organization. The geometric method based on the analysis of vector fields generated from images of skyrmion ensembles offers insights into the nonlinear physical mechanisms of the system's response to external stimuli and provides a basis for comparison with theoretical predictions. The methodology presented here is very general and can provide a characterization of system behavior both at the level of individual pattern-forming agents and as a whole, allowing one to relate the results of image data analysis to processes occurring in a physical, chemical, or biological system in the real world.
- North America > United States > Colorado > Boulder County > Boulder (0.14)
- Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.04)
- Asia > Armenia > Yerevan > Yerevan (0.04)
- (3 more...)