 Tbilisi


ChessGPT: Bridging Policy Learning and Language Modeling

Feng, Xidong

Neural Information Processing Systems

Chess, one of the oldest and most universally played board games, presents an ideal testbed due to the wealth of both policy data and language data. In terms of policy data, it is reported that over ten million games are played daily on Chess.com, the most frequented online chess platform.




Energy Approach from $\varepsilon$-Graph to Continuum Diffusion Model with Connectivity Functional

Yang, Yahong, Lee, Sun, Calder, Jeff, Hao, Wenrui

arXiv.org Machine Learning

We derive an energy-based continuum limit for $\varepsilon$-graphs endowed with a general connectivity functional. We prove that the discrete energy and its continuum counterpart differ by at most $O(\varepsilon)$; the prefactor involves only the $W^{1,1}$-norm of the connectivity density as $\varepsilon\to0$, so the error bound remains valid even when that density has strong local fluctuations. As an application, we introduce a neural-network procedure that reconstructs the connectivity density from edge-weight data and then embeds the resulting continuum model into a brain-dynamics framework. In this setting, the usual constant diffusion coefficient is replaced by the spatially varying coefficient produced by the learned density, yielding dynamics that differ significantly from those obtained with conventional constant-diffusion models.
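As a reading aid, the two central claims of this abstract can be written schematically. The symbols below ($E_\varepsilon$ for the discrete graph energy, $E$ for its continuum counterpart, $\rho$ for the connectivity density, $D_\rho$ for the diffusion coefficient derived from it, $D_0$ for the constant coefficient, and the constant $C$) are notation assumed here for illustration, not taken from the paper, and the precise hypotheses are not stated in the abstract.

$$
\bigl| E_\varepsilon(u) - E(u) \bigr| \;\le\; C \, \|\rho\|_{W^{1,1}} \, \varepsilon
\qquad \text{as } \varepsilon \to 0,
$$

$$
\partial_t u = D_0 \, \Delta u
\quad \longrightarrow \quad
\partial_t u = \nabla \cdot \bigl( D_\rho(x) \, \nabla u \bigr).
$$

The first line restates the $O(\varepsilon)$ bound with a prefactor that depends on $\rho$ only through its $W^{1,1}$-norm; the second contrasts a constant-diffusion model with one whose coefficient varies in space according to the learned density.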


Georgia arrests three Chinese nationals for trying to illegally buy uranium

BBC News

Three Chinese nationals have been arrested in Georgia on suspicion of attempting to illegally purchase 2kg of uranium. Lasha Maghradze, deputy head of the nation's State Security Service (SSG), told a news briefing the group planned to pay $400,000 (£300,570) for the nuclear material in the capital, Tbilisi, before transporting it to China via Russia. The alleged plot was unearthed by intelligence agents while one member of the group was attempting to buy the radioactive substance on the black market, he said. The three pleaded not guilty at a court in Tbilisi and have been placed in custody to prevent them fleeing the country, according to public broadcaster Georgia Today. They face up to five years in prison under a provision of Georgia's criminal code banning the purchasing of nuclear material.





Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models

Simbeck, Katharina, Mahran, Mariam

arXiv.org Artificial Intelligence

Despite growing research on bias in large language models (LLMs), most work has focused on gender and race, with little attention to religious identity. This paper explores how religion is internally represented in LLMs and how it intersects with concepts of violence and geography. Using mechanistic interpretability and Sparse Autoencoders (SAEs) via the Neuronpedia API, we analyze latent feature activations across five models. We measure overlap between religion- and violence-related prompts and probe semantic patterns in activation contexts. While all five religions show comparable internal cohesion, Islam is more frequently linked to features associated with violent language. In contrast, geographic associations largely reflect real-world religious demographics, revealing how models embed both factual distributions and cultural stereotypes. These findings highlight the value of structural analysis in auditing not just outputs but also internal representations that shape model behavior.
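One plausible way to quantify the overlap between religion- and violence-related prompts is a Jaccard index over the sets of SAE features each prompt group activates. The sketch below assumes that metric; the abstract does not name the one actually used. The feature IDs and group names are made-up placeholders, and no Neuronpedia API call is shown.

```python
def jaccard(a, b):
    """Jaccard index of two collections of feature IDs (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical SAE feature IDs activated by prompts about each group.
religion_features = {
    "religion_A": {101, 205, 317, 442},
    "religion_B": {101, 260, 318, 900},
}

# Hypothetical SAE feature IDs activated by violence-related prompts.
violence_features = {205, 442, 777, 900}

# Overlap of each religion's feature set with the violence feature set.
overlap = {
    name: jaccard(features, violence_features)
    for name, features in religion_features.items()
}

for name, score in sorted(overlap.items()):
    print(f"{name}: {score:.3f}")
```

With these placeholder sets, `religion_A` shares two of six distinct features with the violence set and `religion_B` shares one of seven, so a higher score indicates a stronger association in the latent space.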


Improving LLM Outputs Against Jailbreak Attacks with Expert Model Integration

Tsmindashvili, Tatia, Kolkhidashvili, Ana, Kurtskhalia, Dachi, Maghlakelidze, Nino, Mekvabishvili, Elene, Dentoshvili, Guram, Shamilov, Orkhan, Gachechiladze, Zaal, Saporta, Steven, Choladze, David Dachi

arXiv.org Artificial Intelligence

Using LLMs in a production environment presents security challenges that include vulnerabilities to jailbreaks and prompt injections, which can result in harmful outputs for humans or the enterprise. The challenge is amplified when working within a specific domain, as topics generally accepted for LLMs to address may be irrelevant to that field. These problems can be mitigated, for example, by fine-tuning large language models with domain-specific and security-focused data. However, these alone are insufficient, as jailbreak techniques evolve. Additionally, API-accessed models do not offer the flexibility needed to tailor behavior to industry-specific objectives, and in-context learning is not always sufficient or reliable. In response to these challenges, we introduce Archias, an expert model adept at distinguishing between in-domain and out-of-domain communications. Archias classifies user inquiries into several categories: in-domain (specifically for the automotive industry), malicious questions, price injections, prompt injections, and out-of-domain examples. Our methodology integrates outputs from the expert model (Archias) into prompts, which are then processed by the LLM to generate responses. This method increases the model's ability to understand the user's intention and give appropriate answers. Archias can be adjusted, fine-tuned, and used for many different purposes due to its small size. Therefore, it can be easily customized to the needs of any industry. To validate our approach, we created a benchmark dataset for the automotive industry. Furthermore, in the interest of advancing research and development, we release our benchmark dataset to the community.
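The described flow (classify the inquiry with the expert model, then integrate that output into the prompt sent to the LLM) can be sketched as follows. This is a minimal illustration under stated assumptions: `classify` is a toy keyword stand-in for Archias, not the real model, and the prompt layout in `build_prompt` is hypothetical. Only the five category names come from the abstract.

```python
import re
from enum import Enum


class Category(Enum):
    IN_DOMAIN = "in-domain"
    MALICIOUS = "malicious"
    PRICE_INJECTION = "price injection"
    PROMPT_INJECTION = "prompt injection"
    OUT_OF_DOMAIN = "out-of-domain"


# Toy vocabulary standing in for a learned notion of the automotive domain.
AUTOMOTIVE_WORDS = {"car", "vehicle", "engine", "dealership"}


def classify(query: str) -> Category:
    """Hypothetical stand-in for the expert model: simple keyword rules."""
    q = query.lower()
    words = set(re.findall(r"[a-z]+", q))
    if "ignore previous instructions" in q:
        return Category.PROMPT_INJECTION
    if "final price is" in q:
        return Category.PRICE_INJECTION
    if "steal" in words:
        return Category.MALICIOUS
    if words & AUTOMOTIVE_WORDS:
        return Category.IN_DOMAIN
    return Category.OUT_OF_DOMAIN


def build_prompt(query: str, category: Category) -> str:
    """Prepend the expert label so the downstream LLM can condition on it."""
    return (
        f"[expert label: {category.value}]\n"
        "Answer only in-domain automotive questions; politely decline otherwise.\n"
        f"User: {query}"
    )


query = "What engine options does this car have?"
prompt = build_prompt(query, classify(query))
print(prompt)
```

Keeping the classifier separate from the LLM means the small expert model can be retrained or swapped per industry without touching the generator, which matches the customization argument made in the abstract.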