Goto

Collaborating Authors

 ssi


StructuredDNA: A Bio-Physical Framework for Energy-Aware Transformer Routing

Hamdi, Mustapha

arXiv.org Artificial Intelligence

The rapid scaling of large computational models has led to a critical increase in energy and compute costs. Inspired by biological systems where structure and function emerge from low-energy configurations, we introduce StructuredDNA, a sparse architecture framework for modular, energy-aware Transformer routing. StructuredDNA replaces dense Mixture-of-Experts routing with a bio-physical, energy-guided routing layer based on semantic energy minimization. Inputs are dynamically grouped into semantic codons, and routing selects a single expert by minimizing a global energy functional that combines cohesion, uncertainty, and computational cost. We validate StructuredDNA on both specialized (BioASQ) and open-domain benchmarks (WikiText-103). On BioASQ (K = 50), we achieve a 97.7% reduction in Energy Utilization Density (EUD) and a Semantic Stability Index (SSI) of 0.998. We further demonstrate a Semantic Scaling Law on WikiText-103, showing that the architecture generalizes to open domains by scaling expert granularity (K = 2048) while maintaining more than 99% energy efficiency. StructuredDNA thus establishes a robust, domain-agnostic paradigm for future sparse computational frameworks. StructuredDNA provides an explicit link between bio-physical principles and sparse expert routing in Transformer architectures, and points toward future energy-aware, modular, and scalable computational systems. We discuss limitations of this proof-of-concept study and outline directions for scaling the approach to larger models, datasets, and hardware platforms. The StructuredDNA implementation is available at https://github.com/InnoDeep-repos/StructuredDNA .


Initial Investigation of LLM-Assisted Development of Rule-Based Clinical NLP System

Shi, Jianlin, Bucher, Brian T.

arXiv.org Artificial Intelligence

Despite advances in machine learning (ML) and large language models (LLMs), rule-based natural language processing (NLP) systems remain active in clinical settings due to their interpretability and operational efficiency. However, their manual development and maintenance are labor-intensive, particularly in tasks with large linguistic variability. To overcome these limitations, we proposed a novel approach employing LLMs solely during the rule-based systems development phase. We conducted the initial experiments focusing on the first two steps of developing a rule-based NLP pipeline: find relevant snippets from the clinical note; extract informative keywords from the snippets for the rule-based named entity recognition (NER) component. Our experiments demonstrated exceptional recall in identifying clinically relevant text snippets (Deepseek: 0.98, Qwen: 0.99) and 1.0 in extracting key terms for NER. This study sheds light on a promising new direction for NLP development, enabling semi-automated or automated development of rule-based systems with significantly faster, more cost-effective, and transparent execution compared with deep learning model-based solutions.


Can we Debias Social Stereotypes in AI-Generated Images? Examining Text-to-Image Outputs and User Perceptions

Barve, Saharsh, Mao, Andy, Shi, Jiayue Melissa, Juneja, Prerna, Saha, Koustuv

arXiv.org Artificial Intelligence

Recent advances in generative AI have enabled visual content creation through text-to-image (T2I) generation. However, despite their creative potential, T2I models often replicate and amplify societal stereotypes -- particularly those related to gender, race, and culture -- raising important ethical concerns. This paper proposes a theory-driven bias detection rubric and a Social Stereotype Index (SSI) to systematically evaluate social biases in T2I outputs. We audited three major T2I model outputs -- DALL-E-3, Midjourney-6.1, and Stability AI Core -- using 100 queries across three categories -- geocultural, occupational, and adjectival. Our analysis reveals that initial outputs are prone to include stereotypical visual cues, including gendered professions, cultural markers, and western beauty norms. To address this, we adopted our rubric to conduct targeted prompt refinement using LLMs, which significantly reduced bias -- SSI dropped by 61% for geocultural, 69% for occupational, and 51% for adjectival queries. We complemented our quantitative analysis through a user study examining perceptions, awareness, and preferences around AI-generated biased imagery. Our findings reveal a key tension -- although prompt refinement can mitigate stereotypes, it can limit contextual alignment. Interestingly, users often perceived stereotypical images to be more aligned with their expectations. We discuss the need to balance ethical debiasing with contextual relevance and call for T2I systems that support global diversity and inclusivity while not compromising the reflection of real-world social complexity.


How Syntax Specialization Emerges in Language Models

Duan, Xufeng, Yao, Zhaoqian, Zhang, Yunhao, Wang, Shaonan, Cai, Zhenguang G.

arXiv.org Artificial Intelligence

Large language models (LLMs) have been found to develop surprising internal specializations: Individual neurons, attention heads, and circuits become selectively sensitive to syntactic structure, reflecting patterns observed in the human brain. While this specialization is well-documented, how it emerges during training and what influences its development remains largely unknown. In this work, we tap into the black box of specialization by tracking its formation over time. By quantifying internal syntactic consistency across minimal pairs from various syntactic phenomena, we identify a clear developmental trajectory: Syntactic sensitivity emerges gradually, concentrates in specific layers, and exhibits a 'critical period' of rapid internal specialization. This process is consistent across architectures and initialization parameters (e.g., random seeds), and is influenced by model scale and training data. We therefore reveal not only where syntax arises in LLMs but also how some models internalize it during training. To support future research, we will release the code, models, and training checkpoints upon acceptance.


Generative Induction of Dialogue Task Schemas with Streaming Refinement and Simulated Interactions

Finch, James D., Josyula, Yasasvi, Choi, Jinho D.

arXiv.org Artificial Intelligence

In task-oriented dialogue (TOD) systems, Slot Schema Induction (SSI) is essential for automatically identifying key information slots from dialogue data without manual intervention. This paper presents a novel state-of-the-art (SoTA) approach that formulates SSI as a text generation task, where a language model incrementally constructs and refines a slot schema over a stream of dialogue data. To develop this approach, we present a fully automatic LLM-based TOD simulation method that creates data with high-quality state labels for novel task domains. Furthermore, we identify issues in SSI evaluation due to data leakage and poor metric alignment with human judgment. We resolve these by creating new evaluation data using our simulation method with human guidance and correction, as well as designing improved evaluation metrics. These contributions establish a foundation for future SSI research and advance the SoTA in dialogue understanding and system development.


Data-driven Surface Solar Irradiance Estimation using Neural Operators at Global Scale

Carpentieri, Alberto, Leinonen, Jussi, Adie, Jeff, Bonev, Boris, Folini, Doris, Hariri, Farah

arXiv.org Artificial Intelligence

Accurate surface solar irradiance (SSI) forecasting is essential for optimizing renewable energy systems, particularly in the context of long-term energy planning on a global scale. This paper presents a pioneering approach to solar radiation forecasting that leverages recent advancements in numerical weather prediction (NWP) and data-driven machine learning weather models. These advances facilitate long, stable rollouts and enable large ensemble forecasts, enhancing the reliability of predictions. Our flexible model utilizes variables forecast by these NWP and AI weather models to estimate 6-hourly SSI at global scale. Developed using NVIDIA Modulus, our model represents the first adaptive global framework capable of providing long-term SSI forecasts. Furthermore, it can be fine-tuned using satellite data, which significantly enhances its performance in the fine-tuned regions, while maintaining accuracy elsewhere. The improved accuracy of these forecasts has substantial implications for the integration of solar energy into power grids, enabling more efficient energy management and contributing to the global transition to renewable energy sources. Figure 1: 6-hourly averaged SSI forecasts over a 48-hour period.


Surface solar radiation: AI satellite retrieval can outperform Heliosat and generalizes well to other climate zones

Schuurman, K. R., Meyer, A.

arXiv.org Artificial Intelligence

Accurate estimates of surface solar irradiance (SSI) are essential for solar resource assessments and solar energy forecasts in grid integration and building control applications. SSI estimates for spatially extended regions can be retrieved from geostationary satellites such as Meteosat. Traditional SSI satellite retrievals like Heliosat rely on physical radiative transfer modelling. We introduce the first machine-learning-based satellite retrieval for instantaneous SSI and demonstrate its capability to provide accurate and generalizable SSI estimates across Europe. Our deep learning retrieval provides near real-time SSI estimates based on data-driven emulation of Heliosat and fine-tuning on pyranometer networks. By including SSI from ground stations, our SSI retrieval model can outperform Heliosat accuracy and generalize well to regions with other climates and surface albedos in cloudy conditions (clear-sky index < 0.8). We also show that the SSI retrieved from Heliosat exhibits large biases in mountain regions, and that training and fine-tuning our retrieval models on SSI data from ground stations strongly reduces these biases, outperforming Heliosat. Furthermore, we quantify the relative importance of the Meteosat channels and other predictor variables like solar zenith angle for the accuracy of our deep learning SSI retrieval model in different cloud conditions. We find that in cloudy conditions multiple near-infrared and infrared channels enhance the performance. Our results can facilitate the development of more accurate satellite retrieval models of surface solar irradiance.


Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective

Gupta, Animesh, Hasan, Irtiza, Prasad, Dilip K., Gupta, Deepak K.

arXiv.org Artificial Intelligence

Coreset selection is among the most effective ways to reduce the training time of CNNs, however, only limited is known on how the resultant models will behave under variations of the coreset size, and choice of datasets and models. Moreover, given the recent paradigm shift towards transformer-based models, it is still an open question how coreset selection would impact their performance. There are several similar intriguing questions that need to be answered for a wide acceptance of coreset selection methods, and this paper attempts to answer some of these. We present a systematic benchmarking setup and perform a rigorous comparison of different coreset selection methods on CNNs and transformers. Our investigation reveals that under certain circumstances, random selection of subsets is more robust and stable when compared with the SOTA selection methods. We demonstrate that the conventional concept of uniform subset sampling across the various classes of the data is not the appropriate choice. Rather samples should be adaptively chosen based on the complexity of the data distribution for each class. Transformers are generally pretrained on large datasets, and we show that for certain target datasets, it helps to keep their performance stable at even very small coreset sizes. We further show that when no pretraining is done or when the pretrained transformer models are used with non-natural images (e.g. medical data), CNNs tend to generalize better than transformers at even very small coreset sizes. Lastly, we demonstrate that in the absence of the right pretraining, CNNs are better at learning the semantic coherence between spatially distant objects within an image, and these tend to outperform transformers at almost all choices of the coreset size.


On the Value of Stochastic Side Information in Online Learning

Jia, Junzhang, Wu, Xuetong, Zhu, Jingge, Evans, Jamie

arXiv.org Artificial Intelligence

As a common situation in practice, the forecaster could access some additional resources which we call it side information, We study the effectiveness of stochastic side information in deterministic that may provide some useful knowledge on the online learning scenarios. We propose a forecaster sequence of interest. Cover and Ordentlich [10] first studied to predict a deterministic sequence where its performance is a portfolio investment problem where the sequence of interest evaluated against an expert class. We assume that certain is the stock vectors that may depend on some finite-valued stochastic side information is available to the forecaster but states (as side information), and their proposed forecaster can not the experts. We define the minimax expected regret for achieve the same wealth as the best side information dependent evaluating the forecaster's performance, for which we obtain investment strategy. Xie and Barron [11] studied the case when both upper and lower bounds. Consequently, our results characterize the sequence of interest is generated according to a pair-wise the improvement in the regret due to the stochastic parametric distribution conditioning on the side information, side information. Compared with the classical online learning and derived an logarithmic upper bound of the minimax regret.


Digital Twins and Self-Sovereign Identity: Build the next generation of Simulation with privacy preservation IoTPractitioner.com The IoT Portal Platform

#artificialintelligence

The rise in the use of advanced analytics, machine learning (ML) and Artificial Intelligence (AI) and the Internet of Things (IoT) today have driven the technology of simulation into the concept of the digital twin. Digital twins are generally defined as a virtual digital model of a physical system that is used to make better decisions about the real world physical system. Digital twins are usually intertwined with sensors and include a two-way interaction between the physical and digital twin. The growth of "smart" IoT devices – devices with an increasing amount of processing or computational capability – results in more use cases where devices perform independent functions or complex system behaviors. These devices must both provide cryptographic trust to participate in network data exchange and be assured they can trust the network and nodes to which they communicate.