Materials
Differentiable High-Order Markov Models for Spectrum Prediction
Corlay, Vincent, Nakazato, Tatsuya, Yamaguchi, Kanako, Nakajima, Akinori
The advent of deep learning and recurrent neural networks revolutionized the field of time-series processing. Therefore, recent research on spectrum prediction has focused on the use of these tools. However, spectrum prediction, which involves forecasting wireless spectrum availability, is an older field where many "classical" tools were considered around the 2010s, such as Markov models. This work revisits high-order Markov models for spectrum prediction in dynamic wireless environments. We introduce a framework to address mismatches between sensing length and model order as well as state-space complexity arising with large order. Furthermore, we extend this Markov framework by enabling fine-tuning of the probability transition matrix through gradient-based supervised learning, offering a hybrid approach that bridges probabilistic modeling and modern machine learning. Simulations on real-world Wi-Fi traffic demonstrate the competitive performance of high-order Markov models compared to deep learning methods, particularly in scenarios with constrained datasets containing outliers.
Materials Learning Algorithms (MALA): Scalable Machine Learning for Electronic Structure Calculations in Large-Scale Atomistic Simulations
Cangi, Attila, Fiedler, Lenz, Brzoza, Bartosz, Shah, Karan, Callow, Timothy J., Kotik, Daniel, Schmerler, Steve, Barry, Matthew C., Goff, James M., Rohskopf, Andrew, Vogel, Dayton J., Modine, Normand, Thompson, Aidan P., Rajamanickam, Sivasankaran
We present the Materials Learning Algorithms (MALA) package, a scalable machine learning framework designed to accelerate density functional theory (DFT) calculations suitable for large-scale atomistic simulations. Using local descriptors of the atomic environment, MALA models efficiently predict key electronic observables, including local density of states, electronic density, density of states, and total energy. The package integrates data sampling, model training and scalable inference into a unified library, while ensuring compatibility with standard DFT and molecular dynamics codes. We demonstrate MALA's capabilities with examples including boron clusters, aluminum across its solid-liquid phase boundary, and predicting the electronic structure of a stacking fault in a large beryllium slab. Scaling analyses reveal MALA's computational efficiency and identify bottlenecks for future optimization. With its ability to model electronic structures at scales far beyond standard DFT, MALA is well suited for modeling complex material systems, making it a versatile tool for advanced materials research.
Deep Learning for GWP Prediction: A Framework Using PCA, Quantile Transformation, and Ensemble Modeling
Rajapriya, Navin, Kawajiri, Kotaro
Developing environmentally sustainable refrigerants is critical for mitigating the impact of anthropogenic greenhouse gases on global warming. This study presents a predictive modeling framework to estimate the 100-year global warming potential (GWP 100) of single-component refrigerants using a fully connected neural network implemented on the Multi-Sigma platform. Molecular descriptors from RDKit, Mordred, and alvaDesc were utilized to capture various chemical features. The RDKit-based model achieved the best performance, with a Root Mean Square Error (RMSE) of 481.9 and an R2 score of 0.918, demonstrating superior predictive accuracy and generalizability. Dimensionality reduction through Principal Component Analysis (PCA) and quantile transformation were applied to address the high-dimensional and skewed nature of the dataset,enhancing model stability and performance. Factor analysis identified vital molecular features, including molecular weight, lipophilicity, and functional groups, such as nitriles and allylic oxides, as significant contributors to GWP values. These insights provide actionable guidance for designing environmentally sustainable refrigerants. Integrating RDKit descriptors with Multi-Sigma's framework, which includes PCA, quantile transformation, and neural networks, provides a scalable solution for the rapid virtual screening of low-GWP refrigerants. This approach can potentially accelerate the identification of eco-friendly alternatives, directly contributing to climate mitigation by enabling the design of next-generation refrigerants aligned with global sustainability objectives.
DHCP: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-Language Models
Zhang, Yudong, Xie, Ruobing, Chen, Jiansheng, Sun, Xingwu, kang, Zhanhui, Wang, Yu
Large vision-language models (LVLMs) have demonstrated exceptional performance on complex multimodal tasks. However, they continue to suffer from significant hallucination issues, including object, attribute, and relational hallucinations. To accurately detect these hallucinations, we investigated the variations in cross-modal attention patterns between hallucination and non-hallucination states. Leveraging these distinctions, we developed a lightweight detector capable of identifying hallucinations. Our proposed method, Detecting Hallucinations by Cross-modal Attention Patterns (DHCP), is straightforward and does not require additional LVLM training or extra LVLM inference steps. Experimental results show that DHCP achieves remarkable performance in hallucination detection. By offering novel insights into the identification and analysis of hallucinations in LVLMs, DHCP contributes to advancing the reliability and trustworthiness of these models.
The Bigger the Better? Accurate Molecular Potential Energy Surfaces from Minimalist Neural Networks
Käser, Silvan, Koner, Debasish, Meuwly, Markus
Atomistic simulations are a powerful tool for studying the dynamics of molecules, proteins, and materials on wide time and length scales. Their reliability and predictiveness, however, depend directly on the accuracy of the underlying potential energy surface (PES). Guided by the principle of parsimony this work introduces KerNN, a combined kernel/neural network-based approach to represent molecular PESs. Compared to state-of-the-art neural network PESs the number of learnable parameters of KerNN is significantly reduced. This speeds up training and evaluation times by several orders of magnitude while retaining high prediction accuracy. Importantly, using kernels as the features also improves the extrapolation capabilities of KerNN far beyond the coverage provided by the training data which solves a general problem of NN-based PESs. KerNN applied to spectroscopy and reaction dynamics shows excellent performance on test set statistics and observables including vibrational bands computed from classical and quantum simulations.
Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models
Cui, Chenhang, Deng, Gelei, Zhang, An, Zheng, Jingnan, Li, Yicong, Gao, Lianli, Zhang, Tianwei, Chua, Tat-Seng
Recent advances in Large Vision-Language Models (LVLMs) have showcased strong reasoning abilities across multiple modalities, achieving significant breakthroughs in various real-world applications. Despite this great success, the safety guardrail of LVLMs may not cover the unforeseen domains introduced by the visual modality. Existing studies primarily focus on eliciting LVLMs to generate harmful responses via carefully crafted image-based jailbreaks designed to bypass alignment defenses. In this study, we reveal that a safe image can be exploited to achieve the same jailbreak consequence when combined with additional safe images and prompts. This stems from two fundamental properties of LVLMs: universal reasoning capabilities and safety snowball effect. Building on these insights, we propose Safety Snowball Agent (SSA), a novel agent-based framework leveraging agents' autonomous and tool-using abilities to jailbreak LVLMs. SSA operates through two principal stages: (1) initial response generation, where tools generate or retrieve jailbreak images based on potential harmful intents, and (2) harmful snowballing, where refined subsequent prompts induce progressively harmful outputs. Our experiments demonstrate that \ours can use nearly any image to induce LVLMs to produce unsafe content, achieving high success jailbreaking rates against the latest LVLMs. Unlike prior works that exploit alignment flaws, \ours leverages the inherent properties of LVLMs, presenting a profound challenge for enforcing safety in generative multimodal systems. Our code is avaliable at \url{https://github.com/gzcch/Safety_Snowball_Agent}.
MOLA: Enhancing Industrial Process Monitoring Using Multi-Block Orthogonal Long Short-Term Memory Autoencoder
Ma, Fangyuan, Ji, Cheng, Wang, Jingde, Sun, Wei, Tang, Xun, Jiang, Zheyu
In this work, we introduce MOLA: a Multi-block Orthogonal Long short-term memory Autoencoder paradigm, to conduct accurate, reliable fault detection of industrial processes. To achieve this, MOLA effectively extracts dynamic orthogonal features by introducing an orthogonality-based loss function to constrain the latent space output. This helps eliminate the redundancy in the features identified, thereby improving the overall monitoring performance. On top of this, a multi-block monitoring structure is proposed, which categorizes the process variables into multiple blocks by leveraging expert process knowledge about their associations with the overall process. Each block is associated with its specific Orthogonal Long short-term memory Autoencoder model, whose extracted dynamic orthogonal features are monitored by distance-based Hotelling's $T^2$ statistics and quantile-based cumulative sum (CUSUM) designed for multivariate data streams that are nonparametric, heterogeneous in nature. Compared to having a single model accounting for all process variables, such a multi-block structure improves the overall process monitoring performance significantly, especially for large-scale industrial processes. Finally, we propose an adaptive weight-based Bayesian fusion (W-BF) framework to aggregate all block-wise monitoring statistics into a global statistic that we monitor for faults, with the goal of improving fault detection speed by assigning weights to blocks based on the sequential order where alarms are raised. We demonstrate the efficiency and effectiveness of our MOLA framework by applying it to the Tennessee Eastman Process and comparing the performance with various benchmark methods.
Biomolecular Analysis of Soil Samples and Rock Imagery for Tracing Evidence of Life Using a Mobile Robot
Siddique, Shah Md Ahasan, Rinath, Ragib Tahshin, Mosharrof, Shakil, Mahmud, Syed Tanjib, Ahmed, Sakib
The search for evidence of past life on Mars presents a tremendous challenge that requires the usage of very advanced robotic technologies to overcome it. Current digital microscopic imagers and spectrometers used for astrobiological examination suffer from limitations such as insufficient resolution, narrow detection range, and lack of portability. To overcome these challenges, this research study presents modifications to the Phoenix rover to expand its capability for detecting biosignatures on Mars. This paper examines the modifications implemented on the Phoenix rover to enhance its capability to detect a broader spectrum of biosignatures. One of the notable improvements comprises the integration of advanced digital microscopic imagers and spectrometers, enabling high-resolution examination of soil samples. Additionally, the mechanical components of the device have been reinforced to enhance maneuverability and optimize subsurface sampling capabilities. Empirical investigations have demonstrated that Phoenix has the capability to navigate diverse geological environments and procure samples for the purpose of biomolecular analysis. The biomolecular instrumentation and hybrid analytical methods showcased in this study demonstrate considerable potential for future astrobiology missions on Mars. The potential for enhancing the system lies in the possibility of broadening the range of detectable biomarkers and biosignatures.
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation
Sinha, Sankalp, Khan, Mohammad Sadil, Usama, Muhammad, Sam, Shino, Stricker, Didier, Ali, Sk Aziz, Afzal, Muhammad Zeshan
Generating high-fidelity 3D content from text prompts remains a significant challenge in computer vision due to the limited size, diversity, and annotation depth of the existing datasets. To address this, we introduce MARVEL-40M+, an extensive dataset with 40 million text annotations for over 8.9 million 3D assets aggregated from seven major 3D datasets. Our contribution is a novel multi-stage annotation pipeline that integrates open-source pretrained multi-view VLMs and LLMs to automatically produce multi-level descriptions, ranging from detailed (150-200 words) to concise semantic tags (10-20 words). This structure supports both fine-grained 3D reconstruction and rapid prototyping. Furthermore, we incorporate human metadata from source datasets into our annotation pipeline to add domain-specific information in our annotation and reduce VLM hallucinations. Additionally, we develop MARVEL-FX3D, a two-stage text-to-3D pipeline. We fine-tune Stable Diffusion with our annotations and use a pretrained image-to-3D network to generate 3D textured meshes within 15s. Extensive evaluations show that MARVEL-40M+ significantly outperforms existing datasets in annotation quality and linguistic diversity, achieving win rates of 72.41% by GPT-4 and 73.40% by human evaluators.
Data-driven development of cycle prediction models for lithium metal batteries using multi modal mining
Lee, Jaewoong, Woo, Junhee, Kim, Sejin, Paulina, Cinthya, Park, Hyunmin, Kim, Hee-Tak, Park, Steve, Kim, Jihan
These authors contributed equally: J. Lee, J. Woo *: Corresponding author Corresponding author Email: Jihankim@kaist.ac.kr (Jihan Kim), stevepark@kaist.ac.kr (Steve Park), heetak.kim@kaist.ac.kr (Hee-Tak Kim) Abstract Recent advances in data-driven research have shown great potential in understanding the intricate relationships between materials and their performances. Herein, we introduce a novel multi modal data-driven approach employing an Automatic Battery data Collector (ABC) that integrates a large language model (LLM) with an automatic graph mining tool, Material Graph Digitizer (MatGD). This platform enables state-of-the-art accurate extraction of battery material data and cyclability performance metrics from diverse textual and graphical data sources. From the database derived through the ABC platform, we developed machine learning models that can accurately predict the capacity and stability of lithium metal batteries, which is the first-ever model developed to achieve such predictions. Our models were also experimentally validated, confirming practical applicability and reliability of our data-driven approach. INTRODUCTION Lithium metal batteries (LMBs) are a promising next-generation device that can achieve high capacity using lithium metal as an anode due to its exceptionally low density (0.534 g cm Therefore, these studies lack sufficient information to discern a comprehensive effect of different components on the battery performance. Additionally, previous mining research focused not on the entire battery cells but rather on the characteristics of individual battery components. Moreover, these studies were limited by the small number of entities considered and did not extract quantitative information such as concentrations or ratios. Furthermore, the absence of automatic graph mining tools made it difficult to obtain performance data from graphs, such as specific capacity and cycle stability.