 ion


Chemistry Integrated Language Model using Hierarchical Molecular Representation for Polymer Informatics

Ahn, Jihun, Irianti, Gabriella Pasya, Thapar, Vikram, Hur, Su-Mi

arXiv.org Artificial Intelligence

Machine learning has transformed material discovery for inorganic compounds and small molecules, yet polymers remain largely inaccessible to these methods. While data scarcity is often cited as the primary bottleneck, we demonstrate that strategic molecular representations can overcome this limitation. We introduce CI-LLM (Chemically Informed Language Model), a framework combining HAPPY (Hierarchically Abstracted rePeat unit of PolYmer), which encodes chemical substructures as tokens, with numerical descriptors within transformer architectures. For property prediction, De$^3$BERTa, our descriptor-enriched encoder, achieves 3.5x faster inference than SMILES-based models with improved accuracy ($R^2$ score gains of 0.9-4.1 percent across four properties), while providing interpretable structure-property insights at the subgroup level. For inverse design, our GPT-based generator produces polymers with targeted properties, achieving 100 percent scaffold retention and successful multi-property optimization for negatively correlated objectives. This comprehensive framework demonstrates both forward prediction and inverse design capabilities, showcasing how strategic molecular representation advances machine learning applications in polymer science.


Tech Billionaires Already Captured the White House. They Still Want to Be Kings

WIRED

From Montenegro to northern California, the tech elite dream of building cities where they make the rules. Is this, finally, their moment? The shirtless man in the golden mask and cape has plans to lead his own country one day. There is no location yet, but it will be a crypto- and AI-powered paradise of medical experimentation, filled with people who want to "make death optional," he says. For now, though, he's leading a sparsely attended rave on the second floor of a San Francisco office building. A DJ is spinning at one end of an open room. A handful of people sway and jump on the space cleared out as a dance floor. At a nearby table, coffee is available with many alternative milks.


Arabic Chatbot Technologies in Education: An Overview

Bourhil, Hicham, Younoussi, Yacine El

arXiv.org Artificial Intelligence

The recent advancements in Artificial Intelligence (AI) in general, and in Natural Language Processing (NLP) in particular, and some of its applications such as chatbots, have led to their implementation in different domains like education, healthcare, tourism, and customer service. Since the COVID-19 pandemic, there has been an increasing interest in these digital technologies to allow and enhance remote access. In education, e-learning systems have been massively adopted worldwide. The emergence of Large Language Models (LLMs) such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformers) made chatbots even more popular. In this study, we present a survey on existing Arabic chatbots in education and their different characteristics such as the adopted approaches, language variety, and metrics used to measure their performance. We were able to identify some research gaps when we discovered that, despite the success of chatbots in other languages such as English, only a few educational Arabic chatbots used modern techniques. Finally, we discuss future directions of research in this field.


PUB: A Plasma-Propelled Ultra-Quiet Blimp with Two-DOF Vector Thrusting

Wang, Zihan

arXiv.org Artificial Intelligence

In 2024, the "low-altitude economy" was written into China's Government Work Report for the first time [1], and flying robots have been rapidly popularized nationwide. From an environmental perspective, electrically powered air vehicles are attracting growing attention; key technologies include overall configuration design, integrated energy management, and high-efficiency, high power-to-weight electric propulsion [2]. For electric propulsion, mainstream systems use electric motors to drive propellers, but propeller noise is significant and hard to mitigate [3], which limits widespread use in cities (the main arena for the low-altitude economy) and is also unfavorable for silent reconnaissance. Hence, there is a pressing need for a new propulsion approach enabling quiet, fully electric flight. In the 1920s, Brown observed that an asymmetric capacitor under high voltage can generate thrust, known as the Biefeld-Brown effect. A leading explanation is ionic wind: a high electric field ionizes air, and the resulting ions accelerate and transfer momentum to neutral molecules, producing a net airflow (thrust) [4]. Xu et al. first mounted a plasma thruster on a fixed-wing UAV without other propulsion; the gliding distance with the thruster on was five times that with it off, but the maximum range was only 45 m and no controller design was provided [5]. Zhang et al. realized altitude control for a micro ionic-wind-powered UAV using passive components, but the wingspan was at most 6.3 cm


Pep2Prob Benchmark: Predicting Fragment Ion Probability for MS$^2$-based Proteomics

Xu, Hao, Wang, Zhichao, Sang, Shengqi, Wajanasara, Pisit, Bandeira, Nuno

arXiv.org Artificial Intelligence

Proteins perform nearly all cellular functions and constitute most drug targets, making their analysis fundamental to understanding human biology in health and disease. Tandem mass spectrometry (MS$^2$) is the major analytical technique in proteomics that identifies peptides by ionizing them, fragmenting them, and using the resulting mass spectra to identify and quantify proteins in biological samples. In MS$^2$ analysis, peptide fragment ion probability prediction plays a critical role, enhancing the accuracy of peptide identification from mass spectra as a complement to the intensity information. Current approaches rely on global statistics of fragmentation, which assumes that a fragment's probability is uniform across all peptides. Nevertheless, this assumption is oversimplified from a biochemical principle point of view and limits accurate prediction. To address this gap, we present Pep2Prob, the first comprehensive dataset and benchmark designed for peptide-specific fragment ion probability prediction. The proposed dataset contains fragment ion probability statistics for 608,780 unique precursors (each precursor is a pair of peptide sequence and charge state), summarized from more than 183 million high-quality, high-resolution, HCD MS$^2$ spectra with validated peptide assignments and fragmentation annotations. We establish baseline performance using simple statistical rules and learning-based methods, and find that models leveraging peptide-specific information significantly outperform previous methods using only global fragmentation statistics. Furthermore, performance across benchmark models with increasing capacities suggests that the peptide-fragmentation relationship exhibits complex nonlinearities requiring sophisticated machine learning approaches.
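The contrast between global and peptide-specific statistics can be illustrated with a small sketch. It assumes, as a simplification not stated in the abstract, that a fragment's probability is estimated as the fraction of a precursor's spectra in which that fragment is observed. The function names, the fragment labels (`b2`, `y3`), and the toy spectra are hypothetical and are not taken from the Pep2Prob dataset.

```python
from collections import defaultdict


def precursor_probabilities(spectra):
    """Per-precursor estimate: for each (peptide, charge) pair, the fraction
    of its spectra in which each fragment label was observed.
    Assumption: 'probability' means observation frequency."""
    n_spectra = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for peptide, charge, observed in spectra:
        key = (peptide, charge)
        n_spectra[key] += 1
        frag_counts = counts[key]  # ensure the precursor is registered
        for frag in set(observed):
            frag_counts[frag] += 1
    return {
        key: {frag: c / n_spectra[key] for frag, c in frags.items()}
        for key, frags in counts.items()
    }


def global_probabilities(spectra):
    """Global estimate: one observation frequency per fragment label,
    pooled over all precursors."""
    total = 0
    counts = defaultdict(int)
    for _, _, observed in spectra:
        total += 1
        for frag in set(observed):
            counts[frag] += 1
    return {frag: c / total for frag, c in counts.items()}


# Hypothetical toy data: (peptide sequence, charge state, observed fragments).
toy_spectra = [
    ("PEPTIDE", 2, {"b2", "y3"}),
    ("PEPTIDE", 2, {"y3"}),
    ("ACDK", 2, {"b2"}),
    ("ACDK", 2, {"b2", "y3"}),
]

per_precursor = precursor_probabilities(toy_spectra)
pooled = global_probabilities(toy_spectra)
```

In this toy case the pooled estimate assigns 0.75 to both `b2` and `y3`, while the per-precursor estimates are 0.5 and 1.0 (in opposite order for the two peptides), which is the kind of peptide-specific variation a global statistic averages away.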


Impact of a Deployed LLM Survey Creation Tool through the IS Success Model

Jiang, Peng, de Lira, Vinicius Cezar Monteiro, Maiorino, Antonio

arXiv.org Artificial Intelligence

Surveys are a cornerstone of Information Systems (IS) research, yet creating high-quality surveys remains labor-intensive, requiring both domain expertise and methodological rigor. With the evolution of large language models (LLMs), new opportunities emerge to automate survey generation. This paper presents the real-world deployment of an LLM-powered system designed to accelerate data collection while maintaining survey quality. Deploying such systems in production introduces real-world complexity, including diverse user needs and quality control. We evaluate the system using the DeLone and McLean IS Success Model to understand how generative AI can reshape a core IS method. This study makes three key contributions. To our knowledge, this is the first application of the IS Success Model to a generative AI system for survey creation. In addition, we propose a hybrid evaluation framework combining automated and human assessments. Finally, we implement safeguards that mitigate post-deployment risks and support responsible integration into IS workflows.


Linear Attention for Efficient Bidirectional Sequence Modeling

Afzal, Arshia, Rocamora, Elias Abad, Candogan, Leyla Naz, Puigdemont, Pol, Tonin, Francesco, Wu, Yongtao, Shoaran, Mahsa, Cevher, Volkan

arXiv.org Artificial Intelligence

Transformers with linear attention enable fast and parallel training. Moreover, they can be formulated as Recurrent Neural Networks (RNNs) for efficient linear-time inference. While extensively evaluated in causal sequence modeling, they have yet to be extended to the bidirectional setting. This work introduces the LION framework, establishing new theoretical foundations for linear transformers in bidirectional sequence modeling. LION constructs a bidirectional RNN equivalent to full Linear Attention. This extends the benefits of linear transformers, namely parallel training and efficient inference, to the bidirectional setting. Using LION, we cast three linear transformers to their bidirectional form: LION-LIT, the bidirectional variant corresponding to (Katharopoulos et al., 2020); LION-D, extending RetNet (Sun et al., 2023); and LION-S, a linear transformer with a stable selective mask inspired by selectivity of SSMs (Dao & Gu, 2024). Replacing the attention block with LION (-LIT, -D, -S) achieves performance on bidirectional tasks that approaches that of Transformers and State-Space Models (SSMs), while delivering significant improvements in training speed. Our implementation is available at http://github.com/LIONS-EPFL/LION.
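The equivalence between full linear attention and a bidirectional recurrence can be checked numerically in a simplified setting. The sketch below is not the LION implementation: it assumes unnormalized attention with no feature map, scaling, or decay/selective mask, so the output is simply $Y = (QK^\top)V$. Under that assumption, a forward scan and a backward scan each cover one triangle of the attention matrix, and the diagonal term $(q_i \cdot k_i)\,v_i$, which both scans include, is subtracted once. All function names are illustrative.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def full_linear_attention(q, k, v):
    """Unmasked, unnormalized linear attention: Y = (Q K^T) V."""
    k_t = [list(col) for col in zip(*k)]
    return matmul(matmul(q, k_t), v)


def directional_scan(q, k, v, order):
    """Linear RNN over positions in `order`: the state accumulates the outer
    products k_j v_j^T, and the output at position i is q_i times the state."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = [None] * len(q)
    for i in order:
        for a in range(d_k):
            for b in range(d_v):
                state[a][b] += k[i][a] * v[i][b]
        out[i] = [sum(q[i][a] * state[a][b] for a in range(d_k)) for b in range(d_v)]
    return out


def bidirectional_rnn_attention(q, k, v):
    """Forward scan + backward scan, minus the diagonal counted by both."""
    n = len(q)
    fwd = directional_scan(q, k, v, range(n))
    bwd = directional_scan(q, k, v, range(n - 1, -1, -1))
    out = []
    for i in range(n):
        diag = sum(a * b for a, b in zip(q[i], k[i]))
        out.append([f + b - diag * vi for f, b, vi in zip(fwd[i], bwd[i], v[i])])
    return out


# Random toy inputs: sequence length 6, key dimension 4, value dimension 3.
random.seed(0)
n, d_k, d_v = 6, 4, 3
Q = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(n)]
K = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(n)]
V = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(n)]

dense = full_linear_attention(Q, K, V)
recurrent = bidirectional_rnn_attention(Q, K, V)
max_diff = max(abs(x - y) for r1, r2 in zip(dense, recurrent) for x, y in zip(r1, r2))
```

The dense form costs time quadratic in sequence length, while each scan keeps only a `d_k × d_v` state and runs in linear time, which is the trade-off the abstract refers to.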


New Insight in Cervical Cancer Diagnosis Using Convolution Neural Network Architecture

Khozaimi, Ach., Mahmudy, Wayan Firdaus

arXiv.org Artificial Intelligence

The Pap smear is a screening method for early cervical cancer diagnosis. The selection of the right optimizer in the convolutional neural network (CNN) model is key to the success of the CNN in image classification, including the classification of cervical cancer Pap smear images. In this study, stochastic gradient descent (SGD), RMSprop, Adam, AdaGrad, AdaDelta, Adamax, and Nadam optimizers were used to classify cervical cancer Pap smear images from the SipakMed dataset. Resnet-18, Resnet-34, and VGG-16 are the CNN architectures used in this study, and each architecture uses a transfer-learning model. Based on the test results, we conclude that the transfer learning model performs better on all CNNs and optimization techniques and that in the transfer learning model, the optimization has little influence on the training of the model. Adamax, with accuracy values of 72.8% and 66.8%, had the best accuracy for the VGG-16 and Resnet-18 architectures, respectively. On Resnet-34, Adamax reached 54.0%, which is 0.034% lower than Nadam. Overall, Adamax is a suitable optimizer for CNN in cervical cancer classification on Resnet-18, Resnet-34, and VGG-16 architectures. This study provides new insights into the configuration of CNN models for Pap smear image analysis.


Prime Minister of European Country Names AI as Advisor

#artificialintelligence

Both the rapidly increasing ubiquity and the power of AI can feel terrifying, but at least AIs aren't running the world yet, right? The country's prime minister Nicolae Ciuca has just announced an AI assistant called "Ion" as the government's "new honorary advisor." "Now, my role is to represent you. Like a mirror," Ion said while introducing itself at a press conference, as quoted by The Washington Post. PM @NicolaeCiuca: As of today, the Government of Romania has the first government adviser running on AI, a good example of emerging technologies in public service.


Romania's prime minister has hired the world's first AI government adviser. What will it do?

#artificialintelligence

In a world first, Romania's prime minister unveiled a new honorary government adviser that will be joining his team – run entirely on artificial intelligence (AI). The AI is called Ion and consists of a mirror-like surface that displays text as well as at times a male or female face that responds in a calm voice. "Hi, you gave me life and my role is now to represent you, like a mirror," Ion's voice said at the launch. "What should I know about Romania?" The AI-powered adviser was developed by researchers to quickly analyse the opinions of Romanian citizens on key issues and policies.