selfie
STAR-VAE: Latent Variable Transformers for Scalable and Controllable Molecular Generation
Kwon, Bum Chul, Shapira, Ben, Raboh, Moshiko, Sethi, Shreyans, Murarka, Shruti, Morrone, Joseph A, Hu, Jianying, Suryanarayanan, Parthasarathy
The chemical space of drug-like molecules is vast, motivating the development of generative models that must learn broad chemical distributions, enable conditional generation by capturing structure-property representations, and provide fast molecular generation. Meeting these objectives depends on modeling choices, including the probabilistic modeling approach, the conditional generative formulation, the architecture, and the molecular input representation. To address these challenges, we present STAR-VAE (SELFIES-encoded, Transformer-based, AutoRegressive Variational Autoencoder), a scalable latent-variable framework with a Transformer encoder and an autoregressive Transformer decoder. It is trained on 79 million drug-like molecules from PubChem, using SELFIES to guarantee syntactic validity. The latent-variable formulation enables conditional generation: a property predictor supplies a conditioning signal that is applied consistently to the latent prior, the inference network, and the decoder. Our contributions are: (i) a Transformer-based latent-variable encoder-decoder model trained on SELFIES representations; (ii) a principled conditional latent-variable formulation for property-guided generation; and (iii) efficient finetuning with low-rank adapters (LoRA) in both encoder and decoder, enabling fast adaptation with limited property and activity data. On the GuacaMol and MOSES benchmarks, our approach matches or exceeds baselines, and latent-space analyses reveal smooth, semantically structured representations that support both unconditional exploration and property-aware generation. On the Tartarus benchmarks, the conditional model shifts docking-score distributions toward stronger predicted binding. These results suggest that a modernized, scale-appropriate VAE remains competitive for molecular generation when paired with principled conditioning and parameter-efficient finetuning.
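The conditional formulation described above can be pictured with a minimal sketch, not the authors' implementation: a shared property embedding conditions the prior p(z|c), the inference network q(z|x,c), and the decoder. Module names, layer sizes, and the single scalar property are illustrative assumptions.

```python
# Minimal sketch (not the STAR-VAE code) of a conditional latent-variable model in
# which the same conditioning vector enters the prior, the posterior, and the decoder.
import torch
import torch.nn as nn

class ConditionalLatentModel(nn.Module):
    def __init__(self, vocab_size, d_model=256, d_latent=64, d_prop=16):
        super().__init__()
        self.prop_embed = nn.Linear(1, d_prop)             # property value -> conditioning vector c
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=4)
        self.posterior_head = nn.Linear(d_model + d_prop, 2 * d_latent)  # q(z | x, c)
        self.prior_head = nn.Linear(d_prop, 2 * d_latent)                # p(z | c)
        self.latent_to_memory = nn.Linear(d_latent + d_prop, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=4)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, prop):
        c = self.prop_embed(prop)                                        # conditioning signal
        h = self.encoder(self.token_embed(tokens)).mean(dim=1)           # pooled sequence encoding
        mu_q, logvar_q = self.posterior_head(torch.cat([h, c], -1)).chunk(2, -1)
        mu_p, logvar_p = self.prior_head(c).chunk(2, -1)
        z = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()       # reparameterization
        memory = self.latent_to_memory(torch.cat([z, c], -1)).unsqueeze(1)
        # Teacher-forced decoding; a causal mask would be added for true autoregressive training.
        logits = self.lm_head(self.decoder(self.token_embed(tokens), memory))
        # KL divergence between conditional posterior q(z|x,c) and conditional prior p(z|c)
        kl = 0.5 * (logvar_p - logvar_q
                    + ((logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()) - 1).sum(-1)
        return logits, kl
```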
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
When Face Recognition Doesn't Know Your Face Is a Face
An estimated 100 million people live with facial differences. As face recognition tech becomes widespread, some say they're getting blocked from accessing essential systems and services. Autumn Gardiner thought updating her driving license would be straightforward. After getting married last year, she headed to the local Department of Motor Vehicles office in Connecticut to get her name changed on her license. While she was there, Gardiner recalls, officials said she needed to update her photo.
- North America > United States > Connecticut (0.25)
- Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.05)
- North America > United States > Oregon (0.04)
- (7 more...)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Transportation > Ground > Road (0.68)
Graph Diffusion Transformers are In-Context Molecular Designers
Liu, Gang, Chen, Jie, Zhu, Yihan, Sun, Michael, Luo, Tengfei, Chawla, Nitesh V, Jiang, Meng
In-context learning allows large models to adapt to new tasks from a few demonstrations, but it has shown limited success in molecular design. Existing databases such as ChEMBL contain molecular properties spanning millions of biological assays, yet labeled data for each property remain scarce. To address this limitation, we introduce demonstration-conditioned diffusion models (DemoDiff), which define task contexts using a small set of molecule-score examples instead of text descriptions. These demonstrations guide a denoising Transformer to generate molecules aligned with target properties. For scalable pretraining, we develop a new molecular tokenizer with Node Pair Encoding that represents molecules at the motif level, requiring 5.5× fewer nodes. We curate a dataset containing millions of context tasks from multiple sources covering both drugs and materials, and pretrain a 0.7-billion-parameter model on it. Across 33 design tasks in six categories, DemoDiff matches or surpasses language models 100-1000× larger and achieves an average rank of 3.63 compared to 5.25-10.20 for domain-specific approaches. These results position DemoDiff as a molecular foundation model for in-context molecular design. Our code is available at https://github.com/liugangcode/DemoDiff.
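The Node Pair Encoding tokenizer can be pictured by analogy with byte-pair encoding. The sketch below runs the standard frequency-based pair-merge loop over linear token sequences; it is only an illustration of the idea, not the DemoDiff tokenizer, which merges adjacent node pairs in molecular graphs into motif tokens.

```python
# Illustrative BPE-style merge loop over token sequences (not the DemoDiff implementation).
from collections import Counter

def most_frequent_pair(sequences):
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None

def merge_pair(seq, pair, new_token):
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)              # replace the pair with a merged "motif" token
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def learn_merges(sequences, num_merges):
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        new_token = pair[0] + pair[1]
        sequences = [merge_pair(s, pair, new_token) for s in sequences]
        merges.append((pair, new_token))
    return merges, sequences
```

For example, `learn_merges([["C", "C", "O"], ["C", "C", "N"]], num_merges=1)` merges the frequent ("C", "C") pair into a single "CC" token, shortening both sequences.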
How to Make Large Language Models Generate 100% Valid Molecules?
Tao, Wen, Tang, Jing, Chan, Alvin, Hooi, Bryan, Bi, Baolong, Peng, Nanyun, Liu, Yuansheng, Wang, Yiwei
Molecule generation is key to drug discovery and materials science, enabling the design of novel compounds with specific properties. Large language models (LLMs) can learn to perform a wide range of tasks from just a few examples. However, generating valid molecules using representations like SMILES is challenging for LLMs in few-shot settings. In this work, we explore how LLMs can generate 100% valid molecules. We evaluate whether LLMs can use SELFIES, a representation where every string corresponds to a valid molecule, for valid molecule generation but find that LLMs perform worse with SELFIES than with SMILES. We then examine LLMs' ability to correct invalid SMILES and find their capacity limited. Finally, we introduce SmiSelf, a cross-chemical language framework for invalid SMILES correction. SmiSelf converts invalid SMILES to SELFIES using grammatical rules, leveraging SELFIES' mechanisms to correct the invalid SMILES. Experiments show that SmiSelf ensures 100% validity while preserving molecular characteristics and maintaining or even enhancing performance on other metrics. SmiSelf helps expand LLMs' practical applications in biomedicine and is compatible with all SMILES-based generative models. Code is available at https://github.com/wentao228/SmiSelf.
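The mechanism SmiSelf builds on is that every SELFIES string decodes to a syntactically valid molecule. Below is a minimal illustration of that property using the open-source `selfies` and RDKit packages (an assumption about tooling, not the SmiSelf code, which applies grammatical rules to repair invalid SMILES).

```python
# Minimal illustration (not the SmiSelf implementation): any SELFIES token sequence
# decodes to a valid molecule, so routing a string through SELFIES yields parseable SMILES.
import selfies as sf
from rdkit import Chem

smiles = "CC(=O)Oc1ccccc1C(=O)O"                     # aspirin
tokens = list(sf.split_selfies(sf.encoder(smiles)))

# Even after corrupting the token sequence (truncation here), decoding still
# produces a valid molecule -- the SELFIES grammar guarantees validity by construction.
corrupted = "".join(tokens[: len(tokens) // 2])
decoded = sf.decoder(corrupted)
assert Chem.MolFromSmiles(decoded) is not None
print(decoded)
```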
- Asia > China > Guangdong Province > Guangzhou (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Asia > Singapore (0.04)
- Research Report (1.00)
- Overview (0.93)
- Materials > Chemicals > Commodity Chemicals (0.46)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.34)
The biggest dating app photo turn-offs (and no, it's not holding a fish)
Choosing what pictures to include in your online dating profile is a big deal. Most people want to present a decent mix of flattering, fun and relaxed photos that showcase the best of you. But there are some in particular that should be avoided at all costs, experts say. A team from dating app Wisp asked 1,200 people for their biggest photo red flags that make them swipe left. The survey revealed 83 per cent of singles judge profiles on photos before reading a single word of the personal bio.
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Switzerland > Fribourg > Fribourg (0.04)
- Europe > Spain (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur (0.04)
- Research Report > Experimental Study (0.46)
- Research Report > New Finding (0.46)
- Information Technology > Services (0.68)
- Law (0.68)
- Government (0.68)
- Health & Medicine > Therapeutic Area (0.46)
MolReasoner: Toward Effective and Interpretable Reasoning for Molecular LLMs
Zhao, Guojiang, Li, Sihang, Lu, Zixiang, Cheng, Zheng, Lin, Haitao, Wu, Lirong, Xia, Hanchen, Cai, Hengxing, Guo, Wentao, Wang, Hongshuai, Xu, Mingjun, Zhu, Siyu, Ke, Guolin, Zhang, Linfeng, Gao, Zhifeng
Large Language Models (LLMs) have demonstrated remarkable performance across various domains, yet their capabilities in molecular reasoning remain insufficiently explored. Current approaches tend to rely heavily on general-purpose prompting, which lacks domain-specific molecular semantics, while those that use fine-tuning strategies often face challenges with interpretability and reasoning depth. To address these issues, we introduce MolReasoner, a two-stage framework designed to transition LLMs from memorization towards chemical reasoning. First, we propose Mol-SFT, which initializes the model's reasoning abilities via synthetic Chain-of-Thought (CoT) samples generated by GPT-4o and verified for chemical accuracy. Subsequently, Mol-RL applies reinforcement learning with specialized reward functions designed explicitly to align chemical structures with linguistic descriptions, thereby enhancing molecular reasoning capabilities. Our approach notably enhances interpretability, improving the model's molecular understanding and enabling better generalization. Extensive experiments demonstrate that MolReasoner outperforms existing methods, marking a significant shift from memorization-based outputs to robust chemical reasoning. Our code is available at https://github.
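The abstract does not spell out the reward functions. As one hedged sketch of a plausible structure-grounded term (a hypothetical reward, not the MolReasoner design), a generated SMILES can be scored against a reference by canonical-SMILES match with a fingerprint-similarity fallback using RDKit.

```python
# Hypothetical structure-grounded reward term (an assumption, not the MolReasoner reward):
# exact canonical match scores 1.0, invalid output scores 0.0, otherwise partial credit
# via Morgan-fingerprint Tanimoto similarity.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def structure_reward(generated_smiles: str, reference_smiles: str) -> float:
    gen = Chem.MolFromSmiles(generated_smiles)
    ref = Chem.MolFromSmiles(reference_smiles)
    if gen is None or ref is None:
        return 0.0                                   # invalid molecule earns no reward
    if Chem.MolToSmiles(gen) == Chem.MolToSmiles(ref):
        return 1.0                                   # exact structural match
    fp_gen = AllChem.GetMorganFingerprintAsBitVect(gen, 2, nBits=2048)
    fp_ref = AllChem.GetMorganFingerprintAsBitVect(ref, 2, nBits=2048)
    return DataStructs.TanimotoSimilarity(fp_gen, fp_ref)   # partial credit in [0, 1]
```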
- Materials > Chemicals (1.00)
- Health & Medicine (0.69)
Roblox's New Age Verification Feature Uses AI to Scan Teens' Video Selfies
Roblox is rolling out new features aimed at making the platform safer for minors, including a revamped friend system, privacy tools, and an age verification service that users complete by recording a video selfie. In Roblox's old friend system, players had no way to distinguish between people they know casually or online and someone they consider a close friend. The platform's new tiered system introduces Connections and Trusted Connections specifically for people that players know and trust. To access Trusted Connections and its benefits, users first need to complete an age verification, which requires them to submit a video selfie. Once they've submitted their video, the company says it's run against an AI-driven "diverse dataset" to produce an age estimate.
- Law > Litigation (0.78)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.52)
Evaluating Effects of Augmented SELFIES for Molecular Understanding Using QK-LSTM
Beaudoin, Collin, Ghosh, Swaroop
Identifying molecular properties, including side effects, is a critical yet time-consuming step in drug development. Failing to detect these side effects before regulatory submission can result in significant financial losses and production delays, and overlooking them during the regulatory review can lead to catastrophic consequences. This challenge presents an opportunity for innovative machine learning approaches, particularly hybrid quantum-classical models like the Quantum Kernel-Based Long Short-Term Memory (QK-LSTM) network. The QK-LSTM integrates quantum kernel functions into the classical LSTM framework, enabling the capture of complex, non-linear patterns in sequential data. By mapping input data into a high-dimensional quantum feature space, the QK-LSTM model reduces the need for large parameter sets, allowing for model compression without sacrificing accuracy in sequence-based tasks. Recent advancements have been made in the classical domain using augmented variations of the Simplified Molecular-Input Line-Entry System (SMILES). However, to the best of our knowledge, no research has explored the impact of augmented SMILES in the quantum domain, nor the role of augmented Self-Referencing Embedded Strings (SELFIES) in either classical or hybrid quantum-classical settings. This study presents the first analysis of these approaches, providing novel insights into their potential for enhancing molecular property prediction and side effect identification. Results reveal that augmenting SELFIES yields statistically significant improvements over SMILES: 5.97% in the classical domain and 5.91% in the hybrid quantum-classical domain.
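The augmentation discussed above amounts to enumerating alternative string renderings of the same molecule. A minimal sketch, assuming RDKit's randomized SMILES output and the `selfies` encoder (not the authors' pipeline; the augmentation count is illustrative):

```python
# Minimal sketch (not the paper's pipeline) of SMILES/SELFIES augmentation:
# enumerate randomized SMILES of the same molecule with RDKit, then convert each to SELFIES.
import selfies as sf
from rdkit import Chem

def augment(smiles: str, n: int = 5):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return []
    variants = {Chem.MolToSmiles(mol, doRandom=True, canonical=False) for _ in range(n)}
    return [(smi, sf.encoder(smi)) for smi in sorted(variants)]

for smi, sel in augment("CC(=O)Oc1ccccc1C(=O)O"):   # aspirin as an example input
    print(smi, "->", sel)
```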
- Research Report (0.50)
- Overview (0.46)