 smile



GraphT5: Unified Molecular Graph-Language Modeling via Multi-Modal Cross-Token Attention

Kim, Sangyeup; Kim, Nayeon; Piao, Yinhua; Kim, Sun

arXiv.org Artificial Intelligence

Molecular language modeling tasks such as molecule captioning have been recognized for their potential to deepen our understanding of molecular properties, which can aid drug discovery or material synthesis based on chemical reactions. Molecular graphs are commonly used to predict molecular properties, but most methods in molecular language modeling rely heavily on SMILES sequences. This preference arises because the task involves generating a sequence of multiple tokens using transformer-based models. A main challenge is therefore determining how to integrate graph data, which contains structural and spatial information about molecules, with text data. In addition, simply using both 1D SMILES text and 2D graphs as inputs, without addressing how the two modalities align and represent the molecular structure, makes it difficult to fully utilize structural knowledge about molecules. To this end, we propose GraphT5, a multi-modal framework that integrates 1D SMILES text and 2D graph representations of molecules for molecular language modeling. Specifically, we introduce a novel cross-token attention module in GraphT5 to bridge the gap arising from the fundamental differences between the two modalities of molecule representations. Cross-token attention exploits implicit information shared between the SMILES and graph representations of a molecule. This information emerges from fine-grained, token-level interactions between the two and benefits molecular language modeling. Extensive experiments, including molecule captioning, IUPAC name prediction, and case studies, show that GraphT5 outperforms the latest baseline approaches. These results validate its effectiveness in utilizing both 1D SMILES text and 2D graph representations.
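The core idea of letting one modality attend to another can be illustrated with generic scaled dot-product cross-attention. This is an assumption-laden sketch, not GraphT5's actual module. Each SMILES token embedding acts as a query over the graph node embeddings, which serve as keys and values. Learned projection matrices, multiple heads, and whatever fusion GraphT5 applies afterward are all omitted. The function name `cross_token_attention` and the toy embeddings are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_token_attention(smiles_tokens: Matrix, graph_nodes: Matrix) -> Matrix:
    """Generic scaled dot-product cross-attention (hypothetical, NOT GraphT5's code).

    Queries are SMILES token embeddings; keys and values are graph node
    embeddings. Returns one graph-informed vector per SMILES token, namely a
    softmax-weighted average of the node embeddings.
    """
    d = len(smiles_tokens[0])
    scores = matmul(smiles_tokens, transpose(graph_nodes))  # (tokens x nodes)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, graph_nodes)  # (tokens x dim)


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy SMILES token embeddings
    nodes = [[2.0, 0.0], [0.0, 2.0]]               # toy graph node embeddings
    for row in cross_token_attention(tokens, nodes):
        print([round(x, 3) for x in row])
```

In a real model, the token and node embeddings would come from a SMILES encoder and a graph neural network. The queries, keys, and values would each pass through learned linear projections before this step.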


Representation of Molecules via Algebraic Data Types: Advancing Beyond SMILES & SELFIES

Goldstein, Oliver; March, Samuel

arXiv.org Artificial Intelligence

We introduce a novel molecular representation through Algebraic Data Types (ADTs): composite data structures formed by combining simpler types that obey algebraic laws. By explicitly considering how the datatype of a representation constrains the operations that may be performed on it, we ensure that meaningful inference can be performed over generative models (programs with sample and score operations). This stands in contrast to string-based representations, where string-type operations may only indirectly correspond to chemical and physical molecular properties and, at worst, produce nonsensical output. The ADT presented implements the Dietz representation for molecular constitution via multigraphs and bonding systems, and uses atomic coordinate data to represent 3D information and stereochemical features. This creates a general digital molecular representation that surpasses the limitations of string-based representations and of the 2D-graph-based models on which they are based. In addition, we present novel support for quantum information through the representation of shells, subshells, and orbitals. This greatly expands the representational scope beyond current approaches, for instance in Molecular Orbital theory. The framework's capabilities are demonstrated through key applications:

- Bayesian probabilistic programming, through integration with LazyPPL, a lazy probabilistic programming library.
- Molecules as instances of a group under rotation, which geometric learning techniques need in order to exploit the invariance of molecular properties under different representations.
- An extension that models chemical reactions, demonstrating the framework's flexibility.

After critiquing previous representations, we provide an open-source solution in Haskell, a type-safe, purely functional programming language.
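The following Python sketch makes the idea of an algebraic data type for molecules concrete. It uses frozen dataclasses as product types and an `Enum` as a sum type, then defines a rotation that acts on atomic coordinates. It is a simplified stand-in, not the authors' Haskell ADT. Bonds here are plain atom pairs with an electron count rather than Dietz bonding systems, and there is no orbital information. All names (`Element`, `Atom`, `Bond`, `Molecule`, `rotate_z`, `distance`) are hypothetical.

```python
import math
from dataclasses import dataclass, replace
from enum import Enum
from typing import Tuple

Vec3 = Tuple[float, float, float]


class Element(Enum):
    """Sum type: an atom's element is exactly one of these (truncated list)."""
    H = 1
    C = 6
    O = 8


@dataclass(frozen=True)
class Atom:
    """Product type: an element paired with 3D coordinates."""
    element: Element
    position: Vec3


@dataclass(frozen=True)
class Bond:
    """Simplified pairwise bond between atom indices (NOT a Dietz bonding system)."""
    i: int
    j: int
    electrons: int  # shared electrons, e.g. 2 for a single bond


@dataclass(frozen=True)
class Molecule:
    """Product type: constitution (atoms and bonds) plus coordinates via the atoms."""
    atoms: Tuple[Atom, ...]
    bonds: Tuple[Bond, ...]


def rotate_z(mol: Molecule, theta: float) -> Molecule:
    """Act on a molecule with a rotation by `theta` radians about the z-axis.

    Returns a new immutable Molecule; the bond structure is untouched.
    """
    c, s = math.cos(theta), math.sin(theta)

    def rot(p: Vec3) -> Vec3:
        x, y, z = p
        return (c * x - s * y, s * x + c * y, z)

    return replace(mol, atoms=tuple(replace(a, position=rot(a.position)) for a in mol.atoms))


def distance(mol: Molecule, i: int, j: int) -> float:
    """Euclidean distance between atoms i and j (invariant under rotation)."""
    return math.dist(mol.atoms[i].position, mol.atoms[j].position)


if __name__ == "__main__":
    water = Molecule(
        atoms=(
            Atom(Element.O, (0.0, 0.0, 0.0)),
            Atom(Element.H, (0.96, 0.0, 0.0)),
            Atom(Element.H, (-0.24, 0.93, 0.0)),
        ),
        bonds=(Bond(0, 1, 2), Bond(0, 2, 2)),
    )
    rotated = rotate_z(water, 1.0)
    print(distance(water, 1, 2), distance(rotated, 1, 2))
```

Because every type is immutable, `rotate_z` returns a new molecule instead of mutating the old one. Composing two rotations gives the same result as rotating by the summed angle, which is the group law. Interatomic distances stay invariant under the rotation, which is the property geometric learning methods rely on.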


GitHub - haifengl/smile: Statistical Machine Intelligence & Learning Engine

#artificialintelligence

Smile (Statistical Machine Intelligence and Learning Engine) is a fast and comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system in Java and Scala. With advanced data structures and algorithms, Smile delivers state-of-the-art performance. Smile is well documented; see the project website for programming guides and more information. Smile covers every aspect of machine learning, including classification, regression, clustering, association rule mining, feature selection, manifold learning, multidimensional scaling, genetic algorithms, missing value imputation, and efficient nearest neighbor search. Its feature selection methods include Genetic Algorithm based Feature Selection, Ensemble Learning based Feature Selection, TreeSHAP, Signal Noise ratio, and Sum Squares ratio. You can use the libraries through the Maven Central repository by adding the following to your project's pom.xml file.


What Makes the Smiles in Smile So Freaking Creepy

Slate

How can something understood as the universal symbol for joy so easily become the makings of our worst nightmares? Unhappy, unsettling smiles--like those in Todd Phillips' Joker or the truly chilling masks donned by Ethan Hawke in last year's The Black Phone--appear here to stay as fixtures of the horror genre. Paramount's new flick, directed by Parker Finn, makes the concept its very premise, with the movie following a psychiatrist plagued by smiles everywhere she turns. Baseball fans got a taste of her strife thanks to a stunt marketing campaign for Smile featuring actors smiling creepily behind the dugout. Whereas the 2018 horror film Truth or Dare used CGI to stretch the smiles of the actors like a Snapchat filter befitting the uncanny valley, the smiles in Smile for the most part do not appear digitally modified, by my estimate.


Google Summer of Code 2022

#artificialintelligence

These SMILES can be analysed using the RDKit library to get information about the atoms and bonds in the molecules. A molecular fingerprint is a vectorized representation of a molecule that captures precise details of its atomic configuration. During featurization, a molecule is decomposed into substructures (e.g., fragments). These are encoded as a fixed-length binary fingerprint, an array in which each element is either 1 or 0. For this project, I implemented atom-level, bond-level, and molecule-level (global) featurization in DeepChem, tailored to the requirements of the D-MPNN model. The D-MPNN paper [1] suggested 133 features for each atom and 14 features for each bond in a molecule. The individual features are extracted from SMILES using the RDKit library and one-hot encoded to obtain a vectorized representation.
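The one-hot step can be sketched without RDKit by passing already-extracted atom properties as plain values. The sketch follows a common pattern of reserving an extra "unknown" slot for values outside the allowed list, so every vector has a fixed length. The helper names and the short choice lists below are hypothetical. They are much smaller than the 133 atom features the D-MPNN paper describes, and in DeepChem the inputs would come from RDKit atom objects instead.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def one_hot_with_unknown(value: T, choices: Sequence[T]) -> List[int]:
    """One-hot encode `value` over `choices`, plus a final 'unknown' slot.

    Values not in `choices` set only the last bit, so the vector length is
    always len(choices) + 1 regardless of the input.
    """
    encoding = [0] * (len(choices) + 1)
    index = list(choices).index(value) if value in choices else len(choices)
    encoding[index] = 1
    return encoding


# Hypothetical, heavily truncated choice lists for illustration only.
ATOMIC_NUMS = [1, 6, 7, 8, 9]   # H, C, N, O, F
DEGREES = [0, 1, 2, 3, 4]
FORMAL_CHARGES = [-1, 0, 1]


def atom_features(atomic_num: int, degree: int, formal_charge: int, is_aromatic: bool) -> List[int]:
    """Concatenate one-hot blocks and a boolean flag into one atom feature vector."""
    return (
        one_hot_with_unknown(atomic_num, ATOMIC_NUMS)
        + one_hot_with_unknown(degree, DEGREES)
        + one_hot_with_unknown(formal_charge, FORMAL_CHARGES)
        + [int(is_aromatic)]
    )


if __name__ == "__main__":
    # An aromatic carbon with three neighbours and no formal charge.
    print(atom_features(6, 3, 0, True))
```

Bond features would be built the same way, by concatenating one-hot blocks for properties such as bond type, conjugation, and ring membership.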



10 Uses of Facial Recognition Technology - Disruption Hub

#artificialintelligence

For most of us at the moment, our only contact with FaceTech is likely to be at electronic passport gates or through novelty filters on Snapchat. However, like it or not, our faces seem to be becoming an increasingly important tool for accessing possessions and information. They are also enabling different sectors to learn more about consumer markets. Given the wide-ranging implications for ethics and business, how might industries, not to mention everyday people, use FaceTech to their advantage in future? Today, essentially all phones, laptops and tablets are fitted with cameras. Say, for instance, your phone was stolen.