Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling
Egli, Eric, Manica, Matteo, Born, Jannis
Bytes form the basis of the digital world and thus are a promising building block for multimodal foundation models. Recently, Byte Language Models (BLMs) have emerged to overcome tokenization, yet the excessive length of bytestreams requires new architectural paradigms. We therefore present the Multiscale Byte Language Model (MBLM), a model-agnostic hierarchical decoder stack that enables training with context windows of $5$M bytes on a single GPU in full model precision. We thoroughly examine MBLM's performance with Transformer and Mamba blocks on both unimodal and multimodal tasks. Our experiments demonstrate that hybrid architectures are efficient in handling extremely long byte sequences during training while achieving near-linear generation efficiency. To the best of our knowledge, we present the first evaluation of BLMs on visual Q\&A tasks and find that, despite operating on serialized images and lacking an encoder, an MBLM trained with pure next-token prediction can match custom CNN-LSTM architectures with designated classification heads. We show that MBLMs exhibit strong adaptability in integrating diverse data representations, including pixel and image filestream bytes, underlining their potential toward omnimodal foundation models. Source code is publicly available at: https://github.com/ai4sd/multiscale-byte-lm
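To make the hierarchy concrete, here is a minimal PyTorch sketch of a two-stage hierarchical byte decoder in the spirit of MBLM (not the released mblm API): a global Transformer contextualizes pooled patch embeddings, and a local Transformer predicts the bytes within each patch. Patch size, mean-pooling, and all dimensions are illustrative assumptions.

import torch
import torch.nn as nn

def causal_mask(n, device):
    # Upper-triangular -inf mask so position i only attends to positions <= i.
    return torch.triu(torch.full((n, n), float("-inf"), device=device), diagonal=1)

class TwoStageByteDecoder(nn.Module):
    def __init__(self, patch_size=8, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.patch_size = patch_size
        self.byte_embed = nn.Embedding(256, d_model)  # one embedding per byte value
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.global_stage = nn.TransformerEncoder(layer, n_layers)  # across patches
        self.local_stage = nn.TransformerEncoder(layer, n_layers)   # within a patch
        self.head = nn.Linear(d_model, 256)

    def forward(self, byte_ids):  # (B, S) byte ids, S divisible by patch_size
        B, S = byte_ids.shape
        P, N = self.patch_size, S // self.patch_size
        x = self.byte_embed(byte_ids)                      # (B, S, D)
        patches = x.view(B, N, P, -1).mean(dim=2)          # pooled patch embeddings
        g = self.global_stage(patches, mask=causal_mask(N, x.device))
        # Shift right: bytes in patch i may only see global context up to patch i-1.
        g = torch.cat([torch.zeros_like(g[:, :1]), g[:, :-1]], dim=1)
        local_in = x.view(B * N, P, -1) + g.reshape(B * N, 1, -1)
        h = self.local_stage(local_in, mask=causal_mask(P, x.device))
        return self.head(h).view(B, S, 256)                # next-byte logits

logits = TwoStageByteDecoder()(torch.randint(0, 256, (2, 64)))  # toy 64-byte streams

Because the expensive local stage attends only within a patch, the attention cost grows with patch size rather than with the full stream length, which is what makes million-byte context windows tractable.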
Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models
Zausinger, Jonas, Pennig, Lars, Chlodny, Kacper, Limbach, Vincent, Ketteler, Anna, Prein, Thorben, Singh, Vishwa Mohan, Danziger, Michael Morris, Born, Jannis
While language models excel at text generation, they lack a natural inductive bias for emitting numbers and thus struggle in tasks involving reasoning over quantities, especially arithmetic. This is particularly relevant for scientific datasets, where combinations of text and numerical data are abundant. One fundamental limitation is the nature of the cross-entropy (CE) loss, which assumes a nominal (categorical) scale and thus cannot convey proximity between generated number tokens. As a remedy, we present two versions of a number token loss. The first is based on an $L_p$ loss between the ground-truth token value and the expected token value under the predicted distribution, i.e., the sum of numeric token values weighted by their predicted class probabilities. The second minimizes the Wasserstein-1 distance between the distribution of the predicted output probabilities and the ground-truth distribution. These regression-like losses can easily be added to any language model and extend the CE objective during training. We compare the proposed schemes on a mathematics dataset against existing tokenization, encoding, and decoding schemes for improving number representation in language models. Our results reveal a significant improvement in numerical accuracy when equipping a standard T5 model with the proposed loss schemes.
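A minimal sketch of the first ($L_p$-style) variant, assuming each number token $i$ carries a numeric value $v_i$; the function name, the $L_1$ default, and the mixing weight are illustrative, not the paper's exact implementation:

import torch
import torch.nn.functional as F

def number_token_loss(logits, target_values, token_values, p=1):
    # logits:        (B, V) raw model outputs over the vocabulary
    # target_values: (B,)   numeric value of the ground-truth number token
    # token_values:  (V,)   numeric value assigned to each vocabulary token
    #                       (non-number tokens would be masked out in practice)
    probs = F.softmax(logits, dim=-1)
    expected = probs @ token_values          # probability-weighted token value
    return (expected - target_values).abs().pow(p).mean()

# Used as an auxiliary term on top of cross-entropy, e.g.:
# loss = F.cross_entropy(logits, targets) + lam * number_token_loss(...)

Unlike CE, this term rewards the model for placing probability mass on numerically close tokens (predicting "4" when the target is "5" costs less than predicting "9").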
Quantum Theory and Application of Contextual Optimal Transport
Mariella, Nicola, Akhriev, Albert, Tacchino, Francesco, Zoufal, Christa, Gonzalez-Espitia, Juan Carlos, Harsanyi, Benedek, Koskin, Eugene, Tavernelli, Ivano, Woerner, Stefan, Rapsomaniki, Marianna, Zhuk, Sergiy, Born, Jannis
Optimal Transport (OT) has fueled machine learning (ML) across many domains. When paired data measurements $(\boldsymbol{\mu}, \boldsymbol{\nu})$ are coupled to covariates, a challenging conditional distribution learning setting arises. Existing approaches for learning a $\textit{global}$ transport map parameterized through a potentially unseen context utilize Neural OT and largely rely on Brenier's theorem. Here, we propose a first-of-its-kind quantum computing formulation for amortized optimization of contextualized transportation plans. We exploit a direct link between doubly stochastic matrices and unitary operators, thereby uncovering a natural connection between OT and quantum computation. We verify our method (QontOT) on synthetic and real data by predicting variations in cell type distributions conditioned on drug dosage. Importantly, we conduct a 24-qubit hardware experiment on a task that is challenging for classical computers and report performance that our classical neural OT approach cannot match. In sum, this is a first step toward learning to predict contextualized transportation plans through quantum computing.
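One concrete instance of the link between unitaries and doubly stochastic matrices: the elementwise squared moduli of any unitary matrix form a doubly stochastic (so-called unistochastic) matrix, so a parameterized unitary induces a valid transportation plan. A toy NumPy check of this property (the random construction below is illustrative, not the paper's circuit):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
U, _ = np.linalg.qr(A)                   # Q factor of a random complex matrix is unitary
T = np.abs(U) ** 2                       # elementwise squared moduli

print(np.allclose(T.sum(axis=0), 1.0))   # columns sum to 1 (orthonormal columns of U)
print(np.allclose(T.sum(axis=1), 1.0))   # rows sum to 1 (orthonormal rows of U)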
Language models in molecular discovery
Janakarajan, Nikita, Erdmann, Tim, Swaminathan, Sarath, Laino, Teodoro, Born, Jannis
The success of language models, especially transformer-based architectures, has trickled into other domains, giving rise to "scientific language models" that operate on small molecules, proteins, or polymers. In chemistry, language models contribute to accelerating the molecule discovery cycle, as evidenced by promising recent findings in early-stage drug discovery. Here, we review the role of language models in molecular discovery, underlining their strengths in de novo drug design, property prediction, and reaction chemistry. We highlight valuable open-source software assets, thus lowering the entry barrier to the field of scientific language modeling. Lastly, we sketch a vision for future molecular design that combines a chatbot interface with access to computational chemistry tools. Our contribution serves as a valuable resource for researchers, chemists, and AI enthusiasts interested in understanding how language models can and will be used to accelerate chemical discovery.
Unifying Molecular and Textual Representations via Multi-task Language Modelling
Christofidellis, Dimitrios, Giannone, Giorgio, Born, Jannis, Winther, Ole, Laino, Teodoro, Manica, Matteo
Recent advances in neural language models have been successfully applied to the field of chemistry, offering generative solutions for classical problems in molecular design and synthesis planning. These new methods have the potential to fuel a new era of data-driven automation in scientific discovery. However, specialized models are still typically required for each task, leading to problem-specific fine-tuning that neglects task interrelations. The main obstacle in this field is the lack of a unified representation bridging natural language and chemical representations, which complicates and limits human-machine interaction. Here, we propose the first multi-domain, multi-task language model that can solve a wide range of tasks in both the chemical and natural language domains. Our model can handle chemical and natural language concurrently, without requiring expensive pre-training on single domains or task-specific models. Interestingly, sharing weights across domains markedly improves our model when benchmarked against state-of-the-art baselines on single-domain and cross-domain tasks. In particular, sharing information across domains and tasks yields large improvements on cross-domain tasks, whose magnitude increases with scale, as measured by more than a dozen relevant metrics. Our work suggests that such models can robustly and efficiently accelerate discovery in the physical sciences by superseding problem-specific fine-tuning and enhancing human-model interactions.
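In practice, a single multi-domain seq2seq model of this kind can be queried with natural-language task instructions in both directions (molecule-to-text and text-to-molecule). A hedged sketch using the Hugging Face transformers API; the checkpoint name and prompt phrasing are placeholders, not necessarily the paper's released artifacts:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "some-org/multitask-text-chem-t5"   # placeholder checkpoint name
tok = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# The same weights serve chemical and natural-language tasks concurrently.
prompt = "Caption the following SMILES: CC(=O)OC1=CC=CC=C1C(=O)O"
out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))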
Accelerating Material Design with the Generative Toolkit for Scientific Discovery
Manica, Matteo, Born, Jannis, Cadow, Joris, Christofidellis, Dimitrios, Dave, Ashish, Clarke, Dean, Teukam, Yves Gaetan Nana, Giannone, Giorgio, Hoffman, Samuel C., Buchan, Matthew, Chenthamarakshan, Vijil, Donovan, Timothy, Hsu, Hsiang Han, Zipoli, Federico, Schilter, Oliver, Kishimoto, Akihiro, Hamada, Lisa, Padhi, Inkit, Wehden, Karl, McHugh, Lauren, Khrabrov, Alexy, Das, Payel, Takeda, Seiji, Smith, John R.
The rapid technological progress of the last centuries has been largely fueled by the success of the scientific method. However, in some of the most important fields, such as material or drug discovery, productivity has been decreasing dramatically (Smietana et al., 2016): today it can take almost a decade to discover a new material, at a cost upwards of $10-$100 million. One of the most daunting challenges in materials discovery is hypothesis generation. The reservoir of natural products and their derivatives has been largely emptied (Atanasov et al., 2021), and bottom-up, human-driven hypothesis generation has proven extremely challenging for identifying and selecting novel and useful candidates in search spaces that are overwhelming in size, e.g., the chemical space of drug-like molecules is estimated to contain $>10^{60}$ structures.
Domain-agnostic and Multi-level Evaluation of Generative Models
Tadesse, Girmaw Abebe, Born, Jannis, Cintas, Celia, Ogallo, William, Zubarev, Dmitry, Manica, Matteo, Weldemariam, Komminist
Machine Learning (ML) methods, particularly generative models, are effective in addressing critical problems across different domains, including the material sciences. Examples include the design of novel molecules by combining data-driven techniques with domain knowledge to efficiently search the space of all plausible molecules and generate new, valid ones [1, 2, 3, 4]. Traditional high-throughput wet-lab experiments, physics-based simulations, and bioinformatics tools for the molecular design process depend heavily on human expertise. These processes require significant resource expenditure to propose, synthesize, and test new molecules, thereby limiting the exploration space [5, 6, 7]. For example, generative models have been applied to facilitate the material discovery process by framing it as an inverse molecular design problem. This approach transforms the conventional, slow discovery process by mapping a desired set of properties to a set of structures. The generative process is then optimized to encourage the generation of molecules with those selected properties. Countless approaches have been suggested for such tasks, most prominently VAEs with different sampling techniques [8, 9, 10], GANs [11, 12], diffusion models [13], flow networks [14], and Transformers [15].
TITAN: T Cell Receptor Specificity Prediction with Bimodal Attention Networks
Weber, Anna, Born, Jannis, Martínez, María Rodríguez
Motivation: The activity of the adaptive immune system is governed by T-cells and their specific T-cell receptors (TCRs), which selectively recognize foreign antigens. Recent advances in experimental techniques have enabled sequencing of TCRs and their antigenic targets (epitopes), making it possible to study the missing link between TCR sequence and epitope binding specificity. Scarcity of data and a large sequence space make this task challenging, and to date only models limited to a small set of epitopes have achieved good performance. Here, we establish a k-nearest-neighbor (K-NN) classifier as a strong baseline and then propose TITAN (Tcr epITope bimodal Attention Networks), a bimodal neural network that explicitly encodes both TCR sequences and epitopes to enable the independent study of generalization capabilities to unseen TCRs and/or epitopes. Results: By encoding epitopes at the atomic level with SMILES sequences, we leverage transfer learning and data augmentation to enrich the input data space and boost performance. TITAN achieves high performance in predicting the specificity of unseen TCRs (ROC-AUC 0.87 in 10-fold CV) and surpasses the current state of the art (ImRex) by a large margin. Notably, our Levenshtein-distance-based K-NN classifier also exhibits competitive performance on unseen TCRs. While generalization to unseen epitopes remains challenging, we report two major breakthroughs. First, by dissecting the attention heatmaps, we demonstrate that the sparsity of available epitope data favors an implicit treatment of epitopes as classes. This may be a general problem that limits unseen-epitope performance for sufficiently complex models. Second, we show that TITAN nevertheless exhibits significantly improved performance on unseen epitopes and is capable of focusing attention on chemically meaningful molecular structures.
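A hedged sketch of a Levenshtein-distance K-NN baseline of the kind used here: classify an unseen TCR by majority vote among its k closest training sequences under edit distance. Function names and the choice of k are illustrative.

from collections import Counter

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two sequences.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def knn_predict(tcr: str, train: list[tuple[str, str]], k: int = 5) -> str:
    # train: list of (tcr_sequence, epitope_label) pairs
    nearest = sorted(train, key=lambda pair: levenshtein(tcr, pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]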
PaccMann$^{RL}$ on SARS-CoV-2: Designing antiviral candidates with conditional generative models
Born, Jannis, Manica, Matteo, Cadow, Joris, Markert, Greta, Mill, Nil Adell, Filipavicius, Modestas, Martínez, María Rodríguez
With the rapid development of COVID-19 into a global pandemic, scientists around the globe are desperately searching for effective antiviral therapeutic agents. Bridging systems biology and drug discovery, we propose a deep learning framework for conditional de novo design of antiviral candidate drugs tailored against given protein targets. First, we train a multimodal ligand-protein binding affinity model to predict affinities of antiviral compounds to target proteins and couple this model with pharmacological toxicity predictors. Using this multi-objective score as the reward function of a conditional molecular generator (consisting of two VAEs), we showcase a framework that navigates the chemical space toward regions with more antiviral molecules. Specifically, we explore the challenging setting of generating ligands against unseen protein targets by performing leave-one-out cross-validation on 41 SARS-CoV-2-related target proteins. Using deep RL, we demonstrate that in 35 out of 41 cases, generation is biased towards sampling more binding ligands, with an average increase of 83% compared to an unbiased VAE. We present a case study of a potential Envelope-protein inhibitor and perform a synthetic accessibility assessment of the best generated molecules, outlining a viable roadmap towards rapid in-vitro evaluation of potential SARS-CoV-2 inhibitors.
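A minimal sketch of how such a multi-objective reward might combine predicted binding affinity with a toxicity penalty; the predictor objects, their interfaces, and the weighting are illustrative stand-ins for the paper's trained models:

def reward(smiles: str, target_protein: str, affinity_model, toxicity_model,
           tox_weight: float = 0.5) -> float:
    # Higher affinity is better; toxicity enters as a penalty term.
    affinity = affinity_model.predict(smiles, target_protein)
    toxicity = toxicity_model.predict(smiles)
    return affinity - tox_weight * toxicity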
Reinforcement learning-driven de-novo design of anticancer compounds conditioned on biomolecular profiles
Born, Jannis, Manica, Matteo, Oskooei, Ali, Martínez, María Rodríguez
With the advent of deep generative models in computational chemistry, in silico anticancer drug design has undergone an unprecedented transformation. While state-of-the-art deep learning approaches have shown potential in generating compounds with desired chemical properties, they entirely overlook the genetic profile and properties of the target disease. In the case of cancer, this is problematic, since cancer is a highly genetic disease in which the biomolecular profile of target cells determines the response to therapy. Here, we introduce the first deep generative model capable of generating anticancer compounds given a target biomolecular profile. Using a reinforcement learning framework, the transcriptomic profile of cancer cells serves as a context in which anticancer molecules are generated and optimized to obtain effective compounds for the given profile. Our molecule generator combines two pretrained variational autoencoders (VAEs) with a multimodal efficacy predictor: the first VAE generates transcriptomic profiles, while the second, conditional VAE generates novel molecular structures conditioned on a given transcriptomic profile. The efficacy predictor optimizes the generated molecules through a reward determined by the predicted IC50 drug sensitivity for the generated molecule and the target profile. We demonstrate how molecule generation can be biased towards compounds with a high inhibitory effect against individual cell lines or specific cancer sites. We verify our approach by generating candidate drugs against specific cancer types and analyzing their structural similarity to existing compounds with known efficacy against these cancer types. We envision our approach transforming in silico anticancer drug design by increasing success rates in lead compound discovery via leveraging the biomolecular characteristics of the disease.
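A hedged sketch of one RL biasing step in this style of framework: sample molecules from a conditional generator given a transcriptomic profile, score them with the efficacy predictor, and reinforce high-reward samples. All model objects, their interfaces, and the reward shaping are illustrative placeholders, not the paper's exact implementation.

import torch

def rl_step(generator, efficacy_predictor, profile, optimizer, batch_size=32):
    # Sample molecules conditioned on the profile, with their log-probabilities.
    smiles, log_probs = generator.sample(profile, batch_size)
    ic50 = efficacy_predictor(smiles, profile)        # predicted drug sensitivity
    reward = -torch.log(ic50)                         # low IC50 -> high reward
    loss = -(reward.detach() * log_probs).mean()      # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Repeating this step shifts the generator's distribution toward molecules the predictor scores as effective for the given biomolecular profile.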