AITopics | Ismail, Aya Abdelsalam

Collaborating Authors

Ismail, Aya Abdelsalam

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Concept Bottleneck Language Models For protein design

Ismail, Aya Abdelsalam, Oikarinen, Tuomas, Wang, Amy, Adebayo, Julius, Stanton, Samuel, Joren, Taylor, Kleinhenz, Joseph, Goodman, Allen, Bravo, Héctor Corrada, Cho, Kyunghyun, Frey, Nathan C.

arXiv.org Artificial IntelligenceDec-11-2024

We introduce Concept Bottleneck Protein Language Models (CB-pLM), a generative masked language model with a layer where each neuron corresponds to an interpretable concept. Our architecture offers three key benefits: i) Control: We can intervene on concept values to precisely control the properties of generated proteins, achieving a 3 times larger change in desired concept values compared to baselines. ii) Interpretability: A linear mapping between concept values and predicted tokens allows transparent analysis of the model's decision-making process. iii) Debugging: This transparency facilitates easy debugging of trained models. Our models achieve pre-training perplexity and downstream task performance comparable to traditional masked protein language models, demonstrating that interpretability does not compromise performance. While adaptable to any language model, we focus on masked protein language models due to their importance in drug discovery and the ability to validate our model's capabilities through real-world experiments and expert knowledge. We scale our CB-pLM from 24 million to 3 billion parameters, making them the largest Concept Bottleneck Models trained and the first capable of generative language modeling.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.0609

Country:

North America > United States > California (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Interpretable Mixture of Experts

Ismail, Aya Abdelsalam, Arik, Sercan Ö., Yoon, Jinsung, Taly, Ankur, Feizi, Soheil, Pfister, Tomas

arXiv.org Artificial IntelligenceMay-25-2023

The need for reliable model explanations is prominent for many machine learning applications, particularly for tabular and time-series data as their use cases often involve high-stakes decision making. Towards this goal, we introduce a novel interpretable modeling framework, Interpretable Mixture of Experts (IME), that yields high accuracy, comparable to `black-box' Deep Neural Networks (DNNs) in many cases, along with useful interpretability capabilities. IME consists of an assignment module and a mixture of experts, with each sample being assigned to a single expert for prediction. We introduce multiple options for IME based on the assignment and experts being interpretable. When the experts are chosen to be interpretable such as linear models, IME yields an inherently-interpretable architecture where the explanations produced by IME are the exact descriptions of how the prediction is computed. In addition to constituting a standalone inherently-interpretable architecture, IME has the premise of being integrated with existing DNNs to offer interpretability to a subset of samples while maintaining the accuracy of the DNNs. Through extensive experiments on 15 tabular and time-series datasets, IME is demonstrated to be more accurate than single interpretable models and perform comparably with existing state-of-the-art DNNs in accuracy. On most datasets, IME even outperforms DNNs, while providing faithful explanations. Lastly, IME's explanations are compared to commonly-used post-hoc explanations methods through a user study -- participants are able to better predict the model behavior when given IME explanations, while finding IME's explanations more faithful and trustworthy.

artificial intelligence, assignment module, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2206.02107

Country: North America > United States (0.28)

Genre: Research Report (0.70)

Industry:

Health & Medicine (1.00)
Banking & Finance (0.93)
Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Improving Multimodal Accuracy Through Modality Pre-training and Attention

Ismail, Aya Abdelsalam, Hasan, Mahmudul, Ishtiaq, Faisal

arXiv.org Artificial IntelligenceNov-11-2020

Training a multimodal network is challenging and it requires complex architectures to achieve reasonable performance. We show that one reason for this phenomena is the difference between the convergence rate of various modalities. We address this by pre-training modality-specific sub-networks in multimodal architectures independently before end-to-end training of the entire network. Furthermore, we show that the addition of an attention mechanism between sub-networks after pre-training helps identify the most important modality during ambiguous scenarios boosting the performance. We demonstrate that by performing these two tricks a simple network can achieve similar performance to a complicated architecture that is significantly more expensive to train on multiple tasks including sentiment analysis, emotion recognition, and speaker trait recognition.

deep learning, modality, neural network, (19 more...)

arXiv.org Artificial Intelligence

2011.06102

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Add feedback

Benchmarking Deep Learning Interpretability in Time Series Predictions

Ismail, Aya Abdelsalam, Gunady, Mohamed, Bravo, Héctor Corrada, Feizi, Soheil

arXiv.org Machine LearningOct-26-2020

Saliency methods are used extensively to highlight the importance of input features in model predictions. These methods are mostly used in vision and language tasks, and their applications to time series data is relatively unexplored. In this paper, we set out to extensively compare the performance of various saliency-based interpretability methods across diverse neural architectures, including Recurrent Neural Network, Temporal Convolutional Networks, and Transformers in a new benchmark of synthetic time series data. We propose and report multiple metrics to empirically evaluate the performance of saliency methods for detecting feature importance over time using both precision (i.e., whether identified features contain meaningful signals) and recall (i.e., the number of features with signal identified as important). Through several experiments, we show that (i) in general, network architectures and saliency methods fail to reliably and accurately identify feature importance over time in time series data, (ii) this failure is mainly due to the conflation of time and feature domains, and (iii) the quality of saliency maps can be improved substantially by using our proposed two-step temporal saliency rescaling (TSR) approach that first calculates the importance of each time step before calculating the importance of each feature at a time step.

deep learning, neural network, saliency method, (20 more...)

arXiv.org Machine Learning

2010.13924

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry: Health & Medicine > Health Care Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Input-Cell Attention Reduces Vanishing Saliency of Recurrent Neural Networks

Ismail, Aya Abdelsalam, Gunady, Mohamed, Pessoa, Luiz, Bravo, Héctor Corrada, Feizi, Soheil

arXiv.org Machine LearningOct-27-2019

Recent efforts to improve the interpretability of deep neural networks use saliency to characterize the importance of input features to predictions made by models. Work on interpretability using saliency-based methods on Recurrent Neural Networks (RNNs) has mostly targeted language tasks, and their applicability to time series data is less understood. In this work we analyze saliency-based methods for RNNs, both classical and gated cell architectures. We show that RNN saliency vanishes over time, biasing detection of salient features only to later time steps and are, therefore, incapable of reliably detecting important features at arbitrary time intervals. To address this vanishing saliency problem, we propose a novel RNN cell structure (input-cell attention), which can extend any RNN cell architecture. At each time step, instead of only looking at the current input vector, input-cell attention uses a fixed-size matrix embedding, each row of the matrix attending to different inputs from current or previous time steps. Using synthetic data, we show that the saliency map produced by the input-cell attention RNN is able to faithfully detect important features regardless of their occurrence in time. We also apply the input-cell attention RNN on a neuroscience task analyzing functional Magnetic Resonance Imaging (fMRI) data for human subjects performing a variety of tasks. In this case, we use saliency to characterize brain regions (input features) for which activity is important to distinguish between tasks. We show that standard RNN architectures are only capable of detecting important brain regions in the last few time steps of the fMRI data, while the input-cell attention model is able to detect important brain region activity across time without latter time step biases.

deep learning, neural network, time step, (22 more...)

arXiv.org Machine Learning

1910.1237

Country: North America (0.28)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Improving Long-Horizon Forecasts with Expectation-Biased LSTM Networks

Ismail, Aya Abdelsalam, Wood, Timothy, Bravo, Héctor Corrada

arXiv.org Machine LearningApr-18-2018

State-of-the-art forecasting methods using Recurrent Neural Net- works (RNN) based on Long-Short Term Memory (LSTM) cells have shown exceptional performance targeting short-horizon forecasts, e.g given a set of predictor features, forecast a target value for the next few time steps in the future. However, in many applica- tions, the performance of these methods decays as the forecasting horizon extends beyond these few time steps. This paper aims to explore the challenges of long-horizon forecasting using LSTM networks. Here, we illustrate the long-horizon forecasting problem in datasets from neuroscience and energy supply management. We then propose expectation-biasing, an approach motivated by the literature of Dynamic Belief Networks, as a solution to improve long-horizon forecasting using LSTMs. We propose two LSTM ar- chitectures along with two methods for expectation biasing that significantly outperforms standard practice.

deep learning, lstm, neural network, (24 more...)

arXiv.org Machine Learning

1804.06776

Country: North America > United States > Maryland (0.14)

Genre: Research Report (0.64)

Industry:

Energy (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.96)
Government > Regional Government > North America Government > United States Government (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Add feedback