AITopics | Shehu, Amarda

Collaborating Authors

Shehu, Amarda

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Birdie: Advancing State Space Models with Reward-Driven Objectives and Curricula

Blouir, Sam, Smith, Jimmy T. H., Anastasopoulos, Antonios, Shehu, Amarda

arXiv.org Artificial IntelligenceNov-6-2024

Efficient state space models (SSMs), such as linear recurrent neural networks and linear attention variants, offer computational advantages over Transformers but struggle with tasks requiring long-range in-context retrieval-like text copying, associative recall, and question answering over long contexts. Previous efforts to address these challenges have focused on architectural modifications, often reintroducing computational inefficiencies. In this paper, we propose a novel training procedure, Birdie, that significantly enhances the in-context retrieval capabilities of SSMs without altering their architecture. Our approach combines bidirectional input processing with dynamic mixtures of specialized pre-training objectives, optimized via reinforcement learning. We introduce a new bidirectional SSM architecture that seamlessly transitions from bidirectional context processing to causal generation. Experimental evaluations demonstrate that Birdie markedly improves performance on retrieval-intensive tasks such as multi-number phone book lookup, long paragraph question-answering, and infilling. This narrows the performance gap with Transformers, while retaining computational efficiency. Our findings highlight the importance of training procedures in leveraging the fixed-state capacity of SSMs, offering a new direction to advance their capabilities. All code and pre-trained models are available at https://www.github.com/samblouir/birdie, with support for JAX and PyTorch.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.0103

Country:

Europe (1.00)
Asia (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.14)

Genre: Research Report > New Finding (0.87)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Accounting for Work Zone Disruptions in Traffic Flow Forecasting

Lu, Yuanjie, Shehu, Amarda, Lattanzi, David

arXiv.org Artificial IntelligenceJul-16-2024

Traffic speed forecasting is an important task in intelligent transportation system management. The objective of much of the current computational research is to minimize the difference between predicted and actual speeds, but information modalities other than speed priors are largely not taken into account. In particular, though state of the art performance is achieved on speed forecasting with graph neural network methods, these methods do not incorporate information on roadway maintenance work zones and their impacts on predicted traffic flows; yet, the impacts of construction work zones are of significant interest to roadway management agencies, because they translate to impacts on the local economy and public well-being. In this paper, we build over the convolutional graph neural network architecture and present a novel ``Graph Convolutional Network for Roadway Work Zones" model that includes a novel data fusion mechanism and a new heterogeneous graph aggregation methodology to accommodate work zone information in spatio-temporal dependencies among traffic states. The model is evaluated on two data sets that capture traffic flows in the presence of work zones in the Commonwealth of Virginia. Extensive comparative evaluation and ablation studies show that the proposed model can capture complex and nonlinear spatio-temporal relationships across a transportation corridor, outperforming baseline models, particularly when predicting traffic flow during a workzone event.

artificial intelligence, machine learning, neural network, (17 more...)

arXiv.org Artificial Intelligence

2407.11407

Country: North America > United States > Virginia (0.49)

Genre: Research Report (0.82)

Industry:

Transportation > Infrastructure & Services (1.00)
Consumer Products & Services > Travel (1.00)
Transportation > Ground > Road (0.69)
Government > Regional Government > North America Government > United States Government (0.48)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms

Inan, Toki Tahmid, Liu, Mingrui, Shehu, Amarda

arXiv.org Artificial IntelligenceMar-1-2024

Despite an extensive body of literature on deep learning optimization, our current understanding of what makes an optimization algorithm effective is fragmented. In particular, we do not understand well whether enhanced optimization translates to improved generalizability. Current research overlooks the inherent stochastic nature of stochastic gradient descent (SGD) and its variants, resulting in a lack of comprehensive benchmarking and insight into their statistical performance. This paper aims to address this gap by adopting a novel approach. Rather than solely evaluating the endpoint of individual optimization trajectories, we draw from an ensemble of trajectories to estimate the stationary distribution of stochastic optimizers. Our investigation encompasses a wide array of techniques, including SGD and its variants, flat-minima optimizers, and new algorithms we propose under the Basin Hopping framework. Through our evaluation, which encompasses synthetic functions with known minima and real-world problems in computer vision and natural language processing, we emphasize fair benchmarking under a statistical framework, comparing stationary distributions and establishing statistical significance. Our study uncovers several key findings regarding the relationship between training loss and hold-out accuracy, as well as the comparable performance of SGD, noise-enabled variants, and novel optimizers utilizing the BH framework. Notably, these algorithms demonstrate performance on par with flat-minima optimizers like SAM, albeit with half the gradient evaluations. We anticipate that our work will catalyze further exploration in deep learning optimization, encouraging a shift away from single-model approaches towards methodologies that acknowledge and leverage the stochastic nature of optimizers.

artificial intelligence, machine learning, optimizer, (18 more...)

arXiv.org Artificial Intelligence

2403.00574

Country: North America > United States (0.93)

Genre: Research Report > Experimental Study (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

Traffic Flow Forecasting with Maintenance Downtime via Multi-Channel Attention-Based Spatio-Temporal Graph Convolutional Networks

Lu, Yuanjie, Kamranfar, Parastoo, Lattanzi, David, Shehu, Amarda

arXiv.org Artificial IntelligenceOct-4-2021

Forecasting traffic flows is a central task in intelligent transportation system management. Graph structures have shown promise as a modeling framework, with recent advances in spatio-temporal modeling via graph convolution neural networks, improving the performance or extending the prediction horizon on traffic flows. However, a key shortcoming of state-of-the-art methods is their inability to take into account information of various modalities, for instance the impact of maintenance downtime on traffic flows. This is the issue we address in this paper. Specifically, we propose a novel model to predict traffic speed under the impact of construction work. The model is based on the powerful attention-based spatio-temporal graph convolution architecture but utilizes various channels to integrate different sources of information, explicitly builds spatio-temporal dependencies among traffic states, captures the relationships between heterogeneous roadway networks, and then predicts changes in traffic flow resulting from maintenance downtime events. The model is evaluated on two benchmark datasets and a novel dataset we have collected over the bustling Tyson's corner region in Northern Virginia. Extensive comparative experiments and ablation studies show that the proposed model can capture complex and nonlinear spatio-temporal relationships across a transportation corridor, outperforming baseline models.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2110.01535

Country: North America > United States > Virginia (0.35)

Genre: Research Report > Promising Solution (0.54)

Industry:

Consumer Products & Services > Travel (1.00)
Transportation > Ground > Road (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Generating Tertiary Protein Structures via an Interpretative Variational Autoencoder

Guo, Xiaojie, Du, Yuanqi, Tadepalli, Sivani, Zhao, Liang, Shehu, Amarda

arXiv.org Machine LearningJun-16-2021

Much scientific enquiry across disciplines is founded upon a mechanistic treatment of dynamic systems that ties form to function. A highly visible instance of this is in molecular biology, where an important goal is to determine functionally-relevant forms/structures that a protein molecule employs to interact with molecular partners in the living cell. This goal is typically pursued under the umbrella of stochastic optimization with algorithms that optimize a scoring function. Research repeatedly shows that current scoring function, though steadily improving, correlate weakly with molecular activity. Inspired by recent momentum in generative deep learning, this paper proposes and evaluates an alternative approach to generating functionally-relevant three-dimensional structures of a protein. Though typically deep generative models struggle with highly-structured data, the work presented here circumvents this challenge via graph-generative models. A comprehensive evaluation of several deep architectures shows the promise of generative models in directly revealing the latent space for sampling novel tertiary structures, as well as in highlighting axes/factors that carry structural meaning and open the black box often associated with deep models. The work presented here is a first step towards interpretative, deep generative models becoming viable and informative complementary approaches to protein structure prediction.

contact map, deep learning, neural network, (18 more...)

arXiv.org Machine Learning

2004.07119

Country: North America > United States (0.48)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.45)

Add feedback

Interpretable Deep Graph Generation with Node-Edge Co-Disentanglement

Guo, Xiaojie, Zhao, Liang, Qin, Zhao, Wu, Lingfei, Shehu, Amarda, Ye, Yanfang

arXiv.org Machine LearningJun-9-2020

Disentangled representation learning has recently attracted a significant amount of attention, particularly in the field of image representation learning. However, learning the disentangled representations behind a graph remains largely unexplored, especially for the attributed graph with both node and edge features. Disentanglement learning for graph generation has substantial new challenges including 1) the lack of graph deconvolution operations to jointly decode node and edge attributes; and 2) the difficulty in enforcing the disentanglement among latent factors that respectively influence: i) only nodes, ii) only edges, and iii) joint patterns between them. To address these challenges, we propose a new disentanglement enhancement framework for deep generative models for attributed graphs. In particular, a novel variational objective is proposed to disentangle the above three types of latent factors, with novel architecture for node and edge deconvolutions. Moreover, within each type, individual-factor-wise disentanglement is further enhanced, which is shown to be a generalization of the existing framework for images. Qualitative and quantitative experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed model and its extensions.

deep learning, disentanglement, neural network, (20 more...)

arXiv.org Machine Learning

doi: 10.1145/3394486.3403221

2006.05385

Country: North America > United States > New York (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

The AAAI-13 Conference Workshops

AI MagazineJan-10-2014

The AAAI-13 Workshop Program, a part of the 27th AAAI Conference on Artificial Intelligence, was held Sunday and Monday, July 14‚Äì15, 2013 at the Hyatt Regency Bellevue Hotel in Bellevue, Washington, USA.

artificial intelligence, Computer Engineering, management and information, (2 more...)

AI Magazine

Industry: Information Technology (0.71)

Technology:

Information Technology > Artificial Intelligence > Robots (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

Add feedback

The AAAI-13 Conference Workshops

AI MagazineJan-10-2014

Benjamin Grosof (Coherent Knowledge from episodic memory to great progress is being made on methods Systems) on representing activity create semantic memory, using a combination to solve problems related to structure context through semantic rule methods, of semantic memory and prediction, motion simulation, deriving from experience in the episodic memory to guide users?

constraint-based reasoning, health & medicine, workshop, (19 more...)

AI Magazine

Country:

North America > United States > California (0.28)
North America > Canada > Alberta (0.28)
Europe > Germany > Bremen > Bremen (0.14)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine > Consumer Health (0.95)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback