AITopics | Ali, Amin Ahsan

Collaborating Authors

Ali, Amin Ahsan

MIXAD: Memory-Induced Explainable Time Series Anomaly Detection

Kim, Minha, Bhaumik, Kishor Kumar, Ali, Amin Ahsan, Woo, Simon S.

arXiv.org Artificial Intelligence, Oct-30-2024

For modern industrial applications, accurately detecting and diagnosing anomalies in multivariate time series data is essential. Despite such need, most state-of-the-art methods often prioritize detection performance over model interpretability. Addressing this gap, we introduce MIXAD (Memory-Induced Explainable Time Series Anomaly Detection), a model designed for interpretable anomaly detection. MIXAD leverages a memory network alongside spatiotemporal processing units to understand the intricate dynamics and topological structures inherent in sensor relationships. We also introduce a novel anomaly scoring method that detects significant shifts in memory activation patterns during anomalies. Our approach not only ensures decent detection performance but also outperforms state-of-the-art baselines by 34.30% and 34.51% in interpretability metrics.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv: 2410.22735

Country: Asia (0.14)

Genre: Research Report > Promising Solution (0.48)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

SSMT: Few-Shot Traffic Forecasting with Single Source Meta-Transfer

Bhaumik, Kishor Kumar, Kim, Minha, Niloy, Fahim Faisal, Ali, Amin Ahsan, Woo, Simon S.

arXiv.org Artificial Intelligence, Oct-20-2024

Traffic forecasting in Intelligent Transportation Systems (ITS) is vital for intelligent traffic prediction. Yet ITS often relies on data from traffic sensors or vehicle devices, and certain cities might not have all of those smart devices or the enabling infrastructure. Recent studies have employed meta-learning to generalize spatial-temporal traffic networks, utilizing data from multiple cities for effective traffic forecasting in data-scarce target cities. However, collecting data from multiple cities can be costly and time-consuming. To tackle this challenge, we introduce Single Source Meta-Transfer Learning (SSMT), which relies on only a single source city for traffic prediction. Our method harnesses this transferred knowledge to enable few-shot traffic forecasting, particularly when the target city possesses limited data. Specifically, we use memory-augmented attention to store the heterogeneous spatial knowledge from the source city and selectively recall it for the data-scarce target city. We extend the idea of sinusoidal positional encoding to establish meta-learning tasks by leveraging diverse temporal traffic patterns from the source city. Moreover, to capture a more generalized representation of the positions, we introduce a meta-positional encoding that learns the optimal representation of the temporal pattern across all the tasks. We experiment on five real-world benchmark datasets to demonstrate that our method outperforms several existing methods in time series traffic prediction.
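For readers unfamiliar with the sinusoidal positional encoding that the abstract builds on, the following is a minimal sketch of the standard Transformer-style formulation only. It is not the SSMT meta-positional encoding, which the abstract does not specify. The function name, the encoding width of 8, and the choice of 288 positions (one day of 5-minute readings) are illustrative assumptions.

```python
import math


def sinusoidal_positional_encoding(num_positions, dim):
    """Return a num_positions x dim table of standard sinusoidal encodings.

    Even columns hold sin(pos / 10000^(i/dim)) and the following odd
    column holds the matching cosine, where i is the even column index.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(0, dim, 2):
            angle = pos / (10000 ** (i / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


# Hypothetical setup: 288 time-of-day slots, 8-dimensional encoding.
pe = sinusoidal_positional_encoding(num_positions=288, dim=8)
```

Each time slot gets a fixed vector, so slots that are close in time receive similar encodings; a learned encoding would replace this fixed table with trainable parameters.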

artificial intelligence, machine learning, spatial reasoning, (17 more...)

arXiv: 2410.15589

Country:

Asia (0.69)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Information Technology (0.93)
Transportation > Infrastructure & Services (0.48)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.87)

DM-Codec: Distilling Multimodal Representations for Speech Tokenization

Ahasan, Md Mubtasim, Fahim, Md, Mohiuddin, Tasnim, Rahman, A K M Mahbubur, Chadha, Aman, Iqbal, Tariq, Amin, M Ashraful, Islam, Md Mofijul, Ali, Amin Ahsan

arXiv.org Artificial Intelligence, Oct-19-2024

Recent advancements in speech-language models have yielded significant improvements in speech tokenization and synthesis. However, effectively mapping the complex, multidimensional attributes of speech into discrete tokens remains challenging. Existing speech representations generally fall into two categories: acoustic tokens from audio codecs and semantic tokens from speech self-supervised learning models. Although recent efforts have unified acoustic and semantic tokens for improved performance, they overlook the crucial role of contextual representation in comprehensive speech modeling. Our empirical investigations reveal that the absence of contextual representations results in elevated Word Error Rate (WER) and Word Information Lost (WIL) scores in speech transcriptions. To address these limitations, we propose two novel distillation approaches: (1) a language model (LM)-guided distillation method that incorporates contextual information, and (2) a combined LM and self-supervised speech model (SM)-guided distillation technique that effectively distills multimodal representations (acoustic, semantic, and contextual) into a comprehensive speech tokenizer, termed DM-Codec. The DM-Codec architecture adopts a streamlined encoder-decoder framework with a Residual Vector Quantizer (RVQ) and incorporates the LM and SM during the training process. Experiments show DM-Codec significantly outperforms state-of-the-art speech tokenization models, reducing WER by up to 13.46%, WIL by 9.82%, and improving speech quality by 5.84% and intelligibility by 1.85% on the LibriSpeech benchmark dataset. In recent years, the advent of Large Language Models (LLMs) has revolutionized various domains, offering unprecedented advancements across a wide array of tasks (OpenAI, 2024). A critical component of this success has been the tokenization of input data, enabling vast amounts of information processing (Du et al., 2024; Rust et al., 2021).
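The Word Error Rate (WER) reported above is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below shows that conventional definition only; it is not taken from the DM-Codec evaluation code, and the function name and whitespace tokenization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                 # deletion of a reference word
                curr[j - 1] + 1,             # insertion of a hypothesis word
                prev[j - 1] + substitution,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.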

large language model, machine learning, natural language, (19 more...)

arXiv: 2410.15017

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

BD-SAT: High-resolution Land Use Land Cover Dataset & Benchmark Results for Developing Division: Dhaka, BD

Paul, Ovi, Nayem, Abu Bakar Siddik, Sarker, Anis, Ali, Amin Ahsan, Amin, M Ashraful, Rahman, AKM Mahbubur

arXiv.org Artificial Intelligence, Jun-9-2024

Land Use Land Cover (LULC) analysis on satellite images using deep learning-based methods is significantly helpful in understanding the geography, socio-economic conditions, poverty levels, and urban sprawl in developing countries. Recent works involve segmentation with LULC classes such as farmland, built-up areas, forests, meadows, water bodies, etc. Training deep learning methods on satellite images requires large sets of images annotated with LULC classes. However, annotated data for developing countries are scarce due to a lack of funding, absence of dedicated residential/industrial/economic zones, a large population, and diverse building materials. BD-SAT provides a high-resolution dataset that includes pixel-by-pixel LULC annotations for Dhaka metropolitan city and surrounding rural/urban areas. Using a strict and standardized procedure, the ground truth is created using Bing satellite imagery with a ground spatial distance of 2.22 meters per pixel. A three-stage, well-defined annotation process has been followed with support from GIS experts to ensure the reliability of the annotations. We performed several experiments to establish benchmark results. The results show that the annotated BD-SAT is sufficient to train large deep learning models with adequate accuracy for five major LULC classes: forest, farmland, built-up areas, water bodies, and meadows.

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv: 2406.05912

Country: Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.62)

Genre: Research Report > New Finding (0.87)

Industry:

Food & Agriculture > Agriculture (1.00)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Automatic Detection of Natural Disaster Effect on Paddy Field from Satellite Images using Deep Learning Techniques

Ishmam, Tahmid Alavi, Ali, Amin Ahsan, Amin, Md Ahsraful, Rahman, A K M Mahbubur

arXiv.org Artificial Intelligence, Apr-2-2023

This paper aims to detect rice field damage from natural disasters in Bangladesh using high-resolution satellite imagery. The authors developed ground truth data for rice field damage from the field level. First, NDVI differences before and after the disaster are calculated to identify possible crop loss. Areas at or above the 0.33 threshold are marked as crop loss areas, as significant changes are observed there. The authors also verified crop loss areas by collecting data from local farmers. Different bands of satellite data, (Red, Green, Blue) and False Color Infrared (FCI), are then useful for detecting crop loss areas. We used the NDVI difference images as ground truth to train the DeepLabV3plus model. With RGB we obtained an IoU of 0.41, and with FCI an IoU of 0.51. Because FCI uses the NIR, Red, and Blue bands and NDVI is the normalized difference between the NIR and Red bands, a higher IoU for FCI than for RGB is expected. Still, RGB does not perform very badly here, so where other bands are not available, RGB can be used to understand crop loss areas to some extent. The ground truth developed in this paper can be used for segmentation models with very high-resolution RGB-only images such as Bing, Google, etc.
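The NDVI-difference thresholding and the IoU metric described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the difference is taken as NDVI before minus NDVI after, that each pixel is given as a hypothetical `(nir, red)` reflectance pair, and the tiny two-pixel images are invented purely for demonstration.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def crop_loss_mask(before, after, threshold=0.33):
    """Mark pixels whose NDVI dropped by at least `threshold`.

    `before` and `after` are lists of rows of (nir, red) pairs.
    Assumption: difference = NDVI(before) - NDVI(after).
    """
    mask = []
    for row_before, row_after in zip(before, after):
        mask.append([
            (ndvi(*b) - ndvi(*a)) >= threshold
            for b, a in zip(row_before, row_after)
        ])
    return mask


def iou(pred, truth):
    """Intersection over union of two boolean masks of equal shape."""
    intersection = union = 0
    for row_pred, row_truth in zip(pred, truth):
        for p, t in zip(row_pred, row_truth):
            intersection += int(p and t)
            union += int(p or t)
    return intersection / union if union else 1.0


# Invented data: the first pixel loses vegetation, the second barely changes.
before = [[(0.8, 0.1), (0.8, 0.1)]]
after = [[(0.3, 0.25), (0.75, 0.1)]]
mask = crop_loss_mask(before, after)
score = iou(mask, [[True, True]])
```

In this toy case only the first pixel crosses the 0.33 threshold, so comparing the mask against a reference in which both pixels are damaged gives an IoU of one half.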

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv: 2304.00622

Country:

Asia > Bangladesh (0.68)
North America > United States > Texas > Ellis County (0.45)

Genre: Research Report (0.50)

Industry:

Government (0.47)
Media (0.46)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.34)
Food & Agriculture > Agriculture (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Hierarchical Self Attention Based Autoencoder for Open-Set Human Activity Recognition

Tonmoy, M Tanjid Hasan, Mahmud, Saif, Rahman, A K M Mahbubur, Amin, M Ashraful, Ali, Amin Ahsan

arXiv.org Artificial Intelligence, Mar-7-2021

Wearable sensor-based human activity recognition is a challenging problem due to the difficulty of modeling spatial and temporal dependencies of sensor signals. Recognition models under the closed-set assumption are forced to yield members of known activity classes as predictions. However, activity recognition models can encounter an unseen activity due to body-worn sensor malfunction or disability of the subject performing the activities. This problem can be addressed by modeling under the open-set recognition assumption. Hence, the proposed self-attention-based approach combines data hierarchically from different sensor placements across time to classify closed-set activities, and it obtains notable performance improvement over state-of-the-art models on five publicly available datasets. The decoder in this autoencoder architecture incorporates self-attention-based feature representations from the encoder to detect unseen activity classes in the open-set recognition setting. Furthermore, attention maps generated by the hierarchical model demonstrate explainable selection of features in activity recognition. We conduct extensive leave-one-subject-out validation experiments that indicate significantly improved robustness to noise and subject-specific variability in body-worn sensor signals.

artificial intelligence, machine learning, recognition, (16 more...)

doi: 10.1007/978-3-030-75768-7_28

arXiv: 2103.04279

Country: Asia > Bangladesh (0.28)

Genre: Research Report (0.84)

Industry: Health & Medicine (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)