Alam, Nahid
Maya: An Instruction Finetuned Multilingual Multimodal Model
Alam, Nahid, Kanjula, Karthik Reddy, Guthikonda, Surya, Chung, Timothy, Vegesna, Bala Krishna S, Das, Abhipsha, Susevski, Anthony, Chan, Ryan Sze-Yin, Uddin, S M Iftekhar, Islam, Shayekh Bin, Santhosh, Roshan, A, Snegha, Sharma, Drishti, Liu, Chen, Chaturvedi, Isha, Winata, Genta Indra, S, Ashvanth., Mukherjee, Snehanshu, Aji, Alham Fikri
The rapid development of large Vision-Language Models (VLMs) has led to impressive results on academic benchmarks, primarily in widely spoken languages. However, significant gaps remain in the ability of current VLMs to handle low-resource languages and varied cultural contexts, largely due to a lack of high-quality, diverse, and safety-vetted data. Consequently, these models often struggle to understand low-resource languages and cultural nuances while remaining free from toxicity. To address these limitations, we introduce Maya, an open-source multilingual multimodal model. Our contributions are threefold: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; 2) a thorough analysis of toxicity within the LLaVA dataset, followed by the creation of a novel toxicity-free version across eight languages; and 3) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks. Code is available at https://github.com/nahidalam/maya.
Embedding Geometries of Contrastive Language-Image Pre-Training
Chou, Jason Chuan-Chih, Alam, Nahid
Since the publication of CLIP, the approach of using InfoNCE loss for contrastive pre-training has become widely popular for bridging two or more modalities. Despite its wide adoption, CLIP's original design choices of L2 normalization and cosine similarity logit have rarely been revisited. We have systematically experimented with alternative geometries and softmax logits for language-image pre-training and identified that the variant with intuitive Euclidean geometry, Euclidean CLIP (EuCLIP), matches or exceeds the performance of CLIP and supports hierarchical relationships at least as well as the more complicated hyperbolic alternative.
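The sketch below is not the paper's implementation; it is a minimal illustration, under assumed names (image_emb, text_emb, temperature), of the two logit choices the abstract contrasts: CLIP's cosine-similarity logits on L2-normalized embeddings versus a Euclidean-geometry logit based on negative squared distance, both fed into a symmetric InfoNCE loss.

import torch
import torch.nn.functional as F

def clip_logits(image_emb, text_emb, temperature=0.07):
    # CLIP-style: L2-normalize both embeddings, then scale cosine similarity.
    img = F.normalize(image_emb, dim=-1)
    txt = F.normalize(text_emb, dim=-1)
    return img @ txt.t() / temperature

def euclidean_logits(image_emb, text_emb, temperature=1.0):
    # Euclidean-geometry variant (illustrative): negative squared Euclidean
    # distance as the logit, with no L2 normalization of the embeddings.
    dists = torch.cdist(image_emb, text_emb, p=2)
    return -dists.pow(2) / temperature

def infonce_loss(logits):
    # Symmetric InfoNCE: matched image-text pairs sit on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: a batch of 8 image/text embedding pairs of dimension 512.
image_emb = torch.randn(8, 512)
text_emb = torch.randn(8, 512)
print(infonce_loss(clip_logits(image_emb, text_emb)))
print(infonce_loss(euclidean_logits(image_emb, text_emb)))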
Vision Transformers for Mobile Applications: A Short Survey
Alam, Nahid, Kolawole, Steven, Sethi, Simardeep, Bansali, Nishant, Nguyen, Karina
Vision Transformers (ViTs) have demonstrated state-of-the-art performance on many computer vision tasks. Unfortunately, deploying these large-scale ViTs is resource-intensive and impossible on many mobile devices. While most in the community are building larger and larger ViTs, we ask a completely opposite question: How small can a ViT be, within the tradeoff between accuracy and inference latency, and still be suitable for mobile deployment? We look into a few ViTs specifically designed for mobile applications and observe that they either modify the transformer's architecture or are built around a combination of CNN and transformer. Recent work has also attempted to create sparse ViT networks and proposed alternatives to the attention module. In this paper, we study these architectures, identify the challenges, and analyze what really makes a vision transformer suitable for mobile applications. We aim to serve as a baseline for future research directions and hopefully lay the foundation for choosing the right vision transformer architecture for applications running on mobile devices.