AITopics | Khan, Muhammad Haris

Collaborating Authors

Khan, Muhammad Haris

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs

Lykov, Artem, Serpiva, Valerii, Khan, Muhammad Haris, Sautenkov, Oleg, Myshlyaev, Artyom, Tadevosyan, Grik, Yaqoot, Yasheerah, Tsetserukou, Dzmitry

arXiv.org Artificial IntelligenceMar-3-2025

CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive T ask Solving and Reasoning in UA Vs Artem Lykov, V alerii Serpiva, Muhammad Haris Khan, Oleg Sautenkov, Artyom Myshlyaev, Grik Tadevosyan, Y asheerah Y aqoot, and Dzmitry Tsetserukou Abstract -- This paper introduces CognitiveDrone, a novel Vision-Language-Action (VLA) model tailored for complex Unmanned Aerial V ehicles (UA Vs) tasks that demand advanced cognitive abilities. Trained on a dataset comprising over 8,000 simulated flight trajectories across three key categories--Human Recognition, Symbol Understanding, and Reasoning--the model generates real-time 4D action commands based on first-person visual inputs and textual instructions. T o further enhance performance in intricate scenarios, we propose CognitiveDrone-R1, which integrates an additional Vision-Language Model (VLM) reasoning module to simplify task directives prior to high-frequency control. Experimental evaluations using our open-source benchmark, CognitiveDroneBench, reveal that while a racing-oriented model (RaceVLA) achieves an overall success rate of 31.3%, the base CognitiveDrone model reaches 59.6%, and CognitiveDrone-R1 attains a success rate of 77.2%. These results demonstrate improvements of up to 30% in critical cognitive tasks, underscoring the effectiveness of incorporating advanced reasoning capabilities into UA V control systems. Our contributions include the development of a state-of-the-art VLA model for UA V control and the introduction of the first dedicated benchmark for assessing cognitive tasks in drone operations.

available, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2503.01378

Country: Europe > Netherlands (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Robotics & Automation (0.48)
Transportation > Air (0.34)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.34)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.67)
(2 more...)

Add feedback

Evolution 6.0: Evolving Robotic Capabilities Through Generative Design

Khan, Muhammad Haris, Myshlyaev, Artyom, Lykov, Artem, Cabrera, Miguel Altamirano, Tsetserukou, Dzmitry

arXiv.org Artificial IntelligenceFeb-25-2025

We propose a new concept, Evolution 6.0, which represents the evolution of robotics driven by Generative AI. When a robot lacks the necessary tools to accomplish a task requested by a human, it autonomously designs the required instruments and learns how to use them to achieve the goal. Evolution 6.0 is an autonomous robotic system powered by Vision-Language Models (VLMs), Vision-Language Action (VLA) models, and Text-to-3D generative models for tool design and task execution. The system comprises two key modules: the Tool Generation Module, which fabricates task-specific tools from visual and textual data, and the Action Generation Module, which converts natural language instructions into robotic actions. It integrates QwenVLM for environmental understanding, OpenVLA for task execution, and Llama-Mesh for 3D tool generation. Evaluation results demonstrate a 90% success rate for tool generation with a 10-second inference time, and action generation achieving 83.5% in physical and visual generalization, 70% in motion generalization, and 37% in semantic generalization. Future improvements will focus on bimanual manipulation, expanded task capabilities, and enhanced environmental interpretation to improve real-world adaptability.

artificial intelligence, evolving robotic capability, natural language, (2 more...)

arXiv.org Artificial Intelligence

2502.17034

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.53)

Add feedback

GazeGrasp: DNN-Driven Robotic Grasping with Wearable Eye-Gaze Interface

Tokmurziyev, Issatay, Cabrera, Miguel Altamirano, Moreno, Luis, Khan, Muhammad Haris, Tsetserukou, Dzmitry

arXiv.org Artificial IntelligenceJan-14-2025

Abstract--We present GazeGrasp, a gaze-based manipulation system enabling individuals with motor impairments to control collaborative robots using eye-gaze. The system employs an ESP32 CAM for eye tracking, MediaPipe for gaze detection, and YOLOv8 for object localization, integrated with a Universal Robot UR10 for manipulation tasks. After user-specific calibration, the system allows intuitive object selection with a magnetic snapping effect and robot control via eye gestures. Experimental evaluation involving 13 participants demonstrated that the magnetic snapping effect significantly reduced gaze alignment time, improving task efficiency by 31%. GazeGrasp provides a robust, hands-free interface for assistive robotics, enhancing accessibility and autonomy for users.

artificial intelligence, interface, robot, (16 more...)

arXiv.org Artificial Intelligence

2501.07255

Country: Europe > Russia (0.17)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tracking

Liu, Yu, Mahmood, Arif, Khan, Muhammad Haris

arXiv.org Artificial IntelligenceOct-27-2024

Many current visual object tracking benchmarks such as OTB100, NfS, UAV123, LaSOT, and GOT-10K, predominantly contain day-time scenarios while the challenges posed by the night-time has been less investigated. It is primarily because of the lack of a large-scale, well-annotated night-time benchmark for rigorously evaluating tracking algorithms. To this end, this paper presents NT-VOT211, a new benchmark tailored for evaluating visual object tracking algorithms in the challenging night-time conditions. NT-VOT211 consists of 211 diverse videos, offering 211,000 well-annotated frames with 8 attributes including camera motion, deformation, fast motion, motion blur, tiny target, distractors, occlusion and out-of-view. To the best of our knowledge, it is the largest night-time tracking benchmark to-date that is specifically designed to address unique challenges such as adverse visibility, image blur, and distractors inherent to night-time tracking scenarios. Through a comprehensive analysis of results obtained from 42 diverse tracking algorithms on NT-VOT211, we uncover the strengths and limitations of these algorithms, highlighting opportunities for enhancements in visual object tracking, particularly in environments with suboptimal lighting. Besides, a leaderboard for revealing performance rankings, annotation tools, comprehensive meta-information and all the necessary code for reproducibility of results is made publicly available. We believe that our NT-VOT211 benchmark will not only be instrumental in facilitating field deployment of VOT algorithms, but will also help VOT enhancements and it will unlock new real-world tracking applications. Our dataset and other assets can be found at: {https://github.com/LiuYuML/NV-VOT211.

artificial intelligence, machine learning, nt-vot211, (12 more...)

arXiv.org Artificial Intelligence

2410.20421

Country:

Europe (0.93)
Asia (0.67)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Towards Combating Frequency Simplicity-biased Learning for Domain Generalization

He, Xilin, Hu, Jingyu, Lin, Qinliang, Luo, Cheng, Xie, Weicheng, Song, Siyang, Khan, Muhammad Haris, Shen, Linlin

arXiv.org Artificial IntelligenceOct-21-2024

Domain generalization methods aim to learn transferable knowledge from source domains that can generalize well to unseen target domains. Recent studies show that neural networks frequently suffer from a simplicity-biased learning behavior which leads to over-reliance on specific frequency sets, namely as frequency shortcuts, instead of semantic information, resulting in poor generalization performance. Despite previous data augmentation techniques successfully enhancing generalization performances, they intend to apply more frequency shortcuts, thereby causing hallucinations of generalization improvement. In this paper, we aim to prevent such learning behavior of applying frequency shortcuts from a data-driven perspective. Given the theoretical justification of models' biased learning behavior on different spatial frequency components, which is based on the dataset frequency properties, we argue that the learning behavior on various frequency components could be manipulated by changing the dataset statistical structure in the Fourier domain. Intuitively, as frequency shortcuts are hidden in the dominant and highly dependent frequencies of dataset structure, dynamically perturbating the over-reliance frequency components could prevent the application of frequency shortcuts. To this end, we propose two effective data augmentation modules designed to collaboratively and adaptively adjust the frequency characteristic of the dataset, aiming to dynamically influence the learning behavior of the model and ultimately serving as a strategy to mitigate shortcut learning. Code is available at AdvFrequency.

artificial intelligence, frequency shortcut, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2410.16146

Country:

Asia > China (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Multi-modal Medical Image Fusion For Non-Small Cell Lung Cancer Classification

Hassan, Salma, Hammadi, Hamad Al, Mohammed, Ibrahim, Khan, Muhammad Haris

arXiv.org Artificial IntelligenceSep-27-2024

The early detection and nuanced subtype classification of non-small cell lung cancer (NSCLC), a predominant cause of cancer mortality worldwide, is a critical and complex issue. In this paper, we introduce an innovative integration of multi-modal data, synthesizing fused medical imaging (CT and PET scans) with clinical health records and genomic data. This unique fusion methodology leverages advanced machine learning models, notably MedClip and BEiT, for sophisticated image feature extraction, setting a new standard in computational oncology. Our research surpasses existing approaches, as evidenced by a substantial enhancement in NSCLC detection and classification precision. The results showcase notable improvements across key performance metrics, including accuracy, precision, recall, and F1-score. Specifically, our leading multi-modal classifier model records an impressive accuracy of 94.04%. We believe that our approach has the potential to transform NSCLC diagnostics, facilitating earlier detection and more effective treatment planning and, ultimately, leading to superior patient outcomes in lung cancer care.

accuracy, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICIP51287.2024.10648275

2409.18715

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

Attention Based Simple Primitives for Open World Compositional Zero-Shot Learning

Munir, Ans, Qureshi, Faisal Z., Khan, Muhammad Haris, Ali, Mohsen

arXiv.org Artificial IntelligenceJul-18-2024

Compositional Zero-Shot Learning (CZSL) aims to predict unknown compositions made up of attribute and object pairs. Predicting compositions unseen during training is a challenging task. We are exploring Open World Compositional Zero-Shot Learning (OW-CZSL) in this study, where our test space encompasses all potential combinations of attributes and objects. Our approach involves utilizing the self-attention mechanism between attributes and objects to achieve better generalization from seen to unseen compositions. Utilizing a self-attention mechanism facilitates the model's ability to identify relationships between attribute and objects. The similarity between the self-attended textual and visual features is subsequently calculated to generate predictions during the inference phase. The potential test space may encompass implausible object-attribute combinations arising from unrestricted attribute-object pairings. To mitigate this issue, we leverage external knowledge from ConceptNet to restrict the test space to realistic compositions. Our proposed model, Attention-based Simple Primitives (ASP), demonstrates competitive performance, achieving results comparable to the state-of-the-art.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2407.13715

Country:

North America > Canada > Ontario (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.85)

Add feedback

Multiclass Alignment of Confidence and Certainty for Network Calibration

Kugathasan, Vinith, Khan, Muhammad Haris

arXiv.org Artificial IntelligenceSep-5-2023

Deep neural networks (DNNs) have made great strides in pushing the state-of-the-art in several challenging domains. Recent studies reveal that they are prone to making overconfident predictions. This greatly reduces the overall trust in model predictions, especially in safety-critical applications. Early work in improving model calibration employs post-processing techniques which rely on limited parameters and require a hold-out set. Some recent train-time calibration methods, which involve all model parameters, can outperform the postprocessing methods. To this end, we propose a new train-time calibration method, which features a simple, plug-and-play auxiliary loss known as multi-class alignment of predictive mean confidence and predictive certainty (MACC). It is based on the observation that a model miscalibration is directly related to its predictive certainty, so a higher gap between the mean confidence and certainty amounts to a poor calibration both for in-distribution and out-of-distribution predictions. Armed with this insight, our proposed loss explicitly encourages a confident (or underconfident) model to also provide a low (or high) spread in the presoftmax distribution. Extensive experiments on ten challenging datasets, covering in-domain, out-domain, non-visual recognition and medical image classification scenarios, show that our method achieves state-of-the-art calibration performance for both in-domain and out-domain predictions. Our code and models will be publicly released.

artificial intelligence, confidence and certainty, machine learning, (2 more...)

arXiv.org Artificial Intelligence

2309.02636

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Sensing and Signal Processing > Image Processing (0.53)

Add feedback

Unsupervised Deep Graph Matching Based on Cycle Consistency

Tourani, Siddharth, Rother, Carsten, Khan, Muhammad Haris, Savchynskyy, Bogdan

arXiv.org Artificial IntelligenceAug-1-2023

We contribute to the sparsely populated area of unsupervised deep graph matching with application to keypoint matching in images. Contrary to the standard \emph{supervised} approach, our method does not require ground truth correspondences between keypoint pairs. Instead, it is self-supervised by enforcing consistency of matchings between images of the same object category. As the matching and the consistency loss are discrete, their derivatives cannot be straightforwardly used for learning. We address this issue in a principled way by building our method upon the recent results on black-box differentiation of combinatorial solvers. This makes our method exceptionally flexible, as it is compatible with arbitrary network architectures and combinatorial solvers. Our experimental evaluation suggests that our technique sets a new state-of-the-art for unsupervised graph matching.

artificial intelligence, machine learning, object-oriented architecture, (19 more...)

arXiv.org Artificial Intelligence

2307.0893

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.34)

Add feedback