AITopics | Sharma, Anuj

Collaborating Authors

Sharma, Anuj

Graph Neural Network based Handwritten Trajectories Recognition

Sharma, Anuj, Singh, Sukhdeep, Ratna, S

arXiv.org Artificial Intelligence, May-15-2024

One important application of Artificial Intelligence (AI) is the recognition of human handwritten text. Handwriting Recognition (HWR) refers to the recognition of handwriting by machines. Recognizing handwritten text after it has been scanned is offline HWR, whereas recognizing text while it is being written is online HWR (Singh et al. [2017]). A handwriting trajectory refers to handwritten strokes, which are sets of sequential pixels in online HWR and sets of pixels in offline HWR (Pan et al. [2019]). In offline HWR, the writing order of these trajectories is recovered through drawing-order techniques, whereas in online HWR the digital pen strokes themselves are the trajectories (Sharma [2015]). In either case, these trajectories are important sources of information for understanding and recognizing handwriting. Handwriting trajectories can also be understood as the paths traced by a pen or stylus as it moves across a writing surface.

artificial intelligence, handwriting recognition, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2405.09247

Country: Asia > India (0.29)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

GatedLexiconNet: A Comprehensive End-to-End Handwritten Paragraph Text Recognition System

Kumari, Lalita, Singh, Sukhdeep, Rathore, Vaibhav Varish Singh, Sharma, Anuj

arXiv.org Artificial Intelligence, Apr-22-2024

Handwritten Text Recognition has challenged researchers for the last few decades, especially in the domain of computer vision, a subdomain of pattern recognition. Variability of text among writers, cursiveness, and differing styles of handwriting, together with the degradation of historical text images, make it a challenging problem. Recognizing scanned document images in neural network-based systems typically involves a two-step approach: segmentation and recognition. However, this method has several drawbacks: it is difficult to identify text regions, to analyze layout diversity within pages, and to establish accurate ground-truth segmentation. Consequently, these processes are prone to errors, which become bottlenecks in achieving high recognition accuracy. Thus, in this study, we present an end-to-end paragraph recognition system that incorporates internal line segmentation and an encoder based on gated convolutional layers. Gating is a mechanism that controls the flow of information and allows adaptive selection of the more relevant features in handwritten text recognition models. The attention module plays an important role in performing internal line segmentation, allowing the page to be processed line by line. During the decoding step, we have integrated a connectionist temporal classification-based word beam search decoder as a post-processing step. In this work, we have extended the existing LexiconNet by carefully applying gated convolutional layers in the existing deep neural network. Our results at line and page levels also favour our new GatedLexiconNet. This study reported character error rates of 2.27% on IAM, 0.9% on RIMES, and 2.13% on READ-2016, and word error rates of 5.73% on IAM, 2.76% on RIMES, and 6.52% on READ-2016.
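The character and word error rates quoted in this abstract are conventionally computed as edit distance divided by reference length. The following is a minimal sketch of that convention, not the authors' evaluation code; the function names and the sample strings are hypothetical, and the corpus-level aggregation (total edits over total reference length) is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="char"):
    """Total edit distance divided by total reference length over a corpus."""
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        # Characters for CER, whitespace-separated tokens for WER.
        r = list(ref) if unit == "char" else ref.split()
        h = list(hyp) if unit == "char" else hyp.split()
        total_edits += edit_distance(r, h)
        total_len += len(r)
    return total_edits / total_len


# Hypothetical ground truth and recognizer output.
refs = ["the quick brown fox"]
hyps = ["the quick brwn fox"]
cer = error_rate(refs, hyps, unit="char")  # one missing character out of 19
wer = error_rate(refs, hyps, unit="word")  # one wrong word out of 4
```

A single dropped character costs far more in WER than in CER here, which is why word error rates are typically higher than character error rates on the same data.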

artificial intelligence, machine learning, recognition, (14 more...)

arXiv.org Artificial Intelligence

2404.14062

Country:

Asia > India (0.46)
Europe (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Text Recognition (0.83)

The 8th AI City Challenge

Wang, Shuo, Anastasiu, David C., Tang, Zheng, Chang, Ming-Ching, Yao, Yue, Zheng, Liang, Rahman, Mohammed Shaiqur, Arya, Meenakshi S., Sharma, Anuj, Chakraborty, Pranamesh, Prajapati, Sanjita, Kong, Quan, Kobori, Norimasa, Gochoo, Munkhjargal, Otgonbold, Munkh-Erdene, Alnajjar, Fady, Batnasan, Ganzorig, Chen, Ping-Yang, Hsieh, Jun-Wei, Wu, Xunlei, Pusegaonkar, Sameer Satish, Wang, Yizhou, Biswas, Sujit, Chellappa, Rama

arXiv.org Artificial Intelligence, Apr-14-2024

The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC) people tracking, highlighting significant enhancements in camera count, character number, 3D annotation, and camera matrices, alongside new rules for 3D tracking and online tracking algorithm encouragement. Track 2 introduced dense video captioning for traffic safety, focusing on pedestrian accidents using multi-camera feeds to improve insights for insurance and prevention. Track 3 required teams to classify driver actions in a naturalistic driving analysis. Track 4 explored fish-eye camera analytics using the FishEye8K dataset. Track 5 focused on motorcycle helmet rule violation detection. The challenge utilized two leaderboards to showcase methods, with participants setting new benchmarks, some surpassing existing state-of-the-art achievements.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2404.09432

Country:

Asia (1.00)
North America > United States > Pennsylvania (0.14)
North America > United States > Michigan (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Transportation > Ground > Road (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

An inclusive review on deep learning techniques and their scope in handwriting recognition

Singh, Sukhdeep, Rohilla, Sudhir, Sharma, Anuj

arXiv.org Artificial Intelligence, Apr-10-2024

Deep learning denotes a category of machine learning algorithms that can combine raw inputs into layers of intermediate features. These algorithms have demonstrated strong results in different fields. In particular, deep learning has achieved human-level performance across a number of domains in computer vision and pattern recognition. To achieve state-of-the-art performance in diverse domains, deep learning uses different architectures, and these architectures use activation functions to perform various computations between the hidden and output layers. This paper presents a survey of existing studies of deep learning in the handwriting recognition field. Recent progress indicates that deep learning methods provide valuable means for speeding up handwriting recognition or producing more accurate results. Nevertheless, following an extensive literature survey, the present study finds that deep learning has yet to revolutionize the field and still has to resolve many of its most pressing challenges, although promising advances have been made on the prior state of the art. Additionally, the inadequate availability of labelled training data presents problems in this domain. Even so, the present handwriting recognition survey foresees deep learning enabling changes at both bench and bedside, with the potential to transform several domains such as image processing, speech recognition, computer vision, machine translation, robotics and control, medical imaging, medical information processing, bio-informatics, natural language processing, cyber security, and many others.

artificial intelligence, machine learning, recognition, (17 more...)

arXiv.org Artificial Intelligence

2404.08011

Country:

North America > United States (0.46)
Asia > India (0.28)
North America > Canada > Alberta > Census Division No. 13 > Woodlands County (0.24)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.87)
Information Technology > Security & Privacy (0.68)
Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Precise and Robust Sidewalk Detection: Leveraging Ensemble Learning to Surpass LLM Limitations in Urban Environments

Shihab, Ibne Farabi, Alvee, Benjir Islam, Bhagat, Sudesh Ramesh, Sharma, Anuj

arXiv.org Artificial Intelligence, Apr-1-2024

This study aims to compare the effectiveness of a robust ensemble model with the state-of-the-art ONE-PEACE Large Language Model (LLM) for accurate detection of sidewalks. Accurate sidewalk detection is crucial in improving road safety and urban planning. The study evaluated the model's performance on Cityscapes, Ade20k, and the Boston Dataset. The results showed that the ensemble model performed better than the individual models, achieving mean Intersection Over Union (mIOU) scores of 93.1%, 90.3%, and 90.6% on these datasets under ideal conditions. Additionally, the ensemble model maintained a consistent level of performance even in challenging conditions such as Salt-and-Pepper and Speckle noise, with only a gradual decrease in efficiency observed. On the other hand, the ONE-PEACE LLM performed slightly better than the ensemble model in ideal scenarios but experienced a significant decline in performance under noisy conditions. These findings demonstrate the robustness and reliability of the ensemble model, making it a valuable asset for improving urban infrastructure related to road safety and curb space management. This study contributes positively to the broader context of urban health and mobility.
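To make the two evaluation ingredients in this abstract concrete, here is a minimal sketch of mean Intersection over Union on a label map and of salt-and-pepper corruption of an image. It is an illustration under stated assumptions, not the study's pipeline: the function names, the tiny label maps, the per-image averaging over classes that are present, and the 0/255 pixel range are all hypothetical choices.

```python
import random


def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes present in the prediction or the target."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(flat_pred, flat_target) if p == c and t == c)
        union = sum(1 for p, t in zip(flat_pred, flat_target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)


def salt_and_pepper(image, amount, seed=0):
    """Replace a fraction `amount` of pixels with 0 (pepper) or 255 (salt)."""
    rng = random.Random(seed)
    noisy = []
    for row in image:
        new_row = []
        for value in row:
            if rng.random() < amount:
                new_row.append(255 if rng.random() < 0.5 else 0)
            else:
                new_row.append(value)
        noisy.append(new_row)
    return noisy


# Hypothetical 2x4 label maps: class 1 stands for "sidewalk", class 0 for the rest.
target = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
pred = [[0, 0, 0, 1],
        [0, 0, 1, 1]]
miou = mean_iou(pred, target, num_classes=2)  # (4/5 + 3/4) / 2

# Hypothetical grayscale image corrupted at a 10% noise level.
image = [[120, 130, 140, 150],
         [110, 125, 135, 145]]
noisy_image = salt_and_pepper(image, amount=0.1)
```

A robustness study of the kind described would run the segmentation model on `noisy_image` at increasing `amount` values and track how `mean_iou` degrades.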

large language model, machine learning, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

2405.14876

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (1.00)
Transportation > Ground > Road (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

GTAGCN: Generalized Topology Adaptive Graph Convolutional Networks

Singh, Sukhdeep, Sharma, Anuj, Chauhan, Vinod Kumar

arXiv.org Artificial Intelligence, Mar-22-2024

Graph Neural Networks (GNNs) have emerged as a popular and standard approach for learning from graph-structured data. The literature on GNNs highlights the potential of this evolving research area and its widespread adoption in real-life applications. However, most approaches are either new in concept or derived from specific techniques. The potential of combining more than one approach in hybrid form, which could serve sequenced and static data alike, has therefore not been studied extensively. We derive a hybrid approach based on two established techniques, generalized aggregation networks and topology adaptive graph convolution networks, that serves our purpose of handling both sequenced and static data effectively. The proposed method applies to both node and graph classification. Our empirical analysis reveals that the results are on par with those in the literature and better for handwritten strokes as sequenced data, where graph structures have not been explored.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2403.15077

Country:

Asia > India (0.28)
Europe > United Kingdom > England (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers

Gunn, James, Lenyk, Zygmunt, Sharma, Anuj, Donati, Andrea, Buburuzan, Alexandru, Redford, John, Mueller, Romain

arXiv.org Artificial Intelligence, Dec-26-2023

Combining complementary sensor modalities is crucial to providing robust perception for safety-critical robotics applications such as autonomous driving (AD). Recent state-of-the-art camera-lidar fusion methods for AD rely on monocular depth estimation which is a notoriously difficult task compared to using depth information from the lidar directly. Here, we find that this approach does not leverage depth as expected and show that naively improving depth estimation does not lead to improvements in object detection performance and that, strikingly, removing depth estimation altogether does not degrade object detection performance. This suggests that relying on monocular depth could be an unnecessary architectural bottleneck during camera-lidar fusion. In this work, we introduce a novel fusion method that bypasses monocular depth estimation altogether and instead selects and fuses camera and lidar features in a bird's-eye-view grid using a simple attention mechanism. We show that our model can modulate its use of camera features based on the availability of lidar features and that it yields better 3D object detection on the nuScenes dataset than baselines relying on monocular depth estimation.
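The "simple attention mechanism" mentioned in this abstract can be pictured with generic scaled dot-product attention, in which one query weighs a set of candidate features and returns their weighted sum. The sketch below shows only that generic operation; it is not the Lift-Attend-Splat architecture, and the reading of the query as a bird's-eye-view cell and of the keys and values as camera features, along with all names and numbers, is a hypothetical illustration.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value vectors, one output dimension at a time.
    fused = [sum(w * v[i] for w, v in zip(weights, values))
             for i in range(len(values[0]))]
    return fused, weights


# Hypothetical query for one bird's-eye-view cell and two camera feature candidates.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0]]
values = [[2.0, 0.0], [4.0, 2.0]]
fused, weights = attend(query, keys, values)
```

With identical keys the weights are uniform, so the fused feature is the plain average of the two values; keys that match the query more closely would pull the result toward their own values.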

artificial intelligence, detection, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2312.14919

Country: Europe > Switzerland (0.14)

Genre: Research Report > Promising Solution (0.46)

Industry:

Transportation (0.34)
Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Investigating Speed Deviation Patterns During Glucose Episodes: A Quantile Regression Approach

Joshi, Aparna, Merickel, Jennifer, Desouza, Cyrus V., Rizzo, Matthew, Gunaratne, Pujitha, Sharma, Anuj

arXiv.org Artificial Intelligence, Oct-3-2023

Given the growing prevalence of diabetes, there has been significant interest in determining how diabetes affects instrumental daily functions, like driving. Complications of glucose control in diabetes include hypoglycemic and hyperglycemic episodes, which may impair the cognitive and psychomotor functions needed for safe driving. The goal of this paper was to determine patterns of speed behavior among drivers with diabetes during acute glucose episodes, relative to drivers with diabetes who were euglycemic or control drivers without diabetes, in a naturalistic driving environment. By employing distribution-based analytic methods that capture distribution patterns, our study advances prior literature, which has focused on the conventional approach of average speed, to explore speed deviation patterns.
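Quantile regression, named in the paper's title, replaces the squared-error loss behind an average with the quantile ("pinball") loss, so that different parts of a distribution can be modelled separately. The sketch below illustrates that loss on a constant predictor only; it is not the paper's model, and the function names and the speed-deviation values are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average quantile (pinball) loss at quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)


def best_constant_quantile(values, tau):
    """Sample value that minimizes the pinball loss as a constant predictor."""
    return min(values, key=lambda q: pinball_loss(values, [q] * len(values), tau))


# Hypothetical speed deviations from the posted limit, in mph.
speed_deviation = [-6.0, -3.0, -1.0, 0.0, 1.0, 2.0, 4.0, 9.0, 15.0]
median = best_constant_quantile(speed_deviation, tau=0.5)
upper = best_constant_quantile(speed_deviation, tau=0.9)
```

The two fitted values differ sharply because the upper tail is stretched, which is the kind of pattern an average-speed analysis would hide. A full quantile regression would minimize the same loss over coefficients of covariates rather than over a single constant.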

glucose episode, quantile regression approach, speed deviation pattern

arXiv.org Artificial Intelligence

2310.02351

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)

Synthetic Distracted Driving (SynDD2) dataset for analyzing distracted behaviors and various gaze zones of a driver

Rahman, Mohammed Shaiqur, Wang, Jiyang, Gursoy, Senem Velipasalar, Anastasiu, David, Wang, Shuo, Sharma, Anuj

arXiv.org Artificial Intelligence, Apr-10-2023

This article presents a synthetic distracted driving dataset (SynDD2, a continuation of SynDD1 [1]) for machine learning models to detect and analyze drivers' various distracted behaviors and different gaze zones. We collected the data in a stationary vehicle using three in-vehicle cameras positioned at three locations: on the dashboard, near the rearview mirror, and on the top right-side window corner. The dataset contains two activity types for each participant, distracted activities [2][3][4] and gaze zones [5][6][7], and each activity type has two sets: without appearance blocks and with appearance blocks such as a hat or sunglasses. The order and duration of each activity for each participant are random. In addition, the dataset contains manual annotations for each activity, with its start and end times annotated. Researchers could use this dataset to evaluate the performance of machine learning algorithms that classify various distracting activities and gaze zones of drivers.

artificial intelligence, machine learning, participant, (15 more...)

arXiv.org Artificial Intelligence

2204.08096

Country: North America > United States > Iowa (0.16)

Genre: Research Report (0.83)

Industry: Transportation > Ground > Road (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Newton methods based convolution neural networks using parallel processing

Thakur, Ujjwal, Sharma, Anuj

arXiv.org Artificial Intelligence, Dec-2-2021

Training convolutional neural networks is a high-dimensional, non-convex optimization problem. At present, it is inefficient in situations where parametric learning rates cannot be confidently set. Some past works have introduced Newton methods for training deep neural networks. Newton methods for convolutional neural networks involve complicated operations. Finding the Hessian matrix in second-order methods becomes very complex, as we mainly use the finite differences method with the image data. Newton methods for convolutional neural networks deal with this by using sub-sampled Hessian Newton methods. In this paper, we have used the complete data instead of the sub-sampled methods, which handle only partial data at a time. Further, we have used parallel processing instead of serial processing in mini-batch computations. The results show that parallel processing in this study takes less time than the previous approach.
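The idea of accumulating second-order information over the complete data, with mini-batches processed in parallel, can be sketched on a toy problem. The example below is a deliberately small stand-in and not the paper's method: it fits a one-parameter least-squares model instead of a convolutional network, uses analytic derivatives rather than finite differences, and uses a thread pool as the parallel backend. All names and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def batch_grad_hess(w, batch):
    """Gradient and Hessian of 0.5 * sum((w * x - y)^2) on one mini-batch."""
    grad = sum(x * (w * x - y) for x, y in batch)
    hess = sum(x * x for x, _ in batch)
    return grad, hess


def newton_step(w, batches, workers=2):
    """One Newton update using every mini-batch, evaluated in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: batch_grad_hess(w, b), batches))
    # Sum the per-batch contributions so the step reflects the complete data.
    grad = sum(p[0] for p in parts)
    hess = sum(p[1] for p in parts)
    return w - grad / hess


# Hypothetical data generated by y = 2 * x, split into two mini-batches.
data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
batches = [data[:2], data[2:]]
w = newton_step(0.0, batches)
```

Because the toy loss is quadratic, a single Newton step lands on the exact minimizer; a real network would need repeated steps and some form of damping, and CPU-bound workloads in Python would more likely use processes or an accelerator than threads.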

artificial intelligence, machine learning, neural network, (19 more...)

arXiv.org Artificial Intelligence

2112.01401

Country: North America > United States (0.29)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
