Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

Neural Information Processing Systems

We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems in diverse target domains, such as noise and accents. STAR is developed for prevalent speech foundation models based on Transformer architectures with auto-regressive decoding (e.g., Whisper, Canary). Specifically, we propose a novel indicator that empirically integrates step-wise information during decoding to assess the token-level quality of pseudo labels without ground truth, thereby guiding model updates for effective unsupervised adaptation. Experimental results show that STAR achieves an average 13.5% relative reduction in word error rate across 14 target domains, and it sometimes even approaches the upper-bound performance of supervised adaptation. Surprisingly, we also observe that STAR protects the adapted model from the common catastrophic forgetting problem without recalling source-domain data. Furthermore, STAR exhibits high data efficiency, requiring less than one hour of unlabeled data, and generalizes seamlessly to alternative large speech models and speech translation tasks. We plan to open-source our code to the research community.
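The core of STAR is scoring pseudo-label tokens by step-wise decoding information and using those scores to weight the adaptation loss. The sketch below is a minimal, hypothetical illustration of that idea (function names and the use of the argmax softmax probability as the indicator are assumptions, not the paper's exact formulation):

```python
import math

def token_confidences(logits):
    """Softmax probability of the chosen (argmax) token at each decoding
    step -- a stand-in for STAR's step-wise quality indicator."""
    confs = []
    for step in logits:
        m = max(step)
        exps = [math.exp(x - m) for x in step]
        total = sum(exps)
        confs.append(max(exps) / total)
    return confs

def weighted_pseudo_label_loss(nll_per_token, confs):
    """Down-weight unreliable pseudo-label tokens when updating the model,
    so low-confidence tokens contribute less to adaptation."""
    return sum(w * l for w, l in zip(confs, nll_per_token)) / sum(confs)
```

A peaked logit distribution yields a confidence near 1 and full weight; a flat one yields a low weight, so its (likely wrong) pseudo label barely moves the model.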


Online Dynamic Goal Recognition in Gym Environments

Shamir, Matan, Elhadad, Osher, Nageris, Ben, Mirsky, Reuth

arXiv.org Artificial Intelligence

Goal Recognition (GR) is the task of inferring an agent's intended goal from partial observations of its behavior, typically in an online and one-shot setting. Despite recent advances in model-free GR, particularly in applications such as human-robot interaction, surveillance, and assistive systems, the field remains fragmented due to inconsistencies in benchmarks, domains, and evaluation protocols. To address this, we introduce gr-libs (https://github.com/MatanShamir1/gr_libs) and gr-envs (https://github.com/MatanShamir1/gr_envs), two complementary open-source frameworks that support the development, evaluation, and comparison of GR algorithms in Gym-compatible environments. gr-libs includes modular implementations of MDP-based GR baselines, diagnostic tools, and evaluation utilities. gr-envs provides a curated suite of environments adapted for dynamic and goal-directed behavior, along with wrappers that ensure compatibility with standard reinforcement learning toolkits. Together, these libraries offer a standardized, extensible, and reproducible platform for advancing GR research. Both packages are open-source and available on GitHub and PyPI.
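A classic MDP-based GR baseline of the kind such libraries implement can be seen as Bayesian filtering over candidate goals. The sketch below is a generic illustration of that inference, not the actual gr-libs API (all names are hypothetical):

```python
def goal_posterior(observations, goal_likelihoods, prior=None):
    """Online goal recognition as Bayesian filtering: after each observed
    action, multiply in how likely each candidate goal's policy was to
    produce it, then renormalize.
    `goal_likelihoods[g]` maps an observation to P(obs | goal g)."""
    goals = list(goal_likelihoods)
    belief = {g: (prior or {}).get(g, 1.0 / len(goals)) for g in goals}
    for obs in observations:
        for g in goals:
            belief[g] *= goal_likelihoods[g](obs)
        total = sum(belief.values()) or 1.0
        belief = {g: b / total for g, b in belief.items()}
    return belief
```

Because the belief is renormalized after every observation, the recognizer can be queried online at any prefix of the trajectory, matching the one-shot, partial-observation setting described above.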


Building Tailored Speech Recognizers for Japanese Speaking Assessment

Kubo, Yotaro, Sproat, Richard, Taguchi, Chihiro, Jones, Llion

arXiv.org Artificial Intelligence

This paper presents methods for building speech recognizers tailored for Japanese speaking assessment tasks. Specifically, we build a speech recognizer that outputs phonemic labels with accent markers. Although Japanese is resource-rich, there is only a small amount of data for training models to produce accurate phonemic transcriptions that include accent marks. We propose two methods to mitigate data sparsity. First, a multitask training scheme introduces auxiliary loss functions to estimate orthographic text labels and pitch patterns of the input signal, so that utterances with only orthographic annotations can be leveraged in training. Second, we fuse two estimators, one over phonetic alphabet strings and the other over text token sequences; to combine their estimates we develop an algorithm based on the finite-state transducer framework. Our results indicate that multitask learning and fusion are effective for building an accurate phonemic recognizer, and that this approach is advantageous compared to generic multilingual recognizers. We also compare the relative advantages of the proposed methods. Our proposed methods reduce the average mora-label error rate from 12.3% to 7.1% over the CSJ core evaluation sets.
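The multitask scheme above lets utterances without phonemic annotation still contribute through the auxiliary objectives. A minimal sketch of that loss combination follows (the weights and function names are illustrative assumptions, not values from the paper):

```python
def multitask_loss(phoneme_loss, ortho_loss, pitch_loss,
                   has_phoneme_labels, w_ortho=0.3, w_pitch=0.3):
    """Combine the primary phonemic loss with auxiliary orthographic and
    pitch-pattern losses. Utterances lacking phonemic annotation skip the
    primary term but still provide a training signal via the auxiliaries."""
    loss = w_ortho * ortho_loss + w_pitch * pitch_loss
    if has_phoneme_labels:
        loss += phoneme_loss
    return loss
```

This is the standard way such auxiliary losses expand the usable training pool: the orthographic-only portion of the corpus trains the shared encoder even when the phonemic head receives no gradient.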


Robot Tactile Gesture Recognition Based on Full-body Modular E-skin

Jiang, Shuo, Hu, Boce, Zhao, Linfeng, Wong, Lawson L. S.

arXiv.org Artificial Intelligence

With the development of robot electronic skin technology, various tactile sensors, enhanced by AI, are unlocking a new dimension of perception for robots. In this work, we explore how robots equipped with electronic skin can recognize tactile gestures and interpret them as human commands. We developed a modular robot E-skin, composed of multiple irregularly shaped skin patches, which can be assembled to cover the robot's body while capturing real-time pressure and pose data from thousands of sensing points. To process this information, we propose an equivariant graph neural network-based recognizer that efficiently and accurately classifies diverse tactile gestures, including poke, grab, stroke, and double-pat. By mapping the recognized gestures to predefined robot actions, we enable intuitive human-robot interaction purely through tactile input.


A Context-Driven Training-Free Network for Lightweight Scene Text Segmentation and Recognition

Chakraborty, Ritabrata, Palaiahnakote, Shivakumara, Pal, Umapada, Liu, Cheng-Lin

arXiv.org Artificial Intelligence

Modern scene text recognition systems often depend on large end-to-end architectures that require extensive training and are prohibitively expensive for real-time scenarios. In such cases, the deployment of heavy models becomes impractical due to constraints on memory, computational resources, and latency. To address these challenges, we propose a novel, training-free plug-and-play framework that leverages the strengths of pre-trained text recognizers while minimizing redundant computation. Our approach uses context-based understanding and introduces an attention-based segmentation stage that refines candidate text regions at the pixel level, improving downstream recognition. Instead of performing traditional text detection, the framework follows a block-level comparison between the feature map and the source image and harnesses contextual information using pretrained captioners, allowing it to generate word predictions directly from scene context. Candidate texts are then evaluated semantically and lexically to produce a final score. Predictions that meet or exceed a predefined confidence threshold bypass the heavier end-to-end STR pipeline, ensuring faster inference and cutting down on unnecessary computation. Experiments on public benchmarks demonstrate that our paradigm achieves performance on par with state-of-the-art systems while requiring substantially fewer resources.
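The speedup comes from the confidence-gated bypass: cheap context-driven predictions are accepted directly, and only uncertain regions reach the heavy recognizer. A minimal sketch of that gating logic (all names and the threshold value are hypothetical):

```python
def recognize(region, context_predict, heavy_str, threshold=0.8):
    """Confidence-gated recognition: try the cheap context-driven
    prediction first; fall back to the heavy end-to-end STR model only
    when the combined semantic/lexical score is too low."""
    word, score = context_predict(region)
    if score >= threshold:
        return word, "context"       # fast path, heavy model skipped
    return heavy_str(region), "str"  # fallback to full STR
```

The threshold trades accuracy for latency: raising it routes more regions to the heavy model, lowering it accepts more context-only predictions.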


Reviews: Unsupervised Learning of Spoken Language with Visual Context

Neural Information Processing Systems

This is interesting work that points in the right direction, but a few aspects of this paper are a bit problematic: 1) It would have been useful (or interesting) to use a corpus that has existing text captions, and either have users re-speak those captions or collect additional ones. The data collection seems generally well thought out, but why was the Places205 data set used? Prompted speech (such as that collected here) is not "spontaneous"; otherwise the WSJ recognizer would not have achieved 20% WER (this aspect is irrelevant for the purpose of this paper, though, I think). Typically, multiple captions are generated for a single image. Has this been done here as well, or is there only a single caption for each image?


Reviews: Learnable Visual Markers

Neural Information Processing Systems

Briefly, the overall idea of serializing the whole encoding, environment simulation, and recognition pipeline into an end-to-end network is interesting and worth exploring. The presentation is clear and comprehensive. To me, however, there is still considerable room to develop the idea and its potential applications. Specifically, more experiments should have been done to support and demonstrate it. The end-to-end learning is intuitive and shown to be effective, at least qualitatively, but the model structure binds the synthesizer and the recognizer together, assuming that we already know the recognizer in the first place. In real situations, the two parts are usually fully decoupled, and this structure limits the method's applications.


CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers?

Mallis, Dimitrios, Karadeniz, Ahmet Serdar, Cavada, Sebastian, Rukhovich, Danila, Foteinopoulou, Niki, Cherenkova, Kseniya, Kacem, Anis, Aouada, Djamila

arXiv.org Artificial Intelligence

We propose CAD-Assistant, a general-purpose CAD agent for AI-assisted design. Our approach is based on a powerful Vision and Large Language Model (VLLM) as a planner and a tool-augmentation paradigm using CAD-specific modules. CAD-Assistant addresses multimodal user queries by generating actions that are iteratively executed on a Python interpreter equipped with the FreeCAD software, accessed via its Python API. Our framework is able to assess the impact of generated CAD commands on geometry and adapts subsequent actions based on the evolving state of the CAD design. We consider a wide range of CAD-specific tools including Python libraries, modules of the FreeCAD Python API, helpful routines, rendering functions and other specialized modules. We evaluate our method on multiple CAD benchmarks and qualitatively demonstrate the potential of tool-augmented VLLMs as generic CAD task solvers across diverse CAD workflows.
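The iterative plan-execute loop described above can be sketched generically. This is not the actual CAD-Assistant implementation or the FreeCAD API, just a minimal illustration of a tool-augmented agent loop in which each executed action updates the state the planner sees next (all names are hypothetical):

```python
def agent_loop(plan_next_action, execute, is_done, max_steps=10):
    """Generic tool-augmented agent loop: the planner proposes an action
    from the current state, an interpreter executes it, and the updated
    state conditions the next planning step."""
    state, trace = {}, []
    for _ in range(max_steps):
        action = plan_next_action(state)
        state = execute(action, state)
        trace.append(action)
        if is_done(state):
            break
    return state, trace
```

The key property, mirroring the abstract, is that planning is conditioned on the evolving state rather than on a fixed plan produced up front.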


Improving Speech Recognition Error Prediction for Modern and Off-the-shelf Speech Recognizers

Serai, Prashant, Wang, Peidong, Fosler-Lussier, Eric

arXiv.org Artificial Intelligence

Modeling the errors of a speech recognizer can help simulate errorful recognized speech data from plain text, which has proven useful for tasks like discriminative language modeling and improving the robustness of NLP systems when limited or even no audio data is available at training time. Previous work typically considered replicating the behavior of GMM-HMM based systems, but the behavior of more modern posterior-based neural network acoustic models is not the same and requires adjustments to the error prediction model. In this work, we extend a prior phonetic-confusion-based model for predicting speech recognition errors in two ways: first, we introduce a sampling-based paradigm that better simulates the behavior of a posterior-based acoustic model. Second, we investigate replacing the confusion matrix with a sequence-to-sequence model in order to introduce context dependency into the prediction. We evaluate the error predictors in two ways: first by predicting the errors made by a Switchboard ASR system on unseen data (Fisher), and then by using the same predictor to estimate the behavior of an unrelated cloud-based ASR system on a novel task. Sampling greatly improves predictive accuracy within a 100-guess paradigm, while the sequence model performs similarly to the confusion matrix.
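The sampling-based paradigm can be illustrated with a toy confusion model: rather than always emitting the single most likely confusion for each phone, an output phone is drawn from the phone's confusion distribution, so repeated simulations produce the varied error patterns a posterior-based system exhibits. A minimal sketch (data structures are assumptions, not the paper's):

```python
import random

def sample_errors(phones, confusion, rng=None):
    """Sampling-based error simulation: for each reference phone, draw an
    output phone from its confusion distribution instead of taking the
    argmax. `confusion[p]` maps output phone -> probability."""
    rng = rng or random.Random(0)
    out = []
    for p in phones:
        dist = confusion.get(p, {p: 1.0})
        r, acc = rng.random(), 0.0
        for q, prob in dist.items():
            acc += prob
            if r <= acc:
                out.append(q)
                break
        else:
            out.append(p)  # guard against floating-point underrun
    return out
```

Running this many times over the same text yields a distribution of hypothesized transcripts, which is what a 100-guess evaluation paradigm consumes.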


Towards Semantic Markup of Mathematical Documents via User Interaction

Vrečar, Luka, Wells, Joe, Kamareddine, Fairouz

arXiv.org Artificial Intelligence

Mathematical documents written in LaTeX often contain ambiguities. We can resolve some of them via semantic markup using, e.g., sTeX, which also has other potential benefits, such as interoperability with computer algebra systems, proof systems, and increased accessibility. However, semantic markup is more involved than "regular" typesetting and presents a challenge for authors of mathematical documents. We aim to smooth out the transition from plain LaTeX to semantic markup by developing semi-automatic tools for authors. In this paper we present an approach to semantic markup of formulas by (semi-)automatically generating grammars from existing sTeX macro definitions and parsing mathematical formulas with them. We also present a GUI-based tool for the disambiguation of parse results and showcase its functionality and potential using a grammar for parsing untyped $\lambda$-terms.