Aragon-Camarasa, Gerardo
Zero-Shot Interactive Text-to-Image Retrieval via Diffusion-Augmented Representations
Long, Zijun, Liang, Kangheng, Aragon-Camarasa, Gerardo, Mccreadie, Richard, Henderson, Paul
Interactive Text-to-Image Retrieval (I-TIR) has emerged as a transformative user-interactive tool for applications in domains such as e-commerce and education. Yet, current methodologies predominantly depend on finetuned Multimodal Large Language Models (MLLMs), which face two critical limitations: (1) finetuning imposes prohibitive computational overhead and long-term maintenance costs, and (2) finetuning narrows the pretrained knowledge distribution of MLLMs, reducing their adaptability to novel scenarios. These issues are exacerbated by the inherently dynamic nature of real-world I-TIR systems, where queries and image databases evolve in complexity and diversity, often deviating from static training distributions. To overcome these constraints, we propose Diffusion Augmented Retrieval (DAR), a framework that bypasses MLLM finetuning entirely. DAR combines Large Language Model (LLM)-guided query refinement with Diffusion Model (DM)-based visual synthesis to create contextually enriched intermediate representations. This dual-modality approach captures nuanced user intent more holistically, enabling precise alignment between textual queries and visually relevant images. Rigorous evaluations across four benchmarks reveal DAR's dual strengths: (1) Zero-shot parity: DAR matches state-of-the-art finetuned I-TIR models on straightforward queries without any task-specific training. (2) Scalable generalization: DAR surpasses finetuned baselines by 7.61% in Hits@10 (top-10 accuracy) under multi-turn conversational complexity, demonstrating robustness to intricate, distributionally shifted interactions. By eliminating finetuning dependencies and leveraging generative-augmented representations, DAR establishes a new trajectory for efficient, adaptive, and scalable cross-modal retrieval systems.
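The retrieval step the abstract describes can be pictured as a score fusion between the refined text query and a diffusion-generated intermediate image. The sketch below is a hypothetical illustration only: the function names, the fusion weight `alpha`, and the toy 3-D vectors standing in for real multimodal embeddings are all assumptions, not the paper's implementation.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dar_scores(text_emb, gen_img_emb, gallery_embs, alpha=0.5):
    """Fuse text-to-image and image-to-image similarities.

    text_emb:     embedding of the LLM-refined textual query
    gen_img_emb:  embedding of the diffusion-generated intermediate image
    gallery_embs: candidate image embeddings from the database
    alpha:        hypothetical weight balancing the two modalities
    """
    return [alpha * cosine(text_emb, g) + (1 - alpha) * cosine(gen_img_emb, g)
            for g in gallery_embs]

# Toy 3-D embeddings standing in for real CLIP-style features.
text_q = np.array([1.0, 0.0, 0.0])
gen_img = np.array([0.8, 0.6, 0.0])
gallery = [np.array([1.0, 0.1, 0.0]),   # close to the query intent
           np.array([0.0, 0.0, 1.0])]   # unrelated image
scores = dar_scores(text_q, gen_img, gallery)
ranking = sorted(range(len(gallery)), key=lambda i: -scores[i])
```

Under this toy setup the gallery image aligned with both the text query and the generated image ranks first, which is the intuition behind using the synthesized image as an enriched intermediate representation.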
Breaking Down the Barriers: Investigating Non-Expert User Experiences in Robotic Teleoperation in UK and Japan
Audonnet, Florent P, Hamilton, Andrew, Domae, Yakiyasu, Ramirez-Alpizar, Ixchel G, Aragon-Camarasa, Gerardo
Robots are being created each year with the goal of integrating them into our daily lives. As such, there is growing research interest in evaluating human trust toward robots. In addition, teleoperating robotic arms can be challenging for non-experts. To reduce the strain put on the user, we created TELESIM, a modular and plug-and-play framework that enables direct teleoperation of any robotic arm using a digital twin as the interface between the user and the robotic system. We evaluated our framework in a user survey with three combinations of robots and control methods, recording the user's workload and performance on a tower-stacking task. However, an analysis of the strain on the user and their ability to trust robots was omitted. This paper addresses these omissions by presenting additional results from our user survey of 37 participants carried out in the United Kingdom. We also present the results of a further user survey, performed under similar conditions in Japan, which addresses the limitations of our previous approach by interfacing a VR controller with a UR5e. Our experimental results show that the most towers were built with the UR5e. The UR5e also induced the least cognitive stress, while the combination of Senseglove and UR3 placed the highest physical strain on users and caused the most frustration. Finally, the Japanese participants appear to be more trusting of robots than the British participants.
Flat'n'Fold: A Diverse Multi-Modal Dataset for Garment Perception and Manipulation
Zhuang, Lipeng, Fan, Shiyu, Ru, Yingdong, Audonnet, Florent, Henderson, Paul, Aragon-Camarasa, Gerardo
We present Flat'n'Fold, a novel large-scale dataset for garment manipulation that addresses critical gaps in existing datasets. We quantify the dataset's diversity and complexity against existing benchmarks and show that it features natural and diverse real-world demonstrations, from both humans and robots, in terms of visual and action information. The robot data consists of human-controlled demonstrations, in which an expert operator teleoperates a robot to execute the same garment manipulation tasks, aiming to replicate natural, human-like approaches within the robot's operational limitations. To showcase Flat'n'Fold's utility, we establish new benchmarks, including grasping point prediction. This underscores Flat'n'Fold's potential to drive advances in robotic perception and manipulation of deformable objects. Manipulating garments remains a significant challenge in robotics: tasks such as flattening and folding require understanding the vast space of configurations that garments can adopt [1], [2], and planning complex sequences of actions.
CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion
Long, Zijun, Killick, George, Zhuang, Lipeng, Aragon-Camarasa, Gerardo, Meng, Zaiqiao, Mccreadie, Richard
State-of-the-art pre-trained image models predominantly adopt a two-stage approach: initial unsupervised pre-training on large-scale datasets followed by task-specific fine-tuning using Cross-Entropy loss~(CE). However, it has been demonstrated that CE can compromise model generalization and stability. While recent works employing contrastive learning address some of these limitations by enhancing the quality of embeddings and producing better decision boundaries, they often overlook the importance of hard negative mining and rely on resource-intensive, slow training with large sample batches. To counter these issues, we introduce a novel approach named CLCE, which integrates Label-Aware Contrastive Learning with CE. Our approach not only maintains the strengths of both loss functions but also leverages hard negative mining in a synergistic way to enhance performance. Experimental results demonstrate that CLCE significantly outperforms CE in Top-1 accuracy across twelve benchmarks, achieving gains of up to 3.52% in few-shot learning scenarios and 3.41% in transfer learning settings with the BEiT-3 model. Importantly, our proposed CLCE approach effectively mitigates the dependency of contrastive learning on large batch sizes, such as 4096 samples per batch, a limitation that has previously constrained the application of contrastive learning in budget-limited hardware environments.
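A minimal numpy sketch of the kind of loss combination the abstract describes: a label-aware (supervised) contrastive term blended with cross-entropy. The weighting `lam`, the temperature `tau`, and all function names here are assumptions for illustration; the actual CLCE formulation, including its hard negative mining, is defined in the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_entropy(logits, label):
    """Standard CE for a single example."""
    return -np.log(softmax(logits)[label])

def supcon_loss(z, labels, tau=0.1):
    """Label-aware (supervised) contrastive loss over a batch of embeddings:
    same-label pairs are pulled together, all others pushed apart."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalise
    n = len(z)
    sim = z @ z.T / tau
    total = 0.0
    for i in range(n):
        pos = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not pos:
            continue
        denom = sum(np.exp(sim[i, a]) for a in range(n) if a != i)
        total += -sum(np.log(np.exp(sim[i, p]) / denom) for p in pos) / len(pos)
    return total / n

def clce_loss(logits, z, labels, lam=0.5):
    """Hypothetical CLCE-style fusion: weighted sum of CE and the contrastive term."""
    ce = np.mean([cross_entropy(l, y) for l, y in zip(logits, labels)])
    return lam * ce + (1 - lam) * supcon_loss(z, labels)

# Toy batch of 4: two classes, embeddings roughly clustered by label.
labels = [0, 0, 1, 1]
z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
logits = np.array([[2.0, 0.1], [1.5, 0.3], [0.2, 1.8], [0.1, 1.4]])
loss = clce_loss(logits, z, labels)
```

The contrastive term rewards embeddings that cluster by label: collapsing different-class embeddings onto each other drives it up, which is the geometric property the abstract credits for better decision boundaries.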
Foveation in the Era of Deep Learning
Killick, George, Henderson, Paul, Siebert, Paul, Aragon-Camarasa, Gerardo
Many biological vision systems sense the world with a foveated sensor, where the highest resolution processing is limited to only a small central portion of the visual field (the fovea). Computer vision systems have taken inspiration from this aspect of biological vision and incorporated it into visual attention models that learn to sample and process visual scenes actively [1, 2, 3]. The promise of foveated vision is the ability to resolve and process fine details while simultaneously maintaining a wide field of view, which has applications to problems where semantic information can exist over a high dynamic range of scales. More generally, it is well known that scaling the resolution of inputs to CNNs can reliably improve accuracy in object recognition problems [4]. Through sparse sampling in the periphery of the field of view, foveated sensors can achieve this with significantly fewer pixels than a uniform sensor, making it an appealing approach to building parsimonious vision systems.
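One common way to realise such a sensor, sketched here under the assumption of a simple log-polar layout (a generic construction, not any specific model from the paper): ring radii grow geometrically with eccentricity, so samples are dense at the fovea and sparse in the periphery, covering a wide field of view with comparatively few pixels.

```python
import numpy as np

def foveated_grid(n_rings=8, n_wedges=16, r_min=0.05, r_max=1.0):
    """Sample points on a log-polar grid. Because ring radii grow
    geometrically, sample density is highest near the centre (fovea)
    and falls off toward the periphery."""
    radii = np.geomspace(r_min, r_max, n_rings)
    angles = np.linspace(0.0, 2 * np.pi, n_wedges, endpoint=False)
    xs = np.outer(radii, np.cos(angles)).ravel()
    ys = np.outer(radii, np.sin(angles)).ravel()
    return np.stack([xs, ys], axis=1)  # (n_rings * n_wedges, 2), coords in [-1, 1]

pts = foveated_grid()
```

With the defaults above, 128 samples span the full field of view, yet well over half of them fall within the central half of the sensor's radius, illustrating the fovea/periphery trade-off the abstract describes.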
TELESIM: A Modular and Plug-and-Play Framework for Robotic Arm Teleoperation using a Digital Twin
Audonnet, Florent P, Grizou, Jonathan, Hamilton, Andrew, Aragon-Camarasa, Gerardo
We present TELESIM, a modular and plug-and-play framework for direct teleoperation of a robotic arm using a digital twin as the interface between the user and the robotic system. We tested TELESIM by performing a user survey with 37 participants on two different robots using two different control modalities: a virtual reality controller and a finger-mapping hardware controller with different grasping systems. Users were asked to teleoperate the robot to pick and place 3 cubes in a tower and to repeat this task as many times as possible in 10 minutes, with only 5 minutes of training beforehand. Our experimental results show that most users succeeded in building at least one tower of 3 cubes regardless of the control modality or robot used, demonstrating the user-friendliness of TELESIM.
Enabling the Sense of Self in a Dual-Arm Robot
AlQallaf, Ali, Aragon-Camarasa, Gerardo
While humans are aware of their body and capabilities, robots are not. To address this, we present in this paper a neural network architecture that enables a dual-arm robot to get a sense of itself in an environment. Our approach is inspired by human self-awareness developmental levels and serves as the underlying building block for a robot to achieve awareness of itself while carrying out tasks in an environment. We assume that a robot has to know itself before interacting with the environment in order to be able to support different robotic tasks. Hence, we implemented a neural network architecture to enable a robot to differentiate its limbs from the environment using visual and proprioception sensory inputs. We demonstrate experimentally that a robot can distinguish itself with an accuracy of 88.7% on average in cluttered environmental settings and under confounding input signals.
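A schematic of the kind of sensory fusion the abstract describes: visual and proprioceptive features concatenated and fed to a binary self/other decision. The feature dimensions and the linear classifier stub below are hypothetical placeholders standing in for the paper's trained neural network architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: 32-D visual features, 7-D joint (proprioception) state.
W = rng.normal(size=(32 + 7,)) * 0.1  # stands in for trained classifier weights
b = 0.0

def is_self(visual_feat, joint_state):
    """Binary self/other decision from fused visual + proprioceptive input."""
    x = np.concatenate([visual_feat, joint_state])  # early fusion by concatenation
    p = 1.0 / (1.0 + np.exp(-(W @ x + b)))          # sigmoid probability
    return p > 0.5, p

decision, prob = is_self(rng.normal(size=32), rng.normal(size=7))
```

The key design point is that the decision conditions on both modalities at once, which is what lets such a model reject confounding visual input that does not match the robot's own proprioceptive signal.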
Intrinsic Robotic Introspection: Learning Internal States From Neuron Activations
Pitsillos, Nikos, Pore, Ameya, Jensen, Bjorn Sand, Aragon-Camarasa, Gerardo
We present an introspective framework inspired by the process of how humans perform introspection. Our working assumption is that neural network activations encode information, and that building internal states from these activations can improve the performance of an actor-critic model. We perform experiments where we first train a Variational Autoencoder model to reconstruct the activations of a feature extraction network and then use its latent space to improve the performance of an actor-critic when deciding which low-level robotic behaviour to execute. We show that internal states reduce the number of training episodes needed by about 1300, indicating faster convergence of the actor-critic to a high success rate on a robotic task.
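The idea of building internal states from activations can be sketched as follows. This is a toy illustration under stated assumptions: the linear encoder stub stands in for the trained VAE's encoder, and all dimensions (256-D activations, 8-D latent, 16-D observation) are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 256-D feature-extractor activations, 8-D latent state.
W_enc = rng.normal(size=(8, 256)) * 0.05  # stands in for a trained VAE encoder

def internal_state(activations):
    """Compress raw network activations into a compact internal state
    (the VAE's latent code), mirroring the introspection idea."""
    return W_enc @ activations

def actor_input(observation, activations):
    """Actor-critic input = task observation augmented with the internal state."""
    return np.concatenate([observation, internal_state(activations)])

obs = rng.normal(size=16)
acts = rng.normal(size=256)
x = actor_input(obs, acts)
```

The augmented input simply appends an 8-D summary of what the feature extractor is "doing" to the ordinary observation, giving the policy access to information about its own processing.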