AITopics | El-Refai, Karim

Collaborating Authors

El-Refai, Karim

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation of Irregularly Shaped Objects

Yu, Justin, Hari, Kush, El-Refai, Karim, Dalal, Arnav, Kerr, Justin, Kim, Chung Min, Cheng, Richard, Irshad, Muhammad Zubair, Goldberg, Ken

arXiv.org Artificial IntelligenceMar-7-2025

Tracking and manipulating irregularly-shaped, previously unseen objects in dynamic environments is important for robotic applications in manufacturing, assembly, and logistics. Recently introduced Gaussian Splats efficiently model object geometry, but lack persistent state estimation for task-oriented manipulation. We present Persistent Object Gaussian Splat (POGS), a system that embeds semantics, self-supervised visual features, and object grouping features into a compact representation that can be continuously updated to estimate the pose of scanned objects. POGS updates object states without requiring expensive rescanning or prior CAD models of objects. After an initial multi-view scene capture and training phase, POGS uses a single stereo camera to integrate depth estimates along with self-supervised vision encoder features for object pose estimation. POGS supports grasping, reorientation, and natural language-driven manipulation by refining object pose estimates, facilitating sequential object reset operations with human-induced object perturbations and tool servoing, where robots recover tool pose despite tool perturbations of up to 30{\deg}. POGS achieves up to 12 consecutive successful object resets and recovers from 80% of in-grasp tool perturbations.

artificial intelligence, conference, image understanding, (15 more...)

arXiv.org Artificial Intelligence

2503.05189

Country: Asia (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (0.51)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.49)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.40)

Add feedback

Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot

Yu, Justin, Hari, Kush, Srinivas, Kishore, El-Refai, Karim, Rashid, Adam, Kim, Chung Min, Kerr, Justin, Cheng, Richard, Irshad, Muhammad Zubair, Balakrishna, Ashwin, Kollar, Thomas, Goldberg, Ken

arXiv.org Artificial IntelligenceSep-26-2024

Building semantic 3D maps is valuable for searching for objects of interest in offices, warehouses, stores, and homes. We present a mapping system that incrementally builds a Language-Embedded Gaussian Splat (LEGS): a detailed 3D scene representation that encodes both appearance and semantics in a unified representation. LEGS is trained online as a robot traverses its environment to enable localization of open-vocabulary object queries. We evaluate LEGS on 4 room-scale scenes where we query for objects in the scene to assess how LEGS can capture semantic meaning. We compare LEGS to LERF and find that while both systems have comparable object query success rates, LEGS trains over 3.5x faster than LERF. Results suggest that a multi-camera setup and incremental bundle adjustment can boost visual reconstruction quality in constrained robot trajectories, and suggest LEGS can localize open-vocabulary and long-tail object queries with up to 66% accuracy.

artificial intelligence, gaussian, representation, (10 more...)

arXiv.org Artificial Intelligence

2409.18108

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Robots > Locomotion (0.42)

Add feedback

SWAG: Storytelling With Action Guidance

Patel, Zeeshan, El-Refai, Karim, Pei, Jonathan, Li, Tianle

arXiv.org Artificial IntelligenceFeb-5-2024

Automated long-form story generation typically employs long-context large language models (LLMs) for one-shot creation, which can produce cohesive but not necessarily engaging content. We introduce Storytelling With Action Guidance (SWAG), a novel approach to storytelling with LLMs. Our approach reduces story writing to a search problem through a two-model feedback loop: one LLM generates story content, and another auxiliary LLM is used to choose the next best "action" to steer the story's future direction. Our results show that SWAG can substantially outperform previous end-to-end story generation techniques when evaluated by GPT-4 and through human evaluation, and our SWAG pipeline using only open-source models surpasses GPT-3.5-Turbo.

large language model, machine learning, swag, (21 more...)

arXiv.org Artificial Intelligence

2402.03483

Country: North America > United States > Minnesota (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback