AITopics | Singh, Harsh

Collaborating Authors

Singh, Harsh

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation

Singh, Harsh, Das, Rocktim Jyoti, Han, Mingfei, Nakov, Preslav, Laptev, Ivan

arXiv.org Artificial IntelligenceNov-26-2024

Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation. While recent efforts in robotics have leveraged LLMs both for high-level and low-level planning, these approaches often face significant challenges, such as hallucinations in long-horizon tasks and limited adaptability due to the generation of plans in a single pass without real-time feedback. To address these limitations, we propose a novel multi-agent LLM framework, Multi-Agent Large Language Model for Manipulation (MALMM) that distributes high-level planning and low-level control code generation across specialized LLM agents, supervised by an additional agent that dynamically manages transitions. By incorporating observations from the environment after each step, our framework effectively handles intermediate failures and enables adaptive re-planning. Unlike existing methods, our approach does not rely on pre-trained skill policies or in-context learning examples and generalizes to a variety of new tasks. We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotics manipulation in a zero-shot setting, thereby overcoming key limitations of existing LLM-based manipulation methods.

container, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.17636

Genre: Workflow (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

Vayani, Ashmal, Dissanayake, Dinura, Watawana, Hasindri, Ahsan, Noor, Sasikumar, Nevasini, Thawakar, Omkar, Ademtew, Henok Biadglign, Hmaiti, Yahya, Kumar, Amandeep, Kuckreja, Kartik, Maslych, Mykola, Ghallabi, Wafa Al, Mihaylov, Mihail, Qin, Chao, Shaker, Abdelrahman M, Zhang, Mike, Ihsani, Mahardika Krisna, Esplana, Amiel, Gokani, Monil, Mirkin, Shachar, Singh, Harsh, Srivastava, Ashay, Hamerlik, Endre, Izzati, Fathinah Asma, Maani, Fadillah Adamsyah, Cavada, Sebastian, Chim, Jenny, Gupta, Rohit, Manjunath, Sanjay, Zhumakhanova, Kamila, Rabevohitra, Feno Heriniaina, Amirudin, Azril, Ridzuan, Muhammad, Kareem, Daniya, More, Ketan, Li, Kunyang, Shakya, Pramesh, Saad, Muhammad, Ghasemaghaei, Amirpouya, Djanibekov, Amirbek, Azizov, Dilshod, Jankovic, Branislava, Bhatia, Naman, Cabrera, Alvaro, Obando-Ceron, Johan, Otieno, Olympiah, Farestam, Fabian, Rabbani, Muztoba, Baliah, Sanoojan, Sanjeev, Santosh, Shtanchaev, Abduragim, Fatima, Maheen, Nguyen, Thao, Kareem, Amrin, Aremu, Toluwani, Xavier, Nathan, Bhatkal, Amit, Toyin, Hawau, Chadha, Aman, Cholakkal, Hisham, Anwer, Rao Muhammad, Felsberg, Michael, Laaksonen, Jorma, Solorio, Thamar, Choudhury, Monojit, Laptev, Ivan, Shah, Mubarak, Khan, Salman, Khan, Fahad

arXiv.org Artificial IntelligenceNov-26-2024

Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All Languages Matter Benchmark (ALM-bench) represents the largest and most comprehensive effort to date for evaluating LMMs across 100 languages. ALM-bench challenges existing models by testing their ability to understand and reason about culturally diverse images paired with text in various languages, including many low-resource languages traditionally underrepresented in LMM research. The benchmark offers a robust and nuanced evaluation framework featuring various question formats, including true/false, multiple choice, and open-ended questions, which are further divided into short and long-answer categories. ALM-bench design ensures a comprehensive assessment of a model's ability to handle varied levels of difficulty in visual and linguistic reasoning. To capture the rich tapestry of global cultures, ALM-bench carefully curates content from 13 distinct cultural aspects, ranging from traditions and rituals to famous personalities and celebrations. Through this, ALM-bench not only provides a rigorous testing ground for state-of-the-art open and closed-source LMMs but also highlights the importance of cultural and linguistic inclusivity, encouraging the development of models that can serve diverse global populations effectively. Our benchmark is publicly available.

category, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2411.16508

Country:

Europe > Spain (1.00)
Asia > India (0.93)
Asia > Middle East (0.67)
Africa > South Africa > Gauteng (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Television (0.46)
Leisure & Entertainment > Sports > Martial Arts (0.46)
Government > Military (0.46)
Media > Music (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Information Management (0.92)
(2 more...)

Add feedback

Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition

Deng, Shijian, Kosloski, Erin E., Patel, Siddhi, Barnett, Zeke A., Nan, Yiyang, Kaplan, Alexander, Aarukapalli, Sisira, Doan, William T., Wang, Matthew, Singh, Harsh, Rollins, Pamela R., Tian, Yapeng

arXiv.org Artificial IntelligenceMar-22-2024

In this article, we introduce a novel problem of audio-visual autism behavior recognition, which includes social behavior recognition, an essential aspect previously omitted in AI-assisted autism screening research. We define the task at hand as one that is audio-visual autism behavior recognition, which uses audio and visual cues, including any speech present in the audio, to recognize autism-related behaviors. To facilitate this new research direction, we collected an audio-visual autism spectrum dataset (AV-ASD), currently the largest video dataset for autism screening using a behavioral approach. It covers an extensive range of autism-associated behaviors, including those related to social communication and interaction. To pave the way for further research on this new problem, we intensively explored leveraging foundation models and multimodal large language models across different modalities. Our experiments on the AV-ASD dataset demonstrate that integrating audio, visual, and speech modalities significantly enhances the performance in autism behavior recognition. Additionally, we explored the use of a post-hoc to ad-hoc pipeline in a multimodal large language model to investigate its potential to augment the model's explanatory capability during autism behavior recognition. We will release our dataset, code, and pre-trained models.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2406.02554

Country:

North America > United States > Texas > Dallas County (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Autism (1.00)
Health & Medicine > Therapeutic Area > Genetic Disease (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback