Jain, Amit
Visual Language Models as Operator Agents in the Space Domain
Carrasco, Alejandro, Nedungadi, Marco, Zucchelli, Enrico M., Jain, Amit, Rodriguez-Fernandez, Victor, Linares, Richard
Since the emergence of the LLM trend, initiated by the first release of ChatGPT [1], these systems have undergone continuous development and evolved into multimodal architectures. Multimodal models, such as GPT-4o [2], LLaMA 3.2 [3], and Claude with its latest 3.5 Sonnet model [4], integrate language understanding with non-language capabilities, including vision and audio processing. This progression unlocks new opportunities for developing intelligent agents that recognize and interpret patterns not only at a semantic level but also through components that incorporate other types of unstructured data into prompts, significantly expanding their potential applications and impact. Extending these capabilities, Vision-Language Models (VLMs) build on multimodal principles by integrating visual reasoning into the LLM framework. By introducing new tokens into the prompt to represent image frames, VLMs enable simultaneous semantic and visual reasoning. This enhancement is particularly valuable in dynamic applications such as robotics, where the integration of vision and language reasoning enables systems to generate environment-responsive actions. Such actions, often described as descriptive policies, translate reasoning into meaningful, executable commands. Language models able to generate such commands are usually referred to as "agentic".
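To make the "agentic" pattern concrete, here is a minimal sketch of an operator-agent step: a camera frame and a goal are sent to a multimodal model, and its text reply is parsed into an executable command. The action schema, prompt, and model name are illustrative assumptions, not the paper's actual interface; the OpenAI Python SDK stands in for whichever VLM backend is used.

```python
# Sketch of one VLM operator-agent step: frame + goal in, one command out.
# Assumes the `openai` package (>= 1.0) and OPENAI_API_KEY in the environment.
import base64
import json
from openai import OpenAI

client = OpenAI()

def propose_action(frame_path: str, goal: str) -> dict:
    """Ask a VLM to map the current frame and goal to one discrete action."""
    with open(frame_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Goal: {goal}. Reply with JSON only: "
                         '{"action": <str>, "argument": <float>}'},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    # The model's text reply *is* the descriptive policy output; a real agent
    # would validate it against an allowed action set before executing.
    return json.loads(response.choices[0].message.content)

# Example: propose_action("frame_0042.png", "null the relative velocity")
```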
Data-Driven Shape Sensing in Continuum Manipulators via Sliding Resistive Flex Sensors
Zhang, Chenhan, Jiang, Shaopeng, Wang, Heyun, Liu, Joshua, Jain, Amit, Armand, Mehran
We introduce a novel shape-sensing method using Resistive Flex Sensors (RFS) embedded in cable-driven Continuum Dexterous Manipulators (CDMs). The RFS is predominantly sensitive to deformation rather than direct forces, making it a distinctive tool for shape sensing. The RFS unit we designed is a considerably less expensive yet robust alternative, offering accuracy and real-time performance comparable to existing shape-sensing methods used for CDMs proposed for minimally invasive surgery. Our design allows the RFS to move along and inside the CDM, conforming to its curvature, so resistance readings can be captured at various bending positions without the need for elaborate sensor setups. The RFS unit is calibrated using an overhead camera and a ResNet machine-learning framework. Experiments using a 3D-printed prototype of the CDM achieved an average shape-estimation error of 0.968 mm with a standard error of 0.275 mm. The response time of the model was approximately 1.16 ms, making real-time shape sensing feasible. While this preliminary study successfully showed the feasibility of our approach for C-shaped CDM deformations with non-constant curvature, we are currently extending the results to show feasibility for more complex CDM configurations, such as S-shapes created in obstructed environments or in the presence of external forces.
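The calibration step amounts to a learned regression from resistance readings to shape. Below is a minimal sketch of that idea in PyTorch: a small residual network (ResNet-style skip connections) maps a vector of RFS readings taken at several sliding positions to a set of 2D shape points, trained against camera-derived ground truth. The input/output dimensions, block depth, and width are assumptions for illustration; the paper's exact architecture and data pipeline differ.

```python
# Sketch: regress CDM shape points from RFS resistance readings.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return torch.relu(x + self.body(x))  # ResNet-style skip connection

class ShapeRegressor(nn.Module):
    """Map n_readings resistance values -> n_points (x, y) shape samples."""
    def __init__(self, n_readings: int = 8, n_points: int = 10, width: int = 64):
        super().__init__()
        self.n_points = n_points
        self.inp = nn.Linear(n_readings, width)
        self.blocks = nn.Sequential(*(ResidualBlock(width) for _ in range(3)))
        self.out = nn.Linear(width, 2 * n_points)

    def forward(self, resistances):
        h = torch.relu(self.inp(resistances))
        return self.out(self.blocks(h)).view(-1, self.n_points, 2)

# Training would minimize MSE against points extracted from the overhead
# camera: loss = nn.functional.mse_loss(model(resistances), camera_points)
```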
Data Filtering Networks
Fang, Alex, Jose, Albin Madappally, Jain, Amit, Schmidt, Ludwig, Toshev, Alexander, Shankar, Vaishaal
Large training sets have become a cornerstone of machine learning and are the foundation for recent advances in language modeling and multimodal learning. While data curation for pre-training is often still ad hoc, one common paradigm is to first collect a massive pool of data from the Web and then filter this candidate pool down to an actual training set via various heuristics. In this work, we study the problem of learning a data filtering network (DFN) for this second step of filtering a large uncurated dataset. Our key finding is that the quality of a network for filtering is distinct from its performance on downstream tasks: for instance, a model that performs well on ImageNet can yield worse training sets than a model with low ImageNet accuracy that is trained on a small amount of high-quality data. Based on our insights, we construct new data filtering networks that induce state-of-the-art image-text datasets. Specifically, our best-performing dataset DFN-5B enables us to train state-of-the-art CLIP models for their compute budgets: among other improvements on a variety of tasks, a ViT-H trained on our dataset achieves 84.4% zero-shot transfer accuracy on ImageNet, outperforming models trained on other datasets such as LAION-2B, DataComp-1B, or OpenAI's WIT. In order to facilitate further research in dataset design, we also release a new 2-billion-example dataset, DFN-2B, and show that high-performance data filtering networks can be trained from scratch using only publicly available data.
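The filtering step itself is simple to sketch: score every candidate image-text pair with the filtering network and keep only the highest-scoring fraction of the pool. In the sketch below, a stock OpenAI CLIP checkpoint from Hugging Face stands in for a trained DFN, and the keep fraction is an illustrative assumption rather than the paper's setting.

```python
# Sketch: rank image-text pairs by a CLIP-style similarity score and keep
# the top fraction of the pool. Requires `torch`, `transformers`, `Pillow`.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def dfn_score(image: Image.Image, caption: str) -> float:
    """Cosine similarity between image and caption embeddings."""
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True, truncation=True)
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())

def filter_pool(pairs, keep_fraction: float = 0.2):
    """Keep the highest-scoring fraction of (image, caption) pairs."""
    scored = sorted(pairs, key=lambda p: dfn_score(*p), reverse=True)
    return scored[: int(len(scored) * keep_fraction)]
```

The paper's central point survives the simplification: swapping a different scoring network into `dfn_score` changes the induced training set, and filtering quality is not predicted by the scorer's own downstream accuracy.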