AITopics

arXiv.org Artificial Intelligence, Oct-26-2024

GiVE: Guiding Visual Encoder to Perceive Overlooked Information

Li, Junjie, Ma, Jianghong, Zhang, Xiaofeng, Li, Yuhang, Shi, Jianyang

Multimodal Large Language Models have advanced AI in applications like text-to-video generation and visual question answering. These models rely on visual encoders to convert non-text data into vectors, but current encoders either lack semantic alignment or overlook non-salient objects. We propose the Guiding Visual Encoder to Perceive Overlooked Information (GiVE) approach. GiVE enhances visual representation with an Attention-Guided Adapter (AG-Adapter) module and an Object-focused Visual Semantic Learning module. These incorporate three novel loss terms: Object-focused Image-Text Contrast (OITC) loss, Object-focused Image-Image Contrast (OIIC) loss, and Object-focused Image Discrimination (OID) loss, improving object consideration, retrieval accuracy, and comprehensiveness. Our contributions include dynamic visual focus adjustment, novel loss functions to enhance object retrieval, and the Multi-Object Instruction (MOInst) dataset. Experiments show our approach achieves state-of-the-art performance.

large language model, machine learning, natural language, (20 more...)

2410.20109

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > Scotland > City of Glasgow > Glasgow (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.34)

Industry:

Transportation > Passenger (0.67)
Transportation > Ground > Road (0.67)
Automobiles & Trucks > Manufacturer (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

arXiv.org Artificial Intelligence, Oct-25-2024

A Tutorial on Teaching Data Analytics with Generative AI

Bray, Robert L.

This tutorial addresses the challenge of incorporating large language models (LLMs), such as ChatGPT, in a data analytics class. It details several new in-class and out-of-class teaching techniques enabled by AI. For example, instructors can parallelize instruction by having students interact with different custom-made GPTs to learn different parts of an analysis and then teach each other what they learned from their AIs. For another example, instructors can turn problem sets into AI tutoring sessions, whereby a custom-made GPT guides a student through the problems, and the student uploads the chatlog for their homework submission. For a third example, you can assign different labs to each section of your class and have each section create AI assistants to help the other sections work through their labs. This tutorial advocates the programming-in-English paradigm, in which students express the desired data transformations in prose and then use AI to generate the corresponding code. Students can wrangle data more effectively by programming in English than by manipulating it in Excel. However, some students will program in English better than others, so you will still derive a robust grade distribution (at least with current LLMs).
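To make the programming-in-English idea concrete, here is a small illustrative sketch. The prose request, the sample rows, and the function name `average_by_region` are all hypothetical and not taken from the tutorial; the code is simply the kind of output an AI assistant might plausibly return for such a request.

```python
from collections import defaultdict

# Hypothetical prose request a student might write:
#   "For each region, compute the average order value,
#    then list the regions from highest to lowest average."

# Hypothetical sample data (not from the tutorial).
orders = [
    {"region": "East", "value": 10.0},
    {"region": "East", "value": 30.0},
    {"region": "West", "value": 50.0},
    {"region": "North", "value": 5.0},
]


def average_by_region(rows):
    """Return (region, mean value) pairs sorted by mean, highest first."""
    totals = defaultdict(lambda: [0.0, 0])  # region -> [sum, count]
    for row in rows:
        entry = totals[row["region"]]
        entry[0] += row["value"]
        entry[1] += 1
    averages = {region: s / n for region, (s, n) in totals.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


for region, mean in average_by_region(orders):
    print(region, mean)
```

The student's task in this paradigm is to state the transformation precisely in prose and then check that the generated code does what was asked.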

large language model, machine learning, natural language, (19 more...)

2411.07244

Country: North America > United States > Illinois > Cook County > Evanston (0.04)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Leisure & Entertainment (1.00)
Education > Educational Setting (1.00)
Education > Curriculum (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.50)

AIHub, Oct-24-2024, 08:33:04 GMT

Interview with Pulkit Verma: Towards safe and reliable behavior of AI agents

In this interview series, we're meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. The Doctoral Consortium provides an opportunity for a group of PhD students to discuss and explore their research interests and career objectives in an interdisciplinary workshop together with a panel of established researchers. In this latest interview, we hear from Pulkit Verma, recent PhD graduate from Arizona State University. I recently completed my PhD in Computer Science from School of Computing and Augmented Intelligence, Arizona State University. My research focuses on safe and reliable behavior of AI agents.

ai system, assessment, black-box ai system, (13 more...)

Country:

North America > United States > Arizona (0.46)
Asia > India (0.05)

Genre:

Personal > Interview (0.36)
Instructional Material (0.36)

Industry:

Leisure & Entertainment > Sports (0.31)
Education > Educational Setting > K-12 Education (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.72)
Information Technology > Artificial Intelligence > Robots (0.50)

arXiv.org Machine Learning, Oct-24-2024

Structured Diffusion Models with Mixture of Gaussians as Prior Distribution

Jia, Nanshan, Zhu, Tingyu, Liu, Haoyu, Zheng, Zeyu

We propose a class of structured diffusion models, in which the prior distribution is chosen as a mixture of Gaussians, rather than a standard Gaussian distribution. The specific mixed Gaussian distribution, as prior, can be chosen to incorporate certain structured information of the data. We develop a simple-to-implement training procedure that smoothly accommodates the use of mixed Gaussian as prior. Theory is provided to quantify the benefits of our proposed models, compared to the classical diffusion models. Numerical experiments with synthetic, image and operational data are conducted to show comparative advantages of our model. Our method is shown to be robust to mis-specifications and in particular suits situations where training resources are limited or faster training in real time is desired.
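As a rough illustration of what a mixture-of-Gaussians prior changes, the sketch below samples from such a prior and noises a data point toward its nearest mixture center instead of toward the origin. The nearest-center assignment, the shared isotropic variance, and all function names are assumptions made for this example; the abstract does not specify the paper's actual training procedure.

```python
import math
import random


def sample_mixture_prior(centers, weights, sigma, rng):
    """Draw one sample from sum_k w_k * N(center_k, sigma^2 I)."""
    k = rng.choices(range(len(centers)), weights=weights, k=1)[0]
    return k, [c + sigma * rng.gauss(0.0, 1.0) for c in centers[k]]


def nearest_center(x, centers):
    """Index of the center closest to x in squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centers[k])),
    )


def forward_noise(x0, centers, alpha_bar, rng):
    """Noise x0 toward its nearest center (assumed scheme, unit prior variance).

    alpha_bar = 1 returns x0 unchanged; alpha_bar = 0 returns a draw
    from N(center, I), i.e. one component of the mixture prior.
    """
    mu = centers[nearest_center(x0, centers)]
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [m + signal * (a - m) + noise * rng.gauss(0.0, 1.0)
            for a, m in zip(x0, mu)]


rng = random.Random(0)
centers = [[5.0, -5.0], [-5.0, 5.0]]  # hypothetical cluster centers
component, prior_sample = sample_mixture_prior(centers, [0.5, 0.5], 1.0, rng)
noised = forward_noise([4.5, -3.0], centers, 0.5, rng)
```

With a standard Gaussian prior, every point is noised toward the origin; here the endpoint of the forward process already carries the cluster structure of the data, which is the kind of structured information the abstract refers to.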

artificial intelligence, diffusion model, machine learning, (14 more...)

2410.19149

Genre:

Research Report (0.40)
Instructional Material (0.35)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

SAMG: State-Action-Aware Offline-to-Online Reinforcement Learning with Offline Model Guidance

Zhang, Liyu, Wu, Haochi, Wan, Xu, Kong, Quan, Deng, Ruilong, Sun, Mingyang

The offline-to-online (O2O) paradigm in reinforcement learning (RL) utilizes pre-trained models on offline datasets for subsequent online fine-tuning. However, conventional O2O RL algorithms typically require maintaining and retraining on large offline datasets to mitigate the effects of out-of-distribution (OOD) data, which limits their efficiency in exploiting online samples. To address this challenge, we introduce a new paradigm called SAMG: State-Action-Conditional Offline-to-Online Reinforcement Learning with Offline Model Guidance. In particular, rather than directly training on offline data, SAMG freezes the pre-trained offline critic to provide offline values for each state-action pair, delivering compact offline information. This framework eliminates the need for retraining with offline data by freezing and leveraging these values of the offline model. These values are then incorporated with the online target critic using a Bellman equation weighted by a policy state-action-aware coefficient. This coefficient, derived from a conditional variational auto-encoder (C-VAE), aims to capture the reliability of the offline data on a state-action level. SAMG can be easily integrated with existing Q-function-based O2O RL algorithms. Theoretical analysis shows good optimality and lower estimation error of SAMG. Empirical evaluations demonstrate that SAMG outperforms four state-of-the-art O2O RL algorithms on the D4RL benchmark.
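The weighted Bellman target described above can be sketched as follows. This is one plausible reading only: the convex-combination form, the function name `blended_td_target`, and the parameter names are assumptions for illustration, not the paper's actual update rule. In SAMG the coefficient would be derived from a C-VAE; here it is simply passed in as a number in [0, 1].

```python
def blended_td_target(reward, gamma, q_online_next, q_offline_next,
                      offline_weight, done=False):
    """TD target mixing online and frozen offline critic estimates.

    offline_weight is the (assumed) state-action-aware coefficient:
    0 ignores the offline critic, 1 relies on it entirely.
    """
    if not 0.0 <= offline_weight <= 1.0:
        raise ValueError("offline_weight must lie in [0, 1]")
    if done:
        # No bootstrapping from a terminal transition.
        return reward
    bootstrap = ((1.0 - offline_weight) * q_online_next
                 + offline_weight * q_offline_next)
    return reward + gamma * bootstrap


# Hypothetical numbers: the offline critic is trusted half as much
# as the online target critic would be on its own.
target = blended_td_target(reward=1.0, gamma=0.5,
                           q_online_next=2.0, q_offline_next=4.0,
                           offline_weight=0.5)
```

Because the offline critic is frozen, only its scalar outputs are needed during fine-tuning, which is how such a scheme could avoid replaying the offline dataset.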

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2410.18626

Country: North America > United States > Montana (0.04)

Genre:

Research Report (0.64)
Instructional Material > Online (0.60)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Guiding Empowerment Model: Liberating Neurodiversity in Online Higher Education

Beaux, Hannah, Karimi, Pegah, Pop, Otilia, Clark, Rob

In this innovative practice full paper, we address the equity gap for neurodivergent and situationally limited learners by identifying the spectrum of dynamic factors that impact learning and function. Educators have shown a growing interest in identifying learners' cognitive abilities and learning preferences to measure their impact on academic achievement. Often institutions employ one-size-fits-all approaches, leaving the burden on disabled students to self-advocate or tolerate inadequate support. Emerging frameworks guide neurodivergent learners through instructional approaches, such as online education. However, these frameworks fail to address holistic environmental needs or recommend technology interventions, particularly for those with undisclosed learning or developmental disabilities and situational limitations. In this article, we integrate a neurodivergent perspective through secondary research of around 100 articles to introduce a Guiding Empowerment Model involving key cognitive and situational factors that contextualize day-to-day experiences affecting learner ability. We synthesize three sample student profiles that highlight user problems in functioning. We use this model to evaluate sample learning platform features and other supportive technology solutions. The proposed approach augments frameworks such as Universal Design for Learning to consider factors including various sensory processing differences, social connection challenges, and environmental limitations. We suggest that by applying the model through technology-enabled features such as customizable task management, guided varied content access, and guided multi-modal collaboration, major learning barriers of neurodivergent and situationally limited learners will be removed to activate the successful pursuit of their academic goals.

artificial intelligence, learner, machine learning, (16 more...)

2410.18876

Country:

North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Indiana > Marion County > Indianapolis (0.04)

Genre: Instructional Material > Online (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Education > Educational Setting > Online (1.00)
Education > Educational Setting > Higher Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.89)

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

Jang, Lawrence, Li, Yinheng, Ding, Charles, Lin, Justin, Liang, Paul Pu, Zhao, Dan, Bonatti, Rogerio, Koishida, Kazuhito

Videos are often used to learn or extract the necessary information to complete tasks in ways different than what text and static imagery alone can provide. However, many existing agent benchmarks neglect long-context video understanding, instead focusing on text or static image inputs. To bridge this gap, we introduce VideoWebArena (VideoWA), a benchmark for evaluating the capabilities of long-context multimodal agents for video understanding. VideoWA consists of 2,021 web agent tasks based on manually crafted video tutorials, which total almost four hours of content. For our benchmark, we define a taxonomy of long-context video-based agent tasks with two main areas of focus: skill retention and factual retention. While skill retention tasks evaluate whether an agent can use a given human demonstration to complete a task efficiently, the factual retention task evaluates whether an agent can retrieve instruction-relevant information from a video to complete a task. We find that the best model achieves 13.3% success on factual retention tasks and 45.8% on factual retention QA pairs, far below human performance at 73.9% and 79.3%, respectively. On skill retention tasks, long-context models perform worse with tutorials than without, exhibiting a 5% performance decrease in WebArena tasks and a 10.3% decrease in VisualWebArena tasks. Our work highlights the need to improve the agentic abilities of long-context multimodal models and provides a testbed for future development with long-context video agents.

large language model, machine learning, natural language, (20 more...)

2410.191

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Genre: Instructional Material > Course Syllabus & Notes (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.81)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)

Expanding AI Awareness Through Everyday Interactions with AI: A Reflective Journal Study

Hingle, Ashish, Johri, Aditya

As the application of AI continues to expand, students in technology programs are poised to be both producers and users of the technologies. They are also positioned to engage with AI applications within and outside the classroom. While focusing on the curriculum when examining students' AI knowledge is common, extending this connection to students' everyday interactions with AI provides a more complete picture of their learning. In this paper, we explore students' awareness of and engagement with AI in the context of school and their daily lives. Over six weeks, 22 undergraduate students participated in a reflective journal study and submitted a weekly journal entry about their interactions with AI. The participants were recruited from a technology and society course that focuses on the implications of technology for people, communities, and processes. In their weekly journal entries, participants reflected on interactions with AI on campus (coursework, advertised campus events, or seminars) and beyond (social media, news, or conversations with friends and family). The journal prompts were designed to help them think through what they had read, watched, or been told and reflect on the development of their own perspectives, knowledge, and literacy on the topic. Overall, students described nine categories of interactions: coursework, news and current events, using software and applications, university events, social media related to their work, personal discussions with friends and family, interacting with content, and gaming. Students reported that completing the diaries allowed them time for reflection and made them more aware of the presence of AI in their daily lives and of its potential benefits and drawbacks. This research contributes to the ongoing work on AI awareness and literacy by bringing in perspectives from beyond a formal educational context.

artificial intelligence, machine learning, natural language, (14 more...)

2410.18845

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Massachusetts > Essex County > Newburyport (0.04)

Genre:

Instructional Material (1.00)
Research Report > Experimental Study (0.88)

Industry:

Education > Curriculum > Subject-Specific Education (1.00)
Leisure & Entertainment > Games (0.93)
Education > Educational Setting > Higher Education (0.88)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.93)

arXiv.org Artificial Intelligence, Oct-23-2024

Kenyan Sign Language (KSL) Dataset: Using Artificial Intelligence (AI) in Bridging Communication Barrier among the Deaf Learners

Wanzare, Lilian, Okutoyi, Joel, Kang'ahi, Maurine, Ayere, Mildred

Kenyan Sign Language (KSL) is the primary language used by the deaf community in Kenya. It is the medium of instruction from Pre-primary 1 to university among deaf learners, facilitating their education and academic achievement. Kenyan Sign Language is used for social interaction, expression of needs, making requests and general communication among persons who are deaf in Kenya. However, there exists a language barrier between deaf and hearing people in Kenya. Thus, the AI4KSL innovation is key to eliminating the communication barrier. Artificial intelligence for KSL is a two-year research project (2023-2024) that aims to create a digital open-access AI of spontaneous and elicited data from a representative sample of the Kenyan deaf community. The purpose of this study is to develop an AI assistive technology dataset that translates English to KSL as a way of fostering inclusion and bridging language barriers among deaf learners in Kenya. The specific objectives are to build a KSL dataset of spoken English and video-recorded Kenyan Sign Language, and to build transcriptions of the KSL signs to a phonetic-level interface of the sign language. In this paper, the methodology for building the dataset is described. Data were collected from 48 teachers and tutors of deaf learners and 400 learners who are Deaf. Participants engaged mainly in sign language elicitation tasks through reading and singing. The resulting dataset consists of about 14,000 English sentences with corresponding KSL Gloss derived from a pool of about 4,000 words, and about 20,000 signed KSL videos that are either signed words or sentences. The second level of data outcomes consists of 10,000 split and segmented KSL videos. The third outcome consists of 4,000 words transcribed into five articulatory parameters according to the HamNoSys system.

artificial intelligence, machine translation, natural language, (13 more...)

2410.18295

Country:

Asia > Pakistan (0.04)
North America > United States > Hawaii (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre:

Overview (0.68)
Research Report (0.64)
Instructional Material (0.46)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)