 maker space


Visual AI and Linguistic Intelligence Through Steerability and Composability

arXiv.org Artificial Intelligence

This study explores the capabilities of multimodal large language models (LLMs) in handling challenging multistep tasks that integrate language and vision, focusing on model steerability, composability, and the application of long-term memory and context understanding. The problem addressed is the ability of an LLM (the November 2023 GPT-4 Vision Preview) to manage tasks that require synthesizing visual and textual information, especially where stepwise instructions and sequential logic are paramount. The research presents a series of 14 creatively and constructively diverse tasks, ranging from AI Lego Designing to AI Satellite Image Analysis, designed to test the limits of current LLMs in contexts that previously proved difficult without extensive memory and contextual understanding. Key findings from evaluating 800 guided dialogs include notable disparities in task completion difficulty. For instance, 'Image to Ingredient AI Bartender' (low difficulty) contrasted sharply with 'AI Game Self-Player' (high difficulty), highlighting the LLM's varying proficiency in processing complex visual data and generating coherent instructions. Tasks such as 'AI Genetic Programmer' and 'AI Negotiator' also showed high completion difficulty, emphasizing the challenge of maintaining context over multiple steps. The results underscore the importance of developing LLMs that combine long-term memory and contextual awareness to mimic human-like thought processes in complex problem-solving scenarios.


Maker Spaces, Learning And Reality

#artificialintelligence

How we explain reality to ourselves is a construction with many parts. We gather knowledge and generate meaning through our experiences and traditions; from what we learn in school, at work and at home; from how we witness others explaining reality for themselves (on TV, via social media, etc.). This narrative that we tell ourselves every day throughout our entire lives largely defines who we are and how we approach the world. The first time I ran, in Mexico, an adaptation of Stanford's workshop "Makers in Residence" (an intensive 80-hour program for high schoolers on digital fabrication and design thinking, designed by the Transformative Learning Technology Lab), I was shocked by participants' comments about their place in relation to technology. Most participants were impressed that they were "smarter" than the computers they programmed; when I asked them more about it, I began to understand the new narrative that a generation of kids growing up surrounded by digital technology is developing in their heads.