Instructional Material
Inclusive STEAM Education: A Framework for Teaching Coding and Robotics to Students with Visually Impairment Using Advanced Computer Vision
Hamash, Mahmoud, Khan, Md Raqib, Tiernan, Peter
STEAM education integrates Science, Technology, Engineering, Arts, and Mathematics to foster creativity and problem-solving. However, students with visual impairments (VI) encounter significant challenges in programming and robotics, particularly in tracking robot movements and developing spatial awareness. This paper presents a framework that leverages pre-constructed robots and algorithms, such as maze-solving techniques, within an accessible learning environment. The proposed system employs Contrastive Language-Image Pre-training (CLIP) to process global camera-captured maze layouts, converting visual data into textual descriptions that generate spatial audio prompts in an Audio Virtual Reality (AVR) system. Students issue verbal commands, which are refined through CLIP, while robot-mounted stereo cameras provide real-time data processed via Simultaneous Localization and Mapping (SLAM) for continuous feedback. By integrating these technologies, the framework empowers VI students to develop coding skills and engage in complex problem-solving tasks. Beyond maze-solving applications, this approach demonstrates the broader potential of computer vision in special education, contributing to improved accessibility and learning experiences in STEAM disciplines.
LLMs' Reshaping of People, Processes, Products, and Society in Software Development: A Comprehensive Exploration with Early Adopters
Tabarsi, Benyamin, Reichert, Heidi, Limke, Ally, Kuttal, Sandeep, Barnes, Tiffany
Large language models (LLMs) like OpenAI ChatGPT, Google Gemini, and GitHub Copilot are rapidly gaining traction in the software industry, but their full impact on software engineering remains insufficiently explored. Despite their growing adoption, there is a notable lack of formal, qualitative assessments of how LLMs are applied in real-world software development contexts. To fill this gap, we conducted semi-structured interviews with sixteen early-adopter professional developers to explore their use of LLMs throughout various stages of the software development life cycle. Our investigation examines four dimensions: people - how LLMs affect individual developers and teams; process - how LLMs alter software engineering workflows; product - LLM impact on software quality and innovation; and society - the broader socioeconomic and ethical implications of LLM adoption. Thematic analysis of our data reveals that while LLMs have not fundamentally revolutionized the development process, they have substantially enhanced routine coding tasks, including code generation, refactoring, and debugging. Developers reported the most effective outcomes when providing LLMs with clear, well-defined problem statements, indicating that LLMs excel with decomposed problems and specific requirements. Furthermore, these early-adopters identified that LLMs offer significant value for personal and professional development, aiding in learning new languages and concepts. Early-adopters, highly skilled in software engineering and how LLMs work, identified early and persisting challenges for software engineering, such as inaccuracies in generated content and the need for careful manual review before integrating LLM outputs into production environments. Our study provides a nuanced understanding of how LLMs are shaping the landscape of software development, with their benefits, limitations, and ongoing implications.
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
Shikhar, Sambal, Kurpath, Mohammed Irfan, Mullappilly, Sahal Shaji, Lahoud, Jean, Khan, Fahad, Anwer, Rao Muhammad, Khan, Salman, Cholakkal, Hisham
Recent advancements in speech-to-speech dialogue systems leverage LLMs for multimodal interactions, yet they remain hindered by fine-tuning requirements, high computational overhead, and text-speech misalignment. Existing speech-enabled LLMs often degrade conversational quality by modifying the LLM, thereby compromising its linguistic capabilities. In contrast, we propose LLMVoX, a lightweight 30M-parameter, LLM-agnostic, autoregressive streaming TTS system that generates high-quality speech with low latency, while fully preserving the capabilities of the base LLM. Our approach achieves a significantly lower Word Error Rate compared to speech-enabled LLMs, while operating at comparable latency and UTMOS score. By decoupling speech synthesis from LLM processing via a multi-queue token streaming system, LLMVoX supports seamless, infinite-length dialogues. Its plug-and-play design also facilitates extension to various tasks with different backbones. Furthermore, LLMVoX generalizes to new languages with only dataset adaptation, attaining a low Character Error Rate on an Arabic speech task. Additionally, we have integrated LLMVoX with a Vision-Language Model to create an omni-model with speech, text, and vision capabilities, without requiring additional multimodal training. Our code base and project page are available at https://mbzuai-oryx.github.io/LLMVoX .
Pretrained Embeddings as a Behavior Specification Mechanism
Kapoor, Parv, Hammer, Abigail, Kapoor, Ashish, Leung, Karen, Kang, Eunsuk
We propose an approach to formally specifying the behavioral properties of systems that rely on a perception model for interactions with the physical world. The key idea is to introduce embeddings -- mathematical representations of a real-world concept -- as a first-class construct in a specification language, where properties are expressed in terms of distances between a pair of ideal and observed embeddings. To realize this approach, we propose a new type of temporal logic called Embedding Temporal Logic (ETL), and describe how it can be used to express a wider range of properties about AI-enabled systems than previously possible. We demonstrate the applicability of ETL through a preliminary evaluation involving planning tasks in robots that are driven by foundation models; the results are promising, showing that embedding-based specifications can be used to steer a system towards desirable behaviors.
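The abstract's central idea, a property stated as a bound on the distance between an ideal and an observed embedding, can be illustrated with a small sketch. This is not ETL's actual syntax or semantics, which the abstract does not give. The names `cosine_distance`, `near`, `always`, and `eventually`, the choice of cosine distance, and the threshold value are all assumptions made for illustration.

```python
import math
from typing import Callable, Iterable, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def near(ideal: Vector, threshold: float) -> Callable[[Vector], bool]:
    """Hypothetical atomic predicate: observed embedding is within threshold of ideal."""
    return lambda observed: cosine_distance(observed, ideal) <= threshold


def always(pred: Callable[[Vector], bool], trace: Iterable[Vector]) -> bool:
    """The predicate holds at every step of a finite trace."""
    return all(pred(e) for e in trace)


def eventually(pred: Callable[[Vector], bool], trace: Iterable[Vector]) -> bool:
    """The predicate holds at some step of a finite trace."""
    return any(pred(e) for e in trace)


# Toy 2-D "embeddings"; a real system would obtain these from a perception model.
ideal = [1.0, 0.0]
trace = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.1]]
reaches_goal = eventually(near(ideal, 0.05), trace)
stays_at_goal = always(near(ideal, 0.05), trace)
```

Here the trace drifts toward the ideal embedding, so the "eventually" property holds while the "always" property does not.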
Dexterous Hand Manipulation via Efficient Imitation-Bootstrapped Online Reinforcement Learning
Huang, Dongchi, Zhang, Tianle, Li, Yihang, Zhao, Ling, Li, Jiayi, Fang, Zhirui, Xia, Chunhe, Li, Lusong, He, Xiaodong
Dexterous hand manipulation in real-world scenarios presents considerable challenges due to its demands for both dexterity and precision. While imitation learning approaches have thoroughly examined these challenges, they still require a significant number of expert demonstrations and are limited by a constrained performance upper bound. In this paper, we propose a novel and efficient Imitation-Bootstrapped Online Reinforcement Learning (IBORL) method tailored for robotic dexterous hand manipulation in real-world environments. Specifically, we pretrain the policy using a limited set of expert demonstrations and subsequently finetune this policy through direct reinforcement learning in the real world. To address the catastrophic forgetting issues that arise from the distribution shift between expert demonstrations and real-world environments, we design a regularization term that balances the exploration of novel behaviors with the preservation of the pretrained policy. Our experiments with real-world tasks demonstrate that our method significantly outperforms existing approaches, achieving an almost 100% success rate and a 23% improvement in cycle time. Furthermore, by finetuning with online reinforcement learning, our method surpasses expert demonstrations and uncovers superior policies. Our code and empirical results are available at https://hggforget.github.io/iborl.github.io/.
CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance
Zhang, Arthur, Sikchi, Harshit, Zhang, Amy, Biswas, Joydeep
We address the long-horizon mapless navigation problem: enabling robots to traverse novel environments without relying on high-definition maps or precise waypoints that specify exactly where to navigate. Achieving this requires overcoming two major challenges -- learning robust, generalizable perceptual representations of the environment without pre-enumerating all possible navigation factors and forms of perceptual aliasing, and utilizing these learned representations to plan human-aligned navigation paths. Existing solutions struggle to generalize due to their reliance on hand-curated object lists that overlook unforeseen factors, end-to-end learning of navigation features from scarce large-scale robot datasets, and handcrafted reward functions that scale poorly to diverse scenarios. To overcome these limitations, we propose CREStE, the first method that learns representations and rewards for addressing the full mapless navigation problem without relying on large-scale robot datasets or manually curated features. CREStE leverages visual foundation models trained on internet-scale data to learn continuous bird's-eye-view representations capturing elevation, semantics, and instance-level features. To utilize learned representations for planning, we propose a counterfactual-based loss and active learning procedure that focuses on the most salient perceptual cues by querying humans for counterfactual trajectory annotations in challenging scenes. We evaluate CREStE in kilometer-scale navigation tasks across six distinct urban environments. CREStE significantly outperforms all state-of-the-art approaches with 70% fewer human interventions per mission, including a 2-kilometer mission in an unseen environment with just 1 intervention, showcasing its robustness and effectiveness for long-horizon mapless navigation. For videos and additional materials, see https://amrl.cs.utexas.edu/creste .
Neural Models of Task Adaptation: A Tutorial on Spiking Networks for Executive Control
Kannan, Ashwin Viswanathan, Ganesan, Madhumitha
The ability to adapt and switch between tasks is a fundamental aspect of cognitive flexibility, shaping decision-making and behavioral efficiency in dynamic environments. Task-switching has been widely studied across disciplines such as psychology, cognitive neuroscience, and artificial intelligence [1], [2]. While humans often shift between tasks seamlessly, performance variations arise depending on prior experience, task familiarity, and cognitive load. Understanding these processes requires computational models that can capture the underlying neural mechanisms driving adaptive control and decision-making. Empirical studies have identified increased neural activity in the cognitive control network, particularly in the prefrontal cortex (PFC), when engaging in task-switching [...]
Empirical studies further established the prefrontal cortex (PFC) as a key region in task-switching, with experiments such as the Wisconsin Card Sorting Test (WCST) demonstrating its role in adaptive behavior [14]-[16]. Spiking Neural Networks (SNNs) have emerged as a biologically realistic approach to modeling neural dynamics, particularly due to their ability to replicate synaptic plasticity mechanisms such as Spike Timing-Dependent Plasticity (STDP) [10], [17]. Prior studies have successfully applied SNNs to pattern recognition and classification tasks [18] and have modeled sensory processing systems like the mammalian olfactory system [19]. These findings establish a computational foundation for implementing task-switching models with biologically grounded learning dynamics.
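For readers unfamiliar with STDP, the following is a minimal sketch of the standard pair-based rule: a synapse is strengthened when the presynaptic spike precedes the postsynaptic spike and weakened when the order is reversed, with an exponential falloff in the time difference. The function name and all parameter values are illustrative assumptions; the excerpt does not state which STDP variant or constants the tutorial uses.

```python
import math


def stdp_weight_change(
    delta_t_ms: float,
    a_plus: float = 0.01,        # assumed potentiation amplitude
    a_minus: float = 0.012,      # assumed depression amplitude
    tau_plus_ms: float = 20.0,   # assumed potentiation time constant
    tau_minus_ms: float = 20.0,  # assumed depression time constant
) -> float:
    """Pair-based STDP weight update for one pre/post spike pair.

    delta_t_ms is t_post - t_pre in milliseconds.
    Positive values (pre fires before post) give potentiation;
    negative values (post fires before pre) give depression.
    """
    if delta_t_ms > 0:
        return a_plus * math.exp(-delta_t_ms / tau_plus_ms)
    if delta_t_ms < 0:
        return -a_minus * math.exp(delta_t_ms / tau_minus_ms)
    return 0.0


# Spike pairs closer in time produce larger changes than distant ones.
potentiation_near = stdp_weight_change(5.0)
potentiation_far = stdp_weight_change(50.0)
depression = stdp_weight_change(-10.0)
```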
Quantum Non-Linear Bandit Optimization
Siam, Zakaria Shams, Guan, Chaowen, Liu, Chong
We study non-linear bandit optimization where the learner maximizes a black-box function with a zeroth-order function oracle, which has been successfully applied in many critical applications such as drug discovery and hyperparameter tuning. Existing works have shown that with the aid of quantum computing, it is possible to break the $\Omega(\sqrt{T})$ regret lower bound in classical settings and achieve the new $O(\mathrm{poly}\log T)$ upper bound. However, they usually assume that the objective function sits within the reproducing kernel Hilbert space and their algorithms suffer from the curse of dimensionality. In this paper, we propose the new Q-NLB-UCB algorithm, which uses a novel parametric function approximation technique and enjoys performance improvement due to quantum fast-forward and quantum Monte Carlo mean estimation. We prove that the regret bound of Q-NLB-UCB is not only $O(\mathrm{poly}\log T)$ but also input dimension-free, making it applicable for high-dimensional tasks. At the heart of our analyses are a new quantum regression oracle and a careful construction of the parameter uncertainty region. Our algorithm is also validated for its efficiency on both synthetic and real-world tasks.
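As background only, the sketch below shows the classical UCB1 rule for a finite-armed bandit, the simplest instance of the upper-confidence-bound principle that the "UCB" in the algorithm's name presumably refers to. It is not Q-NLB-UCB: it uses no quantum subroutine, no parametric function approximation, and no regression oracle. The function name `ucb1` and the toy reward values are assumptions made for illustration.

```python
import math
from typing import Callable, List


def ucb1(pull: Callable[[int], float], n_arms: int, horizon: int) -> List[int]:
    """Run classical UCB1 for `horizon` rounds and return the pull count per arm.

    `pull(arm)` returns the observed reward (assumed to lie in [0, 1]).
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: play each arm once so every mean is defined.
            arm = t - 1
        else:
            # Choose the arm with the largest optimistic estimate:
            # empirical mean plus a confidence bonus that shrinks with pulls.
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Toy deterministic rewards, so the run is reproducible.
toy_rewards = [0.2, 0.8]
pull_counts = ucb1(lambda arm: toy_rewards[arm], n_arms=2, horizon=200)
```

The confidence bonus forces occasional pulls of the weaker arm, which is the exploration cost behind the classical regret bounds that the abstract contrasts with the quantum setting.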
AI Literacy in K-12 and Higher Education in the Wake of Generative AI: An Integrative Review
Gu, Xingjian, Ericson, Barbara J.
Accordingly, education researchers and practitioners have increasingly turned to AI literacy as an important learning objective. However, the definition of AI literacy remains vague. Researchers have used the term to describe learning interventions that differ in school contexts, learning objectives, and the types of AI technologies they use. Furthermore, research on AI literacy is shifting significantly in the wake of generative AI. Thus, it is crucial to review the field and develop a conceptual framework that captures the diverse conceptualizations of AI literacy. The concept of AI literacy and recognition of its potential significance are well-established [75, 127]. One of the pioneering works by Touretzky et al. in 2019 laid out "five big ideas" for the AI4K12 initiative: "computers perceive the world using sensors", "agents maintain models/representations of the world and use them for reasoning", "computers can learn from data", "making agents interact with humans is a substantial challenge for AI developers", and "AI applications can impact society in both positive and negative ways" [127]. This paper had a major influence on subsequent AI literacy curriculum design. The next year, another prominent work by Long and Magerko defined AI literacy as "a set
Active Robot Curriculum Learning from Online Human Demonstrations
Hou, Muhan, Hindriks, Koen, Eiben, A. E., Baraka, Kim
Learning from Demonstrations (LfD) allows robots to learn skills from human users, but its effectiveness can suffer due to sub-optimal teaching, especially from untrained demonstrators. Active LfD aims to improve this by letting robots actively request demonstrations to enhance learning. However, this may lead to frequent context switches between various task situations, increasing the human cognitive load and introducing errors to demonstrations. Moreover, few prior studies in active LfD have examined how these active query strategies may impact human teaching in aspects beyond user experience, which can be crucial for developing algorithms that benefit both robot learning and human teaching. To tackle these challenges, we propose an active LfD method that optimizes the query sequence of online human demonstrations via Curriculum Learning (CL), where demonstrators are guided to provide demonstrations in situations of gradually increasing difficulty. We evaluate our method across four simulated robotic tasks with sparse rewards and conduct a user study (N=26) to investigate the influence of active LfD methods on human teaching regarding teaching performance, post-guidance teaching adaptivity, and teaching transferability. Our results show that our method significantly improves learning performance compared to three other LfD baselines in terms of the final success rate of the converged policy and sample efficiency. Additionally, results from our user study indicate that our method significantly reduces the time required from human demonstrators and decreases failed demonstration attempts. It also enhances post-guidance human teaching in both seen and unseen scenarios compared to another active LfD baseline, indicating enhanced teaching performance, greater post-guidance teaching adaptivity, and better teaching transferability achieved by our method.