Education
Google Translate is now better at translating slang terms and idioms using AI
Google has also introduced a new speech-to-speech translation feature for headphones. Google is rolling out new Gemini-assisted functionality to Search and its Translate app. It says its AI can now provide more natural and accurate text translations for phrases that have more nuanced meanings. Translate will now take slang terms and colloquial expressions into consideration rather than provide sometimes unhelpful direct translations. The latest update to its text translation feature is rolling out first in the US and India, translating between English and just under 20 other languages, including German, Spanish, Chinese and Arabic.
Robot Talk Episode 137 – Getting two-legged robots moving, with Oluwami Dosunmu-Ogunbi
Claire chatted to Oluwami Dosunmu-Ogunbi from Ohio Northern University about bipedal robots that can walk and even climb stairs. Oluwami Dosunmu-Ogunbi (Wami) is an Assistant Professor in the Mechanical Engineering Department at Ohio Northern University. Her research focuses on controls with applications in bipedal locomotion and engineering education. She is the first Black woman to receive a PhD in Robotics at the University of Michigan. During her Ph.D., she developed the Biped Bootcamp technical document, which she is transforming into an undergraduate curriculum --introducing students to bipedal robotics while providing advanced coursework for juniors and seniors.
Supervised Learning of Random Neural Architectures Structured by Latent Random Fields on Compact Boundaryless Multiply-Connected Manifolds
This paper introduces a new probabilistic framework for supervised learning in neural systems. It is designed to model complex, uncertain systems whose random outputs are strongly non-Gaussian given deterministic inputs. The architecture itself is a random object stochastically generated by a latent anisotropic Gaussian random field defined on a compact, boundaryless, multiply-connected manifold. The goal is to establish a novel conceptual and mathematical framework in which neural architectures are realizations of a geometry-aware, field-driven generative process. Both the neural topology and synaptic weights emerge jointly from a latent random field. A reduced-order parameterization governs the spatial intensity of an inhomogeneous Poisson process on the manifold, from which neuron locations are sampled. Input and output neurons are identified via extremal evaluations of the latent field, while connectivity is established through geodesic proximity and local field affinity. Synaptic weights are conditionally sampled from the field realization, inducing stochastic output responses even for deterministic inputs. To ensure scalability, the architecture is sparsified via percentile-based diffusion masking, yielding geometry-aware sparse connectivity without ad hoc structural assumptions. Supervised learning is formulated as inference on the generative hyperparameters of the latent field, using a negative log-likelihood loss estimated through Monte Carlo sampling from single-observation-per-input datasets. The paper initiates a mathematical analysis of the model, establishing foundational properties such as well-posedness, measurability, and a preliminary analysis of the expressive variability of the induced stochastic mappings, which support its internal coherence and lay the groundwork for a broader theory of geometry-driven stochastic learning.
Defining the Scope of Learning Analytics: An Axiomatic Approach for Analytic Practice and Measurable Learning Phenomena
Takii, Kensuke, Liang, Changhao, Ogata, Hiroaki
Learning Analytics (LA) has rapidly expanded through practical and technological innovation, yet its foundational identity has remained theoretically under-specified. This paper addresses this gap by proposing the first axiomatic theory that formally defines the essential structure, scope, and limitations of LA. Derived from the psychological definition of learning and the methodological requirements of LA, the framework consists of five axioms specifying discrete observation, experience construction, state transition, and inference. From these axioms, we derive a set of theorems and propositions that clarify the epistemological stance of LA, including the inherent unobservability of learner states, the irreducibility of temporal order, constraints on reachable states, and the impossibility of deterministically predicting future learning. We further define LA structure and LA practice as formal objects, demonstrating the sufficiency and necessity of the axioms and showing that diverse LA approaches -- such as Bayesian Knowledge Tracing and dashboards -- can be uniformly explained within this framework. The theory provides guiding principles for designing analytic methods and interpreting learning data while avoiding naive behaviorism and category errors by establishing an explicit theoretical inference layer between observations and states. This work positions LA as a rigorous science of state transition systems based on observability, establishing the theoretical foundation necessary for the field's maturation as a scholarly discipline.
Latency-Response Theory Model: Evaluating Large Language Models via Response Accuracy and Chain-of-Thought Length
Xu, Zhiyu, Liu, Jia, Wang, Yixin, Gu, Yuqi
The proliferation of Large Language Models (LLMs) necessitates valid evaluation methods to guide downstream applications and actionable future improvements. The Item Response Theory (IRT) has recently emerged as a promising framework for evaluating LLMs via their response accuracy. Beyond simple response accuracy, LLMs' chain of thought (CoT) lengths serve as a vital indicator of their reasoning ability. To leverage the CoT length information to assist the evaluation of LLMs, we propose Latency-Response Theory (LaRT) to jointly model the response accuracy and CoT length by introducing the latent ability, latent speed, and a key correlation parameter between them. We derive an efficient estimation algorithm and establish rigorous identifiability results for the population parameters to ensure the statistical validity of estimation. Theoretical asymptotic analyses and simulation studies demonstrate LaRT's advantages over IRT in terms of higher estimation accuracy and shorter confidence intervals for latent traits. A key finding is that the asymptotic estimation precision of the latent ability under LaRT exceeds that of IRT whenever the latent ability and latent speed are correlated. We collect real responses from diverse LLMs on popular benchmark datasets. The application of LaRT reveals a strong negative correlation between the latent ability and latent speed in all benchmarks, with stronger correlation for more difficult benchmarks. This finding supports the intuition that higher reasoning ability correlates with slower speed and longer response latency. LaRT yields different LLM rankings than IRT and outperforms IRT across multiple key evaluation metrics including predictive power, item efficiency, ranking validity, and LLM evaluation efficiency. Code and data are available at https://github.com/Toby-X/Latency-Response-Theory-Model.
HunyuanOCR Technical Report
Hunyuan Vision Team, null, Lyu, Pengyuan, Wan, Xingyu, Li, Gengluo, Peng, Shangpin, Wang, Weinong, Wu, Liang, Shen, Huawen, Zhou, Yu, Tang, Canhui, Yang, Qi, Peng, Qiming, Luo, Bin, Yang, Hower, Zhang, Xinsong, Zhang, Jinnian, Peng, Houwen, Yang, Hongming, Xie, Senhao, Zhou, Longsha, Pei, Ge, Wu, Binghong, Yan, Rui, Wu, Kan, Yang, Jieneng, Wang, Bochao, Liu, Kai, Zhu, Jianchen, Jiang, Jie, Linus, null, Hu, Han, Zhang, Chengquan
This paper presents HunyuanOCR, a commercial-grade, open-source, and lightweight (1B parameters) Vision-Language Model (VLM) dedicated to OCR tasks. The architecture comprises a Native Vision Transformer (ViT) and a lightweight LLM connected via an MLP adapter. HunyuanOCR demonstrates superior performance, outperforming commercial APIs, traditional pipelines, and larger models (e.g., Qwen3-VL-4B). Specifically, it surpasses current public solutions in perception tasks (Text Spotting, Parsing) and excels in semantic tasks (IE, Text Image Translation), securing first place in the ICDAR 2025 DIMT Challenge (Small Model Track). Furthermore, it achieves state-of-the-art (SOTA) results on OCRBench among VLMs with fewer than 3B parameters. HunyuanOCR achieves breakthroughs in three key aspects: 1) Unifying Versatility and Efficiency: We implement comprehensive support for core capabilities including spotting, parsing, IE, VQA, and translation within a lightweight framework. This addresses the limitations of narrow "OCR expert models" and inefficient "General VLMs". 2) Streamlined End-to-End Architecture: Adopting a pure end-to-end paradigm eliminates dependencies on pre-processing modules (e.g., layout analysis). This fundamentally resolves error propagation common in traditional pipelines and simplifies system deployment. 3) Data-Driven and RL Strategies: We confirm the critical role of high-quality data and, for the first time in the industry, demonstrate that Reinforcement Learning (RL) strategies yield significant performance gains in OCR tasks. HunyuanOCR is officially open-sourced on HuggingFace. We also provide a high-performance deployment solution based on vLLM, placing its production efficiency in the top tier. We hope this model will advance frontier research and provide a solid foundation for industrial applications.
SCALE: Upscaled Continual Learning of Large Language Models
Lee, Jin-woo, Choi, Junhwa, Hwang, Bongkyu, Choo, Jinho, Kim, Bogun, Yi, JeongSeon, Lee, Joonseok, Jung, DongYoung, Park, Jaeseon, Park, Kyoungwon, Jung, Suk-hoon
We revisit continual pre-training for large language models and argue that progress now depends more on scaling the right structure than on scaling parameters alone. We introduce SCALE, a width upscaling architecture that inserts lightweight expansion into linear modules while freezing all pre-trained parameters. This preserves the residual and attention topologies and increases capacity without perturbing the base model's original functionality. SCALE is guided by two principles: Persistent Preservation, which maintains the base model's behavior via preservation-oriented initialization and freezing of the pre-trained weights, and Collaborative Adaptation, which selectively trains a subset of expansion components to acquire new knowledge with minimal interference. We instantiate these ideas as SCALE-Preserve (preservation-first), SCALE-Adapt (adaptation-first), and SCALE-Route, an optional routing extension that performs token-level routing between preservation and adaptation heads. On a controlled synthetic biography benchmark, SCALE mitigates the severe forgetting observed with depth expansion while still acquiring new knowledge. In continual pre-training on a Korean corpus, SCALE variants achieve less forgetting on English evaluations and competitive gains on Korean benchmarks, with these variants offering the best overall stability-plasticity trade-off. Accompanying analysis clarifies when preservation provably holds and why the interplay between preservation and adaptation stabilizes optimization compared to standard continual learning setups.
AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library
Kong, Minwei, Qu, Ao, Guo, Xiaotong, Ouyang, Wenbin, Jiang, Chonghe, Zheng, Han, Ma, Yining, Zhuang, Dingyi, Tang, Yuhan, Li, Junyi, Wang, Shenhao, Koutsopoulos, Haris, Wang, Hai, Wu, Cathy, Zhao, Jinhua
Optimization modeling enables critical decisions across industries but remains difficult to automate: informal language must be mapped to precise mathematical formulations and executable solver code. Prior LLM approaches either rely on brittle prompting or costly retraining with limited generalization. We present AlphaOPT, a self-improving experience library that enables an LLM to learn from limited demonstrations (even answers alone, without gold-standard programs) and solver feedback - without annotated reasoning traces or parameter updates. AlphaOPT operates in a continual two-phase cycle: (i) a Library Learning phase that reflects on failed attempts, extracting solver-verified, structured insights as {taxonomy, condition, explanation, example}; and (ii) a Library Evolution phase that diagnoses retrieval misalignments and refines the applicability conditions of stored insights, improving transfer across tasks. This design (1) learns efficiently from limited demonstrations without curated rationales, (2) expands continually without costly retraining by updating the library rather than model weights, and (3) makes knowledge explicit and interpretable for human inspection and intervention. Experiments show that AlphaOPT steadily improves with more data (65% to 72% from 100 to 300 training items) and surpasses the strongest baseline by 7.7% on the out-of-distribution OptiBench dataset when trained only on answers. Code and data are available at: https://github.com/Minw913/AlphaOPT.
Human or AI? Comparing Design Thinking Assessments by Teaching Assistants and Bots
Khan, Sumbul, Liow, Wei Ting, Ang, Lay Kee
ORCID: 0000 -0003-2811-1194 Abstract --As design thinking education is growing in secondary and tertiary education, educators face a mounting challenge of evaluating creative artefacts that comprise visual and textual elements. Traditional, rubric-based methods of assessment are laborious, time-consuming, and inconsistent, due to their reliance on Teaching Assistants (TAs) in large, multi - section cohorts. This paper presents an exploratory study to investigate the reliability and perceived accuracy of AI -assisted assessment vis -à -vis TA-assisted assessment in evaluating student posters in design thinking education. Two activities were conducted with 33 Ministry of Education (MOE), Singapore school teachers, with the objective (1) to compare AI -generated scores with TA grading across three key dimensions: empathy and user understanding, identification of pain points and opportunities, and visual communication, and (2) to understand teacher preferences for AI-assigned, TA-assigned, and hybrid scores. Results showed low statistical agreement between instructor and AI scores for empathy and pain points, though slightly higher alignment for visual communication. Teachers generally preferred TA -assigned scores in six of ten samples. Qualitative feedback highlighted AI's potential for formative feedback, consistency, and student self -reflection, but raised concerns about its limitations in capturing contextual nuance and creative insight. The study underscores the need for hybrid assessment models that integrate computational efficiency with human insights . This research contributes to the evolving conversation around responsible AI adoption in creative disciplines, emphasizing the balance between automation and human judgment for scalable and pedagogically sound assessment practices. Design thinking is a human-centered approach to innovation that draws from the designer's toolkit to integrate the needs of people, the possibilities of technology, and the requirements for business success. It is a non - linear, iterative process that teams use to understand users, challenge assumptions, redefine problems, and create innovative solutions to prototype and test.
AI-Specific Code Smells: From Specification to Detection
Mahmoudi, Brahim, Moha, Naouel, Stiévenart, Quentin, Avellaneda, Florent
The rise of Artificial Intelligence (AI) is reshaping how software systems are developed and maintained. However, AI-based systems give rise to new software issues that existing detection tools often miss. Among these, we focus on AI-specific code smells, recurring patterns in the code that may indicate deeper problems such as unreproducibility, silent failures, or poor model generalization. We introduce SpecDetect4AI, a tool-based approach for the specification and detection of these code smells at scale. This approach combines a high-level declarative Domain-Specific Language (DSL) for rule specification with an extensible static analysis tool that interprets and detects these rules for AI-based systems. We specified 22 AI-specific code smells and evaluated SpecDetect4AI on 826 AI-based systems (20M lines of code), achieving a precision of 88.66% and a recall of 88.89%, outperforming other existing detection tools. Our results show that SpecDetect4AI supports the specification and detection of AI-specific code smells through dedicated rules and can effectively analyze large AI-based systems, demonstrating both efficiency and extensibility (SUS 81.7/100).