Supplementary Material: Information Geometry of the Retinal Representation Manifold

Ding, Xuehao

Neural Information Processing Systems

Further experimental details are described in Ref. [4]. Each spatiotemporal stimulus spanned 400 ms, corresponding to the retinal integration timescale. Figure 1: (a) The log-likelihood of empirical data for each PMF, averaged over cells. The black line is the identity line. The central 20 × 20 arrays are shown.


Quantum Temporal Convolutional Neural Networks for Cross-Sectional Equity Return Prediction: A Comparative Benchmark Study

Chen, Chi-Sheng, Zhang, Xinyu, Fu, Rong, Xie, Qiuzhe, Zhang, Fan

arXiv.org Artificial Intelligence

Quantum machine learning offers a promising pathway for enhancing stock market prediction, particularly in complex, noisy, and highly dynamic financial environments. However, many classical forecasting models struggle with noisy inputs, regime shifts, and limited generalization capacity. To address these challenges, we propose a Quantum Temporal Convolutional Neural Network (QTCNN) that combines a classical temporal encoder with parameter-efficient quantum convolution circuits for cross-sectional equity return prediction. The temporal encoder extracts multi-scale patterns from sequential technical indicators, while the quantum processing leverages superposition and entanglement to enhance feature representation and suppress overfitting. We conduct a comprehensive benchmarking study on the JPX Tokyo Stock Exchange dataset and evaluate predictions through long-short portfolio construction, using the out-of-sample Sharpe ratio as the primary performance metric. QTCNN achieves a Sharpe ratio of 0.538, outperforming the best classical baseline by approximately 72%. These results highlight the practical potential of the quantum-enhanced QTCNN model for robust decision-making in quantitative finance.
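The evaluation protocol in this abstract (score stocks cross-sectionally, form a long-short portfolio, report the out-of-sample Sharpe ratio) can be sketched in a few lines. This is an illustrative reconstruction under assumed conventions (decile long/short legs, equal weighting, daily rebalancing), not the authors' exact pipeline; the function name and parameters are ours.

```python
import numpy as np

def long_short_sharpe(preds, returns, quantile=0.1, periods_per_year=252):
    """Annualized Sharpe ratio of a decile long-short portfolio.

    preds, returns: (T, N) arrays of per-stock model scores and realized
    returns. Each period, go long the top `quantile` of stocks by score
    and short the bottom `quantile`, equally weighted within each leg.
    """
    T, N = preds.shape
    k = max(1, int(N * quantile))
    daily = np.empty(T)
    for t in range(T):
        order = np.argsort(preds[t])             # ascending by score
        short_leg = returns[t, order[:k]].mean()  # lowest-scored stocks
        long_leg = returns[t, order[-k:]].mean()  # highest-scored stocks
        daily[t] = long_leg - short_leg
    return np.sqrt(periods_per_year) * daily.mean() / daily.std()
```

With perfect foresight (scores equal to realized returns) the statistic is strongly positive; a score series anti-correlated with returns drives it negative, which is a quick sanity check on any implementation of this metric.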


GigaBrain-0: A World Model-Powered Vision-Language-Action Model

GigaBrain Team, Ye, Angen, Wang, Boyuan, Ni, Chaojun, Huang, Guan, Zhao, Guosheng, Li, Haoyun, Li, Jie, Zhu, Jiagang, Feng, Lv, Li, Peng, Deng, Qiuping, Ouyang, Runqi, Qin, Wenkang, Chen, Xinze, Wang, Xiaofeng, Wang, Yang, Li, Yifan, Li, Yilong, Ding, Yiran, Xu, Yuan, Ye, Yun, Zhou, Yukun, Dong, Zhehao, Wang, Zhenan, Liu, Zhichao, Zhu, Zheng

arXiv.org Artificial Intelligence

Training Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of physical data collection severely limits the scalability and generalization capacity of current VLA systems. To address this challenge, we introduce GigaBrain-0, a novel VLA foundation model empowered by world model-generated data (e.g., video generation, real2real transfer, human transfer, view transfer, and sim2real transfer data). By leveraging world models to generate diverse data at scale, GigaBrain-0 significantly reduces reliance on real robot data while improving cross-task generalization. Our approach further improves policy robustness through RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, enabling the model to reason about spatial geometry, object states, and long-horizon dependencies during task execution. This leads to substantial gains in real-world performance on dexterous, long-horizon, and mobile manipulation tasks. Extensive experiments demonstrate that GigaBrain-0 achieves superior generalization across variations in appearance (e.g., textures, colors), object placement, and camera viewpoint. Additionally, we present GigaBrain-0-Small, an optimized lightweight variant designed to run efficiently on devices such as the NVIDIA Jetson AGX Orin.


What Shape Is Optimal for Masks in Text Removal?

Nakada, Hyakka, Kubota, Marika

arXiv.org Artificial Intelligence

The advent of generative models has dramatically improved the accuracy of image inpainting. In particular, reconstructing original document images by removing specific text is extremely important for industrial applications. However, most existing text-removal methods focus on deleting simple scene text that appears in images captured by a camera in an outdoor environment; little research addresses complex, practical images with dense text. We therefore created benchmark data for text removal from images containing large amounts of text. From these data, we found that text-removal performance is sensitive to perturbations of the mask profile; for practical text-removal tasks, precise tuning of the mask shape is therefore essential. This study develops a method to model highly flexible mask profiles and learn their parameters using Bayesian optimization. The resulting profiles were found to be character-wise masks. We also found that the minimum cover of a text region is not optimal. Our research is expected to pave the way toward a user-friendly guideline for manual masking.
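The two ingredients the abstract names, a parameterized character-wise mask and a black-box search over its shape parameters, can be sketched as follows. The box-plus-padding parameterization, the function names, and the grid search (a simple stand-in for the paper's Bayesian optimization) are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def char_mask(boxes, pad, shape):
    """Character-wise mask: each character box dilated by `pad` pixels.

    boxes: list of (x0, y0, x1, y1) character bounding boxes (hypothetical
    detector output); pad: the shape parameter being tuned; shape: (H, W).
    pad = 0 is the minimum cover of the text region.
    """
    mask = np.zeros(shape, dtype=bool)
    for x0, y0, x1, y1 in boxes:
        mask[max(0, y0 - pad):y1 + pad, max(0, x0 - pad):x1 + pad] = True
    return mask

def tune_pad(boxes, shape, score, candidates=range(8)):
    """Pick the padding that maximizes a black-box inpainting-quality
    score; a grid-search stand-in for the Bayesian optimization loop."""
    return max(candidates, key=lambda p: score(char_mask(boxes, p, shape)))
```

In the paper the score would come from evaluating the inpainting result; here any callable over the mask works, which is what makes the search black-box.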


IndEgo: A Dataset of Industrial Scenarios and Collaborative Work for Egocentric Assistants

Chavan, Vivek, Imgrund, Yasmina, Dao, Tung, Bai, Sanwantri, Wang, Bosong, Lu, Ze, Heimann, Oliver, Krüger, Jörg

arXiv.org Artificial Intelligence

We introduce IndEgo, a multimodal egocentric and exocentric dataset covering common industrial tasks, including assembly/disassembly, logistics and organisation, inspection and repair, woodworking, and others. The dataset contains 3,460 egocentric recordings (approximately 197 hours) and 1,092 exocentric recordings (approximately 97 hours). A key focus of the dataset is collaborative work, in which two workers jointly perform cognitively and physically intensive tasks. The egocentric recordings include rich multimodal data and added context via eye gaze, narration, sound, motion, and more. We provide detailed annotations (actions, summaries, mistake annotations, narrations), metadata, processed outputs (eye gaze, hand pose, semi-dense point cloud), and benchmarks on procedural and non-procedural task understanding, Mistake Detection, and reasoning-based Question Answering. Baseline evaluations on Mistake Detection, Question Answering, and collaborative task understanding show that the dataset presents a challenge for state-of-the-art multimodal models. Our dataset is available at: https://huggingface.co/datasets/FraunhoferIPK/IndEgo


A Proof of Lemma 4.4

Neural Information Processing Systems

A.1 The Feldman-Langberg framework. We first give the definition of a query space and the corresponding coresets. Specifically, if u(x) = 1 for all x ∈ X, we write (X, P, f) for simplicity. Due to the separability of f, we have the following coreset definition. Then, by Definition A.2, Lemma 4.4 states that … Now we are ready to give the Feldman-Langberg framework. We also introduce a notion that measures the combinatorial complexity of a query space.
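The importance-sampling construction at the heart of the Feldman-Langberg framework can be sketched in a few lines: sample each point with probability proportional to an upper bound on its sensitivity, and reweight so the coreset cost is an unbiased estimate of the full cost. The function name and the assumption that sensitivity bounds are given as input are ours, not the paper's.

```python
import numpy as np

def sensitivity_coreset(X, sens, m, rng=None):
    """Feldman-Langberg-style importance-sampling coreset.

    X: (n, d) points; sens: per-point sensitivity upper bounds;
    m: coreset size. Points are drawn i.i.d. with probability
    proportional to sensitivity, then weighted by 1 / (m * p) so that
    E[sum_i w_i f(x_i)] = sum_x f(x) for any query cost f.
    """
    rng = np.random.default_rng(rng)
    p = sens / sens.sum()
    idx = rng.choice(len(X), size=m, replace=True, p=p)
    weights = 1.0 / (m * p[idx])   # inverse-probability weights
    return X[idx], weights
```

With uniform sensitivities every weight equals n/m and the weights sum to n, recovering plain uniform subsampling as a special case.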


aa1f5f73327ba40d47ebce155e785aaf-AuthorFeedback.pdf

Neural Information Processing Systems

We would like to thank all the reviewers for their thoughtful comments and their enthusiasm for our work. These results are consistent with those of Zoltowski et al. [2020], where they found Laplace EM compared … Section 3: segmenting the continuous latent states for each population (which is equivalent to imposing hard constraints …). On top of that, the "sticky" parameterization of discrete state transitions reveals which neural populations … C. elegans offers an illustrative demonstration of the mp-srSLDS; for example, we explore interactions between ganglia in Appendix C. Thanks again for spending the time to provide valuable feedback on our work.



Scalable Quantum State Preparation via Large-Language-Model-Driven Discovery

Cao, Qing-Hong, Hou, Zong-Yue, Li, Ying-Ying, Liu, Xiaohui, Song, Zhuo-Yang, Zhang, Liang-Qi, Zhang, Shutao, Zhao, Ke

arXiv.org Artificial Intelligence

Efficient quantum state preparation remains a central challenge in first-principles quantum simulations of dynamics in quantum field theories, where the Hilbert space is intrinsically infinite-dimensional. Here, we introduce a large language model (LLM)-assisted framework for quantum-circuit design that systematically scales state-preparation circuits to large lattice volumes. Applied to a 1+1d XY spin chain, the LLM autonomously discovers a compact 4-parameter circuit that captures boundary-induced symmetry breaking with sub-percent energy deviation, enabling successful validation on the \texttt{Zuchongzhi} quantum processor. Guided by this insight, we extend the framework to 2+1d quantum field theories, where scalable variational ansätze have remained elusive. For a scalar field theory, the search yields a symmetry-preserving, 3-parameter shallow-depth ansatz whose optimized parameters converge to size-independent constants for lattices $n \ge 4$, providing, to our knowledge, the first scalable ansatz for this class of 2+1d models. Our results establish a practical route toward AI-assisted, human-guided discovery in quantum simulation.
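The variational pattern the abstract describes (a shallow parameterized circuit whose parameters are tuned to minimize the lattice Hamiltonian's energy) can be illustrated on a two-site XY chain small enough to simulate with dense matrices. The one-parameter ansatz, the gate choice, and the scan over parameters are a toy sketch, not the 4-parameter circuit the LLM discovered.

```python
import numpy as np

I = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Y = np.array([[0.0, -1j], [1j, 0.0]])

# Two-site open XY chain: H = X1 X2 + Y1 Y2 (ground energy -2).
H = np.kron(X, X) + np.real(np.kron(Y, Y))

def ansatz(theta):
    """One-parameter circuit sketch: start in |01> and apply a Givens
    rotation mixing |01> and |10> (a single entangling gate)."""
    psi = np.zeros(4)
    psi[1] = 1.0                      # |01>
    G = np.eye(4)
    G[1, 1] = G[2, 2] = np.cos(theta)
    G[1, 2], G[2, 1] = -np.sin(theta), np.sin(theta)
    return G @ psi

def energy(theta):
    """Variational energy <psi(theta)|H|psi(theta)>."""
    psi = ansatz(theta)
    return float(psi @ H @ psi)

# Classical outer loop: scan the parameter and keep the minimizer.
best = min(np.linspace(0, np.pi, 201), key=energy)
```

The scan recovers the exact ground energy because the singlet lies in the span of the ansatz; in the paper's setting this classical outer loop is replaced by optimization over the few circuit parameters, run at increasing lattice sizes.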


Artificial Intelligence in Elementary STEM Education: A Systematic Review of Current Applications and Future Challenges

Memari, Majid, Ruggles, Krista

arXiv.org Artificial Intelligence

Artificial intelligence (AI) is transforming elementary STEM education, yet evidence remains fragmented. This systematic review synthesizes 258 studies (2020-2025) examining AI applications across eight categories: intelligent tutoring systems (45% of studies), learning analytics (18%), automated assessment (12%), computer vision (8%), educational robotics (7%), multimodal sensing (6%), AI-enhanced extended reality (XR) (4%), and adaptive content generation. The analysis shows that most studies focus on upper elementary grades (65%) and mathematics (38%), with limited cross-disciplinary STEM integration (15%). While conversational AI demonstrates moderate effectiveness (d = 0.45-0.70 where reported), only 34% of studies include standardized effect sizes. Eight major gaps limit real-world impact: fragmented ecosystems, developmental inappropriateness, infrastructure barriers, lack of privacy frameworks, weak STEM integration, equity disparities, teacher marginalization, and narrow assessment scopes. Geographic distribution is also uneven, with 90% of studies originating from North America, East Asia, and Europe. Future directions call for interoperable architectures that support authentic STEM integration, grade-appropriate design, privacy-preserving analytics, and teacher-centered implementations that enhance rather than replace human expertise.
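The standardized effect size the review reports (Cohen's d, with d = 0.45-0.70 for conversational AI) is computed from group means and a pooled standard deviation. A minimal sketch, assuming the conventional pooled-variance definition:

```python
import numpy as np

def cohens_d(treatment, control):
    """Cohen's d: standardized mean difference with pooled SD.

    treatment, control: 1-D sequences of outcome scores for the two
    groups (e.g., AI-tutored vs. conventionally taught students).
    """
    t = np.asarray(treatment, dtype=float)
    c = np.asarray(control, dtype=float)
    nt, nc = len(t), len(c)
    pooled_sd = np.sqrt(((nt - 1) * t.var(ddof=1) + (nc - 1) * c.var(ddof=1))
                        / (nt + nc - 2))
    return (t.mean() - c.mean()) / pooled_sd
```

Reporting d alongside raw scores is what the review finds missing in 66% of studies; the formula itself is standard and makes results comparable across instruments and grade levels.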