Supplementary Material: Information Geometry of the Retinal Representation Manifold
Ding, Xuehao
Further experimental details are described in Ref. [4]. Each spatiotemporal stimulus spanned 400 ms, corresponding to the retinal integration timescale.
Figure 1: (a) The log-likelihood of the empirical data for each PMF, averaged over cells. The black line is the identity line. The central 20×20 arrays are shown.
Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation
We propose Differentiable Temporal Logic (DTL), a model-agnostic framework that introduces temporal constraints to deep networks. DTL treats the outputs of a network as a truth assignment of a temporal logic formula, and computes a temporal logic loss reflecting the consistency between the output and the constraints.
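As a hedged illustration of how such a loss can be computed, here is a minimal PyTorch sketch of one possible soft-logic semantics (truth values in [0, 1], "and" as product, "eventually" as a running max); the operator choices and the example constraint "action a never occurs after action b" are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def ordering_violation_loss(probs: torch.Tensor, a: int, b: int) -> torch.Tensor:
    """Soft truth value in [0, 1] of the *violating* formula
    "eventually(b and eventually-after(a))", i.e. action `a`
    occurring at some time strictly after action `b`.

    probs: (T, num_actions) per-frame class probabilities.
    """
    p_a, p_b = probs[:, a], probs[:, b]
    # future_a[t] = soft truth of "a occurs at some time >= t"
    # (a reversed running max implements "eventually" under max semantics)
    future_a = torch.flip(torch.cummax(torch.flip(p_a, [0]), dim=0).values, [0])
    # "b at time t AND a at some later time", maximized over t
    return (p_b[:-1] * future_a[1:]).max()
```

In training, this scalar would be added to the segmentation loss as `task_loss + lam * ordering_violation_loss(probs, a, b)`, with `lam` a hypothetical weighting coefficient.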
Quantum Temporal Convolutional Neural Networks for Cross-Sectional Equity Return Prediction: A Comparative Benchmark Study
Chen, Chi-Sheng, Zhang, Xinyu, Fu, Rong, Xie, Qiuzhe, Zhang, Fan
Quantum machine learning offers a promising pathway for enhancing stock market prediction, particularly under complex, noisy, and highly dynamic financial environments. However, many classical forecasting models struggle with noisy input, regime shifts, and limited generalization capacity. To address these challenges, we propose a Quantum Temporal Convolutional Neural Network (QTCNN) that combines a classical temporal encoder with parameter-efficient quantum convolution circuits for cross-sectional equity return prediction. The temporal encoder extracts multi-scale patterns from sequential technical indicators, while the quantum processing leverages superposition and entanglement to enhance feature representation and suppress overfitting. We conduct a comprehensive benchmarking study on the JPX Tokyo Stock Exchange dataset and evaluate predictions through long-short portfolio construction, using the out-of-sample Sharpe ratio as the primary performance metric. QTCNN achieves a Sharpe ratio of 0.538, outperforming the best classical baseline by approximately 72%. These results highlight the practical potential of the quantum-enhanced QTCNN for robust decision-making in quantitative finance.
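As a rough sketch of the stated evaluation protocol (long-short portfolio construction scored by out-of-sample Sharpe ratio), the snippet below forms an equal-weight decile long-short portfolio from daily cross-sectional predictions; the decile cutoff and the 252-trading-day annualization are assumptions, since the paper's exact construction is not given here.

```python
import numpy as np
import pandas as pd

def long_short_sharpe(pred: pd.DataFrame, realized: pd.DataFrame,
                      quantile: float = 0.1) -> float:
    """Annualized Sharpe ratio of a daily equal-weight long-short portfolio.

    pred, realized: (date x ticker) frames of predicted and realized returns.
    """
    daily = []
    for date in pred.index:
        p = pred.loc[date].dropna()
        r = realized.loc[date]
        lo, hi = p.quantile(quantile), p.quantile(1 - quantile)
        long_leg = r[p[p >= hi].index].mean()   # long the top decile
        short_leg = r[p[p <= lo].index].mean()  # short the bottom decile
        daily.append(long_leg - short_leg)
    daily = np.asarray(daily)
    return float(daily.mean() / daily.std() * np.sqrt(252))  # ~252 trading days
```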
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.25)
- Asia > Taiwan (0.05)
- North America > United States > Idaho > Ada County > Boise (0.04)
- (12 more...)
- Health & Medicine (1.00)
- Banking & Finance > Trading (1.00)
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
GigaBrain Team, Ye, Angen, Wang, Boyuan, Ni, Chaojun, Huang, Guan, Zhao, Guosheng, Li, Haoyun, Li, Jie, Zhu, Jiagang, Feng, Lv, Li, Peng, Deng, Qiuping, Ouyang, Runqi, Qin, Wenkang, Chen, Xinze, Wang, Xiaofeng, Wang, Yang, Li, Yifan, Li, Yilong, Ding, Yiran, Xu, Yuan, Ye, Yun, Zhou, Yukun, Dong, Zhehao, Wang, Zhenan, Liu, Zhichao, Zhu, Zheng
Training Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of physical data collection severely limits the scalability and generalization capacity of current VLA systems. To address this challenge, we introduce GigaBrain-0, a novel VLA foundation model empowered by world-model-generated data (e.g., video generation, real2real transfer, human transfer, view transfer, and sim2real transfer data). By leveraging world models to generate diverse data at scale, GigaBrain-0 significantly reduces reliance on real robot data while improving cross-task generalization. Our approach further improves policy robustness through RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, enabling the model to reason about spatial geometry, object states, and long-horizon dependencies during task execution. This leads to substantial gains in real-world performance on dexterous, long-horizon, and mobile manipulation tasks. Extensive experiments demonstrate that GigaBrain-0 achieves superior generalization across variations in appearance (e.g., textures, colors), object placements, and camera viewpoints. Additionally, we present GigaBrain-0-Small, an optimized lightweight variant designed to run efficiently on devices such as the NVIDIA Jetson AGX Orin.
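The abstract does not specify how real and world-model-generated data are combined during training; as a loudly hypothetical sketch of the general idea (reducing reliance on real robot data by mixing in generated episodes), one might sample training batches at a fixed ratio:

```python
import random

def mixed_batch(real_episodes, generated_episodes, batch_size=32, real_frac=0.25):
    """Sample a batch that is `real_frac` real data, the rest generated.

    The 25% default ratio and the flat episode lists are illustrative
    assumptions, not details taken from the GigaBrain-0 pipeline.
    """
    n_real = max(1, int(batch_size * real_frac))
    batch = random.sample(real_episodes, k=n_real)           # scarce real data
    batch += random.choices(generated_episodes,
                            k=batch_size - n_real)           # abundant generated data
    random.shuffle(batch)
    return batch
```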
What Shape Is Optimal for Masks in Text Removal?
Nakada, Hyakka, Kubota, Marika
The advent of generative models has dramatically improved the accuracy of image inpainting. In particular, removing specific text from document images to reconstruct the original image is extremely important for industrial applications. However, most existing text-removal methods focus on deleting simple scene text captured by a camera in an outdoor environment, and there is little research dedicated to complex, practical images with dense text. We therefore created benchmark data for text removal from images containing a large amount of text. From these data, we found that text-removal performance is vulnerable to perturbations of the mask profile, so for practical text-removal tasks, precise tuning of the mask shape is essential. This study developed a method to model highly flexible mask profiles and learn their parameters using Bayesian optimization. The resulting profiles were found to be character-wise masks. It was also found that the minimum cover of a text region is not optimal. Our research is expected to pave the way toward a user-friendly guideline for manual masking.
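As an illustrative sketch of the described approach (learning mask-shape parameters with Bayesian optimization), the snippet below uses scikit-optimize's gp_minimize over a hypothetical per-character mask parameterization; the parameter names, ranges, and scoring function are assumptions, not the paper's actual profile model.

```python
from skopt import gp_minimize          # generic Bayesian-optimization engine
from skopt.space import Integer, Real

def tune_mask(score_fn, n_calls=50):
    """score_fn(pad_x, pad_y, rounding) -> text-removal error (lower is better).

    The three parameters (horizontal/vertical padding around each character
    and a corner-rounding factor) are a hypothetical mask parameterization.
    """
    result = gp_minimize(
        lambda params: score_fn(*params),
        dimensions=[Integer(0, 10, name="pad_x"),
                    Integer(0, 10, name="pad_y"),
                    Real(0.0, 1.0, name="rounding")],
        n_calls=n_calls,
        random_state=0,
    )
    return result.x, result.fun  # best parameters and their score
```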
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (12 more...)
IndEgo: A Dataset of Industrial Scenarios and Collaborative Work for Egocentric Assistants
Chavan, Vivek, Imgrund, Yasmina, Dao, Tung, Bai, Sanwantri, Wang, Bosong, Lu, Ze, Heimann, Oliver, Krüger, Jörg
We introduce IndEgo, a multimodal egocentric and exocentric dataset addressing common industrial tasks, including assembly/disassembly, logistics and organisation, inspection and repair, woodworking, and others. The dataset contains 3,460 egocentric recordings (approximately 197 hours), along with 1,092 exocentric recordings (approximately 97 hours). A key focus of the dataset is collaborative work, where two workers jointly perform cognitively and physically intensive tasks. The egocentric recordings include rich multimodal data and added context via eye gaze, narration, sound, motion, and more. We provide detailed annotations (actions, summaries, mistake annotations, narrations), metadata, processed outputs (eye gaze, hand pose, semi-dense point cloud), and benchmarks on procedural and non-procedural task understanding, mistake detection, and reasoning-based question answering. Baseline evaluations for mistake detection, question answering, and collaborative task understanding show that the dataset presents a challenge for state-of-the-art multimodal models. Our dataset is available at: https://huggingface.co/datasets/FraunhoferIPK/IndEgo
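A minimal usage sketch for the dataset link above, via the Hugging Face datasets library; the available configurations, splits, and feature names are assumptions to be checked against the dataset card.

```python
from datasets import load_dataset

# Repo id taken from the abstract; configuration/split names should be
# verified on the dataset card, as they are not specified here.
ds = load_dataset("FraunhoferIPK/IndEgo")
print(ds)  # inspect the available splits and their features
```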
- North America > United States > Kentucky > Butler County (0.04)
- Europe > Switzerland (0.04)
- Europe > Monaco (0.04)
- (2 more...)
- Workflow (1.00)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
- Information Technology > Security & Privacy (0.67)
- Health & Medicine (0.67)
aa1f5f73327ba40d47ebce155e785aaf-AuthorFeedback.pdf
We would like to thank all the reviewers for their thoughtful comments and their enthusiasm for our work. These results are consistent with those of Zoltowski et al. [2020], where they found that Laplace EM compared [...] (Section 3). Segmenting the continuous latent states for each population (which is equivalent to imposing hard constraints [...]). On top of that, the "sticky" parameterization of discrete state transitions reveals which neural populations [...]. C. elegans offers an illustrative demonstration of the mp-srSLDS; for example, we explore interactions between ganglia in Appendix C. Thanks again for taking the time to provide valuable feedback on our work.