 Kantō


RAM-W600: A Multi-Task Wrist Dataset and Benchmark for Rheumatoid Arthritis

Neural Information Processing Systems

Rheumatoid arthritis (RA) is a common autoimmune disease that has been the focus of research in computer-aided diagnosis (CAD) and disease monitoring. In clinical settings, conventional radiography (CR) is widely used for the screening and evaluation of RA due to its low cost and accessibility. The wrist is a critical region for the diagnosis of RA. However, CAD research in this area remains limited, primarily due to the challenges in acquiring high-quality instance-level annotations.


Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update

Neural Information Processing Systems

We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a nonlinear link function, thereby modeling a broad class of reward distributions such as Bernoulli and Poisson. While GLBs are widely applicable to real-world scenarios, their nonlinear nature introduces significant challenges in achieving both computational and statistical efficiency. Existing methods typically trade off between two objectives, either incurring high per-round costs for optimal regret guarantees or compromising statistical efficiency to enable constant-time updates. In this paper, we propose a jointly efficient algorithm that attains a nearly optimal regret bound with O(1) time and space complexities per round. The core of our method is a tight confidence set for the online mirror descent (OMD) estimator, which is derived through a novel analysis that leverages the notion of mix loss from online prediction. The analysis shows that our OMD estimator, even with its one-pass updates, achieves statistical efficiency comparable to maximum likelihood estimation, thereby leading to a jointly efficient optimistic method.


Gaussian Process Upper Confidence Bound Achieves Nearly-Optimal Regret in Noise-Free Gaussian Process Bandits

Neural Information Processing Systems

We study the noise-free Gaussian Process (GP) bandit problem, in which a learner seeks to minimize regret through noise-free observations of a black-box objective function that lies in a known reproducing kernel Hilbert space (RKHS). The Gaussian Process Upper Confidence Bound (GP-UCB) algorithm is a well-known approach for GP bandits, where query points are adaptively selected based on the GP-based upper confidence bound score. While several existing works have reported the practical success of GP-UCB, its theoretical performance remains suboptimal. However, GP-UCB often empirically outperforms other nearly-optimal noise-free algorithms that use non-adaptive sampling schemes. This paper resolves the gap between theoretical and empirical performance by establishing a nearly-optimal regret upper bound for noise-free GP-UCB. Specifically, our analysis provides the first constant cumulative regret bounds in the noise-free setting for both the squared exponential kernel and the Matérn kernel with some degree of smoothness.
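For readers unfamiliar with GP-UCB, the following is a minimal sketch of the generic selection rule on a finite candidate grid with noise-free observations. It is not the paper's analysis or implementation: the squared exponential lengthscale, the fixed exploration weight `beta`, the small `jitter` term (added only for numerical stability), and the toy objective are all assumptions chosen for illustration.

```python
import numpy as np

def se_kernel(a, b, lengthscale=0.2):
    """Squared exponential kernel between 1-D point arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_ucb(f, candidates, n_rounds=15, beta=2.0, jitter=1e-6):
    """Run a generic GP-UCB loop on a finite grid (illustrative sketch)."""
    xs = [candidates[len(candidates) // 2]]  # arbitrary first query (assumption)
    ys = [f(xs[0])]
    for _ in range(n_rounds - 1):
        X, y = np.array(xs), np.array(ys)
        # Gram matrix of queried points; jitter keeps the solve well-posed.
        K = se_kernel(X, X) + jitter * np.eye(len(X))
        k_star = se_kernel(candidates, X)
        # GP posterior mean and variance at every candidate (prior variance 1).
        mean = k_star @ np.linalg.solve(K, y)
        var = 1.0 - np.einsum("ij,ji->i", k_star, np.linalg.solve(K, k_star.T))
        std = np.sqrt(np.clip(var, 0.0, None))
        # Query the candidate with the largest upper confidence bound.
        x_next = candidates[np.argmax(mean + beta * std)]
        xs.append(x_next)
        ys.append(f(x_next))
    return np.array(xs), np.array(ys)

def toy_objective(x):
    """Hypothetical smooth objective with its maximum at x = 0.3."""
    return -(x - 0.3) ** 2

grid = np.linspace(0.0, 1.0, 101)
queried_x, observed_y = gp_ucb(toy_objective, grid)
```

The adaptive choice of `x_next` from the current posterior is what distinguishes GP-UCB from the non-adaptive sampling schemes the abstract contrasts it with.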


PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement

Neural Information Processing Systems

A penguin is standing on the lawn, with a giraffe behind it. A young man stands in front of the Statue of Liberty. A man in a tuxedo stands beside Tokyo Tower. A man is standing next to a traditional Japanese lantern. A woman is looking at a small, fluffy dog.


Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation

Neural Information Processing Systems

We study the reinforcement learning (RL) problem in a constrained Markov decision process (CMDP), where an agent explores the environment to maximize the expected cumulative reward while satisfying a single constraint on the expected total utility value in every episode. While this problem is well understood in the tabular setting, theoretical results for function approximation remain scarce. This paper closes the gap by proposing an RL algorithm for linear CMDPs that achieves $\tilde{O}(\sqrt{K})$ regret with an episode-wise zero-violation guarantee. Furthermore, our method is computationally efficient, scaling polynomially with problem-dependent parameters while remaining independent of the state space size. Our results significantly improve upon recent linear CMDP algorithms, which either violate the constraint or incur exponential computational costs.


Self Iterative Label Refinement via Robust Unlabeled Learning

Neural Information Processing Systems

Recent advances in large language models (LLMs) have yielded impressive performance on various tasks, yet they often depend on high-quality feedback that can be costly. Self-refinement methods attempt to leverage LLMs' internal evaluation mechanisms with minimal human supervision; however, these approaches frequently suffer from inherent biases and overconfidence, especially in domains where the models lack sufficient internal knowledge, resulting in performance degradation. As an initial step toward enhancing self-refinement for broader applications, we introduce an iterative refinement pipeline that employs the Unlabeled-Unlabeled learning framework to improve LLM-generated pseudo-labels for classification tasks.


Bandit and Delayed Feedback in Online Structured Prediction

Neural Information Processing Systems

Online structured prediction is a task of sequentially predicting outputs with complex structures based on inputs and past observations, encompassing online classification. Recent studies showed that in the full-information setting, we can achieve finite bounds on the surrogate regret, i.e., the extra target loss relative to the best possible surrogate loss. In practice, however, full-information feedback is often unrealistic as it requires immediate access to the whole structure of complex outputs. Motivated by this, we propose algorithms that work with less demanding feedback, bandit and delayed feedback. For bandit feedback, by using a standard inverse-weighted gradient estimator, we achieve a surrogate regret bound of $O(\sqrt{KT})$ for the time horizon T and the size of the output set K. However, K can be extremely large when outputs are highly complex, resulting in an undesirable bound. To address this issue, we propose another algorithm that achieves a surrogate regret bound of $O(T^{2/3})$, which is independent of K. This is achieved with a carefully designed pseudo-inverse matrix estimator. Furthermore, we numerically compare the performance of these algorithms, as well as existing ones. Regarding delayed feedback, we provide algorithms and regret analyses that cover various scenarios, including full-information and bandit feedback, as well as fixed and variable delays.
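As a rough illustration of the inverse-weighting idea mentioned for bandit feedback, the sketch below uses the simplest multiclass case, where the learner only learns whether its sampled guess was correct. It is a generic importance-weighted estimate of the one-hot label, not the paper's estimator; the function names and the example distribution are hypothetical.

```python
import numpy as np

def inverse_weighted_label_estimate(sampled, true_label, probs):
    """Estimate the one-hot label from a single bit of bandit feedback."""
    est = np.zeros_like(probs)
    if sampled == true_label:  # the only information the learner observes
        # Reweight by the inverse sampling probability to remove the bias.
        est[sampled] = 1.0 / probs[sampled]
    return est

def expected_estimate(true_label, probs):
    """Exact expectation of the estimate over the sampling distribution."""
    total = np.zeros_like(probs)
    for k in range(len(probs)):
        total += probs[k] * inverse_weighted_label_estimate(k, true_label, probs)
    return total

probs = np.array([0.5, 0.3, 0.2])  # hypothetical sampling distribution over K = 3 outputs
mean_estimate = expected_estimate(true_label=1, probs=probs)
```

In expectation the estimate equals the true one-hot label, but its magnitude scales with `1 / probs[k]`, which hints at why bounds built on this kind of estimator can depend on the output set size K.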


Continuous Thought Machines

Neural Information Processing Systems

Biological brains demonstrate complex neural activity, where neural dynamics are critical to how brains process information. Most artificial neural networks ignore the complexity of individual neurons.


The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control

Neural Information Processing Systems

We present The Matrix, a foundational realistic world simulator capable of generating infinitely long 720p high-fidelity real-scene video streams with real-time, responsive control in both first- and third-person perspectives. Trained on limited supervised data from video games like Forza Horizon 5 and Cyberpunk 2077, complemented by large-scale unsupervised footage from real-world settings like Tokyo streets, The Matrix allows users to traverse diverse terrains--deserts, grasslands, water bodies, and urban landscapes--in continuous, uncut hour-long sequences. With speeds of up to 16 FPS, the system supports real-time interactivity and demonstrates zero-shot generalization, translating virtual game environments to real-world contexts where collecting continuous movement data is often infeasible. For example, The Matrix can simulate a BMW X3 driving through an office setting--an environment present in neither gaming data nor real-world sources. This approach showcases the potential of game data to advance robust world models, bridging the gap between simulations and real-world applications in scenarios with limited data.


Teenagers in Tokyo allegedly used ChatGPT to decide extortion amount in assault case

The Japan Times

A group of high school students arrested over allegedly trying to extort money from a boy in western Tokyo may have used ChatGPT to decide how much to demand, police said. A group of high school students in Tokyo arrested over allegedly assaulting a boy and trying to extort money from him may have used ChatGPT to decide how much to demand, media reports have recently revealed. Five teenagers, including a 17-year-old girl and four boys ranging in age from 16 to 17, were arrested in January over the alleged assault and attempted extortion of a 17-year-old high school student in the city of Hachioji in western Tokyo, according to the Metropolitan Police Department. Police said the suspects assaulted the boy in a plaza in Hachioji's Shiroyamate district, breaking his nose and causing other injuries, before allegedly trying to extort ¥150,000 ($935) from him. The girl, who was the victim's ex-girlfriend, allegedly first confronted him, accusing him of touching her younger sister's leg. She then challenged him, saying, "Give me the money or fight me one-on-one," according to reports by Fuji TV.