Goto

 Rail


Japanese railway firms adopt AI safety systems at crossings

The Japan Times

An artificial intelligence-equipped camera system shows a person trapped inside a railway crossing gate during a test by Kintetsu Railway. A growing number of Japanese railway operators are introducing artificial intelligence (AI)-based systems to help prevent accidents involving trains at level crossings. The technology can automatically detect and report abnormalities, such as stalled vehicles or people trapped on the tracks, enabling train drivers and other railway staff to respond more quickly. Industry officials view AI as an effective tool for improving crossing safety, while the government has started offering financial support to encourage wider adoption. Kintetsu Railway, based in the city of Osaka, has tested an AI-equipped camera system at a crossing on the Kyoto Line in the town of Seika, Kyoto Prefecture.


On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models

Neural Information Processing Systems

Large vision-language models (LVLMs), which integrate a vision encoder (VE) with a large language model, have achieved remarkable success across various tasks. However, crucial challenges remain, such as object hallucination, that is, generating descriptions of objects that are not in the input image. Here, we argue that uncertain visual tokens within the VE are a key factor contributing to object hallucination. Our statistical analysis found positive correlations between visual tokens with high epistemic uncertainty and the occurrence of hallucinations. Furthermore, we show theoretically and empirically that visual tokens in early VE layers that exhibit large representation deviations under small adversarial perturbations indicate high epistemic uncertainty. Based on these findings, we propose a simple yet effective strategy to mitigate object hallucination by modifying only the VE. Our method comprises a proxy that uses adversarial perturbations to identify uncertain visual tokens efficiently, and a method that masks these uncertain visual tokens during self-attention in the middle layers of the VE, suppressing their influence on visual encoding and thus alleviating hallucinations. Extensive experiments show that our method significantly reduces object hallucinations in LVLMs and works synergistically with prior methods.
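The two steps described above (flag tokens whose early-layer features move most under perturbation, then block attention to them in a later layer) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the clean and adversarially perturbed early-layer features are already computed, and the top-`ratio` selection rule, the key-masking scheme, and the function names `token_deviation`, `uncertain_tokens`, and `masked_self_attention` are hypothetical. It also uses a single attention head with identity projections for brevity.

```python
import math


def token_deviation(clean, perturbed):
    """Per-token L2 distance between clean and perturbed early-layer features."""
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(c, p)))
        for c, p in zip(clean, perturbed)
    ]


def uncertain_tokens(clean, perturbed, ratio=0.25):
    """Indices of the top `ratio` fraction of tokens by deviation (assumed selection rule)."""
    dev = token_deviation(clean, perturbed)
    k = max(1, int(len(dev) * ratio))
    ranked = sorted(range(len(dev)), key=lambda i: dev[i], reverse=True)
    return set(ranked[:k])


def masked_self_attention(x, masked):
    """Single-head self-attention (Q = K = V = x) in which no query attends to a masked key."""
    if all(j in masked for j in range(len(x))):
        raise ValueError("at least one token must remain unmasked")
    d = len(x[0])
    out = []
    for q in x:
        # Scaled dot-product logits; masked keys get -inf, so their softmax weight is exactly 0.
        logits = [
            -math.inf if j in masked else sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for j, k in enumerate(x)
        ]
        m = max(logits)  # subtract the max for numerical stability
        w = [math.exp(l - m) for l in logits]
        s = sum(w)
        out.append([sum(wj * v[i] for wj, v in zip(w, x)) / s for i in range(d)])
    return out
```

For example, with four tokens where only token 2 shifts substantially under perturbation, `uncertain_tokens` flags `{2}`, and every query's attention output then equals what it would be if token 2 were absent from the key set. Masking keys, rather than dropping tokens, keeps the sequence length unchanged, so the rest of the encoder sees the same token layout.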


Video shows scene of Bedford train crash as passenger describes aftermath

BBC News

Emergency services are at the scene of a collision involving two trains in the Bedford area, British Transport Police has confirmed. Operator East Midlands Railway has said two of its trains were involved in the crash. Footage taken from the scene shows where the two trains collided and passengers who appear to have been evacuated. Speaking to the BBC, passenger Pete Knapp said the crash felt like [he'd] been in a bomb explosion.


Wide-Horizon Thinking and Simulation-Based Evaluation for Real-World LLM Planning with Multifaceted Constraints

Neural Information Processing Systems

Unlike reasoning, which often entails a deep sequence of deductive steps, complex real-world planning requires synthesizing a broad spectrum of parallel and potentially conflicting information and constraints. Travel planning, for example, requires integrating diverse real-world information with user preferences.


Towards Implicit Aggregation: Robust Image Representation for Place Recognition in the Transformer Era

Neural Information Processing Systems

Visual place recognition (VPR) is typically regarded as a specific image retrieval task whose core lies in representing images as global descriptors. Over the past decade, dominant VPR methods (e.g., NetVLAD) have followed a paradigm that first extracts the patch features (tokens) of the input image with a backbone and then aggregates these patch features into a global descriptor via an aggregator. This backbone-plus-aggregator paradigm achieved overwhelming dominance in the CNN era and remains widely used in transformer-based models. In this paper, however, we argue that a dedicated aggregator is unnecessary in the transformer era: robust global descriptors can be obtained with the backbone alone. Specifically, we introduce a small set of learnable aggregation tokens, which are prepended to the patch tokens before a particular transformer block. All these tokens are jointly processed and interact globally through the intrinsic self-attention mechanism, implicitly aggregating useful information from the patch tokens into the aggregation tokens.
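The idea of prepending aggregation tokens and letting self-attention do the pooling can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's architecture: the attention blocks use identity Q/K/V projections with a residual connection and omit LayerNorm, MLPs, multiple heads, and the pretrained backbone; the readout (concatenate the aggregation-token outputs, then L2-normalize) and the names `self_attention` and `global_descriptor` are hypothetical. In a real model, `agg_tokens` would be learned parameters.

```python
import math


def self_attention(tokens):
    """Single-head self-attention with identity projections (a simplification)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(logits)  # subtract the max for numerical stability
        w = [math.exp(l - m) for l in logits]
        s = sum(w)
        out.append([sum(wj * v[i] for wj, v in zip(w, tokens)) / s for i in range(d)])
    return out


def global_descriptor(patch_tokens, agg_tokens, num_blocks=2):
    """Prepend aggregation tokens, run attention blocks, and read out only the aggregation tokens."""
    x = [list(t) for t in agg_tokens] + [list(t) for t in patch_tokens]
    for _ in range(num_blocks):
        attended = self_attention(x)
        # Residual connection: every token (aggregation and patch) is updated jointly.
        x = [[a + b for a, b in zip(xi, yi)] for xi, yi in zip(x, attended)]
    k = len(agg_tokens)
    desc = [v for tok in x[:k] for v in tok]  # concatenate the aggregation-token outputs
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]  # L2-normalize for nearest-neighbor retrieval
```

With 2 aggregation tokens of dimension 4, the descriptor has 8 entries regardless of how many patch tokens the image produces. Because this sketch has no positional encodings, the readout is also invariant to the order of the patch tokens, which is the property an aggregator is normally responsible for.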


Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Neural Information Processing Systems

Recent advancements in large language models have demonstrated how chain-of-thought (CoT) and reinforcement learning (RL) can improve performance. However, applying such reasoning strategies to the visual generation domain remains largely unexplored. In this paper, we present T2I-R1, a novel reasoning-enhanced text-to-image generation model, powered by RL with a bi-level CoT reasoning process. Specifically, we identify two levels of CoT that can be utilized to enhance different stages of generation: (1) the semantic-level CoT for high-level planning of the prompt and (2) the token-level CoT for low-level pixel processing during patch-by-patch generation. To better coordinate these two levels of CoT, we introduce BiCoT-GRPO with an ensemble of generation rewards, which seamlessly optimizes both generated CoTs within the same training step. By applying our reasoning strategies to the baseline model, Janus-Pro, we achieve superior performance with a 13% improvement on T2I-CompBench and a 19% improvement on the WISE benchmark, even surpassing the state-of-the-art model FLUX.1. All the training code and data are available at https://github.com/CaraJ7/T2I-R1.
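BiCoT-GRPO builds on GRPO, whose core step scores a group of samples generated from the same prompt and normalizes each reward against the group's statistics instead of using a learned value function. The sketch below shows only that advantage computation, with an "ensemble of generation rewards" modeled as a weighted average of several reward scores per sample. The uniform weighting, the use of the population standard deviation, the `eps` value, and the function names are assumptions; the policy-gradient loss, clipping, and KL term are omitted.

```python
import statistics


def ensemble_reward(scores, weights=None):
    """Combine several reward-model scores for one sample (uniform average by default)."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def group_relative_advantages(group_scores, eps=1e-6):
    """GRPO-style advantages: each sample's reward normalized by its group's mean and std."""
    rewards = [ensemble_reward(s) for s in group_scores]
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps guards against identical rewards
    return [(r - mean) / (std + eps) for r in rewards]
```

For a group of four generations from one prompt, each scored by three hypothetical reward models, `group_relative_advantages([[0.9, 0.8, 0.7], [0.2, 0.4, 0.3], [0.5, 0.5, 0.5], [0.6, 0.7, 0.8]])` gives the first sample the largest positive advantage and the second the most negative. In the bi-level setting described above, one plausible reading is that the same per-sample advantage would weight the tokens of both the semantic-level CoT and the generated image tokens, which is what lets both be optimized in one step.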


Lessons from the Original Tech Bubble

The New Yorker

The boom-and-bust cycle has always been a feature of capitalism, and, capturing as it does the human traits of creativity, hope, greed, anxiety, and panic, it always will be. Creativity gives rise to technological progress and transformative inventions, which provide a new driving force for the economy and a focal point for investors. Today, we are living through another speculative boom. This time the transformative invention is, of course, A.I. Take last week's SpaceX I.P.O. Elon Musk's creation is an impressive rocket-and-satellite company, but the stunning $1.78-trillion valuation of the I.P.O. was largely based on its ambitions to build A.I. data centers in space, which remain largely untested.


ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts

Neural Information Processing Systems

Dataset bias, where data points are skewed toward certain concepts, is ubiquitous in machine learning datasets. Yet systematically identifying these biases is challenging without costly, fine-grained attribute annotations. We present ConceptScope, a scalable, automated framework for analyzing visual datasets by discovering and quantifying human-interpretable concepts using sparse autoencoders trained on representations from vision foundation models. ConceptScope categorizes concepts into target, context, and bias types based on their semantic relevance and statistical correlation to class labels, enabling class-level dataset characterization, bias identification, and robustness evaluation through concept-based subgrouping.
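The building block named above, a sparse autoencoder over foundation-model features, can be sketched as follows, together with a per-class average of concept activations that a correlation analysis against class labels could start from. This is a generic, untrained SAE written for illustration, not ConceptScope's code: the dimensions, Gaussian initialization, subtraction of the decoder bias before encoding, L1 coefficient, and the helper `concept_class_means` are assumptions, and the training loop and the target/context/bias categorization are not shown.

```python
import random
from collections import defaultdict


def relu(v):
    return [max(0.0, x) for x in v]


class SparseAutoencoder:
    """Untrained SAE: features -> non-negative concept activations -> reconstruction."""

    def __init__(self, d_model, n_concepts, seed=0):
        rng = random.Random(seed)
        self.W_enc = [[rng.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(n_concepts)]
        self.b_enc = [0.0] * n_concepts
        self.W_dec = [[rng.gauss(0.0, 0.1) for _ in range(n_concepts)] for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = [sum(w * c for w, c in zip(row, centered)) + b
               for row, b in zip(self.W_enc, self.b_enc)]
        return relu(pre)  # sparse, non-negative concept activations

    def decode(self, z):
        return [sum(w * zi for w, zi in zip(row, z)) + b
                for row, b in zip(self.W_dec, self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        z = self.encode(x)
        x_hat = self.decode(z)
        recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        return recon + l1_coeff * sum(z)  # z >= 0, so sum(z) is its L1 norm


def concept_class_means(sae, features, labels):
    """Mean activation of every concept within each class label."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        z = sae.encode(x)
        sums.setdefault(y, [0.0] * len(z))
        sums[y] = [s + v for s, v in zip(sums[y], z)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}
```

A concept whose mean activation is high for one class and near zero for the others is a candidate for further inspection; deciding whether it is a target, context, or bias concept additionally requires the semantic-relevance judgment the abstract mentions.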


Can Americans spell the National Spelling Bee's winning words?

BBC News

The BBC challenged Americans to spell words used in the last three Scripps National Spelling Bee competitions. Shrey Parikh, a 14-year-old, won the competition this year after correctly spelling 32 words in a 90-second lightning-round tiebreaker. He defeated 12-year-old Ishaan Gupta, who spelled 25 words correctly. Parikh won out against 247 spellers aged between nine and 15 competing in the annual contest, taking home a $52,000 (£39,000) cash prize.


The world's first 'hovertrain' could reach speeds of 270 mph in the 1960s

Popular Science

But the futuristic Aérotrain never saw the light of day. This cancelled Mongolian postage stamp shows the Aérotrain Orleans, circa 1979.