ToMA: Token Merge with Attention for Diffusion Models
Lu, Wenbo, Zheng, Shaoyi, Xia, Yuxuan, Wang, Shengjie
Diffusion models excel in high-fidelity image generation but face scalability limits due to transformers' quadratic attention complexity. Plug-and-play token reduction methods like ToMeSD and ToFu reduce FLOPs by merging redundant tokens in generated images but rely on GPU-inefficient operations (e.g., sorting, scattered writes), introducing overheads that negate theoretical speedups when paired with optimized attention implementations (e.g., FlashAttention). To bridge this gap, we propose Token Merge with Attention (ToMA), an off-the-shelf method that redesigns token reduction for GPU-aligned efficiency, with three key contributions: 1) a reformulation of token merge as a submodular optimization problem to select diverse tokens; 2) merge/unmerge as an attention-like linear transformation via GPU-friendly matrix operations; and 3) exploiting latent locality and sequential redundancy (pattern reuse) to minimize overhead. ToMA reduces SDXL/Flux generation latency by 24%/23%, respectively (with DINO $Δ< 0.07$), outperforming prior methods. This work bridges the gap between theoretical and practical efficiency for transformers in diffusion. Code available at https://github.com/WenboLuu/ToMA.
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
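The abstract's second contribution, merge/unmerge as an attention-like linear transformation, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the greedy farthest-point selection stands in for the submodular optimization, and all names and sizes are assumptions. The point is that both merge and unmerge reduce to dense matrix multiplies, which GPUs handle well.

```python
import numpy as np

# Toy sizes (assumed for illustration): N tokens, r merged tokens, d features.
N, r, d = 16, 4, 8
rng = np.random.default_rng(0)
X = rng.standard_normal((N, d))

# Pick r "diverse" anchor tokens. Greedy farthest-point selection is a
# simple stand-in for the paper's submodular token selection.
anchors = [0]
for _ in range(r - 1):
    dists = np.min(
        [np.linalg.norm(X - X[a], axis=1) for a in anchors], axis=0)
    anchors.append(int(np.argmax(dists)))

# Attention-like soft assignment of every token to an anchor.
sim = X @ X[anchors].T                        # (N, r) similarities
W = np.exp(sim - sim.max(1, keepdims=True))   # row-wise softmax
W /= W.sum(1, keepdims=True)

# Merge: weighted average of tokens per anchor -- a single GEMM.
col = W.sum(0)                                # per-anchor weight mass
X_merged = (W.T @ X) / col[:, None]           # (r, d)

# Unmerge: broadcast merged tokens back to all positions -- another GEMM.
X_restored = W @ X_merged                     # (N, d)
```

Because merge and unmerge are plain matrix products, they avoid the sorting and scattered writes that the abstract identifies as the bottleneck in earlier token-merging methods.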
Using cognitive psychology to understand GPT-3
We study GPT-3, a recent large language model, using tools from cognitive psychology. More specifically, we assess GPT-3's decision-making, information search, deliberation, and causal reasoning abilities on a battery of canonical experiments from the literature. We find that much of GPT-3's behavior is impressive: it solves vignette-based tasks as well as or better than human subjects, is able to make decent decisions from descriptions, outperforms humans in a multi-armed bandit task, and shows signatures of model-based reinforcement learning. Yet we also find that small perturbations to vignette-based tasks can lead GPT-3 vastly astray, that it shows no signatures of directed exploration, and that it fails miserably in a causal reasoning task. These results enrich our understanding of current large language models and pave the way for future investigations using tools from cognitive psychology to study increasingly capable and opaque artificial agents.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
In-Depth Analysis of Identification Documents Through Vision AI
After many years abroad, Juan finally had a good reason to go back home to Iloilo in the Philippines. His parents and wife are in the vibrant festival city, and staying with them is his only son, Kiko. The COVID-19 pandemic has hit his workplace, and the country that has been hosting him for work has mandated that all foreigners be sent home and placed on a work-from-home model. That country's government announced the measure in the hope that it would help relieve its already overburdened healthcare system. Three days before he flew back to the Philippines, Juan diligently filled out what is called an Electronic Case Investigation Form, otherwise known as the e-CIF.
- Health & Medicine > Therapeutic Area (0.76)
- Transportation > Air (0.49)
TOMA: Topological Map Abstraction for Reinforcement Learning
Animals are able to discover the topological map (graph) of their surrounding environment, which they then use for navigation. Inspired by this biological phenomenon, researchers have recently proposed to generate graph representations for Markov decision processes (MDPs) and use such graphs for planning in reinforcement learning (RL). However, existing graph generation methods suffer from several drawbacks. One drawback is that existing methods do not learn an abstraction for graphs, which results in high memory and computation costs. This also makes the generated graph non-robust, which degrades planning performance. Another drawback is that existing methods cannot be used to facilitate exploration, which is important in RL. In this paper, we propose a new method, called topological map abstraction (TOMA), for graph generation. TOMA can generate an abstract graph representation for an MDP at much lower memory and computation cost than existing methods. Furthermore, TOMA can be used to facilitate exploration. In particular, we propose planning to explore, in which TOMA accelerates exploration by guiding the agent towards unexplored states. A novel experience replay module called vertex memory is also proposed to further improve exploration performance. Experimental results show that TOMA outperforms existing methods and achieves state-of-the-art performance.
- Asia > China > Jiangsu Province > Nanjing (0.04)
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
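The "guiding the agent towards unexplored states" idea from the abstract above can be read as planning over the abstract graph toward a frontier vertex. A minimal sketch, with a hand-built toy graph and names that are assumptions rather than the paper's algorithm:

```python
from collections import deque

# Toy abstract topological graph: vertex -> neighboring vertices.
graph = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"],
}
explored = {"A", "B"}   # vertices the agent has already covered


def path_to_frontier(start):
    """BFS from `start` to the nearest vertex not yet explored."""
    queue, parent = deque([start]), {start: None}
    while queue:
        v = queue.popleft()
        if v not in explored:       # reached a frontier vertex
            path = []
            while v is not None:    # walk parents back to the start
                path.append(v)
                v = parent[v]
            return path[::-1]
        for u in graph[v]:
            if u not in parent:
                parent[u] = v
                queue.append(u)
    return None                     # everything reachable is explored


# e.g. path_to_frontier("A") -> ["A", "B", "C"]
```

Because the graph is an abstraction of state regions rather than raw states, such a plan stays small, which matches the abstract's claim of lower memory and computation cost.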
The art and science of Japan's cherry blossom forecast
As spring approaches in Japan, the country's weather forecasters face one of their biggest missions of the year: predicting exactly when the famed cherry blossoms will bloom. Japan's sakura (cherry blossom) season is feverishly anticipated by locals and visitors alike. Many tourists plan their entire trips around the blooms, and Japanese flock to parks in their millions to enjoy the seasonal spectacle. "People pay more attention to the cherry blossom season than any other flower in Japan," said Ryo Dojo, an official of the statistics unit at the Meteorological Agency. The most basic element of predicting when the delicate pink and white petals will begin to unfurl is a large data set of temperatures.
Want Your Own Personal AI? Meet Hu:toma - Barcinno
One of the startups that impressed a lot of people at this year's 4YFN was Hu:toma, which builds emotionally evolved artificial intelligence for both personal and business use. I don't think I saw any stands at the event as crowded as Hu:toma's, but luckily one of the two Italian brothers and co-founders, Andrea Cibelli, found time to chat with Barcinno. One of the really cool features of Hu:toma's AI is that you can teach and train it by feeding it examples. They have also implemented an internal mechanism to simulate emotional states, which is supposed to make the AI feel more natural. Other competing companies are more focused on manually feeding the AI to create a language, rather than automating it through the machine itself. Unsurprisingly, co-founder Cibelli believes that AI will become a huge part of our lives only a few years into the future.