titan
Titans Revisited: A Lightweight Reimplementation and Critical Analysis of a Test-Time Memory Model
Di Nepi, Gavriel, Siciliano, Federico, Silvestri, Fabrizio
By the end of 2024, Google researchers introduced Titans: Learning at Test Time, a neural memory model achieving strong empirical results across multiple tasks. However, the lack of publicly available code and ambiguities in the original description hinder reproducibility. In this work, we present a lightweight reimplementation of Titans and conduct a comprehensive evaluation on Masked Language Modeling, Time Series Forecasting, and Recommendation tasks. Our results reveal that Titans does not always outperform established baselines due to chunking. However, its Neural Memory component consistently improves performance compared to attention-only models. These findings confirm the model's innovative potential while highlighting its practical limitations and raising questions for future research.
TITAN: A Trajectory-Informed Technique for Adaptive Parameter Freezing in Large-Scale VQE
Peng, Yifeng, Li, Xinyi, Chen, Samuel Yen-Chi, Zhang, Kaining, Liang, Zhiding, Wang, Ying, Du, Yuxuan
Variational quantum Eigensolver (VQE) is a leading candidate for harnessing quantum computers to advance quantum chemistry and materials simulations, yet its training efficiency deteriorates rapidly for large Hamiltonians. Two issues underlie this bottleneck: (i) the no-cloning theorem imposes a linear growth in circuit evaluations with the number of parameters per gradient step; and (ii) deeper circuits encounter barren plateaus (BPs), leading to exponentially increasing measurement overheads. To address these challenges, here we propose a deep learning framework, dubbed Titan, which identifies and freezes inactive parameters of a given ansatze at initialization for a specific class of Hamiltonians, reducing the optimization overhead without sacrificing accuracy. The motivation of Titan starts with our empirical findings that a subset of parameters consistently has a negligible influence on training dynamics. Its design combines a theoretically grounded data construction strategy, ensuring each training example is informative and BP-resilient, with an adaptive neural architecture that generalizes across ansatze of varying sizes. Across benchmark transverse-field Ising models, Heisenberg models, and multiple molecule systems up to 30 qubits, Titan achieves up to 3 times faster convergence and 40% to 60% fewer circuit evaluations than state-of-the-art baselines, while matching or surpassing their estimation accuracy. By proactively trimming parameter space, Titan lowers hardware demands and offers a scalable path toward utilizing VQE to advance practical quantum chemistry and materials science.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
Silicon Valley trades researchers like football teams poach players
The tech industry is in a high-flying war over who can dole out more millions to attract artificial intelligence specialists. Individual researchers, most equipped with PhDs in computer science, are commanding giant salaries and mammoth signing bonuses in hiring negotiations. You might call them talent. The Washington Post called them Olympians in a recent headline: "Why AI superathletes could be winning 100 million bonuses in Silicon Valley." These are the most sought-after employees in the world.
- Europe > United Kingdom (0.05)
- North America > United States > Pennsylvania (0.05)
- North America > United States > Hawaii (0.05)
- (4 more...)
- Information Technology (1.00)
- Leisure & Entertainment > Sports > Soccer (0.40)
- Leisure & Entertainment > Sports > Football (0.40)
A Two-Stage Data Selection Framework for Data-Efficient Model Training on Edge Devices
Gong, Chen, Xing, Rui, Zheng, Zhenzhe, Wu, Fan
The demand for machine learning (ML) model training on edge devices is escalating due to data privacy and personalized service needs. However, we observe that current on-device model training is hampered by the under-utilization of on-device data, due to low training throughput, limited storage and diverse data importance. To improve data resource utilization, we propose a two-stage data selection framework {\sf Titan} to select the most important data batch from streaming data for model training with guaranteed efficiency and effectiveness. Specifically, in the first stage, {\sf Titan} filters out a candidate dataset with potentially high importance in a coarse-grained manner.In the second stage of fine-grained selection, we propose a theoretically optimal data selection strategy to identify the data batch with the highest model performance improvement to current training round. To further enhance time-and-resource efficiency, {\sf Titan} leverages a pipeline to co-execute data selection and model training, and avoids resource conflicts by exploiting idle computing resources. We evaluate {\sf Titan} on real-world edge devices and three representative edge computing tasks with diverse models and data modalities. Empirical results demonstrate that {\sf Titan} achieves up to $43\%$ reduction in training time and $6.2\%$ increase in final accuracy with minor system overhead, such as data processing delay, memory footprint and energy consumption.
- North America > Canada > Ontario > Toronto (0.05)
- Asia > China > Shanghai > Shanghai (0.05)
- North America > United States > New York > New York County > New York City (0.04)
Detecting Neurocognitive Disorders through Analyses of Topic Evolution and Cross-modal Consistency in Visual-Stimulated Narratives
Li, Jinchao, Wang, Yuejiao, Li, Junan, Kang, Jiawen, Zheng, Bo, Wong, Simon, Mak, Brian, Fung, Helene, Woo, Jean, Mak, Man-Wai, Kwok, Timothy, Mok, Vincent, Gong, Xianmin, Wu, Xixin, Liu, Xunying, Wong, Patrick, Meng, Helen
Early detection of neurocognitive disorders (NCDs) is crucial for timely intervention and disease management. Speech analysis offers a non-intrusive and scalable screening method, particularly through narrative tasks in neuropsychological assessment tools. Traditional narrative analysis often focuses on local indicators in microstructure, such as word usage and syntax. While these features provide insights into language production abilities, they often fail to capture global narrative patterns, or microstructures. Macrostructures include coherence, thematic organization, and logical progressions, reflecting essential cognitive skills potentially critical for recognizing NCDs. Addressing this gap, we propose to investigate specific cognitive and linguistic challenges by analyzing topical shifts, temporal dynamics, and the coherence of narratives over time, aiming to reveal cognitive deficits by identifying narrative impairments, and exploring their impact on communication and cognition. The investigation is based on the CU-MARVEL Rabbit Story corpus, which comprises recordings of a story-telling task from 758 older adults. We developed two approaches: the Dynamic Topic Models (DTM)-based temporal analysis to examine the evolution of topics over time, and the Text-Image Temporal Alignment Network (TITAN) to evaluate the coherence between spoken narratives and visual stimuli. DTM-based approach validated the effectiveness of dynamic topic consistency as a macrostructural metric (F1=0.61, AUC=0.78). The TITAN approach achieved the highest performance (F1=0.72, AUC=0.81), surpassing established microstructural and macrostructural feature sets. Cross-comparison and regression tasks further demonstrated the effectiveness of proposed dynamic macrostructural modeling approaches for NCD detection.
- Asia > China > Hong Kong (0.05)
- North America > United States (0.04)
- Europe > Austria > Vienna (0.04)
- Research Report > New Finding (0.93)
- Overview (0.93)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.75)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.69)
- Health & Medicine > Therapeutic Area > Neurology > Dementia (0.47)
Titans: Learning to Memorize at Test Time
Behrouz, Ali, Zhong, Peilin, Mirrokni, Vahab
Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limiting the model to a fixed-length context. We present a new neural long-term memory module that learns to memorize historical context and helps attention to attend to the current context while utilizing long past information. We show that this neural memory has the advantage of fast parallelizable training while maintaining a fast inference. From a memory perspective, we argue that attention due to its limited context but accurate dependency modeling performs as a short-term memory, while neural memory due to its ability to memorize the data, acts as a long-term, more persistent, memory. Based on these two modules, we introduce a new family of architectures, called Titans, and present three variants to address how one can effectively incorporate memory into this architecture. Our experimental results on language modeling, common-sense reasoning, genomics, and time series tasks show that Titans are more effective than Transformers and recent modern linear recurrent models. They further can effectively scale to larger than 2M context window size with higher accuracy in needle-in-haystack tasks compared to baselines.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Mexico > Mexico City > Mexico City (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (3 more...)
- Health & Medicine (0.66)
- Education (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
Web Scale Graph Mining for Cyber Threat Intelligence
Defending against today's increasingly sophisticated and large-scale cyberattacks demands accurate, real-time threat intelligence. Traditional approaches struggle to scale, integrate diverse telemetry, and adapt to a constantly evolving security landscape. We introduce Threat Intelligence Tracking via Adaptive Networks (TITAN), an industry-scale graph mining framework that generates cyber threat intelligence at unprecedented speed and scale. TITAN introduces a suite of innovations specifically designed to address the complexities of the modern security landscape, including: (1) a dynamic threat intelligence graph that maps the intricate relationships between millions of entities, incidents, and organizations; (2) real-time update mechanisms that automatically decay and prune outdated intel; (3) integration of security domain knowledge to bootstrap initial reputation scores; and (4) reputation propagation algorithms that uncover hidden threat actor infrastructure. Integrated into Microsoft Unified Security Operations Platform (USOP), which is deployed across hundreds of thousands of organizations worldwide, TITAN's threat intelligence powers key detection and disruption capabilities. With an impressive average macro-F1 score of 0.89 and a precision-recall AUC of 0.94, TITAN identifies millions of high-risk entities each week, enabling a 6x increase in non-file threat intelligence. Since its deployment, TITAN has increased the product's incident disruption rate by a remarkable 21%, while reducing the time to disrupt by a factor of 1.9x, and maintaining 99% precision, as confirmed by customer feedback and thorough manual evaluation by security experts--ultimately saving customers from costly security breaches.
- North America > United States > District of Columbia > Washington (0.05)
- North America > United States > Washington > King County > Redmond (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.67)
Unlocking the Archives: Using Large Language Models to Transcribe Handwritten Historical Documents
Humphries, Mark, Leddy, Lianne C., Downton, Quinn, Legace, Meredith, McConnell, John, Murray, Isabella, Spence, Elizabeth
This study demonstrates that Large Language Models (LLMs) can transcribe historical handwritten documents with significantly higher accuracy than specialized Handwritten Text Recognition (HTR) software, while being faster and more cost-effective. We introduce an open-source software tool called Transcription Pearl that leverages these capabilities to automatically transcribe and correct batches of handwritten documents using commercially available multimodal LLMs from OpenAI, Anthropic, and Google. In tests on a diverse corpus of 18th/19th century English language handwritten documents, LLMs achieved Character Error Rates (CER) of 5.7 to 7% and Word Error Rates (WER) of 8.9 to 15.9%, improvements of 14% and 32% respectively over specialized state-of-the-art HTR software like Transkribus. Most significantly, when LLMs were then used to correct those transcriptions as well as texts generated by conventional HTR software, they achieved near-human levels of accuracy, that is CERs as low as 1.8% and WERs of 3.5%. The LLMs also completed these tasks 50 times faster and at approximately 1/50th the cost of proprietary HTR programs. These results demonstrate that when LLMs are incorporated into software tools like Transcription Pearl, they provide an accessible, fast, and highly accurate method for mass transcription of historical handwritten documents, significantly streamlining the digitization process.
- North America > Canada > Quebec > Montreal (0.14)
- North America > United States > New York (0.04)
- North America > United States > North Carolina (0.04)
- (4 more...)
- Research Report > New Finding (0.48)
- Research Report > Experimental Study (0.46)
Perception of Emotions in Human and Robot Faces: Is the Eye Region Enough?
Mishra, Chinmaya, Skantze, Gabriel, Hagoort, Peter, Verdonschot, Rinus
The increased interest in developing next-gen social robots has raised questions about the factors affecting the perception of robot emotions. This study investigates the impact of robot appearances (humanlike, mechanical) and face regions (full-face, eye-region) on human perception of robot emotions. A between-subjects user study (N = 305) was conducted where participants were asked to identify the emotions being displayed in videos of robot faces, as well as a human baseline. Our findings reveal three important insights for effective social robot face design in Human-Robot Interaction (HRI): Firstly, robots equipped with a back-projected, fully animated face - regardless of whether they are more human-like or more mechanical-looking - demonstrate a capacity for emotional expression comparable to that of humans. Secondly, the recognition accuracy of emotional expressions in both humans and robots declines when only the eye region is visible. Lastly, within the constraint of only the eye region being visible, robots with more human-like features significantly enhance emotion recognition.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Europe > Netherlands > Limburg > Maastricht (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Research Report > Experimental Study > Negative Result (0.46)
- Health & Medicine (1.00)
- Media > Film (0.93)
- Leisure & Entertainment (0.68)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
What Went Wrong at Blizzard Entertainment
Over the past three years, as I worked on a book about the history of the video-game company Blizzard Entertainment, a disconcerting question kept popping into my head: Why does success seem so awful? Even typing that out feels almost anti-American, anathema to the ethos of hard work and ambition that has propelled so many of the great minds and ideas that have changed the world. But Blizzard makes a good case for the modest achievement over the astronomical. Founded in Irvine, California, by two UCLA students named Allen Adham and Mike Morhaime, the company quickly became well respected and popular thanks to a series of breakout franchises such as StarCraft and Diablo. But everything changed in 2004 with the launch of World of Warcraft (or WoW), which became an online-gaming juggernaut that made billions of dollars.