luca
U.S. and Japan turn to drones to help offset China's military advantages
Low-cost Unmanned Combat Attack System (LUCAS) drones are positioned on the tarmac at a base in the U.S. Central Command operating area in November. Just a few years ago, it would have been almost inconceivable for U.S. forces -- the world's most advanced military -- to operate reverse-engineered Iranian drones. But times are changing fast, and so is the nature of warfare, a fact that is also prompting a shift in Japan. As Washington and its allies scramble for combat-proven and low-cost drones, the U.S. Central Command recently announced the launch of a squadron based on the LUCAS kamikaze drone, a system derived from Iran's Shahed-136 loitering munition, versions of which are being used by Russia in Ukraine. The autonomous LUCAS, which is also being tested by the U.S. Navy and Marines, is part of a broader Pentagon push to fast-track the adoption of various small drones across the military, treating them as "consumable or expendable" capabilities similar to bullets, hand grenades and other munitions.
ChatGPT Needs More Cowbell
AI struggles to write a good jingle. You'd be forgiven if you can't hum the 18th-century Cumbrian folk song "Do Ye Ken John Peel." But in 1942, a version of that tune, reworked with lyrics about Pepsi-Cola, was the most recognized melody in America. Three years earlier, two men walked into the office of Pepsi-Cola's president, carrying a phonograph. They played a demo of what would become one of America's earliest advertising jingles.
LLM-Upgraded Graph Reinforcement Learning for Carbon-Aware Job Scheduling in Smart Manufacturing
Yang, Zhiying, Liu, Fang, Zhang, Wei, Lou, Xin, Low, Malcolm Yoke Hean, Gan, Boon Ping
This paper presents \textsc{Luca}, a \underline{l}arge language model (LLM)-\underline{u}pgraded graph reinforcement learning framework for \underline{c}arbon-\underline{a}ware flexible job shop scheduling. \textsc{Luca} addresses the challenges of dynamic and sustainable scheduling in smart manufacturing systems by integrating a graph neural network and an LLM, guided by a carefully designed in-house prompting strategy, to produce a fused embedding that captures both structural characteristics and contextual semantics of the latest scheduling state. This expressive embedding is then processed by a deep reinforcement learning policy network, which generates real-time scheduling decisions optimized for both makespan and carbon emission objectives. To support sustainability goals, \textsc{Luca} incorporates a dual-objective reward function that encourages both energy efficiency and scheduling timeliness. Experimental results on both synthetic and public datasets demonstrate that \textsc{Luca} consistently outperforms comparison algorithms. For instance, on the synthetic dataset, it achieves an average of 4.1\% and up to 12.2\% lower makespan compared to the best-performing comparison algorithm while maintaining the same emission level. On public datasets, additional gains are observed for both makespan and emission. These results demonstrate that \textsc{Luca} is effective and practical for carbon-aware scheduling in smart manufacturing.
Opium may have been a daily habit for Ancient Egyptians
Breakthroughs, discoveries, and DIY tips sent every weekday. Ancient Egyptians may have used opium. Based on recent examinations, archaeologists now say the drug may even have been a near-daily recreational habit. Opium might have even been widely used across socio-economic classes as long as 3,000 years ago. The evidence is detailed in a recently published study and offers a glimpse into the daily lives of regular Egyptians and royalty alike.
Hybrid Vision Servoing with Deep Alignment and GRU-Based Occlusion Recovery
Lee, Jee Won, Lim, Hansol, Yang, Sooyeun, Choi, Jongseong Brad
Traditional robotic controllers have long relied on proprioceptive sensors such as joint encoders, inertial measurement units, and force-torque sensors to estimate position and motion, but these often suffer from drift, calibration errors, and limited environmental awareness [1]. Image-based visual servoing (IBVS) has therefore been widely adopted for high-precision robotic assembly, aerial vehicle stabilization, and minimally invasive surgery, where direct visual feedback can compensate for model uncertainties and encoder inaccuracies [2] [3]. In these closed-loop systems, perception must deliver sub-pixel localization accuracy at control rates above 30 Hz while tolerating partial or full occlusions, illumination shifts, and motion blur to maintain loop stability and precision [4]. Even millimeter-level tracking errors can accumulate into significant actuation drift, undermining safety and performance in sub-millimeter surgical targeting or centimeter-scale drone landing [5] [6]. Early IBVS methods emerged in the early 1990s to simplify robot control by directly mapping image features to velocity commands, establishing the foundation for image-space loop closure [2]. Handcrafted detectors such as SIFT [7], which identifies scale-invariant keypoints, SURF [8], which accelerates detection using integral images, and ORB [9], which offers an efficient binary alternative, were paired with RANSAC [10] to filter out mismatches. However, these sparse approaches struggled when keypoints were lost to occlusion or blur. To achieve denser alignment, the Lucas-Kanade algorithm was introduced to iteratively minimize photometric error over image patches and enable smooth sub-pixel registration [11].
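The iterative photometric minimization attributed to Lucas-Kanade above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a translation-only warp, grayscale floating-point images, and a plain Gauss-Newton update, and the names `bilinear_sample` and `lucas_kanade_translation` are hypothetical helpers introduced only for this example.

```python
import numpy as np


def bilinear_sample(image, ys, xs):
    """Sample a 2-D image at float coordinates (clamped to the image border)."""
    h, w = image.shape
    xs = np.clip(xs, 0, w - 1.001)
    ys = np.clip(ys, 0, h - 1.001)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    dx = xs - x0
    dy = ys - y0
    return ((1 - dy) * (1 - dx) * image[y0, x0]
            + (1 - dy) * dx * image[y0, x0 + 1]
            + dy * (1 - dx) * image[y0 + 1, x0]
            + dy * dx * image[y0 + 1, x0 + 1])


def lucas_kanade_translation(template, image, top_left, num_iters=50, tol=1e-4):
    """Estimate a sub-pixel shift p = (px, py) so that the image patch at
    top_left + p matches the template (translation-only assumption)."""
    th, tw = template.shape
    gy, gx = np.mgrid[0:th, 0:tw].astype(float)
    y0, x0 = top_left
    # Image gradients along rows (y) and columns (x), computed once.
    grad_y, grad_x = np.gradient(image)
    p = np.zeros(2)
    for _ in range(num_iters):
        xs = gx + x0 + p[0]
        ys = gy + y0 + p[1]
        warped = bilinear_sample(image, ys, xs)
        # Photometric error between the template and the current patch.
        error = (template - warped).ravel()
        # Jacobian of the warped patch with respect to (px, py).
        jac = np.stack([bilinear_sample(grad_x, ys, xs).ravel(),
                        bilinear_sample(grad_y, ys, xs).ravel()], axis=1)
        # Gauss-Newton step: least-squares solution of jac @ dp = error.
        dp, *_ = np.linalg.lstsq(jac, error, rcond=None)
        p += dp
        if np.linalg.norm(dp) < tol:
            break
    return p
```

Calling `lucas_kanade_translation(template, image, (row, col))` returns the estimated `(px, py)` offset of the patch relative to the initial guess. The sketch only converges when the true shift is small compared with the image structure, which is why practical trackers usually add a coarse-to-fine pyramid.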
Can LLMs Solve ASP Problems? Insights from a Benchmarking Study (Extended Version)
Ren, Lin, Xiao, Guohui, Qi, Guilin, Geng, Yishuai, Xue, Haohan
Answer Set Programming (ASP) is a powerful paradigm for non-monotonic reasoning. Recently, large language models (LLMs) have demonstrated promising capabilities in logical reasoning. Despite this potential, current evaluations of LLM capabilities in ASP are often limited. Existing works normally employ overly simplified ASP programs and do not support negation, disjunction, or multiple answer sets. Furthermore, there is a lack of benchmarks that introduce tasks specifically designed for ASP solving. To bridge this gap, we introduce ASPBench, a comprehensive ASP benchmark, including three ASP-specific tasks: ASP entailment, answer set verification, and answer set computation. Our extensive evaluations on ASPBench reveal that while 14 state-of-the-art LLMs, including \emph{deepseek-r1}, \emph{o4-mini}, and \emph{gemini-2.5-flash-thinking}, perform relatively well on the first two simpler tasks, they struggle with answer set computation, which is the core of ASP solving. These findings offer insights into the current limitations of LLMs in ASP solving and highlight the need for new approaches that integrate symbolic reasoning capabilities more effectively. The code and dataset are available at https://github.com/HomuraT/ASPBench.
OneEval: Benchmarking LLM Knowledge-intensive Reasoning over Diverse Knowledge Bases
Chen, Yongrui, Liu, Zhiqiang, Yu, Jing, Ren, Lin, Hu, Nan, Dai, Xinbang, Liu, Jiajun, Kang, Jiazhen, Zhang, Shenyu, Wang, Xinda, Ding, Keyan, Shen, Pengfei, Zhu, Haolei, Deng, Hongjie, Wang, Yisong, Wu, Tongtong, Bi, Sheng, Zhang, Wen, Wu, Tianxing, Ji, Qiu, Wang, Haofen, Chen, Wenliang, Chen, Huajun, Qi, Guilin
Large Language Models (LLMs) have demonstrated substantial progress on reasoning tasks involving unstructured text, yet their capabilities significantly deteriorate when reasoning requires integrating structured external knowledge such as knowledge graphs, code snippets, or formal logic. This limitation is partly due to the absence of benchmarks capable of systematically evaluating LLM performance across diverse structured knowledge modalities. To address this gap, we introduce \textbf{\textsc{OneEval}}, a comprehensive benchmark explicitly designed to assess the knowledge-intensive reasoning capabilities of LLMs across four structured knowledge modalities, unstructured text, knowledge graphs, code, and formal logic, and five critical domains (general knowledge, government, science, law, and programming). \textsc{OneEval} comprises 4,019 carefully curated instances and includes a challenging subset, \textsc{OneEval}\textsubscript{Hard}, consisting of 1,285 particularly difficult cases. Through extensive evaluation of 18 state-of-the-art open-source and proprietary LLMs, we establish three core findings: a) \emph{persistent limitations in structured reasoning}, with even the strongest model achieving only 32.2\% accuracy on \textsc{OneEval}\textsubscript{Hard}; b) \emph{performance consistently declines as the structural complexity of the knowledge base increases}, with accuracy dropping sharply from 53\% (textual reasoning) to 25\% (formal logic); and c) \emph{diminishing returns from extended reasoning chains}, highlighting the critical need for models to adapt reasoning depth appropriately to task complexity. We release the \textsc{OneEval} datasets, evaluation scripts, and baseline results publicly, accompanied by a leaderboard to facilitate ongoing advancements in structured knowledge reasoning.
Seals playing a video game reveal how they find their way
The world's harbor seals (Phoca vitulina) are masters in seeing through the cloudy coastal waters they call home. Equipped with dexterous whiskers, these pinnipeds use a suite of senses to navigate their surroundings with ease. Harbor seals may also use an important part of their vision to determine which direction they are moving, even with such an opaque view of the world. Now, we might know a bit more about how they can tell which direction they are heading.
8-year-old kid with a metal detector stumbles upon a 19th century shipwreck
A Canadian kid is proof that major scientific discoveries don't always have to come from grizzled researchers with fancy equipment. Two years ago, then-8-year-old Lucas Atchison went on a family trip to Point Farms Provincial Park in Ontario. Armed with a metal detector he had just received as a birthday present, Atchison dutifully scanned the area, hoping to hear that coveted "beep." Eagerly digging into the site, Lucas uncovered a metal spike, which his father initially dismissed as something used to tie up boats.
From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions
Rakotonirina, Nathanaël Carraz, Hamdy, Mohammed, Campos, Jon Ander, Weber, Lucas, Testoni, Alberto, Fadaee, Marzieh, Pezzelle, Sandro, Del Tredici, Marco
Large Language Models (LLMs) are increasingly used in working environments for a wide range of tasks, excelling at solving individual problems in isolation. However, are they also able to effectively collaborate over long-term interactions? To investigate this, we introduce MemoryCode, a synthetic multi-session dataset designed to test LLMs' ability to track and execute simple coding instructions amid irrelevant information, simulating a realistic setting. While all the models we tested handle isolated instructions well, even the performance of state-of-the-art models like GPT-4o deteriorates when instructions are spread across sessions. Our analysis suggests this is due to their failure to retrieve and integrate information over long instruction chains. Our results highlight a fundamental limitation of current LLMs, restricting their ability to collaborate effectively in long interactions.