Nvidia strikes bumper AI deals with Asia tech giants
US chip giant Nvidia will supply more than 260,000 of its most advanced artificial intelligence (AI) chips to South Korea's government, as well as Samsung, LG, and Hyundai. The companies will all deploy the AI chips in factories to make everything from semiconductors and robots to autonomous vehicles, which chief executive Jensen Huang said meant South Korea could now produce intelligence as a new export. Mr Huang did not disclose the value of the South Korean deals. It caps off a busy week for Nvidia, which on Wednesday became the first company to be valued at $5 trillion and on Thursday saw signs of a thaw in US-China trade relations that may mean it can export more of its chips to China. Speaking at a CEO summit on the sidelines of the Asia Pacific Economic Cooperation (Apec) summit in Gyeongju, South Korea, Mr Huang added that with the chips, companies would be able to create digital twins with other factories around the world.
Our verdict on Our Brains, Our Selves: A mix of praise and misgivings
The New Scientist Book Club has various issues with Masud Husain's prize-winning popular science book about neurology. The New Scientist Book Club stepped away from science fiction for our October read, turning instead to the winner of the Royal Society Trivedi Science Book Prize, serendipitously announced just in time for us to start on our next literary adventure. Six books had been up for the award, from Daniel Levitin's to Sadiah Qureshi's. Judges picked Masud Husain's Our Brains, Our Selves and praised it effusively, calling it "a beautiful exploration of how problems in the brain can cause people to lose their sense of self", and citing how these medical histories are "skilfully interwoven with Husain's personal story of moving to the UK as an immigrant in the 1960s, where he found himself grappling with his own sense of belonging". Sandra Knapp, chair of the judging panel for the 2025 Royal Society Trivedi Science Book Prize, explains why neurologist Masud Husain's collection of case studies is such an enlightening, compassionate book. The first thing to say is: our book club members are much tougher judges than those on the panel for the Royal Society prize! While I think we were excited to get to grips with this book, and to venture into the world of non-fiction for a change, our readers raised and picked over many issues. Let's tackle the positives first.
Government to help rural businesses adopt robots amid labor shortage
The Ministry of Economy, Trade and Industry has set up a support organization, in cooperation with local governments, to accelerate robot adoption by small and midsize enterprises (SMEs) in rural areas. Aimed at boosting productivity despite labor shortages, the group will train advisers to help companies introduce and use robotics effectively and will share leading case studies from across the country. Population decline in nonurban areas is accelerating, and labor shortages are becoming more severe, particularly in manufacturing. As a result, many SMEs are struggling to hire new employees. The ministry argues that wider use of robots can help by automating tasks, boosting productivity, and reducing the burden of physically demanding work.
Russia-Ukraine war: List of key events, day 1,345
Russia's Ministry of Defence said its forces took control of the villages of Krasnohirske in Ukraine's Zaporizhia region and Sadove in the Kharkiv region, Russian state news agencies reported.
LASTIST: LArge-Scale Target-Independent STance dataset
Kim, DongJae, Lee, Yaejin, Park, Minsu, Park, Eunil
Stance detection has emerged as an active area of research in artificial intelligence. However, most research currently centers on the target-dependent stance detection task, which determines whether a person is in favor of or against a specific target. Furthermore, most benchmark datasets are in English, making it difficult to develop models for low-resource languages such as Korean, especially in an emerging field such as stance detection. This study proposes the LArge-Scale Target-Independent STance (LASTIST) dataset to fill this research gap. Collected from press releases issued by two Korean political parties, the LASTIST dataset comprises 563,299 labeled Korean sentences. We describe in detail how we collected and constructed the dataset and how we trained state-of-the-art deep learning and stance detection models on it. The LASTIST dataset is designed to support various stance detection tasks, including target-independent stance detection and diachronic evolution stance detection.
CompoST: A Benchmark for Analyzing the Ability of LLMs To Compositionally Interpret Questions in a QALD Setting
Schmidt, David Maria, Schubert, Raoul, Cimiano, Philipp
Language interpretation is a compositional process, in which the meaning of more complex linguistic structures is inferred from the meaning of their parts. Large language models possess remarkable language interpretation capabilities and have been successfully applied to interpret questions by mapping them to SPARQL queries. An open question is how systematic this interpretation process is. Toward this question, in this paper, we propose a benchmark for investigating to what extent the abilities of LLMs to interpret questions are actually compositional. For this, we generate three datasets of varying difficulty based on graph patterns in DBpedia, relying on Lemon lexica for verbalization. Our datasets are created in a tightly controlled fashion in order to test the ability of LLMs to interpret structurally complex questions, given that they have seen the atomic building blocks. This allows us to evaluate to what degree LLMs are able to interpret complex questions for which they "understand" the atomic parts. We conduct experiments with models of different sizes using various prompting and few-shot optimization techniques as well as fine-tuning. Our results show that performance in terms of macro $F_1$ degrades from $0.45$ to $0.26$ and down to $0.09$ with increasing deviation from the samples optimized on. Even when all necessary information is provided to the model in the input, the $F_1$ scores do not exceed $0.57$ for the dataset of lowest complexity. We thus conclude that LLMs struggle to systematically and compositionally interpret questions and map them into SPARQL queries.
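The scores above are macro-averaged $F_1$. A minimal sketch of one common way to compute such a score follows, assuming (as in QALD-style evaluation) that each question's $F_1$ compares the answer set returned by the predicted SPARQL query with the gold answer set, and that a question whose gold and predicted answers are both empty counts as correct. The function names and the `dbr:` identifiers are hypothetical illustrations, not the benchmark's own code.

```python
from __future__ import annotations

from collections.abc import Hashable, Sequence


def answer_f1(gold: set[Hashable], predicted: set[Hashable]) -> float:
    """Per-question F1 between the gold and predicted answer sets."""
    # Assumed convention: an empty gold set matched by an empty prediction is correct.
    if not gold and not predicted:
        return 1.0
    overlap = len(gold & predicted)
    if overlap == 0:
        # Covers a wrong answer and the case where exactly one of the sets is empty.
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_f1(golds: Sequence[set[Hashable]], preds: Sequence[set[Hashable]]) -> float:
    """Average the per-question F1 scores, weighting every question equally."""
    if len(golds) != len(preds):
        raise ValueError("need exactly one prediction per question")
    if not golds:
        raise ValueError("no questions to score")
    return sum(answer_f1(g, p) for g, p in zip(golds, preds)) / len(golds)


# Illustrative answer sets (hypothetical DBpedia resource IDs).
gold = [{"dbr:Berlin"}, {"dbr:Paris"}]
pred = [{"dbr:Berlin", "dbr:Bonn"}, set()]
print(macro_f1(gold, pred))
```

In this example the first question scores $F_1 = 2/3$ (precision 0.5, recall 1.0) and the second scores 0, so the macro average is 1/3. Macro averaging weights every question equally, so a few large answer sets cannot dominate the score.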
A geometric framework for momentum-based optimizers for low-rank training
Schotthöfer, Steffen, Klein, Timon, Kusch, Jonas
Low-rank pre-training and fine-tuning have recently emerged as promising techniques for reducing the computational and storage costs of large neural networks. Training low-rank parameterizations typically relies on conventional optimizers such as heavy ball momentum methods or Adam. In this work, we identify and analyze potential difficulties that these training methods encounter when used to train low-rank parameterizations of weights. In particular, we show that classical momentum methods can struggle to converge to a local optimum due to the geometry of the underlying optimization landscape. To address this, we introduce novel training strategies derived from dynamical low-rank approximation, which explicitly account for the underlying geometric structure. Our approach leverages and combines tools from dynamical low-rank approximation and momentum-based optimization to design optimizers that respect the intrinsic geometry of the parameter space. We validate our methods through numerical experiments, demonstrating faster convergence and stronger validation metrics at given parameter budgets.
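For reference, here is a minimal NumPy sketch of the conventional baseline the abstract contrasts against: heavy ball momentum applied directly to the factors of a low-rank weight. It is not the authors' geometry-aware method. It assumes a two-factor parameterization $W = UV^\top$ and a toy least-squares objective $\tfrac12\lVert UV^\top - A\rVert_F^2$; the function names, step size, and momentum value are hypothetical choices for illustration, and the toy problem is not meant to reproduce the convergence difficulty the paper analyzes.

```python
import numpy as np


def loss_and_grads(U, V, A):
    """Toy objective 0.5 * ||U V^T - A||_F^2 and its gradients w.r.t. U and V."""
    G = U @ V.T - A  # gradient of the loss w.r.t. the full matrix W = U V^T
    loss = 0.5 * float(np.sum(G * G))
    # Chain rule through W = U V^T: dL/dU = G V and dL/dV = G^T U.
    return loss, G @ V, G.T @ U


def heavy_ball_low_rank(A, rank, lr=0.02, beta=0.9, steps=400, seed=0):
    """Conventional heavy ball momentum applied directly to the factors U and V."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = 0.1 * rng.standard_normal((m, rank))
    V = 0.1 * rng.standard_normal((n, rank))
    buf_U, buf_V = np.zeros_like(U), np.zeros_like(V)  # momentum buffers
    history = []
    for _ in range(steps):
        loss, g_U, g_V = loss_and_grads(U, V, A)
        history.append(loss)
        # Heavy ball update: accumulate a decaying sum of past gradients, then step.
        buf_U = beta * buf_U + g_U
        buf_V = beta * buf_V + g_V
        U = U - lr * buf_U
        V = V - lr * buf_V
    history.append(loss_and_grads(U, V, A)[0])
    return U, V, history


# Usage: fit a random rank-3 target from a small random initialization.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 20)) / 20
U, V, history = heavy_ball_low_rank(A, rank=3)
print(f"loss: {history[0]:.4f} -> {history[-1]:.2e}")
```

The optimizer treats $U$ and $V$ as independent Euclidean parameters and ignores that many factor pairs represent the same $W$. That blindness to the geometry of the low-rank parameter space is what the dynamical low-rank approach described above is meant to address.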
The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs
Liu, Songyang, Li, Chaozhuo, Qiu, Jiameng, Zhang, Xi, Huang, Feiran, Zhang, Litian, Hei, Yiming, Yu, Philip S.
With the rapid advancement of artificial intelligence, Large Language Models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP), including content generation, human-computer interaction, machine translation, and code generation. However, their widespread deployment has also raised significant safety concerns. In particular, LLM-generated content can exhibit unsafe behaviors such as toxicity, bias, or misinformation, especially in adversarial contexts, which has attracted increasing attention from both academia and industry. Although numerous studies have attempted to evaluate these risks, a comprehensive and systematic survey on the safety evaluation of LLMs is still lacking. This work aims to fill this gap by presenting a structured overview of recent advances in the safety evaluation of LLMs. Specifically, we propose a four-dimensional taxonomy: (i) Why to evaluate, which explores the background of LLM safety evaluation, how it differs from general LLM evaluation, and why such evaluation matters; (ii) What to evaluate, which examines and categorizes existing safety evaluation tasks by key capabilities, including toxicity, robustness, ethics, bias and fairness, truthfulness, and related aspects; (iii) Where to evaluate, which summarizes the evaluation metrics, datasets, and benchmarks currently used in safety evaluation; (iv) How to evaluate, which reviews mainstream evaluation methods according to the roles of the evaluators, as well as evaluation frameworks that integrate the entire evaluation pipeline. Finally, we identify the open challenges in LLM safety evaluation and propose promising research directions to advance this field. We emphasize the necessity of prioritizing safety evaluation to ensure the reliable and responsible deployment of LLMs in real-world applications.