2025 in SPACEFLIGHT: The incredible missions set to take off next year, revealed - from China's daring asteroid retrieval to the first private trip to Venus
From NASA's mission to study Jupiter's icy moon Europa to Elon Musk's SpaceX catching its Starship rocket mid-air, there's no doubt 2024 saw some incredible space feats. 'In 2024, NASA made leap after giant leap to explore, discover, and inspire – all while bringing real, tangible, and substantial benefits to the American people and to all of humanity,' said NASA Administrator Bill Nelson. And 2025 is set to be an even more remarkable year for space agencies and companies around the world, who have an assortment of exciting missions lined up. Among them are NASA, which is sending two twin spacecraft to Mars – although its upcoming return to the moon has been delayed yet again. There's also the European Space Agency, which is set to launch its futuristic 'Space Rider' spaceplane – described as a 'robotic laboratory the size of two minivans'.
'All people could do was hope the nerds would fix it': the global panic over the millennium bug, 25 years on
Just before midnight on New Year's Eve, 25 years ago, Queen Elizabeth II stepped off a private barge to arrive at London's Millennium Dome for its grand opening ceremony. Dressed in a pumpkin-orange coat, she entered the venue with Prince Philip, taking her place alongside Tony and Cherie Blair and 12,000 guests to celebrate the dawn of a new millennium. At the stroke of midnight, Big Ben began to chime and 40 tonnes of fireworks were launched from 16 barges lined along the river. The crowd joined hands, preparing to sing Auld Lang Syne. For a few long moments, the Queen was neglected – she flapped her arms out like a toddler wanting to be lifted up, before Blair and Philip noticed her, took a hand each, and the singing began. A new century was born. One politician who wasn't in attendance at the glitzy celebration was Paddy Tipping, a Labour MP who spent the night in the Cabinet Office.
Generative Regression Based Watch Time Prediction for Video Recommendation: Model and Performance
Ma, Hongxu, Tian, Kai, Zhang, Tao, Zhang, Xuefeng, Chen, Chunjie, Li, Han, Guan, Jihong, Zhou, Shuigeng
Watch time prediction (WTP) has emerged as a pivotal task in short video recommendation systems, designed to encapsulate user interests. Predicting users' watch times on videos often encounters challenges, including wide value ranges and imbalanced data distributions, which can lead to significant bias when directly regressing watch time. Recent studies have tried to tackle these issues by converting the continuous watch time estimation into an ordinal classification task. While these methods are somewhat effective, they exhibit notable limitations. Inspired by language modeling, we propose a novel Generative Regression (GR) paradigm for WTP based on sequence generation. This approach employs structural discretization to enable the lossless reconstruction of original values while maintaining prediction fidelity. By formulating the prediction problem as a numerical-to-sequence mapping, and with meticulously designed vocabulary and label encodings, each watch time is transformed into a sequence of tokens. To expedite model training, we introduce curriculum learning with an embedding mixup strategy, which can mitigate the training-and-inference inconsistency associated with teacher forcing. We evaluate our method against state-of-the-art approaches on four public datasets and one industrial dataset. We also perform online A/B testing on Kuaishou, a leading video app with about 400 million DAUs, to demonstrate the real-world efficacy of our method. The results conclusively show that GR outperforms existing techniques significantly. Furthermore, we successfully apply GR to another regression task in recommendation systems, i.e., Lifetime Value (LTV) prediction, which highlights its potential as a novel and effective solution to general regression challenges.
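The numerical-to-sequence idea can be illustrated with a small sketch. The abstract does not specify the vocabulary or label encoding, so the scheme below is an assumption chosen only for illustration: an integer watch time in seconds is written as a fixed-length sequence of base-16 digit tokens, which can be decoded back without loss. The names `BASE`, `SEQ_LEN`, `encode_watch_time`, and `decode_watch_time` are hypothetical and are not taken from the paper.

```python
from typing import List

# Hypothetical settings for illustration only; the paper's actual
# vocabulary and sequence length are not given in the abstract.
BASE = 16      # number of distinct tokens per position
SEQ_LEN = 4    # fixed number of tokens per watch time


def encode_watch_time(seconds: int, base: int = BASE, seq_len: int = SEQ_LEN) -> List[int]:
    """Map an integer watch time to a fixed-length token sequence (most significant first)."""
    if not 0 <= seconds < base ** seq_len:
        raise ValueError("watch time outside the representable range")
    tokens = []
    for _ in range(seq_len):
        seconds, digit = divmod(seconds, base)
        tokens.append(digit)
    return tokens[::-1]


def decode_watch_time(tokens: List[int], base: int = BASE) -> int:
    """Reconstruct the original integer from its token sequence."""
    value = 0
    for token in tokens:
        value = value * base + token
    return value


tokens = encode_watch_time(300)
print(tokens)                     # [0, 1, 2, 12]
print(decode_watch_time(tokens))  # 300
```

A sequence model would then predict these tokens one at a time. The round trip shows only that a positional encoding of this kind loses no information for integers in range.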
An Archaeological Catalog Collection Method Based on Large Vision-Language Models
Pang, Honglin, Chang, Yi, Duan, Tianjing, Yang, Xi
Archaeological catalogs, containing key elements such as artifact images, morphological descriptions, and excavation information, are essential for studying artifact evolution and cultural inheritance. These data are widely scattered across publications, requiring automated collection methods. However, existing Large Vision-Language Models (VLMs) and their derivative data collection methods face challenges in accurate image detection and modal matching when processing archaeological catalogs, making automated collection difficult. To address these issues, we propose a novel archaeological catalog collection method based on Large Vision-Language Models that follows an approach comprising three modules: document localization, block comprehension and block matching. Through practical data collection from the Dabagou and Miaozigou pottery catalogs and comparison experiments, we demonstrate the effectiveness of our approach, providing a reliable solution for automated collection of archaeological catalogs.
The Fifth International Verification of Neural Networks Competition (VNN-COMP 2024): Summary and Results
Brix, Christopher, Bak, Stanley, Johnson, Taylor T., Wu, Haoze
This report summarizes the 5th International Verification of Neural Networks Competition (VNN-COMP 2024), held as part of the 7th International Symposium on AI Verification (SAIV), which was co-located with the 36th International Conference on Computer-Aided Verification (CAV). VNN-COMP is held annually to facilitate the fair and objective comparison of state-of-the-art neural network verification tools, encourage the standardization of tool interfaces, and bring together the neural network verification community. To this end, standardized formats for networks (ONNX) and specifications (VNN-LIB) were defined, tools were evaluated on equal-cost hardware (using an automatic evaluation pipeline based on AWS instances), and tool parameters were chosen by the participants before the final test sets were made public. In the 2024 iteration, 8 teams participated on a diverse set of 12 regular and 8 extended benchmarks. This report summarizes the rules, benchmarks, participating tools, results, and lessons learned from this iteration of the competition.
Text2Insight: Transform natural language text into insights seamlessly using multi-model architecture
The growing demand for dynamic, user-centric data analysis and visualization is evident across domains like healthcare, finance, and research. Traditional visualization tools often fail to meet individual user needs due to their static and predefined nature. To address this gap, Text2Insight is introduced as an innovative solution that delivers customized data analysis and visualizations based on user-defined natural language requirements. Leveraging a multi-model architecture, Text2Insight transforms user inputs into actionable insights and dynamic visualizations. The methodology begins with analyzing the input dataset to extract structural details such as columns and values. A pre-trained Llama3 model converts the user's natural language query into an SQL query, which is further refined using a Named Entity Recognition (NER) model for accuracy. A chart predictor determines the most suitable visualization type, while the Llama3 model generates insights based on the SQL query's results. The output is a user-friendly and visually informative chart. To enhance analysis capabilities, the system integrates a question-answering model and a predictive model using the BERT framework. These models provide insights into historical data and predict future trends. Performance evaluation of Text2Insight demonstrates its effectiveness, achieving high accuracy (99%), precision (100%), recall (99%), and F1-score (99%), with a BLEU score of 0.5. The question-answering model attained an accuracy of 89% and the predictive model achieved 70% accuracy. These results validate Text2Insight as a robust and viable solution for transforming natural language text into dynamic, user-specific data analysis and visualizations.
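As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The sketch below uses only the precision and recall quoted in the abstract; the helper name `f1_score` is introduced here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision (100%) and recall (99%) as reported in the abstract.
f1 = f1_score(1.00, 0.99)
print(round(f1, 2))  # 0.99
```

The result rounds to 0.99, which agrees with the stated F1-score of 99%.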
Generate to Discriminate: Expert Routing for Continual Learning
Byun, Yewon, Mehta, Sanket Vaibhav, Garg, Saurabh, Strubell, Emma, Oberst, Michael, Wilder, Bryan, Lipton, Zachary C.
In many real-world settings, regulations and economic incentives permit the sharing of models but not data across institutional boundaries. In such scenarios, practitioners might hope to adapt models to new domains without losing performance on previous domains (avoiding so-called catastrophic forgetting). While any single model may struggle to achieve this goal, learning an ensemble of domain-specific experts offers the potential to adapt more closely to each individual institution. However, a core challenge in this context is determining which expert to deploy at test time. In this paper, we propose Generate to Discriminate (G2D), a domain-incremental continual learning method that leverages synthetic data to train a domain-discriminator that routes samples at inference time to the appropriate expert. Surprisingly, we find that leveraging synthetic data in this capacity is more effective than using the samples to directly train the downstream classifier (the more common approach to leveraging synthetic data in the lifelong learning literature). We observe that G2D outperforms competitive domain-incremental learning methods on tasks in both vision and language modalities, providing a new perspective on the use of synthetic data in the lifelong learning literature.
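The inference-time routing step can be sketched as follows. This is a toy illustration, not the authors' implementation: the discriminator here is a hand-written rule standing in for a model trained on synthetic data, and the names `route_to_expert`, `toy_discriminator`, and `toy_experts` are hypothetical.

```python
from typing import Callable, Sequence


def route_to_expert(
    sample: float,
    discriminator: Callable[[float], int],
    experts: Sequence[Callable[[float], str]],
) -> str:
    """Ask the discriminator for a domain index, then apply that domain's expert."""
    domain = discriminator(sample)
    if not 0 <= domain < len(experts):
        raise IndexError(f"discriminator returned unknown domain {domain}")
    return experts[domain](sample)


def toy_discriminator(x: float) -> int:
    """Stand-in for a learned domain discriminator: negatives are domain 0, the rest domain 1."""
    return 0 if x < 0 else 1


# One placeholder expert per domain; real experts would be trained models.
toy_experts = [
    lambda x: f"expert_0 handled {x}",
    lambda x: f"expert_1 handled {x}",
]

print(route_to_expert(-2.5, toy_discriminator, toy_experts))  # expert_0 handled -2.5
print(route_to_expert(3.0, toy_discriminator, toy_experts))   # expert_1 handled 3.0
```

Only the discriminator decides which expert sees a sample, so new domains can be added by appending an expert and retraining the discriminator, without touching the existing experts.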
Feedback Design and Implementation for Integrated Posture Manipulation and Thrust Vectoring
This MS thesis outlines my contributions to the closed-loop control and system integration of two robotic platforms: 1) Aerobat, a flapping-wing robot stabilized by air jets, and 2) Harpy, a bipedal robot equipped with dual thrusters. Both systems share a common theme: the integration of posture manipulation and thrust vectoring to achieve stability and controlled movement. For Aerobat, I developed the software and control architecture that enabled its first untethered flights. The control system combines flapping-wing dynamics with multiple air jet stabilization to maintain roll, pitch and yaw stability. These results were published in the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). For Harpy, I implemented a closed-loop control framework that incorporates active thruster-assisted frontal dynamics stabilization. My work led to preliminary untethered dynamic walking. This approach demonstrates how thrust-assisted stability can enhance locomotion in legged robots, which has not been explored before.
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating
Deng, Chao, Yuan, Jiale, Bu, Pi, Wang, Peijie, Li, Zhong-Zhi, Xu, Jian, Li, Xiao-Hui, Gao, Yuan, Song, Jun, Zheng, Bo, Liu, Cheng-Lin
Large vision language models (LVLMs) have remarkably improved document understanding capabilities, enabling the handling of complex document elements, longer contexts, and a wider range of tasks. However, existing document understanding benchmarks have been limited to handling only a small number of pages and fail to provide a comprehensive analysis of layout element locating. In this paper, we first define three primary task categories: Long Document Understanding, numerical Reasoning, and cross-element Locating, and then propose a comprehensive benchmark, LongDocURL, integrating the above three primary tasks and comprising 20 sub-tasks categorized based on different primary tasks and answer evidence. Furthermore, we develop a semi-automated construction pipeline and collect 2,325 high-quality question-answering pairs, covering more than 33,000 pages of documents, significantly exceeding the scale of existing benchmarks. Subsequently, we conduct comprehensive evaluation experiments on both open-source and closed-source models across 26 different configurations, revealing critical performance gaps in this field.
Hidformer: Transformer-Style Neural Network in Stock Price Forecasting
Szydłowski, Kamil Ł., Chudziak, Jarosław A.
This paper investigates the application of Transformer-based neural networks to stock price forecasting, with a special focus on the intersection of machine learning techniques and financial market analysis. The evolution of Transformer models, from their inception to their adaptation for time series analysis in financial contexts, is reviewed and discussed. Central to our study is the exploration of the Hidformer model, which is currently recognized for its promising performance in time series prediction. The primary aim of this paper is to determine whether Hidformer also performs well in the task of stock price prediction. This slightly modified model serves as the framework for our experiments, integrating the principles of technical analysis with advanced machine learning concepts to enhance stock price prediction accuracy. We conduct an evaluation of the Hidformer model's performance, using a set of criteria to determine its efficacy. Our findings offer additional insights into the practical application of Transformer architectures in financial time series forecasting, highlighting their potential to improve algorithmic trading strategies and to support human decision making.