Oceania
Hyperparameter Optimisation with Practical Interpretability and Explanation Methods in Probabilistic Curriculum Learning
Salt, Llewyn, Gallagher, Marcus
Hyperparameter optimisation (HPO) is crucial for achieving strong performance in reinforcement learning (RL), as RL algorithms are inherently sensitive to hyperparameter settings. Probabilistic Curriculum Learning (PCL) is a curriculum learning strategy designed to improve RL performance by structuring the agent's learning process, yet effective hyperparameter tuning remains challenging and computationally demanding. In this paper, we provide an empirical analysis of hyperparameter interactions and their effects on the performance of a PCL algorithm within standard RL tasks, including point-maze navigation and DC motor control. Using the AlgOS framework integrated with Optuna's Tree-Structured Parzen Estimator (TPE), we present strategies to refine hyperparameter search spaces, enhancing optimisation efficiency. Additionally, we introduce a novel SHAP-based interpretability approach tailored specifically for analysing hyperparameter impacts, offering clear insights into how individual hyperparameters and their interactions influence RL performance. Our work contributes practical guidelines and interpretability tools that significantly improve the effectiveness and computational feasibility of hyperparameter optimisation in reinforcement learning.
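SHAP attributes an output to its inputs using Shapley values. The brute-force sketch below computes exact Shapley values when the "players" are hyperparameters. It assumes a hypothetical value function v(S): the return obtained when the hyperparameters in S take their tuned values and the rest stay at their defaults. The function, hyperparameter names, and scores are illustrative assumptions. This is not the paper's SHAP-based method, which the abstract does not describe in detail.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition of players.

    players: list of hyperparameter names (the "players").
    value:   function mapping a frozenset of players to a score, e.g. the
             mean return when those hyperparameters are tuned (hypothetical).
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for a coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical value function: three main effects plus a
# learning_rate x gamma interaction worth 1.0.
def toy_value(s):
    score = 0.0
    score += 2.0 if "learning_rate" in s else 0.0
    score += 1.0 if "batch_size" in s else 0.0
    score += 0.5 if "gamma" in s else 0.0
    score += 1.0 if {"learning_rate", "gamma"} <= s else 0.0
    return score


phi = shapley_values(["learning_rate", "batch_size", "gamma"], toy_value)
```

The interaction term is split equally between learning_rate and gamma. This gives attributions of about 2.5, 1.0 and 1.0, which sum to v(all) minus v(empty) = 4.5 (the efficiency property). Exhaustive enumeration grows exponentially with the number of hyperparameters, which is why SHAP-style estimators approximate it for larger search spaces.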
CDER: Collaborative Evidence Retrieval for Document-level Relation Extraction
Document-level Relation Extraction (DocRE) involves identifying relations between entities across multiple sentences in a document. Evidence sentences are crucial for precisely identifying the relationship between an entity pair. They focus the model on essential text segments and thereby improve DocRE performance. However, existing evidence retrieval systems often overlook the collaborative nature of semantically similar entity pairs in the same document, which limits the effectiveness of evidence retrieval. To address this, we propose a novel evidence retrieval framework, CDER. CDER employs an attentional graph-based architecture to capture collaborative patterns and incorporates a dynamic sub-structure for additional robustness in evidence retrieval. Experimental results on the benchmark DocRE dataset show that CDER not only excels in the evidence retrieval task but also improves the overall performance of existing DocRE systems.
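Attention over a graph whose nodes are the entity pairs of one document is one way to let semantically similar pairs share information. The sketch below performs a single round of scaled dot-product attention on a fully connected pair graph. It assumes each pair already has an embedding vector. It is a generic illustration, not CDER's architecture, and it omits learned projections and the dynamic sub-structure.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def aggregate_pairs(pair_embs):
    """One round of scaled dot-product attention over entity-pair nodes.

    Every pair attends to all pairs in the same document (itself included),
    so pairs with similar embeddings receive larger weights from each other.
    """
    dim = len(pair_embs[0])
    scale = math.sqrt(dim)
    updated = []
    for query in pair_embs:
        weights = softmax([dot(query, key) / scale for key in pair_embs])
        updated.append(
            [sum(w * v[d] for w, v in zip(weights, pair_embs)) for d in range(dim)]
        )
    return updated


# Hypothetical 2-d embeddings: pairs 0 and 1 are similar, pair 2 is not.
pairs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
updated = aggregate_pairs(pairs)
```

Pairs 0 and 1 reinforce each other's shared direction, while pair 2 contributes less to them. A trained model would learn query, key and value projections rather than attending over raw embeddings.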
Query Understanding in LLM-based Conversational Information Seeking
Yuan, Yifei, Abbasiantaeb, Zahra, Deng, Yang, Aliannejadi, Mohammad
Query understanding in Conversational Information Seeking (CIS) involves accurately interpreting user intent through context-aware interactions. This includes resolving ambiguities, refining queries, and adapting to evolving information needs. Large Language Models (LLMs) enhance this process by interpreting nuanced language and adapting dynamically, improving the relevance and precision of search results in real time. In this tutorial, we explore advanced techniques to enhance query understanding in LLM-based CIS systems. We delve into LLM-driven methods for developing robust evaluation metrics to assess query understanding quality in multi-turn interactions, strategies for building more interactive systems, and applications like proactive query management and query reformulation. We also discuss key challenges in integrating LLMs for query understanding in conversational search systems and outline future research directions. Our goal is to deepen the audience's understanding of LLM-based conversational query understanding and inspire discussions to drive ongoing advancements in this field.
Revealed: Big tech's new datacentres will take water from the world's driest areas
Amazon, Microsoft and Google are operating datacentres that use vast amounts of water in some of the world's driest areas and are building many more, an investigation by SourceMaterial and the Guardian has found. With Donald Trump pledging to support them, the three technology giants are planning hundreds of datacentres in the US and across the globe, with a potentially huge impact on populations already living with water scarcity. "The question of water is going to become crucial," said Lorena Jaume-Palasí, founder of the Ethical Tech Society. "Resilience from a resource perspective is going to be very difficult for those communities." Efforts by Amazon, the world's largest online retailer, to mitigate its water use have sparked opposition from inside the company, SourceMaterial's investigation found, with one of its own sustainability experts warning that its plans are "not ethical".
Titanic's Scottish scapegoat is CLEARED after 113 years: 3D scans confirm First Officer William Murdoch did NOT abandon his post as the ship sank
It has been 113 years since the Titanic sank beneath the waves, claiming the lives of more than 1,500 passengers and crew. But new evidence has finally cleared the tragedy's Scottish scapegoat: First Officer William Murdoch. For years, Officer Murdoch has been accused of taking bribes and abandoning his post, and he was even depicted shooting a passenger in the James Cameron movie. Now, more than a century later, 3D scans show that Officer Murdoch did not flee his position but died helping passengers escape until the very end. Deep-sea scanning company Magellan has snapped 715,000 photos of the Titanic wreck 12,500 feet beneath the Atlantic.
Retrieval Augmented Generation with Collaborative Filtering for Personalized Text Generation
Shi, Teng, Xu, Jun, Zhang, Xiao, Zang, Xiaoxue, Zheng, Kai, Song, Yang, Li, Han
Recently, the personalization of Large Language Models (LLMs) to generate content that aligns with individual user preferences has garnered widespread attention. Personalized Retrieval-Augmented Generation (RAG), which retrieves relevant documents from the user's history to reflect their preferences and enhance LLM generation, is one commonly used approach for personalization. However, existing personalized RAG methods do not consider that the histories of similar users can also assist personalized generation for the current user. In other words, collaborative information between users can also benefit personalized generation. Inspired by the application of collaborative filtering in recommender systems, we propose a method called CFRAG, which adapts Collaborative Filtering to RAG for personalized text generation. This presents two challenges: (1) how to incorporate collaborative information without explicit user similarity labels, and (2) how to retrieve documents that support personalized LLM generation. For Challenge 1, we use contrastive learning to train user embeddings to retrieve similar users and introduce collaborative information. For Challenge 2, we design a personalized retriever and reranker to retrieve the top-k documents from these users' histories. We take into account the user's preference during retrieval and reranking. Then we leverage feedback from the LLM to fine-tune the personalized retriever and reranker, enabling them to retrieve documents that meet the personalized generation needs of the LLM. Experimental results on the Language Model Personalization (LaMP) benchmark validate the effectiveness of CFRAG. Further analysis confirms the importance of incorporating collaborative information.
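The two-stage idea (find similar users, then retrieve from their histories) can be sketched as user-based nearest-neighbour retrieval, the classic collaborative-filtering pattern. The sketch assumes precomputed user and document embeddings compared by cosine similarity. The function name, the parameters m and k, and the toy data are hypothetical. The sketch stands in for CFRAG's contrastively trained user embeddings and its LLM-feedback-tuned retriever and reranker, which it does not reproduce.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def collaborative_retrieve(user_id, user_embs, histories, query_emb, m=2, k=3):
    """Return the top-k documents from the user's and similar users' histories.

    user_embs: {user_id: embedding}
    histories: {user_id: [(doc_text, doc_embedding), ...]}
    """
    target = user_embs[user_id]
    # Stage 1: the m other users whose embeddings are most similar.
    neighbours = heapq.nlargest(
        m,
        (u for u in user_embs if u != user_id),
        key=lambda u: cosine(target, user_embs[u]),
    )
    # Stage 2: pool the current user's history with the neighbours' histories
    # and rank every pooled document against the query.
    pool = []
    for u in [user_id, *neighbours]:
        pool.extend(histories.get(u, []))
    top = heapq.nlargest(k, pool, key=lambda doc: cosine(query_emb, doc[1]))
    return [text for text, _ in top]


# Hypothetical toy data.
user_embs = {"alice": [1.0, 0.0], "bob": [0.9, 0.1], "carol": [0.0, 1.0], "dave": [0.8, 0.3]}
histories = {
    "alice": [("a1", [1.0, 0.0])],
    "bob": [("b1", [0.9, 0.2]), ("b2", [0.0, 1.0])],
    "carol": [("c1", [1.0, 0.0])],
    "dave": [("d1", [0.5, 0.5])],
}
result = collaborative_retrieve("alice", user_embs, histories, [1.0, 0.0], m=2, k=2)
```

Carol's document matches the query perfectly but is excluded because Carol is not among Alice's nearest users. That neighbourhood filter is what the collaborative step contributes.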
Probabilistic QoS Metric Forecasting in Delay-Tolerant Networks Using Conditional Diffusion Models on Latent Dynamics
Zhang, Enming, Liu, Zheng, Xiang, Yu, Qu, Yanwen
Affiliation (Zhang, Liu, Xiang): School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China
Active QoS metric prediction, commonly employed in the maintenance and operation of delay-tolerant networks (DTNs), could enhance network performance in terms of latency, throughput, energy consumption, and dependability. Naturally formulated as a multivariate time series forecasting problem, it has attracted substantial research effort. Traditional mean regression methods for time series forecasting cannot adequately capture the complexity of the data, which degrades performance in operational DTN tasks such as routing. This paper formulates QoS metric prediction in DTNs as a probabilistic forecasting problem on multivariate time series, in which the uncertainty of a forecast can be quantified by characterizing the distribution of forecast samples. The proposed approach employs diffusion models and incorporates the latent temporal dynamics of non-stationary, multi-mode data into them.
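Once a generative forecaster can draw sample trajectories, uncertainty can be summarised at each forecast step with a mean and an empirical prediction interval. For a diffusion model, these samples might come from running the reverse process several times. The sketch assumes the samples are already available as plain lists. The 5%/95% levels and the toy latency numbers are illustrative choices, not the paper's settings.

```python
import statistics


def empirical_quantile(values, q):
    """Quantile q in [0, 1] by linear interpolation between sorted samples."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def summarise_forecast(samples, lower=0.05, upper=0.95):
    """Per-step (mean, lower quantile, upper quantile) from sampled trajectories.

    samples: list of S trajectories, each a list of H forecast values.
    """
    horizon = len(samples[0])
    summary = []
    for t in range(horizon):
        step = [trajectory[t] for trajectory in samples]
        summary.append(
            (
                statistics.mean(step),
                empirical_quantile(step, lower),
                empirical_quantile(step, upper),
            )
        )
    return summary


# Hypothetical: five sampled latency trajectories (ms) over a 2-step horizon.
samples = [[10, 20], [12, 22], [14, 24], [16, 26], [18, 28]]
summary = summarise_forecast(samples)
```

A wider interval at a given step signals a less certain forecast. A routing policy could treat that signal differently from a point estimate alone.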
AEGIS: Human Attention-based Explainable Guidance for Intelligent Vehicle Systems
Zhuang, Zhuoli, Lu, Cheng-You, Chang, Yu-Cheng Fred, Wang, Yu-Kai, Do, Thomas, Lin, Chin-Teng
Improving decision-making capabilities in Autonomous Intelligent Vehicles (AIVs) has been an active research topic in recent years. Despite advancements, training machines to capture the regions of interest needed for comprehensive scene understanding, as human perception and reasoning do, remains a significant challenge. This study introduces a novel framework, Human Attention-based Explainable Guidance for Intelligent Vehicle Systems (AEGIS). AEGIS uses a pre-trained human attention model, built from eye-tracking data, to guide reinforcement learning (RL) models in identifying critical regions of interest for decision-making. By collecting 1.2 million frames from 20 participants across six scenarios, AEGIS pre-trains a model to predict human attention patterns.
A Lightweight Large Vision-language Model for Multimodal Medical Images
Alsinglawi, Belal, McCarthy, Chris, Webb, Sara, Fluke, Christopher, Saidy, Navid Toosy
Medical Visual Question Answering (VQA) enhances clinical decision-making by enabling systems to interpret medical images and answer clinical queries. However, developing efficient, high-performance VQA models is challenging due to the complexity of medical imagery and the diversity of imaging modalities. In this paper, we introduce a lightweight, multimodal VQA model that integrates BiomedCLIP for image feature extraction and LLaMA-3 for text processing. Designed for medical VQA tasks, our model achieves state-of-the-art performance on the OmniMedVQA dataset. With approximately 8 billion parameters, it requires only two NVIDIA A100 GPUs (40 GB each), demonstrating greater efficiency than larger models. Our results show 73.4% accuracy on open-ended questions, surpassing existing models and validating its potential for real-world medical applications. Key contributions include a specialized multimodal VQA model, a resource-efficient architecture, and strong performance in answering open-ended clinical questions.
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding
Le, Binh M., Xu, Shaoyuan, Fu, Jinmiao, Huang, Zhishen, Li, Moyan, Guo, Yanhui, Li, Hongdong, Ramasinghe, Sameera, Wang, Bryan
In Visual Document Understanding (VDU) tasks, fine-tuning a pre-trained Vision-Language Model (VLM) with new datasets often falls short in optimizing the vision encoder to identify query-specific regions in text-rich document images. Existing methods that directly inject queries into model layers by modifying the network architecture often struggle to adapt to new datasets with limited annotations. To address this, we introduce QID, a novel, streamlined, architecture-preserving approach that integrates query embeddings into the vision encoder, leading to notable performance gains, particularly in data-scarce fine-tuning scenarios. Specifically, our approach introduces a dual-module framework: a query-aware module that generates a unique query vector to precisely guide the model's focus, as well as a query-agnostic module that captures the positional relationships among tokens, ensuring robust spatial understanding. Notably, both modules operate independently of the vision attention blocks, facilitating targeted learning of query embeddings and enhancing visual semantic identification. Experiments with OCR-free VLMs across multiple datasets demonstrate significant performance improvements using our method, especially in handling text-rich documents in data-scarce environments.