Bayesian Optimization for Controlled Image Editing via LLMs

arXiv.org Artificial Intelligence

In the rapidly evolving field of image generation, achieving precise control over generated content and maintaining semantic consistency remain significant limitations, particularly concerning grounding techniques and the necessity for model fine-tuning. To address these challenges, we propose BayesGenie, an off-the-shelf approach that integrates Large Language Models (LLMs) with Bayesian Optimization to facilitate precise and user-friendly image editing. Our method enables users to modify images through natural language descriptions without manual area marking, while preserving the original image's semantic integrity. Unlike existing techniques that require extensive pre-training or fine-tuning, our approach demonstrates remarkable adaptability across various LLMs through its model-agnostic design. BayesGenie employs an adapted Bayesian optimization strategy to automatically refine the inference process parameters, achieving high-precision image editing with minimal user intervention. Through extensive experiments across diverse scenarios, we demonstrate that our framework significantly outperforms existing methods in both editing accuracy and semantic preservation, as validated using different LLMs including Claude3 and GPT-4.


MC2SleepNet: Multi-modal Cross-masking with Contrastive Learning for Sleep Stage Classification

arXiv.org Artificial Intelligence

Sleep profoundly affects our health, and sleep deficiency or disorders can cause physical and mental problems. Despite significant findings from previous studies, challenges persist in optimizing deep learning models, especially in multi-modal learning for high-accuracy sleep stage classification. Our research introduces MC2SleepNet (Multi-modal Cross-masking with Contrastive learning for Sleep stage classification Network). It aims to facilitate effective collaboration between Convolutional Neural Networks (CNNs) and Transformer architectures for multi-modal training with the help of contrastive learning and cross-masking. Raw single-channel EEG signals and the corresponding spectrogram data provide differently characterized modalities for multi-modal learning. MC2SleepNet achieves state-of-the-art performance, with accuracies of 84.6% on SleepEDF-78 and 88.6% on the Sleep Heart Health Study (SHHS). These results demonstrate the effective generalization of our proposed network across both small and large datasets.


Image Fusion for Cross-Domain Sequential Recommendation

arXiv.org Artificial Intelligence

Cross-Domain Sequential Recommendation (CDSR) aims to predict future user interactions based on historical interactions across multiple domains. The key challenge in CDSR is effectively capturing cross-domain user preferences by fully leveraging both intra-sequence and inter-sequence item interactions. In this paper, we propose a novel method, Image Fusion for Cross-Domain Sequential Recommendation (IFCDSR), which incorporates item image information to better capture visual preferences. Our approach integrates a frozen CLIP model to generate image embeddings, enriching original item embeddings with visual data from both intra-sequence and inter-sequence interactions. Additionally, we employ a multiple attention layer to capture cross-domain interests, enabling joint learning of single-domain and cross-domain user preferences. To validate the effectiveness of IFCDSR, we re-partitioned four e-commerce datasets and conducted extensive experiments. Results demonstrate that IFCDSR significantly outperforms existing methods.


On the Usefulness of the Fit-on-the-Test View on Evaluating Calibration of Classifiers

arXiv.org Artificial Intelligence

Every uncalibrated classifier has a corresponding true calibration map that calibrates its confidence. Deviations of this idealistic map from the identity map reveal miscalibration. Such calibration errors can be reduced with many post-hoc calibration methods which fit some family of calibration maps on a validation dataset. In contrast, evaluation of calibration with the expected calibration error (ECE) on the test set does not explicitly involve fitting. However, as we demonstrate, ECE can still be viewed as if fitting a family of functions on the test data. This motivates the fit-on-the-test view on evaluation: first, approximate a calibration map on the test data, and second, quantify its distance from the identity. Exploiting this view allows us to unlock missed opportunities: (1) use the plethora of post-hoc calibration methods for evaluating calibration; (2) tune the number of bins in ECE with cross-validation. Furthermore, we introduce: (3) benchmarking on pseudo-real data where the true calibration map can be estimated very precisely; and (4) novel calibration and evaluation methods using new calibration map families PL and PL3.
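The binned ECE mentioned in the abstract can be illustrated with a minimal sketch. This is the common equal-width binning variant, not code from the paper; the function name `expected_calibration_error` and its parameters are hypothetical. Under the fit-on-the-test reading, the per-bin accuracy can be thought of as a piecewise-constant estimate of the calibration map, and the returned value as its average distance from the identity.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE (illustrative sketch, not the paper's code).

    confidences: predicted confidences in [0, 1]
    correct:     1 if the prediction was right, 0 otherwise
    n_bins:      number of equal-width bins (the quantity the abstract
                 proposes tuning with cross-validation)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    conf_sum = [0.0] * n_bins  # sum of confidences per bin
    acc_sum = [0.0] * n_bins   # number of correct predictions per bin
    for c, y in zip(confidences, correct):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidence outside [0, 1]")
        # A confidence of exactly 1.0 goes into the last bin.
        b = min(int(c * n_bins), n_bins - 1)
        conf_sum[b] += c
        acc_sum[b] += y

    # |mean accuracy - mean confidence| weighted by bin size reduces to
    # |acc_sum - conf_sum| / n for each bin; empty bins contribute zero.
    return sum(abs(a - c) for a, c in zip(acc_sum, conf_sum)) / n


if __name__ == "__main__":
    # Four predictions at confidence 0.9, three of them correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

In the toy call, all four predictions fall in one bin with mean confidence 0.9 and accuracy 0.75, so the result is about 0.15.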


Multiview graph dual-attention deep learning and contrastive learning for multi-criteria recommender systems

arXiv.org Machine Learning

Recommender systems leveraging deep learning models have been crucial for assisting users in selecting items aligned with their preferences and interests. However, a significant challenge persists in single-criteria recommender systems, which often overlook the diverse attributes of items; Multi-Criteria Recommender Systems (MCRS) address this limitation. Existing MCRS approaches rely on a shared embedding vector for multi-criteria item ratings but have struggled to capture the nuanced relationships between users and items based on specific criteria. In this study, we present a novel representation for MCRS based on a multi-edge bipartite graph, where each edge represents one criterion rating of items by users, and Multiview Dual Graph Attention Networks (MDGAT). Employing MDGAT is important for adequately considering all relations between users and items, given the presence of both local (criterion-based) and global (multi-criteria) relations. Additionally, we define anchor points in each view based on similarity and employ local and global contrastive learning to distinguish between positive and negative samples across each view and the entire graph. We evaluate our method on two real-world datasets and assess its performance based on item rating predictions. The results demonstrate that our method achieves higher accuracy than the baseline method for predicting item ratings on the same datasets. MDGAT effectively captures the local and global impact of neighbours and the similarity between nodes.


Assassin's Creed maker confirms leaked game footage is real

BBC News

By Tom Richardson, BBC Newsbeat. The makers of Assassin's Creed Shadows - the forthcoming entry in one of video gaming's biggest franchises - have confirmed footage leaked online is real. Some players managed to get their hands on the game - due to be released on 20 March - ahead of its official release. Developer Ubisoft said gameplay videos shared online "did not represent the final quality of the game". In a statement posted online, the company said it was "still working on patches" and urged fans not to share spoilers. Shadows will be the first Assassin's Creed instalment set in Japan - something fans have long been asking for.


RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts

arXiv.org Artificial Intelligence

Retrieval-Augmented Generation (RAG) enhances the performance of Large Language Models (LLMs) by incorporating external knowledge. However, LLMs still encounter challenges in effectively utilizing the knowledge from retrieved documents, often being misled by irrelevant or noisy information. To address this issue, we introduce RankCoT, a knowledge refinement method that incorporates reranking signals when generating CoT-based summaries from the given query and all retrieved documents. During training, RankCoT prompts the LLM to generate Chain-of-Thought (CoT) candidates based on the query and individual documents. It then fine-tunes the LLM to directly reproduce the best CoT from these candidate outputs based on all retrieved documents, which requires the LLM to filter out irrelevant documents while generating CoT-style summaries. Additionally, RankCoT incorporates a self-reflection mechanism that further refines the CoT outputs, resulting in higher-quality training data. Our experiments demonstrate the effectiveness of RankCoT, showing its superior performance over other knowledge refinement models. Further analysis reveals that RankCoT can provide shorter but effective refinement results, enabling the generator to produce more accurate answers. All code and data are available at https://github.com/NEUIR/RankCoT.


How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

arXiv.org Artificial Intelligence

The performance of Large Language Models (LLMs) on many tasks is greatly limited by the knowledge learned during pre-training and stored in the model's parameters. Low-rank adaptation (LoRA) is a popular and efficient training technique for updating LLMs or adapting them to specific domains. In this study, we investigate how new facts can be incorporated into the LLM using LoRA without compromising the previously learned knowledge. We fine-tuned Llama-3.1-8B-instruct using LoRA with varying amounts of new knowledge. Our experiments have shown that the best results are obtained when the training data contains a mixture of known and new facts. However, this approach is still potentially harmful because the model's performance on external question-answering benchmarks declines after such fine-tuning. When the training data is biased towards certain entities, the model tends to regress to a few overrepresented answers. In addition, we found that the model becomes more confident and refuses to provide an answer in only a few cases. These findings highlight the potential pitfalls of LoRA-based LLM updates and underscore the importance of training data composition and tuning parameters to balance new knowledge integration and general model capabilities.


Allocating Variance to Maximize Expectation

arXiv.org Machine Learning

We design efficient approximation algorithms for maximizing the expectation of the supremum of families of Gaussian random variables. In particular, let $\mathrm{OPT}:=\max_{\sigma_1,\cdots,\sigma_n}\mathbb{E}\left[\sum_{j=1}^{m}\max_{i\in S_j} X_i\right]$, where $X_i$ are Gaussian, $S_j\subset[n]$ and $\sum_i\sigma_i^2=1$. Our theoretical results include:

- A characterization of the optimal variance allocation: it concentrates on a small subset of variables as $|S_j|$ increases.
- A polynomial time approximation scheme (PTAS) for computing $\mathrm{OPT}$ when $m=1$.
- An $O(\log n)$ approximation algorithm for computing $\mathrm{OPT}$ for general $m>1$.

Such expectation maximization problems occur in diverse applications, ranging from utility maximization in auction markets to learning mixture models in quantitative genetics.
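To make the objective inside $\mathrm{OPT}$ concrete, the following is a minimal Monte Carlo sketch that estimates $\mathbb{E}\left[\sum_{j}\max_{i\in S_j} X_i\right]$ for one fixed variance allocation. It is not the paper's algorithm and performs no optimization. It assumes the $X_i$ are independent and zero-mean with standard deviations $\sigma_i$, which the abstract does not state, and the function name `estimate_objective` is hypothetical.

```python
import random


def estimate_objective(sigmas, subsets, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[sum_j max_{i in S_j} X_i].

    Assumption (not from the abstract): X_i ~ N(0, sigma_i^2), independent.
    sigmas:  standard deviations, required to satisfy sum(sigma_i^2) == 1
    subsets: list of index lists, one per set S_j
    """
    if abs(sum(s * s for s in sigmas) - 1.0) > 1e-9:
        raise ValueError("variances must sum to 1")
    rng = random.Random(seed)  # seeded for reproducibility
    total = 0.0
    for _ in range(n_samples):
        # Draw one joint sample of all variables.
        x = [rng.gauss(0.0, s) for s in sigmas]
        # Add up the maximum over each subset S_j.
        total += sum(max(x[i] for i in subset) for subset in subsets)
    return total / n_samples


if __name__ == "__main__":
    subsets = [[0, 1, 2]]                 # the m = 1 case, one set of size 3
    uniform = [(1.0 / 3.0) ** 0.5] * 3    # variance spread evenly
    concentrated = [1.0, 0.0, 0.0]        # all variance on one variable
    print("uniform:     ", estimate_objective(uniform, subsets))
    print("concentrated:", estimate_objective(concentrated, subsets))
```

A quick sanity check under these assumptions: with all variance on one variable and the others fixed at zero, the objective for a single set is $\mathbb{E}[\max(X, 0)]$ with $X \sim N(0,1)$, which equals $1/\sqrt{2\pi} \approx 0.399$, and the estimate should land close to that value.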


QueryAdapter: Rapid Adaptation of Vision-Language Models in Response to Natural Language Queries

arXiv.org Artificial Intelligence

A domain shift exists between the large-scale internet data used to train a Vision-Language Model (VLM) and the raw image streams collected by a robot. Existing adaptation strategies require the definition of a closed set of classes, which is impractical for a robot that must respond to diverse natural language queries. In response, we present QueryAdapter, a novel framework for rapidly adapting a pre-trained VLM in response to a natural language query. QueryAdapter leverages unlabelled data collected during previous deployments to align VLM features with semantic classes related to the query. By optimising learnable prompt tokens and actively selecting objects for training, an adapted model can be produced in a matter of minutes. We also explore how objects unrelated to the query should be dealt with when using real-world data for adaptation. In turn, we propose the use of object captions as negative class labels, helping to produce better calibrated confidence scores during adaptation. Extensive experiments on ScanNet++ demonstrate that QueryAdapter significantly enhances object retrieval performance compared to state-of-the-art unsupervised VLM adapters and 3D scene graph methods. Furthermore, the approach exhibits robust generalization to abstract affordance queries and other datasets, such as Ego4D.