One Loss for All: Deep Hashing with a Single Cosine Similarity based Learning Objective
A deep hashing model typically has two main learning objectives: to make the learned binary hash codes discriminative and to minimize a quantization error. With further constraints such as bit balance and code orthogonality, it is not uncommon for existing models to employ a large number (> 4) of losses. This leads to difficulties in model training and subsequently impedes their effectiveness. In this work, we propose a novel deep hashing model with only $\textit{a single learning objective}$. Specifically, we show that maximizing the cosine similarity between the continuous codes and their corresponding $\textit{binary orthogonal codes}$ can ensure both hash code discriminativeness and quantization error minimization. Further, with this learning objective, code balancing can be achieved by simply using a Batch Normalization (BN) layer and multi-label classification is also straightforward with label smoothing. The result is a one-loss deep hashing model that removes all the hassles of tuning the weights of various losses. Importantly, extensive experiments show that our model is highly effective, outperforming the state-of-the-art multi-loss hashing models on three large-scale instance retrieval benchmarks, often by significant margins.
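The single objective can be sketched in a few lines of NumPy. As a hedged illustration (not the paper's exact implementation), rows of a Sylvester Hadamard matrix stand in for the binary orthogonal class targets, and `one_loss` is the lone training criterion:

```python
import numpy as np

def hadamard(k):
    # Sylvester construction: a 2^k x 2^k matrix with entries in {+1, -1}
    # whose rows are mutually orthogonal -- one candidate family of
    # binary orthogonal target codes (an assumption for illustration).
    H = np.array([[1.0]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

def one_loss(z, labels, targets):
    # Single objective: maximize cosine similarity between each continuous
    # code z_i and the binary orthogonal code of its class.
    t = targets[labels]
    cos = np.sum(z * t, axis=1) / (
        np.linalg.norm(z, axis=1) * np.linalg.norm(t, axis=1))
    return 1.0 - cos.mean()  # 0 when every code points at its target
```

At retrieval time the hash is simply `np.sign(z)`; pushing the cosine toward 1 both separates classes (the targets are mutually orthogonal) and shrinks the gap between `z` and its quantized code, which is how one loss covers both objectives.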
Preference Based Adaptation for Learning Objectives
In many real-world learning tasks it is hard to directly optimize the true performance measure, and choosing the right surrogate objective is also difficult. In this situation, it is desirable to incorporate an objective-optimization process into the learning loop, based on weak modeling of the relationship between the true measure and the surrogate objective. In this work, we study the task of objective adaptation, in which the learner iteratively adapts the learning objective to the underlying true objective based on preference feedback from an oracle. We show that when the objective can be linearly parameterized, this preference-based learning problem can be solved with the dueling bandit model. A novel sampling-based algorithm, DL^2M, is proposed to learn the optimal parameter; it enjoys strong theoretical guarantees and efficient empirical performance. To avoid learning a hypothesis from scratch after each objective update, a boosting-based hypothesis adaptation approach is proposed that efficiently adapts any pre-learned element hypothesis to the current objective. We apply the overall approach to multi-label learning and show that it achieves strong performance under various multi-label performance measures.
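The preference-feedback loop can be illustrated with a deliberately naive toy (this is NOT the paper's DL^2M algorithm; the oracle and the hill-climbing duel below are illustrative assumptions):

```python
import numpy as np

def preference_oracle(w_true, w_a, w_b):
    # Toy oracle: prefers the candidate whose direction is closer to the
    # hidden true objective weights (a stand-in for preference feedback).
    u = w_true / np.linalg.norm(w_true)
    def score(w):
        return -np.linalg.norm(w / np.linalg.norm(w) - u)
    return score(w_a) >= score(w_b)

def adapt_objective(w_true, dim, steps=500, sigma=0.3, seed=0):
    # Naive dueling loop: duel a perturbed candidate against the incumbent
    # parameter vector and keep whichever the oracle prefers, with a
    # slowly decaying perturbation scale.
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)
    for t in range(steps):
        cand = w + sigma * (0.999 ** t) * rng.normal(size=dim)
        if preference_oracle(w_true, cand, w):
            w = cand
    return w / np.linalg.norm(w)
```

Even this crude duel recovers the direction of the hidden objective weights on a toy problem; the point of DL^2M is to do this sample-efficiently and with guarantees, which the sketch makes no attempt at.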
Co-Designing Interdisciplinary Design Projects with AI
Liow, Wei Ting, Khan, Sumbul, Ang, Lay Kee
This work has been submitted to the IEEE for possible publication. ORCID: 0000-0003-2811-1194. Abstract -- Creating interdisciplinary design projects is time-consuming and cognitively demanding for teachers, requiring curriculum alignment, cross-subject integration, and careful sequencing. This paper presents the Interdisciplinary Design Project Planner (IDPplanner), a GPT-based planning assistant grounded in Design Innovation principles, alignment with Singapore secondary school syllabuses, and 21st-century competencies. In a within-subject, counterbalanced workshop with 33 in-service teachers, participants produced two versions of the same project, manual and AI-assisted, followed by self- and peer-evaluations using a six-dimensional rubric. The AI-assisted version received higher scores for Curriculum Alignment, Design Thinking Application, and Coherence & Flow, with a marginal advantage for Assessment Strategies. Teacher reflections indicated that AI-assisted planning improved structure, sequencing, and idea generation, while contextualization to local syllabuses, class profiles, and student needs remained teacher-led. Contributions include (1) a purpose-built planning tool that organizes ideas into a ten-component flow with ready-to-adapt prompts, templates, and assessment suggestions; (2) an empirical, rubric-based comparison of planning quality; and (3) evidence that AI can function as a pedagogical planning partner. Recommendations emphasize hybrid teacher-AI workflows to enhance curriculum alignment and reduce planning complexity, and design suggestions for developers to strengthen contextual customization, iterative design support, and localized rubrics. Although instantiated with a Singapore-based curriculum, the planning flow and rubric are framework-agnostic and can be parameterized for other systems.
Interdisciplinary learning approaches have gained prominence globally, particularly as countries prioritize 21st-century competencies (21CC) such as creativity, problem - solving, collaboration, and adaptive thinking.
- Asia > Singapore (0.48)
- Europe > United Kingdom > England > Leicestershire > Loughborough (0.04)
- Research Report > New Finding (1.00)
- Instructional Material > Course Syllabus & Notes (1.00)
- Research Report > Experimental Study > Negative Result (0.68)
- Education > Educational Setting > Higher Education (1.00)
- Education > Curriculum (1.00)
- Education > Educational Setting > K-12 Education > Secondary School (0.34)
Instructional Agents: LLM Agents on Automated Course Material Generation for Teaching Faculties
Yao, Huaiyuan, Xu, Wanpeng, Turnau, Justin, Kellam, Nadia, Wei, Hua
Preparing high-quality instructional materials remains a labor-intensive process that often requires extensive coordination among teaching faculty, instructional designers, and teaching assistants. In this work, we present Instructional Agents, a multi-agent large language model (LLM) framework designed to automate end-to-end course material generation, including syllabus creation, lecture scripts, LaTeX-based slides, and assessments. Unlike existing AI-assisted educational tools that focus on isolated tasks, Instructional Agents simulates role-based collaboration among educational agents to produce cohesive and pedagogically aligned content. The system operates in four modes: Autonomous, Catalog-Guided, Feedback-Guided, and Full Co-Pilot mode, enabling flexible control over the degree of human involvement. We evaluate Instructional Agents across five university-level computer science courses and show that it produces high-quality instructional materials while significantly reducing development time and human workload. By supporting institutions with limited instructional design capacity, Instructional Agents provides a scalable and cost-effective framework to democratize access to high-quality education, particularly in underserved or resource-constrained settings.
- Research Report (1.00)
- Instructional Material > Course Syllabus & Notes (1.00)
- Education > Educational Setting > Higher Education (0.50)
- Education > Curriculum > Subject-Specific Education (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
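The role-based pipeline described in the Instructional Agents abstract can be sketched as a simple sequential orchestration. Everything here (role names, prompt wording, the stub `llm` call, the `review` hook) is an illustrative assumption, not the system's actual API:

```python
from typing import Callable, Dict, Optional

def llm(prompt: str) -> str:
    # Stand-in for a real large-language-model call.
    return f"[draft for: {prompt}]"

# Each role produces one artifact of the course package.
ARTIFACTS = [
    ("instructional_designer", "syllabus"),
    ("lecturer", "lecture scripts"),
    ("slide_author", "LaTeX slides"),
    ("assessor", "assessments"),
]

def generate_course(topic: str,
                    review: Optional[Callable[[str, str], str]] = None) -> Dict[str, str]:
    # Each role drafts its artifact conditioned on everything produced so
    # far; the optional `review` hook injects human feedback, loosely
    # mirroring the Feedback-Guided / Full Co-Pilot modes.
    materials, context = {}, topic
    for role, artifact in ARTIFACTS:
        draft = llm(f"As {role}, produce {artifact} given: {context}")
        if review is not None:
            draft = review(role, draft)
        materials[artifact] = draft
        context += f" | {artifact} done"
    return materials
```

The design point is that later agents condition on earlier outputs, which is what keeps the syllabus, lectures, slides, and assessments mutually consistent; the four operating modes differ only in where the human `review` hook intervenes.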
EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization
Wang, Xiaoqi, Wang, Yi, Chau, Lap-Pui
Egocentric video-language understanding demands both high efficiency and accurate spatial-temporal modeling. Existing approaches face three key challenges: 1) Excessive pre-training cost arising from multi-stage pre-training pipelines, 2) Ineffective spatial-temporal encoding due to manually split 3D rotary positional embeddings that hinder feature interactions, and 3) Imprecise learning objectives in soft-label multi-instance retrieval, which neglect negative pair correlations. In this paper, we introduce EVA02-AT, a suite of EVA02-based video-language foundation models tailored to egocentric video understanding tasks. EVA02-AT first efficiently transfers an image-based CLIP model into a unified video encoder via a single-stage pretraining. Second, instead of applying rotary positional embeddings to isolated dimensions, we introduce spatial-temporal rotary positional embeddings along with joint attention, which can effectively encode both spatial and temporal information on the entire hidden dimension. This joint encoding of spatial-temporal features enables the model to learn cross-axis relationships, which are crucial for accurately modeling motion and interaction in videos. Third, focusing on multi-instance video-language retrieval tasks, we introduce the Symmetric Multi-Similarity (SMS) loss and a novel training framework that advances all soft labels for both positive and negative pairs, providing a more precise learning objective. Extensive experiments on Ego4D, EPIC-Kitchens-100, and Charades-Ego under zero-shot and fine-tuning settings demonstrate that EVA02-AT achieves state-of-the-art performance across diverse egocentric video-language tasks with fewer parameters. Models with our SMS loss also show significant performance gains on multi-instance retrieval benchmarks. Our code and models are publicly available at https://github.com/xqwang14/EVA02-AT .
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
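The idea of encoding space and time jointly across the whole hidden dimension, rather than on isolated channel slices, can be sketched with a toy rotary embedding. The linear mixing of (t, h, w) below uses arbitrary illustrative constants and is not EVA02-AT's actual parameterization:

```python
import numpy as np

def st_rope(x, t, h, w, base=10000.0):
    # Illustrative joint spatial-temporal RoPE: every frequency pair is
    # rotated by one angle that linearly mixes the temporal (t) and spatial
    # (h, w) coordinates, so the entire hidden dimension sees all three
    # axes instead of each axis owning a disjoint channel slice.
    d = x.shape[-1] // 2
    freqs = base ** (-np.arange(d) / d)
    ang = (1.0 * t + 0.531 * h + 0.247 * w) * freqs
    x1, x2 = x[..., :d], x[..., d:]
    return np.concatenate([x1 * np.cos(ang) - x2 * np.sin(ang),
                           x1 * np.sin(ang) + x2 * np.cos(ang)], axis=-1)
```

Because the positions enter only through rotations, the defining RoPE property survives the joint encoding: the dot product between a rotated query and key depends only on the relative (t, h, w) offset, which is what lets attention reason about motion rather than absolute location.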
Fairness in Survival Analysis: A Novel Conditional Mutual Information Augmentation Approach
ABSTRACT Survival analysis, a vital tool for predicting the time to an event, has been used in many domains such as healthcare, criminal justice, and finance. Like classification tasks, survival analysis can exhibit bias against disadvantaged groups, often due to biases inherent in data or algorithms. Several studies in both the IS and CS communities have attempted to address fairness in survival analysis. However, existing methods often overlook the importance of prediction fairness at pre-defined evaluation time points, which is crucial in real-world applications where decision making often hinges on specific time frames. To address this critical research gap, we introduce a new fairness concept: equalized odds (EO) in survival analysis, which emphasizes prediction fairness at pre-defined time points. To achieve EO fairness in survival analysis, we propose a Conditional Mutual Information Augmentation (CMIA) approach, which features a novel fairness regularization term based on conditional mutual information and an innovative censored data augmentation technique. Our CMIA approach can effectively balance prediction accuracy and fairness, and it is applicable to various survival models. We evaluate the CMIA approach against several state-of-the-art methods within three different application domains, and the results demonstrate that CMIA consistently reduces prediction disparity while maintaining good accuracy and significantly outperforms the other competing methods across multiple datasets and survival models (e.g., linear Cox, deep AFT). Keywords: survival analysis, equalized odds, fairness, pre-defined evaluation time points, conditional mutual information, censored data augmentation
1. INTRODUCTION Survival analysis is a set of statistical methods designed to model data where the outcome of interest is the time to the occurrence of a particular event. It is widely applied across many domains, such as healthcare (Khuri et al., 2005; Reddy et al., 2015), education (Ameri et al., 2016), business intelligence (Li et al., 2016; Rakesh et al., 2016), etc. In these applications, survival analysis provides likelihood estimates for the occurrence of events over time, which is useful for many crucial decisions.
- North America > United States > Maryland (0.04)
- North America > United States > Arizona (0.04)
- Research Report > New Finding (0.48)
- Research Report > Promising Solution (0.34)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Banking & Finance (1.00)
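The equalized-odds-at-time-points notion from the CMIA abstract above can be made concrete as a disparity metric. This is only an illustration of the fairness concept, not the paper's conditional-mutual-information regularizer, and it ignores censoring for simplicity:

```python
import numpy as np

def eo_gap(surv_prob, group, event_by_t):
    # surv_prob:  (N,) predicted probability of surviving past time t
    # group:      (N,) binary sensitive attribute
    # event_by_t: (N,) 1 if the event truly occurred by time t
    # Equalized-odds-style disparity at a pre-defined time point: within
    # each true-outcome stratum, compare mean predictions across groups
    # and report the worst gap.
    gaps = []
    for y in (0, 1):
        m = event_by_t == y
        a, b = surv_prob[m & (group == 0)], surv_prob[m & (group == 1)]
        if len(a) and len(b):
            gaps.append(abs(a.mean() - b.mean()))
    return max(gaps)
```

Evaluating this gap at each pre-defined decision time (rather than averaging over the whole time axis) is exactly the distinction the paper draws: a model can look fair on average yet be sharply unfair at the horizon where decisions are actually made.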
Reviews: PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
Limited novelty: The proposed approach is closely related to two lines of related work: 1) sg2im [4], which generates images from scene graph representations, and 2) semi-parametric image synthesis [3], which leverages semantic layouts and training images to generate novel images. The key difference to sg2im is the use of image crops in order to perform semi-parametric synthesis; however, in comparison to prior work on semi-parametric methods [3], as stated by the authors (Lines 82-83), the primary difference is the use of a graph convolution architecture, and a similar graph convolution method has already been introduced in [4]. I'd like to see more justification from the authors regarding the technical novelty of this approach in the presence of these two lines of work. Limited resolution: My concern about the limited novelty is exacerbated by the fact that the generated images are still low-resolution (64x64), as in prior work [4], even though high-resolution image crops are used to aid the image generation process. In contrast, related work [3] is able to generate images of much higher resolution, e.g., 512x1024, using its semi-parametric method (which was not compared in the experiments).
Reviews: Preference Based Adaptation for Learning Objectives
Summary: The authors consider the problem of optimizing a linear combination of multiple objective functions, where these objectives are typically surrogate loss functions for machine learning tasks. In this setting, the decision maker explores while exploiting the linear combination in a dueling bandit setting: in each time step the decision maker tests the two hypotheses generated from two linear combinations, and then receives feedback on whether the first hypothesis is better or the second is better. The main contribution of the paper is the proposal of online algorithms for the dueling bandit problem, where the preference between two tested hypotheses is modeled by a binary logistic choice model. In order to avoid retraining the hypothesis for every different linear combination, the authors adapt the boosting algorithm, which focuses on optimizing a mixture of K different hypotheses, where each hypothesis stems from optimizing one surrogate function. Major Comment: I find the paper quite interesting in terms of the problem model and the analysis, and I am more inclined towards acceptance than rejection.
Accessing Vision Foundation Models at ImageNet-level Costs
Zhang, Yitian, Ma, Xu, Bai, Yue, Wang, Huan, Fu, Yun
Vision foundation models are renowned for their generalization ability due to massive training data. Nevertheless, they demand tremendous training resources, and the training data is often inaccessible (e.g., CLIP, DINOv2), posing great challenges to developing derivatives that could advance research in this field. In this work, we offer a very simple and general solution, named Proteus, to distill foundation models into smaller equivalents on ImageNet-1K without access to the original training data. Specifically, we remove the designs from conventional knowledge distillation settings that result in dataset bias and present three levels of training objectives, i.e., token, patch, and feature, to maximize the efficacy of knowledge transfer. In this manner, Proteus is trained at ImageNet-level costs with surprising ability, facilitating the accessibility of training foundation models for the broader research community. Leveraging DINOv2-g/14 as the teacher, Proteus-L/14 matches the performance of the Oracle method DINOv2-L/14 (142M training data) across 15 benchmarks and outperforms other vision foundation models including CLIP-L/14 (400M), OpenCLIP-L/14 (400M/2B) and SynCLR-L/14 (600M). Code is available here.
- Instructional Material (0.54)
- Research Report (0.50)
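One of the three distillation levels named in the Proteus abstract, feature alignment between a small student and a large frozen teacher, can be sketched as follows. The projection layer and the closed-form fit are illustrative assumptions; the actual system learns everything by gradient descent:

```python
import numpy as np

def feature_distill_loss(student_feat, teacher_feat, W):
    # Project the lower-dimensional student features into the teacher's
    # space and penalize the mismatch -- one illustrative level of a
    # multi-level (token / patch / feature) distillation objective.
    return np.mean((student_feat @ W - teacher_feat) ** 2)

def fit_projection(student_feat, teacher_feat):
    # Least-squares fit of the projection for a batch (illustration only;
    # in practice the projector is learned jointly with the student).
    W, *_ = np.linalg.lstsq(student_feat, teacher_feat, rcond=None)
    return W
```

Because only teacher activations are needed, this kind of objective is what lets distillation proceed on a substitute dataset like ImageNet-1K rather than the teacher's inaccessible original training data.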