Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning: Supplementary Material

Neural Information Processing Systems

Moreover, we show more visualization results in the experiments. To ensure a fair comparison, we used the same fusion and optimization method as Latefusion. When k=1, the object's physical properties are related only to itself, while ... As described in Section 3.1 of our paper, we represent audio ... Table 2: Performance comparison between our proposed DSE-audio and existing baseline methods. As shown in Table 2, we compare our method with other baseline methods. In Figure 6, we show a few additional examples of clustering using dynamic factors.


LOC: A General Language-Guided Framework for Open-Set 3D Occupancy Prediction

arXiv.org Artificial Intelligence

Vision-Language Models (VLMs) have shown significant progress in open-set challenges. However, the limited availability of 3D datasets hinders their effective application in 3D scene understanding. We propose LOC, a general language-guided framework adaptable to various occupancy networks, supporting both supervised and self-supervised learning paradigms. For self-supervised tasks, we employ a strategy that fuses multi-frame LiDAR points for dynamic/static scenes, using Poisson reconstruction to fill voids, and assigning semantics to voxels via K-Nearest Neighbor (KNN) to obtain comprehensive voxel representations. To mitigate feature over-homogenization caused by direct high-dimensional feature distillation, we introduce Densely Contrastive Learning (DCL). DCL leverages dense voxel semantic information and predefined textual prompts. This efficiently enhances open-set recognition without dense pixel-level supervision, and our framework can also leverage existing ground truth to further improve performance. Our model predicts dense voxel features embedded in the CLIP feature space, integrating textual and image pixel information, and classifies based on text and semantic similarity. Experiments on the nuScenes dataset demonstrate the method's superior performance, achieving high-precision predictions for known classes and distinguishing unknown classes without additional training data.
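The final classification step described above (assigning a class to each voxel by comparing its feature with text embeddings) can be illustrated with a minimal sketch. This is not the LOC implementation: the function name `classify_voxels`, the input arrays, and the `unknown_threshold` rule for flagging unknown classes are all assumptions made for illustration, and the features are assumed to be non-zero vectors that already live in a shared embedding space.

```python
import numpy as np

def classify_voxels(voxel_feats, text_embeds, unknown_threshold=0.5):
    """Label each voxel with its most similar text embedding.

    voxel_feats: (N, D) per-voxel features (hypothetical input).
    text_embeds: (C, D) one embedding per class prompt (hypothetical input).
    Returns an (N,) integer array. A label of -1 marks voxels whose best
    cosine similarity falls below unknown_threshold; this rule is an
    illustrative assumption, not something stated in the abstract.
    """
    v = np.asarray(voxel_feats, dtype=float)
    t = np.asarray(text_embeds, dtype=float)
    # Normalize rows so that the dot product equals cosine similarity.
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    sim = v @ t.T                       # (N, C) cosine similarities
    labels = sim.argmax(axis=1)
    labels[sim.max(axis=1) < unknown_threshold] = -1
    return labels
```

In this sketch a voxel far from every prompt embedding is reported as unknown rather than forced into the nearest known class, which is one simple way to separate known from unknown categories.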


Learning to Negotiate via Voluntary Commitment

arXiv.org Artificial Intelligence

The partial alignment and conflict of autonomous agents lead to mixed-motive scenarios in many real-world applications. However, agents may fail to cooperate in practice even when cooperation yields a better outcome. One well-known reason for this failure comes from non-credible commitments. To facilitate commitments among agents for better cooperation, we define Markov Commitment Games (MCGs), a variant of commitment games, where agents can voluntarily commit to their proposed future plans. Based on MCGs, we propose a learnable commitment protocol via policy gradients. We further propose incentive-compatible learning to accelerate convergence to equilibria with better social welfare. Experimental results in challenging mixed-motive tasks demonstrate faster empirical convergence and higher returns for our method compared with its counterparts. Our code is available at https://github.com/shuhui-zhu/DCL.


Coherent Hierarchical Probabilistic Forecasting of Electric Vehicle Charging Demand

arXiv.org Artificial Intelligence

The growing penetration of electric vehicles (EVs) significantly changes typical load curves in smart grids. With the development of fast charging technology, the volatility of EV charging demand is increasing, which requires additional flexibility for real-time power balance. The forecasting of EV charging demand involves probabilistic modeling of high-dimensional time series dynamics across diverse electric vehicle charging stations (EVCSs). This paper studies the forecasting problem of multiple EVCSs in a hierarchical probabilistic manner. For each charging station, a deep learning model based on a partial input convex neural network (PICNN) is trained to predict the day-ahead charging demand's conditional distribution, preventing the common quantile crossing problem in traditional quantile regression models. Then, differentiable convex optimization layers (DCLs) are used to reconcile the scenarios sampled from the distributions to yield coherent scenarios that satisfy the hierarchical constraint. Compared to traditional optimization-based hierarchical reconciling methods, this machine-learning approach learns a better weight matrix for adjusting the forecasting results of different targets. Numerical experiments based on real-world EV charging data are conducted to demonstrate the efficacy of the proposed method.
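To make the notion of coherent forecasts concrete, the sketch below shows the kind of traditional, fixed-weight reconciliation that the abstract contrasts with its learned approach: an ordinary least-squares projection onto the set where the aggregate equals the sum of the stations. It is not the paper's DCL-based method. The two-level hierarchy (one total over n stations), the function name `reconcile_ols`, and the input layout are assumptions made for illustration.

```python
import numpy as np

def reconcile_ols(base_forecasts):
    """Project [total, station_1, ..., station_n] onto the coherent set.

    A forecast vector is coherent here when total == sum(stations).
    The projection uses equal (identity) weights, i.e. plain OLS
    reconciliation; a learned method would replace these weights.
    """
    y = np.asarray(base_forecasts, dtype=float)
    n = y.shape[0] - 1
    # Summing matrix: first row aggregates all stations, the rest is identity.
    S = np.vstack([np.ones((1, n)), np.eye(n)])
    # Station-level values that best explain all base forecasts.
    bottom, *_ = np.linalg.lstsq(S, y, rcond=None)
    return S @ bottom
```

For example, base forecasts of 10 for the total and 3 and 4 for two stations are incoherent (3 + 4 is not 10); the projection spreads the mismatch across all three entries and returns 9, 4, and 5.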


Solving Dual Sourcing Problems with Supply Mode Dependent Failure Rates

arXiv.org Artificial Intelligence

This paper investigates dual sourcing problems with supply mode dependent failure rates, particularly relevant in managing spare parts for downtime-critical assets. To enhance resilience, businesses increasingly adopt dual sourcing strategies using both conventional and additive manufacturing techniques. This paper explores how these strategies can optimise sourcing by addressing variations in part properties and failure rates. A significant challenge is the distinct failure characteristics of parts produced by these methods, which influence future demand. To tackle this, we propose a new iterative heuristic and several reinforcement learning techniques combined with an endogenous parameterised learning (EPL) approach. This EPL approach - compatible with any learning method - allows a single policy to handle various input parameters for multiple items. In a stylised setting, our best policy achieves an average optimality gap of 0.4%. In a case study within the energy sector, our policies outperform the baseline in 91.1% of instances, yielding average cost savings up to 22.6%.


Dilated Convolution with Learnable Spacings makes visual models more aligned with humans: a Grad-CAM study

arXiv.org Artificial Intelligence

Dilated Convolution with Learnable Spacings (DCLS) is a recent advanced convolution method that allows enlarging the receptive fields (RF) without increasing the number of parameters, like the dilated convolution, yet without imposing a regular grid. DCLS has been shown to outperform the standard and dilated convolutions on several computer vision benchmarks. Here, we show that, in addition, DCLS increases the models' interpretability, defined as the alignment with human visual strategies. To quantify it, we use the Spearman correlation between the models' Grad-CAM heatmaps and the ClickMe dataset heatmaps, which reflect human visual attention. We took eight reference models - ResNet50, ConvNeXt (T, S and B), CAFormer, ConvFormer, and FastViT (sa 24 and 36) - and drop-in replaced the standard convolution layers with DCLS ones. This improved the interpretability score in seven of them. Moreover, we observed that Grad-CAM generated random heatmaps for two models in our study, CAFormer and ConvFormer, leading to low interpretability scores. We addressed this issue by introducing Threshold-Grad-CAM, a modification built on top of Grad-CAM that enhanced interpretability across nearly all models. The code and checkpoints to reproduce this study are available at: https://github.com/rabihchamas/DCLS-GradCAM-Eval.
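The interpretability score described above is a rank correlation between two heatmaps. A minimal sketch of that computation follows; it is not taken from the linked repository. The function name `spearman_heatmaps` is hypothetical, both heatmaps are assumed to have the same shape, and ties are broken by position rather than given average ranks.

```python
import numpy as np

def spearman_heatmaps(map_a, map_b):
    """Spearman rank correlation between two same-shaped heatmaps.

    Each heatmap is flattened, its values are replaced by their ranks,
    and the Pearson correlation of the ranks is returned.
    Assumption: ties are ranked by position, not averaged.
    """
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("heatmaps must have the same number of pixels")
    # argsort of argsort gives the rank of every element.
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])
```

Real heatmaps often contain many tied values (for example, large zero regions), in which case a tie-aware routine such as `scipy.stats.spearmanr` is likely a better choice than this simplified version.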


Deep Companion Learning: Enhancing Generalization Through Historical Consistency

arXiv.org Artificial Intelligence

We propose Deep Companion Learning (DCL), a novel training method for Deep Neural Networks (DNNs) that enhances generalization by penalizing model predictions that are inconsistent with the model's historical performance. To achieve this, we train a deep-companion model (DCM) by using previous versions of the model to provide forecasts on new inputs. This companion model deciphers a meaningful latent semantic structure within the data, thereby providing targeted supervision that encourages the primary model to address the scenarios it finds most challenging.


Visualizing High-Dimensional Temporal Data Using Direction-Aware t-SNE

arXiv.org Artificial Intelligence

Many real-world data sets contain a temporal component or involve transitions from state to state. For exploratory data analysis, we can represent these high-dimensional data sets in two-dimensional maps, using embeddings of the data objects under exploration and representing their temporal relationships with directed edges. Most existing dimensionality reduction techniques, such as t-SNE and UMAP, do not take into account the temporal or relational nature of the data when constructing the embeddings, resulting in temporally cluttered visualizations that obscure potentially interesting patterns. To address this problem, we propose two complementary, direction-aware loss terms in the optimization function of t-SNE that emphasize the temporal aspects of the data, guiding the optimization and the resulting embedding to reveal temporal patterns that might otherwise go unnoticed. The Directional Coherence Loss (DCL) encourages nearby arrows connecting two adjacent time series points to point in the same direction, while the Edge Length Loss (ELL) penalizes arrows - which effectively represent time gaps in the visualized embedding - based on their length. Both loss terms are differentiable and can be easily incorporated into existing dimensionality reduction techniques. By promoting local directionality of the directed edges, our procedure produces more temporally meaningful and less cluttered visualizations. We demonstrate the effectiveness of our approach on a toy dataset and two real-world datasets.
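The abstract does not give formulas for the two loss terms, so the sketch below is only a simplified stand-in for the ideas it describes, not the paper's DCL or ELL. It assumes the embedded points are ordered in time, measures arrow length as squared Euclidean distance, and compares each arrow only with the next one in time (rather than with spatially nearby arrows). Both function names are hypothetical.

```python
import numpy as np

def edge_length_penalty(points):
    """Mean squared length of the arrows between consecutive points."""
    edges = np.diff(np.asarray(points, dtype=float), axis=0)
    return float(np.mean(np.sum(edges ** 2, axis=1)))

def direction_incoherence(points, eps=1e-12):
    """Mean (1 - cosine similarity) between each arrow and the next.

    Returns about 0 when all arrows point the same way and grows as
    consecutive arrows diverge. Needs at least three points.
    """
    edges = np.diff(np.asarray(points, dtype=float), axis=0)
    # eps guards against division by zero for zero-length arrows.
    unit = edges / (np.linalg.norm(edges, axis=1, keepdims=True) + eps)
    cos = np.sum(unit[:-1] * unit[1:], axis=1)
    return float(np.mean(1.0 - cos))
```

On a straight, evenly spaced trajectory the incoherence term is approximately zero, while a zigzag of right-angle turns with the same step size yields the same length penalty but an incoherence of about one, which illustrates why the two terms are complementary.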