Goto

Collaborating Authors

 url


Mesh-TensorFlow: Deep Learning for Supercomputers

Neural Information Processing Systems

However,batch-splitting suffers from problems including the inability to train very large models (due to memory constraints), high latency, and inefficiency at small batch sizes. All of these can be solved by more general distribution strategies (model-parallelism). Unfortunately,efficient model-parallel algorithms tend tobe complicated todiscover, describe, and to implement, particularly on large clusters.


Cross-videoIdentityCorrelatingforPerson Re-identificationPre-training

Neural Information Processing Systems

However, these researches are mostly confined to pre-training at the instance-level or single-video tracklet-level. They ignore the identity-invariance in images of the same person across different videos, which is a key focus in person re-identification. To address this issue, we propose a Cross-video Identity-cOrrelating pre-traiNing (CION) framework.



URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates

Neural Information Processing Systems

Representation learning has significantly driven the field to develop pretrained models that can act as a valuable starting point when transferring to new datasets. With the rising demand for reliable machine learning and uncertainty quantification, there is a need for pretrained models that not only provide embeddings but also transferable uncertainty estimates. To guide the development of such models, we propose the Uncertainty-aware Representation Learning (URL) benchmark. Besides the transferability of the representations, it also measures the zero-shot transferability of the uncertainty estimate using a novel metric. We apply URL to evaluate ten uncertainty quantifiers that are pretrained on ImageNet and transferred to eight downstream datasets. We find that approaches that focus on the uncertainty of the representation itself or estimate the prediction risk directly outperform those that are based on the probabilities of upstream classes. Yet, achieving transferable uncertainty quantification remains an open challenge. Our findings indicate that it is not necessarily in conflict with traditional representation learning goals. Code is available at https://github.com/mkirchhof/url .


DCMM-SQL: Automated Data-Centric Pipeline and Multi-Model Collaboration Training for Text-to-SQL Model

arXiv.org Artificial Intelligence

Text-to-SQL tasks have gained attractive improvements since the release of ChatGPT. Among them, agent-based frameworks have been widely used in this field. However, the impact of data-centric strategies on text-to-SQL tasks has rarely been explored. In this paper, we systemically design a fully automated data-centric pipeline for text-to-SQL tasks, including \emph{adaptive data repair}, which can automatically find and fix errors in the training dataset; and \emph{error data augmentation}, where we specifically diffuse and enhance erroneous data predicted by the initially trained models. Meanwhile, we propose a Multi-Model collaboration training schema, aiming to train multiple models with different augmented data, enabling them to possess distinct capabilities and work together to complement each other, because it has been found that the capability of a single fine-tuned model is very limited. Furthermore, we utilize an ensemble strategy to integrate the capabilities of multiple models to solve a multiple-choice question, aiming to further improve the accuracy of text-to-SQL tasks. The experiment results and ablation study have demonstrated the effectiveness of data-centric pipeline and Multi-Model(MM) interactive iterative strategies, achieving first place in lightweight text-to-SQL models (within 70B).





On the retraining frequency of global forecasting models

arXiv.org Machine Learning

In an era of increasing computational capabilities and growing environmental consciousness, organizations face a critical challenge in balancing the accuracy of forecasting models with computational efficiency and sustainability. Global forecasting models, lowering the computational time, have gained significant attention over the years. However, the common practice of retraining these models with new observations raises important questions about the costs of forecasting. Using ten different machine learning and deep learning models, we analyzed various retraining scenarios, ranging from continuous updates to no retraining at all, across two large retail datasets. We showed that less frequent retraining strategies maintain the forecast accuracy while reducing the computational costs, providing a more sustainable approach to large-scale forecasting. We also found that machine learning models are a marginally better choice to reduce the costs of forecasting when coupled with less frequent model retraining strategies as the frequency of the data increases. Our findings challenge the conventional belief that frequent retraining is essential for maintaining forecasting accuracy. Instead, periodic retraining offers a good balance between predictive performance and efficiency, both in the case of point and probabilistic forecasting. These insights provide actionable guidelines for organizations seeking to optimize forecasting pipelines while reducing costs and energy consumption.


Variational Autoencoder for Generating Broader-Spectrum prior Proposals in Markov chain Monte Carlo Methods

arXiv.org Machine Learning

This study uses a Variational Autoencoder method to enhance the efficiency and applicability of Markov Chain Monte Carlo (McMC) methods by generating broader-spectrum prior proposals. Traditional approaches, such as the Karhunen-Loève Expansion (KLE), require previous knowledge of the covariance function, often unavailable in practical applications. The VAE framework enables a data-driven approach to flexibly capture a broader range of correlation structures in Bayesian inverse problems, particularly subsurface flow modeling. The methodology is tested on a synthetic groundwater flow inversion problem, where pressure data is used to estimate permeability fields. Numerical experiments demonstrate that the VAE-based parameterization achieves comparable accuracy to KLE when the correlation length is known and outperforms KLE when the assumed correlation length deviates from the true value. Moreover, the VAE approach significantly reduces stochastic dimensionality, improving computational efficiency. The results suggest that leveraging deep generative models in McMC methods can lead to more adaptable and efficient Bayesian inference in high-dimensional problems.