On the Pareto Front of Multilingual Neural Machine Translation

Chen, Liang, Ma, Shuming

Neural Information Processing Systems

In this work, we study how the performance of a given direction changes with its sampling ratio in Multilingual Neural Machine Translation (MNMT). By training over 200 multilingual models with various model sizes, data sizes, and language directions, we find, interestingly, that the performance of a given translation direction does not always improve as its weight in the multi-task optimization objective increases. Accordingly, the scalarization method leads to a multitask trade-off front that deviates from the traditional Pareto front when there is data imbalance in the training corpus, which poses a great challenge to improving the overall performance of all directions. Based on our observations, we propose the Double Power Law to predict the unique performance trade-off front in MNMT, which is robust across various languages, data adequacy, and numbers of tasks. Finally, we formulate the sampling ratio selection problem in MNMT as an optimization problem based on the Double Power Law. In our experiments, it achieves better performance than temperature searching and gradient manipulation methods with only 1/5 to 1/2 of the total training budget.
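The paper's fitted Double Power Law is not reproduced here; as a minimal sketch, assuming each direction's loss follows a generic power law in its sampling ratio (all constants below are hypothetical), the sampling-ratio selection can be posed as a small one-dimensional search:

```python
import numpy as np

# Hypothetical per-direction loss curves. The paper's Double Power Law has
# its own fitted form; here we assume a generic power law purely for
# illustration: loss falls as a direction's sampling ratio r grows.
def direction_loss(r, a, b, c):
    return a * np.power(r, -b) + c

# Two translation directions with imbalanced data (constants are made up).
params = [(0.5, 0.3, 1.0), (0.2, 0.5, 0.8)]

# Grid-search the sampling ratio of direction 0; direction 1 gets 1 - r.
grid = np.linspace(0.05, 0.95, 181)
total = [direction_loss(r, *params[0]) + direction_loss(1.0 - r, *params[1])
         for r in grid]
best = grid[int(np.argmin(total))]
```

With more directions, the same idea becomes a constrained optimization over the ratio simplex rather than a 1-D grid search.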



Why some animals eat their babies

Popular Science

Animal filial cannibalism has been documented in fish, insects, and even domestic pets, yet scientists still don't fully understand why some animals eat their own offspring. "In general, cannibalism of offspring is super widespread," says Aneesh Bose, a behavioral ecologist at the Swedish University of Agricultural Sciences in Uppsala, Sweden. Bose has long studied the phenomenon of animals that turn from child-rearing to child-eating, and in 2022 he authored a review of prior research on the topic.


Diffusion differentiable resampling

Andersson, Jennifer Rosina, Zhao, Zheng

arXiv.org Machine Learning

This paper is concerned with differentiable resampling in the context of sequential Monte Carlo (e.g., particle filtering). We propose a new informative resampling method, based on an ensemble score diffusion model, that is pathwise differentiable by construction. We prove that our diffusion resampling method provides a consistent estimate of the resampling distribution, and we show by experiments that it outperforms state-of-the-art differentiable resampling methods when used for stochastic filtering and parameter estimation.
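The diffusion-based resampler itself is not sketched here; for context, the following shows the standard multinomial resampling step that such methods aim to replace. The discrete index draw is the operation through which gradients cannot flow:

```python
import numpy as np

def multinomial_resample(particles, weights, rng):
    """Standard multinomial resampling: draw particle indices in proportion
    to the normalized weights. The discrete index draw blocks gradient
    flow, which is the problem differentiable resampling methods address."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    # After resampling, all particles carry equal weight.
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

rng = np.random.default_rng(0)
particles = np.array([[0.0], [1.0], [2.0], [3.0]])
weights = np.array([0.7, 0.1, 0.1, 0.1])
new_particles, new_weights = multinomial_resample(particles, weights, rng)
```

High-weight particles are duplicated and low-weight ones are dropped, but the duplication pattern is a discrete random choice, not a smooth function of the weights.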


Forests of Uncertaint(r)ees: Using tree-based ensembles to estimate probability distributions of future conflict

Mittermaier, Daniel, Bohne, Tobias, Hofer, Martin, Racek, Daniel

arXiv.org Artificial Intelligence

Predictions of fatalities from violent conflict on the PRIO-GRID-month (pgm) level are characterized by high levels of uncertainty, limiting their usefulness in practical applications. We discuss the two main sources of uncertainty for this prediction task, the nature of violent conflict and data limitations, and embed this discussion in the wider literature on uncertainty quantification in machine learning. We develop a strategy to quantify uncertainty in conflict forecasting, shifting from traditional point predictions to full predictive distributions. Our approach compares and combines multiple tree-based classifiers and distributional regressors in a custom auto-ML setup, estimating a distribution for each pgm individually. We also test the integration of regional models in spatial ensembles as a potential avenue to reduce uncertainty. The models consistently outperform a suite of benchmarks derived from conflict history in predictions up to one year in advance, with performance driven by regions where conflict was observed. With our evaluation, we emphasize the need to understand how a metric behaves for a given prediction problem, in our case one characterized by extremely high zero-inflatedness. While it does not yield better predictions, the integration of smaller regional models does not decrease performance for this prediction task either, opening avenues to integrate data sources with less spatial coverage in the future.
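The paper's custom auto-ML setup is not reproduced here; as a toy sketch on synthetic zero-inflated counts (all data, constants, and the one-split "stump" learner are illustrative assumptions, not the paper's models), a bootstrap ensemble can turn point predictors into an empirical predictive distribution:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic zero-inflated fatality counts: most cell-months see zero.
n = 500
x = rng.uniform(0, 1, size=n)
y = np.where(rng.uniform(size=n) < 0.8, 0.0, rng.poisson(10 * x))

def fit_stump(x, y):
    """A minimal 'tree': one median split, predicting the mean fatality
    count on each side (a stand-in for richer tree-based learners)."""
    t = np.median(x)
    left, right = y[x <= t].mean(), y[x > t].mean()
    return lambda q: np.where(q <= t, left, right)

# Bootstrap an ensemble; each member's prediction at a new point is one
# sample from an empirical predictive distribution.
members = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    members.append(fit_stump(x[idx], y[idx]))

x_new = 0.9
samples = np.array([m(x_new) for m in members])
q10, q50, q90 = np.quantile(samples, [0.1, 0.5, 0.9])
```

Reporting quantiles of the ensemble's predictions, rather than a single point estimate, is one simple way to expose predictive uncertainty for a zero-inflated target.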