tau 0
Analysing the SEDs of protoplanetary disks with machine learning
Kaeufer, T., Woitke, P., Min, M., Kamp, I., Pinte, C.
ABRIDGED. The analysis of spectral energy distributions (SEDs) of protoplanetary disks to determine their physical properties is known to be highly degenerate. Hence, a Bayesian analysis is required to obtain parameter uncertainties and degeneracies. The challenge here is computational speed, as one radiative transfer model requires a couple of minutes to compute. We performed a Bayesian analysis for 30 well-known protoplanetary disks to determine their physical disk properties, including uncertainties and degeneracies. To circumvent the computational cost problem, we created neural networks (NNs) to emulate the SED generation process. We created two sets of radiative transfer disk models to train and test two NNs that predict SEDs for continuous and discontinuous disks. A Bayesian analysis was then performed on 30 protoplanetary disks with SED data collected by the DIANA project to determine the posterior distributions of all parameters. We ran this analysis twice, (i) with old distances and additional parameter constraints as used in a previous study, to compare results, and (ii) with updated distances and free choice of parameters to obtain homogeneous and unbiased model parameters. We evaluated the uncertainties in the determination of physical disk parameters from SED analysis, and detected and quantified the strongest degeneracies. The NNs are able to predict SEDs within 1ms with uncertainties of about 5% compared to the true SEDs obtained by the radiative transfer code. We find parameter values and uncertainties that are significantly different from previous values obtained by $\chi^2$ fitting. Comparing the global evidence for continuous and discontinuous disks, we find that 26 out of 30 objects are better described by disks that have two distinct radial zones. Also, we created an interactive tool that instantly returns the SED predicted by our NNs for any parameter combination.
Clustering with missing data: which equivalent for Rubin's rules?
Audigier, Vincent, Niang, Ndรจye
Multiple imputation (MI) is a popular method for dealing with missing values. However, the suitable way for applying clustering after MI remains unclear: how to pool partitions? How to assess the clustering instability when data are incomplete? By answering both questions, this paper proposed a complete view of clustering with missing data using MI. The problem of partitions pooling is here addressed using consensus clustering while, based on the bootstrap theory, we explain how to assess the instability related to observed and missing data. The new rules for pooling partitions and instability assessment are theoretically argued and extensively studied by simulation. Partitions pooling improves accuracy while measuring instability with missing data enlarges the data analysis possibilities: it allows assessment of the dependence of the clustering to the imputation model, as well as a convenient way for choosing the number of clusters when data are incomplete, as illustrated on a real data set.
Censored Quantile Regression Forests
Li, Alexander Hanbo, Bradic, Jelena
In many applications, we want to predict and estimate the effect of a covariate on survival timeof interests. Examples include treatment, surgical procedure, or immunization on survival time of patients, who for example, could be individuals who have metastatic breast cancer, military casualties suffering from various injuries, or survival time of infectious diseases.Classically, most datasets have been too small to meaningfully examine the heterogeneity of the data beyond dividing them into a few subpopulations. In the past few years, however, there has been an explosion of experimental settings where it is potentially feasible to explore heterogeneity to its full extent. An impediment to exploring heterogeneous effects is the fear that scientists with two opposite agendas could hypothetically string together two opposite but coherent results by searching through many different possible models and then reporting only the very extreme ones - highlighting solely spurious results (Olken, 2015). Thus, protocols for clinical trials must specify in advance the pre-analysis plans and then learn from the data.
99% of Parallel Optimization is Inevitably a Waste of Time
Mishchenko, Konstantin, Hanzely, Filip, Richtรกrik, Peter
It is well known that many optimization methods, including SGD, SAGA, and Accelerated SGD for over-parameterized models, do not scale linearly in the parallel setting. In this paper, we present a new version of block coordinate descent that solves this issue for a number of methods. The core idea is to make the sampling of coordinate blocks on each parallel unit independent of the others. Surprisingly, we prove that the optimal number of blocks to be updated by each of $n$ units in every iteration is equal to $m/n$, where $m$ is the total number of blocks. As an illustration, this means that when $n=100$ parallel units are used, $99\%$ of work is a waste of time. We demonstrate that with $m/n$ blocks used by each unit the iteration complexity often remains the same. Among other applications which we mention, this fact can be exploited in the setting of distributed optimization to break the communication bottleneck. Our claims are justified by numerical experiments which demonstrate almost a perfect match with our theory on a number of datasets.
TAPAS: Train-less Accuracy Predictor for Architecture Search
Istrate, R., Scheidegger, F., Mariani, G., Nikolopoulos, D., Bekas, C., Malossi, A. C. I.
In recent years an increasing number of researchers and practitioners have been suggesting algorithms for large-scale neural network architecture search: genetic algorithms, reinforcement learning, learning curve extrapolation, and accuracy predictors. None of them, however, demonstrated high-performance without training new experiments in the presence of unseen datasets. We propose a new deep neural network accuracy predictor, that estimates in fractions of a second classification performance for unseen input datasets, without training. In contrast to previously proposed approaches, our prediction is not only calibrated on the topological network information, but also on the characterization of the dataset-difficulty which allows us to re-tune the prediction without any training. Our predictor achieves a performance which exceeds 100 networks per second on a single GPU, thus creating the opportunity to perform large-scale architecture search within a few minutes. We present results of two searches performed in 400 seconds on a single GPU. Our best discovered networks reach 93.67% accuracy for CIFAR-10 and 81.01% for CIFAR-100, verified by training. These networks are performance competitive with other automatically discovered state-of-the-art networks however we only needed a small fraction of the time to solution and computational resources.
Russell Stewart
Training Tensorflow's large language model on the Penn Tree Bank yields a test perplexity of 82. It depends on your personal taste. The high temperature sample displays greater linguistic variety, but the low temperature sample is more grammatically correct. Such is the world of temperature sampling - lowering the temperature allows you to focus on higher probability output sequences and smooth over deficiencies of the model. Temperature sampling works by increasing the probability of the most likely words before sampling. Suppose I ask you what day of the week it is, and you have a 70% chance of knowing the answer.
An SVM-like Approach for Expectile Regression
Farooq, Muhammad, Steinwart, Ingo
In standard nonparametric regression analysis, most of the methods developed so far are based on the least square loss function for estimating conditional expectations. In many applications, however, it is required to study conditional distributions beyond means. A nice tool for this purpose was offered by [20] in the form of quantile regression, which allows both the location and the spread of the response variable to be studied by using asymmetric least absolute deviation loss function (ALAD). We refer the reader to [19, 37, 9, 33] and references therein, for details description and different estimation methods for quantile regression.