FastSurvival: Hidden Computational Blessings in Training Cox Proportional Hazards Models

Neural Information Processing Systems

Survival analysis is an important research topic with applications in healthcare, business, and manufacturing. One essential tool in this area is the Cox proportional hazards (CPH) model, which is widely used for its interpretability, flexibility, and predictive performance. However, for modern data science challenges such as high dimensionality (large n and p) and high feature correlations, current algorithms for training the CPH model have drawbacks that prevent us from using the CPH model at its full potential. The root cause is that the current algorithms, based on the Newton method, have trouble converging due to vanishing second-order derivatives outside the local region of the minimizer. To circumvent this problem, we propose new optimization methods that construct and minimize surrogate functions exploiting hidden mathematical structures of the CPH model. Our new methods are easy to implement and ensure monotonic loss decrease and global convergence. Empirically, we verify the computational efficiency of our methods. As a direct application, we show how our optimization methods can be used to solve the cardinality-constrained CPH problem, producing very sparse, high-quality models that were not previously practical to construct. We list several extensions that our breakthrough enables, including new optimization opportunities, theoretical questions about the CPH model's mathematical structure, and other CPH-related applications.
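To make the setting concrete, here is a minimal sketch (my own illustration, not the paper's algorithm) of Newton's method on the Breslow negative log partial likelihood for a single covariate, with step-halving as a safeguard against the divergence the abstract describes; all function names and the toy data are mine.

```python
import numpy as np

def cox_negloglik_terms(beta, x, time, event):
    """Breslow negative log partial likelihood and its first two
    derivatives for a single covariate (p = 1, no tied event times)."""
    loss = grad = hess = 0.0
    for i in range(len(x)):
        if not event[i]:
            continue
        risk = time >= time[i]                      # risk set at t_i
        e = np.exp(beta * x[risk])
        s0, s1, s2 = e.sum(), (x[risk] * e).sum(), (x[risk] ** 2 * e).sum()
        loss += np.log(s0) - beta * x[i]
        grad += s1 / s0 - x[i]
        hess += s2 / s0 - (s1 / s0) ** 2            # risk-set variance of x
    return loss, grad, hess

def newton_cox(x, time, event, iters=50):
    """Damped Newton iteration; step-halving enforces monotone loss decrease,
    which pure Newton does not guarantee far from the minimizer."""
    beta = 0.0
    for _ in range(iters):
        loss, grad, hess = cox_negloglik_terms(beta, x, time, event)
        step = grad / hess
        t = 1.0
        while cox_negloglik_terms(beta - t * step, x, time, event)[0] > loss and t > 1e-8:
            t *= 0.5
        beta -= t * step
    return beta
```

The safeguarded loop illustrates why plain Newton can stall: when the risk-set variance (the Hessian term) shrinks, the raw step `grad / hess` blows up, which is the failure mode the surrogate-based methods above are designed to avoid.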


Instruction Embedding: Latent Representations of Instructions Towards Task Identification

Neural Information Processing Systems

Instruction data is crucial for improving the capability of Large Language Models (LLMs) to align with human-level performance. Recent research such as LIMA demonstrates that alignment is essentially a process in which the model adapts to the interaction style or format of instructions to solve various tasks, leveraging pre-trained knowledge and skills. Therefore, the most important aspect of instruction data is the task it represents, rather than its specific semantics and knowledge. Latent representations of instructions play important roles in instruction-related tasks such as data selection and demonstration retrieval. However, they are typically derived from text embeddings, which encompass overall semantic information that obscures the representation of task categories. In this work, we introduce a new concept, instruction embedding, and construct the Instruction Embedding Benchmark (IEB) for its training and evaluation. We then propose a baseline Prompt-based Instruction Embedding (PIE) method to make the representations focus more on tasks. The evaluation of PIE, alongside other embedding methods, on IEB with two designed tasks demonstrates its superior performance in accurately identifying task categories.


A Derivation of the Score Function Estimator. Given $K$ samples, the objective being maximized is $\mathcal{L}(x) := \mathbb{E}\big[\log \hat{Z}\big]$, where $\hat{Z} := \frac{1}{K}\sum_{k=1}^{K} w_k$.

Neural Information Processing Systems

The term (b) is $(b) = \mathbb{E}\big[\nabla \log \frac{1}{K}\sum_{k=1}^{K} w_k\big]$. The derivation yields a factorized expression of the gradients. We present here a short derivation and direct the reader to [23] for the fine print of the proof. This means that in a Taylor expansion in $\hat{Z}$, higher-order terms will be suppressed. We derive here the asymptotic expectation and variance of the gradient estimator $g$ in the limit $K \to \infty$. In the limit $K \to \infty$, each term of the sum can be expanded with the approximation (29) and simplified. Here we show that their overall contribution is of the same order as the $l = k$ term. These results are not in contradiction because here we are only discussing orders, not the sizes, of the terms.
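For orientation, the standard decomposition behind this derivation can be sketched as follows (this is a reconstruction consistent with the definitions above, not a verbatim reproduction of the paper's equations):

```latex
\nabla_\phi \mathcal{L}(x)
  = \underbrace{\mathbb{E}\Big[\log \hat{Z}\,
      \sum_{k=1}^{K}\nabla_\phi \log q_\phi(z_k \mid x)\Big]}_{(a)}
  + \underbrace{\mathbb{E}\Big[\nabla_\phi \log
      \tfrac{1}{K}\sum_{k=1}^{K} w_k\Big]}_{(b)},
\qquad w_k := \frac{p_\theta(x, z_k)}{q_\phi(z_k \mid x)}.
```

Since $\nabla_\phi w_k = -\,w_k \nabla_\phi \log q_\phi(z_k \mid x)$, term (b) simplifies to $-\,\mathbb{E}\big[\sum_k v_k \nabla_\phi \log q_\phi(z_k \mid x)\big]$ with normalized weights $v_k := w_k / \sum_j w_j$.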


Optimal Variance Control of the Score Function Gradient Estimator for Importance Weighted Bounds. Valentin Liévin and Andrea Dittadi, Section for Cognitive Systems, Technical University of Denmark

Neural Information Processing Systems

This paper introduces novel results for the score function gradient estimator of the importance weighted variational bound (IWAE). We prove that in the limit of large K (the number of importance samples) one can choose the control variate such that the signal-to-noise ratio (SNR) of the estimator grows as √K. This is in contrast to the standard pathwise gradient estimator, where the SNR decreases as 1/√K. Based on our theoretical findings, we develop a novel control variate that extends VIMCO. Empirically, for the training of both continuous and discrete generative models, the proposed method yields superior variance reduction, resulting in an SNR for IWAE that increases with K without relying on the reparameterization trick. The novel estimator is competitive with state-of-the-art reparameterization-free gradient estimators such as Reweighted Wake-Sleep (RWS) and the thermodynamic variational objective (TVO) when training generative models.
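As a point of reference for the control-variate construction, here is a minimal NumPy sketch of the VIMCO-style leave-one-out baseline that the proposed estimator extends (an illustration of the general technique, not the paper's proposed control variate; function names are mine):

```python
import numpy as np

def logsumexp(a):
    # numerically stable log-sum-exp
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def vimco_signals(log_w):
    """Per-sample learning signals with a VIMCO-style leave-one-out
    baseline: the k-th baseline replaces log w_k by the mean of the
    other log-weights (i.e. the log of their geometric mean) before
    re-evaluating the importance weighted bound."""
    K = len(log_w)
    L_hat = logsumexp(log_w) - np.log(K)            # log (1/K) sum_k w_k
    signals = np.empty(K)
    for k in range(K):
        others = np.delete(log_w, k)
        log_w_loo = log_w.copy()
        log_w_loo[k] = others.mean()                # log geometric mean
        signals[k] = L_hat - (logsumexp(log_w_loo) - np.log(K))
    return L_hat, signals
```

The score-function gradient is then formed as a sum of `signals[k] * grad(log q(z_k | x))`; when all weights are equal, every signal is zero, so uninformative samples contribute no gradient noise.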


Learning Diffusion Priors from Observations by Expectation Maximization

Neural Information Processing Systems

Diffusion models recently proved to be remarkable priors for Bayesian inverse problems. However, training these models typically requires access to large amounts of clean data, which can be difficult to obtain in some settings. In this work, we present a novel method based on the expectation-maximization algorithm for training diffusion models from incomplete and noisy observations only. Unlike previous works, our method leads to proper diffusion models, which is crucial for downstream tasks. As part of our method, we propose and motivate an improved posterior sampling scheme for unconditional diffusion models.
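The EM recipe described here, alternating posterior inference over the clean data with refitting the prior, can be illustrated on a far simpler model. The toy below (an analogy of my own, not the paper's diffusion method) runs EM to estimate the mean of a Gaussian prior from noisy observations:

```python
import numpy as np

def em_gaussian_prior(y, sigma2=1.0, tau2=1.0, iters=100):
    """Toy EM: estimate the mean mu of a Gaussian prior x ~ N(mu, tau2)
    from noisy observations y = x + eps, eps ~ N(0, sigma2), with
    sigma2 and tau2 known. Mirrors the structure of the paper's method:
    E-step = posterior sampling/inference, M-step = refit the prior."""
    mu = 0.0
    for _ in range(iters):
        # E-step: posterior mean of each latent x_i given y_i and current mu
        x_post = (tau2 * y + sigma2 * mu) / (tau2 + sigma2)
        # M-step: refit the prior mean to the imputed latents
        mu = x_post.mean()
    return mu
```

Each iteration contracts toward the maximum-likelihood prior mean; in the paper's setting, the E-step is replaced by posterior sampling under the current diffusion prior and the M-step by retraining the denoiser.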


MedJourney: Benchmark and Evaluation of Large Language Models over Patient Clinical Journey

Neural Information Processing Systems

Large language models (LLMs) have demonstrated remarkable capabilities in language understanding and generation, leading to their widespread adoption across various fields. Among these, the medical field is particularly well-suited for LLM applications, as many medical tasks can be enhanced by LLMs. Despite the existence of benchmarks for evaluating LLMs in medical question-answering and exams, there remains a notable gap in assessing LLMs' performance in supporting patients throughout their entire hospital visit journey in real-world clinical practice. In this paper, we address this gap by dividing a typical patient's clinical journey into four stages: planning, access, delivery, and ongoing care. For each stage, we introduce multiple tasks and corresponding datasets, resulting in a comprehensive benchmark comprising 12 datasets, of which five are newly introduced and seven are constructed from existing datasets. This proposed benchmark facilitates a thorough evaluation of LLMs' effectiveness across the entire patient journey, providing insights into their practical application in clinical settings. Additionally, we evaluate three categories of LLMs against this benchmark: 1) proprietary LLM services such as GPT-4; 2) public LLMs such as QWen; and 3) specialized medical LLMs such as HuatuoGPT2. Through this extensive evaluation, we aim to provide a better understanding of LLMs' performance in the medical domain, ultimately contributing to their more effective deployment in healthcare settings.


50d2d2262762648589b1943078712aa6-AuthorFeedback.pdf

Neural Information Processing Systems

We thank our reviewers for taking the time to critique and improve our paper. Reviewer 1 (R1) suggests comparing with related 3D graphics synthesis work, particularly Tian et al. 2019. However, we disagree that our approach is a "small variation" of this work. In particular, Tian et al. take a sophisticated approach that is nonetheless specialized to graphics programs: their goal is to decompose everyday objects into sub-parts and symmetries, and all components are 'unioned'. Instead, we feel the most appropriate related work to experimentally compare with is CSGNet (Sharma et al. 2018). We are able to successfully use a 0/1 reward during training because we bootstrap our policy with imitation learning. Reviewer 2 (R2) suggests trying more sophisticated RL training procedures. Learning the policy π without imitation would be difficult due to the large action space (> 1.3 million). R2 asks several questions about our SMC sampler that will be clarified in the revision.


Self-Supervised Visual Representation Learning from Hierarchical Grouping

Neural Information Processing Systems

We create a framework for bootstrapping visual representation learning from a primitive visual grouping capability. We operationalize grouping via a contour detector that partitions an image into regions, followed by merging of those regions into a tree hierarchy.
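The region-merging step can be sketched with a toy greedy agglomeration (illustrative only, not the paper's contour-driven implementation; the tree encoding and similarity dictionary are my own choices):

```python
from itertools import combinations

def leaves(node):
    """A node is either a leaf region id or a tuple (left, right)."""
    if isinstance(node, tuple):
        return leaves(node[0]) + leaves(node[1])
    return [node]

def build_tree(region_ids, pair_sim):
    """Greedy agglomeration into a binary tree: repeatedly merge the pair
    of clusters whose best leaf-to-leaf similarity is highest (single
    linkage). pair_sim maps frozenset({a, b}) -> similarity for leaf pairs."""
    nodes = list(region_ids)
    while len(nodes) > 1:
        a, b = max(combinations(nodes, 2),
                   key=lambda ab: max(pair_sim[frozenset((u, v))]
                                      for u in leaves(ab[0])
                                      for v in leaves(ab[1])))
        nodes.remove(a)
        nodes.remove(b)
        nodes.append((a, b))
    return nodes[0]
```

In the framework above, the similarity would come from contour strength between adjacent regions, and the resulting tree supplies the hierarchical grouping signal for representation learning.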


c1502ae5a4d514baec129f72948c266e-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for valuable feedback. Before addressing individual comments, we clarify common concerns. "Image-level" vs. "pixel-level" training has no bearing on the validity of our evaluation: any method that uses a CNN learns more than just "image-level" representations. Results are: ours 47.2 vs. MoCo 46.9 mIoU. As suggested by R4, we retrain our model on COCO+VOC with HED edges and achieve 49.9 mIoU in the above-mentioned setting. Our task is to learn pixel-wise semantic-aware embeddings from scratch. We will update the final version to reflect the full 200 training epochs.


NAOMI: Non-Autoregressive Multiresolution Sequence Imputation

Neural Information Processing Systems

Missing value imputation is a fundamental problem in spatiotemporal modeling, from motion tracking to the dynamics of physical systems. Deep autoregressive models suffer from error propagation, which becomes catastrophic when imputing long-range sequences. In this paper, we take a non-autoregressive approach and propose a novel deep generative model, Non-AutOregressive Multiresolution Imputation (NAOMI), to impute long-range sequences given arbitrary missing patterns. NAOMI exploits the multiresolution structure of spatiotemporal data and decodes recursively from coarse to fine-grained resolutions using a divide-and-conquer strategy. We further enhance our model with adversarial training. We evaluate NAOMI extensively on benchmark datasets from systems with both deterministic and stochastic dynamics. In our experiments, NAOMI demonstrates significant improvement in imputation accuracy (reducing average error by 60% compared to autoregressive counterparts) and generalization to long-range sequences.
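The coarse-to-fine, divide-and-conquer decoding pattern can be sketched with a toy recursion (my own illustration, not NAOMI's trained decoder): fill the midpoint between two known values, then recurse into each half.

```python
def impute_multires(seq):
    """Toy multiresolution imputation: seq is a list of floats and Nones,
    with the first and last entries assumed known. The midpoint average
    stands in for NAOMI's learned decoder; the recursion pattern is the
    divide-and-conquer strategy described above."""
    def fill(lo, hi):
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        if seq[mid] is None:
            # stand-in for the learned coarse-resolution decoder
            seq[mid] = 0.5 * (seq[lo] + seq[hi])
        fill(lo, mid)
        fill(mid, hi)
    fill(0, len(seq) - 1)
    return seq
```

Because every imputed value is conditioned on the two nearest known (or already imputed) anchors rather than on a left-to-right chain, errors do not propagate along the sequence, which is the motivation for the non-autoregressive design.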