Reviews: Piecewise Strong Convexity of Neural Networks
Originality: I am not convinced that the contributions of this paper are more significant than those of [1], which has already been cited in this paper. Specifically, in comparison with [1] in Line 82, the authors state that these conclusions apply to a smaller set in weight space. I would appreciate it if the authors could quantify the difference here and add a discussion section that makes the comparison mathematically explicit. Further, there have been quite a few papers that show convergence of GD on neural networks using something like strong convexity. Clarity: The paper is written quite clearly and is easy to follow.
Why are comedians trending toward Catholicism? One quirky comic offers a surprising explanation
Comedian Anthony Rodia discusses the comedy industry and talks about the inspiration behind his jokes on 'One Nation.' Though he may be covered in tattoos from head to toe -- quite literally -- the only thing more obvious than comedian Shayne Smith's body art lately might be his newfound Catholicism. And the former motorcycle gang member is certainly in good company. Jim Gaffigan, Kevin James, Stephen Colbert, Tom Leopold, Russell Brand, and Rob Schneider are just a few other comedians who share in the same faith -- the latter half of the boisterous bunch having converted to Catholicism in their adulthood. The former half has been just as busy keeping Catholicism alive: Gaffigan recently performed at The Sheen Center for Thought & Culture, at which Cardinal Timothy Dolan is a board member; Kevin James reportedly hosted a Catholic retreat before the pandemic; and Stephen Colbert is known for teaching Sunday school.
Reviews: DINGO: Distributed Newton-Type Method for Gradient-Norm Optimization
In this paper, the authors propose a distributed Newton method for gradient-norm optimization. The method does not impose any specific form on the underlying objective function. The authors present convergence analysis for the method and illustrate the performance of the method on a convex problem (in the main paper). Originality: The topic of the paper, in my opinion, is very interesting. The paper presents an efficient Newton method that is motivated via the optimization of the norm of the gradient.
Reviews: Adaptive Density Estimation for Generative Models
Summary: The authors propose a hybrid method that combines VAEs with adversarial training and flow-based models. In particular, they derive an explicit density function p(x) whose likelihood can be evaluated, the corresponding components p(x|z) are more flexible than those of the standard VAE, which uses diagonal Gaussians, and the generated samples have better quality than those of a standard VAE. The basic idea of the proposed model is that the VAE is defined between a latent space and an intermediate representation space, and the representation space is then connected with the data space through an invertible non-linear flow. In general, I think the paper is quite well written, but at the same time I believe that a lot of information is compressed, and the consequence is that in some parts it is not even clear what the authors want to say (see Clarity comments). The proposed idea of the paper seems quite interesting, but at the same time I have some doubts (see Quality comments).
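The construction summarized above can be written out as a short derivation. This is only a sketch under assumed notation, not the paper's own formulation: f denotes the invertible flow from data space to representation space, y = f(x) is the intermediate representation, and q(z|x) is the VAE encoder. None of these symbols appear in the review itself. The second line is the standard change-of-variables formula, and the third applies the usual VAE lower bound to the representation-space density.

```latex
% Assumed notation: f is the invertible flow, y = f(x) the intermediate representation.
\begin{align}
p_Y(y) &= \int p(y \mid z)\, p(z)\, dz, \qquad y = f(x), \\
\log p_X(x) &= \log p_Y\bigl(f(x)\bigr) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|, \\
\log p_X(x) &\ge \mathbb{E}_{q(z \mid x)}\bigl[\log p(f(x) \mid z)\bigr]
  - \mathrm{KL}\bigl(q(z \mid x) \,\|\, p(z)\bigr)
  + \log \left| \det \frac{\partial f(x)}{\partial x} \right|.
\end{align}
```

Under these assumptions, p(y|z) can stay a simple diagonal Gaussian in representation space while the induced density in data space is more flexible, which is consistent with the flexibility claim in the summary.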
Review for NeurIPS paper: Continual Learning in Low-rank Orthogonal Subspaces
Weaknesses: Despite having a novel core idea, I think this paper is not ready for publication and needs substantial improvement before publication: 1. Currently it seems that you need to know T because projection matrices P_t should be constructed before starting continual learning. This is a huge limitation because the very notion of "continual learning" implies that T is not known a priori because the learning agent supposedly is learning over unlimited time periods (i.e., we may even have T\rightarrow\infty). Currently, learning task T+1 is going to invalidate your core idea because building an orthogonal P_{T+1} does not seem to be trivial. In my opinion, this constraint should be removed. But I think this is a highly slippery assumption.
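The concern about fixing T in advance can be illustrated with a small sketch. It assumes, hypothetically, that each task's projection is built from a disjoint block of columns of one random orthonormal basis; the function name `make_projections` and its parameters `dim`, `rank`, and `num_tasks` are illustrative and not taken from the paper. Under that assumption, mutually orthogonal projections exist only while `rank * num_tasks <= dim`, so the number of tasks the space can hold is bounded once the dimension and rank are chosen.

```python
import numpy as np


def make_projections(dim, rank, num_tasks, seed=0):
    """Build num_tasks mutually orthogonal rank-`rank` projections in R^dim.

    Hypothetical construction: split the columns of one random orthonormal
    basis into disjoint blocks, one block per task.
    """
    if rank * num_tasks > dim:
        # No room left for another subspace orthogonal to all existing ones.
        raise ValueError("rank * num_tasks exceeds the ambient dimension")
    rng = np.random.default_rng(seed)
    # QR of a random Gaussian matrix yields an orthonormal basis of R^dim.
    basis, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    projections = []
    for t in range(num_tasks):
        block = basis[:, t * rank:(t + 1) * rank]
        projections.append(block @ block.T)  # P_t = Q_t Q_t^T
    return projections


projections = make_projections(dim=8, rank=2, num_tasks=4)
# Subspaces of different tasks are orthogonal: P_0 P_1 = 0.
cross_term = np.abs(projections[0] @ projections[1]).max()
```

With `dim=8` and `rank=2` the space is exhausted after four tasks, so requesting a fifth raises an error. A method that has to handle an additional task beyond the planned T would need either spare dimensions reserved up front or a relaxation of exact orthogonality.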
Reviews: Infra-slow brain dynamics as a marker for cognitive function and decline
The authors provide a new integrated analysis approach (allowing for simultaneous dimensionality reduction and the possibility of de-noising/artifact correction) to assess slow and infra-slow fluctuations of functional MRI data. They evaluate their approach in a very representative sample and show its potential utility by decoding the task that participants were asked to perform while being scanned, by predicting behavioral scores from the newly derived latent components, and by predicting clinically relevant outcomes in a clinical sample. In the following sections, I provide specific feedback with respect to originality, quality, clarity and significance. I hope you will find my comments helpful and constructive. Originality: To my knowledge the proposed approach is a novel and innovative way of assessing (task-related or task-free) functional connectivity in the brain in a data-driven manner.
How to Mitigate Information Loss in Knowledge Graphs for GraphRAG: Leveraging Triple Context Restoration and Query-Driven Feedback
Huang, Manzong, Bu, Chenyang, He, Yi, Wu, Xindong
Knowledge Graph (KG)-augmented Large Language Models (LLMs) have recently propelled significant advances in complex reasoning tasks, thanks to their broad domain knowledge and contextual awareness. Unfortunately, current methods often assume KGs to be complete, which is impractical given the inherent limitations of KG construction and the potential loss of contextual cues when converting unstructured text into entity-relation triples. In response, this paper proposes the Triple Context Restoration and Query-driven Feedback (TCR-QF) framework, which reconstructs the textual context underlying each triple to mitigate information loss, while dynamically refining the KG structure by iteratively incorporating query-relevant missing knowledge. Experiments on five benchmark question-answering datasets substantiate the effectiveness of TCR-QF in KG and LLM integration, where it achieves a 29.1% improvement in Exact Match and a 15.5% improvement in F1 over its state-of-the-art GraphRAG competitors.
Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection
Yang, Bo, Guo, Jiaxian, Iwasawa, Yusuke, Matsuo, Yutaka
Recent studies have increasingly demonstrated that large language models (LLMs) possess significant theory of mind (ToM) capabilities, showing the potential for simulating the tracking of mental states in generative agents. In this study, we propose a novel paradigm called ToM-agent, designed to empower LLM-based generative agents to simulate ToM in open-domain conversational interactions. ToM-agent disentangles the confidence from mental states, facilitating the emulation of an agent's perception of its counterpart's mental states, such as beliefs, desires, and intentions (BDIs). Using past conversation history and verbal reflections, ToM-agent can dynamically adjust counterparts' inferred BDIs, along with related confidence levels. We further put forth a counterfactual intervention method that reflects on the gap between the predicted responses of counterparts and their real utterances, thereby enhancing the efficiency of reflection. Leveraging empathetic and persuasion dialogue datasets, we assess the advantages of implementing the ToM-agent with downstream tasks, as well as its performance in both first-order and second-order ToM. Our findings indicate that the ToM-agent can grasp the underlying reasons for its counterpart's behaviors beyond mere semantic-emotional supporting or decision-making based on common sense, providing new insights for studying large-scale LLM-based simulation of human social behaviors.
Review for NeurIPS paper: Multi-Fidelity Bayesian Optimization via Deep Neural Networks
Additional Feedback: POST-REBUTTAL: Thank you for addressing some of my concerns. I am still very keen on seeing larger scale experiments, but appreciate the novelty and technical methodology, which will be useful to the community. Overall, my sentiment of the paper has not changed and I am keeping my score at 6 -- I am still in favour of seeing it accepted, although I am not overly enthusiastic due to the concerns mentioned. In any case, I strongly encourage the authors to continue working on what seems to be a very promising research direction, and to take into account all feedback in order to improve their work. Questions: - in the experiments, why did you use different kernels for the different competing methods?
Reviews: Group Retention when Using Machine Learning in Sequential Decision Making: the Interplay between User Dynamics and Fairness
Originality: To the best of my knowledge the model of general user retention dynamics and corresponding statements evidencing negative feedback loops are novel contributions to the literature in sequential fairness works. The contributions of the paper would be clearer if citations were provided for methods and models introduced in earlier works (for example, I suggest adding citations for the fairness criteria in lines 149-158, for user departure models in 197-208, and for the statement in lines 173-174, if applicable). Since the full related work is deferred to the appendix, I see no need to cite [2, 3, 7, 10, 15, 16] without distinction between them. More context on what these works do and how they relate to your work is useful for readers to contextualize your contributions; please expand on the discussion of these papers. Quality: The simple and unifying model of sequential decision making presented is very valuable in my opinion.