Testing Probabilistic Circuits

Neural Information Processing Systems

Probabilistic circuits (PCs) are a powerful modeling framework for representing tractable probability distributions over combinatorial spaces. In machine learning and probabilistic programming, one is often interested in understanding whether the distributions learned using PCs are close to the desired distribution. Thus, given two probabilistic circuits, a fundamental problem of interest is to determine whether their distributions are close to each other. The primary contribution of this paper is a closeness test for PCs with respect to the total variation distance metric.
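The total variation (TV) distance at the heart of such a closeness test can be illustrated directly on a small discrete domain. The sketch below is a brute-force computation over toy fully factorized distributions, not the paper's tester for PCs; the helper names (`tv_distance`, `bernoulli_product`) are illustrative assumptions:

```python
from itertools import product

def tv_distance(p, q, domain):
    """TV(p, q) = 1/2 * sum over the domain of |p(x) - q(x)|."""
    return 0.5 * sum(abs(p(x) - q(x)) for x in domain)

def bernoulli_product(theta):
    """Fully factorized distribution over binary vectors."""
    def pmf(x):
        prob = 1.0
        for xi, ti in zip(x, theta):
            prob *= ti if xi == 1 else 1.0 - ti
        return prob
    return pmf

# Toy distributions over 3 binary variables (2^3 = 8 outcomes).
domain = list(product([0, 1], repeat=3))
p = bernoulli_product([0.5, 0.7, 0.9])
q = bernoulli_product([0.5, 0.7, 0.9])
assert tv_distance(p, q, domain) < 1e-12   # identical distributions

q2 = bernoulli_product([0.4, 0.7, 0.9])    # perturb one parameter
d = tv_distance(p, q2, domain)
print(round(d, 3))                          # → 0.1
```

A closeness tester accepts when the (estimated) distance is below a small threshold and rejects when it exceeds a larger one; the difficulty the paper addresses is doing this without enumerating a combinatorially large domain as this sketch does.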


Learning Tractable Probabilistic Models from Inconsistent Local Estimates

Neural Information Processing Systems

Tractable probabilistic models such as cutset networks which admit exact linear time posterior marginal inference are often preferred in practice over intractable models such as Bayesian and Markov networks. This is because although tractable models, when learned from data, are slightly inferior to the intractable ones in terms of goodness-of-fit measures such as log-likelihood, they do not use approximate inference at prediction time and as a result exhibit superior predictive performance. In this paper, we consider the problem of improving a tractable model using a large number of local probability estimates, each defined over a small subset of variables that are either available from experts or via an external process. Given a model learned from fully-observed, but small amount of possibly noisy data, the key idea in our approach is to update the parameters of the model via a gradient descent procedure that seeks to minimize a convex combination of two quantities: one that enforces closeness via KL divergence to the local estimates and another that enforces closeness to the given model. We show that although the gradients are NP-hard to compute on arbitrary graphical models, they can be efficiently computed over tractable models. We show via experiments that our approach yields tractable models that are significantly superior to the ones learned from small amount of possibly noisy data, even when the local estimates are inconsistent.
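On a single Bernoulli parameter, the convex-combination objective described above can be sketched end to end. This is a minimal one-dimensional illustration under assumed names (`update`, `kl_bernoulli`) and a hand-picked learning rate, not the paper's procedure over cutset networks:

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between two Bernoulli distributions."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def update(theta0, local, alpha=0.5, lr=0.1, steps=2000):
    """Gradient descent on
       J(theta) = alpha * KL(local || theta) + (1 - alpha) * KL(theta0 || theta),
    i.e. a convex combination of closeness to the local estimate and
    closeness to the originally learned parameter theta0."""
    theta = theta0
    for _ in range(steps):
        # Both KL terms are cross-entropies in theta, so their gradients
        # combine into one with an averaged "target" probability.
        target = alpha * local + (1 - alpha) * theta0
        grad = -target / theta + (1 - target) / (1 - theta)
        theta = min(max(theta - lr * grad, 1e-6), 1 - 1e-6)
    return theta

theta = update(theta0=0.3, local=0.8, alpha=0.5)
# In this toy case the optimum is the convex combination
# 0.5 * 0.8 + 0.5 * 0.3 = 0.55.
print(round(theta, 3))  # → 0.55
```

The point of the paper is that for tractable models these gradients remain efficiently computable even when the local estimates are defined over many overlapping subsets of variables, whereas on arbitrary graphical models they are NP-hard.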


A Robot That Listens: Enhancing Self-Disclosure and Engagement Through Sentiment-based Backchannels and Active Listening

Tran, Hieu, Cha, Go-Eum, Jeong, Sooyeon

arXiv.org Artificial Intelligence

As social robots get more deeply integrated into our everyday lives, they will be expected to engage in meaningful conversations and exhibit socio-emotionally intelligent listening behaviors when interacting with people. Active listening and backchanneling could be one way to enhance robots' communicative capabilities and their effectiveness in eliciting deeper self-disclosure, providing a sense of empathy, and forming positive rapport and relationships with people. Thus, we developed an LLM-powered social robot that can exhibit contextually appropriate sentiment-based backchanneling and active listening behaviors (active listening + backchanneling) and compared its efficacy in eliciting people's self-disclosure against a robot that does not exhibit any of these listening behaviors (control) and a robot that only exhibits backchanneling behavior (backchanneling-only). Through our experimental study with sixty-five participants, we found that participants who conversed with the active listening robot perceived the interactions more positively, exhibited the highest self-disclosure, and reported the strongest sense of being listened to. The results of our study suggest that implementing active listening behaviors in social robots has the potential to improve human-robot communication and could further contribute to building deeper human-robot relationships and rapport.


Private Testing of Distributions via Sample Permutations

Maryam Aliakbarpour, Ilias Diakonikolas, Daniel Kane, Ronitt Rubinfeld

Neural Information Processing Systems

Statistical tests are at the heart of many scientific tasks. To validate their hypotheses, researchers in medical and social sciences use individuals' data. The sensitivity of participants' data requires the design of statistical tests that ensure the privacy of the individuals in the most efficient way. In this paper, we use the framework of property testing to design algorithms to test the properties of the distribution that the data is drawn from with respect to differential privacy. In particular, we investigate testing two fundamental properties of distributions: (1) testing the equivalence of two distributions when we have unequal numbers of samples from the two distributions.
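As a point of reference for the non-private setting, the sketch below runs a naive plug-in equivalence check from two samples of unequal size. It is a baseline illustration only (the paper's testers are differentially private and far more sample-efficient), and all names here are assumptions:

```python
import random
from collections import Counter

def plug_in_tv(sample_p, sample_q):
    """Estimate TV distance from empirical frequencies (a naive,
    non-private baseline over a small known support)."""
    cp, cq = Counter(sample_p), Counter(sample_q)
    support = set(cp) | set(cq)
    return 0.5 * sum(abs(cp[x] / len(sample_p) - cq[x] / len(sample_q))
                     for x in support)

random.seed(0)
# Unequal sample sizes, as in the setting the paper studies.
same = [random.randrange(4) for _ in range(5000)]     # uniform over {0..3}
other = [random.randrange(4) for _ in range(2000)]    # same distribution
skew = [0 if random.random() < 0.7 else random.randrange(1, 4)
        for _ in range(2000)]                          # heavily skewed

print(plug_in_tv(same, other) < 0.1)  # near zero for equal distributions
print(plug_in_tv(same, skew) > 0.2)   # clearly separated distributions
```

The plug-in estimate needs sample sizes on the order of the support size and leaks the raw frequencies; the private testers in the paper avoid both problems.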



Method                     MAE(R)   R2(R)   MAE(t)   R2(t)
Random sampling            1.689    0.927   0.011    0.997
Closeness to other points  2.109    0.861   0.013    0.995

Neural Information Processing Systems

We thank the reviewers for taking the time to consider our NeurIPS submission. Table 2 shows that PRNet consistently outperforms PointNetLK in all settings. PRNet is on a par with PointNetLK while being slower than DCP. We will add "Deep Part Induction from Articulated Object Pairs" to the related works and discuss it. We believe these comments will help make the work stronger.



Disentangling Codemixing in Chats: The NUS ABC Codemixed Corpus

Churina, Svetlana, Gupta, Akshat, Mujtahid, Insyirah, Jaidka, Kokil

arXiv.org Artificial Intelligence

Code-mixing involves the seamless integration of linguistic elements from multiple languages within a single discourse, reflecting natural multilingual communication patterns. Despite its prominence in informal interactions such as social media, chat messages and instant-messaging exchanges, there has been a lack of publicly available corpora that are author-labeled and suitable for modeling human conversations and relationships. This study introduces the first labeled and general-purpose corpus for understanding code-mixing in context while maintaining rigorous privacy and ethical standards. Our live project will continuously gather, verify, and integrate code-mixed messages into a structured dataset released in JSON format, accompanied by detailed metadata and linguistic statistics. To date, it includes over 355,641 messages spanning various code-mixing patterns, with a primary focus on English, Mandarin, and other languages. We expect the Codemix Corpus to serve as a foundational dataset for research in computational linguistics, sociolinguistics, and NLP applications. Code and dataset sample can be found here.