 nbc


Direct Alignment with Heterogeneous Preferences

Neural Information Processing Systems

Alignment with human preferences is commonly framed using a universal reward function, even though human preferences are inherently heterogeneous. We formalize this heterogeneity by introducing user types and examine the limits of the homogeneity assumption. We show that aligning to heterogeneous preferences with a single policy is best achieved using the average reward across user types. However, this requires additional information about annotators. We examine improvements under different information settings, focusing on direct alignment methods. We find that minimal information can yield first-order improvements, while full feedback from each user type leads to consistent learning of the optimal policy. Surprisingly, however, no sample-efficient consistent direct loss exists in this latter setting. These results reveal a fundamental tension between consistency and sample efficiency in direct policy alignment.
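As a rough illustration of the aggregation described above, the sketch below averages per-type rewards and then forms a KL-regularized policy from the averaged reward. The function names, the finite response set, the type weights, and the temperature `beta` are all assumptions made for this example; the closed form `pi(y) ∝ pi_ref(y) * exp(r(y) / beta)` is the standard KL-regularized solution and is not claimed to be the paper's exact procedure.

```python
import math


def average_reward(type_weights, type_rewards):
    """Population-weighted mean reward for each candidate response.

    type_weights: share of each (hypothetical) user type, summing to 1.
    type_rewards: one reward list per user type, indexed by response.
    """
    n_responses = len(type_rewards[0])
    return [
        sum(w * rewards[i] for w, rewards in zip(type_weights, type_rewards))
        for i in range(n_responses)
    ]


def kl_regularized_policy(ref_policy, reward, beta):
    """Policy proportional to ref_policy(y) * exp(reward(y) / beta)."""
    unnormalized = [p * math.exp(r / beta) for p, r in zip(ref_policy, reward)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Two user types with opposite preferences over two responses (made-up numbers).
weights = [0.75, 0.25]
rewards = [[1.0, 0.0], [0.0, 1.0]]
mean_reward = average_reward(weights, rewards)
policy = kl_regularized_policy([0.5, 0.5], mean_reward, beta=1.0)
```

Computing `type_weights` requires knowing which type each annotator belongs to, which is the additional annotator information the abstract refers to.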


eccd2a86bae4728b38627162ba297828-Paper.pdf

Neural Information Processing Systems

In contrast, we show that the computation of one PI-explanation for an NBC can be achieved in log-linear time, and that the same result also applies to the more general class of linear classifiers. Furthermore, we show that the enumeration of PI-explanations can be obtained with polynomial delay. Experimental results demonstrate the performance gains of the new algorithms when compared with earlier work.


eccd2a86bae4728b38627162ba297828-AuthorFeedback.pdf

Neural Information Processing Systems

First, LCs and NBCs are extensively used in different settings, with NBCs being deemed by some as one of the top algorithms in data mining. Q3: We will cover the references mentioned by the reviewer. However, that is orthogonal to our work. If one fixes the linear model we will compute rigorous PI-explanation in log-linear time.


Explaining Naive Bayes and Other Linear Classifiers with Polynomial Time and Delay Joao Marques-Silva

Neural Information Processing Systems

In contrast, we show that the computation of one PI-explanation for an NBC can be achieved in log-linear time, and that the same result also applies to the more general class of linear classifiers. Furthermore, we show that the enumeration of PI-explanations can be obtained with polynomial delay.
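A minimal sketch of how a single PI-explanation might be computed for a linear classifier in log-linear time, assuming real-valued features with known lower and upper bounds and a positive prediction whenever the score is strictly greater than zero. The function name `pi_explanation` and its parameters are hypothetical, and this greedy sort-based procedure is an illustration of the idea, not the authors' published algorithm.

```python
def pi_explanation(weights, bias, instance, bounds):
    """Return indices of a subset-minimal set of features which, when fixed
    to their values in `instance`, keep the score positive for every choice
    of the remaining features within `bounds`.

    Assumes the classifier predicts positive iff
    bias + sum(w_i * x_i) > 0, and that the given instance is positive.
    """
    score = bias + sum(w * x for w, x in zip(weights, instance))
    if score <= 0:
        raise ValueError("this sketch only handles positively classified instances")

    worst_score = bias
    deltas = []
    for i, (w, x, (lo, hi)) in enumerate(zip(weights, instance, bounds)):
        worst = min(w * lo, w * hi)          # lowest contribution feature i can make
        worst_score += worst
        deltas.append((w * x - worst, i))    # margin gained by fixing feature i

    # Sorting dominates the cost, giving O(n log n) overall.
    deltas.sort(reverse=True)

    chosen = []
    slack = worst_score
    for delta, i in deltas:
        if slack > 0:                        # prediction already guaranteed
            break
        chosen.append(i)
        slack += delta
    return sorted(chosen)


# Three features in [0, 1]; score = -1 + 3*x0 - 2*x1 + 1*x2.
explanation = pi_explanation([3, -2, 1], -1, [1, 0, 1], [(0, 1)] * 3)
```

Because features are added in decreasing order of margin and the loop stops as soon as the worst-case score turns positive, dropping any chosen feature would let the score fall to zero or below, so the returned set is subset-minimal.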



Direct Alignment with Heterogeneous Preferences

arXiv.org Artificial Intelligence

This tension in assumptions is readily apparent in standard human-AI alignment methods--such as reinforcement learning from human feedback (RLHF) [6, 7, 8] and direct preference optimization (DPO) [9]--which assume a single reward function captures the interests of the entire population. We examine the limits of the preference homogeneity assumption when individuals belong to user types, each characterized by a specific reward function. Recent work has shown that in this setting, the homogeneity assumption can lead to unexpected behavior [10, 11, 12]. One challenge is that, under this assumption, learning from human preferences becomes unrealizable, as a single reward function cannot capture the complexity of population preferences with multiple reward functions [13, 14]. Both RLHF and DPO rely on maximum likelihood estimation (MLE) to optimize the reward or policy. Unrealizability implies their likelihood functions cannot fully represent the underlying preference data distribution, resulting in a nontrivial optimal MLE solution. From another perspective, learning a universal reward or policy from a heterogeneous population inherently involves an aggregation of diverse interests, and this aggregation is nontrivial. In the quest for a single policy that accommodates a heterogeneous population with multiple user types, we show that the only universal reward yielding a well-defined alignment problem is an affine ... (arXiv:2502.16320v1)


From N-grams to Pre-trained Multilingual Models For Language Identification

arXiv.org Artificial Intelligence

In this paper, we investigate the use of N-gram models and Large Pre-trained Multilingual models for Language Identification (LID) across 11 South African languages. For N-gram models, this study shows that effective data size selection remains crucial for establishing effective frequency distributions of the target languages, that efficiently model each language, thus, improving language ranking. For pre-trained multilingual models, we conduct extensive experiments covering a diverse set of massively pre-trained multilingual (PLM) models -- mBERT, RemBERT, XLM-r, and Afri-centric multilingual models -- AfriBERTa, Afro-XLMr, AfroLM, and Serengeti. We further compare these models with available large-scale Language Identification tools: Compact Language Detector v3 (CLD V3), AfroLID, GlotLID, and OpenLID to highlight the importance of focused-based LID. From these, we show that Serengeti is a superior model across models: N-grams to Transformers on average. Moreover, we propose a lightweight BERT-based LID model (za_BERT_lid) trained with NHCLT + Vukzenzele corpus, which performs on par with our best-performing Afri-centric models.


Unifying Qualitative and Quantitative Safety Verification of DNN-Controlled Systems

arXiv.org Artificial Intelligence

The rapid advance of deep reinforcement learning techniques enables the oversight of safety-critical systems through the utilization of Deep Neural Networks (DNNs). This underscores the pressing need to promptly establish certified safety guarantees for such DNN-controlled systems. Most of the existing verification approaches rely on qualitative approaches, predominantly employing reachability analysis. However, qualitative verification proves inadequate for DNN-controlled systems as their behaviors exhibit stochastic tendencies when operating in open and adversarial environments. In this paper, we propose a novel framework for unifying both qualitative and quantitative safety verification problems of DNN-controlled systems. This is achieved by formulating the verification tasks as the synthesis of valid neural barrier certificates (NBCs). Initially, the framework seeks to establish almost-sure safety guarantees through qualitative verification. In cases where qualitative verification fails, our quantitative verification method is invoked, yielding precise lower and upper bounds on probabilistic safety across both infinite and finite time horizons. To facilitate the synthesis of NBCs, we introduce their $k$-inductive variants. We also devise a simulation-guided approach for training NBCs, aiming to achieve tightness in computing precise certified lower and upper bounds. We prototype our approach into a tool called $\textsf{UniQQ}$ and showcase its efficacy on four classic DNN-controlled systems.


Dean Phillips distances himself from campaign operative who reportedly paid $1 for AI-generated Biden deepfake

FOX News

Longshot Democratic presidential candidate Rep. Dean Phillips, D-Minn., is distancing himself from a report that one of his campaign's former consultants hired a magician to create a deepfake of President Biden urging New Hampshire voters not to participate in last month's primary. Paul Carpenter, a magician from New Orleans, came forward and said he had made the deepfake for $1 and that a Democratic consultant, Steve Kramer, had paid him $150 to do it, according to an NBC report. Kramer is a get-out-the-vote specialist who worked on ballot access for the Phillips campaign and also worked on Kanye West's unsuccessful 2020 presidential campaign. "I'm disgusted that a consultant hired to assist my campaign [with] ballot access is alleged to have faked a robocall impersonating Joe Biden," Phillips wrote on X on Friday. "While I don't know the person, such behavior is despicable and I trust will be investigated by authorities. It's also despicable that the Party actively limits access to state ballots and blackballs reputable consultants who would otherwise work with challengers like me. The corruption in politics is pervasive and must be exposed and addressed."


Israeli army appears to change tack on strike that killed Gaza journalists

Al Jazeera

The Israeli military has seemingly walked back its justification for targeting a vehicle in Gaza last week, killing two Al Jazeera journalists, United States broadcaster NBC reported. Hamza Dahdouh, the eldest son of Al Jazeera's Gaza bureau chief Wael Dahdouh, was killed in an Israeli missile strike on Sunday in Khan Younis, southern Gaza. Journalist Mustafa Thuraya was also killed in the attack, while a third passenger, journalist Hazem Rajab, was seriously injured. At the time of the attack, the Israeli army said it was targeting a "terrorist" in the vehicle. It confirmed in a statement that a military aircraft "identified and struck a terrorist who operated an aircraft that posed a threat to (Israeli) troops," adding that "we are aware of the reports that during the strike, two other suspects who were in the same vehicle as the terrorist were also hit".