Collaborating Authors

asymmetry


Lifted Weighted Mini-Bucket

Nicholas Gallo, Alexander T. Ihler

Neural Information Processing Systems

Many applications require computing likelihoods and marginal probabilities over a distribution defined by a graphical model, tasks which are intractable in general [24].



Rot-Pro: Modeling Transitivity by Projection in Knowledge Graph Embedding

Neural Information Processing Systems

In this paper, we first theoretically show that transitive relations can be modeled with projections. We then propose the Rot-Pro model, which combines the projection and relational rotation together. We prove that Rot-Pro can infer all the above relation patterns.
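The link between projections and transitivity rests on idempotence: applying a projection twice gives the same result as applying it once, so chains of the relation collapse. A minimal numpy sketch of this property (the `project` function and the rotation below are illustrative, not the paper's implementation):

```python
import numpy as np

# A relational rotation, as Rot-Pro composes with a projection (angle is arbitrary).
theta = np.pi / 5
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])

def project(x, axis=np.array([1.0, 0.0])):
    """Orthogonal projection onto a line: idempotent, i.e. p(p(x)) == p(x)."""
    a = axis / np.linalg.norm(axis)
    return a * (x @ a)

x = np.array([0.7, -1.3])
once = project(x)
twice = project(project(x))
print(np.allclose(once, twice))   # idempotence holds
print(rot @ once)                 # projection composed with the relational rotation
```

Idempotence is exactly what a transitive relation needs: following the relation from an already-projected point does not move it further along the projection.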



662a2e96162905620397b19c9d249781-Supplemental.pdf

Neural Information Processing Systems

However, its effect on the knowledge graph completion task remains unknown. We further compare the performance of ConE with one that does not use cone-restricted rotation for modeling hierarchical relations, which we name RotC. ConE w/o rotation is the model that applies restricted rotation in the whole embedding space for hierarchical relations. Due to the larger number of dimensions used per subspace, we use an overlapping subspace strategy to assign relation-specific subspaces. One of the main benefits of learning embeddings in hyperbolic space is that it can model hierarchies well even at low embedding dimensionalities.




On scalable oversight with weak LLMs judging strong LLMs

Neural Information Processing Systems

Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AIs compete to convince a judge; consultancy, where a single AI tries to convince a judge that asks questions; and compare to a baseline of direct question-answering, where the judge just answers outright without the AI. We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models. We benchmark on a diverse range of asymmetries between judges and agents, extending previous work on a single extractive QA task with information asymmetry to also include mathematics, coding, logic and multimodal reasoning asymmetries. We find that debate outperforms consultancy across all tasks when the consultant is randomly assigned to argue for the correct/incorrect answer. Comparing debate to direct question answering, the results depend on the type of task: in extractive QA tasks with information asymmetry debate outperforms direct question answering, but in other tasks without information asymmetry the results are mixed. Previous work assigned debaters/consultants an answer to argue for. When we allow them to instead choose which answer to argue for, we find judges are less frequently convinced by the wrong answer in debate than in consultancy. Further, we find that stronger debater models increase judge accuracy, though more modestly than in previous studies.


Tanh Works Better with Asymmetry

Neural Information Processing Systems

Batch Normalization is commonly located in front of activation functions, as proposed by the original paper. Swapping the order, i.e., using Batch Normalization after activation functions, has also been attempted, but its performance is generally not much different from the conventional order when ReLU or a similar activation function is used. However, in the case of bounded activation functions like Tanh, we discovered that the swapped order achieves considerably better performance than the conventional order on various benchmarks and architectures. This paper reports this remarkable phenomenon and closely examines what contributes to this performance improvement. By looking at the output distributions of individual activation functions, not the whole layers, we found that many of them are asymmetrically saturated.
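The two orderings compared above can be sketched in a few lines of numpy on a single batch. The `batch_norm` helper is a hypothetical stand-in for a BatchNorm layer (mean/variance of the current batch, no learned affine parameters), not the paper's code; the positively shifted input imitates a pre-activation whose Tanh output saturates asymmetrically:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)   # asymmetric pre-activation

def batch_norm(v, eps=1e-5):
    """Per-batch standardization; stand-in for a BatchNorm layer."""
    return (v - v.mean()) / np.sqrt(v.var() + eps)

conv = np.tanh(batch_norm(x))   # conventional order: BN -> Tanh
swap = batch_norm(np.tanh(x))   # swapped order:      Tanh -> BN

# Without normalization first, Tanh saturates mostly on one side:
print("mean of tanh(x):       ", np.tanh(x).mean())   # far from 0
print("conventional output mean:", conv.mean())
print("swapped output mean:   ", swap.mean())          # ~0 by construction
```

The point of the sketch is the diagnostic in the paper's last sentence: the raw `tanh(x)` outputs are asymmetrically saturated, while the swapped order re-centers them before the next layer.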


Causal vs. Anticausal merging of predictors

Neural Information Processing Systems

We study the differences arising from merging predictors in the causal and anticausal directions using the same data. In particular, we study the asymmetries that arise in a simple model where we merge the predictors using one binary variable as target and two continuous variables as predictors. We use Causal Maximum Entropy (CMAXENT) as the inductive bias to merge the predictors; however, we expect similar differences to hold also when we use other merging methods that take into account asymmetries between cause and effect. We show that if we observe all bivariate distributions, the CMAXENT solution reduces to a logistic regression in the causal direction and Linear Discriminant Analysis (LDA) in the anticausal direction. Furthermore, we study how the decision boundaries of these two solutions differ whenever we observe only some of the bivariate distributions, with implications for Out-Of-Variable (OOV) generalisation.
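The two reductions named above can be illustrated by fitting both linear rules to the same toy data (one binary target, two continuous predictors, as in the abstract's simple model). The closed-form LDA weights and the hand-rolled gradient-descent logistic fit below are a sketch for comparison only, not the paper's CMAXENT procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
y = rng.integers(0, 2, n)                       # binary target
mu = np.array([[0.0, 0.0], [1.5, 1.0]])
X = mu[y] + rng.normal(size=(n, 2))             # two continuous predictors

# Anticausal-style rule: LDA in closed form (shared covariance assumed).
m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
Xc = np.vstack([X[y == 0] - m0, X[y == 1] - m1])
Sigma = Xc.T @ Xc / (n - 2)                     # pooled covariance
w_lda = np.linalg.solve(Sigma, m1 - m0)         # LDA boundary direction

# Causal-style rule: logistic regression by plain gradient descent.
Xb = np.hstack([X, np.ones((n, 1))])            # add intercept column
w = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-Xb @ w))
    w -= 0.1 * Xb.T @ (p - y) / n               # gradient of the log-loss
w_logit = w[:2]

# Both boundaries are linear; on Gaussian classes with shared covariance
# the two directions nearly coincide, and they diverge as that assumption breaks.
print("LDA direction:     ", w_lda / np.linalg.norm(w_lda))
print("logistic direction:", w_logit / np.linalg.norm(w_logit))
```

On this well-specified Gaussian data the two directions agree closely; the paper's interest is precisely in the settings (partial bivariate observations) where the causal and anticausal solutions come apart.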