Supervised Learning
Distinguishability Calibration to In-Context Learning
Li, Hongjing, Yan, Hanqi, Li, Yanran, Qian, Li, He, Yulan, Gui, Lin
Recent years have witnessed increasing interest in prompt-based learning, in which models can be trained on only a few annotated instances, making them suitable for low-resource settings. When using prompt-based learning for text classification, the goal is to use a pre-trained language model (PLM) to predict a missing token in a pre-defined template given an input text, which can be mapped to a class label. However, PLMs built on the transformer architecture tend to generate similar output embeddings, making it difficult to discriminate between different class labels. The problem is further exacerbated when dealing with classification tasks involving many fine-grained class labels. In this work, we alleviate this information diffusion issue, i.e., different tokens share a large proportion of similar information after going through multiple stacked self-attention layers in a transformer, by proposing a calibration method built on feature transformations through rotation and scaling to map a PLM-encoded embedding into a new metric space to guarantee the distinguishability of the resulting embeddings. Furthermore, we take advantage of hyperbolic embeddings to capture the hierarchical relations among fine-grained class-associated token embeddings through a coarse-to-fine metric learning strategy to enhance the distinguishability of the learned output embeddings. Extensive experiments on three datasets under various settings demonstrate the effectiveness of our approach. Our code can be found at https://github.com/donttal/TARA.
Metric Space Magnitude and Generalisation in Neural Networks
Andreeva, Rayna, Limbeck, Katharina, Rieck, Bastian, Sarkar, Rik
Deep learning models have seen significant successes in numerous applications, but their inner workings remain elusive. The purpose of this work is to quantify the learning process of deep neural networks through the lens of a novel topological invariant called magnitude. Magnitude is an isometry invariant; its properties are an active area of research as it encodes many known invariants of a metric space. We use magnitude to study the internal representations of neural networks and propose a new method for determining their generalisation capabilities. Moreover, we theoretically connect magnitude dimension and the generalisation error, and demonstrate experimentally that the proposed framework can be a good indicator of the latter.
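As an illustration of the invariant mentioned above, the following is a minimal sketch of the standard definition of magnitude for a finite metric space: form the similarity matrix $Z_{ij} = e^{-t\,d(x_i,x_j)}$, solve $Zw = \mathbf{1}$, and sum the weights $w$. The function name `magnitude`, the scale parameter `t`, and the pure-Python solver are assumptions made for this example; it is not the authors' implementation and does not cover magnitude dimension.

```python
import math

def magnitude(dist, t=1.0):
    """Magnitude of a finite metric space given its distance matrix `dist`.

    Solves Z w = 1 with Z[i][j] = exp(-t * dist[i][j]) by Gauss-Jordan
    elimination and returns the sum of the weights w.
    """
    n = len(dist)
    # Augmented matrix [Z | 1].
    a = [[math.exp(-t * dist[i][j]) for j in range(n)] + [1.0] for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("similarity matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= f * a[col][c]
    # After elimination the matrix is diagonal, so w_i = a[i][n] / a[i][i].
    return sum(a[i][n] / a[i][i] for i in range(n))

# Two points at distance 1: the closed form is 2 / (1 + e^{-1}).
two_point = magnitude([[0.0, 1.0], [1.0, 0.0]])
```

For a two-point space the result lies between 1 (points very close) and 2 (points far apart), which matches the intuition of magnitude as an "effective number of points".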
Spectral clustering in the Gaussian mixture block model
Li, Shuangping, Schramm, Tselil
Gaussian mixture block models are distributions over graphs that strive to model modern networks: to generate a graph from such a model, we associate each vertex $i$ with a latent feature vector $u_i \in \mathbb{R}^d$ sampled from a mixture of Gaussians, and we add edge $(i,j)$ if and only if the feature vectors are sufficiently similar, in that $\langle u_i,u_j \rangle \ge \tau$ for a pre-specified threshold $\tau$. The different components of the Gaussian mixture represent the fact that there may be different types of nodes with different distributions over features -- for example, in a social network each component represents the different attributes of a distinct community. Natural algorithmic tasks associated with these networks are embedding (recovering the latent feature vectors) and clustering (grouping nodes by their mixture component). In this paper we initiate the study of clustering and embedding graphs sampled from high-dimensional Gaussian mixture block models, where the dimension of the latent feature vectors $d\to \infty$ as the size of the network $n \to \infty$. This high-dimensional setting is most appropriate in the context of modern networks, in which we think of the latent feature space as being high-dimensional. We analyze the performance of canonical spectral clustering and embedding algorithms for such graphs in the case of 2-component spherical Gaussian mixtures, and begin to sketch out the information-computation landscape for clustering and embedding in these models.
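The generative process described above can be sketched in a few lines. This is a minimal illustration, assuming a 2-component mixture with means $\pm\mu e_1$ and identity covariance; the function name `sample_gmbm`, the parameter `mu`, and this particular parameterization are assumptions for the example and may differ from the scaling used in the paper.

```python
import random

def sample_gmbm(n, d, mu, tau, seed=0):
    """Sample a graph from a 2-component spherical Gaussian mixture block model.

    Each vertex i gets a latent vector u_i drawn from N(+mu*e_1, I) or
    N(-mu*e_1, I) with equal probability (assumed parameterization).
    Edge (i, j) is added iff <u_i, u_j> >= tau.
    """
    rng = random.Random(seed)
    labels, vectors = [], []
    for _ in range(n):
        sign = rng.choice((-1, 1))  # mixture component of this vertex
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        u[0] += sign * mu           # shift the mean along the first coordinate
        labels.append(sign)
        vectors.append(u)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            if dot >= tau:
                edges.add((i, j))
    return labels, vectors, edges

labels, vectors, edges = sample_gmbm(n=30, d=5, mu=2.0, tau=1.0)
```

Clustering then amounts to recovering `labels` from `edges` alone, and embedding to recovering `vectors` (up to symmetries such as rotation).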
Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space
Klubička, Filip, Nedumpozhimana, Vasudevan, Kelleher, John D.
The goal of this paper is to learn more about how idiomatic information is structurally encoded in embeddings, using a structural probing method. We repurpose an existing English verbal multi-word expression (MWE) dataset to suit the probing framework and perform a comparative probing study of static (GloVe) and contextual (BERT) embeddings. Our experiments indicate that both encode some idiomatic information to varying degrees, but yield conflicting evidence as to whether idiomaticity is encoded in the vector norm, leaving this an open question. We also identify some limitations of the dataset used and highlight important directions for future work in improving its suitability for a probing analysis.
Virus2Vec: Viral Sequence Classification Using Machine Learning
Ali, Sarwan, Bello, Babatunde, Chourasia, Prakash, Punathil, Ria Thazhe, Chen, Pin-Yu, Khan, Imdad Ullah, Patterson, Murray
Understanding the host-specificity of different families of viruses sheds light on the origin of, e.g., SARS-CoV-2, rabies, and other such zoonotic pathogens in humans. It enables epidemiologists, medical professionals, and policymakers to curb existing epidemics and prevent future ones promptly. In the family Coronaviridae (of which SARS-CoV-2 is a member), it is well-known that the spike protein is the point of contact between the virus and the host cell membrane. On the other hand, the two traditional mammalian orders, Carnivora (carnivores) and Chiroptera (bats), are recognized to be responsible for maintaining and spreading the Rabies Lyssavirus (RABV). We propose Virus2Vec, a feature-vector representation for viral (nucleotide or amino acid) sequences that enables vector-space-based machine learning models to identify viral hosts. Virus2Vec generates numerical feature vectors for unaligned sequences, allowing us to forgo the computationally expensive sequence alignment step in the pipeline. Virus2Vec leverages the power of both the \emph{minimizer} and position weight matrix (PWM) to generate compact feature vectors. Using several classifiers, we empirically evaluate Virus2Vec on real-world spike sequences of Coronaviridae and rabies virus sequence data to predict the host (identifying the reservoirs of infection). Our results demonstrate that Virus2Vec outperforms the predictive accuracies of baseline and state-of-the-art methods.
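To make the minimizer idea concrete, here is a minimal sketch under one common definition: the minimizer of a k-mer is its lexicographically smallest m-mer (m ≤ k). The function name `minimizers` and the forward-only convention are assumptions for this example; the abstract does not specify the exact variant Virus2Vec uses (some definitions also consider the reversed k-mer), and the PWM step is not shown.

```python
def minimizers(seq, k, m):
    """Return the minimizer of every k-mer window in `seq`.

    The minimizer of a k-mer is taken here to be its lexicographically
    smallest substring of length m (forward direction only; an assumption).
    """
    if not 0 < m <= k <= len(seq):
        raise ValueError("require 0 < m <= k <= len(seq)")
    result = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Slide a window of length m across the k-mer and keep the smallest.
        result.append(min(kmer[j:j + m] for j in range(k - m + 1)))
    return result

example = minimizers("ACGTAC", k=4, m=2)  # windows: ACGT, CGTA, GTAC
```

Because neighbouring k-mers often share the same minimizer, counting minimizers instead of raw k-mers tends to give a more compact representation without requiring sequence alignment.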
Equilibrium-Invariant Embedding, Metric Space, and Fundamental Set of $2\times2$ Normal-Form Games
Marris, Luke, Gemp, Ian, Piliouras, Georgios
Equilibrium solution concepts of normal-form games, such as Nash equilibria, correlated equilibria, and coarse correlated equilibria, describe the joint strategy profiles from which no player has incentive to unilaterally deviate. They are widely studied in game theory, economics, and multiagent systems. Equilibrium concepts are invariant under certain transforms of the payoffs. We define an equilibrium-inspired distance metric for the space of all normal-form games and uncover a distance-preserving equilibrium-invariant embedding. Furthermore, we propose an additional transform which defines a better-response-invariant distance metric and embedding. To demonstrate these metric spaces we study $2\times2$ games. The equilibrium-invariant embedding of $2\times2$ games has an efficient two-variable parameterization (a reduction from eight), where each variable geometrically describes an angle on a unit circle. Interesting properties can be spatially inferred from the embedding, including: equilibrium support, cycles, competition, coordination, distances, best-responses, and symmetries. The best-response-invariant embedding of $2\times2$ games, after considering symmetries, rediscovers a set of 15 games, and their respective equivalence classes. We propose that this set of game classes is fundamental and captures all possible interesting strategic interactions in $2\times2$ games. We introduce a directed graph representation and name for each class. Finally, we leverage the tools developed for $2\times2$ games to develop game-theoretic visualizations of large normal-form and extensive-form games that aim to fingerprint the strategic interactions that occur within.
Human-machine cooperation for semantic feature listing
Mukherjee, Kushin, Suresh, Siddharth, Rogers, Timothy T.
A central goal in cognitive science is to characterize human knowledge of concepts and their properties. Many have used human-generated feature lists as norms for establishing the structural relationship between concepts in the human mind (McRae et al., 2005; Devereux et al., 2014; De Deyne et al., 2008; Buchanan et al., 2019), but this requires extensive human labor. Large language models (LLMs) have recently shown impressive capabilities when generating properties of objects (Hansen & Hebart, 2022) or answering questions (Ouyang et al., 2022; Brown et al., 2020; Hoffmann et al., 2022; Chowdhery et al., 2022; Wei et al., 2021) and thus suggest an avenue for more efficient characterization of human knowledge structures, but even state-of-the-art models can routinely fail on many common-sense questions of fact. GPT3-davinci, for instance, will deny that alligators are green, while asserting that they can be used to suck dust up from surfaces. Thus, human effort can generate high-quality norms, but with prohibitive costs, while LLMs can produce norms with little human effort, but with considerably less accuracy. This paper considers whether human and machine effort can combine to efficiently estimate high-quality semantic feature vectors.
Incorporating Structured Sentences with Time-enhanced BERT for Fully-inductive Temporal Relation Prediction
Chen, Zhongwu, Xu, Chengjin, Su, Fenglong, Huang, Zhen, Dou, Yong
Temporal relation prediction in incomplete temporal knowledge graphs (TKGs) is a popular temporal knowledge graph completion (TKGC) problem in both transductive and inductive settings. Traditional embedding-based TKGC models (TKGE) rely on structured connections and can only handle a fixed set of entities, i.e., the transductive setting. In the inductive setting where test TKGs contain emerging entities, the latest methods are based on symbolic rules or pre-trained language models (PLMs). However, they suffer from being inflexible and not time-specific, respectively. In this work, we extend the fully-inductive setting, where entities in the training and test sets are totally disjoint, into TKGs and take a further step towards a more flexible and time-sensitive temporal relation prediction approach SST-BERT, incorporating Structured Sentences with Time-enhanced BERT. Our model can obtain the entity history and implicitly learn rules in the semantic space by encoding structured sentences, solving the problem of inflexibility. We propose to use a time masking MLM task to pre-train BERT in a corpus rich in temporal tokens specially generated for TKGs, enhancing the time sensitivity of SST-BERT. To compute the probability of occurrence of a target quadruple, we aggregate all its structured sentences from both temporal and semantic perspectives into a score. Experiments on the transductive datasets and newly generated fully-inductive benchmarks show that SST-BERT successfully improves over state-of-the-art baselines.
Extensions of a Theory of Networks for Approximation and Learning: Outliers and Negative Examples
Learning an input-output mapping from a set of examples can be regarded as synthesizing an approximation of a multi-dimensional function. From this point of view, this form of learning is closely related to regularization theory, and we have previously shown (Poggio and Girosi, 1990a, 1990b) the equivalence between regularization and a class of three-layer networks that we call regularization networks.
Induction of Finite-State Automata Using Second-Order Recurrent Networks
Second-order recurrent networks that recognize simple finite state languages over {0,1}* are induced from positive and negative examples. Using the complete gradient of the recurrent network and sufficient training examples to constrain the definition of the language to be induced, solutions are obtained that correctly recognize strings of arbitrary length. A method for extracting a finite state automaton corresponding to an optimized network is demonstrated.