SSP: Self-Supervised Prompting for Cross-Lingual Transfer to Low-Resource Languages using Large Language Models
Rathore, Vipul, Deb, Aniruddha, Chandresh, Ankish, Singla, Parag, Mausam
Recently, very large language models (LLMs) have shown exceptional performance on several English NLP tasks with just in-context learning (ICL), but their utility in other languages is still underexplored. We investigate their effectiveness for NLP tasks in low-resource languages (LRLs), especially in the setting of zero-labelled cross-lingual transfer (0-CLT), where no labelled training data for the target language is available; however, training data from one or more related medium-resource languages (MRLs) is utilized, alongside the available unlabelled test data for the target language. We introduce Self-Supervised Prompting (SSP), a novel ICL approach tailored for the 0-CLT setting. SSP is based on the key observation that LLMs output more accurate labels if in-context exemplars are from the target language (even if their labels are slightly noisy). To operationalize this, since target-language training data is unavailable in 0-CLT, SSP operates in two stages. In Stage I, the target language's test data is noisily labelled using source MRL training data. In Stage II, these noisy test data points are used as ICL exemplars for further improved labelling. Additionally, our implementation of SSP uses a novel Integer Linear Programming (ILP)-based exemplar selection that balances similarity, prediction confidence (when available), and label coverage. Experiments on three tasks and eleven LRLs (from three regions) demonstrate that SSP strongly outperforms existing SOTA fine-tuned and prompting-based baselines in the 0-CLT setup.
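The ILP-based exemplar selection lends itself to a compact sketch. The following is a minimal, hypothetical formulation using the PuLP solver; the weight alpha, the linear combination of similarity and confidence, and the form of the coverage constraint are illustrative assumptions, not the paper's exact ILP.

```python
# Minimal sketch of ILP-based exemplar selection (assumed formulation,
# not the paper's exact objective): pick k exemplars that maximize a
# weighted sum of similarity to the test instance and prediction
# confidence, while requiring every predicted label to be covered.
from pulp import LpProblem, LpMaximize, LpVariable, LpBinary, lpSum

def select_exemplars(similarity, confidence, labels, k, alpha=0.5):
    """similarity, confidence: per-candidate scores in [0, 1];
    labels: noisy Stage-I label per candidate; k: prompt budget."""
    n = len(labels)
    prob = LpProblem("exemplar_selection", LpMaximize)
    x = [LpVariable(f"x_{i}", cat=LpBinary) for i in range(n)]

    # Objective: balance similarity and confidence (alpha is illustrative).
    prob += lpSum((alpha * similarity[i] + (1 - alpha) * confidence[i]) * x[i]
                  for i in range(n))

    # Exactly k exemplars fit in the prompt.
    prob += lpSum(x) == k

    # Label coverage: each predicted label appears at least once
    # (only enforceable when the label set is no larger than k).
    for lab in set(labels):
        prob += lpSum(x[i] for i in range(n) if labels[i] == lab) >= 1

    prob.solve()
    return [i for i in range(n) if x[i].value() == 1]
```

In Stage II, the selected indices would point into the noisily labelled target-language test set to assemble the in-context prompt.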
Critical Phase Transition in a Large Language Model
Nakaishi, Kai, Nishikawa, Yoshihiko, Hukushima, Koji
The performance of large language models (LLMs) strongly depends on the temperature parameter. Empirically, at very low temperatures, LLMs generate sentences with clear repetitive structures, while at very high temperatures, generated sentences are often incomprehensible. In this study, using GPT-2, we numerically demonstrate that the difference between the two regimes is not just a smooth change but a phase transition with singular, divergent statistical quantities. Our extensive analysis shows that critical behaviors, such as a power-law decay of correlations in a text, emerge in the LLM at the transition temperature, as well as in a natural language dataset. We also argue that several statistical quantities characterizing this criticality should be useful for evaluating the performance of LLMs.
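As a rough illustration of the kind of measurement the study describes, one can sample text from GPT-2 across temperatures and estimate an autocorrelation over the token sequence. This is a sketch under stated assumptions: the prompt, the sampling settings, and in particular the scalar observable (a crude token-repetition indicator) are invented for illustration and are not the paper's analysis.

```python
# Rough sketch: sample GPT-2 text at several temperatures and estimate
# the autocorrelation of a scalar observable over token distance d.
# The observable below is an assumption made for illustration.
import numpy as np
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sample_tokens(temperature, length=512):
    """Sample a token sequence from GPT-2 at the given temperature."""
    ids = tok("The", return_tensors="pt").input_ids
    out = model.generate(ids, do_sample=True, temperature=temperature,
                         max_new_tokens=length, top_k=0)
    return out[0].tolist()

def correlation(seq, max_lag=64):
    """Autocorrelation over token distance d of a crude observable s_i:
    1 if token i repeats one of the previous 8 tokens, else 0."""
    s = np.array([1.0 if t in seq[max(0, i - 8):i] else 0.0
                  for i, t in enumerate(seq)])
    s = s - s.mean()
    return [float(np.mean(s[:-d] * s[d:])) for d in range(1, max_lag)]

for T in (0.5, 1.0, 2.0):
    c = correlation(sample_tokens(T))
    print(T, c[:5])  # slower decay is expected near a critical temperature
```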
An Improved Baseline for Sentence-level Relation Extraction
Sentence-level relation extraction (RE) aims to identify the relationship between two entities in a sentence. Many efforts have been devoted to this problem, yet the best-performing methods are still far from perfect. In this paper, we revisit two problems that affect the performance of existing RE models, namely entity representation and noisy or ill-defined labels. Our improved RE baseline, which incorporates entity representations with typed markers, achieves an F1 of 74.6% on TACRED, significantly outperforming previous SOTA methods. Furthermore, the new baseline achieves an F1 of 91.1% on the refined Re-TACRED dataset, demonstrating that pretrained language models (PLMs) achieve high performance on this task. We release our code to the community for future research.
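The typed-marker idea is easy to sketch: entity spans are wrapped with marker tokens that also spell out the entity types before the sentence is fed to a pretrained LM. The specific marker symbols below follow one common punctuation-style convention and should be read as an assumption, not necessarily the paper's exact format.

```python
# Illustrative sketch of "typed entity markers": wrap the subject and
# object spans with marker tokens that also encode the entity types,
# then feed the marked sentence to a pretrained LM. The marker symbols
# are an assumed punctuation-style convention.
def add_typed_markers(tokens, subj_span, subj_type, obj_span, obj_type):
    """tokens: list of words; spans: (start, end) inclusive indices."""
    out = []
    for i, w in enumerate(tokens):
        if i == subj_span[0]:
            out += ["@", "*", subj_type.lower(), "*"]
        if i == obj_span[0]:
            out += ["#", "^", obj_type.lower(), "^"]
        out.append(w)
        if i == subj_span[1]:
            out.append("@")
        if i == obj_span[1]:
            out.append("#")
    return " ".join(out)

print(add_typed_markers(
    ["Bill", "Gates", "founded", "Microsoft", "."],
    (0, 1), "PERSON", (3, 3), "ORGANIZATION"))
# @ * person * Bill Gates @ founded # ^ organization ^ Microsoft # .
```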
PDT Logic: A Probabilistic Doxastic Temporal Logic for Reasoning about Beliefs in Multi-agent Systems
Martiny, Karsten, Möller, Ralf
We present Probabilistic Doxastic Temporal (PDT) Logic, a formalism to represent and reason about probabilistic beliefs and their temporal evolution in multi-agent systems. This formalism enables the quantification of agents' beliefs through probability intervals and incorporates an explicit notion of time. We discuss how, over time, agents dynamically change their beliefs in facts, temporal rules, and other agents' beliefs with respect to any new information they receive. We introduce an appropriate formal semantics for PDT Logic and show that it is decidable. Alternative options for specifying problems in PDT Logic are possible. For these problem specifications, we develop different satisfiability-checking algorithms and provide complexity results for the respective decision problems. The use of probability intervals enables a formal representation of probabilistic knowledge without enforcing (possibly incorrect) exact probability values. By incorporating an explicit notion of time, PDT Logic provides enriched possibilities to represent and reason about temporal relations.
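As a toy illustration of beliefs quantified by probability intervals, the sketch below represents a belief as [lo, hi] bounds and narrows it by intersection when new information arrives. This is a deliberately simplified picture, not PDT Logic's formal semantics, which also covers temporal rules and beliefs about other agents.

```python
# Toy illustration of probability-interval beliefs as used in interval-
# based doxastic logics: a belief assigns [lo, hi] bounds to a fact, and
# new information narrows the interval by intersection. A simplified
# sketch, not PDT Logic's formal semantics.
from dataclasses import dataclass

@dataclass
class IntervalBelief:
    lo: float  # lower probability bound
    hi: float  # upper probability bound

    def __post_init__(self):
        assert 0.0 <= self.lo <= self.hi <= 1.0, "invalid interval"

    def update(self, other: "IntervalBelief") -> "IntervalBelief":
        """Combine with new evidence by interval intersection; an empty
        intersection means the reports are inconsistent."""
        lo, hi = max(self.lo, other.lo), min(self.hi, other.hi)
        if lo > hi:
            raise ValueError("inconsistent belief reports")
        return IntervalBelief(lo, hi)

# An agent's belief in a fact at time t, refined by a later report.
b_t0 = IntervalBelief(0.3, 0.9)
b_t1 = b_t0.update(IntervalBelief(0.5, 1.0))
print(b_t1)  # IntervalBelief(lo=0.5, hi=0.9)
```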