ja 0
Supplement: Scalable and Stable Surrogates for Flexible Classifiers with Fairness Constraints
All relaxations are optimized via our Lagrangian framework. All code was implemented using PyTorch, and optimized using L-BFGS. On the right, the difference framework is used to achieve equality of opportunity on COMP AS. We set the initial learning rate 0.1, which was Here we define equality of opportunity on false negative rates, i.e. predicting that someone Setting s = b, however, causes the linear relaxation to degenerate. For our deep learning experiments, we used the approach of Sec.
Do LLMs Need to Think in One Language? Correlation between Latent Language and Task Performance
Ozaki, Shintaro, Hiraoka, Tatsuya, Otake, Hiroto, Ouchi, Hiroki, Isonuma, Masaru, Heinzerling, Benjamin, Inui, Kentaro, Watanabe, Taro, Miyao, Yusuke, Oseki, Yohei, Takagi, Yu
Large Language Models (LLMs) are known to process information using a proficient internal language consistently, referred to as latent language, which may differ from the input or output languages. However, how the discrepancy between the latent language and the input and output language affects downstream task performance remains largely unexplored. While many studies research the latent language of LLMs, few address its importance in influencing task performance. In our study, we hypothesize that thinking in latent language consistently enhances downstream task performance. To validate this, our work varies the input prompt languages across multiple downstream tasks and analyzes the correlation between consistency in latent language and task performance. We create datasets consisting of questions from diverse domains such as translation and geo-culture, which are influenced by the choice of latent language. Experimental results across multiple LLMs on translation and geo-culture tasks, which are sensitive to the choice of language, indicate that maintaining consistency in latent language is not always necessary for optimal downstream task performance. This is because these models adapt their internal representations near the final layers to match the target language, reducing the impact of consistency on overall performance.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons
Kojima, Takeshi, Okimura, Itsuki, Iwasawa, Yusuke, Yanaka, Hitomi, Matsuo, Yutaka
Current decoder-based pre-trained language models (PLMs) successfully demonstrate multilingual capabilities. However, it is unclear how these models handle multilingualism. We analyze the neuron-level internal behavior of multilingual decoder-based PLMs, Specifically examining the existence of neurons that fire ``uniquely for each language'' within decoder-only multilingual PLMs. We analyze six languages: English, German, French, Spanish, Chinese, and Japanese, and show that language-specific neurons are unique, with a slight overlap (< 5%) between languages. These neurons are mainly distributed in the models' first and last few layers. This trend remains consistent across languages and models. Additionally, we tamper with less than 1% of the total neurons in each model during inference and demonstrate that tampering with a few language-specific neurons drastically changes the probability of target language occurrence in text generation.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- (7 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
When Is It Acceptable to Break the Rules? Knowledge Representation of Moral Judgement Based on Empirical Data
Awad, Edmond, Levine, Sydney, Loreggia, Andrea, Mattei, Nicholas, Rahwan, Iyad, Rossi, Francesca, Talamadupula, Kartik, Tenenbaum, Joshua, Kleiman-Weiner, Max
One of the most remarkable things about the human moral mind is its flexibility. We can make moral judgments about cases we have never seen before. We can decide that pre-established rules should be broken. We can invent novel rules on the fly. Capturing this flexibility is one of the central challenges in developing AI systems that can interpret and produce human-like moral judgment. This paper details the results of a study of real-world decision makers who judge whether it is acceptable to break a well-established norm: ``no cutting in line.'' We gather data on how human participants judge the acceptability of line-cutting in a range of scenarios. Then, in order to effectively embed these reasoning capabilities into a machine, we propose a method for modeling them using a preference-based structure, which captures a novel modification to standard ``dual process'' theories of moral judgment.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine (0.46)
- Consumer Products & Services (0.46)
Language Transfer for Early Warning of Epidemics from Social Media
Appelgren, Mattias, Schrempf, Patrick, Falis, Matúš, Ikeda, Satoshi, O'Neil, Alison Q
Statements on social media can be analysed to identify individuals who are experiencing red flag medical symptoms, allowing early detection of the spread of disease such as influenza. Since disease does not respect cultural borders and may spread between populations speaking different languages, we would like to build multilingual models. However, the data required to train models for every language may be difficult, expensive and time-consuming to obtain, particularly for low-resource languages. Taking Japanese as our target language, we explore methods by which data in one language might be used to build models for a different language. We evaluate strategies of training on machine translated data and of zero-shot transfer through the use of multilingual models. We find that the choice of source language impacts the performance, with Chinese-Japanese being a better language pair than English-Japanese. Training on machine translated data shows promise, especially when used in conjunction with a small amount of target language data.
- North America > United States (0.04)
- North America > Canada (0.04)
- Europe (0.04)