different country
Rethinking Causal Discovery Through the Lens of Exchangeability
Brogueira, Tiago, Figueiredo, Mário
Causal discovery methods have traditionally been developed under two distinct regimes: independent and identically distributed (i.i.d.) data and time-series data, each governed by separate modelling assumptions. In this paper, we argue that the i.i.d. setting can and should be reframed in terms of exchangeability, a strictly more general symmetry principle. We present the implications of this reframing, alongside two core arguments: (1) a conceptual argument, which extends the dependence of experimental causal inference on exchangeability to causal discovery; and (2) an empirical argument, showing that many existing i.i.d. causal-discovery methods are predicated on exchangeability assumptions, and that the only extensive, widely used real-world "i.i.d." benchmark (the Tübingen dataset) consists mainly of exchangeable (and not i.i.d.) examples. Building on this insight, we introduce a novel synthetic dataset that enforces only the exchangeability assumption, without imposing the stronger i.i.d. assumption. We show that our exchangeable synthetic dataset mirrors the statistical structure of the real-world "i.i.d." dataset more closely than all other i.i.d. synthetic datasets. Furthermore, we demonstrate the predictive capability of this dataset by proposing a neural-network-based causal-discovery algorithm that is trained exclusively on our synthetic dataset and performs comparably to state-of-the-art i.i.d. methods on the real-world benchmark.
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.24)
- Europe > Switzerland (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- (2 more...)
- Energy (0.92)
- Health & Medicine > Public Health (0.46)
On the Alignment of Large Language Models with Global Human Opinion
Liu, Yang, Kaneko, Masahiro, Chu, Chenhui
Today's large language models (LLMs) are capable of supporting multilingual scenarios, allowing users to interact with LLMs in their native languages. When LLMs respond to subjective questions posed by users, they are expected to align with the views of specific demographic groups or historical periods, shaped by the language in which the user interacts with the model. Existing studies mainly focus on the opinions represented by LLMs among demographic groups in the United States or a few other countries; they lack worldwide country samples, studies of human opinions in different historical periods, and discussion of using language to steer LLMs. They also overlook the potential influence of prompt language on the alignment of LLMs' opinions. In this study, our goal is to fill these gaps. To this end, we create an evaluation framework based on the World Values Survey (WVS) to systematically assess the alignment of LLMs with human opinions across different countries, languages, and historical periods around the world. We find that LLMs align appropriately or over-align with the opinions of only a few countries while under-aligning with those of most countries. Furthermore, changing the language of the prompt to match the language used in the questionnaire steers LLMs to align with the opinions of the corresponding country more effectively than existing steering methods. At the same time, LLMs are more aligned with the opinions of the contemporary population. To our knowledge, our study is the first comprehensive investigation of opinion alignment in LLMs across global, language, and temporal dimensions. Our code and data are publicly available at https://github.com/ku-nlp/global-opinion-alignment and https://github.com/nlply/global-opinion-alignment.
- Asia > Vietnam (0.14)
- Europe > Czechia (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- (65 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
Musical ethnocentrism in Large Language Models
Large Language Models (LLMs) reflect the biases in their training data and, by extension, those of the people who created this training data. Detecting, analyzing, and mitigating such biases is becoming a focus of research. One type of bias that has been understudied so far is geocultural bias. It can be caused by an imbalance in the representation of different geographic regions and cultures in the training data, but also by value judgments contained therein. In this paper, we take a first step towards analyzing musical biases in LLMs, particularly ChatGPT and Mixtral. We conduct two experiments. In the first, we prompt LLMs to provide lists of the "Top 100" musical contributors of various categories and analyze their countries of origin. In the second experiment, we ask the LLMs to numerically rate various aspects of the musical cultures of different countries. Our results indicate a strong preference of the LLMs for Western music cultures in both experiments.
- Africa (0.15)
- South America (0.05)
- Europe > Spain (0.05)
- (5 more...)
- Media > Music (0.48)
- Leisure & Entertainment (0.48)
A longitudinal sentiment analysis of Sinophobia during COVID-19 using large language models
The COVID-19 pandemic has exacerbated xenophobia, particularly Sinophobia, leading to widespread discrimination against individuals of Chinese descent. Large language models (LLMs) are pre-trained deep learning models used for natural language processing (NLP) tasks. The ability of LLMs to understand and generate human-like text makes them particularly useful for analysing social media data to detect and evaluate sentiments. We present a framework utilising LLMs for longitudinal sentiment analysis of the Sinophobic sentiments expressed on X (Twitter) during the COVID-19 pandemic. The results show a significant correlation between spikes in Sinophobic tweets, Sinophobic sentiments, and surges in COVID-19 cases, revealing that the evolution of the pandemic influenced public sentiment and the prevalence of Sinophobic discourse. Furthermore, the sentiment analysis revealed a predominance of negative sentiments, such as annoyance and denial, which underscores the impact of political narratives and misinformation in shaping public opinion. The lack of empathetic sentiment, which was present in previous studies related to COVID-19, highlights the way political narratives in the media viewed the pandemic and how they blamed the Chinese community. Our study highlights the importance of transparent communication in mitigating xenophobic sentiments during global crises.
- Asia > China > Hubei Province > Wuhan (0.06)
- Asia > India (0.06)
- Asia > Japan (0.05)
- (12 more...)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Epidemiology (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Proximity Matters: Analyzing the Role of Geographical Proximity in Shaping AI Research Collaborations
Toobaee, Mohammadmahdi, Schiffauerova, Andrea, Ebadi, Ashkan
The role of geographical proximity in facilitating inter-regional or inter-organizational collaborations has been studied thoroughly in recent years. However, the effect of geographical proximity on forming scientific collaborations at the individual level still needs to be addressed. Using publication data in the field of artificial intelligence from 2001 to 2019, this work studies the effect of geographical proximity on the likelihood of forming future scientific collaborations among researchers. In addition, the interaction between geographical and network proximities is examined to see whether network proximity can substitute for geographical proximity in encouraging long-distance scientific collaborations. Employing conventional and machine learning techniques, our results suggest that geographical distance impedes scientific collaboration at the individual level despite the tremendous improvements in transportation and communication technologies during recent decades. Moreover, our findings show that the effect of network proximity on the likelihood of scientific collaboration increases with geographical distance, implying that network proximity can act as a substitute for geographical proximity.
- Europe (0.28)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (6 more...)
- Government > Regional Government (0.68)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning
Chen, Zhongzhi, Sun, Xingwu, Jiao, Xianfeng, Lian, Fengzong, Kang, Zhanhui, Wang, Di, Xu, Cheng-Zhong
Despite the great success of large language models (LLMs) in various tasks, they suffer from generating hallucinations. We introduce Truth Forest, a method that enhances truthfulness in LLMs by uncovering hidden truth representations using multi-dimensional orthogonal probes. Specifically, it creates multiple orthogonal bases for modeling truth by incorporating orthogonal constraints into the probes. Moreover, we introduce Random Peek, a systematic technique considering an extended range of positions within the sequence, reducing the gap between discerning and generating truth features in LLMs. By employing this approach, we improved the truthfulness of Llama-2-7B from 40.8% to 74.5% on TruthfulQA. Likewise, significant improvements are observed in fine-tuned models. We conducted a thorough analysis of truth features using probes. Our visualization results show that orthogonal probes capture complementary truth-related features, forming well-defined clusters that reveal the inherent structure of the dataset.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.27)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Africa > Middle East > Egypt (0.14)
- (85 more...)
- Research Report > New Finding (1.00)
- Personal > Honors (1.00)
- Transportation > Air (1.00)
- Media > Film (1.00)
- Leisure & Entertainment > Sports (1.00)
- (29 more...)
Which country is this picture from? New data and methods for DNN-based country recognition
Alamayreh, Omran, Dimitri, Giovanna Maria, Wang, Jun, Tondi, Benedetta, Barni, Mauro
Recognizing the country where a picture has been taken has many potential applications, such as identification of fake news and prevention of disinformation campaigns. Previous works focused on the estimation of the geo-coordinates where a picture has been taken. Yet, recognizing in which country an image was taken could be more critical, from a semantic and forensic point of view, than estimating its spatial coordinates. In the above framework, this paper provides two contributions. First, we introduce the VIPPGeo dataset, containing 3.8 million geo-tagged images. Secondly, we used the dataset to train a model casting the country recognition problem as a classification problem. The experiments show that our model provides better results than the current state of the art. Notably, we found that asking the network to identify the country provides better results than estimating the geo-coordinates and then tracing them back to the country where the picture was taken.
- North America > United States (0.14)
- Asia > Japan (0.05)
- Europe > Italy (0.05)
- (16 more...)
- Government (1.00)
- Media > News (0.86)
How to Do Twitter Sentiment Analysis with a Pre-Trained Language Model
Thus, the winning strategy has been to first pre-train a transformer-based model with vast amounts of unlabelled data and, subsequently, fine-tune the model to make it perform better at a specific task. This second step is usually accomplished with labeled data, though far fewer learning examples are required in comparison to training the model from scratch. Natural Language Processing (NLP) has a large variety of tasks and applications, including Automatic, or Machine, Translation, Text Summarization, Text Generation, Text Classification, Question Answering, and Named Entity Recognition (NER). The ability to develop and improve these very different types of tasks has wide-reaching possibilities for developing NLP. Recurrent Neural Networks (RNNs) became very popular in sequence modeling for supervised NLP tasks like classification and regression.
Top challenge to internet health is AI power disparity and harm, Mozilla says
The top challenge for the health of the internet is the power disparity between who benefits from AI and who is harmed by AI, Mozilla's new 2022 Internet Health Report reveals. Once again, this new report puts AI under the spotlight for how companies and governments use the technology. Mozilla's report scrutinized the nature of the AI-driven world, citing real examples from different countries. TechRepublic spoke to Solana Larsen, Mozilla's Internet Health Report editor, to shed light on the concept of "Responsible AI from the Start," black box AI, the future of regulations and how some AI projects lead by example. Larsen explains that AI systems should be built from the start considering ethics and responsibility, not tacked on at a later date when the harms begin to emerge.
- Europe (0.06)
- Oceania > Australia (0.05)
- North America > United States > New York (0.05)
- (2 more...)
- Information Technology > Security & Privacy (0.49)
- Health & Medicine > Therapeutic Area (0.31)