rubia
RuBia: A Russian Language Bias Detection Dataset
Grigoreva, Veronika, Ivanova, Anastasiia, Alimova, Ilseyar, Artemova, Ekaterina
Warning: this work contains upsetting or disturbing content. Large language models (LLMs) tend to learn the social and cultural biases present in the raw pre-training data. To test whether an LLM's behavior is fair, functional datasets are employed, and due to their purpose, these datasets are highly language- and culture-specific. In this paper, we address a gap in the scope of multilingual bias evaluation by presenting a bias detection dataset specifically designed for the Russian language, dubbed RuBia. The RuBia dataset is divided into four domains: gender, nationality, socio-economic status, and diverse. Each domain is further divided into multiple fine-grained subdomains. Every example in the dataset consists of two sentences: the first reinforces a potentially harmful stereotype or trope, and the second contradicts it. These sentence pairs were first written by volunteers and then validated by native-speaking crowdsourcing workers. Overall, there are nearly 2,000 unique sentence pairs spread over 19 subdomains in RuBia. To illustrate the dataset's purpose, we conduct a diagnostic evaluation of state-of-the-art or near-state-of-the-art LLMs and discuss the LLMs' predisposition to social biases.
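The paired-sentence format described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the field names, the `pro_trope_preference_rate` function, and the idea of comparing a per-sentence score are all assumptions made here, since the abstract does not specify the metric. The `score` callable stands in for whatever sentence-level score a given model provides (for example, a log-likelihood).

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SentencePair:
    """One dataset example (field names are hypothetical)."""
    domain: str      # e.g. "gender", "nationality"
    subdomain: str   # fine-grained subdomain label
    pro_trope: str   # sentence reinforcing a stereotype or trope
    anti_trope: str  # sentence contradicting it


def pro_trope_preference_rate(
    pairs: Iterable[SentencePair],
    score: Callable[[str], float],
) -> float:
    """Fraction of pairs where the scorer rates the pro-trope
    sentence strictly higher than the anti-trope sentence.

    Assumed diagnostic: an unbiased scorer would be near 0.5.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one sentence pair is required")
    preferred = sum(
        1 for p in pairs if score(p.pro_trope) > score(p.anti_trope)
    )
    return preferred / len(pairs)


if __name__ == "__main__":
    # Placeholder sentences and a toy scorer (shorter text scores higher);
    # a real run would plug in a language model's sentence score instead.
    demo = [
        SentencePair("gender", "demo", "short", "a longer sentence"),
        SentencePair("gender", "demo", "a longer sentence", "short"),
    ]
    print(pro_trope_preference_rate(demo, lambda s: -len(s)))
```

Keeping the scorer as a plain callable means the same loop works for any model that can assign a number to a sentence, and grouping the results by `domain` or `subdomain` would give the per-category breakdown the abstract alludes to.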
- Asia > Russia (0.28)
- Europe > Ukraine (0.14)
- North America > United States > Washington > King County > Seattle (0.14)
Artificial intelligence sparks new arms race
WHAT: The United States is in a new arms race with Russia and China; the Pentagon just released (Feb.
EXPERT: Tomás Díaz de la Rubia, vice president for Purdue University's Discovery Park, is an expert in emerging technologies, including AI and quantum information systems. He was also a speaker at a May 2018 White House summit, "Artificial Intelligence for American Industry."
QUOTE from de la Rubia: "We live today again in a world of great power competition. Those groups and nations that innovate most effectively and dominate the AI technology landscape will not only control commercial markets but will also hold a very significant advantage in future warfare and defense. In many respects, the threat of AI-based weapons is perhaps as existential a threat to the future national security of the United States and its allies as nuclear weapons were at the end of World War II."
- Europe > Russia (0.27)
- Asia > Russia (0.27)
- Asia > China (0.27)
- North America > United States > California (0.07)
- Government > Military (1.00)
- Government > Regional Government > North America Government > United States Government (0.37)
Improving Zillow Zestimate with 36 Lines of Code
Zillow and Kaggle recently started a $1 million competition to improve the Zestimate. We are releasing a public Domino project that uses H2O's AutoML to generate a solution. The new Kaggle Zillow Price competition received a significant amount of press, and for good reason. Zillow has put $1 million on the line if you can improve the accuracy of their Zestimate feature, which is Zillow's estimate of a home's value.