
RuBia: A Russian Language Bias Detection Dataset

Grigoreva, Veronika, Ivanova, Anastasiia, Alimova, Ilseyar, Artemova, Ekaterina

arXiv.org Artificial Intelligence

Warning: this work contains upsetting or disturbing content. Large language models (LLMs) tend to learn the social and cultural biases present in the raw pre-training data. To test whether an LLM's behavior is fair, functional datasets are employed, and due to their purpose, these datasets are highly language- and culture-specific. In this paper, we address a gap in the scope of multilingual bias evaluation by presenting a bias detection dataset specifically designed for the Russian language, dubbed RuBia. The RuBia dataset is divided into 4 domains: gender, nationality, socio-economic status, and diverse; each domain is further divided into multiple fine-grained subdomains. Every example in the dataset consists of two sentences, the first reinforcing a potentially harmful stereotype or trope and the second contradicting it. These sentence pairs were first written by volunteers and then validated by native-speaking crowdsourcing workers. Overall, there are nearly 2,000 unique sentence pairs spread over 19 subdomains in RuBia. To illustrate the dataset's purpose, we conduct a diagnostic evaluation of state-of-the-art or near-state-of-the-art LLMs and discuss the LLMs' predisposition to social biases.
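The paired-sentence layout described in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the paper's evaluation protocol: the `SentencePair` fields, the `pro_trope_preference_rate` function, and the toy scorer are all hypothetical names, and the placeholder strings are not taken from RuBia. In practice the scorer might be a language model's sentence log-likelihood.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SentencePair:
    """One example: a trope-reinforcing sentence and its contradiction.

    Field names are hypothetical; the abstract only says each example
    has two sentences and belongs to a domain and a subdomain.
    """
    domain: str
    subdomain: str
    pro_trope: str
    anti_trope: str


def pro_trope_preference_rate(
    pairs: Iterable[SentencePair],
    score: Callable[[str], float],
) -> float:
    """Return the share of pairs where `score` rates the pro-trope
    sentence strictly higher than the anti-trope sentence."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one sentence pair is required")
    preferred = sum(
        1 for p in pairs if score(p.pro_trope) > score(p.anti_trope)
    )
    return preferred / len(pairs)


def toy_score(sentence: str) -> float:
    """Stand-in scorer (assumption): shorter text scores higher."""
    return -float(len(sentence))


# Placeholder strings only; these are not RuBia examples.
pairs = [
    SentencePair("gender", "placeholder", "aaa", "a"),
    SentencePair("nationality", "placeholder", "b", "bbbb"),
]

rate = pro_trope_preference_rate(pairs, toy_score)
print(rate)  # 0.5 with the toy scorer and placeholder pairs above
```

Under this sketch, a rate near 0.5 would mean the scorer shows no systematic preference for either sentence in a pair, while a rate well above 0.5 would suggest a lean toward the trope-reinforcing side.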


Artificial intelligence sparks new arms race

#artificialintelligence

WHAT: The United States is in a new arms race with Russia and China; the Pentagon just released (Feb.
EXPERT: Tomás Díaz de la Rubia, vice president for Purdue University's Discovery Park, is an expert in emerging technologies, including AI and quantum information systems. He was also a speaker at a May 2018 White House summit, "Artificial Intelligence for American Industry."
QUOTE from de la Rubia: "We live today again in a world of great power competition. Those groups and nations that innovate most effectively and dominate the AI technology landscape will not only control commercial markets but will also hold a very significant advantage in future warfare and defense. In many respects, the threat of AI-based weapons is perhaps as existential a threat to the future national security of the United States and its allies as nuclear weapons were at the end of World War II."


Improving Zillow Zestimate with 36 Lines of Code

@machinelearnbot

Zillow and Kaggle recently started a $1 million competition to improve the Zestimate. We are releasing a public Domino project that uses H2O's AutoML to generate a solution. The new Kaggle Zillow Price competition received a significant amount of press, and for good reason: Zillow has put $1 million on the line for anyone who can improve the accuracy of its Zestimate feature, which is Zillow's estimate of a home's value.