Romanov, Alexey
A Machine Learning Approach That Beats Large Rubik's Cubes
Chervov, Alexander, Khoruzhii, Kirill, Bukhal, Nikita, Naghiyev, Jalal, Zamkovoy, Vladislav, Koltsov, Ivan, Cheldieva, Lyudmila, Sychev, Arsenii, Lenin, Arsenii, Obozov, Mark, Urvanov, Egor, Romanov, Alexey
The paper proposes a novel machine learning approach to the pathfinding problem on extremely large graphs. The method estimates diffusion distances with a neural network and uses beam search for pathfinding. We demonstrate its efficiency by finding solutions of unprecedentedly short length for the 4x4x4 and 5x5x5 Rubik's cubes, outperforming all available solvers and providing the first machine learning solver beyond the 3x3x3 case. In particular, it surpasses the combined best results of the Kaggle Santa 2023 challenge, which involved over 1,000 teams, in every single case. For the 3x3x3 Rubik's cube, our approach achieves an optimality rate exceeding 98%, matching the performance of task-specific solvers and significantly outperforming prior solutions such as DeepCubeA (60.3%) and EfficientCube (69.6%). Additionally, our solution solves the 3x3x3 Rubik's cube more than 26 times faster while requiring up to 18.5 times less model training time than the most efficient state-of-the-art competitor.
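To make the pathfinding idea concrete, here is a minimal sketch of beam search guided by a learned distance estimator. The names `model`, `apply_move`, `moves`, and `is_solved` are hypothetical placeholders for the cube environment and the trained network, not the authors' actual API.

```python
# A minimal sketch of beam search with a neural "distance-to-solved" heuristic.
import torch

def beam_search(start_state, model, apply_move, moves, is_solved,
                beam_width=1024, max_steps=100):
    """Keep the beam_width states the network scores as closest to the
    solved state, expanding every candidate by one move per step."""
    beam = [(start_state, [])]  # (state tensor, move sequence so far)
    for _ in range(max_steps):
        candidates = []
        for state, path in beam:
            for m in moves:
                nxt = apply_move(state, m)
                if is_solved(nxt):
                    return path + [m]
                candidates.append((nxt, path + [m]))
        # Score all candidates in one batch; lower score = closer to solved.
        states = torch.stack([s for s, _ in candidates])
        with torch.no_grad():
            scores = model(states).squeeze(-1)
        keep = torch.argsort(scores)[:beam_width]
        beam = [candidates[i] for i in keep.tolist()]
    return None  # no solution found within max_steps
```

The design point is that the network provides only a scalar distance estimate, so the same search loop can in principle be reused across 3x3x3, 4x4x4, and 5x5x5 state encodings.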
Revealing the Dark Secrets of BERT
Kovaleva, Olga, Romanov, Alexey, Rogers, Anna, Rumshisky, Anna
BERT-based architectures currently give state-of-the-art performance on many NLP tasks, but little is known about the exact mechanisms that contribute to their success. In the current work, we focus on the interpretation of self-attention, one of the fundamental underlying components of BERT. Using a subset of GLUE tasks and a set of handcrafted features of interest, we propose a methodology and carry out a qualitative and quantitative analysis of the information encoded by individual BERT heads. Our findings suggest that there is a limited set of attention patterns that are repeated across different heads, indicating overall model overparametrization. While different heads consistently use the same attention patterns, they have varying impact on performance across different tasks. We show that manually disabling attention in certain heads leads to a performance improvement over the regular fine-tuned BERT models.
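As an illustration of the head-disabling experiment, the sketch below zeroes out a selected attention head in a pretrained BERT at inference time via Hugging Face's `head_mask` argument; the specific layer and head indices are illustrative, not the ones identified in the paper.

```python
# A minimal sketch of disabling individual attention heads in BERT.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("BERT heads can be disabled.", return_tensors="pt")

# head_mask has shape (num_layers, num_heads); 1 keeps a head, 0 disables it.
head_mask = torch.ones(model.config.num_hidden_layers,
                       model.config.num_attention_heads)
head_mask[0, 3] = 0.0  # disable head 3 in layer 0 (illustrative choice)

with torch.no_grad():
    outputs = model(**inputs, head_mask=head_mask)
```

Running a fine-tuned task head on top of such masked outputs is how one can measure the per-head impact on downstream performance.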
What's in a Name? Reducing Bias in Bios without Access to Protected Attributes
Romanov, Alexey, De-Arteaga, Maria, Wallach, Hanna, Chayes, Jennifer, Borgs, Christian, Chouldechova, Alexandra, Geyik, Sahin, Kenthapadi, Krishnaram, Rumshisky, Anna, Kalai, Adam Tauman
There is a growing body of work that proposes methods for mitigating bias in machine learning systems. These methods typically rely on access to protected attributes such as race, gender, or age. However, this raises two significant challenges: (1) protected attributes may not be available or it may not be legal to use them, and (2) it is often desirable to simultaneously consider multiple protected attributes, as well as their intersections. In the context of mitigating bias in occupation classification, we propose a method for discouraging correlation between the predicted probability of an individual's true occupation and a word embedding of their name. This method leverages the societal biases that are encoded in word embeddings, eliminating the need for access to protected attributes. Crucially, it only requires access to individuals' names at training time and not at deployment time. We evaluate two variations of our proposed method using a large-scale dataset of online biographies. We find that both variations simultaneously reduce race and gender biases, with almost no reduction in the classifier's overall true positive rate.
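A minimal sketch of the kind of penalty described above, assuming a covariance-style constraint between the predicted probability of the true occupation and the name embedding; the function name, variable names, and the weight `lam` are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch: cross-entropy plus a penalty on the covariance between the
# probability of the true class and each name-embedding dimension.
import torch
import torch.nn.functional as F

def debiased_loss(logits, labels, name_embeddings, lam=1.0):
    # Standard occupation-classification loss.
    ce = F.cross_entropy(logits, labels)
    # Predicted probability assigned to each example's true occupation.
    probs = F.softmax(logits, dim=-1)
    p_true = probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    # Batch-wise covariance between p_true and each embedding dimension.
    p_c = p_true - p_true.mean()
    e_c = name_embeddings - name_embeddings.mean(dim=0, keepdim=True)
    cov = (p_c.unsqueeze(1) * e_c).mean(dim=0)  # one value per dimension
    return ce + lam * cov.norm(p=2)
```

Note that the name embeddings enter only the training loss, which matches the abstract's point that names are needed at training time but not at deployment time.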
Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting
De-Arteaga, Maria, Romanov, Alexey, Wallach, Hanna, Chayes, Jennifer, Borgs, Christian, Chouldechova, Alexandra, Geyik, Sahin, Kenthapadi, Krishnaram, Kalai, Adam Tauman
We present a large-scale study of gender bias in occupation classification, a task where the use of machine learning may lead to negative outcomes in people's lives. We analyze the potential allocation harms that can result from semantic representation bias. To do so, we study the impact on occupation classification of including explicit gender indicators, such as first names and pronouns, in different semantic representations of online biographies. Additionally, we quantify the bias that remains when these indicators are "scrubbed," and describe the proxy behavior that occurs in the absence of explicit gender indicators. As we demonstrate, differences in true positive rates between genders are correlated with existing gender imbalances in occupations, which may compound those imbalances.
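The correlation the abstract refers to can be checked with a short script like the one below, which computes per-occupation true-positive-rate gaps between genders; the column names and the tiny synthetic data frame are hypothetical, not the paper's data.

```python
# Sketch of a per-occupation TPR gender-gap analysis.
import pandas as pd

def tpr_gender_gaps(df):
    records = []
    for occ, grp in df.groupby("occupation"):
        # TPR per gender: fraction of true members predicted correctly.
        tpr = {g: (sub["predicted"] == occ).mean()
               for g, sub in grp.groupby("gender")}
        records.append({
            "occupation": occ,
            "tpr_gap": tpr.get("F", 0.0) - tpr.get("M", 0.0),
            "pct_female": (grp["gender"] == "F").mean(),
        })
    return pd.DataFrame(records)

# Tiny synthetic example (illustrative only).
df = pd.DataFrame({
    "occupation": ["nurse", "nurse", "surgeon", "surgeon"],
    "predicted":  ["nurse", "teacher", "surgeon", "nurse"],
    "gender":     ["F", "M", "M", "F"],
})
print(tpr_gender_gaps(df))
```

Correlating `tpr_gap` with `pct_female` across occupations is the kind of analysis behind the abstract's compounding-imbalance claim.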