Accuracy
A conversation on artificial intelligence and gender bias
The world celebrated Women's History Month in March, and it is a timely moment for us to look at the forces that will shape gender parity in the future. Even as the pandemic accelerates digitization and the future of work, artificial intelligence (AI) stands out as a potentially helpful--or hurtful--tool in the equity agenda. McKinsey recorded a podcast in collaboration with Citi that dives into how gender bias is reflected in AI, why we must consciously debias our machine-human interfaces, and how AI can be a positive force for gender parity. Ioana Niculcea: Before we start the conversation, I think it's important for us to spend a moment assessing the amount of change that has taken place with regard to AI, and how the pace of that change has accelerated over the past few years. And many people argue that in light of the current COVID-19 circumstance, we'll feel further acceleration as people move toward digitization. I spent the past eight years in financial services, and it all started with data. Datafication of the industry was sort of the point of origin. And we hear often that over 90 percent of the data that we have today was created over the past two years. You hear things like every minute, there's over one million Facebook logins and 4.5 million YouTube videos being streamed, or 17,000 different Uber rides. There's a lot of data, and only 1 percent of that is being analyzed, as said today.
Study suggests that AI model selection might introduce bias
Register for a free or VIP pass today. The past several years have made it clear that AI and machine learning are not a panacea when it comes to fair outcomes. Applying algorithmic solutions to social problems can magnify biases against marginalized peoples; undersampling populations always results in worse predictive accuracy. But bias in AI doesn't arise from the datasets alone. Problem formulation, or the way researchers fit tasks to AI techniques, can contribute.
Handling Climate Change Using Counterfactuals: Using Counterfactuals in Data Augmentation to Predict Crop Growth in an Uncertain Climate Future
Temraz, Mohammed, Kenny, Eoin, Ruelle, Elodie, Shalloo, Laurence, Smyth, Barry, Keane, Mark T
Climate change poses a major challenge to humanity, especially in its impact on agriculture, a challenge that a responsible AI should meet. In this paper, we examine a CBR system (PBI-CBR) designed to aid sustainable dairy farming by supporting grassland management, through accurate crop growth prediction. As climate changes, PBI-CBR's historical cases become less useful in predicting future grass growth. Hence, we extend PBI-CBR using data augmentation, to specifically handle disruptive climate events, using a counterfactual method (from XAI). Study 1 shows that historical, extreme climate-events (climate outlier cases) tend to be used by PBI-CBR to predict grass growth during climate disrupted periods. Study 2 shows that synthetic outliers, generated as counterfactuals on a outlier-boundary, improve the predictive accuracy of PBI-CBR, during the drought of 2018. This study also shows that an instance-based counterfactual method does better than a benchmark, constraint-guided method.
Towards End-to-End Neural Face Authentication in the Wild -- Quantifying and Compensating for Directional Lighting Effects
Varkarakis, Viktor, Yao, Wang, Corcoran, Peter
The recent availability of low-power neural accelerator hardware, combined with improvements in end-to-end neural facial recognition algorithms provides, enabling technology for on-device facial authentication. The present research work examines the effects of directional lighting on a State-of-Art(SoA) neural face recognizer. A synthetic re-lighting technique is used to augment data samples due to the lack of public data-sets with sufficient directional lighting variations. Top lighting and its variants (top-left, top-right) are found to have minimal effect on accuracy, while bottom-left or bottom-right directional lighting has the most pronounced effects. Following the fine-tuning of network weights, the face recognition model is shown to achieve close to the original Receiver Operating Characteristic curve (ROC)performance across all lighting conditions and demonstrates an ability to generalize beyond the lighting augmentations used in the fine-tuning data-set. This work shows that an SoA neural face recognition model can be tuned to compensate for directional lighting effects, removing the need for a pre-processing step before applying facial recognition.
AI Product Manager
I recently completed the Artificial Intelligence Product Manager Nanodegree Program on Udacity and I'd like to share a summary of everything I learned with you. This also includes bits from my experience as a technical product manager. This all a huge dump from my mind, written from the first stroke to last on my keyboard so kindly excuse any details I may miss or depths I didn't hit. It would be great to start with "why" and what motivated me to complete this program. In the past year, I've been working as a full-time product manager, sitting at the intersection of engineering and business and it's been fun. However, I'd recently been thinking deeply about the future of technology and what turns it could take.
Concentration Inequalities for Two-Sample Rank Processes with Application to Bipartite Ranking
Clรฉmenรงon, Stรฉphan, Limnios, Myrto, Vayatis, Nicolas
The ROC curve is the gold standard for measuring the performance of a test/scoring statistic regarding its capacity to discriminate between two statistical populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring/ranking applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be viewed as summaries of the ROC curve. In this paper, the fact that most of these empirical criteria can be expressed as two-sample linear rank statistics is highlighted and concentration inequalities for collections of such random variables, referred to as two-sample rank processes here, are proved, when indexed by VC classes of scoring functions. Based on these nonasymptotic bounds, the generalization capacity of empirical maximizers of a wide class of ranking performance criteria is next investigated from a theoretical perspective. It is also supported by empirical evidence through convincing numerical experiments.
Bootstrapping of memetic from genetic evolution via inter-agent selection pressures
Guttenberg, Nicholas, Rosa, Marek
We create an artificial system of agents (attention-based neural networks) which selectively exchange messages with each-other in order to study the emergence of memetic evolution and how memetic evolutionary pressures interact with genetic evolution of the network weights. We observe that the ability of agents to exert selection pressures on each-other is essential for memetic evolution to bootstrap itself into a state which has both high-fidelity replication of memes, as well as continuing production of new memes over time. However, in this system there is very little interaction between this memetic 'ecology' and underlying tasks driving individual fitness - the emergent meme layer appears to be neither helpful nor harmful to agents' ability to learn to solve tasks. Sourcecode for these experiments is available at https://github.com/GoodAI/memes
Active learning using weakly supervised signals for quality inspection
Cordier, Antoine, Das, Deepan, Gutierrez, Pierre
Because manufacturing processes evolve fast, and since production visual aspect can vary significantly on a daily basis, the ability to rapidly update machine vision based inspection systems is paramount. Unfortunately, supervised learning of convolutional neural networks requires a significant amount of annotated images for being able to learn effectively from new data. Acknowledging the abundance of continuously generated images coming from the production line and the cost of their annotation, we demonstrate it is possible to prioritize and accelerate the annotation process. In this work, we develop a methodology for learning actively, from rapidly mined, weakly (i.e. partially) annotated data, enabling a fast, direct feedback from the operators on the production line and tackling a big machine vision weakness: false positives. We also consider the problem of covariate shift, which arises inevitably due to changing conditions during data acquisition. In that regard, we show domain-adversarial training to be an efficient way to address this issue.
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data
Wanyan, Tingyi, Zhang, Jing, Ding, Ying, Azad, Ariful, Wang, Zhangyang, Glicksberg, Benjamin S
Electronic Health Record (EHR) data has been of tremendous utility in Artificial Intelligence (AI) for healthcare such as predicting future clinical events. These tasks, however, often come with many challenges when using classical machine learning models due to a myriad of factors including class imbalance and data heterogeneity (i.e., the complex intra-class variances). To address some of these research gaps, this paper leverages the exciting contrastive learning framework and proposes a novel contrastive regularized clinical classification model. The contrastive loss is found to substantially augment EHR-based prediction: it effectively characterizes the similar/dissimilar patterns (by its "push-and-pull" form), meanwhile mitigating the highly skewed class distribution by learning more balanced feature spaces (as also echoed by recent findings). In particular, when naively exporting the contrastive learning to the EHR data, one hurdle is in generating positive samples, since EHR data is not as amendable to data augmentation as image data. To this end, we have introduced two unique positive sampling strategies specifically tailored for EHR data: a feature-based positive sampling that exploits the feature space neighborhood structure to reinforce the feature learning; and an attribute-based positive sampling that incorporates pre-generated patient similarity metrics to define the sample proximity. Both sampling approaches are designed with an awareness of unique high intra-class variance in EHR data. Our overall framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data with a total of 5,712 patients admitted to a large, urban health system. Specifically, our method reaches a high AUROC prediction score of 0.959, which outperforms other baselines and alternatives: cross-entropy(0.873) and focal loss(0.931).
Towards measuring fairness in AI: the Casual Conversations dataset
Hazirbas, Caner, Bitton, Joanna, Dolhansky, Brian, Pan, Jacqueline, Gordo, Albert, Ferrer, Cristian Canton
This paper introduces a novel dataset to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of age, genders, apparent skin tones and ambient lighting conditions. Our dataset is composed of 3,011 subjects and contains over 45,000 videos, with an average of 15 videos per person. The videos were recorded in multiple U.S. states with a diverse set of adults in various age, gender and apparent skin tone groups. A key feature is that each subject agreed to participate for their likenesses to be used. Additionally, our age and gender annotations are provided by the subjects themselves. A group of trained annotators labeled the subjects' apparent skin tone using the Fitzpatrick skin type scale. Moreover, annotations for videos recorded in low ambient lighting are also provided. As an application to measure robustness of predictions across certain attributes, we provide a comprehensive study on the top five winners of the DeepFake Detection Challenge (DFDC). Experimental evaluation shows that the winning models are less performant on some specific groups of people, such as subjects with darker skin tones and thus may not generalize to all people. In addition, we also evaluate the state-of-the-art apparent age and gender classification methods. Our experiments provides a through analysis on these models in terms of fair treatment of people from various backgrounds.