data and model
A survey on bias in machine learning research
Mikołajczyk-Bareła, Agnieszka, Grochowski, Michał
Current research on bias in machine learning often focuses on fairness, while overlooking the roots or causes of bias. However, bias was originally defined as a "systematic error," often caused by humans at different stages of the research process. This article aims to bridge the gap between past literature on bias in research by providing a taxonomy of potential sources of bias and errors in data and models. The paper focuses on bias in machine learning pipelines. The survey analyses over forty potential sources of bias in the machine learning (ML) pipeline, providing clear examples for each. By understanding the sources and consequences of bias in machine learning, better methods can be developed for detecting and mitigating it, leading to fairer, more transparent, and more accurate ML models.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Greenland (0.04)
- Europe > Poland > Pomerania Province > Gdańsk (0.04)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Dermatology (0.93)
Complying with the EU AI Act
Walters, Jacintha, Dey, Diptish, Bhaumik, Debarati, Horsman, Sophie
The EU AI Act (AIA) is the proposed EU legislation concerning AI systems. This paper identifies several categories of the AI Act. Based on this categorization, a questionnaire is developed that serves as a tool to offer insights by creating quantitative data. Analysis of the data shows various challenges for organizations in different compliance categories. The influence of organization characteristics, such as size and sector, is examined to determine the impact on compliance. The paper also shares qualitative data on which questions were prevalent among respondents, both on the content of the AI Act and on its application. The paper concludes that there is still room for improvement in terms of compliance with the AIA and refers to a related project that examines a solution to help these organizations.
- Research Report (0.64)
- Questionnaire & Opinion Survey (0.59)
- Information Technology > Security & Privacy (1.00)
- Government (0.90)
- Law > Statutes (0.67)
Data 4 ML (Part 1): Introduction to Data Pipeline
Before diving into the art of feature engineering in the next set of articles, let's take a moment to look at the overall machine learning pipeline. In this article, we will look at the larger picture of the application. To that end, we'll begin with a little musing on basic concepts like data and models. What we call data are observations of real-world phenomena. Data is unorganized, unprocessed fact: it might be raw numbers, figures, images, words, or sounds, derived from observations or measurements.
PhD student in Computing Science with focus on responsible machine learning
The Department of Computer Science, characterized by world-leading research in several scientific fields and a range of educational programmes ranked highly in international comparison, is looking for a Doctoral student in computing science with a focus on responsible AI with learning from multiple representations. The Department of Computing Science has been growing rapidly in recent years, and an inclusive, bottom-up driven environment is a key element in our sustainable growth. The 60 Doctoral students within the department form a diverse group of different nationalities, backgrounds and fields. If you work as a Doctoral student with us, you receive the benefits of support in career development, networking, and administrative and technical support functions, along with good employment conditions. Does this interest you?
- North America > United States > Virginia (0.06)
- Europe > Sweden > Västerbotten County > Umeå (0.06)
Re-contextualizing Fairness in NLP: The Case of India
Bhatt, Shaily, Dev, Sunipa, Talukdar, Partha, Dave, Shachi, Prabhakaran, Vinodkumar
Recent research has revealed undesirable biases in NLP data and models. However, these efforts focus on social disparities in the West, and are not directly portable to other geo-cultural contexts. In this paper, we focus on NLP fairness in the context of India. We start with a brief account of the prominent axes of social disparities in India. We build resources for fairness evaluation in the Indian context and use them to demonstrate prediction biases along some of the axes. We then delve deeper into social stereotypes for Region and Religion, demonstrating their prevalence in corpora and models. Finally, we outline a holistic research agenda to re-contextualize NLP fairness research for the Indian context, accounting for Indian societal context, bridging technological gaps in NLP capabilities and resources, and adapting to Indian cultural values. While we focus on India, this framework can be generalized to other geo-cultural contexts.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > India > Rajasthan (0.04)
- Asia > India > Mizoram (0.04)
- Health & Medicine (0.93)
- Government (0.68)
Distributed Training in Deep Learning using PyTorch: A Handy Tutorial
PyTorch has built-in packages that support distributed training. There are two approaches to running distributed training in PyTorch: DataParallel (DP) and DistributedDataParallel (DDP). DDP always trains models faster than DP; however, it requires more changes to the single-GPU code, namely to the model, the optimizer, and the backpropagation step. Based on our experience, the good news is that DDP can save a significant amount of training time by utilizing all GPUs at almost 100% of memory usage across multiple nodes. In the following paragraphs, we elaborate on how to use DP and DDP by providing an example for each method.
A Guide to Parallel and Distributed Deep Learning for Beginners
In recent years, we have witnessed the success of deep learning across multiple domains. But we have also seen that the large size and computational complexity of the models and data slow down deep learning procedures. To improve the performance of these models, parallel and distributed deep learning approaches have been introduced. In this article, we discuss parallel and distributed deep learning methods in detail and try to understand how they help speed up the deep learning process. The major points to be discussed in this article are listed below.
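As a rough illustration of the data-parallel idea mentioned in this summary, the sketch below splits a batch across several workers, lets each worker compute a partial gradient, and averages the results before updating a shared weight. It is not taken from the article: the toy one-parameter model, the function names (`shard_gradient`, `data_parallel_step`, `single_worker_step`) and the use of Python threads in place of GPUs or nodes are all assumptions made for illustration. Threads here only show the communication pattern and are not expected to give a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_gradient(w, shard):
    """Summed gradient of the squared error (w*x - y)**2 over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard)


def data_parallel_step(w, data, num_workers, lr):
    """One synchronous data-parallel SGD step (hypothetical helper)."""
    # Give every worker its own slice of the batch.
    shards = [data[i::num_workers] for i in range(num_workers)]
    # Each worker computes the gradient on its shard independently.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial = list(pool.map(lambda s: shard_gradient(w, s), shards))
    # Stand-in for an all-reduce: combine partial sums and average.
    grad = sum(partial) / len(data)
    return w - lr * grad


def single_worker_step(w, data, lr):
    """The same update computed on one worker, for comparison."""
    return w - lr * shard_gradient(w, data) / len(data)


# Toy data generated from y = 3x, so the weight should approach 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]

w = 0.0
for _ in range(50):
    w = data_parallel_step(w, data, num_workers=4, lr=0.01)
```

Because the averaged shard gradients equal the full-batch gradient, `data_parallel_step` and `single_worker_step` produce the same update; in a real system the gain comes from the shards being processed on separate devices at the same time.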
Postdoc in Machine Learning and Environmental Modeling
During the past decade, the RL has envisioned and built the ARIES (ARtificial Intelligence for Environment and Sustainability) platform, a technology that integrates network-available data and model components through semantics and machine reasoning. Its underlying open-source software (k.LAB) handles the full end-to-end process of integrating data with multiple types of models to predict complex change. It also supports selection of the most appropriate data and models using cloud technology and following an open data paradigm: the resulting insight remains open and available to society at large, and becomes a base for further computations, contributing to an ever-increasing knowledge base. For the first time, it is possible to consistently characterize and publish data and models for their integration in predictive models, building and field-testing technologies that have eluded researchers to date. We are looking for an individual who can support strategic activities related to integrated data science and collaborative, integrated modelling on the semantic web (semantic meta-modelling).