AITopics

2210.11612

Country:

North America > United States > California (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Badar, Maryam, Fisichella, Marco, Iosifidis, Vasileios, Nejdl, Wolfgang

Discrimination and Class Imbalance Aware Online Naive Bayes

arXiv.org Artificial IntelligenceNov-9-2022

Fairness-aware mining of massive data streams is a growing and challenging concern in the contemporary domain of machine learning. Many stream learning algorithms are used to replace humans at critical decision-making points e.g., hiring staff, assessing credit risk, etc. This calls for handling massive incoming information with minimum response delay while ensuring fair and high quality decisions. Recent discrimination-aware learning methods are optimized based on overall accuracy. However, the overall accuracy is biased in favor of the majority class; therefore, state-of-the-art methods mainly diminish discrimination by partially or completely ignoring the minority class. In this context, we propose a novel adaptation of Na\"ive Bayes to mitigate discrimination embedded in the streams while maintaining high predictive performance for both the majority and minority classes. Our proposed algorithm is simple, fast, and attains multi-objective optimization goals. To handle class imbalance and concept drifts, a dynamic instance weighting module is proposed, which gives more importance to recent instances and less importance to obsolete instances based on their membership in minority or majority class. We conducted experiments on a range of streaming and static datasets and deduced that our proposed methodology outperforms existing state-of-the-art fairness-aware methods in terms of both discrimination score and balanced accuracy.

artificial intelligence, dataset, machine learning, (16 more...)

2211.04812

Country:

North America > United States (0.46)
Europe > Germany > Lower Saxony (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine (1.00)
Government > Regional Government (1.00)
Banking & Finance (0.86)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Artificial IntelligenceNov-9-2022

AI-Bind: Improving Binding Predictions for Novel Protein Targets and Ligands

Chatterjee, Ayan, Walters, Robin, Shafi, Zohair, Ahmed, Omair Shafi, Sebek, Michael, Gysi, Deisy, Yu, Rose, Eliassi-Rad, Tina, Barabási, Albert-László, Menichetti, Giulia

Identifying novel drug-target interactions (DTI) is a critical and rate limiting step in drug discovery. While deep learning models have been proposed to accelerate the identification process, we show that state-of-the-art models fail to generalize to novel (i.e., never-before-seen) structures. We first unveil the mechanisms responsible for this shortcoming, demonstrating how models rely on shortcuts that leverage the topology of the protein-ligand bipartite network, rather than learning the node features. Then, we introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training, allowing us to limit the annotation imbalance and improve binding predictions for novel proteins and ligands. We illustrate the value of AI-Bind by predicting drugs and natural compounds with binding affinity to SARS-CoV-2 viral proteins and the associated human proteins. We also validate these predictions via docking simulations and comparison with recent experimental evidence, and step up the process of interpreting machine learning prediction of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. Overall, AI-Bind offers a powerful high-throughput approach to identify drug-target combinations, with the potential of becoming a powerful tool in drug discovery.

artificial intelligence, deep learning, machine learning, (17 more...)

2112.13168

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.92)
Research Report > Experimental Study (0.92)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government > FDA (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

#artificialintelligenceNov-8-2022, 17:58:42 GMT

AI and Unintended Consequences for Human Decision Making

In my last post, I argued that AI has serious implications for choice architecture. At its most extreme, so-called hypernudging has the potential to continually adapt in ways that make it more difficult for human decision-makers to eschew the preferences of the choice architect. But might AI, even when there is no obvious attempt to nudge, potentially present undesired consequences for human decision-making? Let's start with a simple example. Many people rely on GPS to help them get from Point A to Point B, especially in unfamiliar areas.

algorithm, decision-making, recommendation, (11 more...)

#artificialintelligence

Country: North America > United States > Kansas (0.05)

Industry:

Transportation > Passenger (0.40)
Transportation > Marine (0.40)
Information Technology > Services (0.30)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.33)

Scriven, Alexander, Kedziora, David Jacob, Musial, Katarzyna, Gabrys, Bogdan

The Technological Emergence of AutoML: A Survey of Performant Software and Applications in the Context of Industry

With most technical fields, there exists a delay between fundamental academic research and practical industrial uptake. Whilst some sciences have robust and well-established processes for commercialisation, such as the pharmaceutical practice of regimented drug trials, other fields face transitory periods in which fundamental academic advancements diffuse gradually into the space of commerce and industry. For the still relatively young field of Automated/Autonomous Machine Learning (AutoML/AutonoML), that transitory period is under way, spurred on by a burgeoning interest from broader society. Yet, to date, little research has been undertaken to assess the current state of this dissemination and its uptake. Thus, this review makes two primary contributions to knowledge around this topic. Firstly, it provides the most up-to-date and comprehensive survey of existing AutoML tools, both open-source and commercial. Secondly, it motivates and outlines a framework for assessing whether an AutoML solution designed for real-world application is 'performant'; this framework extends beyond the limitations of typical academic criteria, considering a variety of stakeholder needs and the human-computer interactions required to service them. Thus, additionally supported by an extensive assessment and comparison of academic and commercial case-studies, this review evaluates mainstream engagement with AutoML in the early 2020s, identifying obstacles and opportunities for accelerating future uptake.

data mining, evolutionary algorithm, programming language, (30 more...)

2211.04148

Genre:

Overview (1.00)
Research Report > Experimental Study (0.92)
Instructional Material > Course Syllabus & Notes (0.67)
Research Report > New Finding (0.67)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Software Engineering (1.00)
Information Technology > Security & Privacy (1.00)
(21 more...)

Sensitivity Estimation for Dark Matter Subhalos in Synthetic Gaia DR2 using Deep Learning

Bazarov, Abdullah, Benito, María, Hütsi, Gert, Kipper, Rain, Pata, Joosep, Põder, Sven

The abundance of dark matter (DM) subhalos orbiting a host galaxy is a generic prediction of the cosmological framework, and is a promising way to constrain the nature of DM. In this paper, we investigate the use of machine learning-based tools to quantify the magnitude of phase-space perturbations caused by the passage of DM subhalos. A simple binary classifier and an anomaly detection model are proposed to estimate if stars or star particles close to DM subhalos are statistically detectable in simulations. The simulated datasets are three Milky Way-like galaxies and nine synthetic Gaia DR2 surveys derived from these. Firstly, we find that the anomaly detection algorithm, trained on a simulated galaxy with full 6D kinematic observables and applied on another galaxy, is nontrivially sensitive to the DM subhalo population. On the other hand, the classification-based approach is not sufficiently sensitive due to the extremely low statistics of signal stars for supervised training. Finally, the sensitivity of both algorithms in the Gaia-like surveys is negligible. The enormous size of the Gaia dataset motivates the further development of scalable and accurate data analysis methods that could be used to select potential regions of interest for DM searches to ultimately constrain the Milky Way's subhalo mass function, as well as simulations where to study the sensitivity of such methods under different signal hypotheses.

data mining, machine learning, subhalo, (20 more...)

doi: 10.1016/j.ascom.2022.100667

2203.08161

Country:

Europe > Estonia > Tartu County > Tartu (0.04)
Europe > Estonia > Harju County > Tallinn (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

COPEN: Probing Conceptual Knowledge in Pre-trained Language Models

Peng, Hao, Wang, Xiaozhi, Hu, Shengding, Jin, Hailong, Hou, Lei, Li, Juanzi, Liu, Zhiyuan, Liu, Qun

Conceptual knowledge is fundamental to human cognition and knowledge bases. However, existing knowledge probing works only focus on evaluating factual knowledge of pre-trained language models (PLMs) and ignore conceptual knowledge. Since conceptual knowledge often appears as implicit commonsense behind texts, designing probes for conceptual knowledge is hard. Inspired by knowledge representation schemata, we comprehensively evaluate conceptual knowledge of PLMs by designing three tasks to probe whether PLMs organize entities by conceptual similarities, learn conceptual properties, and conceptualize entities in contexts, respectively. For the tasks, we collect and annotate 24k data instances covering 393 concepts, which is COPEN, a COnceptual knowledge Probing bENchmark. Extensive experiments on different sizes and types of PLMs show that existing PLMs systematically lack conceptual knowledge and suffer from various spurious correlations. We believe this is a critical bottleneck for realizing human-like cognition in PLMs. COPEN and our codes are publicly released at https://github.com/THU-KEG/COPEN.

artificial intelligence, knowledge, machine learning, (19 more...)

2211.04079

Country:

Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)
North America > United States > California (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

Minimalist Data Wrangling with Python

Gagolewski, Marek

Minimalist Data Wrangling with Python is envisaged as a student's first introduction to data science, providing a high-level overview as well as discussing key concepts in detail. We explore methods for cleaning data gathered from different sources, transforming, selecting, and extracting features, performing exploratory data analysis and dimensionality reduction, identifying naturally occurring data clusters, modelling patterns in data, comparing data between groups, and reporting the results. This textbook is a non-profit project. Its online and PDF versions are freely available at https://datawranglingpy.gagolewski.com/.

machine learning, pattern recognition, programming language, (25 more...)

doi: 10.5281/zenodo.6451068

2211.0463

Country:

Africa > Kenya (0.04)
Europe > Poland > Masovia Province > Warsaw (0.04)
Africa > Ethiopia (0.04)
(55 more...)

Genre:

Summary/Review (1.00)
Research Report > Experimental Study (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
Research Report > New Finding (0.67)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Information Technology (1.00)
(7 more...)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Data Science > Data Quality > Data Cleaning (1.00)
Information Technology > Data Science > Data Mining (1.00)
(7 more...)

Hawkins, Richard, Picardi, Chiara, Donnell, Lucy, Ireland, Murray

Creating a Safety Assurance Case for an ML Satellite-Based Wildfire Detection and Alert System

Wildfires are a common problem in many areas of the world with often catastrophic consequences. A number of systems have been created to provide early warnings of wildfires, including those that use satellite data to detect fires. The increased availability of small satellites, such as CubeSats, allows the wildfire detection response time to be reduced by deploying constellations of multiple satellites over regions of interest. By using machine learned components on-board the satellites, constraints which limit the amount of data that can be processed and sent back to ground stations can be overcome. There are hazards associated with wildfire alert systems, such as failing to detect the presence of a wildfire, or detecting a wildfire in the incorrect location. It is therefore necessary to be able to create a safety assurance case for the wildfire alert ML component that demonstrates it is sufficiently safe for use. This paper describes in detail how a safety assurance case for an ML wildfire alert system is created. This represents the first fully developed safety case for an ML component containing explicit argument and evidence as to the safety of the machine learning.

artificial intelligence, machine learning, requirement, (12 more...)

2211.0453

Country:

Oceania > Australia (0.14)
North America > United States > Oregon (0.04)
North America > United States > North Dakota (0.04)
(11 more...)

Genre: Research Report (0.82)

Industry: Government > Regional Government > North America Government > United States Government (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

Mackie, Iain, Dalton, Jeffrey

Query-Specific Knowledge Graphs for Complex Finance Topics

Across the financial domain, researchers answer complex questions by extensively "searching" for relevant information to generate long-form reports. This workshop paper discusses automating the construction of query-specific document and entity knowledge graphs (KGs) for complex research topics. We focus on the CODEC dataset, where domain experts (1) create challenging questions, (2) construct long natural language narratives, and (3) iteratively search and assess the relevance of documents and entities. For the construction of query-specific KGs, we show that state-of-the-art ranking systems have headroom for improvement, with specific failings due to a lack of context or explicit knowledge representation. We demonstrate that entity and document relevance are positively correlated, and that entity-based query feedback improves document ranking effectiveness. Furthermore, we construct query-specific KGs using retrieval and evaluate using CODEC's "ground-truth graphs", showing the precision and recall trade-offs. Lastly, we point to future work, including adaptive KG retrieval algorithms and GNN-based weighting methods, while highlighting key challenges such as high-quality data, information extraction recall, and the size and sparsity of complex topic graphs.

information retrieval, machine learning, natural language, (17 more...)

2211.04142

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre: Research Report (0.84)

Industry: Banking & Finance > Trading (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)