 challenge and pitfall


Developing Machine-Learned Potentials for Coarse-Grained Molecular Simulations: Challenges and Pitfalls

arXiv.org Artificial Intelligence

Machine learning (ML) is having an increasing impact in the physical sciences, engineering, and technology, addressing research problems that range from molecular reaction mechanisms to high-throughput screening of functional materials. One strategy for representing molecules mathematically is through the use of graphs, whose nodes and edges correspond to atoms and bonds or interatomic distances, respectively. By performing multiple convolution operations on a graph, each node can influence other, increasingly distant, nodes. The use of graph neural networks has recently shown great promise in the development of improved atomistic force fields, trained on quantum mechanical calculations [1]. On the other hand, the use of ML to generate coarse-grained (CG) mapping schemes [2], [3] and the CG force fields required for hierarchical multiscale modelling schemes [4] on the basis of atomistic simulations is a less explored topic, and their application to the study of complex bulk systems is still rare [2], [5].
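
The remark that repeated graph convolutions let a node influence increasingly distant nodes can be illustrated with a small sketch. This is not the paper's model: it assumes a plain unweighted sum over neighbours with no learned weights, and the names `propagate`, `reach_after`, and `PATH` are hypothetical, introduced only for illustration.

```python
def propagate(adjacency, features):
    """One round of message passing: each node adds its neighbours' values to its own."""
    n = len(adjacency)
    return [
        features[i] + sum(features[j] for j in range(n) if adjacency[i][j])
        for i in range(n)
    ]


def reach_after(adjacency, source, rounds):
    """Return the set of nodes that carry a signal from `source` after `rounds` rounds."""
    features = [0.0] * len(adjacency)
    features[source] = 1.0  # only the source node starts with a signal
    for _ in range(rounds):
        features = propagate(adjacency, features)
    return {i for i, value in enumerate(features) if value > 0}


# Hypothetical toy "molecule": a chain of five atoms, 0-1-2-3-4.
PATH = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

if __name__ == "__main__":
    for k in range(5):
        print(k, sorted(reach_after(PATH, 0, k)))
```

In this toy chain, each additional round extends the reach of node 0 by one bond, which is the sense in which stacking convolution layers widens a node's receptive field.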


Challenges and Pitfalls of Bayesian Unlearning

arXiv.org Artificial Intelligence

Machine unlearning refers to the task of removing a subset of training data, thereby removing its contributions to a trained model. Approximate unlearning methods are one class of approaches to this task; they avoid the need to retrain the model from scratch on the retained data. Bayes' rule can be used to cast approximate unlearning as an inference problem where the objective is to obtain the updated posterior by dividing out the likelihood of the deleted data. However, this has its own set of challenges, as one often does not have access to the exact posterior of the model parameters. In this work we examine the use of the Laplace approximation and Variational Inference to obtain the updated posterior. With a neural network trained for a regression task as the guiding example, we draw insights on the applicability of Bayesian unlearning in practical scenarios.
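
The "dividing out" step mentioned in this abstract can be written down explicitly. The notation is an assumption made here, not taken from the paper: $\theta$ denotes the model parameters, $D = D_r \cup D_e$ splits the training data into retained and erased parts, and the data points are taken to be conditionally independent given $\theta$.

```latex
% Full-data posterior under conditional independence of the data given \theta
p(\theta \mid D) \;\propto\; p(\theta)\, p(D_r \mid \theta)\, p(D_e \mid \theta)

% Dividing out the likelihood of the erased data recovers the retained-data posterior
p(\theta \mid D_r) \;\propto\; p(\theta)\, p(D_r \mid \theta)
                   \;\propto\; \frac{p(\theta \mid D)}{p(D_e \mid \theta)}
```

When the exact posterior $p(\theta \mid D)$ is unavailable, the division has to be applied to an approximation of it, which is where the challenges noted in the abstract arise.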


Orchestrating data for machine learning pipelines

#artificialintelligence

Machine learning (ML) workloads require efficient infrastructure to yield rapid results. Model training relies heavily on large data sets. Funneling this data from storage to the training cluster is the first step of any ML workflow, which significantly impacts the efficiency of model training. This article will discuss a new solution to orchestrating data for end-to-end machine learning pipelines that addresses the above questions. I will outline common challenges and pitfalls, followed by proposing a new technique, data orchestration, to optimize the data pipeline for machine learning.


Finding Lingua Franca: The Power of AI and Linguistics for Legal Technology

#artificialintelligence

Let's face it - the meteoric rise in digital and text communication has drastically changed the way we speak to one another. This ever-evolving shift in language creates a massive burden for ediscovery teams, who need to understand how text is used in context in order to effectively use legal technology to navigate massive amounts of data. In this episode, Amanda Jones of Lighthouse joins Bill and Rob to illuminate some common challenges and pitfalls that can arise with modern language in ediscovery.


Challenges and Pitfalls of Reproducing Machine Learning Artifacts

arXiv.org Artificial Intelligence

An increasingly complex and diverse collection of Machine Learning (ML) models as well as hardware/software stacks, collectively referred to as "ML artifacts", are being proposed - leading to a diverse landscape of ML. These proposed ML innovations have outpaced researchers' ability to analyze, study and adapt them. This is exacerbated by the complicated and sometimes non-reproducible procedures for ML evaluation. The current practice of sharing ML artifacts is through repositories where artifact authors post ad-hoc code and some documentation. The authors often fail to reveal critical information that others need to reproduce their results. As a result, one often fails to reproduce artifact authors' claims, let alone adapt the model to one's own use. This article discusses the common challenges and pitfalls of reproducing ML artifacts, which can be used as a guideline for ML researchers when sharing or reproducing artifacts.


Using Machine Learning with Health Data: The Challenges and Pitfalls - insideBIGDATA

#artificialintelligence

Applying Machine Learning (ML) to physiological data poses several challenges. While ML can be effectively used to model well-defined systems, applying it to a system as complex as the human body dictates a much more careful approach. "The bottom line is that the human body is complex and subtle, and oversimplifying – as common sense sometimes impels us to do – can be hazardous to your health" (Andrew Weil). Clinicians understand when a chronically ill patient requires attention by monitoring vital signs, as well as hundreds of other features. The human body is composed of several systems that affect each other, each with its own objectives and control mechanisms.