AITopics | civis

civis

Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective

Xie, Laixin, Ouyang, Yang, Chen, Longfei, Wu, Ziming, Li, Quan

arXiv.org Artificial Intelligence, Sep-18-2023

Missing data can pose a challenge for machine learning (ML) modeling. To address this, current approaches are categorized into feature imputation and label prediction and are primarily focused on handling missing data to enhance ML performance. These approaches rely on the observed data to estimate the missing values and therefore encounter three main shortcomings in imputation, including the need for different imputation methods for various missing data mechanisms, heavy dependence on the assumption of data distribution, and potential introduction of bias. This study proposes a Contrastive Learning (CL) framework to model observed data with missing values, where the ML model learns the similarity between an incomplete sample and its complete counterpart and the dissimilarity between other samples. Our proposed approach demonstrates the advantages of CL without requiring any imputation. To enhance interpretability, we introduce CIVis, a visual analytics system that incorporates interpretable techniques to visualize the learning process and diagnose the model status. Users can leverage their domain knowledge through interactive sampling to identify negative and positive pairs in CL. The output of CIVis is an optimized model that takes specified features and predicts downstream tasks. We provide two usage scenarios in regression and classification tasks and conduct quantitative experiments, expert interviews, and a qualitative user study to demonstrate the effectiveness of our approach. In short, this study offers a valuable contribution to addressing the challenges associated with ML modeling in the presence of missing data by providing a practical solution that achieves high predictive accuracy and model interpretability.
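The contrastive idea in the abstract (pull an incomplete sample toward its complete counterpart, push it away from other samples) can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch, not the paper's actual objective or encoder: the function names `mask_features`, `cosine`, and `info_nce`, the zero-masking of missing features, and the temperature value are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def mask_features(x: Sequence[float], missing: Sequence[bool]) -> List[float]:
    """Zero out features flagged as missing (hypothetical stand-in for an
    encoder that receives an incomplete sample)."""
    return [0.0 if m else v for v, m in zip(x, missing)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def info_nce(anchor: Sequence[float],
             positive: Sequence[float],
             negatives: Sequence[Sequence[float]],
             temperature: float = 0.5) -> float:
    """Generic InfoNCE-style loss: low when the anchor is closer to the
    positive than to the negatives. The temperature is an assumed value."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


if __name__ == "__main__":
    complete = [1.0, 2.0, 3.0]                        # complete sample
    incomplete = mask_features(complete, [False, True, False])
    others = [[-1.0, 2.0, -3.0], [3.0, -1.0, -1.0]]   # negative samples
    print(info_nce(incomplete, complete, others))
```

In a real system the raw vectors would be replaced by learned embeddings, and the choice of positive and negative pairs is where, according to the abstract, CIVis lets users apply domain knowledge through interactive sampling.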

civis, full data, negative sample

doi: 10.1109/TVCG.2023.3285210

arXiv: 2309.09744

Country:

North America > United States (0.14)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hubei Province > Wuhan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)
Personal > Interview (0.34)

Industry:

Banking & Finance (0.68)
Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Prediction at Scale with scikit-learn and PySpark Pandas UDFs

#artificialintelligence, Oct-28-2018, 09:25:21 GMT

A common predictive modeling scenario, at least at Civis, is having a small or medium amount of labeled data from which to estimate a model (e.g., 10,000 records), but a much larger unlabeled dataset to make predictions about. In this scenario, one might want to train a model on a laptop or single server with scikit-learn for ease of use and flexibility, but then apply that model to the large unlabeled dataset more quickly by distributing the computation with PySpark. Using PySpark for distributed prediction might also make sense if your ETL task is already implemented with (or would benefit from being implemented with) PySpark, which is well suited to data transformations and ETL. PySpark can pickle Python objects, including functions, and apply them to data that is distributed across processes, machines, etc. It also has a pandas-like syntax but separates the definition of the computation from its execution, similar to TensorFlow.
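The pattern described above (fit once, serialize, then score each chunk of data independently) can be sketched without Spark or scikit-learn. This is a standard-library analogue only, under stated assumptions: the dictionary of linear weights stands in for a fitted scikit-learn estimator, the nested lists stand in for Spark partitions, and `predict_partition` and `predict_all` are hypothetical names. In PySpark, the per-partition function would roughly correspond to the body of a pandas UDF operating on batches of rows.

```python
import pickle
from typing import Dict, List, Sequence, Tuple

Row = Tuple[float, ...]


def predict_partition(model_bytes: bytes, partition: Sequence[Row]) -> List[float]:
    """Deserialize the model and score one partition.

    This mirrors the work a single worker would do: it receives the
    pickled model plus its own slice of the data, nothing else.
    """
    params: Dict = pickle.loads(model_bytes)
    weights, bias = params["weights"], params["bias"]
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in partition]


def predict_all(params: Dict, partitions: Sequence[Sequence[Row]]) -> List[List[float]]:
    """Serialize the fitted parameters once, then score every partition.

    The loop runs sequentially here; a distributed engine would hand
    each partition to a different process or machine.
    """
    model_bytes = pickle.dumps(params)  # done once, on the "driver"
    return [predict_partition(model_bytes, part) for part in partitions]


if __name__ == "__main__":
    # Hypothetical fitted parameters standing in for a trained model.
    fitted = {"weights": [0.5, -1.0], "bias": 2.0}
    data = [[(2.0, 1.0), (4.0, 0.0)], [(0.0, 2.0)]]
    print(predict_all(fitted, data))  # [[2.0, 4.0], [0.0]]
```

Serializing once and shipping the bytes to each partition keeps the training step on a single machine while the prediction step scales with the number of workers, which is the trade-off the paragraph describes.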

artificial intelligence, machine learning, udf

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)