AITopics

Usage-based insurance is becoming the new standard in vehicle insurance; it is therefore relevant to find efficient ways of using insureds' driving data. Applying anomaly detection to vehicles' trip summaries, we develop a method allowing to derive a "routine" and a "peculiarity" anomaly profile for each vehicle. To this end, anomaly detection algorithms are used to compute a routine and a peculiarity anomaly score for each trip a vehicle makes. The former measures the anomaly degree of the trip compared to the other trips made by the concerned vehicle, while the latter measures its anomaly degree compared to trips made by any vehicle. The resulting anomaly scores vectors are used as routine and peculiarity profiles. Features are then extracted from these profiles, for which we investigate the predictive power in the claim classification framework. Using real data, we find that features extracted from the vehicles' peculiarity profile improve classification.

anomaly score, data mining, machine learning, (19 more...)

2209.11763

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > Massachusetts (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Banking & Finance > Insurance (1.00)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Bosley, Mitchell, Kuzushima, Saki, Enamorado, Ted, Shiraito, Yuki

Improving Probabilistic Models in Text Classification via Active Learning

Social scientists often classify text documents to use the resulting labels as an outcome or a predictor in empirical research. Automated text classification has become a standard tool, since it requires less human coding. However, scholars still need many human-labeled documents to train automated classifiers. To reduce labeling costs, we propose a new algorithm for text classification that combines a probabilistic model with active learning. The probabilistic model uses both labeled and unlabeled data, and active learning concentrates labeling efforts on difficult documents to classify. Our validation study shows that the classification performance of our algorithm is comparable to state-of-the-art methods at a fraction of the computational cost. Moreover, we replicate two recently published articles and reach the same substantive conclusions with only a small proportion of the original labeled data used in those studies. We provide activeText, an open-source software to implement our method.

machine learning, natural language, text classification, (19 more...)

2202.02629

Country:

Asia > Middle East > Syria (0.14)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(11 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Foreign Policy (1.00)
Law > Civil Rights & Constitutional Law (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
(2 more...)

Kong, Lingli, Ji, Zhengran, Xin, Huolin L.

Electron energy loss spectroscopy database synthesis and automation of core-loss edge recognition by deep-learning neural networks

The ionization edges encoded in the electron energy loss spectroscopy (EELS) spectra enable advanced material analysis including composition analyses and elemental quantifications. The development of the parallel EELS instrument and fast, sensitive detectors have greatly improved the acquisition speed of EELS spectra. However, the traditional way of core-loss edge recognition is experience based and human labor dependent, which limits the processing speed. So far, the low signal-noise ratio and the low jump ratio of the core-loss edges on the raw EELS spectra have been challenging for the automation of edge recognition. In this work, a convolutional-bidirectional long short-term memory neural network (CNN-BiLSTM) is proposed to automate the detection and elemental identification of core-loss edges from raw spectra. An EELS spectral database is synthesized by using our forward model to assist in the training and validation of the neural network. To make the synthesized spectra resemble the real spectra, we collected a large library of experimentally acquired EELS core edges. In synthesize the training library, the edges are modeled by fitting the multi-gaussian model to the real edges from experiments, and the noise and instrumental imperfectness are simulated and added. The well-trained CNN-BiLSTM network is tested against both the simulated spectra and real spectra collected from experiments. The high accuracy of the network, 94.9 %, proves that, without complicated preprocessing of the raw spectra, the proposed CNN-BiLSTM network achieves the automation of core-loss edge recognition for EELS spectra with high accuracy.

artificial intelligence, machine learning, spectra, (19 more...)

2209.13026

Country:

North America > United States > California > Orange County > Irvine (0.14)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.40)

Industry:

Energy > Energy Storage (0.46)
Electrical Industrial Apparatus (0.46)
Materials (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)

Prayatul Matrix: A Direct Comparison Approach to Evaluate Performance of Supervised Machine Learning Models

Biswas, Anupam

Performance comparison of supervised machine learning (ML) models are widely done in terms of different confusion matrix based scores obtained on test datasets. However, a dataset comprises several instances having different difficulty levels. Therefore, it is more logical to compare effectiveness of ML models on individual instances instead of comparing scores obtained for the entire dataset. In this paper, an alternative approach is proposed for direct comparison of supervised ML models in terms of individual instances within the dataset. A direct comparison matrix called \emph{Prayatul Matrix} is introduced, which accounts for comparative outcome of two ML algorithms on different instances of a dataset. Five different performance measures are designed based on prayatul matrix. Efficacy of the proposed approach as well as designed measures is analyzed with four classification techniques on three datasets. Also analyzed on four large-scale complex image datasets with four deep learning models namely ResNet50V2, MobileNetV2, EfficientNet, and XceptionNet. Results are evident that the newly designed measure are capable of giving more insight about the comparing ML algorithms, which were impossible with existing confusion matrix based scores like accuracy, precision and recall.

algorithm, artificial intelligence, machine learning, (16 more...)

2209.12728

Country:

North America > United States > California (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Germany (0.04)
(2 more...)

Genre: Research Report (0.40)

Industry:

Government (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

D'Asaro, Fabio Aurelio, Raggioli, Luca, Malek, Salim, Grazioso, Marco, Rossi, Silvia

An Application of a Runtime Epistemic Probabilistic Event Calculus to Decision-making in e-Health Systems

We present and discuss a runtime architecture that integrates sensorial data and classifiers with a logic-based decision-making system in the context of an e-Health system for the rehabilitation of children with neuromotor disorders. In this application, children perform a rehabilitation task in the form of games. The main aim of the system is to derive a set of parameters the child's current level of cognitive and behavioral performance (e.g., engagement, attention, task accuracy) from the available sensors and classifiers (e.g., eye trackers, motion sensors, emotion recognition techniques) and take decisions accordingly. These decisions are typically aimed at improving the child's performance by triggering appropriate re-engagement stimuli when their attention is low, by changing the game or making it more difficult when the child is losing interest in the task as it is too easy. Alongside state-of-the-art techniques for emotion recognition and head pose estimation, we use a runtime variant of a probabilistic and epistemic logic programming dialect of the Event Calculus, known as the Epistemic Probabilistic Event Calculus. In particular, the probabilistic component of this symbolic framework allows for a natural interface with the machine learning techniques. We overview the architecture and its components, and show some of its characteristics through a discussion of a running example and experiments. Under consideration for publication in Theory and Practice of Logic Programming (TPLP).

artificial intelligence, logic & formal reasoning, machine learning, (20 more...)

2209.13043

Country:

Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
Europe > Italy > Campania (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Therapeutic Area (0.76)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Klie, Jan-Christoph, Webber, Bonnie, Gurevych, Iryna

Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future

arXiv.org Artificial IntelligenceSep-25-2022

Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that several popular datasets contain a surprising amount of annotation errors or inconsistencies. To alleviate this issue, many methods for annotation error detection have been devised over the years. While researchers show that their approaches work well on their newly introduced datasets, they rarely compare their methods to previous work or on the same datasets. This raises strong concerns on methods' general performance and makes it difficult to asses their strengths and weaknesses. We therefore reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets for text classification as well as token and span labeling. In addition, we define a uniform evaluation setup including a new formalization of the annotation error detection task, evaluation protocol and general best practices. To facilitate future research and reproducibility, we release our datasets and implementations in an easy-to-use and open source software package.

machine learning, natural language, text classification, (18 more...)

2206.0228

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Greater London > London > Wimbledon (0.04)
(48 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education (1.00)
Leisure & Entertainment > Sports (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.67)
(2 more...)

arXiv.org Artificial IntelligenceSep-25-2022

Can We Automate the Analysis of Online Child Sexual Exploitation Discourse?

Cook, Darren, Zilka, Miri, DeSandre, Heidi, Giles, Susan, Weller, Adrian, Maskell, Simon

Social media's growing popularity raises concerns around children's online safety. Interactions between minors and adults with predatory intentions is a particularly grave concern. Research into online sexual grooming has often relied on domain experts to manually annotate conversations, limiting both scale and scope. In this work, we test how well-automated methods can detect conversational behaviors and replace an expert human annotator. Informed by psychological theories of online grooming, we label $6772$ chat messages sent by child-sex offenders with one of eleven predatory behaviors. We train bag-of-words and natural language inference models to classify each behavior, and show that the best performing models classify behaviors in a manner that is consistent, but not on-par, with human annotation.

artificial intelligence, machine learning, natural language, (15 more...)

2209.1232

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > United Kingdom > England > Merseyside > Liverpool (0.04)
Oceania > New Zealand (0.04)

Genre: Research Report (0.67)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.69)

#artificialintelligenceSep-24-2022, 10:16:59 GMT

Scikit-Learn for Data Science cheat sheet.

If you're dealing with Machine Learning in Python, Scikit Learn is considered as the gold standard. Scikit-learn is an open source Python library that provides a wide selection of supervised and unsupervised learning algorithms. It implements a range of machine learning tools performing preprocessing, cross-validation and visualization using a unified interface. Scikit-learn rich set of algorithm offerings includes Regression, Clustering, Decision Trees, Neural Networks, SVMs and Naive Bayes. Corresponding use cases are presented in other sections of this site.

data science cheat sheet, scikit-learn

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.73)

Minnema, Gosse, Gemelli, Sara, Zanchi, Chiara, Caselli, Tommaso, Nissim, Malvina

Dead or Murdered? Predicting Responsibility Perception in Femicide News Reports

arXiv.org Artificial IntelligenceSep-24-2022

Different linguistic expressions can conceptualize the same event from different viewpoints by emphasizing certain participants over others. Here, we investigate a case where this has social consequences: how do linguistic expressions of gender-based violence (GBV) influence who we perceive as responsible? We build on previous psycholinguistic research in this area and conduct a large-scale perception survey of GBV descriptions automatically extracted from a corpus of Italian newspapers. We then train regression models that predict the salience of GBV participants with respect to different dimensions of perceived responsibility. Our best model (fine-tuned BERT) shows solid overall performance, with large differences between dimensions and participants: salient _focus_ is more predictable than salient _blame_, and perpetrators' salience is more predictable than victims' salience. Experiments with ridge regression models using different representations show that features based on linguistic theory similarly to word-based features. Overall, we show that different linguistic choices do trigger different perceptions of responsibility, and that such perceptions can be modelled automatically. This work can be a core instrument to raise awareness of the consequences of different perspectivizations in the general public and in news producers alike.

artificial intelligence, machine learning, natural language, (20 more...)

2209.1203

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy (0.05)
Africa > Togo > Maritime Region > Lome (0.04)
(5 more...)

Genre: Questionnaire & Opinion Survey (1.00)

Industry:

Law > Criminal Law (0.88)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.88)
Education > Educational Setting > Higher Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

#artificialintelligenceSep-23-2022, 12:55:20 GMT

Introduction to Confusion Matrix

Originally published on Towards AI the World's Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses. The Confusion Matrix is the visual representation of the Actual VS Predicted values.

algorithm, confusion matrix, matrix, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)