AITopics

2307.12445

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJul-23-2023

Early Prediction of Alzheimers Disease Leveraging Symptom Occurrences from Longitudinal Electronic Health Records of US Military Veterans

Li, Rumeng, Wang, Xun, Berlowitz, Dan, Silver, Brian, Hu, Wen, Keating, Heather, Goodwin, Raelene, Liu, Weisong, Lin, Honghuang, Yu, Hong

Early prediction of Alzheimer's disease (AD) is crucial for timely intervention and treatment. This study aims to use machine learning approaches to analyze longitudinal electronic health records (EHRs) of patients with AD and identify signs and symptoms that can predict AD onset earlier. We used a case-control design with longitudinal EHRs from the U.S. Department of Veterans Affairs Veterans Health Administration (VHA) from 2004 to 2021. Cases were VHA patients with AD diagnosed after 1/1/2016 based on ICD-10-CM codes, matched 1:9 with controls by age, sex and clinical utilization with replacement. We used a panel of AD-related keywords and their occurrences over time in a patient's longitudinal EHRs as predictors for AD prediction with four machine learning models. We performed subgroup analyses by age, sex, and race/ethnicity, and validated the model in a hold-out and "unseen" VHA stations group. Model discrimination, calibration, and other relevant metrics were reported for predictions up to ten years before ICD-based diagnosis. The study population included 16,701 cases and 39,097 matched controls. The average number of AD-related keywords (e.g., "concentration", "speaking") per year increased rapidly for cases as diagnosis approached, from around 10 to over 40, while remaining flat at 10 for controls. The best model achieved high discriminative accuracy (ROCAUC 0.997) for predictions using data from at least ten years before ICD-based diagnoses. The model was well-calibrated (Hosmer-Lemeshow goodness-of-fit p-value = 0.99) and consistent across subgroups of age, sex and race/ethnicity, except for patients younger than 65 (ROCAUC 0.746). Machine learning models using AD-related keywords identified from EHR notes can predict future AD diagnoses, suggesting its potential use for identifying AD risk using EHR notes, offering an affordable way for early screening on large population.

artificial intelligence, diagnosis, machine learning, (17 more...)

2307.12369

Country:

North America > United States > Massachusetts > Middlesex County > Lowell (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > Massachusetts > Worcester County > Worcester (0.04)
North America > United States > Alaska (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
Health & Medicine > Health Care Technology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

arXiv.org Artificial IntelligenceJul-23-2023

LAnoBERT: System Log Anomaly Detection based on BERT Masked Language Model

Lee, Yukyung, Kim, Jina, Kang, Pilsung

The system log generated in a computer system refers to large-scale data that are collected simultaneously and used as the basic data for determining errors, intrusion and abnormal behaviors. The aim of system log anomaly detection is to promptly identify anomalies while minimizing human intervention, which is a critical problem in the industry. Previous studies performed anomaly detection through algorithms after converting various forms of log data into a standardized template using a parser. Particularly, a template corresponding to a specific event should be defined in advance for all the log data using which the information within the log key may get lost. In this study, we propose LAnoBERT, a parser free system log anomaly detection method that uses the BERT model, exhibiting excellent natural language processing performance. The proposed method, LAnoBERT, learns the model through masked language modeling, which is a BERT-based pre-training method, and proceeds with unsupervised learning-based anomaly detection using the masked language modeling loss function per log key during the test process. In addition, we also propose an efficient inference process to establish a practically applicable pipeline to the actual system. Experiments on three well-known log datasets, i.e., HDFS, BGL, and Thunderbird, show that not only did LAnoBERT yield a higher anomaly detection performance compared to unsupervised learning-based benchmark models, but also it resulted in a comparable performance with supervised learning-based benchmark models.

artificial intelligence, data mining, machine learning, (14 more...)

2111.09564

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Energy (0.58)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Named Entity Resolution in Personal Knowledge Graphs

Kejriwal, Mayank

Entity Resolution (ER) is the problem of determining when two entities refer to the same underlying entity. The problem has been studied for over 50 years, and most recently, has taken on new importance in an era of large, heterogeneous 'knowledge graphs' published on the Web and used widely in domains as wide ranging as social media, e-commerce and search. This chapter will discuss the specific problem of named ER in the context of personal knowledge graphs (PKGs). We begin with a formal definition of the problem, and the components necessary for doing high-quality and efficient ER. We also discuss some challenges that are expected to arise for Web-scale data. Next, we provide a brief literature review, with a special focus on how existing techniques can potentially apply to PKGs. We conclude the chapter by covering some applications, as well as promising directions for future research.

information retrieval, machine learning, natural language, (21 more...)

2307.12173

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.14)
(18 more...)

Genre:

Overview (0.87)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (0.47)
Information Technology > Services (0.34)

Technology:

Information Technology > Communications > Web > Semantic Web (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
(7 more...)

Thalhammer, Stefan, Bauer, Dominik, Hönig, Peter, Weibel, Jean-Baptiste, García-Rodríguez, José, Vincze, Markus

Challenges for Monocular 6D Object Pose Estimation in Robotics

Object pose estimation is a core perception task that enables, for example, object grasping and scene understanding. The widely available, inexpensive and high-resolution RGB sensors and CNNs that allow for fast inference based on this modality make monocular approaches especially well suited for robotics applications. We observe that previous surveys on object pose estimation establish the state of the art for varying modalities, single- and multi-view settings, and datasets and metrics that consider a multitude of applications. We argue, however, that those works' broad scope hinders the identification of open challenges that are specific to monocular approaches and the derivation of promising future challenges for their application in robotics. By providing a unified view on recent publications from both robotics and computer vision, we find that occlusion handling, novel pose representations, and formalizing and improving category-level pose estimation are still fundamental challenges that are highly relevant for robotics. Moreover, to further improve robotic performance, large object sets, novel objects, refractive materials, and uncertainty estimates are central, largely unsolved open challenges. In order to address them, ontological reasoning, deformability handling, scene-level reasoning, realistic datasets, and the ecological footprint of algorithms need to be improved.

artificial intelligence, machine learning, pose estimation, (13 more...)

2307.12172

Country:

North America > United States (0.14)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Europe > United Kingdom > England > Tyne and Wear > Newcastle (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)

Gomes, Nícolas Barbosa, Yoshida, Arissa, Roder, Mateus, de Oliveira, Guilherme Camargo, Papa, João Paulo

Facial Point Graphs for Amyotrophic Lateral Sclerosis Identification

Identifying Amyotrophic Lateral Sclerosis (ALS) in its early stages is essential for establishing the beginning of treatment, enriching the outlook, and enhancing the overall well-being of those affected individuals. However, early diagnosis and detecting the disease's signs is not straightforward. A simpler and cheaper way arises by analyzing the patient's facial expressions through computational methods. When a patient with ALS engages in specific actions, e.g., opening their mouth, the movement of specific facial muscles differs from that observed in a healthy individual. This paper proposes Facial Point Graphs to learn information from the geometry of facial images to identify ALS automatically. The experimental outcomes in the Toronto Neuroface dataset show the proposed approach outperformed state-of-the-art results, fostering promising developments in the area.

artificial intelligence, machine learning, repetition, (16 more...)

2307.12159

Country:

North America > Canada > Ontario > Toronto (0.25)
South America > Brazil > São Paulo (0.04)
Asia > South Korea > Daegu > Daegu (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Amyotrophic Lateral Sclerosis (ALS) (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Christodoulou, Christos, Salamanos, Nikos, Leonidou, Pantelitsa, Papadakis, Michail, Sirivianos, Michael

Identifying Misinformation on YouTube through Transcript Contextual Analysis with Transformer Models

Misinformation on YouTube is a significant concern, necessitating robust detection strategies. In this paper, we introduce a novel methodology for video classification, focusing on the veracity of the content. We convert the conventional video classification task into a text classification task by leveraging the textual content derived from the video transcripts. We employ advanced machine learning techniques like transfer learning to solve the classification challenge. Our approach incorporates two forms of transfer learning: (a) fine-tuning base transformer models such as BERT, RoBERTa, and ELECTRA, and (b) few-shot learning using sentence-transformers MPNet and RoBERTa-large. We apply the trained models to three datasets: (a) YouTube Vaccine-misinformation related videos, (b) YouTube Pseudoscience videos, and (c) Fake-News dataset (a collection of articles). Including the Fake-News dataset extended the evaluation of our approach beyond YouTube videos. Using these datasets, we evaluated the models distinguishing valid information from misinformation. The fine-tuned models yielded Matthews Correlation Coefficient>0.81, accuracy>0.90, and F1 score>0.90 in two of three datasets. Interestingly, the few-shot models outperformed the fine-tuned ones by 20% in both Accuracy and F1 score for the YouTube Pseudoscience dataset, highlighting the potential utility of this approach -- especially in the context of limited training data.

information retrieval, machine learning, natural language, (18 more...)

2307.12155

Country:

Europe > Middle East > Cyprus (0.05)
Europe > Spain (0.04)
Europe > Belgium (0.04)

Genre: Research Report > New Finding (0.94)

Industry:

Media > News (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.93)
Health & Medicine > Therapeutic Area > Vaccines (0.72)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.56)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Shenoy, Ashish V, Singhvi, Arjun, Racha, Shruthi, Tunuguntla, Srinivas

Prediction of the outcome of a Twenty-20 Cricket Match : A Machine Learning Approach

Twenty20 cricket, sometimes written Twenty-20, and often abbreviated to T20, is a short form of cricket. In a Twenty20 game the two teams of 11 players have a single innings each, which is restricted to a maximum of 20 overs. This version of cricket is especially unpredictable and is one of the reasons it has gained popularity over recent times. However, in this paper we try four different machine learning approaches for predicting the results of T20 Cricket Matches. Specifically we take in to account: previous performance statistics of the players involved in the competing teams, ratings of players obtained from reputed cricket statistics websites, clustering the players' with similar performance statistics and propose a novel method using an ELO based approach to rate players. We compare the performances of each of these feature engineering approaches by using different ML algorithms, including logistic regression, support vector machines, bayes network, decision tree, random forest.

artificial intelligence, classifier, machine learning, (19 more...)

2209.06346

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment > Sports > Cricket (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

arXiv.org Artificial IntelligenceJul-21-2023

Towards Better Fairness-Utility Trade-off: A Comprehensive Measurement-Based Reinforcement Learning Framework

Zhang, Simiao, Bai, Jitao, Guan, Menghong, Huang, Yihao, Zhang, Yueling, Sun, Jun, Pu, Geguang

Machine learning is widely used to make decisions with societal impact such as bank loan approving, criminal sentencing, and resume filtering. How to ensure its fairness while maintaining utility is a challenging but crucial issue. Fairness is a complex and context-dependent concept with over 70 different measurement metrics. Since existing regulations are often vague in terms of which metric to use and different organizations may prefer different fairness metrics, it is important to have means of improving fairness comprehensively. Existing mitigation techniques often target at one specific fairness metric and have limitations in improving multiple notions of fairness simultaneously. In this work, we propose CFU (Comprehensive Fairness-Utility), a reinforcement learning-based framework, to efficiently improve the fairness-utility trade-off in machine learning classifiers. A comprehensive measurement that can simultaneously consider multiple fairness notions as well as utility is established, and new metrics are proposed based on an in-depth analysis of the relationship between different fairness metrics. The reward function of CFU is constructed with comprehensive measurement and new metrics. We conduct extensive experiments to evaluate CFU on 6 tasks, 3 machine learning models, and 15 fairness-utility measurements. The results demonstrate that CFU can improve the classifier on multiple fairness metrics without sacrificing its utility. It outperforms all state-of-the-art techniques and has witnessed a 37.5% improvement on average.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2307.11379

Country:

North America > United States > New York > New York County > New York City (0.05)
Asia > Singapore (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Banking & Finance (0.68)
Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

arXiv.org Artificial IntelligenceJul-21-2023

Differentially Private Heavy Hitter Detection using Federated Analytics

Chadha, Karan, Chen, Junye, Duchi, John, Feldman, Vitaly, Hashemi, Hanieh, Javidbakht, Omid, McMillan, Audra, Talwar, Kunal

In this work, we study practical heuristics to improve the performance of prefix-tree based algorithms for differentially private heavy hitter detection. Our model assumes each user has multiple data points and the goal is to learn as many of the most frequent data points as possible across all users' data with aggregate and local differential privacy. We propose an adaptive hyperparameter tuning algorithm that improves the performance of the algorithm while satisfying computational, communication and privacy constraints. We explore the impact of different data-selection schemes as well as the impact of introducing deny lists during multiple runs of the algorithm. We test these improvements using extensive experimentation on the Reddit dataset~\cite{caldas2018leaf} on the task of learning the most frequent words.

artificial intelligence, machine learning, social media, (17 more...)

2307.11749

Country:

North America > United States (0.04)
Europe > Italy > Veneto > Venice (0.04)
Europe > Croatia > Zagreb County > Zagreb (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)