AITopics

Time-series partitioning is an essential step in most machine-learning-driven, sensor-based IoT applications. This paper introduces Cadence, a sample-efficient, robust time-series segmentation model and algorithm. We show that by learning a representation specifically for the segmentation objective, based on maximum mean discrepancy (MMD), our algorithm can robustly detect time-series events across different applications. Our loss function allows us to infer whether consecutive sequences of samples are drawn from the same distribution (null hypothesis) and determines the change point between pairs that reject the null hypothesis (i.e., come from different distributions). We demonstrate its applicability in a real-world IoT deployment for ambient-sensing-based activity recognition. Moreover, while many works on change-point detection exist in the literature, our model is significantly simpler and can be fully trained in 9-93 seconds on average with little variation in hyperparameters for data across different applications. We empirically evaluate Cadence on four popular change-point detection (CPD) datasets, where Cadence matches or outperforms existing CPD techniques.
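
The statistical test at the heart of this approach can be sketched in Python with NumPy. This is a minimal illustration, not Cadence itself: Cadence learns a representation trained with an MMD-based objective, whereas this sketch assumes a fixed Gaussian (RBF) kernel applied directly to raw one-dimensional samples. The window length `w`, the bandwidth `sigma`, and all function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian (RBF) kernel matrix between two 1-D sample vectors."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * sigma**2))


def mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, sigma).mean()
    k_yy = rbf_kernel(y, y, sigma).mean()
    k_xy = rbf_kernel(x, y, sigma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


def detect_change_point(series: np.ndarray, w: int, sigma: float = 1.0) -> int:
    """Index t maximizing MMD^2 between the windows series[t-w:t] and series[t:t+w]."""
    candidates = range(w, len(series) - w + 1)
    scores = [mmd2(series[t - w:t], series[t:t + w], sigma) for t in candidates]
    return candidates[int(np.argmax(scores))]
```

A practical detector would also need a decision rule, such as a threshold or a permutation test on the MMD statistic, to decide whether a peak actually rejects the null hypothesis; this sketch simply returns the location of the largest discrepancy.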

artificial intelligence, data mining, machine learning

2112.0336

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New Jersey > Middlesex County > New Brunswick (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Information Technology (0.68)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)

Biswas, Biplab, Kumar, Nishith, Hoque, Md Aminul, Alam, Md Ashad

Weighted Scaling Approach for Metabolomics Data Analysis

Systematic variation is a common issue in metabolomics data analysis. Therefore, different scaling and normalization techniques are used to preprocess metabolomics data. Although several scaling methods are available in the literature, the choice of scaling, transformation, and/or normalization technique influences the subsequent statistical analysis. It is challenging to choose the appropriate scaling technique for downstream analysis to obtain accurate results or make a proper decision. Moreover, the existing scaling techniques are sensitive to outliers or extreme values. To fill this gap, our objective is to introduce a robust scaling approach that is not influenced by outliers and provides more accurate results for downstream analysis. Here, we introduce a new weighted scaling approach that is robust against outliers, so no additional outlier detection or treatment step is needed in data preprocessing, and we compare it with conventional scaling and normalization techniques on artificial and real metabolomics datasets. We evaluated the performance of the proposed method against existing conventional scaling techniques in metabolomics data analysis, both in the absence and in the presence of different percentages of outliers. Results show that in most cases the proposed scaling technique performs better than the traditional scaling methods, both with and without outliers. The proposed method improves downstream metabolomics analysis. The R function of the proposed robust scaling method is available at https://github.com/nishithkumarpaul/robustScaling/blob/main/wscaling.R
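
The abstract does not state the weighted scaling formula, so the sketch below does not reproduce it. It only illustrates the problem the paper targets: conventional autoscaling (mean-centering and division by the standard deviation) is dragged by a single extreme value, while a median/MAD-based scaling, used here purely as a stand-in for an outlier-resistant method, is far less affected. The function names are hypothetical; the authors' own implementation is the R function linked above.

```python
import statistics


def autoscale(values: list[float]) -> list[float]:
    """Conventional autoscaling: center on the mean, divide by the standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def robust_scale(values: list[float]) -> list[float]:
    """Stand-in robust scaling: center on the median, divide by the scaled MAD.

    This is NOT the paper's weighted scaling method; it only shows how
    outlier-resistant location and scale estimates behave.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # makes the MAD consistent with the SD for normal data
    return [(v - med) / scale for v in values]
```

In a metabolomics data matrix, either function would be applied per metabolite (column). For the values `[10.0, 11.0, 9.5, 10.5, 10.2, 9.8]`, replacing the last value with `100.0` flips the sign of the autoscaled value of `11.0`, whereas the robust variant keeps it positive.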

artificial intelligence, data mining, machine learning

2208.00603

Country:

Asia > Bangladesh (0.04)
North America > United States > New York (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre:

Research Report > Experimental Study (0.94)
Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Syeed, Miah Mohammad Asif, Farzana, Maisha, Namir, Ishadie, Ishrar, Ipshita, Nushra, Meherin Hossain, Rahman, Tanvir

Flood Prediction Using Machine Learning Models

Floods are among nature's most catastrophic calamities, causing irreversible and immense damage to human life, agriculture, infrastructure, and socio-economic systems. Several studies on flood catastrophe management and flood forecasting systems have been conducted. Accurately predicting the onset and progression of floods in real time is challenging. To estimate water levels and velocities across a large area, it is necessary to combine data with computationally demanding flood propagation models. This paper aims to reduce the extreme risks of this natural disaster and to contribute to policy suggestions by predicting floods with different machine learning models. This research will use Binary Logistic Regression, K-Nearest Neighbor (KNN), Support Vector Classifier (SVC), and Decision Tree Classifier to provide an accurate prediction. With the outcomes, a comparative analysis will be conducted to determine which model delivers the best accuracy.
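
The comparative analysis amounts to fitting each classifier on the same training split and comparing held-out accuracy. The sketch below shows such a harness with two of the four models, k-nearest neighbor and binary logistic regression, written in plain Python so it runs without dependencies; an SVC and a decision tree would plug in the same way as additional trainers. The function names and hyperparameters (`k`, `lr`, `epochs`) are hypothetical, and the toy data in any usage stands in for real flood records.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]
Predictor = Callable[[Point], int]
Trainer = Callable[[list, list], Predictor]


def knn(k: int = 3) -> Trainer:
    """K-nearest-neighbor classifier for 0/1 labels (Euclidean distance, majority vote)."""
    def fit(xs: list, ys: list) -> Predictor:
        def predict(x: Point) -> int:
            nearest = sorted(range(len(xs)), key=lambda i: math.dist(xs[i], x))[:k]
            return int(2 * sum(ys[i] for i in nearest) > k)
        return predict
    return fit


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_regression(lr: float = 0.1, epochs: int = 500) -> Trainer:
    """Binary logistic regression trained with plain stochastic gradient descent."""
    def fit(xs: list, ys: list) -> Predictor:
        w, b = [0.0] * len(xs[0]), 0.0
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
                w = [wj - lr * err * xj for wj, xj in zip(w, x)]
                b -= lr * err
        return lambda x: int(b + sum(wj * xj for wj, xj in zip(w, x)) > 0)
    return fit


def compare(models: dict, train: tuple, test: tuple) -> dict:
    """Fit every model on the same training split and report held-out accuracy."""
    (x_tr, y_tr), (x_te, y_te) = train, test
    accuracy = {}
    for name, fit in models.items():
        predict = fit(x_tr, y_tr)
        accuracy[name] = sum(predict(x) == y for x, y in zip(x_te, y_te)) / len(y_te)
    return accuracy
```

Each trainer takes training features and 0/1 labels (for example, hypothetical rainfall and river-level readings labelled flood or no flood) and returns a prediction function, so `compare` can rank any mix of models on identical data.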

artificial intelligence, classifier, machine learning

2208.01234

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.06)
Oceania > Australia (0.05)
North America > United States (0.04)

Genre:

Research Report > New Finding (0.80)
Research Report > Experimental Study (0.80)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

On the Detection of Adaptive Adversarial Attacks in Speaker Verification Systems

Chen, Zesheng

Speaker verification systems have been widely used in smartphones and Internet of Things (IoT) devices to identify legitimate users. Recent work has shown that adversarial attacks, such as FAKEBOB, can work effectively against speaker verification systems. The goal of this paper is to design a detector that can distinguish original audio from audio contaminated by adversarial attacks. Specifically, our detector, called MEH-FEST, calculates the minimum energy in high frequencies from the short-time Fourier transform of an audio signal and uses it as a detection metric. Through both analysis and experiments, we show that our proposed detector is easy to implement, fast at processing input audio, and effective in determining whether audio is corrupted by FAKEBOB attacks. The experimental results indicate that the detector is extremely effective, with near-zero false positive and false negative rates for detecting FAKEBOB attacks in Gaussian mixture model (GMM) and i-vector speaker verification systems. Moreover, adaptive adversarial attacks against our proposed detector and their countermeasures are discussed and studied, illustrating the game between attackers and defenders.
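
The detection metric can be sketched with NumPy: compute a short-time power spectrum, sum the energy above a cutoff frequency in each frame, and take the minimum over frames. The Hann window, frame length, hop size, 6 kHz cutoff, and decision threshold are hypothetical choices, not values from the paper, and the white noise in the test below only stands in for a FAKEBOB perturbation.

```python
import numpy as np


def stft_power(signal: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Power spectrogram (frames x frequency bins) using a Hann window."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def min_high_freq_energy(signal: np.ndarray, sample_rate: int,
                         cutoff_hz: float = 6000.0,
                         frame_len: int = 512, hop: int = 256) -> float:
    """Minimum, over STFT frames, of the spectral energy at or above cutoff_hz."""
    power = stft_power(signal, frame_len, hop)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    high_band = power[:, freqs >= cutoff_hz]
    return float(high_band.sum(axis=1).min())


def is_adversarial(signal: np.ndarray, sample_rate: int, threshold: float) -> bool:
    """Flag audio whose quietest frame still carries high-frequency energy."""
    return min_high_freq_energy(signal, sample_rate) > threshold
```

The intuition, as we read the metric, is that natural speech contains quiet or low-frequency frames with almost no high-frequency energy, while a perturbation spread over the whole clip raises that floor in every frame. For a one-second 440 Hz tone sampled at 16 kHz, adding low-level white noise raises `min_high_freq_energy` by more than a factor of 100.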

artificial intelligence, audio, machine learning

doi: 10.1109/JIOT.2023.3267619

2202.05725

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Czechia > South Moravian Region > Brno (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

A Multifaceted Benchmarking of Synthetic Electronic Health Record Generation Models

Yan, Chao, Yan, Yao, Wan, Zhiyu, Zhang, Ziqi, Omberg, Larsson, Guinney, Justin, Mooney, Sean D., Malin, Bradley A.

Synthetic health data have the potential to mitigate privacy concerns when sharing data to support biomedical research and the development of innovative healthcare applications. Modern approaches for data generation based on machine learning, generative adversarial network (GAN) methods in particular, continue to evolve and demonstrate remarkable potential. Yet a systematic assessment framework is lacking for benchmarking methods as they emerge and determining which methods are most appropriate for which use cases. In this work, we introduce a generalizable benchmarking framework to appraise key characteristics of synthetic health data with respect to utility and privacy metrics. We apply the framework to evaluate synthetic data generation methods for electronic health record (EHR) data from two large academic medical centers with respect to several use cases. The results illustrate that there is a utility-privacy tradeoff for sharing synthetic EHR data. The results further indicate that no method is unequivocally the best on all criteria in each use case, which makes it evident why synthetic data generation methods need to be assessed in context.
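
To make the utility-privacy tradeoff concrete, the sketch below computes one toy metric of each kind on binary EHR-style records (rows are patients, columns are hypothetical diagnosis flags). Neither metric is claimed to be part of the paper's framework: utility is measured here as the mean absolute difference in per-feature prevalence between real and synthetic data, and privacy risk as the fraction of synthetic records that exactly copy a real record. All names are hypothetical.

```python
Record = tuple[int, ...]


def prevalence(records: list[Record]) -> list[float]:
    """Fraction of records with each binary feature set to 1."""
    n = len(records)
    return [sum(col) / n for col in zip(*records)]


def utility_gap(real: list[Record], synthetic: list[Record]) -> float:
    """Mean absolute prevalence difference (lower means higher utility)."""
    p_real, p_syn = prevalence(real), prevalence(synthetic)
    return sum(abs(a - b) for a, b in zip(p_real, p_syn)) / len(p_real)


def copy_rate(real: list[Record], synthetic: list[Record]) -> float:
    """Share of synthetic records identical to some real record (higher means more risk)."""
    real_set = set(real)
    return sum(rec in real_set for rec in synthetic) / len(synthetic)
```

Releasing verbatim copies of the real records scores perfectly on this utility metric and worst on this privacy metric, which is the tradeoff in miniature.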

artificial intelligence, data mining, machine learning

doi: 10.1038/s41467-022-35295-1

2208.0123

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Tennessee > Davidson County > Nashville (0.04)
North America > Canada (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Health Care Providers & Services (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

How should we proxy for race/ethnicity? Comparing Bayesian improved surname geocoding to machine learning methods

Decter-Frain, Ari

Political science research often requires constructing a race/ethnicity proxy variable for datasets that do not contain it, such as voter registration files, lists of electoral candidates, or political donation records. Constructing such a proxy is an important step for conducting ecological inference in voting rights litigation (Barreto et al. [2019], Imai and Khanna [2016]), redistricting (DeLuca and Curiel [2022], Kenny et al. [2021]), and substantive research on the role of race/ethnicity in politics (Enos [2016], Enos et al. [2019], Grumbach and Sahn [2020]). The most common method for proxying race/ethnicity is Bayesian Improved Surname Geocoding (BISG), which uses Bayes' rule to compute a probability distribution over race/ethnicity categories conditional on a voter's surname and where they live (Elliott et al. [2008, 2009]). BISG has attained widespread popularity due to its parsimony, computational efficiency, and superior performance compared with existing alternatives, namely spatial interpolation of Census racial-ethnic composition from Census geographies (Imai and Khanna [2016], Clark et al. [2021], Shah and Davis [2017]). While BISG performs well against this small suite of existing alternatives, it has not yet been benchmarked against machine learning (ML) models, which can produce race/ethnicity predictions from more flexible and potentially more accurate models. In this paper I present the results of such a benchmark. I train a range of ML models using voter registration data from Florida, Georgia, North Carolina, and a portion of California, where voters self-report their race/ethnicity upon registration. The registries in these states contain over 26 million labelled observations, a non-representative sample of more than five percent of the United States electorate. I then compare BISG against out-of-state predictions from these models.
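
The Bayes' rule step in BISG can be sketched as follows, under the standard assumption that surname and geography are conditionally independent given race/ethnicity: the posterior for each category is proportional to P(race | surname) times P(geography | race), normalized over categories. The category labels and probabilities in any usage are hypothetical placeholders, not Census figures, and the function name is illustrative.

```python
def bisg_posterior(p_race_given_surname: dict[str, float],
                   p_geo_given_race: dict[str, float]) -> dict[str, float]:
    """P(race | surname, geo), proportional to P(race | surname) * P(geo | race).

    p_geo_given_race[r] is the share of group r's population that lives in the
    voter's geography (for example, a Census block group).
    """
    unnormalized = {
        r: p_race_given_surname[r] * p_geo_given_race[r]
        for r in p_race_given_surname
    }
    total = sum(unnormalized.values())
    return {r: u / total for r, u in unnormalized.items()}
```

For example, with an even surname prior over two hypothetical groups A and B, a voter living in an area that holds 2% of group A's population but 6% of group B's receives posterior probabilities of 0.25 for A and 0.75 for B.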

bisg, race ethnicity, rmse

2206.14583

Country:

North America > United States > Georgia (0.55)
North America > United States > North Carolina (0.25)
Oceania > Australia > Victoria > Melbourne (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Government > Voting & Elections (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Berglind, Frej, Temam, Haron, Mukhopadhyay, Supratik, Das, Kamalika, Sajol, Md Saiful Islam, Kumar, Sricharan, Kallurupalli, Kumar

XOOD: Extreme Value Based Out-Of-Distribution Detection For Image Classification

Detecting out-of-distribution (OOD) data at inference time is crucial for many applications of machine learning. We present XOOD, a novel extreme-value-based OOD detection framework for image classification that consists of two algorithms. The first, XOOD-M, is completely unsupervised, while the second, XOOD-L, is self-supervised. Both algorithms rely on the signals captured by the extreme values of the data in the activation layers of the neural network to distinguish between in-distribution and OOD instances. We show experimentally that both XOOD-M and XOOD-L outperform state-of-the-art OOD detection methods on many benchmark datasets in both efficiency and accuracy, reducing the false-positive rate at 95% true-positive rate (FPR95) by 50% while improving inference time by an order of magnitude.
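
The underlying idea, using extreme values of activations as features, can be sketched with NumPy. For each layer, the sketch records the minimum and maximum activation per input, fits a Gaussian to these features on in-distribution data, and scores new inputs by squared Mahalanobis distance. The Gaussian/Mahalanobis scoring and all names are assumptions for illustration; the abstract does not specify how XOOD-M or XOOD-L turn extreme values into a score. The sketch also computes FPR95, the fraction of OOD inputs accepted at the threshold that accepts 95% of in-distribution inputs.

```python
import numpy as np


def extreme_value_features(activations: list[np.ndarray]) -> np.ndarray:
    """Per-input [min, max] of each layer's activations, concatenated across layers.

    activations: one array per layer, each shaped (n_samples, ...).
    """
    feats = []
    for layer in activations:
        flat = layer.reshape(layer.shape[0], -1)
        feats.append(flat.min(axis=1))
        feats.append(flat.max(axis=1))
    return np.stack(feats, axis=1)


class MahalanobisScorer:
    """Gaussian fit on in-distribution features; a larger score means more OOD."""

    def fit(self, feats: np.ndarray) -> "MahalanobisScorer":
        self.mean = feats.mean(axis=0)
        self.precision = np.linalg.pinv(np.cov(feats, rowvar=False))
        return self

    def score(self, feats: np.ndarray) -> np.ndarray:
        diff = feats - self.mean
        return np.einsum("ij,jk,ik->i", diff, self.precision, diff)


def fpr_at_95_tpr(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    """Fraction of OOD inputs accepted when 95% of in-distribution inputs are accepted."""
    threshold = np.percentile(id_scores, 95)
    return float(np.mean(ood_scores <= threshold))
```

In a real network, `activations` would be hooked from intermediate layers of the trained classifier; random arrays are enough to exercise the sketch.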

algorithm, detection, extreme value

2208.00629

Country:

North America > United States > Louisiana > East Baton Rouge Parish > Baton Rouge (0.14)
North America > United States > California > Santa Clara County > Mountain View (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.84)

Chu, Yun-Wei, Hosseinalipour, Seyyedali, Tenorio, Elizabeth, Cruz, Laura, Douglas, Kerrie, Lan, Andrew, Brinton, Christopher

Mitigating Biases in Student Performance Prediction via Attention-Based Personalized Federated Learning

Traditional learning-based approaches to student modeling generalize poorly to underrepresented student groups due to biases in data availability. In this paper, we propose a methodology for predicting student performance from their online learning activities that optimizes inference accuracy over different demographic groups such as race and gender. Building on recent foundations in federated learning, our approach derives personalized models for individual student subgroups from a global model aggregated across all student models via meta-gradient updates that account for subgroup heterogeneity. To learn better representations of student activity, we augment our approach with a self-supervised behavioral pretraining methodology that leverages multiple modalities of student behavior (e.g., visits to lecture videos and participation in forums), and we include a neural network attention mechanism in the model aggregation stage. Through experiments on three real-world datasets from online courses, we demonstrate that our approach obtains substantial improvements over existing student modeling baselines in predicting student learning outcomes for all subgroups. Visual analysis of the resulting student embeddings confirms that our personalization methodology indeed identifies different activity patterns within different subgroups, consistent with its stronger inference ability compared with the baselines.
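
The aggregation and personalization steps can be sketched for linear models in NumPy. This is a simplified illustration under stated assumptions, not the authors' method: the attention score (a softmax over negative distances between each subgroup model and the current global model), the single local gradient step used for personalization, and all names are hypothetical.

```python
import numpy as np


def attention_aggregate(global_w: np.ndarray, subgroup_ws: list[np.ndarray],
                        temperature: float = 1.0) -> np.ndarray:
    """Attention-weighted average of subgroup parameter vectors.

    Hypothetical scoring: subgroups closer to the current global model
    receive larger softmax weights.
    """
    dists = np.array([np.linalg.norm(w - global_w) for w in subgroup_ws])
    logits = -dists / temperature
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return np.sum([a * w for a, w in zip(weights, subgroup_ws)], axis=0)


def personalize(global_w: np.ndarray, x: np.ndarray, y: np.ndarray,
                lr: float = 0.1) -> np.ndarray:
    """One gradient step on a subgroup's mean squared error, starting from global_w."""
    grad = 2.0 * x.T @ (x @ global_w - y) / len(y)
    return global_w - lr * grad
```

Compared with a plain average, the attention weights let subgroup models that agree with the global model dominate the update, while `personalize` adapts the shared starting point to each subgroup's own data.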

student, student subgroup, subgroup

2208.01182

Country:

Oceania (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Antarctica (0.04)

Genre: Instructional Material > Course Syllabus & Notes (0.48)

Industry:

Education > Educational Setting > Online (1.00)
Education > Assessment & Standards > Student Performance (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.66)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Ozyegen, Ozan, Kabe, Devika, Cevik, Mucahit

Word-level Text Highlighting of Medical Texts for Telehealth Services

The medical domain is often subject to information overload. The digitization of healthcare, constant updates to online medical repositories, and the increasing availability of biomedical datasets make it challenging to analyze the data effectively. This creates additional work for medical professionals, who depend heavily on medical data to conduct their research and consult with their patients. This paper aims to show how different text highlighting techniques can capture relevant medical context. This would reduce doctors' cognitive load and response time to patients by helping them make faster decisions, thus improving the overall quality of online medical services. Three different word-level text highlighting methodologies are implemented and evaluated. The first method uses TF-IDF scores directly to highlight important parts of the text. The second method combines TF-IDF scores with Local Interpretable Model-Agnostic Explanations (LIME) applied to classification models. The third method uses neural networks directly to predict whether or not a word should be highlighted. The results of our experiments show that the neural network approach is successful in highlighting medically relevant terms and that its performance improves as the size of the input segment increases.
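
The first method, highlighting words directly from their TF-IDF scores, can be sketched in plain Python. The lower-cased whitespace tokenizer, the smoothed IDF formula, the fixed threshold, and the function names are assumptions for illustration; the paper's preprocessing and cut-off may differ.

```python
import math
from collections import Counter


def tfidf_scores(docs: list[str]) -> list[dict[str, float]]:
    """TF-IDF weight of every word in every document (lower-cased whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    n_docs = len(docs)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            # term frequency times smoothed inverse document frequency
            word: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return scores


def highlight(doc_index: int, docs: list[str], threshold: float) -> list[str]:
    """Words of one document whose TF-IDF score exceeds the threshold, in reading order."""
    scores = tfidf_scores(docs)[doc_index]
    return [w for w in docs[doc_index].lower().split() if scores[w] > threshold]
```

Words that appear in every document of a toy corpus, such as "the" and "patient", score low, while rarer words score high whether or not they are medically relevant, which is the behavior the learned methods are compared against.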

dataset, information, medical chat highlighting dataset

2105.104

Country:

North America > United States > New York (0.04)
North America > Canada > Ontario > Toronto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Finland (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Health Care Technology > Telehealth (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Jul-31-2022

EMFlow: Data Imputation in Latent Space via EM and Deep Flow Models

Ma, Qi, Ghosh, Sujit K.

The presence of missing values within high-dimensional data is a ubiquitous problem for many applied sciences. A serious limitation of many available data mining and machine learning methods is their inability to handle partially missing values, so an integrated approach that combines imputation and model estimation is vital for downstream analysis. A computationally fast algorithm, called EMFlow, is introduced that performs imputation in a latent space via an online version of the Expectation-Maximization (EM) algorithm, using a normalizing flow (NF) model that maps the data space to a latent space. The proposed EMFlow algorithm is iterative, alternately updating the parameters of online EM and the NF. Extensive experimental results for high-dimensional multivariate and image datasets are presented to illustrate the superior performance of EMFlow compared with a couple of recently available methods in terms of both predictive accuracy and speed of algorithmic convergence. We provide code for all our experiments.
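
The EM half of the algorithm can be sketched for the simplest case, in which the normalizing flow is replaced by the identity map, so the latent space is the data space and the latent distribution is a single multivariate Gaussian. This is a batch (not online) EM sketch written for illustration; it omits the flow training that EMFlow alternates with EM, and the function name is hypothetical. Missing entries are marked with NaN.

```python
import numpy as np


def em_gaussian_impute(data: np.ndarray, n_iter: int = 30) -> np.ndarray:
    """Impute NaN entries under a multivariate Gaussian fitted by batch EM.

    E-step: fill each row's missing block with its conditional mean given the
    observed block. M-step: re-estimate mean and covariance, adding each row's
    conditional covariance so the uncertainty of imputed values is not ignored.
    """
    x = np.array(data, dtype=float)
    missing = np.isnan(x)
    n, d = x.shape
    mu = np.nanmean(x, axis=0)
    x = np.where(missing, mu, x)  # initialize missing entries with column means
    sigma = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)

    for _ in range(n_iter):
        cond_cov_sum = np.zeros((d, d))
        for i in np.flatnonzero(missing.any(axis=1)):
            m, o = missing[i], ~missing[i]
            if o.any():
                # Regression coefficients of the missing block on the observed block.
                coef = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
                x[i, m] = mu[m] + coef @ (x[i, o] - mu[o])
                cond_cov = sigma[np.ix_(m, m)] - coef @ sigma[np.ix_(o, m)]
            else:  # nothing observed: fall back to the marginal distribution
                x[i, m] = mu[m]
                cond_cov = sigma[np.ix_(m, m)]
            cond_cov_sum[np.ix_(m, m)] += cond_cov
        mu = x.mean(axis=0)
        centered = x - mu
        sigma = (centered.T @ centered + cond_cov_sum) / n + 1e-6 * np.eye(d)
    return x
```

In EMFlow as the abstract describes it, the E- and M-steps would instead operate on latent codes produced by the flow, with the flow parameters updated in alternation; with correlated features, even this identity-flow version recovers missing entries far better than column-mean filling.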

artificial intelligence, imputation, machine learning

2106.04804

Country:

North America > United States > North Carolina (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)