AITopics

2203.08409

Country:

Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Government (0.93)
Transportation > Ground > Road (0.89)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Li, Yanke, Tobias, Hatt, Bica, Ioana, van der Schaar, Mihaela

DAPDAG: Domain Adaptation via Perturbed DAG Reconstruction

arXiv.org Artificial Intelligence, Aug-2-2022

Leveraging labelled data from multiple domains to enable prediction in another domain without labels is a significant yet challenging problem. To address this problem, we introduce the framework DAPDAG (Domain Adaptation via Perturbed DAG Reconstruction) and propose to learn an auto-encoder that performs inference on population statistics given features and reconstructs a directed acyclic graph (DAG) as an auxiliary task. The underlying DAG structure among the observed variables is assumed invariant, while their conditional distributions are allowed to vary across domains, driven by a latent environmental variable $E$. The encoder is designed to serve as an inference device on $E$, while the decoder reconstructs each observed variable conditioned on its graphical parents in the DAG and the inferred $E$. We train the encoder and decoder jointly in an end-to-end manner and conduct experiments on synthetic and real datasets with mixed variables. Empirical results demonstrate that reconstructing the DAG benefits the approximate inference. Furthermore, our approach achieves competitive performance against other benchmarks in prediction tasks, with better adaptation ability, especially when the target domain differs significantly from the source domains.

adaptation, dataset, source domain, (14 more...)

2208.01373

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)

#artificialintelligence, Aug-1-2022, 23:40:22 GMT

What algorithm curate machine learning

To address a specific problem, practitioners must select a suitable learning algorithm. A general rule of thumb is that for classification problems we should favor algorithms with high accuracy, whereas for regression problems, where accuracy is not the relevant measure, we should favor algorithms with low prediction error and good robustness. Here are a few examples. Linear Regression: Linear regression predicts continuous values as a linear function of a set of input variables. It does this by minimizing the sum of squared errors. This method is fast and scales to large data sets because the fit does not require searching over all possible outputs; nonetheless, it can be unstable, for example in the presence of outliers.
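As a minimal sketch of the least-squares idea for a single input variable, the closed-form fit can be written in a few lines of plain Python. The function names and the toy data are illustrative assumptions, not taken from the article.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """The quantity that the least-squares fit minimizes."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Toy data (invented for illustration) lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
sse = sum_squared_errors(xs, ys, slope, intercept)

# One corrupted observation moves the fitted slope a long way.
ys_with_outlier = [1.0, 3.0, 5.0, 70.0]
slope_outlier, _ = fit_line(xs, ys_with_outlier)
```

Because the squared error grows quadratically with the residual, the single corrupted point dominates the second fit, which is one concrete sense in which the method can be unstable.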

activation function, algorithm, algorithm curate machine, (8 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.54)

Chowdhury, Tahiya, Aldeer, Murtadha, Laghate, Shantanu, Ortiz, Jorge

Cadence: A Practical Time-series Partitioning Algorithm for Unlabeled IoT Sensor Streams

Time-series partitioning is an essential step in most machine-learning-driven, sensor-based IoT applications. This paper introduces a sample-efficient, robust time-series segmentation model and algorithm. We show that by learning a representation specifically with a segmentation objective based on maximum mean discrepancy (MMD), our algorithm can robustly detect time-series events across different applications. Our loss function allows us to infer whether consecutive sequences of samples are drawn from the same distribution (null hypothesis) and determines the change-point between pairs that reject the null hypothesis (i.e., come from different distributions). We demonstrate its applicability in a real-world IoT deployment for ambient-sensing-based activity recognition. Moreover, while many works on change-point detection exist in the literature, our model is significantly simpler and can be fully trained in 9-93 seconds on average with little variation in hyperparameters for data across different applications. We empirically evaluate Cadence on four popular change point detection (CPD) datasets where Cadence matches or outperforms existing CPD techniques.
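The two-sample idea behind MMD can be sketched directly on raw values. The snippet below is not the Cadence model (which learns a representation first); it only computes a biased MMD estimate with an RBF kernel between adjacent windows and picks the split with the largest value. The kernel bandwidth, the window size, and the toy series are assumptions made for illustration.

```python
import math


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalar samples."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples."""
    def mean_kernel(us, vs):
        total = sum(rbf_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def best_split(series, window):
    """Index t maximizing MMD between series[t-window:t] and series[t:t+window]."""
    best_index, best_score = None, -1.0
    for t in range(window, len(series) - window + 1):
        score = mmd_squared(series[t - window:t], series[t:t + window])
        if score > best_score:
            best_index, best_score = t, score
    return best_index, best_score


# Toy series (invented): the level jumps from about 0 to about 3 at index 6.
series = [0.0, 0.1, -0.1, 0.05, 0.0, -0.05, 3.0, 3.1, 2.9, 3.05, 3.0, 2.95]
index, score = best_split(series, window=4)
same_sample_mmd = mmd_squared(series[:4], series[:4])
```

A value near zero is consistent with the null hypothesis that both windows come from the same distribution; a large value suggests a change point between them.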

artificial intelligence, data mining, machine learning, (16 more...)

2112.0336

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New Jersey > Middlesex County > New Brunswick (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Information Technology (0.68)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
(4 more...)

Biswas, Biplab, Kumar, Nishith, Hoque, Md Aminul, Alam, Md Ashad

Weighted Scaling Approach for Metabolomics Data Analysis

Systematic variation is a common issue in metabolomics data analysis. Therefore, different scaling and normalization techniques are used to preprocess the data for metabolomics data analysis. Although several scaling methods are available in the literature, the choice of scaling, transformation, and/or normalization technique influences the subsequent statistical analysis. It is challenging to choose the appropriate scaling technique for downstream analysis to get accurate results or to make a proper decision. Moreover, the existing scaling techniques are sensitive to outliers or extreme values. To fill this gap, our objective is to introduce a robust scaling approach that is not influenced by outliers and provides more accurate results for downstream analysis. Here, we introduce a new weighted scaling approach that is robust against outliers, so that no additional outlier detection/treatment step is needed in data preprocessing, and compare it with conventional scaling and normalization techniques on artificial and real metabolomics datasets. We evaluated the performance of the proposed method against the existing conventional scaling techniques using metabolomics data analysis in both the absence and presence of different percentages of outliers. Results show that in most cases, the proposed scaling technique performs better than the traditional scaling methods in both the absence and presence of outliers. The proposed method improves the downstream metabolomics analysis. The R function of the proposed robust scaling method is available at https://github.com/nishithkumarpaul/robustScaling/blob/main/wscaling.R
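The sensitivity of conventional scaling to outliers can be illustrated with a small sketch. The abstract does not describe the proposed weighting scheme, so the robust variant below uses a standard median/MAD scaling as a stand-in; it is not the authors' weighted method, and the toy intensities are invented.

```python
import statistics


def auto_scale(values):
    """Conventional autoscaling: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def median_mad_scale(values):
    """Stand-in robust scaling: subtract the median, divide by the scaled MAD."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation for normal data
    return [(v - center) / scale for v in values]


def spread(scaled, count):
    """Range of the first `count` scaled values (the uncontaminated ones here)."""
    head = scaled[:count]
    return max(head) - min(head)


# Toy feature (invented): five ordinary intensities plus one extreme value.
clean = [10.0, 11.0, 9.0, 10.5, 9.5]
contaminated = clean + [100.0]

auto_spread = spread(auto_scale(contaminated), len(clean))
robust_spread = spread(median_mad_scale(contaminated), len(clean))
```

With the extreme value present, autoscaling squeezes the five ordinary values into a very narrow band, while the median/MAD version keeps them well separated.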

artificial intelligence, data mining, machine learning, (16 more...)

2208.00603

Country:

Asia > Bangladesh (0.04)
North America > United States > New York (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.94)
Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Syeed, Miah Mohammad Asif, Farzana, Maisha, Namir, Ishadie, Ishrar, Ipshita, Nushra, Meherin Hossain, Rahman, Tanvir

Flood Prediction Using Machine Learning Models

Floods are among nature's most catastrophic calamities, causing immense and irreversible damage to human life, agriculture, infrastructure and socio-economic systems. Several studies on flood catastrophe management and flood forecasting systems have been conducted. Accurately predicting the onset and progression of floods in real time is challenging. To estimate water levels and velocities across a large area, it is necessary to combine data with computationally demanding flood propagation models. This paper aims to reduce the extreme risks of this natural disaster and also contributes to policy suggestions by providing flood predictions from different machine learning models. This research will use Binary Logistic Regression, K-Nearest Neighbor (KNN), Support Vector Classifier (SVC) and Decision Tree Classifier to provide an accurate prediction. With the outcome, a comparative analysis will be conducted to understand which model delivers better accuracy.
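To make one of the listed models concrete, here is a minimal K-Nearest Neighbor classifier with an accuracy check. The two features (rainfall in mm and river level in m), the labels, and all values are hypothetical toy data, not the paper's dataset.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(predictions, truths):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, t in zip(predictions, truths) if p == t)
    return correct / len(truths)


# Hypothetical training data: (rainfall_mm, river_level_m) -> 1 if flood, else 0.
train_points = [(10.0, 1.0), (20.0, 1.2), (15.0, 0.9),
                (200.0, 4.5), (180.0, 4.0), (220.0, 5.0)]
train_labels = [0, 0, 0, 1, 1, 1]

test_points = [(190.0, 4.2), (12.0, 1.1)]
test_labels = [1, 0]

predictions = [knn_predict(train_points, train_labels, q) for q in test_points]
knn_accuracy = accuracy(predictions, test_labels)
```

In practice the features would be standardized first, since Euclidean distance is dominated by whichever feature has the largest numeric range.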

artificial intelligence, classifier, machine learning, (14 more...)

2208.01234

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.06)
Oceania > Australia (0.05)
North America > United States (0.04)
(8 more...)

Genre:

Research Report > New Finding (0.80)
Research Report > Experimental Study (0.80)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

On the Detection of Adaptive Adversarial Attacks in Speaker Verification Systems

Chen, Zesheng

Speaker verification systems have been widely used in smartphones and Internet of Things devices to identify legitimate users. Recent work has shown that adversarial attacks, such as FAKEBOB, can work effectively against speaker verification systems. The goal of this paper is to design a detector that can distinguish original audio from audio contaminated by adversarial attacks. Specifically, our detector, called MEH-FEST, calculates the minimum energy in high frequencies from the short-time Fourier transform of an audio signal and uses it as a detection metric. Through both analysis and experiments, we show that our proposed detector is easy to implement, fast at processing input audio, and effective in determining whether audio has been corrupted by FAKEBOB attacks. The experimental results indicate that the detector is extremely effective, with near-zero false positive and false negative rates for detecting FAKEBOB attacks in Gaussian mixture model (GMM) and i-vector speaker verification systems. Moreover, adaptive adversarial attacks against our proposed detector and their countermeasures are discussed and studied, showing the game between attackers and defenders.
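The detection metric described above can be sketched as follows: frame the signal, take a windowed DFT of each frame, sum the energy in the upper frequency bins, and keep the minimum over frames. The frame length, hop size, cutoff bin, and toy signals are assumptions chosen for illustration; the paper's actual parameters are not given in the abstract.

```python
import cmath
import math


def stft_frames(signal, frame_len, hop):
    """Naive short-time Fourier transform with a periodic Hann window."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        windowed = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(windowed))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


def min_high_freq_energy(signal, frame_len=64, hop=32, cutoff_bin=24):
    """Minimum over frames of the energy in bins at or above cutoff_bin."""
    energies = [
        sum(abs(c) ** 2 for c in spectrum[cutoff_bin:])
        for spectrum in stft_frames(signal, frame_len, hop)
    ]
    return min(energies)


# Toy signals (invented): a low-frequency tone, and the same tone with a small
# high-frequency perturbation standing in for an adversarial modification.
clean = [math.sin(2.0 * math.pi * 4 * n / 64) for n in range(256)]
perturbed = [s + 0.01 * (-1) ** n for n, s in enumerate(clean)]

meh_clean = min_high_freq_energy(clean)
meh_perturbed = min_high_freq_energy(perturbed)
```

A detector along these lines would compare the metric against a threshold tuned on known-clean audio; that threshold is not specified here and would have to be chosen empirically.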

artificial intelligence, audio, machine learning, (18 more...)

doi: 10.1109/JIOT.2023.3267619

2202.05725

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Czechia > South Moravian Region > Brno (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(9 more...)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

A Multifaceted Benchmarking of Synthetic Electronic Health Record Generation Models

Yan, Chao, Yan, Yao, Wan, Zhiyu, Zhang, Ziqi, Omberg, Larsson, Guinney, Justin, Mooney, Sean D., Malin, Bradley A.

Synthetic health data have the potential to mitigate privacy concerns when sharing data to support biomedical research and the development of innovative healthcare applications. Modern approaches for data generation based on machine learning, generative adversarial networks (GAN) methods in particular, continue to evolve and demonstrate remarkable potential. Yet there is a lack of a systematic assessment framework to benchmark methods as they emerge and determine which methods are most appropriate for which use cases. In this work, we introduce a generalizable benchmarking framework to appraise key characteristics of synthetic health data with respect to utility and privacy metrics. We apply the framework to evaluate synthetic data generation methods for electronic health records (EHRs) data from two large academic medical centers with respect to several use cases. The results illustrate that there is a utility-privacy tradeoff for sharing synthetic EHR data. The results further indicate that no method is unequivocally the best on all criteria in each use case, which makes it evident why synthetic data generation methods need to be assessed in context.

artificial intelligence, data mining, machine learning, (17 more...)

doi: 10.1038/s41467-022-35295-1

2208.0123

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Tennessee > Davidson County > Nashville (0.04)
North America > Canada (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Health Care Providers & Services (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

How should we proxy for race/ethnicity? Comparing Bayesian improved surname geocoding to machine learning methods

Decter-Frain, Ari

Political science research often requires constructing a race/ethnicity proxy variable for datasets that do not contain it, like voter registration files, lists of electoral candidates, or political donation records. Constructing such a proxy is an important step for conducting ecological inference in voting rights litigation (Barreto et al. [2019], Imai and Khanna [2016]), redistricting (DeLuca and Curiel [2022], Kenny et al. [2021]), and substantive research on the role of race/ethnicity in politics (Enos [2016], Enos et al. [2019], Grumbach and Sahn [2020]). The most common method for proxying race/ethnicity is Bayesian Improved Surname Geocoding (BISG), which uses Bayes' rule to compute a probability distribution over race/ethnicity categories conditional on a voter's surname and where they live (Elliott et al. [2008, 2009]). BISG has attained widespread popularity due to its parsimony, computational efficiency, and superior performance when compared to existing alternatives, namely spatial interpolation of Census racial-ethnic composition from Census geographies (Imai and Khanna [2016], Clark et al. [2021], Shah and Davis [2017]). While BISG performs well compared to the small suite of existing alternatives, it has not yet been benchmarked against machine learning (ML) models, which can produce race/ethnicity predictions from more flexible and potentially more accurate models. In this paper I present the results of such a benchmark. I train a range of machine learning models using voter registration data from Florida, Georgia, North Carolina, and a portion of California where voters self-report their race/ethnicity upon registration. The registries in these states contain over 26 million labelled observations, which equates to greater than a five percent non-representative sample of the United States electorate. I then compare BISG against predictions from these models made out-of-state.
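The Bayes' rule step in BISG can be sketched in a few lines. Under the usual assumption that surname and location are conditionally independent given race/ethnicity, the posterior is proportional to P(race | surname) times P(location | race). The category names and all probabilities below are invented placeholders, not Census figures.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Posterior over categories: P(r | s, g) proportional to P(r | s) * P(g | r)."""
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no category has positive probability")
    return {race: value / total for race, value in unnormalized.items()}


# Placeholder inputs (invented): three abstract categories A, B, C.
p_race_given_surname = {"A": 0.6, "B": 0.3, "C": 0.1}   # from a surname list
p_geo_given_race = {"A": 0.001, "B": 0.004, "C": 0.002}  # share of each group living in this area

posterior = bisg_posterior(p_race_given_surname, p_geo_given_race)
most_likely = max(posterior, key=posterior.get)
```

Here the surname alone favors category A, but the geographic evidence shifts the posterior toward B, which is the kind of update the method relies on.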

bisg, race ethnicity, rmse, (14 more...)

2206.14583

Country:

North America > United States > Georgia (0.55)
North America > United States > North Carolina (0.25)
Oceania > Australia > Victoria > Melbourne (0.04)
(7 more...)

Genre: Research Report > New Finding (0.68)

Industry: Government > Voting & Elections (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Berglind, Frej, Temam, Haron, Mukhopadhyay, Supratik, Das, Kamalika, Sajol, Md Saiful Islam, Kumar, Sricharan, Kallurupalli, Kumar

XOOD: Extreme Value Based Out-Of-Distribution Detection For Image Classification

Detecting out-of-distribution (OOD) data at inference time is crucial for many applications of machine learning. We present XOOD: a novel extreme value-based OOD detection framework for image classification that consists of two algorithms. The first, XOOD-M, is completely unsupervised, while the second, XOOD-L, is self-supervised. Both algorithms rely on the signals captured by the extreme values of the data in the activation layers of the neural network in order to distinguish between in-distribution and OOD instances. We show experimentally that both XOOD-M and XOOD-L outperform state-of-the-art OOD detection methods on many benchmark data sets in both efficiency and accuracy, reducing false-positive rate (FPR95) by 50%, while improving the inference time by an order of magnitude.
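The FPR95 figure quoted above is commonly computed as the false-positive rate at the threshold where 95% of in-distribution samples are accepted. A minimal sketch follows, assuming that higher scores mean "more in-distribution" and that in-distribution is the positive class; the scores are invented and this is not the XOOD scoring function itself.

```python
def fpr_at_95_tpr(in_scores, ood_scores):
    """Fraction of OOD samples accepted at the threshold that keeps 95% of in-distribution samples."""
    ranked = sorted(in_scores, reverse=True)
    keep = (95 * len(ranked) + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    threshold = ranked[keep - 1]
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)


# Invented detector scores: 20 in-distribution samples and 10 OOD samples.
in_scores = [i / 20 for i in range(1, 21)]
ood_scores = [0.02, 0.04, 0.08, 0.3, 0.01, 0.03, 0.06, 0.07, 0.5, 0.09]

fpr95 = fpr_at_95_tpr(in_scores, ood_scores)
```

Lower is better: a detector that separates the two groups perfectly would accept no OOD samples at this threshold.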

algorithm, detection, extreme value, (16 more...)

2208.00629

Country:

North America > United States > Louisiana > East Baton Rouge Parish > Baton Rouge (0.14)
North America > United States > California > Santa Clara County > Mountain View (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.84)