AITopics

2106.00665

Country: North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.72)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

#artificialintelligenceMay-31-2021, 07:27:07 GMT

Predict Lead Score (the right way) using PyCaret

Leads are the driving force of many businesses today. With the advancement of subscription-based business models particularly in the start-up space, the ability to convert leads into paying customers is key to survival. In simple terms, a "lead" represents a potential customer interested in buying your product/service. A significant amount of time, money, and effort is spent by marketing and sales departments on lead management, a concept that we will take to encompass the three key phases of lead generation, qualification, and monetization. Lead generation is the initiation of customer interest or inquiry into the products or services of your business.

conversion, customer, pycaret, (16 more...)

#artificialintelligence

Technology:

Information Technology > Enterprise Applications > Customer Relationship Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)

Vasu, Rosni K, Seetharaman, Sanjay, Malaviya, Shubham, Shukla, Manish, Lodha, Sachin

Gradient-based Data Subversion Attack Against Binary Classifiers

arXiv.org Artificial IntelligenceMay-31-2021

Machine learning based data-driven technologies have shown impressive performances in a variety of application domains. Most enterprises use data from multiple sources to provide quality applications. The reliability of the external data sources raises concerns for the security of the machine learning techniques adopted. An attacker can tamper the training or test datasets to subvert the predictions of models generated by these techniques. Data poisoning is one such attack wherein the attacker tries to degrade the performance of a classifier by manipulating the training data. In this work, we focus on label contamination attack in which an attacker poisons the labels of data to compromise the functionality of the system. We develop Gradient-based Data Subversion strategies to achieve model degradation under the assumption that the attacker has limited-knowledge of the victim model. We exploit the gradients of a differentiable convex loss function (residual errors) with respect to the predicted label as a warm-start and formulate different strategies to find a set of data instances to contaminate. Further, we analyze the transferability of attacks and the susceptibility of binary classifiers. Our experiments show that the proposed approach outperforms the baselines and is computationally efficient.

attacker, classifier, dataset, (14 more...)

arXiv.org Artificial Intelligence

2105.14803

Country:

North America > United States > California > Santa Clara County > Santa Clara (0.04)
Europe > Italy > Sardinia > Cagliari (0.04)
Asia > India (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)

arXiv.org Artificial IntelligenceMay-31-2021

Semi-orthogonal Embedding for Efficient Unsupervised Anomaly Segmentation

Kim, Jin-Hwa, Kim, Do-Hyeong, Yi, Saehoon, Lee, Taehoon

We present the efficiency of semi-orthogonal embedding for unsupervised anomaly segmentation. The multi-scale features from pre-trained CNNs are recently used for the localized Mahalanobis distances with significant performance. However, the increased feature size is problematic to scale up to the bigger CNNs, since it requires the batch-inverse of multi-dimensional covariance tensor. Here, we generalize an ad-hoc method, random feature selection, into semi-orthogonal embedding for robust approximation, cubically reducing the computational cost for the inverse of multi-dimensional covariance tensor. With the scrutiny of ablation studies, the proposed method achieves a new state-of-the-art with significant margins for the MVTec AD, KolektorSDD, KolektorSDD2, and mSTC datasets. The theoretical and empirical analyses offer insights and verification of our straightforward yet cost-effective approach.

eigenvalue, mahalanobis distance, matrix, (12 more...)

arXiv.org Artificial Intelligence

2105.14737

Country:

Asia > South Korea (0.04)
Asia > India (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Cui, Hugo, Loureiro, Bruno, Krzakala, Florent, Zdeborová, Lenka

Generalization Error Rates in Kernel Regression: The Crossover from the Noiseless to Noisy Regime

Kernel methods are among the most popular models in machine learning. Despite their relative simplicity, they define a powerful framework in which non-linear features can be exploited without leaving the realm of convex optimisation. Kernel methods in machine learning have a long and rich literature dating back to the 60s [1, 2], but have recently made it back to the spotlight as a proxy for studying neural networks in different regimes, e.g. the infinite width limit [3-6] and the lazy regime of training [7]. Despite being defined in terms of a non-parametric optimisation problem, kernel methods can be mathematically understood as a standard parametric linear problem in a (possibly infinite) Hilbert space spanned by the kernel eigenvectors (a.k.a features). This dual picture fully characterizes the asymptotic performance of kernels in terms of a trade-off between two key quantities: the relative decay of the eigenvalues of the kernel (a.k.a.

crossover, decay, regime, (15 more...)

2105.15004

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Switzerland (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)

Azad, Amar Prakash, Ghosh, Supriyo, Gupta, Ajay, Kumar, Harshit, Mohapatra, Prateeti

Picking Pearl From Seabed: Extracting Artefacts from Noisy Issue Triaging Collaborative Conversations for Hybrid Cloud Services

arXiv.org Artificial IntelligenceMay-31-2021

Site Reliability Engineers (SREs) play a key role in issue identification and resolution. After an issue is reported, SREs come together in a virtual room (collaboration platform) to triage the issue. While doing so, they leave behind a wealth of information which can be used later for triaging similar issues. However, usability of the conversations offer challenges due to them being i) noisy and ii) unlabelled. This paper presents a novel approach for issue artefact extraction from the noisy conversations with minimal labelled data. We propose a combination of unsupervised and supervised model with minimum human intervention that leverages domain knowledge to predict artefacts for a small amount of conversation data and use that for fine-tuning an already pretrained language model for artefact prediction on a large amount of conversation data. Experimental results on our dataset show that the proposed ensemble of unsupervised and supervised model is better than using either one of them individually.

artefact, latexit sha1, utterance, (12 more...)

arXiv.org Artificial Intelligence

2105.15065

Country:

North America > United States (0.04)
Europe > Czechia > Prague (0.04)
Asia > Singapore (0.04)
Asia > India (0.04)

Genre: Research Report (0.84)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Early Detection of COVID-19 Hotspots Using Spatio-Temporal Data

Zhu, Shixiang, Bukharin, Alexander, Xie, Liyan, Yang, Shihao, Keskinocak, Pinar, Xie, Yao

Recently, the Centers for Disease Control and Prevention (CDC) has worked with other federal agencies to identify counties with increasing coronavirus disease 2019 (COVID-19) incidence (hotspots) and offers support to local health departments to limit the spread of the disease. Understanding the spatio-temporal dynamics of hotspot events is of great importance to support policy decisions and prevent large-scale outbreaks. This paper presents a spatio-temporal Bayesian framework for early detection of COVID-19 hotspots (at the county level) in the United States. We assume both the observed number of cases and hotspots depend on a class of latent random variables, which encode the underlying spatio-temporal dynamics of the transmission of COVID-19. Such latent variables follow a zero-mean Gaussian process, whose covariance is specified by a non-stationary kernel function. The most salient feature of our kernel function is that deep neural networks are introduced to enhance the model's representative power while still enjoying the interpretability of the kernel. We derive a sparse model and fit the model using a variational learning strategy to circumvent the computational intractability for large data sets. Our model demonstrates better interpretability and superior hotspot-detection performance compared to other baseline methods.

hotspot, kernel, prediction, (17 more...)

2106.00072

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > New York (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
(10 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)
Government > Regional Government > North America Government > United States Government (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
(2 more...)

Shah, Kulin, Gupta, Pooja, Deshpande, Amit, Bhattacharyya, Chiranjib

Rawlsian Fair Adaptation of Deep Learning Classifiers

Group-fairness in classification aims for equality of a predictive utility across different sensitive sub-populations, e.g., race or gender. Equality or near-equality constraints in group-fairness often worsen not only the aggregate utility but also the utility for the least advantaged sub-population. In this paper, we apply the principles of Pareto-efficiency and least-difference to the utility being accuracy, as an illustrative example, and arrive at the Rawls classifier that minimizes the error rate on the worst-off sensitive sub-population. Our mathematical characterization shows that the Rawls classifier uniformly applies a threshold to an ideal score of features, in the spirit of fair equality of opportunity. In practice, such a score or a feature representation is often computed by a black-box model that has been useful but unfair. Our second contribution is practical Rawlsian fair adaptation of any given black-box deep learning model, without changing the score or feature representation it computes. Given any score function or feature representation and only its second-order statistics on the sensitive sub-populations, we seek a threshold classifier on the given score or a linear threshold classifier on the given feature representation that achieves the Rawls error rate restricted to this hypothesis class. Our technical contribution is to formulate the above problems using ambiguous chance constraints, and to provide efficient algorithms for Rawlsian fair adaptation, along with provable upper bounds on the Rawls error rate. Our empirical results show significant improvement over state-of-the-art group-fair algorithms, even without retraining for fairness.

classifier, error rate, rawl classifier, (15 more...)

doi: 10.1145/3461702.3462592

2105.1489

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Law > Civil Rights & Constitutional Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Cabello, Nestor, Naghizade, Elham, Qi, Jianzhong, Kulik, Lars

Fast, Accurate and Interpretable Time Series Classification Through Randomization

Time series classification (TSC) aims to predict the class label of a given time series, which is critical to a rich set of application areas such as economics and medicine. State-of-the-art TSC methods have mostly focused on classification accuracy and efficiency, without considering the interpretability of their classifications, which is an important property required by modern applications such as appliance modeling and legislation such as the European General Data Protection Regulation. To address this gap, we propose a novel TSC method - the Randomized-Supervised Time Series Forest (r-STSF). r-STSF is highly efficient, achieves state-of-the-art classification accuracy and enables interpretability. r-STSF takes an efficient interval-based approach to classify time series according to aggregate values of discriminatory sub-series (intervals). To achieve state-of-the-art accuracy, r-STSF builds an ensemble of randomized trees using the discriminatory sub-series. It uses four time series representations, nine aggregation functions and a supervised binary-inspired search combined with a feature ranking metric to identify highly discriminatory sub-series. The discriminatory sub-series enable interpretable classifications. Experiments on extensive datasets show that r-STSF achieves state-of-the-art accuracy while being orders of magnitude faster than most existing TSC methods. It is the only classifier from the state-of-the-art group that enables interpretability. Our findings also highlight that r-STSF is the best TSC method when classifying complex time series datasets.

accuracy, interval feature, representation, (14 more...)

2105.14876

Country: Oceania > Australia > Victoria > Melbourne (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.86)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
(2 more...)

Haben, Stephen, Arora, Siddharth, Giasemidis, Georgios, Voss, Marcus, Greetham, Danica Vukadinovic

Review of Low-Voltage Load Forecasting: Methods, Applications, and Recommendations

arXiv.org Machine LearningMay-30-2021

The increased digitalisation and monitoring of the energy system opens up numerous opportunities % and solutions which can help to decarbonise the energy system. Applications on low voltage (LV), localised networks, such as community energy markets and smart storage will facilitate decarbonisation, but they will require advanced control and management. Reliable forecasting will be a necessary component of many of these systems to anticipate key features and uncertainties. Despite this urgent need, there has not yet been an extensive investigation into the current state-of-the-art of low voltage level forecasts, other than at the smart meter level. This paper aims to provide a comprehensive overview of the landscape, current approaches, core applications, challenges and recommendations. Another aim of this paper is to facilitate the continued improvement and advancement in this area. To this end, the paper also surveys some of the most relevant and promising trends. It establishes an open, community-driven list of the known LV level open datasets to encourage further research and development.

forecast, renewable energy, upstream oil & gas, (28 more...)

2106.00006

Country:

North America > United States (1.00)
Asia > China (0.67)
Europe > Germany (0.46)
(14 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.92)

Industry:

Energy > Renewable > Solar (1.00)
Energy > Power Industry (1.00)
Energy > Energy Storage (0.93)
(2 more...)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Quality (1.00)
Information Technology > Data Science > Data Mining (1.00)
(15 more...)