AITopics | Performance Analysis

Collaborating Authors

Performance Analysis

News Overviews Instructional Materials AI-Alerts Classics

Giuliano Liguori on LinkedIn: #BigData #Analytics #DataScience

#artificialintelligenceJun-22-2022, 09:26:55 GMT

The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable. K-NN is a non-parametric algorithm, which means it does not make any assumption on underlying data. It is also called a lazy learner algorithm because it does not learn from the training set immediately instead it stores the dataset and at the time of classification, it performs an action on the dataset. The Naive Bayes classification algorithm is a probabilistic classifier.

algorithm, datascience, giuliano liguori, (8 more...)

#artificialintelligence

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.58)

Add feedback

Building Transparency Into AI Projects - AI Summary

#artificialintelligenceJun-22-2022, 05:55:15 GMT

That means communicating why an AI solution was chosen, how it was designed and developed, on what grounds it was deployed, how it's monitored and updated, and the conditions under which it may be retired. There are four specific effects of building in transparency: 1) it decreases the risk of error and misuse, 2) it distributes responsibility, 3) it enables internal and external oversight, and 4) it expresses respect for people. In 2018, one of the largest tech companies in the world premiered an AI that called restaurants and impersonated a human to make reservations. To "prove" it was human, the company trained the AI to insert "umms" and "ahhs" into its request: for instance, "When would I like the reservation? If the product team doesn't explain how to properly handle the outputs of the model, introducing AI can be counterproductive in high-stakes situations. In designing the model, the data scientists reasonably thought that erroneously marking an x-ray as negative when in fact, the x-ray does show a cancerous tumor can have very dangerous consequences and so they set a low tolerance for false negatives and, thus, a high tolerance for false positives. Had they been properly informed -- had the design decision been made transparent to the end-user -- the radiologists may have thought, I really don't see anything here and I know the AI is overly sensitive, so I'm going to move on. By being transparent from start to finish, genuine accountability can be distributed among all as they are given the knowledge they need to make responsible decisions. Consider, for instance, a financial advisor who hides the existence of some investment opportunities and emphasizes the potential upsides of others because he gets a larger commission when he sells the latter. The more general point is that AI can undermine people's autonomy -- their ability to see the options available to them and to choose among them without undue influence or manipulation. That means communicating why an AI solution was chosen, how it was designed and developed, on what grounds it was deployed, how it's monitored and updated, and the conditions under which it may be retired. There are four specific effects of building in transparency: 1) it decreases the risk of error and misuse, 2) it distributes responsibility, 3) it enables internal and external oversight, and 4) it expresses respect for people. In 2018, one of the largest tech companies in the world premiered an AI that called restaurants and impersonated a human to make reservations. To "prove" it was human, the company trained the AI to insert "umms" and "ahhs" into its request: for instance, "When would I like the reservation?

building transparency, reservation, tolerance, (12 more...)

#artificialintelligence

Industry:

Health & Medicine (0.83)
Banking & Finance (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.60)

Add feedback

A Comprehensive Survey on the Cyber-Security of Smart Grids: Cyber-Attacks, Detection, Countermeasure Techniques, and Future Directions

Khoei, Tala Talaei, Slimane, Hadjar Ould, Kaabouch, Naima

arXiv.org Artificial IntelligenceJun-22-2022

One of the significant challenges that smart grid networks face is cyber-security. Several studies have been conducted to highlight those security challenges. However, the majority of these surveys classify attacks based on the security requirements, confidentiality, integrity, and availability, without taking into consideration the accountability requirement. In addition, some of these surveys focused on the Transmission Control Protocol/Internet Protocol (TCP/IP) model, which does not differentiate between the application, session, and presentation and the data link and physical layers of the Open System Interconnection (OSI) model. In this survey paper, we provide a classification of attacks based on the OSI model and discuss in more detail the cyber-attacks that can target the different layers of smart grid networks communication. We also propose new classifications for the detection and countermeasure techniques and describe existing techniques under each category. Finally, we discuss challenges and future research directions.

evolutionary algorithm, machine learning, smart grid, (17 more...)

arXiv.org Artificial Intelligence

2207.07738

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Asia > China (0.04)
North America > United States > North Dakota > Grand Forks County > Grand Forks (0.04)
(5 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.92)
Government > Military > Cyberwarfare (0.91)
Energy > Power Industry > Utilities (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks > Sensor Networks (1.00)
(6 more...)

Add feedback

Sharp Constants in Uniformity Testing via the Huber Statistic

Gupta, Shivam, Price, Eric

arXiv.org Machine LearningJun-21-2022

Uniformity testing is one of the most well-studied problems in property testing, with many known test statistics, including ones based on counting collisions, singletons, and the empirical TV distance. It is known that the optimal sample complexity to distinguish the uniform distribution on $m$ elements from any $\epsilon$-far distribution with $1-\delta$ probability is $n = \Theta\left(\frac{\sqrt{m \log (1/\delta)}}{\epsilon^2} + \frac{\log (1/\delta)}{\epsilon^2}\right)$, which is achieved by the empirical TV tester. Yet in simulation, these theoretical analyses are misleading: in many cases, they do not correctly rank order the performance of existing testers, even in an asymptotic regime of all parameters tending to $0$ or $\infty$. We explain this discrepancy by studying the \emph{constant factors} required by the algorithms. We show that the collisions tester achieves a sharp maximal constant in the number of standard deviations of separation between uniform and non-uniform inputs. We then introduce a new tester based on the Huber loss, and show that it not only matches this separation, but also has tails corresponding to a Gaussian with this separation. This leads to a sample complexity of $(1 + o(1))\frac{\sqrt{m \log (1/\delta)}}{\epsilon^2}$ in the regime where this term is dominant, unlike all other existing testers.

artificial intelligence, machine learning, statistic, (19 more...)

arXiv.org Machine Learning

2206.10722

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

PAC-Wrap: Semi-Supervised PAC Anomaly Detection

Li, Shuo, Ji, Xiayan, Dobriban, Edgar, Sokolsky, Oleg, Lee, Insup

arXiv.org Machine LearningJun-21-2022

Anomaly detection is essential for preventing hazardous outcomes for safety-critical applications like autonomous driving. Given their safety-criticality, these applications benefit from provable bounds on various errors in anomaly detection. To achieve this goal in the semi-supervised setting, we propose to provide Probably Approximately Correct (PAC) guarantees on the false negative and false positive detection rates for anomaly detection algorithms. Our method (PAC-Wrap) can wrap around virtually any existing semi-supervised and unsupervised anomaly detection method, endowing it with rigorous guarantees. Our experiments with various anomaly detectors and datasets indicate that PAC-Wrap is broadly effective.

data mining, machine learning, prediction, (15 more...)

arXiv.org Machine Learning

doi: 10.1145/3534678.3539408

2205.10798

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > District of Columbia > Washington (0.05)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (0.68)
Information Technology (0.66)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Machine Learning on a Large Scale

#artificialintelligenceJun-20-2022, 23:50:12 GMT

The ROC curve is also used in order to compute the area under the ROC curve metric. The ROC curve of a perfect model will approach the top-left corner, whilst a random model will approach the diagonal (True positive rate False positive rate). The area under the ROC curve ranges between 0. and 1 and can be computed via a BinaryClassificationEvaluator object The result is impressive, despite the attempt to hamper the model quality. The area under the ROC curve for the training set can be obtained from the model summary lr_model.summary.areaUnderROC. The BinaryClassificationEvaluator object can also be used to compute the area under the PR curve.

estimator, logistic regression model, roc curve, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Python for Machine Learning: A Tutorial

#artificialintelligenceJun-20-2022, 15:05:11 GMT

Python has become the most popular data science and machine learning programming language. But in order to obtain effective data and results, it's important that you have a basic understanding of how it works with machine learning. In this introductory tutorial, you'll learn the basics of Python for machine learning, including different model types and the steps to take to ensure you obtain quality data, using a sample machine learning problem. In addition, you'll get to know some of the most popular libraries and tools for machine learning. Machine learning (ML) is a form of artificial intelligence (AI) that teaches computers to make predictions and recommendations and solve problems based on data. Its problem-solving capabilities make it a useful tool in industries such as financial services, healthcare, marketing and sales, and education among others. There are three main types of machine learning: supervised, unsupervised, and reinforcement. In supervised learning, the computer is given a set of training data that includes both the input data (what we want to predict) and the output data (the prediction).

algorithm, dataset, library, (14 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.34)

Industry: Education > Curriculum > Subject-Specific Education (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)

Add feedback

Particle Transformer for Jet Tagging

Qu, Huilin, Li, Congqiao, Qian, Sitian

arXiv.org Artificial IntelligenceJun-18-2022

Jet tagging is a critical yet challenging classification task in particle physics. While deep learning has transformed jet tagging and significantly improved performance, the lack of a large-scale public dataset impedes further enhancement. In this work, we present JetClass, a new comprehensive dataset for jet tagging. The JetClass dataset consists of 100 M jets, about two orders of magnitude larger than existing public datasets. A total of 10 types of jets are simulated, including several types unexplored for tagging so far. Based on the large dataset, we propose a new Transformer-based architecture for jet tagging, called Particle Transformer (ParT). By incorporating pairwise particle interactions in the attention mechanism, ParT achieves higher tagging performance than a plain Transformer and surpasses the previous state-of-the-art, ParticleNet, by a large margin. The pre-trained ParT models, once fine-tuned, also substantially enhance the performance on two widely adopted jet tagging benchmarks. The dataset, code and models are publicly available at https://github.com/jet-universe/particle_transformer.

artificial intelligence, machine learning, particle, (18 more...)

arXiv.org Artificial Intelligence

2202.03772

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland > Baltimore (0.04)
Europe > Switzerland > Geneva > Geneva (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Add feedback

From Modeling to Scoring: Correcting Predicted Class Probabilities in Imbalanced Datasets

#artificialintelligenceJun-17-2022, 06:01:08 GMT

Model evaluation is an important part of a data science project and it's exactly this part that quantifies how good your model is, how much it has improved from the previous version, how much better it is than your colleague's model, and how much room for improvement there still is. It is not unusual in machine learning applications to deal with imbalanced datasets such as fraud detection, computer network intrusion, medical diagnostics, and many more. Data imbalance refers to unequal distribution of classes within a dataset, namely that there are far fewer events in one class in comparison to the others. If, for example we have credit card fraud detection dataset, most of the transactions are not fraudulent and very few can be classed as fraud detections. This underrepresented class is called the minority class, and by convention, the positive class.

class probability, probability, transaction, (14 more...)

#artificialintelligence

Industry: Law Enforcement & Public Safety > Fraud (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

Add feedback

Random projections and Kernelised Leave One Cluster Out Cross-Validation: Universal baselines and evaluation tools for supervised machine learning for materials properties

Durdy, Samantha, Gaultois, Michael, Gusev, Vladimir, Bollegala, Danushka, Rosseinsky, Matthew J.

arXiv.org Artificial IntelligenceJun-17-2022

With machine learning being a popular topic in current computational materials science literature, creating representations for compounds has become common place. These representations are rarely compared, as evaluating their performance - and the performance of the algorithms that they are used with - is non-trivial. With many materials datasets containing bias and skew caused by the research process, leave one cluster out cross validation (LOCO-CV) has been introduced as a way of measuring the performance of an algorithm in predicting previously unseen groups of materials. This raises the question of the impact, and control, of the range of cluster sizes on the LOCO-CV measurement outcomes. We present a thorough comparison between composition-based representations, and investigate how kernel approximation functions can be used to better separate data to enhance LOCO-CV applications. We find that domain knowledge does not improve machine learning performance in most tasks tested, with band gap prediction being the notable exception. We also find that the radial basis function improves the linear separability of chemical datasets in all 10 datasets tested and provide a framework for the application of this function in the LOCO-CV process to improve the outcome of LOCO-CV measurements regardless of machine learning algorithm, choice of metric, and choice of compound representation. We recommend kernelised LOCO-CV as a training paradigm for those looking to measure the extrapolatory power of an algorithm on materials data.

artificial intelligence, machine learning, universal baseline and evaluation tool, (3 more...)

arXiv.org Artificial Intelligence

doi: 10.1039/D2DD00039C

2206.08841

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.60)

Add feedback