
Collaborating Authors

 Yedida, Rahul


Radon-Nikodým Derivative: Re-imagining Anomaly Detection from a Measure Theoretic Perspective

arXiv.org Artificial Intelligence

Which principle underpins the design of an effective anomaly detection loss function? The answer lies in the Radon-Nikodým theorem, a fundamental result in measure theory. The key insight is that multiplying the vanilla loss function by the Radon-Nikodým derivative improves performance across the board. We refer to this as RN-Loss. This is established using the PAC learnability of anomaly detection. We further show that the Radon-Nikodým derivative offers important insights into unsupervised, clustering-based anomaly detection as well. We evaluate our algorithm on 96 datasets, including univariate and multivariate data from diverse domains such as healthcare, cybersecurity, and finance. We show that RN-Derivative algorithms outperform state-of-the-art methods on 68% of multivariate datasets (based on F1 scores) and also achieve peak F1 scores on 72% of time-series (univariate) datasets.
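The abstract above describes weighting a standard loss by the Radon-Nikodým derivative. A minimal sketch of that idea follows, assuming the derivative is approximated by a kernel-density ratio between the evaluation samples and a reference sample X_ref; the choice of estimator and the binary cross-entropy base loss are illustrative assumptions, not the paper's construction.

```python
# Sketch only: a vanilla per-sample loss weighted by an estimated density
# ratio, standing in for the Radon-Nikodym derivative dP/dQ.
import numpy as np
from sklearn.neighbors import KernelDensity

def rn_weighted_bce(y_true, y_prob, X, X_ref, bandwidth=1.0, eps=1e-7):
    """Binary cross-entropy weighted by an estimated density ratio.

    X     -- samples the loss is evaluated on (defines the measure P here)
    X_ref -- reference samples (defines the measure Q); both roles are
             assumptions of this sketch, not the paper's exact setup.
    """
    # Kernel density estimates give log densities; their difference is the
    # log of the ratio that approximates the Radon-Nikodym derivative.
    kde_p = KernelDensity(bandwidth=bandwidth).fit(X)
    kde_q = KernelDensity(bandwidth=bandwidth).fit(X_ref)
    log_ratio = kde_p.score_samples(X) - kde_q.score_samples(X)
    weights = np.exp(np.clip(log_ratio, -10.0, 10.0))

    y_prob = np.clip(y_prob, eps, 1 - eps)
    bce = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(weights * bce))
```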


Strong convexity-guided hyper-parameter optimization for flatter losses

arXiv.org Artificial Intelligence

We propose a novel white-box approach to hyper-parameter optimization. Motivated by recent work establishing a relationship between flat minima and generalization, we first establish a relationship between the strong convexity of the loss and its flatness. Based on this, we seek to find hyper-parameter configurations that improve flatness by minimizing the strong convexity of the loss. By using the structure of the underlying neural network, we derive closed-form equations to approximate the strong convexity parameter, and attempt to find hyper-parameters that minimize it in a randomized fashion. Through experiments on 14 classification datasets, we show that our method achieves strong performance at a fraction of the runtime.
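The search described above can be sketched as follows. The paper's closed-form approximation of the strong convexity parameter is not reproduced here; a generic finite-difference proxy (the smallest directional curvature of the loss around the trained weights) stands in for it, and both fit_fn and the proxy are assumptions of this sketch.

```python
# Sketch only: randomized search that keeps the hyper-parameter configuration
# whose loss has the smallest strong-convexity proxy, i.e. the flattest loss.
import numpy as np

def strong_convexity_proxy(loss_fn, w, n_dirs=16, h=1e-3, seed=0):
    """Estimate mu from second differences along random unit directions:
    d^T H d ~ (L(w + h*d) - 2*L(w) + L(w - h*d)) / h^2."""
    rng = np.random.default_rng(seed)
    l0 = loss_fn(w)
    mu = np.inf
    for _ in range(n_dirs):
        d = rng.normal(size=w.shape)
        d /= np.linalg.norm(d)
        mu = min(mu, (loss_fn(w + h * d) - 2 * l0 + loss_fn(w - h * d)) / h ** 2)
    return mu

def flatness_guided_search(configs, fit_fn, n_trials=20, seed=0):
    """fit_fn(cfg) must return (loss_fn, trained_weights); the sampled
    configuration with the smallest curvature proxy is returned."""
    rng = np.random.default_rng(seed)
    chosen = rng.permutation(len(configs))[:n_trials]
    scored = [(strong_convexity_proxy(*fit_fn(configs[i])), configs[i]) for i in chosen]
    return min(scored, key=lambda t: t[0])[1]
```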


SMOOTHIE: A Theory of Hyper-parameter Optimization for Software Analytics

arXiv.org Artificial Intelligence

Hyper-parameter optimization is the black art of tuning a learner's control parameters. In software analytics, a repeated result is that such tuning can result in dramatic performance improvements. Despite this, hyper-parameter optimization is often applied rarely or poorly in software analytics, perhaps because the CPU cost of exploring all those parameter options can be prohibitive. We theorize that learners generalize better when the loss landscape is "smooth". This theory is useful since the influence of different hyper-parameter choices on "smoothness" can be tested very quickly (e.g., for a deep learner, after just one epoch). To test this theory, this paper implements and tests SMOOTHIE, a novel hyper-parameter optimizer that guides its optimization via considerations of "smoothness". The experiments of this paper test SMOOTHIE on numerous SE tasks including (a) GitHub issue lifetime prediction; (b) detecting false alarms in static code warnings; (c) defect prediction; and (d) a set of standard ML datasets. In all these experiments, SMOOTHIE outperformed state-of-the-art optimizers. Better yet, SMOOTHIE ran 300% faster than the prior state of the art. We hence conclude that this theory (that hyper-parameter optimization is best viewed as a "smoothing" function for the decision landscape) is both theoretically interesting and practically very useful. To support open science and other researchers working in this area, all our scripts and datasets are available on-line at https://github.com/yrahul3910/smoothness-hpo/.
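A schematic of the one-epoch screening idea is sketched below. It is not the released tool (the authors' scripts are at https://github.com/yrahul3910/smoothness-hpo/), and the smoothness proxy used here, the dispersion of mini-batch losses seen during the first epoch, is a stand-in assumption rather than the paper's measure.

```python
# Sketch only: score each candidate configuration after a single epoch and
# keep the one whose loss landscape looks smoothest under a cheap proxy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import log_loss

def one_epoch_roughness(cfg, X, y, batch_size=64, seed=0):
    """Train for one epoch with the given hyper-parameters; return the
    standard deviation of per-batch losses (lower = smoother here)."""
    rng = np.random.default_rng(seed)
    clf = SGDClassifier(loss="log_loss", alpha=cfg["alpha"], eta0=cfg["eta0"],
                        learning_rate="constant", random_state=seed)
    order, losses, classes = rng.permutation(len(X)), [], np.unique(y)
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        clf.partial_fit(X[idx], y[idx], classes=classes)
        losses.append(log_loss(y[idx], clf.predict_proba(X[idx]), labels=classes))
    return float(np.std(losses))

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
configs = [{"alpha": a, "eta0": e} for a in (1e-4, 1e-2) for e in (0.01, 0.1)]
print("smoothest configuration:", min(configs, key=lambda c: one_epoch_roughness(c, X, y)))
```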


How to Find Actionable Static Analysis Warnings: A Case Study with FindBugs

arXiv.org Artificial Intelligence

Automatically generated static code warnings suffer from a large number of false alarms. Hence, developers only act on a small percentage of those warnings. To better predict which static code warnings should not be ignored, we suggest that analysts need to look deeper into their algorithms to find choices that better suit the particulars of their specific problem. Specifically, we show here that effective predictors of such warnings can be created by methods that locally adjust the decision boundary (between actionable warnings and others). These methods yield a new high-water mark for recognizing actionable static code warnings. For eight open-source Java projects (cassandra, jmeter, commons, lucene-solr, maven, ant, tomcat, derby) we achieve perfect test results on 4/8 datasets and, overall, a median AUC (area under the true-negative/true-positive curve) of 92%.
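As a purely generic illustration of "locally adjusting the decision boundary" (and not a reproduction of the paper's methods), the sketch below oversamples low-margin training examples, the points the current model is least sure about, before refitting, which nudges the boundary toward getting those borderline cases right.

```python
# Sketch only: duplicate training points near the current decision boundary
# and refit, as one simple way to adjust the boundary locally.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, weights=[0.8, 0.2],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

base = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Borderline points: predicted probability close to 0.5 (low margin).
margin = np.abs(base.predict_proba(X_tr)[:, 1] - 0.5)
borderline = margin < 0.1
X_aug = np.vstack([X_tr, np.repeat(X_tr[borderline], 3, axis=0)])
y_aug = np.concatenate([y_tr, np.repeat(y_tr[borderline], 3)])

adjusted = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
print("before:", base.score(X_te, y_te), "after:", adjusted.score(X_te, y_te))
```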


Partitioning Cloud-based Microservices (via Deep Learning)

arXiv.org Machine Learning

Cloud-based software has many advantages. When services are divided into many independent components, they are easier to update. Also, during peak demand, it is easier to scale cloud services (just hire more CPUs). Hence, many organizations are partitioning their monolithic enterprise applications into cloud-based microservices. Recently there has been much work using machine learning to simplify this partitioning task. Despite much research, no single partitioning method can be recommended as generally useful. More specifically, those prior solutions are "brittle"; i.e., if they work well for one kind of goal in one dataset, then they can be sub-optimal if applied to many datasets and multiple goals. In order to find a generally useful partitioning method, we propose DEEPLY. This new algorithm extends the CO-GCN deep learning partition generator with (a) a novel loss function and (b) some hyper-parameter optimization. As shown by our experiments, DEEPLY generally outperforms prior work (including CO-GCN and others) across multiple datasets and goals. To the best of our knowledge, this is the first report in SE of such stable hyper-parameter optimization. To aid reuse of this work, DEEPLY is available on-line at https://bit.ly/2WhfFlB.


When SIMPLE is better than complex: A case study on deep learning for predicting Bugzilla issue close time

arXiv.org Artificial Intelligence

Is deep learning over-hyped? Where are the case studies that compare state-of-the-art deep learners with simpler options? In response to this gap in the literature, this paper offers one case study on using deep learning to predict issue close time in Bugzilla. We report here that a SIMPLE extension to a decades-old feedforward neural network works better than the more recent, and more elaborate, "long short-term memory" deep learners (which are currently popular in the SE literature). SIMPLE is a combination of a fast feedforward network and a hyper-parameter optimizer. SIMPLE runs in 3 seconds while the newer algorithms take 6 hours to terminate. Since it runs so fast, it is more amenable to being tuned by our optimizer. This paper reports results seen after running SIMPLE on issue close time data from 45,364 issues raised in the Chromium, Eclipse, and Firefox projects from January 2010 to March 2016. In our experiments, this SIMPLEr tuning approach achieves significantly better predictors for issue close time than the more complex deep learner. These better and SIMPLEr results can be generated 2,700 times faster than if using a state-of-the-art deep learner. From this result, we make two conclusions. Firstly, for predicting issue close time, we would recommend SIMPLE over complex deep learners. Secondly, before analysts try very sophisticated (but very slow) algorithms, they might achieve better results, much sooner, by applying hyper-parameter optimization to simple (but very fast) algorithms.
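The SIMPLE recipe above, a fast feedforward network wrapped in a hyper-parameter optimizer, can be sketched with off-the-shelf components. The scikit-learn stand-ins, the search space, and the random search below are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch only: a small feedforward network tuned by randomized search.
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic data stands in for the Bugzilla issue close time datasets.
X, y = make_classification(n_samples=3000, n_features=30, n_classes=3,
                           n_informative=10, random_state=0)

search = RandomizedSearchCV(
    MLPClassifier(max_iter=200, random_state=0),
    param_distributions={
        "hidden_layer_sizes": [(16,), (32,), (64,), (32, 16)],
        "learning_rate_init": loguniform(1e-4, 1e-1),
        "alpha": loguniform(1e-6, 1e-2),
    },
    n_iter=20, cv=3, scoring="f1_macro", random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```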


Parsimonious Computing: A Minority Training Regime for Effective Prediction in Large Microarray Expression Data Sets

arXiv.org Machine Learning

Rigorous mathematical investigation of the learning rates used in back-propagation in shallow neural networks has become a necessity, because experimental evidence needs to be backed by a theoretical foundation. Such theory may help reduce the volume of experimental effort needed to accomplish the desired results. We leverage the fact that the mean squared error loss is Lipschitz continuous to compute the learning rate in shallow neural networks. We claim that our approach reduces tuning effort, especially when a significant corpus of data has to be handled. We achieve a remarkable saving in computational cost while surpassing the prediction accuracy reported in the literature. The learning rate proposed here is the inverse of the Lipschitz constant. The work results in a novel method for carrying out gene expression inference on large microarray data sets with a shallow architecture constrained by limited computing resources. A combination of random sub-sampling of the dataset, an adaptive, Lipschitz-constant-inspired learning rate, and a new activation function, A-ReLU, helped accomplish the results reported in the paper.
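To make the core idea concrete, the sketch below sets the learning rate to the inverse of the Lipschitz constant of the MSE gradient, worked out for the simplest possible case: a linear (single-layer) model. The paper derives the corresponding quantities for shallow networks with the A-ReLU activation; the closed form here covers only this toy setting.

```python
# Sketch only: gradient descent on MSE with step size 1/L, where L is the
# Lipschitz constant of the gradient (so no manual learning-rate tuning).
import numpy as np

def lipschitz_learning_rate(X):
    """For L(w) = (1/n) * ||Xw - y||^2 the gradient's Lipschitz constant is
    2 * lambda_max(X^T X) / n; return its inverse as the step size."""
    n = X.shape[0]
    lip = 2.0 * np.linalg.eigvalsh(X.T @ X).max() / n
    return 1.0 / lip

def gradient_descent_mse(X, y, n_steps=500):
    w = np.zeros(X.shape[1])
    eta = lipschitz_learning_rate(X)
    for _ in range(n_steps):
        grad = (2.0 / X.shape[0]) * X.T @ (X @ w - y)
        w -= eta * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=200)
print("recovery error:", np.linalg.norm(gradient_descent_mse(X, y) - w_true))
```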


Evolution of Novel Activation Functions in Neural Network Training with Applications to Classification of Exoplanets

arXiv.org Machine Learning

Neural networks, although a powerful engine for supervised methods, often require expensive tuning efforts for optimized performance. Habitability classes are hard to discriminate, especially when attributes used as hard markers of separation are removed from the data set. We approach the solution by investigating the analytical properties of the proposed activation functions. The theory of ordinary differential equations and fixed points is exploited to justify the "lack of tuning effort" needed to achieve optimal performance compared to traditional activation functions. Additionally, the relationship between the proposed activation functions and the more popular ones is established through extensive analytical and empirical evidence. Finally, the activation functions have been implemented in a plain vanilla feed-forward neural network to classify exoplanets.


A novel adaptive learning rate scheduler for deep neural networks

arXiv.org Machine Learning

Deep learning [8] is becoming ubiquitous across several tasks, including image recognition [22, 29], face recognition [30], and object detection [6]. At the same time, the trend is towards deeper neural networks [12, 9]. Deep convolutional neural networks [15, 16] are a variant that introduces convolutional and pooling layers, and they have seen incredible success in image classification [21, 33], even surpassing human-level performance [9]. Very deep convolutional neural networks have even crossed 1000 layers [11]. Despite their popularity, training neural networks is made difficult by several problems, including vanishing and exploding gradients [7, 3] and overfitting.


Employee Attrition Prediction

arXiv.org Machine Learning

We aim to predict whether an employee of a company will leave or not, using the k-Nearest Neighbors algorithm. We use employee performance evaluations, average monthly hours at work, and number of years spent in the company, among others, as our features. Other approaches to this problem include the use of ANNs, decision trees, and logistic regression. The dataset was split 70/30 between training and testing, and the algorithm achieved an accuracy of 94.32%.
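A minimal sketch of this pipeline, a 70/30 train/test split feeding a k-nearest-neighbours classifier, is shown below. Synthetic features stand in for the HR attributes (performance evaluation, monthly hours, tenure, and so on), so the printed accuracy will differ from the paper's 94.32%.

```python
# Sketch only: 70/30 split + kNN on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=5000, n_features=8, n_informative=5,
                           weights=[0.76, 0.24], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Scaling matters for distance-based learners such as kNN.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```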