

Algorithm performance


35th Conference on Neural Information Processing Systems (NeurIPS 2021).

Neural Information Processing Systems

… research. … yield the conclusion that J outperforms K, whereas searching another can entail the opposite. In short, the way we choose hyperparameters can deceive us. We provide a theoretical complement to this prior work, arguing that, to avoid such deception, the process of drawing conclusions from HPO should be made more rigorous. We call this process epistemic hyperparameter optimization (EHPO), and put forth a logical framework to capture its semantics and how it can lead to inconsistent conclusions about performance. Our framework enables us to prove EHPO methods that are guaranteed to be defended against deception, given bounded … We demonstrate our framework's utility by proving and …
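The claim that searching one hyperparameter subspace can favor J while searching another favors K can be illustrated with a toy example. Both loss curves below are invented for illustration (they are not from the paper), and "searching" is a plain grid search over log10 of a hypothetical learning rate; the function names are likewise hypothetical.

```python
# Toy illustration (hypothetical losses, not from the paper): the conclusion
# "J beats K" or "K beats J" depends on which hyperparameter subspace is searched.

def loss_j(log_lr: float) -> float:
    """Hypothetical validation loss of algorithm J; minimal at log10(lr) = -3.5."""
    return (log_lr + 3.5) ** 2

def loss_k(log_lr: float) -> float:
    """Hypothetical validation loss of algorithm K; minimal at log10(lr) = -1.5."""
    return (log_lr + 1.5) ** 2 + 0.1

def grid(lo: float, hi: float, n: int = 5) -> list[float]:
    """n evenly spaced values of log10(lr) in [lo, hi]."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def best_loss(loss, subspace) -> float:
    """Grid search: the lowest loss an algorithm reaches inside the subspace."""
    return min(loss(x) for x in subspace)

def conclusion(subspace) -> str:
    """Compare J and K by their best loss within the searched subspace."""
    j, k = best_loss(loss_j, subspace), best_loss(loss_k, subspace)
    return "J outperforms K" if j < k else "K outperforms J"

low = grid(-4.0, -3.0)   # small learning rates, 1e-4 to 1e-3
high = grid(-2.0, -1.0)  # large learning rates, 1e-2 to 1e-1

print("low-lr subspace: ", conclusion(low))
print("high-lr subspace:", conclusion(high))
```

Each search is internally sound, yet the two reach opposite conclusions, which is the kind of inconsistency EHPO is meant to reason about.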




Fifteen Years of Child-Centered Long-Form Recordings: Promises, Resources, and Remaining Challenges to Validity

arXiv.org Artificial Intelligence

Audio-recordings collected with a child-worn device are a fundamental tool in child language research. Long-form recordings collected over whole days promise to capture children's input and production with minimal observer bias, and therefore high validity. The sheer volume of resulting data necessitates automated analysis to extract relevant metrics for researchers and clinicians. This paper summarizes collective knowledge on this technique, providing entry points to existing resources. We also highlight various sources of error that threaten the accuracy of automated annotations and the interpretation of resulting metrics. To address this, we propose potential troubleshooting metrics to help users assess data quality. While a fully automated quality control system is not feasible, we outline practical strategies for researchers to improve data collection and contextualize their analyses.


Enhance the machine learning algorithm performance in phishing detection with keyword features

arXiv.org Artificial Intelligence

Recently, phishing attacks on the Internet have increased significantly. In a typical phishing attack, the attacker sets up a malicious website that looks similar to a legitimate website in order to obtain end-users' information. This can lead to leakage of sensitive information and financial loss for end-users. To avoid such attacks, early detection of these websites' URLs is vital. Previous researchers have proposed many machine learning algorithms to distinguish phishing URLs from legitimate ones. In this paper, we aim to enhance these machine learning algorithms from the perspective of feature selection. We propose a novel method that incorporates keyword features alongside the traditional features. We apply this method to multiple traditional machine learning algorithms, and the experimental results show that it is effective. On average, the method reduces the classification error by 30% on the large dataset, and the improvement is even more pronounced on the small dataset. In addition, the method extracts information from the URL alone and does not rely on additional information provided by third-party services. The best machine learning algorithm using our proposed method achieves an accuracy of 99.68%.
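A minimal sketch of what combining keyword features with traditional URL features could look like. The keyword list, the particular traditional features, and all function names are hypothetical; the abstract does not specify the paper's actual feature set. Everything is computed from the URL string alone, so no third-party lookup is involved.

```python
from urllib.parse import urlparse

# Hypothetical keyword list; the paper's actual keywords are not given in the abstract.
KEYWORDS = ["login", "verify", "account", "secure", "update", "bank", "signin"]

def traditional_features(url: str) -> list[float]:
    """A few common lexical URL features (illustrative, not the paper's exact set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # lower-cased host without port or user info
    return [
        float(len(url)),                          # URL length
        float(host.count(".")),                   # dots in the host name
        float("@" in url),                        # '@' can hide the real destination
        float(host.replace(".", "").isdigit()),   # crude check for a raw IPv4 host
        float(parsed.scheme == "https"),          # uses HTTPS
    ]

def keyword_features(url: str) -> list[float]:
    """One binary indicator per keyword: does it occur in the lower-cased URL?"""
    lowered = url.lower()
    return [float(k in lowered) for k in KEYWORDS]

def url_features(url: str) -> list[float]:
    """Concatenate traditional and keyword features into one vector."""
    return traditional_features(url) + keyword_features(url)

vec = url_features("http://192.168.0.1/secure-login/verify.php")
print(vec)
```

The resulting vector can be passed to any standard classifier; the design choice illustrated is simply appending keyword indicators to an existing feature vector rather than replacing it.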


Meta-learning optimizes predictions of missing links in real-world networks

arXiv.org Artificial Intelligence

Relational data are ubiquitous in real-world data applications, e.g., in social network analysis or biological modeling, but networks are nearly always incompletely observed. The state of the art for predicting missing links in the hard case of a network without node attributes uses model stacking or neural network techniques. It remains unknown which approach is best, and whether or how the best choice of algorithm depends on the input network's characteristics. We answer these questions systematically using a large, structurally diverse benchmark of 550 real-world networks under two standard accuracy measures (AUC and Top-k), comparing four stacking algorithms with 42 topological link predictors, two of which we introduce here, and two graph neural network algorithms. We show that no algorithm is best across all input networks, all algorithms perform well on most social networks, and few perform well on economic and biological networks. Overall, model stacking with a random forest is highly scalable, surpasses graph neural networks on AUC, and is competitive with them on Top-k accuracy. However, algorithm performance depends strongly on network characteristics like the degree distribution, triangle density, and degree assortativity. We introduce a meta-learning algorithm that exploits this variability to optimize link predictions for individual networks by selecting the best algorithm to apply, which we show outperforms all state-of-the-art algorithms and scales to large networks.
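The idea of selecting a link-prediction algorithm from network characteristics can be sketched as follows. This is an illustration under stated assumptions, not the authors' meta-learner: it computes four of the characteristics the abstract names (mean degree, degree coefficient of variation as a summary of the degree distribution, transitivity as triangle density, and degree assortativity) and substitutes a plain 1-nearest-neighbor rule for the paper's meta-model. The training labels `algo_A` and `algo_B` are hypothetical placeholders, and features are left unscaled for brevity.

```python
import math
from itertools import combinations

def adjacency(edges):
    """Undirected simple graph as a dict of neighbor sets (self-loops dropped)."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj

def characteristics(edges):
    """[mean degree, degree CV, transitivity, degree assortativity]; assumes >= 1 edge."""
    adj = adjacency(edges)
    deg = {u: len(nb) for u, nb in adj.items()}
    ds = list(deg.values())
    mean = sum(ds) / len(ds)
    cv = math.sqrt(sum((d - mean) ** 2 for d in ds) / len(ds)) / mean
    # Transitivity: closed neighbor pairs / all neighbor pairs (each triangle seen 3 times).
    closed = sum(1 for u in adj for a, b in combinations(adj[u], 2) if b in adj[a])
    triples = sum(d * (d - 1) // 2 for d in ds)
    transitivity = closed / triples if triples else 0.0
    # Assortativity: Pearson correlation of degrees at both ends of every edge,
    # listing each edge in both directions (so both ends share the same mean).
    xs = [deg[u] for u in adj for v in adj[u]]
    ys = [deg[v] for u in adj for v in adj[u]]
    mx = sum(xs) / len(xs)
    var = sum((x - mx) ** 2 for x in xs) / len(xs)
    cov = sum((x - mx) * (y - mx) for x, y in zip(xs, ys)) / len(xs)
    assort = cov / var if var else 0.0  # undefined for regular graphs; 0 by convention here
    return [mean, cv, transitivity, assort]

def select_algorithm(target_edges, training):
    """1-nearest-neighbor meta-learner: reuse the label of the most similar network."""
    f = characteristics(target_edges)
    _, label = min(training, key=lambda item: math.dist(f, characteristics(item[0])))
    return label

def clique(n):
    return list(combinations(range(n), 2))

def star(leaves):
    return [(0, i) for i in range(1, leaves + 1)]

# Hypothetical training set: each label stands for "the predictor that scored best here".
training = [(clique(5), "algo_A"), (star(6), "algo_B")]
print(select_algorithm(clique(4), training))  # algo_A
print(select_algorithm(star(5), training))    # algo_B
```

In practice the characteristics would need scaling, and the meta-model would be trained on many benchmark networks rather than two.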


Instance space analysis of the capacitated vehicle routing problem

arXiv.org Artificial Intelligence

These features aim to answer the question: "What is the general shape of the problem, and how does the arrangement of the nodes influence its optimization difficulty?" The following features are used to analyze the geometry of the problem:
G1: Area of the enclosing rectangle [11], [12]
G2: Convex hull area [12], [15]
G3: Ratio of points on the hull [12], [15]
G4: Distance of enclosed points to the convex hull contour [12]
G5: Edge lengths of the convex hull

5) Nearest neighborhood (NN) features: This category of features captures the relationships between each node and its nearest neighbors, reflecting the local search-space structure and node connectivity. Unlike MST or geometric features, which focus on the overall structure, NN features focus on the relationships between a node and its closest nodes. These features aim to answer the question: "How are nodes connected to each other in their immediate neighborhoods, and how do these connections influence the optimization difficulty?" The following features are used to describe the neighborhood of each node:
NN1: Distance to 1st NN [11], [15]
NN2: Number of strongly connected components [16]
NN3: Number of weakly connected components [16]
NN4: Size of strongly connected components [16]
NN5: Size of weakly connected components [16]
NN6: Node input degree in directed kNN graph [16]
NN7: Ratio of number of strongly and weakly connected components [16]
NN8: Angles between a node and its two nearest neighbor nodes [12]

6) VRP-specific features: This category of features is composed of values obtained directly from the parameter values of the VRP instances, such as vehicle capacity and customer demands.
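Several of the geometric and NN features listed above can be computed directly from node coordinates. The sketch below covers G1, G2, G3, G5 and NN1, NN2, NN3, NN6, NN7 under our own assumptions, which may differ from the cited works: Euclidean distances; a convex hull from Andrew's monotone chain that drops collinear boundary points (so G3 counts hull vertices); a directed kNN graph with k = 2 and ties broken by node index; and per-node values aggregated to a mean or maximum. Components are found by pairwise reachability, which is quadratic and only meant for small instances. All function and key names are hypothetical.

```python
import math

def convex_hull(pts):
    """Andrew's monotone chain; CCW hull vertices, collinear boundary points dropped."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def geometric_features(pts):
    """G1, G2, G3 and G5 (summarized as a mean edge length)."""
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    hull = convex_hull(pts)
    edges = list(zip(hull, hull[1:] + hull[:1]))
    return {
        "G1_rect_area": (max(xs) - min(xs)) * (max(ys) - min(ys)),
        "G2_hull_area": abs(sum(a[0] * b[1] - b[0] * a[1] for a, b in edges)) / 2,  # shoelace
        "G3_hull_ratio": len(hull) / len(pts),
        "G5_mean_hull_edge": sum(math.dist(a, b) for a, b in edges) / len(edges),
    }

def reachable(adj, s):
    """All nodes reachable from s by iterative DFS, including s itself."""
    seen, stack = {s}, [s]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def nn_features(pts, k=2):
    """NN1, NN2, NN3, NN6 and NN7 on a directed kNN graph (requires k >= 1)."""
    n = len(pts)
    # Stable sort by distance, so equal distances keep ascending node index.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: math.dist(pts[i], pts[j]))[:k]
           for i in range(n)]
    undirected = [set(knn[i]) | {j for j in range(n) if i in knn[j]} for i in range(n)]
    reach = [reachable(knn, i) for i in range(n)]
    # i and j share a strongly connected component iff each reaches the other.
    sccs = {frozenset(j for j in reach[i] if i in reach[j]) for i in range(n)}
    wccs = {frozenset(reachable(undirected, i)) for i in range(n)}
    indegree = [sum(i in knn[j] for j in range(n)) for i in range(n)]
    return {
        "NN1_mean_dist": sum(math.dist(pts[i], pts[knn[i][0]]) for i in range(n)) / n,
        "NN2_num_scc": len(sccs),
        "NN3_num_wcc": len(wccs),
        "NN6_max_indegree": max(indegree),
        "NN7_scc_wcc_ratio": len(sccs) / len(wccs),
    }

# Two well-separated unit squares (hypothetical customer coordinates).
PTS = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
print(geometric_features(PTS))
print(nn_features(PTS, k=2))
```

On this toy instance the kNN graph splits into one strongly connected cluster per square, so NN2 and NN3 both report two components.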


Tracing the Interactions of Modular CMA-ES Configurations Across Problem Landscapes

arXiv.org Artificial Intelligence

This paper leverages the recently introduced concept of algorithm footprints to investigate the interplay between algorithm configurations and problem characteristics. Performance footprints are calculated for six modular variants of the CMA-ES algorithm (modCMA), evaluated on 24 benchmark problems from the BBOB suite in two dimensionality settings: 5 and 30 dimensions. These footprints provide insights into why different configurations of the same algorithm exhibit varying performance and identify the problem features influencing these outcomes. Our analysis uncovers shared behavioral patterns across configurations due to common interactions with problem properties, as well as distinct behaviors on the same problem driven by differing problem features. The results demonstrate the effectiveness of algorithm footprints in enhancing interpretability and guiding configuration choices.


Customized Exploration of Landscape Features Driving Multi-Objective Combinatorial Optimization Performance

arXiv.org Artificial Intelligence

We present an analysis of landscape features for predicting the performance of multi-objective combinatorial optimization algorithms. We consider features from the recently proposed compressed Pareto Local Optimal Solutions Networks (C-PLOS-net) model of combinatorial landscapes. The benchmark instances are a set of rmnk-landscapes with 2 and 3 objectives and various levels of ruggedness and objective correlation. We consider the performance of three algorithms, namely Pareto Local Search (PLS), Global Simple EMO Optimizer (GSEMO), and Non-dominated Sorting Genetic Algorithm (NSGA-II), measured with the resolution and hypervolume metrics. Our tailored analysis reveals feature combinations that influence algorithm performance on specific landscapes. This study provides deeper insights into feature importance, tailored to specific rmnk-landscapes and algorithms.
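The hypervolume metric mentioned above can be sketched for the two-objective case. This assumes both objectives are maximized and that only solutions strictly better than the reference point count; the reference point and any normalization used in the paper are not stated in the abstract, the three-objective case is not covered, and the resolution metric is not shown. Function names are hypothetical.

```python
def dominates(q, p):
    """q dominates p: at least as good in both objectives (maximized) and not identical."""
    return q[0] >= p[0] and q[1] >= p[1] and q != p

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def hypervolume_2d(points, ref):
    """Area dominated by the front and bounded below by the reference point ref."""
    better = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    front = sorted(set(pareto_front(better)), key=lambda p: p[0], reverse=True)
    area, prev_y = 0.0, ref[1]
    for x, y in front:  # x decreases, so y increases along a non-dominated front
        area += (x - ref[0]) * (y - prev_y)  # horizontal slice between prev_y and y
        prev_y = y
    return area

PTS = [(3, 1), (2, 2), (1, 3), (1, 1)]
print(pareto_front(PTS), hypervolume_2d(PTS, (0, 0)))
```

Here (1, 1) is dominated by (2, 2), and the union of the three remaining rectangles has area 3 + 2 + 1 = 6.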


The Pitfalls of Benchmarking in Algorithm Selection: What We Are Getting Wrong

arXiv.org Artificial Intelligence

Algorithm selection, aiming to identify the best algorithm for a given problem, plays a pivotal role in continuous black-box optimization. A common approach involves representing optimization functions using a set of features, which are then used to train a machine learning meta-model for selecting suitable algorithms. Various approaches have demonstrated the effectiveness of these algorithm selection meta-models. However, not all evaluation approaches are equally valid for assessing the performance of meta-models. We highlight methodological issues that frequently occur in the community and should be addressed when evaluating algorithm selection approaches. First, we identify flaws with the "leave-instance-out" evaluation technique. We show that non-informative features and meta-models can achieve high accuracy, which should not be the case with a well-designed evaluation framework. Second, we demonstrate that measuring the performance of optimization algorithms with metrics sensitive to the scale of the objective function requires careful consideration of how this impacts the construction of the meta-model, its predictions, and the model's error. Such metrics can falsely present overly optimistic performance assessments of the meta-models. This paper emphasizes the importance of careful evaluation, as loosely defined methodologies can mislead researchers, divert efforts, and introduce noise into the field.
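A synthetic sketch of the leave-instance-out pitfall: when other instances of the same problem remain in the training set, a meta-model can score well just by recognizing which problem it is looking at. Everything here is an assumption for illustration, not the paper's experiment: 24 problems (mirroring the 24 BBOB functions) with 5 instances each, a single feature that only encodes problem identity, arbitrary "best algorithm" labels, and a 1-nearest-neighbor rule standing in for the meta-model.

```python
import math

N_PROBLEMS, N_INSTANCES = 24, 5  # assumed sizes: 24 problems (as in BBOB), 5 instances each

# Sample = (features, problem_id, best_algorithm_label). The lone feature only encodes
# which problem the sample came from (plus a tiny per-instance offset); the label is an
# arbitrary function of the problem, so the feature carries no transferable information.
data = [((p + 0.01 * i,), p, p % 3) for p in range(N_PROBLEMS) for i in range(N_INSTANCES)]

def one_nn(train, x):
    """Label of the training sample closest to x (a stand-in for any meta-model)."""
    return min(train, key=lambda s: math.dist(s[0], x))[2]

def accuracy(hidden_when):
    """Leave-out evaluation: hidden_when(j, k) says whether sample j is hidden while testing k."""
    hits = 0
    for k, (x, _, label) in enumerate(data):
        train = [s for j, s in enumerate(data) if not hidden_when(j, k)]
        hits += one_nn(train, x) == label
    return hits / len(data)

# Leave-instance-out: only the test instance is hidden; its sibling instances stay in training.
lio = accuracy(lambda j, k: j == k)
# Leave-problem-out: every instance of the test problem is hidden.
lpo = accuracy(lambda j, k: data[j][1] == data[k][1])
print(f"leave-instance-out: {lio:.2f}, leave-problem-out: {lpo:.2f}")
```

Leave-instance-out reports perfect accuracy while leave-problem-out reports none, so the first number says nothing about how the meta-model would generalize to unseen problems.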