Hadoop


A Benchmark Dataset for Graph Regression with Homogeneous and Multi-Relational Variants

Samoaa, Peter, Vukojevic, Marcus, Chehreghani, Morteza Haghir, Longa, Antonio

arXiv.org Artificial Intelligence

Graph-level regression underpins many real-world applications, yet public benchmarks remain heavily skewed toward molecular graphs and citation networks. This limited diversity hinders progress on models that must generalize across both homogeneous and heterogeneous graph structures. We introduce RelSC, a new graph-regression dataset built from program graphs that combine syntactic and semantic information extracted from source code. Each graph is labelled with the execution-time cost of the corresponding program, providing a continuous target variable that differs markedly from those found in existing benchmarks. RelSC is released in two complementary variants. RelSC-H supplies rich node features under a single (homogeneous) edge type, while RelSC-M preserves the original multi-relational structure, connecting nodes through multiple edge types that encode distinct semantic relationships. Together, these variants let researchers probe how representation choice influences model behaviour. We evaluate a diverse set of graph neural network architectures on both variants of RelSC. The results reveal consistent performance differences between the homogeneous and multi-relational settings, emphasising the importance of structural representation. These findings demonstrate RelSC's value as a challenging and versatile benchmark for advancing graph regression methods.
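The homogeneous-versus-multi-relational distinction the abstract describes can be sketched in plain NumPy. Everything below is an illustrative assumption, not the RelSC baselines: the edge-type names ("ast", "dataflow"), the tiny graph, and the random weights are hypothetical, and the relational layer is a generic R-GCN-style sum over edge types.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy program graph: 4 nodes with 3-dimensional features.
X = rng.normal(size=(4, 3))

# RelSC-H style: a single (homogeneous) edge type in one adjacency matrix.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
W = rng.normal(size=(3, 3))

def homogeneous_layer(A, X, W):
    """One mean-aggregation message-passing step over a single edge type."""
    deg = A.sum(axis=1, keepdims=True)
    return np.maximum((A @ X / deg) @ W, 0.0)  # ReLU

# RelSC-M style: one adjacency and one weight matrix per edge type, summed
# per node as in an R-GCN-style layer ("ast" and "dataflow" are hypothetical
# edge-type names, not RelSC's actual relations).
A_rel = {"ast": A, "dataflow": np.eye(4)}
W_rel = {r: rng.normal(size=(3, 3)) for r in A_rel}

def relational_layer(A_rel, X, W_rel):
    """One message-passing step that keeps edge types separate."""
    out = np.zeros_like(X)
    for r, Ar in A_rel.items():
        deg = np.maximum(Ar.sum(axis=1, keepdims=True), 1.0)
        out += (Ar @ X / deg) @ W_rel[r]
    return np.maximum(out, 0.0)

h_homog = homogeneous_layer(A, X, W)
h_rel = relational_layer(A_rel, X, W_rel)

# Graph-level regression readout: mean-pool node embeddings, linear head.
w_out = rng.normal(size=3)
prediction = float(h_rel.mean(axis=0) @ w_out)  # predicted execution cost
print(prediction)
```

The only structural difference between the two variants is whether aggregation is shared across all edges or parameterized per relation, which is exactly the representation choice the benchmark is designed to probe.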


OneLog: Towards End-to-End Training in Software Log Anomaly Detection

Hashemi, Shayan, Mäntylä, Mika

arXiv.org Artificial Intelligence

With the growth of online services, IoT devices, and DevOps-oriented software development, software log anomaly detection is becoming increasingly important. Prior works mainly follow a traditional four-stage architecture (Preprocessor, Parser, Vectorizer, and Classifier). This paper proposes OneLog, which utilizes a single Deep Neural Network (DNN) instead of multiple separate components. OneLog harnesses Convolutional Neural Networks (CNN) at the character level to take digits, numbers, and punctuation, which were removed in prior works, into account alongside the main natural language text. We evaluate our approach on six message- and sequence-based datasets: HDFS, Hadoop, BGL, Thunderbird, Spirit, and Liberty. We experiment with OneLog in single-, multi-, and cross-project setups. OneLog offers state-of-the-art performance on our datasets. OneLog can utilize multi-project datasets simultaneously during training, which suggests our model can generalize between datasets. Multi-project training also improves OneLog's performance, making it ideal when limited training data is available for an individual project. We also found that cross-project anomaly detection is possible with a single project pair (Liberty and Spirit). Analysis of model internals shows that OneLog has multiple modes of detecting anomalies and that the model learns manually validated parsing rules for the log messages. We conclude that character-based CNNs are a promising approach toward end-to-end learning in log anomaly detection. They offer good performance and generalization over multiple datasets. We will make our scripts publicly available upon the acceptance of this paper.
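The character-level convolution idea can be sketched in plain NumPy: encode a raw log line (digits and punctuation included) as one-hot character vectors, slide learned filters over it, and pool to a score. The vocabulary, filter count, filter width, and pooling below are illustrative assumptions, not OneLog's actual architecture, and the filters are random rather than trained.

```python
import numpy as np

# Character vocabulary that keeps digits and punctuation, which the
# OneLog approach retains instead of stripping during preprocessing.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789.:/-_ "
CHAR_TO_ID = {c: i for i, c in enumerate(VOCAB)}

def one_hot(message):
    """Encode a raw log message as a (length, |vocab|) one-hot matrix."""
    m = np.zeros((len(message), len(VOCAB)))
    for pos, ch in enumerate(message.lower()):
        if ch in CHAR_TO_ID:
            m[pos, CHAR_TO_ID[ch]] = 1.0
    return m

def conv1d(x, filters, width=3):
    """Valid 1-D convolution over character positions, one column per filter."""
    n_pos = x.shape[0] - width + 1
    out = np.zeros((n_pos, filters.shape[0]))
    for i in range(n_pos):
        window = x[i:i + width].ravel()
        out[i] = filters @ window
    return np.maximum(out, 0.0)  # ReLU

rng = np.random.default_rng(0)
filters = rng.normal(size=(8, 3 * len(VOCAB)))  # 8 filters of width 3

# Hypothetical HDFS-style log line; note the digits and punctuation survive.
msg = "081109 203615 148 info dfs.datanode: received block blk_-1608"
features = conv1d(one_hot(msg), filters)
score = features.max(axis=0).mean()  # global max-pool, then average
print(round(score, 3))
```

A trained classifier head on top of such pooled features would produce the anomaly decision; the point of the sketch is only that no parser or vectorizer stage sits between the raw message and the network.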


An Empirical Study on Log-based Anomaly Detection Using Machine Learning

Ali, Shan, Boufaied, Chaima, Bianculli, Domenico, Branco, Paula, Briand, Lionel, Aschbacher, Nathan

arXiv.org Artificial Intelligence

The growth of systems complexity increases the need for automated techniques dedicated to different log analysis tasks such as Log-based Anomaly Detection (LAD). The latter has been widely addressed in the literature, mostly by means of different deep learning techniques. Nevertheless, the focus on deep learning techniques results in less attention being paid to traditional Machine Learning (ML) techniques, which may perform well in many cases, depending on the context and the datasets used. Further, the evaluation of different ML techniques is mostly based on the assessment of their detection accuracy. However, this is not enough to decide whether or not a specific ML technique is suitable to address the LAD problem. Other aspects to consider include the training and prediction time as well as the sensitivity to hyperparameter tuning. In this paper, we present a comprehensive empirical study in which we evaluate different supervised and semi-supervised, traditional and deep ML techniques with respect to four evaluation criteria: detection accuracy, time performance, and the sensitivity of both detection accuracy and time performance to hyperparameter tuning. The experimental results show that supervised traditional and deep ML techniques perform very closely in terms of their detection accuracy and prediction time. Moreover, the overall evaluation of the sensitivity of the detection accuracy of the different ML techniques to hyperparameter tuning shows that supervised traditional ML techniques are less sensitive to hyperparameter tuning than deep learning techniques. Further, semi-supervised techniques yield significantly worse detection accuracy than supervised techniques.
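The kind of two-axis evaluation the study performs (detection accuracy alongside prediction time) can be sketched with a minimal harness. The toy detectors and event sequences below are hypothetical stand-ins for illustration only, not the ML techniques or datasets the paper evaluates.

```python
import time

def evaluate(detector, sequences, labels):
    """Score a detector on accuracy and wall-clock prediction time."""
    t0 = time.perf_counter()
    predictions = [detector(seq) for seq in sequences]
    elapsed = time.perf_counter() - t0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels), elapsed

# Toy data: each sequence is a list of log-event ids; in this made-up
# setting, anomalous sequences contain the (hypothetical) error event 99.
sequences = [[1, 2, 3], [1, 2, 99], [2, 2, 3], [1, 99, 99]]
labels = [0, 1, 0, 1]

# Two stand-in "techniques" whose accuracy/time trade-offs we compare.
rule_based = lambda seq: int(99 in seq)        # matches the toy ground truth
length_based = lambda seq: int(len(seq) > 10)  # always predicts "normal" here

for name, det in [("rule", rule_based), ("length", length_based)]:
    acc, secs = evaluate(det, sequences, labels)
    print(name, acc)
```

Extending the harness with repeated runs over a hyperparameter grid would give the third and fourth criteria in the study: the variance of accuracy and of timing across tuning configurations.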


Senior Data Engineer (Spark, Python, Hadoop) at Visa - Bengaluru, India

#artificialintelligence

Visa is a world leader in digital payments, facilitating more than 215 billion payment transactions between consumers, merchants, financial institutions, and government entities across more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable and secure payments network, enabling individuals, businesses and economies to thrive. When you join Visa, you join a culture of purpose and belonging – where your growth is priority, your identity is embraced, and the work you do matters. We believe that economies that include everyone everywhere, uplift everyone everywhere. Your work will have a direct impact on billions of people around the world – helping unlock financial access to enable the future of money movement.


Senior Data Engineer (Scala, Spark, Hadoop) at Visa - Bengaluru, India

#artificialintelligence

Find open roles in Artificial Intelligence (AI), Machine Learning (ML), Natural Language Processing (NLP), Computer Vision (CV), Data Engineering, Data Analytics, Big Data, and Data Science in general, filtered by job title or popular skill, toolset and products used.


Cloud Software Engineer 3

#artificialintelligence

A Bachelor's degree in Computer Science or a related technical field is highly desired and will be considered equivalent to two (2) years of experience. A Master's degree in a technical field will be considered equivalent to four (4) years of experience. A degree in Mathematics, Information Systems, Engineering, or a similar discipline will be considered a technical field. Eight (8) years of experience in software development/engineering is required, including requirements analysis, software development, installation, integration, evaluation, enhancement, maintenance, testing, and problem diagnosis/resolution, with at least six (6) years of experience developing software in high-level languages such as Java, C, or C++. Demonstrated ability to work with open-source (NoSQL) products that support highly distributed, massively parallel computation needs, such as HBase, Accumulo, and Bigtable, is also required.
Peraton drives missions of consequence spanning the globe and extending to the farthest reaches of the galaxy. As the world's leading mission capability integrator and transformative enterprise IT provider, we deliver trusted and highly differentiated national security solutions and technologies that keep people safe and secure. Peraton serves as a valued partner to essential government agencies across the intelligence, space, cyber, defense, civilian, health, and state and local markets. Every day, our employees do the can't be done, solving the most daunting challenges facing our customers.
For Colorado residents: Colorado salary minimum: $90,500; Colorado salary maximum: $219,700. The estimate displayed represents the typical salary range for this position and is just one component of Peraton's total compensation package for employees. Other rewards may include annual bonuses, short- and long-term incentives, and program-specific awards. In addition, Peraton provides a variety of benefits to employees.



Performance Evaluation of Query Plan Recommendation with Apache Hadoop and Apache Spark

Azhir, Elham, Hosseinzadeh, Mehdi, Khan, Faheem, Mosavi, Amir

arXiv.org Artificial Intelligence

Access plan recommendation is a query optimization approach that executes new queries using previously created query execution plans (QEPs). In this method, the query optimizer divides the query space into clusters. However, traditional clustering algorithms take a significant amount of execution time when clustering such large datasets. The MapReduce distributed computing model provides efficient solutions for storing and processing vast quantities of data. In the present investigation, the Apache Spark and Apache Hadoop frameworks are used to cluster query datasets of different sizes in the MapReduce-based access plan recommendation method. The performance evaluation is based on execution time. The experimental results demonstrate the effectiveness of parallel query clustering in achieving high scalability. Furthermore, Apache Spark achieved better performance than Apache Hadoop, reaching an average speedup of 2x.
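The MapReduce-style query clustering the abstract describes can be sketched, under assumptions, as one k-means iteration split into a map phase (run independently per data partition, the part Spark or Hadoop parallelizes) and a reduce phase (merging emitted pairs). The 2-D feature vectors, partitioning, and initial centroids below are illustrative, not the paper's actual pipeline.

```python
from collections import defaultdict

def nearest(point, centroids):
    """Index of the closest centroid by squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda c: sum((p - q) ** 2
                                 for p, q in zip(point, centroids[c])))

def map_phase(partition, centroids):
    """Map step: tag every query vector in one partition with a cluster id."""
    return [(nearest(point, centroids), point) for point in partition]

def reduce_phase(mapped):
    """Reduce step: recompute each centroid as the mean of its points."""
    groups = defaultdict(list)
    for cluster_id, point in mapped:
        groups[cluster_id].append(point)
    return {cid: tuple(sum(dim) / len(pts) for dim in zip(*pts))
            for cid, pts in groups.items()}

# Hypothetical 2-D feature vectors derived from query execution plans,
# split across two partitions as a distributed framework would hold them.
partitions = [[(1.0, 1.0), (2.0, 0.0)], [(9.0, 10.0), (11.0, 9.0)]]
centroids = [(0.0, 0.0), (10.0, 10.0)]

# One iteration: map each partition independently, then reduce globally.
mapped = [pair for part in partitions for pair in map_phase(part, centroids)]
new_centroids = reduce_phase(mapped)
print(new_centroids)
```

Because the map phase never looks outside its own partition, adding workers scales it linearly; the speedup gap between Spark and Hadoop in the study comes from how each framework schedules and caches these phases, not from the algorithm itself.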


Senior Machine Learning Engineer (m/d/f) computer vision / NLP - Remote Tech Jobs

#artificialintelligence

Become part of a very successful Medtech startup that has operated for over 5 years and is on its way to skyrocket in the field of analyzing human psychology and developing an AI engine that helps psychologists to make better diagnoses. Starting in gaming, optimized for human potential and health:…


La veille de la cybersécurité

#artificialintelligence

Deep Learning is a subset of Machine Learning that primarily deals with neural networks. Deep learning skills are among the key skills students today need to thrive in the global economy, and they can help them land prestigious job positions at FAANG companies. FAANG is an acronym for the stocks of five prominent American technology companies: Facebook, Amazon, Apple, Netflix, and Google. Read on to find out more about the key deep learning skills in demand for FAANG.