Collaborating Authors


Global Big Data Conference


Data science has reached its peak through automation. All the phases of a data science project -- such as data cleaning, model development, model comparison, model validation, and deployment -- are fully automated and can be executed in minutes, where earlier they would have taken months. Machine learning (ML) continuously tweaks the model to improve predictions. It is critical to set up the right data pipeline so that new data flows continuously into all your data science, artificial intelligence (AI), ML, and decision intelligence projects. Decision intelligence (DI) is the next major data-driven decision-making technique for disruptive innovation after data science. Futuristic: models ML outcomes to predict social, environmental, and business impact.

Responsible Data Management

Communications of the ACM

Incorporating ethics and legal compliance into data-driven algorithmic systems has been attracting significant attention from the computing research community, most notably under the umbrella of fair [8] and interpretable [16] machine learning. While important, much of this work has been limited in scope to the "last mile" of data analysis and has disregarded both the system's design, development, and use life cycle (What are we automating and why? Is the system working as intended? Are there any unforeseen consequences post-deployment?) and the data life cycle (Where did the data come from? How long is it valid and appropriate?). In this article, we argue two points. First, the decisions we make during data collection and preparation profoundly impact the robustness, fairness, and interpretability of the systems we build. Second, our responsibility for the operation of these systems does not stop when they are deployed. To make our discussion concrete, consider the use of predictive analytics in hiring. Automated hiring systems are seeing ever broader use and are as varied as the hiring practices themselves, ranging from resume screeners that claim to identify promising applicants to video and voice analysis tools that facilitate the interview process and game-based assessments that promise to surface personality traits indicative of future success. Bogen and Rieke [5] describe the hiring process from the employer's point of view as a series of decisions that forms a funnel, with stages corresponding to sourcing, screening, interviewing, and selection. The hiring funnel is an example of an automated decision system--a data-driven, algorithm-assisted process that culminates in job offers to some candidates and rejections to others. The popularity of automated hiring systems is due in no small part to our collective quest for efficiency.

Data Quality for Big Data and Machine Learning


Machine learning (ML) has drawn great attention from academia as well as industry during the past decades and continues to achieve impressive human-level performance on nontrivial tasks such as image classification, voice recognition, natural language processing, and autopiloting. Both data and algorithms are critical to ensure the performance, fairness, robustness, reliability, and scalability of ML systems. However, artificial intelligence (AI) researchers and practitioners overwhelmingly concentrate on algorithms while undervaluing the impact of data quality. Recently, a report showed that the cost of data quality is more than 600 billion US dollars per year for the US market alone, and a 2019 survey by Lourentzou indicates that 96% of companies have run into problems with data quality and with the data labeling required to train ML. Due to the limitations of algorithmic solutions in AI success, scholars have proposed data-centric AI, an initiative to carefully design datasets and to evaluate and improve data quality in order to enhance ML systems. This Research Topic focuses on data quality in ML, particularly on how to use state-of-the-art technology for the assessment, assurance, and improvement of big data for building high-quality ML systems. Although some efforts have been devoted to data quality improvement for ML, uncovering data quality problems, and developing strategies to assess data quality, the data quality is rarely, rigorously, and systematically ev...

Calling All Data Scientists: Data Observability Needs You


We live in a complex world that is full of data, and it is getting fuller every day. In 2020, the world collectively created, captured, copied, and consumed about 64.2 zettabytes of data, and by 2025 that figure is expected to more than double to 180 zettabytes. Increasingly, companies depend on this data to create great experiences for customers and drive revenue. At the same time, without a way to automate the detection of data quality issues, all of this data can quickly get out of hand, eroding trust and hurting the bottom line. Data observability systems have emerged as crucial tools for data-driven companies, helping them leverage huge amounts of data without sacrificing quality and reliability.
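As a rough illustration of what automating the detection of data quality issues can look like, the sketch below flags days on which a monitored metric (for example, a table's daily row count) deviates sharply from its recent history. The function name `flag_anomalies`, the 7-day window, and the z-score threshold of 3 are assumptions chosen for this example; they are not taken from the article or from any particular observability product.

```python
from statistics import mean, stdev


def flag_anomalies(values, window=7, threshold=3.0):
    """Return indices whose value deviates strongly from the trailing window.

    values    -- one metric reading per day (hypothetical, e.g. row counts)
    window    -- number of preceding readings used as the baseline (assumed)
    threshold -- z-score above which a reading is flagged (assumed)
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat baseline: any change at all is suspicious.
            if values[i] != mu:
                flagged.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily row counts; the load on day 8 is almost empty.
daily_rows = [100, 102, 98, 101, 99, 100, 103, 101, 12, 100]
suspicious_days = flag_anomalies(daily_rows)
```

A trailing-window z-score is only one simple choice. A real system would likely also track freshness, schema changes, and null rates, and would need to handle seasonality in the metric.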

Human rights, democracy, and the rule of law assurance framework for AI systems: A proposal Artificial Intelligence

Following on from the publication of its Feasibility Study in December 2020, the Council of Europe's Ad Hoc Committee on Artificial Intelligence (CAHAI) and its subgroups initiated efforts to formulate and draft its Possible Elements of a Legal Framework on Artificial Intelligence, based on the Council of Europe's standards on human rights, democracy, and the rule of law. This document was ultimately adopted by the CAHAI plenary in December 2021. To support this effort, The Alan Turing Institute undertook a programme of research that explored the governance processes and practical tools needed to operationalise the integration of human rights due diligence with the assurance of trustworthy AI innovation practices. The resulting framework was completed and submitted to the Council of Europe in September 2021. It presents an end-to-end approach to the assurance of AI project lifecycles that integrates context-based risk analysis and appropriate stakeholder engagement with comprehensive impact assessment, and transparent risk management, impact mitigation, and innovation assurance practices. Taken together, these interlocking processes constitute a Human Rights, Democracy and the Rule of Law Assurance Framework (HUDERAF). The HUDERAF combines the procedural requirements for principles-based human rights due diligence with the governance mechanisms needed to set up technical and socio-technical guardrails for responsible and trustworthy AI innovation practices. Its purpose is to provide an accessible and user-friendly set of mechanisms for facilitating compliance with a binding legal framework on artificial intelligence, based on the Council of Europe's standards on human rights, democracy, and the rule of law, and to ensure that AI innovation projects are carried out with appropriate levels of public accountability, transparency, and democratic governance.

Using Automation in AI with Recent Enterprise Tools


Data Science (DS) and Machine Learning (ML) are the backbone of today's data-driven business decision-making. From a human viewpoint, ML often consists of multiple phases, from gathering requirements and datasets to deploying a model and supporting human decision-making; we refer to these stages together as the DS/ML Lifecycle. There are also various personas in the DS/ML team, and these personas must coordinate across the lifecycle: stakeholders set requirements, data scientists define a plan, and data engineers and ML engineers support with data cleaning and model building. Later, stakeholders verify the model, domain experts use model inferences in decision making, and so on. Throughout the lifecycle, refinements may be performed at various stages, as needed. It is such a complex and time-consuming activity that there are not enough DS/ML professionals to fill the job demands, and as much as 80% of their time is spent on low-level activities such as tweaking data or trying out various algorithmic options and model tuning. These two challenges -- the dearth of data scientists, and time-consuming low-level activities -- have stimulated AI researchers and system builders to explore an automated solution for DS/ML work: Automated Data Science (AutoML). Several AutoML algorithms and systems have been built to automate the various stages of the DS/ML lifecycle. For example, the ETL (extract/transform/load) task has been applied to the data readiness, pre-processing & cleaning stage, and has attracted research attention.

Learnable Wavelet Packet Transform for Data-Adapted Spectrograms Machine Learning

Capturing high-frequency data concerning the condition of complex systems, e.g. by acoustic monitoring, has become increasingly prevalent. Such high-frequency signals typically contain time dependencies ranging over different time scales and different types of cyclic behaviors. Processing such signals requires careful feature engineering, particularly the extraction of meaningful time-frequency features. This can be time-consuming, and the performance is often dependent on the choice of parameters. To address these limitations, we propose a deep learning framework for learnable wavelet packet transforms, which makes it possible to learn features automatically from data and optimise them with respect to the defined objective function. The learned features can be represented as a spectrogram containing the important time-frequency information of the dataset. We assess the properties and performance of the proposed approach by evaluating its improved spectral leakage and by applying it to an anomaly detection task for acoustic monitoring.
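To make the idea of a wavelet packet spectrogram concrete, here is a minimal sketch of a fixed (not learned) wavelet packet transform using the Haar filter pair. It is not the authors' framework: in the learnable setting described above, the filter coefficients would be trainable parameters rather than the hard-coded Haar values, and the function names below are hypothetical.

```python
import math


def haar_step(signal):
    """Split a signal into Haar low-pass and high-pass halves (downsampled by 2)."""
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return low, high


def wavelet_packet(signal, levels):
    """Return the leaf nodes of a full wavelet packet tree.

    Unlike the plain discrete wavelet transform, which only re-splits the
    low-pass branch, every node is split again at every level. Leaves are
    returned in natural filter-bank order, not sorted by frequency.
    """
    if levels < 0 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            low, high = haar_step(node)
            next_nodes.extend([low, high])
        nodes = next_nodes
    return nodes


def packet_spectrogram(signal, levels):
    """Squared coefficients: one row per packet node (band), one column per time step."""
    return [[c * c for c in node] for node in wavelet_packet(signal, levels)]


# A constant signal puts all of its energy in the first (all low-pass) band.
spectrogram = packet_spectrogram([1.0] * 8, levels=2)
```

With 2 levels and 8 samples the result has 4 bands of 2 time steps each. Because the Haar filters are orthonormal, the total energy of the spectrogram equals the energy of the input signal.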

Forecasting: theory and practice Machine Learning

Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we hope that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow readers to navigate through the various topics. We complement the theoretical concepts and applications covered with large lists of free or open-source software implementations and publicly available databases.

AI and Data Integrity Can Power Up Trusted Business Decisions


Undoubtedly, AI and machine learning have taken over those IT organizations that are seeking competitive advantage through digital transformation. Both AI and machine learning play a critical role when it comes to data integrity. More than 75% of organizations across the globe are prioritizing AI and machine learning over traditional IT practices. Traditional IT practices have failed to handle the enormous volume of complex data available to organizations today. To analyse this huge amount of data, organizations have to adopt faster means, and this is where AI and machine learning prove to be beneficial.

Artificial Intellgence -- Application in Life Sciences and Beyond. The Upper Rhine Artificial Intelligence Symposium UR-AI 2021 Artificial Intelligence

The TriRhenaTech alliance presents the accepted papers of the 'Upper-Rhine Artificial Intelligence Symposium' held on October 27th, 2021 in Kaiserslautern, Germany. Topics of the conference are applications of Artificial Intelligence in life sciences, intelligent systems, industry 4.0, mobility and others. The TriRhenaTech alliance is a network of universities in the Upper-Rhine Trinational Metropolitan Region comprising the German universities of applied sciences in Furtwangen, Kaiserslautern, Karlsruhe, Offenburg and Trier, the Baden-Wuerttemberg Cooperative State University Loerrach, the French university network Alsace Tech (comprising 14 'grandes écoles' in the fields of engineering, architecture and management) and the University of Applied Sciences and Arts Northwestern Switzerland. The alliance's common goal is to reinforce the transfer of knowledge, research, and technology, as well as the cross-border mobility of students.