Blockchain-Enabled Accountability in Data Supply Chain: A Data Bill of Materials Approach

Liu, Yue, Zhang, Dawen, Xia, Boming, Anticev, Julia, Adebayo, Tunde, Xing, Zhenchang, Machao, Moses

arXiv.org Artificial Intelligence 

Data governance is critical in the era of advanced artificial intelligence (AI), particularly with the proliferation of large-scale generative AI that requires extensive datasets for model training and fine-tuning. Organisations that navigate complex data supply chains involving multiple stakeholders and varied tools face challenges in ensuring the traceability, verifiability, and reproducibility of data. This complexity is compounded in cross-departmental or cross-organisational data exchanges, where maintaining data accountability becomes increasingly significant. The issue has been exacerbated by the emergence of large-scale generative AI models such as Large Language Models (LLMs) [1]. As enterprises and research institutions all need large, high-quality corpora for model development and enhancement, the lack of effective governance frameworks to manage data creation, usage, and transfer, especially across diverse stakeholders, becomes evident. Within a data supply chain, which involves continual transformation and dissemination of dataset artifacts, stakeholders need to i) ensure data traceability in terms of the origin, authorisation, and operations conducted on the dataset artifacts, ii) achieve data verifiability with authenticated sources and licences, iii) preserve data reproducibility so that specific processing or transfer steps can be re-examined if questions are raised about them, and consequently iv) establish overall accountability by identifying the responsible stakeholders if violations are detected. Nevertheless, current data governance models are often tied to specific platforms and focus on data storage schemes (e.g., object storage, InterPlanetary File System), secure trading protocols [2, 3], and privacy regulations (e.g., the General Data Protection Regulation). They fall short in addressing the dynamic nature of data flows across the overall data supply chain and the requirement for platform-agnostic traceability solutions.
