data collaboration
ROSfs: A User-Level File System for ROS
Xu, Zijun, Wen, Xuanjun, Song, Yanjie, Yin, Shu
We present ROSfs, a novel user-level file system for the Robot Operating System (ROS). ROSfs interprets a robot file as a group of sub-files, each with a distinct label. ROSfs applies a time index structure to support flexible data queries while the data file is under modification. It provides multi-robot systems (MRS) with prompt cross-robot data acquisition and collaboration. We implemented a ROSfs prototype and integrated it into a mainstream ROS platform. We then applied and evaluated ROSfs on real-world UAVs and data servers. Evaluation results show that, compared with traditional ROS storage methods, ROSfs improves offline query performance by up to 129x and reduces inter-robot online data query latency over a wireless network by up to 7x.
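The idea of labelled sub-files with a time index can be illustrated with a minimal sketch. This is not the ROSfs implementation: the class `TimeIndexedStore`, its methods, and the in-memory layout are hypothetical stand-ins, assuming only that each label keeps its records sorted by timestamp so that a time-range query reduces to two binary searches.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class TimeIndexedStore:
    """Hypothetical in-memory stand-in: one time-sorted 'sub-file' per label."""

    def __init__(self):
        self._stamps = defaultdict(list)   # label -> sorted timestamps
        self._records = defaultdict(list)  # label -> payloads in the same order

    def append(self, label, stamp, payload):
        """Insert a record, keeping the label's timestamps sorted."""
        stamps = self._stamps[label]
        pos = bisect_right(stamps, stamp)
        stamps.insert(pos, stamp)
        self._records[label].insert(pos, payload)

    def query(self, label, start, end):
        """Return payloads for `label` with start <= timestamp <= end."""
        stamps = self._stamps.get(label, [])
        lo = bisect_left(stamps, start)
        hi = bisect_right(stamps, end)
        return self._records.get(label, [])[lo:hi]


store = TimeIndexedStore()
store.append("camera", 1.0, "frame-1")
store.append("lidar", 1.5, "scan-1")
store.append("camera", 3.0, "frame-3")
store.append("camera", 2.0, "frame-2")  # late arrival, still indexed in order
print(store.query("camera", 1.5, 3.0))
```

Because each label is indexed separately, a query for one label never scans records of another, and records appended between queries become visible without rebuilding the index.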
Achieving Transparency in Distributed Machine Learning with Explainable Data Collaboration
Bogdanova, Anna, Imakura, Akira, Sakurai, Tetsuya, Fujii, Tomoya, Sakamoto, Teppei, Abe, Hiroyuki
Transparency of machine learning models used for decision support in various industries is becoming essential for ensuring their ethical use. To that end, feature attribution methods such as SHAP (SHapley Additive exPlanations) are widely used to explain the predictions of black-box machine learning models to customers and developers. However, a parallel trend has been to train machine learning models in collaboration with other data holders without accessing their data. Such models, trained over horizontally or vertically partitioned data, present a challenge for explainable AI because the explaining party may have a biased view of background data or a partial view of the feature space. As a result, explanations obtained from different participants of distributed machine learning might not be consistent with one another, undermining trust in the product. This paper presents an Explainable Data Collaboration Framework based on a model-agnostic additive feature attribution algorithm (KernelSHAP) and the Data Collaboration method of privacy-preserving distributed machine learning. In particular, we present three algorithms for different scenarios of explainability in Data Collaboration and verify their consistency with experiments on open-access datasets. Our results demonstrate a significant decrease (by at least a factor of 1.75) in feature attribution discrepancies among the users of distributed machine learning.
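The dependence of additive feature attributions on background data can be seen in a small sketch. It computes exact Shapley values by enumerating feature subsets, which is the quantity KernelSHAP approximates; it is not KernelSHAP itself and not the paper's algorithms. The function `shapley_values`, the toy model, and both background sets are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley attributions of model(x) relative to a background set.

    Returns (phi, base), where base is the mean model output on the background.
    """
    n = len(x)

    def value(subset):
        # Mean output when features in `subset` take x's values and the
        # remaining features are taken from each background row in turn.
        total = 0.0
        for row in background:
            mixed = [x[i] if i in subset else row[i] for i in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(s | {i}) - value(s))
        phi.append(contrib)
    return phi, value(set())


def toy_model(v):
    # Hypothetical model: one linear term plus one interaction term.
    return 2 * v[0] + v[1] * v[2]


x = [1.0, 2.0, 3.0]
party_a_background = [[0.0, 0.0, 0.0]]  # one participant's view of the data
party_b_background = [[1.0, 1.0, 1.0]]  # another participant's view

phi_a, base_a = shapley_values(toy_model, x, party_a_background)
phi_b, base_b = shapley_values(toy_model, x, party_b_background)
print(phi_a, base_a)
print(phi_b, base_b)
```

In both cases the attributions plus the base value sum to the model output for `x`, yet the two parties assign different importance to the same features because their background data differ. This is the kind of inconsistency the abstract describes.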
Linux Foundation unveils new permissive license for open data collaboration - JackOfAllTechs.com
The Linux Foundation has announced a new permissive license designed to help foster collaboration around open data for artificial intelligence (AI) and machine learning (ML) projects. It has often been said that data is the new oil, but for AI and ML projects in particular, having access to expansive and diverse data sets is key to reducing bias and building powerful models capable of all manner of intelligent tasks. To machines, data is a little like what "experience" is to humans -- the more of it you have, the better decisions you are likely to make. With CDLA-Permissive-2.0, the Linux Foundation is building on its previous efforts to encourage data sharing through licensing arrangements that clearly define how the data -- and any derivative data sets -- can and can't be used. The Linux Foundation first introduced the Community Data License Agreement (CDLA) back in 2017 to entice organizations to open up their vast pools of (underused) data to third parties.
Snowflake Unveils Your Data Exchange Potential
Data is at the core of every business, irrespective of its field of activity. Business success depends on how effectively a company uses its many kinds and copious amounts of data to interact with each constituent, from employees and customers to vendors, business associates, and influencers. Sharing and exchanging data efficiently at minimum cost is critical to gaining a competitive advantage. Snowflake is a globally recognised expert, across industries and for small, medium, and large organisations, in establishing data exchanges and managing complex data sharing in a governed and secure way, with a minimum of the risk, cost, headache, and delay that have plagued traditional methods. Your organisation can thus use modern data sharing to quickly forge one-to-one, one-to-many, and many-to-many relationships and share data in new and imaginative ways in less time than before. Data exchange is the process of sending and receiving data in such a manner that the information, content, or meaning assigned to the data is not altered during transmission.
How SMC Allows You to Perform Advanced Data Collaboration Without Exposing Your Data - UrIoTNews
Data collaboration is the process of combining datasets to generate new value from data-driven insights. The datasets being combined can come from different organizations, or they can come from data silos internal to an organization. A number of use cases are possible through data collaboration: fraud detection, advances in healthcare research, real-world data, cross-selling, churn analysis, etc. However, there are significant blockers to realizing the potential benefits of data collaboration. Some of these blockers are so severe that they can stymie potentially valuable collaborations. The blockers originate from a host of areas -- fear of loss of IP (intellectual property), privacy regulations, data residency restrictions, and reputational risk (just to name a few).
Advancing Microbiome Research Through Data Collaboration
The National Microbiome Data Collaborative (NMDC), a new initiative aimed at empowering microbiome research, is gearing up its pilot phase after receiving $10 million from the U.S. Department of Energy (DOE) Office of Science. Spearheaded by Lawrence Berkeley National Laboratory (Berkeley Lab), in partnership with Los Alamos (LANL), Oak Ridge (ORNL), and Pacific Northwest (PNNL) national laboratories, the NMDC will leverage DOE's existing data-science resources and high-performance computing systems to develop a framework that facilitates more efficient use of microbiome data for applications in energy, environment, health, and agriculture. Nearly every ecosystem and organism on Earth hosts a diverse community of microorganisms – its microbiome. Yet we know little about the functions of individual microbes, let alone how they interact with each other, their hosts, or their environments, and how their activity varies over time or in response to perturbations. The past decade has seen tremendous advances in genome and metagenome DNA-sequencing technologies, which have led to an unprecedented volume of microbiome data being generated.