Austrian synthetic data startup MOSTLY AI today announced that it has raised a $25 million Series B round. British VC firm Molten Ventures led the round, with participation from new investor Citi Ventures. Two existing investors also returned: Munich-based 42CAP and Berlin-based Earlybird, which had led MOSTLY AI's $5 million Series A round in 2020. Synthetic data is fake data, but it is not random: MOSTLY AI uses artificial intelligence to achieve a high degree of fidelity to its clients' databases. Its data sets "look just as real as a company's original customer data with just as many details, but without the original personal data points," the company says.
Designing good data-driven models depends heavily on the quality of the data. It is tempting to think that data is just a set of numbers and should not trouble developers much. But as they say, the devil is in the details: real data comes with issues such as imbalanced classes, inherent biases, and unstructured values. Synthetic data, on the other hand, gives developers the flexibility to scale their data and freedom from biases, opening up possibilities for modeling scenarios that do not exist in the real world. In addition, synthetic data helps protect user privacy while leaving developers free to experiment.
Synthesis AI, a startup developing a platform that generates synthetic data to train AI systems, today announced that it raised $17 million in a Series A funding round led by 468 Capital, with participation from Sorenson Ventures, Strawberry Creek Ventures, Bee Partners, PJC, iRobot Ventures, Boom Capital and Kubera Venture Capital. CEO and cofounder Yashar Behzadi says that the proceeds will be put toward product R&D, growing the company's team, and expanding research, particularly in the area of mixed real and synthetic data. Synthetic data, or data that's created artificially rather than captured from the real world, is coming into wider use in data science as the demand for AI systems grows. The benefits are obvious: while collecting real-world data to develop an AI system is costly and labor-intensive, a theoretically infinite amount of synthetic data can be generated to fit any criteria. For example, a developer could use synthetic images of cars and other vehicles to develop a system that can differentiate between makes and models.
Imagine that data could be shared seamlessly with partners, governments, and other organisations, without breaking any data protection law, to facilitate innovation. How would it be possible to use closely guarded customer data while still maintaining the highest privacy and safety standards? Is it possible to monetise data without compromising sensitive information? This write-up addresses these questions. Data is the fuel for the rapidly progressing Artificial Intelligence (AI) industry, as it is for almost all other industries. Digitisation, the interconnection of network channels, and IoT generate enormous volumes of data at an unprecedented scale.
Synthetic data generation (SDG) is rapidly emerging as a practical privacy enhancing technology (PET) for sharing data for secondary purposes. It does so by generating non-identifiable datasets that can be used and disclosed without the legislative need for additional consent, given that these datasets would not be considered personal information. Having worked in the privacy and data anonymization space for over 15 years, I have seen the limitations of traditional de-identification methods become more evident. This creates room for modern PETs that can enable the responsible processing of data for secondary purposes. There's a growing appetite from CPOs to understand where SDG fits as a PET, how synthetic data is generated, what problems it can solve, and how laws and regulations apply. In a nutshell, synthetic data is generated from real data.
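To make the idea of generating synthetic data from real data concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method used by any vendor mentioned above: it fits a simple two-column Gaussian model to a toy "real" table and then samples new rows from that model. The column meanings (age, income), the toy values, and the function names `fit_gaussian` and `sample_synthetic` are all hypothetical. Production tools typically use far richer generative models, and a sketch like this offers no formal privacy guarantee.

```python
import math
import random
import statistics


def fit_gaussian(rows):
    """Estimate means, standard deviations and correlation of two numeric columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    # Sample covariance, then Pearson correlation.
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "rho": cov / (sx * sy)}


def sample_synthetic(model, n, rng):
    """Draw n new rows from the fitted model; no real row is copied."""
    rho = model["rho"]
    # Guard against tiny negative values caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = model["mx"] + model["sx"] * z1
        # Mixing z1 into y reproduces the correlation between the two columns.
        y = model["my"] + model["sy"] * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


rng = random.Random(0)

# Toy stand-in for a real table of (age, income); the values are made up.
real = []
for _ in range(200):
    age = rng.uniform(20, 60)
    real.append((age, 500 * age + rng.gauss(0, 3000)))

model = fit_gaussian(real)
synthetic = sample_synthetic(model, 5000, rng)

print("real rho:     ", round(model["rho"], 3))
print("synthetic rho:", round(fit_gaussian(synthetic)["rho"], 3))
```

The synthetic table can be made as large as needed, and its summary statistics track those of the original, which is the sense in which synthetic data preserves fidelity without reusing the original records. How closely a generator should track the real data, and what that implies for re-identification risk, depends on the model and is outside what this sketch shows.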