datagen
Is 'fake data' the real deal when training algorithms?
You're at the wheel of your car but you're exhausted. Your shoulders start to sag, your neck begins to droop, your eyelids slide down. As your head pitches forward, you swerve off the road and speed through a field, crashing into a tree. But what if your car's monitoring system recognised the tell-tale signs of drowsiness and prompted you to pull off the road and park instead? The European Commission has legislated that from this year, new vehicles be fitted with systems to catch distracted and sleepy drivers to help avert accidents.
Surveillance AI needs fake data to track people. These companies are supplying it.
Companies are building software that uses AI to monitor people's behavior and interpret their emotions and body language in real life, virtually and even in the metaverse. But to develop that AI, they need fake data, and startups are stepping in to supply it. Synthetic data companies are providing millions of images, videos and sometimes audio data samples that have been generated for the sole purpose of training or improving AI models that could become part of our everyday lives in controversial forms of AI such as facial recognition, emotion AI and other algorithmic systems used to keep track of people's behavior. While in the past companies building computer vision-based AI often relied on publicly available datasets, now AI developers are looking to customized synthetic data to "address more and more domain-specific problems that have zero data you can actually access," said Ofir Zuk, co-founder and CEO of synthetic data company Datagen. Synthetic data companies including Datagen, Mindtech and Synthesis AI represent a corner of an increasingly compartmentalized AI industry.
Datagen nabs $50 million to provide synthetic data for computer vision
New York-headquartered Datagen has raised $50 million in its series B funding to strengthen its platform and meet the growing demand for synthetic data in the broader AI space. Today, every organization understands an AI model is only as good as the data it is trained upon. Companies give particular focus on sourcing and annotating data correctly, but when it comes to computer vision models, the task becomes twice as difficult. This is largely due to the scarcity of high-quality 2D and 3D visual training data. A study conducted by Datagen itself found 99% of computer vision (CV) teams have had a machine learning (ML) project canceled due to insufficient training data while 100% saw delays due to the same problem.
MacBook M1 Pro vs. Google Colab for Data Science -- Should You Buy the Latest from Apple?
You'll need TensorFlow installed if you're following along. There's an entire article dedicated to installing TensorFlow on Apple M1. Also, you'll need an image dataset. I've used the Dogs vs. Cats dataset from Kaggle, which is licensed under the Creative Commons License. Long story short, you can use it for free. We'll do two tests today. Let's go over the code used in the tests.
Report: Computer vision teams worldwide say projects are delayed by insufficient data
According to new research by Datagen, 99% of computer vision (CV) teams have had a machine learning (ML) project canceled due to insufficient training data. Delays, meanwhile, appear truly ubiquitous, with 100% of teams reporting significant project delays due to insufficient training data. The research also indicates that these training data challenges come in many forms and affect CV teams in near-equal measure. The top issues experienced by CV teams include poor annotation (48%), inadequate domain coverage (47%), and simple scarcity (44%). The scarcity of robust, domain-specific training data is only compounded by the fact that the field of computer vision lacks many well-defined standards or best practices. When asked how training data is typically gathered at their organizations, respondents revealed that a patchwork of sources and methodologies is being employed both across the field and within individual organizations.
Taking the world by simulation: The rise of synthetic data in AI
The survey's findings are based on responses from people working in the computer vision industry. However, the findings are of broader interest. First, because a broad spectrum of markets depends on computer vision, including extended reality, robotics, smart vehicles, and manufacturing. And second, because the approach of generating synthetic data for AI applications could be generalized beyond computer vision. Datagen, a company that specializes in simulated synthetic data, recently commissioned Wakefield Research to conduct an online survey of 300 computer vision professionals to better understand how they obtain and use AI/ML training data for computer vision systems and applications, and how those choices impact their projects.
Drive Synthetic Data Boom: Top Predictions for 2022 by Synthetic Data Innovator Datagen
Synthetic data is in for a banner year, as businesses look to leverage AI for a growing number of increasingly sophisticated applications, including tackling the world's supply-chain disruptions, reinventing automotive safety, and creating a whole new class of intelligent consumer goods with the metaverse at the fore. TEL AVIV, Israel, Nov. 30, 2021 (GLOBE NEWSWIRE) -- Datagen, the pioneer of domain-specific synthetic data for humans and object perception, today released its new year's predictions for the fields of Artificial Intelligence, Machine Learning, and Computer Vision. As AI makes its way into ubiquitous adoption by a growing number of industries and applications, the demand for robust training data will expand accordingly. However, with manual data collection already at the limits of its own utility, the race for AI supremacy will only serve to widen the existing gulf between supply and demand. At the same time, companies like Datagen are making it easier and more affordable to generate high-quality synthetic datasets to train computer vision (CV) AI models. The ability to generate tens of thousands of synthetic images -- customized to suit the unique parameters of each distinct application -- makes synthetic data the obvious solution to the limitations of traditional, manually collected data.
Gil Elbaz, Co-founder & CTO of Datagen – Interview Series
Gil's thesis research focused on 3D Computer Vision and has been published at CVPR, the top computer vision research conference in the world. Datagen is a pioneer in the new field of Simulated Data, a subset of synthetic data, which concentrates on photo-realistically recreating the world around us. The company launched from stealth with over $18M in funding in March 2021 and is now working with a number of Fortune 100 companies in augmented/virtual reality, robotics, and automotive, including the majority of the top U.S. tech giants. What initially attracted you to robotics and machine learning? Sci-fi books, like Isaac Asimov's Foundation series and I, Robot, always got me thinking about a future in which robots were an integral part of our day-to-day lives.
Datagen emerges from stealth to create synthetic datasets for computer vision models
Datagen, a Tel Aviv, Israel-based startup offering a platform to create synthetic computer vision system training data, today emerged from stealth with $18.5 million in funding from TLV Partners and Viola Ventures. The company says the proceeds will be put toward growing its R&D lab while it expands into new markets globally. Datagen, which Ofir Chakon and Gil Elbaz founded in 2018, leverages computer graphics and data generation to simulate the real world with datasets that include 2D and 3D annotations. By combining generative adversarial networks (GANs) with reinforcement learning-driven humanoid motion algorithms within a physical simulator, Datagen says it can deliver photorealistic, scalable datasets suitable for augmented and virtual reality, internet of things, smart store, robotics, and smart car use cases. GANs are two-part AI models consisting of a generator that creates samples and a discriminator that attempts to differentiate between the generated samples and real-world samples.
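The generator/discriminator split described above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not Datagen's method: it trains a one-dimensional GAN in plain Python, where the "real-world" data is an assumed Gaussian centered at 4.0, the generator is a linear map of noise, and the discriminator is a single logistic unit. The function names, the target distribution, and the hyperparameters are all hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 2000, lr: float = 0.01, seed: int = 0):
    """Train a 1-D toy GAN; returns generator (a, b) and discriminator (w, c)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: G(z) = a*z + b
    w, c = 0.1, 0.0  # discriminator: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        real = rng.gauss(4.0, 1.0)  # assumed "real-world" sample
        z = rng.gauss(0.0, 1.0)     # noise fed to the generator
        fake = a * z + b            # generated sample

        # Discriminator step: gradient ascent on log D(real) + log(1 - D(fake)).
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        w += lr * ((1.0 - d_real) * real - d_fake * fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: gradient ascent on log D(fake), so the generator
        # is rewarded when the discriminator mistakes its sample for real.
        d_fake = sigmoid(w * fake + c)
        grad_fake = (1.0 - d_fake) * w  # d/d(fake) of log D(fake)
        a += lr * grad_fake * z
        b += lr * grad_fake
    return (a, b), (w, c)


if __name__ == "__main__":
    (a, b), (w, c) = train_toy_gan()
    print(f"generator: G(z) = {a:.3f}*z + {b:.3f}")
    print(f"discriminator: D(x) = sigmoid({w:.3f}*x + {c:.3f})")
```

The two updates pull in opposite directions: the discriminator learns to score real samples high and generated ones low, while the generator adjusts its output to raise the discriminator's score on generated samples. Production systems replace these two small functions with deep networks over images, and training at this toy scale may oscillate rather than settle.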