OpenAI's hunger for data is coming back to bite it
In AI development, the dominant paradigm is that more training data makes for better models. OpenAI's GPT-2 model was trained on a 40-gigabyte data set of text; GPT-3, which ChatGPT is based on, was trained on 570 GB. OpenAI has not disclosed how big the data set for its latest model, GPT-4, is. But that hunger for data is now coming back to bite the company. In the past few weeks, several Western data protection authorities have opened investigations into how OpenAI collects and processes the data powering ChatGPT.
Apr-19-2023, 09:50:32 GMT