Microsoft’s Mustafa Suleyman Says Everything On The Internet Can Be Used For Free To Train AI Models.

Mustafa Suleyman, the CEO of Microsoft’s new AI division, recently said in an interview with CNBC’s Andrew Ross Sorkin that anything you publish on the open web becomes ‘freeware’ and can be copied and used to train AI models.

When asked whether “AI companies have effectively stolen the world’s IP”, the DeepMind co-founder said, “With respect to content that is already on the open web, the social contract of that content since the 90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That’s been the understanding.”

He went on to say that unless a publisher or news organisation explicitly asks that its content not be scraped or crawled for anything other than indexing (so that other people can find it), that content can be freely used to train AI models. This suggests that Microsoft, like other AI companies such as Perplexity, Google and OpenAI, considers it acceptable to train AI models on content available on the web without paying its creators.

Currently, one of the biggest controversies concerning AI chatbots like ChatGPT, Gemini and Copilot is that generative AI companies might be scraping copyrighted data and using it to train their upcoming AI models.

In recent months, several organisations and publications have taken legal action over AI training. The New York Times has sued Microsoft and ChatGPT maker OpenAI, record labels backed by the Recording Industry Association of America have sued Udio, and Forbes has threatened legal action against Perplexity. Each accuses these companies of using its content to train AI models without permission.

Source: https://indianexpress.com/article/technology/artificial-intelligence/microsoft-ai-head-mustafa-suleyman-internet-content-ai-scraping-training-9422265/