Google has recently updated its privacy policy, outlining its intention to scrape a wide range of online content for the purpose of enhancing its AI tools. The new policy explicitly states that publicly available information, including what users post online, may be used by Google to train AI models and develop new products and features. In practice, if Google can read your words, you should assume the company may use them, and they could be incorporated into its chatbots.
The revised Google policy acknowledges the company’s reliance on publicly available information to fuel its AI advancements. Previously, the policy mentioned the use of data for “language models,” but it now specifies “AI models.” Additionally, while Google Translate was previously the only service mentioned, the updated policy introduces two more features: Bard and Cloud AI capabilities.
This move by Google represents an unusual clause for a privacy policy. Typically, these policies outline how a company utilizes the information shared on its own platform. However, Google’s updated policy suggests that the company reserves the right to harvest and utilize data from any public part of the internet. In other words, the entire web becomes Google’s AI playground. Google has not yet commented on the updated policy.
This development raises intriguing privacy concerns. While people generally understand that public posts are accessible to others, the concept of posting something online now requires a new perspective. It is no longer just a matter of who can view the information, but rather how it could be exploited. It is entirely possible that Google’s Bard and OpenAI’s ChatGPT have already ingested your long-forgotten blog posts or decade-old restaurant reviews. These chatbots may be regurgitating an altered version of your words in unpredictable and often perplexing ways.
The emergence of data-hungry chatbots also brings up a less apparent complication: the origin of their information. Tech giants such as Google and OpenAI have relied on scraping extensive portions of the internet to fuel their chatbot algorithms. However, the legality of this practice remains uncertain, and copyright questions that were once considered science fiction are now being contested in courts. Meanwhile, consumers are already experiencing unexpected consequences.
Twitter and Reddit are two prominent platforms that have taken drastic measures in response to the AI data collection issue. Both companies have restricted free access to their application programming interfaces (APIs), preventing users from freely downloading large amounts of posts. While this move aims to safeguard intellectual property, it has inadvertently affected users who relied on third-party tools to access these platforms. Twitter even briefly contemplated charging public entities, such as weather, transit, and emergency services, for the ability to tweet, but it swiftly reversed its decision following widespread criticism.
In recent times, web scraping has become Elon Musk’s primary concern. Musk has attributed several Twitter mishaps to the unauthorized extraction of data from his site, even when the incidents seem unrelated. To combat this, Twitter recently imposed limitations on the number of tweets users can view per day, rendering the service nearly unusable. Musk defended this action as a response to “data scraping” and “system manipulation.” However, many IT experts believe that the rate-limiting was primarily a crisis response to technical issues resulting from mismanagement or incompetence. Twitter has yet to address inquiries from Gizmodo regarding this matter.
Unanswered Questions Surrounding Google’s Data Scraping Policy
As Google continues to assert its right to scrape and utilize user data from across the internet, questions surrounding privacy and legal implications remain unanswered. With the rise of AI technologies and the pervasive nature of data collection, it is crucial to examine the boundaries of personal information and how it can be exploited in the realm of artificial intelligence. As users navigate the digital landscape, a reassessment of what it means to share information online is essential, as the consequences of data utilization by tech giants like Google may be far-reaching and difficult to predict.