What you need to know
- A former OpenAI employee recently published a blog post alleging that the firm breaks copyright law by using internet data to train ChatGPT.
- The report suggests OpenAI relies on technicalities in copyright law to continue using copyrighted content and internet data to train AI models without authorization or compensation.
- He also highlighted AI-generated content’s role in degrading the internet, including the spread of inaccurate information.
Amid bankruptcy reports and efforts to restructure its business model into a for-profit venture, high-profile employees continue to depart from OpenAI. Suchir Balaji recently left OpenAI to work on “personal projects.”
Balaji joined the ChatGPT maker shortly after graduating from UC Berkeley, hoping to be part of a team that leverages generative AI’s cutting-edge capabilities to cure diseases and potentially stop aging. He predominantly worked on OpenAI’s GPT-4 model, which Sam Altman has described as “mildly embarrassing at best,” admitting that it “kind of sucks.”
However, the 25-year-old departed from the AI firm after realizing his goals weren’t aligned with the company’s. While speaking to the New York Times, Balaji indicated:
“AI companies are destroying the commercial viability of the individuals, businesses, and internet services that created the digital data used to train these A.I. systems.”
He bluntly claimed OpenAI breaks U.S. copyright law, a serious allegation coming from someone who worked at the company. This isn’t the first time OpenAI has been under fire over copyright infringement. The ChatGPT maker is fighting several copyright infringement lawsuits in court alongside Microsoft.
OpenAI CEO Sam Altman previously admitted that developing tools like ChatGPT would be virtually impossible without copyrighted content. He added that copyright law doesn’t categorically prohibit training AI models on copyrighted content.
Is AI model training using copyrighted content fair use?
In his blog post, Balaji attempted to show how OpenAI was breaking copyright law. Through his analysis, the former OpenAI staffer concluded that the output generated by ChatGPT doesn’t meet the “fair use” threshold. For context, “fair use” is a legal doctrine that permits limited use of copyrighted content without the author’s consent.
Following Balaji’s copyright infringement claims, OpenAI issued the following statement to Gizmodo:
“We build our A.I. models using publicly available data, in a manner protected by fair use and related principles, and supported by longstanding and widely accepted legal precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness.”
OpenAI and Microsoft consistently argue that using copyrighted content from the internet to train their AI models falls under fair use. Balaji, however, holds a different view. While he admits that the output generated by these AI systems isn’t lifted directly from the source, he argues it isn’t original either. In his view, AI-generated content closely imitates copyrighted material, and by that standard it is illegal under copyright law.
Aside from his copyright concerns, Balaji warned about the potential impact of AI tools like ChatGPT on the internet. Separately, a former Google engineer warned that OpenAI’s prototype search tool, SearchGPT, could give Google a run for its money in the foreseeable future, amid antitrust scrutiny after Google was ruled an illegal monopoly in search. Balaji also noted that AI is prone to generating inaccurate and misleading information. “If you believe what I believe, you have to just leave the company,” he added.