Collaborating Authors

Chhabria


Judges Don't Know What AI's Book Piracy Means

The Atlantic - Technology

More than 40 lawsuits have been filed against AI companies since 2022. Late last month, there were rulings on two of these cases, first in a lawsuit against Anthropic and, two days later, in one against Meta. Both of the cases were brought by book authors who alleged that AI companies had trained large language models using authors' work without consent or compensation. In each case, the judges decided that the tech companies were engaged in "fair use" when they trained their models with authors' books. Both judges said that the use of these books was "transformative"--that training an LLM resulted in a fundamentally different product that does not directly compete with those books.


What comes next for AI copyright lawsuits?

MIT Technology Review

On the other side, plaintiffs range from individual artists and authors to large companies like Getty and the New York Times. The outcomes of these cases are set to have an enormous impact on the future of AI. In effect, they will decide whether or not model makers can continue ordering up a free lunch. If not, they will need to start paying for such training data via new kinds of licensing deals--or find new ways to train their models. And that's why last week's wins for the technology companies matter. If you drill into the details, the rulings are less cut-and-dried than they seem at first.


Two New Legal Rulings Are Bad News for Your Favorite Authors

Slate

Judge Vince Chhabria sided with Meta but appeared to do so regretfully, stating that Meta's use of the writers' work to train its bots isn't necessarily legal but that the plaintiffs "made the wrong arguments."


Meta wins AI copyright lawsuit as US judge rules against authors

The Guardian

However, the ruling offered some hope for American creative professionals who argue that training AI models on their work without permission is illegal. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." A Meta spokesperson said the company appreciated the decision and called fair use a "vital legal framework" for building "transformative" AI technology. The authors sued Meta in 2023, arguing that the company misused pirated versions of their books to train its AI system Llama without permission or compensation. Chhabria expressed sympathy for that argument during a hearing in May, a sympathy he reiterated on Wednesday.


Meta Wins Blockbuster AI Copyright Case--but There's a Catch

WIRED

He concluded that the plaintiffs did not present sufficient evidence that Meta's use of their books was harmful.


A Judge Says Meta's AI Copyright Case Is About 'the Next Taylor Swift'

WIRED

US District Court Judge Vince Chhabria spent several hours grilling lawyers from both sides after they each filed motions for partial summary judgment, meaning they want Chhabria to rule on specific issues of the case rather than leaving each one to be decided at trial. The authors allege that Meta illegally used their work to build its generative AI tools, emphasizing that the company pirated their books through "shadow libraries" like LibGen. Kadrey v. Meta is one of the dozens of lawsuits filed against AI companies that are winding through the US legal system. While the authors were heavily focused on the piracy element of the case, Chhabria spoke emphatically about his belief that the big question is whether Meta's AI tools will hurt book sales and otherwise cause the authors to lose money. "If you are dramatically changing, you might even say obliterating, the market for that person's work, and you're saying that you don't even have to pay a license to that person to use their work to create the product that's destroying the market for their work--I just don't understand how that can be fair use," he told Meta lawyer Kannon Shanmugam.


Zuckerberg approved Meta's use of 'pirated' books to train AI models, authors claim

The Guardian

Citing internal Meta communications, the filing claims that the social network company's chief executive backed the use of the LibGen dataset, a vast online archive of books, despite warnings within the company's AI executive team that it is a dataset "we know to be pirated". The internal message says that using a database containing pirated material could weaken the Facebook and Instagram owner's negotiations with regulators, according to the filing. "Media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, may undermine our negotiating position with regulators." The authors sued Meta in 2023, arguing that the social media company misused their books to train Llama, the large language model that powers its chatbots. The Library Genesis, or LibGen, dataset is a "shadow library" that originated in Russia and claims to contain millions of novels, nonfiction books and science magazine articles.


Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal

WIRED

Against the company's wishes, a court unredacted information alleging that Meta used Library Genesis (LibGen), a notorious so-called shadow library of pirated books that originated in Russia, to help train its generative AI language models. Its outcome, along with those of dozens of similar cases working their way through courts in the United States, will determine whether technology companies can legally use creative works to train AI moving forward and could either entrench AI's most powerful players or derail them. Vince Chhabria, a judge for the United States District Court for the Northern District of California, ordered both Meta and the plaintiffs on Wednesday to file full versions of a batch of documents after calling Meta's approach to redacting them "preposterous," adding that, for the most part, "there is not a single thing in those briefs that should be sealed." Chhabria ruled that Meta was not pushing to redact the materials in order to protect its business interests but instead to "avoid negative publicity." The documents were originally filed late last year but remained publicly unavailable until now.


A Machine Learning Approach to Improving Timing Consistency between Global Route and Detailed Route

Chhabria, Vidya A., Jiang, Wenjing, Kahng, Andrew B., Sapatnekar, Sachin S.

arXiv.org Artificial Intelligence

Due to the unavailability of routing information in design stages prior to detailed routing (DR), the tasks of timing prediction and optimization pose major challenges. Inaccurate timing prediction wastes design effort, hurts circuit performance, and may lead to design failure. This work focuses on timing prediction after clock tree synthesis and placement legalization, which is the earliest opportunity to time and optimize a "complete" netlist. The paper first documents that having "oracle knowledge" of the final post-DR parasitics enables post-global routing (GR) optimization to produce improved final timing outcomes. To bridge the gap between GR-based parasitic and timing estimation and post-DR results during post-GR optimization, machine learning (ML)-based models are proposed, including the use of features for macro blockages for accurate predictions for designs with macros. Based on a set of experimental evaluations, it is demonstrated that these models show higher accuracy than GR-based timing estimation. When used during post-GR optimization, the ML-based models show demonstrable improvements in post-DR circuit performance. The methodology is applied to two different tool flows - OpenROAD and a commercial tool flow - and results on 45nm bulk and 12nm FinFET enablements show improvements in post-DR slack metrics without increasing congestion. The models are demonstrated to be generalizable to designs generated under different clock period constraints and are robust to training data with small levels of noise.
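The core idea of the abstract is that a learned model mapping post-global-route (GR) features to post-detailed-route (DR) timing can beat a naive GR-only estimate. A minimal sketch of that setup, using synthetic data and a plain least-squares regression (the paper's actual models, features, and tool-flow integration are more elaborate; all names and numbers below are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-net post-GR features: wirelength, fanout, congestion,
# macro-blockage overlap. These names are illustrative only.
n_nets = 500
X = rng.random((n_nets, 4))

# Synthetic "ground truth" post-DR delay: a linear combination of the GR
# features plus noise, standing in for post-detailed-route parasitics.
true_w = np.array([2.0, 0.5, 1.5, 0.8])
y = X @ true_w + 0.05 * rng.standard_normal(n_nets)

# Fit a simple linear model (least squares) mapping GR features -> post-DR delay.
X1 = np.hstack([X, np.ones((n_nets, 1))])  # append a bias column
w, *_ = np.linalg.lstsq(X1, y, rcond=None)
pred = X1 @ w
mae = np.mean(np.abs(pred - y))

# A naive GR-based estimate that scales wirelength alone, for comparison.
naive = X[:, 0] * (y.mean() / X[:, 0].mean())
naive_mae = np.mean(np.abs(naive - y))

print(f"ML-model MAE:   {mae:.4f}")
print(f"naive GR MAE:   {naive_mae:.4f}")
```

On this synthetic data the learned model's mean absolute error is far below the single-feature estimate's, which mirrors the paper's claim that ML-based prediction is more accurate than GR-based timing estimation; in the real flow the target would come from signoff-quality post-DR extraction, not a synthetic generator.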