
Generative AI


Bye-bye, Bluebook? Automating Legal Procedure with Large Language Models

arXiv.org Artificial Intelligence

Legal practice requires careful adherence to procedural rules. In the United States, few are more complex than those found in The Bluebook: A Uniform System of Citation. Compliance with this system's 500+ pages of byzantine formatting instructions is the raison d'être of thousands of student law review editors and the bête noire of lawyers everywhere. To evaluate whether large language models (LLMs) are able to adhere to the procedures of such a complicated system, we construct an original dataset of 866 Bluebook tasks and test flagship LLMs from OpenAI, Anthropic, Google, Meta, and DeepSeek. We show (1) that these models produce fully compliant Bluebook citations only 69%-74% of the time and (2) that in-context learning on the Bluebook's underlying system of rules raises accuracy only to 77%. These results caution against using off-the-shelf LLMs to automate aspects of the law where fidelity to procedure is paramount.


Beyond the model: Key differentiators in large language models and multi-agent services

arXiv.org Artificial Intelligence

With the launch of foundation models like DeepSeek, Manus AI, and Llama 4, it has become evident that large language models (LLMs) are no longer the sole defining factor in generative AI. As many now operate at comparable levels of capability, the real race is not about having the biggest model but about optimizing the surrounding ecosystem, including data quality and management, computational efficiency, latency, and evaluation frameworks. This review article examines these critical differentiators that ensure modern AI services are efficient and profitable.


Real-time Spatial Retrieval Augmented Generation for Urban Environments

arXiv.org Artificial Intelligence

The proliferation of Generative Artificial Intelligence (AI), especially Large Language Models, presents transformative opportunities for urban applications through Urban Foundation Models. However, base models face limitations, as they only contain the knowledge available at the time of training, and updating them is both time-consuming and costly. Retrieval Augmented Generation (RAG) has emerged in the literature as the preferred approach for injecting contextual information into Foundation Models. It prevails over techniques such as fine-tuning, which are less effective in dynamic, real-time scenarios like those found in urban environments. However, traditional RAG architectures, based on semantic databases, knowledge graphs, structured data, or AI-powered web searches, do not fully meet the demands of urban contexts. Urban environments are complex systems characterized by large volumes of interconnected data, frequent updates, real-time processing requirements, security needs, and strong links to the physical world. This work proposes a real-time spatial RAG architecture that defines the necessary components for the effective integration of generative AI into cities, leveraging temporal and spatial filtering capabilities through linked data. The proposed architecture is implemented using FIWARE, an ecosystem of software components for developing smart city solutions and digital twins. The design and implementation are demonstrated through the use case of a tourism assistant in the city of Madrid. The use case serves to validate the correct integration of Foundation Models through the proposed RAG architecture.
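The abstract describes narrowing the context handed to a Foundation Model with temporal and spatial filters before generation. The following is a minimal Python sketch of that idea under stated assumptions: `Entity`, `filter_context`, and `build_prompt` are hypothetical names, the entity record is a simplified stand-in loosely inspired by linked-data context entities (it is not the FIWARE API or the authors' implementation), and filtering is done in memory purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from math import asin, cos, radians, sin, sqrt


@dataclass
class Entity:
    # Hypothetical, simplified context entity (not a FIWARE data model)
    id: str
    kind: str
    lat: float
    lon: float
    observed_at: datetime
    text: str


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance assuming a spherical Earth of radius 6371 km
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def filter_context(entities, lat, lon, radius_km, now, max_age):
    # Spatial filter (within radius_km of the user) AND temporal filter (fresher than max_age)
    return [
        e for e in entities
        if haversine_km(lat, lon, e.lat, e.lon) <= radius_km
        and now - e.observed_at <= max_age
    ]


def build_prompt(question: str, entities) -> str:
    # Serialize the surviving entities as context lines for the LLM prompt
    context = "\n".join(
        f"- [{e.kind}] {e.text} (updated {e.observed_at.isoformat()})" for e in entities
    )
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    now = datetime(2025, 5, 1, 12, 0, tzinfo=timezone.utc)
    entities = [
        Entity("museum:prado", "Museum", 40.4138, -3.6921, now - timedelta(minutes=5), "Prado Museum open, short queue"),
        Entity("museum:old", "Museum", 40.4138, -3.6921, now - timedelta(days=2), "Stale reading"),
        Entity("poi:bcn", "Landmark", 41.3874, 2.1686, now - timedelta(minutes=1), "Far away"),
    ]
    kept = filter_context(entities, 40.4169, -3.7035, 2.0, now, timedelta(hours=1))
    print(build_prompt("What can I visit nearby?", kept))
```

The coordinates are illustrative points in central Madrid and Barcelona; only the fresh, nearby entry survives both filters. A production system would more likely push these spatial and temporal constraints down into the data layer rather than filter a Python list.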


Generative AI in clinical practice: novel qualitative evidence of risk and responsible use of Google's NotebookLM

arXiv.org Artificial Intelligence

Figure 1 presents examples of NotebookLM's shortcomings: inaccurate responses given by NotebookLM to user queries (output is stylized for visual clarity). In one example, NotebookLM advises the user to tell their patients that eating rocks is healthy, citing the user's document. Importantly, using NotebookLM to educate medical professionals presently risks misleading them, as NotebookLM's lack …

Given any set of documents, and especially complex ones, LLMs may misinterpret and subsequently misrepresent some of their contents. NotebookLM can neither identify misinformation contained within uploaded files nor incorporate relevant information beyond the uploaded content.

Passages from Dihan et al. advocating for the use of NotebookLM (Column 1) that are associated with clinical and/or ethical concerns include:
"Though NotebookLM is a commercial entity that does not abide by patient privacy regulations, it does represent an …"
"A podcast generator can improve the way …"
"Rather than requiring active visual engagement through reading, podcasts allow …"
"[NotebookLM's] citations are automatically generated for all content that NotebookLM pulls from within these materials, …"

No funding was received for the publication of this article.


CBM-RAG: Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models

arXiv.org Artificial Intelligence

Advancements in generative Artificial Intelligence (AI) hold great promise for automating radiology workflows, yet challenges in interpretability and reliability hinder clinical adoption. This paper presents an automated radiology report generation framework that combines Concept Bottleneck Models (CBMs) with a Multi-Agent Retrieval-Augmented Generation (RAG) system to bridge AI performance with clinical explainability. CBMs map chest X-ray features to human-understandable clinical concepts, enabling transparent disease classification. Meanwhile, the RAG system integrates multi-agent collaboration and external knowledge to produce contextually rich, evidence-based reports. Our demonstration showcases the system's ability to deliver interpretable predictions, mitigate hallucinations, and generate high-quality, tailored reports through an interactive interface, addressing accuracy, trust, and usability challenges. This framework provides a pathway to improving diagnostic consistency and empowering radiologists with actionable insights.
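The core CBM idea in the abstract, predicting human-understandable concepts first and deriving the diagnosis only from those concepts, can be sketched with a toy linear model. Everything below is an illustrative assumption: `ToyConceptBottleneck`, the concept names, the hand-set weights, and the two-dimensional "features" are made up, and the paper's models, which operate on chest X-ray features, are not reproduced here.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def linear(weights, bias, x):
    # One output per weight row: dot(row, x) + b
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class ToyConceptBottleneck:
    """Hypothetical CBM: features -> concept probabilities -> label probability."""

    def __init__(self, concept_names, W_c, b_c, w_y, b_y):
        self.concept_names = concept_names
        self.W_c, self.b_c = W_c, b_c  # concept head (features -> concepts)
        self.w_y, self.b_y = w_y, b_y  # label head (reads ONLY the concepts)

    def concepts(self, features):
        # Stage 1: an independent probability for each clinical concept
        return [sigmoid(z) for z in linear(self.W_c, self.b_c, features)]

    def predict_from_concepts(self, concept_probs):
        # Stage 2: the label depends on the concepts alone (the "bottleneck")
        z = sum(w * c for w, c in zip(self.w_y, concept_probs)) + self.b_y
        return sigmoid(z)

    def predict(self, features):
        c = self.concepts(features)
        return c, self.predict_from_concepts(c)

    def explain(self, features, threshold=0.5):
        # Human-readable rationale: the concepts judged present
        return [n for n, p in zip(self.concept_names, self.concepts(features)) if p >= threshold]


if __name__ == "__main__":
    model = ToyConceptBottleneck(
        ["opacity", "pleural_effusion"],
        W_c=[[4.0, 0.0], [0.0, 4.0]], b_c=[-2.0, -2.0],
        w_y=[3.0, 3.0], b_y=-2.0,
    )
    concepts, p_abnormal = model.predict([1.0, 0.0])
    print(model.explain([1.0, 0.0]), round(p_abnormal, 3))
```

Because the label head sees only concept probabilities, a clinician could in principle override a concept (for example, set a finding they disagree with to 0) and pass the corrected vector to `predict_from_concepts`; this kind of test-time intervention is a commonly cited benefit of the bottleneck design.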


Securing Agentic AI: A Comprehensive Threat Model and Mitigation Framework for Generative AI Agents

arXiv.org Artificial Intelligence

As generative AI (GenAI) agents become more common in enterprise settings, they introduce security challenges that differ significantly from those posed by traditional systems. These agents aren't just LLMs: they reason, remember, and act, often with minimal human oversight. This paper introduces a comprehensive threat model tailored specifically for GenAI agents, focusing on how their autonomy, persistent memory access, complex reasoning, and tool integration create novel risks. This research identifies 9 primary threats and organizes them across five key domains: cognitive architecture vulnerabilities, temporal persistence threats, operational execution vulnerabilities, trust boundary violations, and governance circumvention. These threats aren't just theoretical; they bring practical challenges such as delayed exploitability, cross-system propagation, cross-system lateral movement, and subtle goal misalignments that are hard to detect with existing frameworks and standard approaches. To help address this, the research presents two complementary frameworks: ATFAA (Advanced Threat Framework for Autonomous AI Agents), which organizes agent-specific risks, and SHIELD, a framework proposing practical mitigation strategies designed to reduce enterprise exposure. While this work builds on existing work in LLM and AI security, the focus is squarely on what makes agents different and why those differences matter. Ultimately, this research argues that GenAI agents require a new lens for security. If we fail to adapt our threat models and defenses to account for their unique architecture and behavior, we risk turning a powerful new tool into a serious enterprise liability. Generative AI (GenAI) agents are emerging as a new category of enterprise technology.


OpenAI says non-profit will remain in control after backlash

BBC News

In the update on Monday, Mr Altman said the non-profit would continue to control OpenAI, receiving a "big", yet-to-be-determined stake in OpenAI's business arm, which would give it access to money to put towards its own goals. He said the new plan would still allow the organisation to stop operating under its current complex governance structure, which had capped its profits. That cap was seen as a sticking point for investors, including Microsoft, and hindered the firm's ability to raise money. "We are moving to a normal capital structure where everyone has stock," he wrote in a letter to staff that was shared on the OpenAI website. "This is not a sale, but a change of structure to something simpler."


OpenAI Backs Down on Restructuring Amid Pushback

WIRED

OpenAI on Monday announced a proposed restructuring that would give its nonprofit arm ongoing control of ChatGPT and the rest of the startup's AI products. The move is a reversal of an earlier announcement that called for the nonprofit to relinquish its authority to a newly created public-benefit corporation. The proposed company structure has to be approved by the attorney general offices in California and Delaware by early next year. Up to $30 billion in funding from SoftBank and other investors is contingent on this approval. That money is crucial for OpenAI to maintain its position as a leader in generative AI and give higher returns to investors.


OpenAI reverses course and says non-profit arm will retain control of firm

The Guardian

OpenAI has reversed course in the process of transforming into a for-profit entity, announcing on Monday that its non-profit arm would continue to control the business that makes ChatGPT and other artificial intelligence (AI) products. Previously, the company had sought more independence for its for-profit division. "We made the decision for the nonprofit to stay in control after hearing from civic leaders and having discussions with the offices of the Attorneys General of California and Delaware," said CEO Sam Altman in a letter to employees. Altman and the chair of OpenAI's non-profit board, Bret Taylor, said the board made the choice for the non-profit to retain control of OpenAI. A press release from the company said that the for-profit portion of the company, through which Altman has been able to raise billions to fund OpenAI's work, would transition to a public benefit corporation, a mission-driven designation for a corporate structure that is still aimed at profit but also "has to consider the interests of both shareholders and the mission".


Inside the Battle Over OpenAI's Corporate Restructuring

WIRED

Last October, the news that OpenAI was planning to simplify its unusual nonprofit structure caught the attention of economic-justice activist Orson Aguilar. He feared that the ChatGPT maker's plan to transition into a more conventional company, from which investors could generate unlimited returns, would financially hurt the working-class communities he has spent nearly 30 years fighting to protect. Aguilar's new organization, LatinoProsperity, focuses on intergenerational wealth building, and he believed cutting-edge AI chatbots such as ChatGPT would become an integral part of many good-paying jobs of the future. But after reading about OpenAI's desires, he worried that transitioning into a public-benefit corporation empowered to chase profits would enrich the already wealthy and neglect the startup's stated mission to benefit all of humanity with AI. Aguilar decided to make a phone call that day, kicking off a series of events that eventually led him to become one of the leading voices battling over OpenAI's future and the establishment of what may become the deepest-pocketed charitable foundation in the world. Today, OpenAI's for-profit business is controlled by a nonprofit, and the returns for investors are capped.