 opus


Photonic Differential Privacy with Direct Feedback Alignment

Neural Information Processing Systems

Optical Processing Units (OPUs) -- low-power photonic chips dedicated to large-scale random projections -- have been used in previous work to train deep neural networks using Direct Feedback Alignment (DFA), an effective alternative to backpropagation. Here, we demonstrate how to leverage the intrinsic noise of optical random projections to build a differentially private DFA mechanism, making OPUs a solution of choice for private-by-design training. We provide a theoretical analysis of our adaptive privacy mechanism, carefully measuring how the noise of optical random projections propagates through the process and gives rise to provable Differential Privacy. Finally, we conduct experiments demonstrating the ability of our learning procedure to achieve solid end-task performance.
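As a rough illustration of the idea in this abstract, the sketch below clips an output-layer error vector, projects it through a fixed random feedback matrix (the DFA feedback path), and adds Gaussian noise to the result. The clipping bound, the noise scale `sigma`, and all function names are assumptions made for illustration; the abstract does not describe the paper's actual mechanism, where the noise comes from the optical hardware itself.

```python
import math
import random


def clip_to_norm(vec, max_norm):
    """Scale vec down so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [v * scale for v in vec]


def noisy_random_projection(error, feedback, max_norm, sigma, rng):
    """Project a clipped error vector through a fixed random matrix,
    then add independent Gaussian noise to every output coordinate."""
    clipped = clip_to_norm(error, max_norm)
    projected = [sum(w * e for w, e in zip(row, clipped)) for row in feedback]
    return [p + rng.gauss(0.0, sigma) for p in projected]


rng = random.Random(0)
# Fixed random feedback matrix (hidden_dim x output_dim); in DFA it is never trained.
feedback = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]
error = [0.9, -2.0, 0.4]  # toy output-layer error (hypothetical values)
signal = noisy_random_projection(error, feedback, max_norm=1.0, sigma=0.1, rng=rng)
print(len(signal))  # one noisy feedback value per hidden unit
```

Clipping bounds how much any single example can change the projected signal, which is what lets additive noise of a fixed scale translate into a privacy guarantee in Gaussian-mechanism-style analyses.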


Can Large Language Models Understand As Well As Apply Patent Regulations to Pass a Hands-On Patent Attorney Test?

Khera, Bhakti, Alamian, Rezvan, Scherz, Pascal A., Goetz, Stephan M.

arXiv.org Artificial Intelligence

The legal field already uses various large language models (LLMs) in actual applications, but their quantitative performance and the reasons for it are underexplored. We evaluated several open-source and proprietary LLMs -- including GPT-series, Anthropic, DeepSeek, and Llama-3 variants -- on parts of the European Qualifying Examination (EQE) for future European Patent Attorneys. OpenAI o1 led with 0.82 accuracy and 0.81 F1 score, whereas an Amazon Web Services (AWS) Llama 3.1 8B lagged at 0.50 accuracy, and a Python-deployed Llama 3.1 8B scored 0.55. The latter two are within the range of mere guessing for the two-answer forced-choice design. None of the evaluated models could have passed the examination fully, as accuracy never exceeded the average threshold of 0.90 required for professional-level standards, not even models that are regularly promoted for their assumed beyond-PhD- and bar-admitted-lawyer-level performance. GPT-4o excelled at integrating text and graphics, while Claude 3 Opus often lost formatting coherence. Human patent experts evaluated the textual justifications and uncovered various critical shortcomings of each model. They valued clarity and legal rationale over the raw correctness of the answers, which revealed misalignment between automatic metrics and expert judgment. Model outputs were sensitive to modest temperature changes and prompt wording, which underscores the remaining necessity of expert oversight. Future work should target logical consistency, robust multimodality, and adaptive prompting to approach human-level patent proficiency. In summary, despite the outstanding performance of recent large models, the general public might overestimate their performance. The field has a long way to go to develop a virtual patent attorney. This paper points out several specific limitations that need solutions.
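For readers unfamiliar with the two metrics quoted in this abstract, the following sketch computes accuracy and F1 for a two-answer forced-choice task. The labels are made-up toy data, not results from the paper, and the function name is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive=True):
    """Return (accuracy, F1) for binary labels, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy answers to eight true/false questions (hypothetical data).
y_true = [True, True, False, False, True, False, True, False]
y_pred = [True, False, False, True, True, False, True, True]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(round(acc, 3), round(f1, 3))  # 0.625 0.667
```

With two equally likely answers, uniform random guessing has an expected accuracy of 0.5, which is why scores of 0.50 and 0.55 are described as being within the range of guessing.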


Camera Control at the Edge with Language Models for Scene Understanding

Buynitsky, Alexiy, Ehsani, Sina, Pallakonda, Bhanu, Mishra, Pragyana

arXiv.org Artificial Intelligence

In this paper, we present Optimized Prompt-based Unified System (OPUS), a framework that utilizes a Large Language Model (LLM) to control Pan-Tilt-Zoom (PTZ) cameras, providing contextual understanding of natural environments. To achieve this goal, the OPUS system improves cost-effectiveness by generating keywords from a high-level camera control API and transferring knowledge from larger closed-source language models to smaller ones through Supervised Fine-Tuning (SFT) on synthetic data. This enables efficient edge deployment while maintaining performance comparable to larger models like GPT-4. OPUS enhances environmental awareness by converting data from multiple cameras into textual descriptions for language models, eliminating the need for specialized sensory tokens. In benchmark testing, our approach significantly outperformed both traditional language model techniques and more complex prompting methods, achieving a 35% improvement over advanced techniques and a 20% higher task accuracy compared to closed-source models like Gemini Pro. The system demonstrates OPUS's capability to simplify PTZ camera operations through an intuitive natural language interface. This approach eliminates the need for explicit programming and provides a conversational method for interacting with camera systems, representing a significant advancement in how users can control and utilize PTZ camera technology.


Claude Fans Threw a Funeral for Anthropic's Retired AI Model

WIRED

On July 21 at 9 am PT, Anthropic retired Claude 3 Sonnet, a lightweight model known for being quick and cost-effective. On Saturday, in a large warehouse in San Francisco's SOMA district, more than 200 people gathered to mourn its passing. The star-studded funeral was put on by a group of Claude fanatics and Gen Z founders, one of whom told me he dropped out of college after learning about artificial general intelligence. Attendees included Amanda Askell, an Anthropic researcher who has jokingly called herself the "Fairy Claudemother," staffers from Anthropic and OpenAI, and high-profile X posters including the writer Noah Smith. The warehouse was dimly lit, with a tentacle from a shoggoth (a fictional H.P. Lovecraft creature that's become a popular metaphor for AI models) hanging from the ceiling.


GeistBERT: Breathing Life into German NLP

Scheible-Schmitt, Raphael, Frei, Johann

arXiv.org Artificial Intelligence

Advances in transformer-based language models have highlighted the benefits of language-specific pre-training on high-quality corpora. In this context, German NLP stands to gain from updated architectures and modern datasets tailored to the linguistic characteristics of the German language. GeistBERT seeks to improve German language processing by incrementally training on a diverse corpus and optimizing model performance across various NLP tasks. We pre-trained GeistBERT using fairseq, following the RoBERTa base configuration with Whole Word Masking (WWM), and initialized from GottBERT weights. The model was trained on a 1.3 TB German corpus with dynamic masking and a fixed sequence length of 512 tokens. For evaluation, we fine-tuned the model on standard downstream tasks, including NER (CoNLL 2003, GermEval 2014), text classification (GermEval 2018 coarse/fine, 10kGNAD), and NLI (German XNLI), using $F_1$ score and accuracy as evaluation metrics. GeistBERT achieved strong results across all tasks, leading among base models and setting a new state-of-the-art (SOTA) in GermEval 2018 fine text classification. It also outperformed several larger models, particularly in classification benchmarks. To support research in German NLP, we release GeistBERT under the MIT license.
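The two masking terms in this abstract can be illustrated with a small sketch: Whole Word Masking masks every subword piece of a selected word together, and dynamic masking draws a fresh mask each time a sequence is seen instead of fixing it once during preprocessing. The subword split, the 0.15 masking probability, and the function name are assumptions for illustration; the sketch also omits the random-replacement and keep-unchanged cases used in BERT-style masking, and it is not the fairseq implementation.

```python
import random

MASK = "<mask>"


def whole_word_mask(words, mask_prob, rng):
    """Mask whole words: `words` is a list of words, each given as a list of
    subword pieces. Returns the flat token sequence after masking."""
    tokens = []
    for pieces in words:
        if rng.random() < mask_prob:
            tokens.extend([MASK] * len(pieces))  # mask all pieces of the word together
        else:
            tokens.extend(pieces)
    return tokens


# Hypothetical subword segmentation of a short German sentence.
words = [["Der"], ["Geist", "es", "blitz"], ["kam"], ["un", "erwartet"]]

rng = random.Random(42)
# Dynamic masking: a new mask is sampled on every pass over the data.
for epoch in range(3):
    print(epoch, whole_word_mask(words, mask_prob=0.15, rng=rng))
```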


OPUS: Occupancy Prediction Using a Sparse Set

Neural Information Processing Systems

Occupancy prediction, which aims to predict the occupancy status within a voxelized 3D environment, is quickly gaining momentum within the autonomous driving community. Mainstream occupancy prediction works first discretize the 3D environment into voxels, then perform classification on such dense grids. However, inspection of sample data reveals that the vast majority of voxels are unoccupied. To this end, we present a novel perspective on the occupancy prediction task: formulating it as a streamlined set prediction paradigm without the need for explicit space modeling or complex sparsification procedures. Our proposed framework, called OPUS, utilizes a transformer encoder-decoder architecture to simultaneously predict occupied locations and classes using a set of learnable queries.
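The contrast between a dense voxel grid and a sparse set of occupied locations can be made concrete with a toy example. The sketch below is not the OPUS model; it only shows the two representations of the same scene, with a hypothetical grid size and hypothetical class names.

```python
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]


def occupied_fraction(occupied: Dict[Voxel, str], grid_shape: Tuple[int, int, int]) -> float:
    """Fraction of grid cells that appear in the sparse set."""
    nx, ny, nz = grid_shape
    return len(occupied) / (nx * ny * nz)


def to_dense(occupied: Dict[Voxel, str], grid_shape: Tuple[int, int, int],
             empty: str = "free") -> List[List[List[str]]]:
    """Expand the sparse set into a dense grid with one label per voxel."""
    nx, ny, nz = grid_shape
    return [[[occupied.get((x, y, z), empty) for z in range(nz)]
             for y in range(ny)]
            for x in range(nx)]


grid_shape = (10, 10, 4)  # 400 voxels in total
# Sparse set: only occupied locations and their classes are stored.
occupied = {(2, 3, 0): "car", (2, 4, 0): "car", (7, 1, 1): "pedestrian"}

dense = to_dense(occupied, grid_shape)
print(occupied_fraction(occupied, grid_shape))  # 0.0075
```

The dense form stores and classifies all 400 cells although only 3 carry information, whereas the sparse set grows with the number of occupied locations rather than with the grid volume.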


Anthropic's newest Claude AI models are experts at programming

PCWorld

Yesterday in an announcement blog post, AI company Anthropic unveiled Claude 4, its new generation of AI models consisting of Claude 4 Opus and Claude 4 Sonnet with a range of new abilities. Both Claude 4 models are hybrid models, which means they're capable of giving you short-and-quick answers or thinking longer on their responses with deeper reasoning. Claude 4 Opus is excellent at solving complex problems and at programming. The model can maintain its performance in long tasks over several hours with thousands of different steps. Meanwhile, Anthropic says Claude 4 Sonnet is a huge upgrade over Claude 3.7 Sonnet's abilities.