shape
Token and Span Classification for Entity Recognition in French Historical Encyclopedias
Moncla, Ludovic, Zeghidi, Hédi
Named Entity Recognition (NER) in historical texts presents unique challenges due to non-standardized language, archaic orthography, and nested or overlapping entities. This study benchmarks a diverse set of NER approaches, ranging from classical Conditional Random Fields (CRFs) and spaCy-based models to transformer-based architectures such as CamemBERT and sequence-labeling models like Flair. Experiments are conducted on the GeoEDdA dataset, a richly annotated corpus derived from 18th-century French encyclopedias. We propose framing NER as both token-level and span-level classification to accommodate complex nested entity structures typical of historical documents. Additionally, we evaluate the emerging potential of few-shot prompting with generative language models for low-resource scenarios. Our results demonstrate that while transformer-based models achieve state-of-the-art performance, especially on nested entities, generative models offer promising alternatives when labeled data are scarce. The study highlights ongoing challenges in historical NER and suggests avenues for hybrid approaches combining symbolic and neural methods to better capture the intricacies of early modern French text.
- North America > Mexico > Mexico City > Mexico City (0.04)
- Europe > United Kingdom > Scotland > City of Glasgow > Glasgow (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Alpes-Maritimes > Nice (0.04)
- Asia > Singapore (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
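The dual token-level/span-level framing described in the abstract can be sketched as follows: BIO tags can express only one non-overlapping layer of annotation, so inner (contained) spans must be routed to a separate span classifier. The tokens, spans, and labels below are invented illustrations, not items from the GeoEDdA corpus.

```python
# Sketch: framing nested NER as both token- and span-level classification.
# Spans are (start_token, end_token, label) with an exclusive end index.

def spans_to_bio(tokens, spans):
    """Project spans onto BIO tags; return leftover nested spans.

    Token-level BIO tags can encode only the outermost layer, so any span
    strictly contained in another is returned separately as a span-level
    target for a span classifier.
    """
    tags = ["O"] * len(tokens)
    outer = [s for s in spans
             if not any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in spans)]
    nested = [s for s in spans if s not in outer]
    for start, end, label in outer:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags, nested

tokens = ["PARIS", ",", "capitale", "du", "royaume", "de", "France"]
spans = [(0, 1, "NC-Spatial"), (4, 7, "NC-Spatial"), (6, 7, "NP-Spatial")]
bio, nested = spans_to_bio(tokens, spans)
```

Here the proper-noun span "France" sits inside the larger nominal span, so it survives only as a span-level target; a token-level tagger alone would have to drop it.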
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Scaling laws have recently been employed to derive the compute-optimal model size (number of parameters) for a given compute budget. We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers. Our shape-optimized vision transformer, SoViT, achieves results competitive with models more than twice its size, despite being pre-trained with an equivalent amount of compute. For example, SoViT-400m/14 achieves 90.3% fine-tuning accuracy on ImageNet (ILSVRC-2012), surpassing the much larger ViT-g/14 and approaching ViT-G/14 under identical settings, while also requiring less than half the inference cost. We conduct a thorough evaluation across multiple tasks, such as image classification, captioning, VQA, and zero-shot transfer, demonstrating the effectiveness of our model across a broad range of domains and identifying limitations.
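The kind of power-law relationship behind such compute-optimal sizing can be sketched with a simple log-log fit: given pairs of (compute budget, shape value minimizing loss at that budget), fit width* ≈ a·C^b and extrapolate. The data points below are synthetic placeholders, not measurements from the paper.

```python
# Minimal sketch of a compute-optimal shape scaling-law fit (synthetic data).
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21])       # training FLOPs (synthetic)
best_width = np.array([512., 768., 1152., 1728.])  # width minimizing loss at each budget

# A power law w* = a * C^b is linear in log-log space: log w = log a + b log C.
b, log_a = np.polyfit(np.log(compute), np.log(best_width), 1)
a = np.exp(log_a)

def optimal_width(C):
    """Extrapolate the compute-optimal width for a new budget C."""
    return a * C ** b
```

In the paper each shape dimension (width, depth, MLP size) would get its own such fit; this sketch shows only the fitting mechanics for one dimension.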
SHAPE : Self-Improved Visual Preference Alignment by Iteratively Generating Holistic Winner
Chen, Kejia, Zhang, Jiawen, Hu, Jiacong, Yang, Jiazhen, Lou, Jian, Feng, Zunlei, Song, Mingli
Large Visual Language Models (LVLMs) increasingly rely on preference alignment to ensure reliability, which steers model behavior via preference fine-tuning on preference data structured as "image - winner text - loser text" triplets. However, existing approaches often suffer from limited diversity and the high costs associated with human-annotated preference data, hindering LVLMs from fully achieving their intended alignment capabilities. We present SHAPE, a self-supervised framework capable of transforming the already abundant supervised text-image pairs into holistic preference triplets for more effective and cheaper LVLM alignment, eliminating the need for human preference annotations. Our approach facilitates LVLMs in progressively enhancing their alignment capabilities through iterative self-improvement. The key design rationale is to devise preference triplets in which the winner text consistently improves in holisticness and outperforms the loser response in quality, thereby pushing the model to "strive to the utmost" of alignment performance through preference fine-tuning. For each given text-image pair, SHAPE introduces multiple visual augmentations and pairs them with a summarized text to serve as the winner response, while designating the original text as the loser response. Experiments across 12 benchmarks on various model architectures and sizes, including LLaVA and DeepSeek-VL, show that SHAPE achieves significant gains, for example +11.3% on MMVet (comprehensive evaluation), +1.4% on MMBench (general VQA), and +8.0% on POPE (hallucination robustness) over baselines for 7B models. Notably, qualitative analyses confirm enhanced attention to visual details and better alignment with human preferences for holistic descriptions.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
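The triplet construction the abstract describes can be sketched schematically: captions of several visual augmentations are summarized into a holistic "winner", while the original text becomes the "loser". Here `augment`, `caption`, and `summarize` are hypothetical stand-ins for the model calls an actual implementation would make, not SHAPE's real API.

```python
# Sketch of SHAPE-style preference-triplet construction (stand-in callables).

def build_triplet(image, text, augment, caption, summarize, n_views=3):
    views = [augment(image) for _ in range(n_views)]  # multiple visual augmentations
    view_texts = [caption(v) for v in views]          # self-generated descriptions
    winner = summarize(view_texts + [text])           # holistic summary as winner
    return {"image": image, "winner": winner, "loser": text}

# Toy usage with string-based stand-ins for the image and model calls:
triplet = build_triplet(
    "img.png", "a dog",
    augment=lambda im: im + "+crop",
    caption=lambda v: f"desc({v})",
    summarize=lambda ts: " | ".join(ts),
)
```

The resulting triplets would then feed standard preference fine-tuning (e.g. DPO-style objectives), with each self-improvement round regenerating captions from the updated model.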
Linearized Optimal Transport pyLOT Library: A Toolkit for Machine Learning on Point Clouds
Linwu, Jun, Khurana, Varun, Karris, Nicholas, Cloninger, Alexander
Point clouds and continuous probability measures, rather than fixed-length feature vectors, are often the appropriate data structures. These data arise naturally in fields such as computer vision, image processing, shape analysis, and generative modeling, where representing complex objects as probability distributions provides a richer and more flexible framework for analysis. Real-world examples include text documents under bag-of-words models, where word counts form a histogram for each document [35]; imaging data, where pixel intensity is interpreted as mass [26], yielding 2D discrete probability measures over the image grid; and gene expression data interpreted as a distribution across a gene network [8, 15]. Optimal transport (OT) theory [30] has recently emerged as a powerful tool for comparing probability measures. Qualitatively, OT defines a distance between probability measures by minimizing, over all transport plans, the work needed to move one distribution onto the other. It has gained significant popularity in applications [4, 26, 27] involving point clouds and probability distributions. Despite its theoretical elegance and its ability to capture the geometric properties of distributions, vanilla OT is computationally expensive and does not integrate directly into existing machine learning pipelines. For this reason, OT has seen somewhat limited practical use, particularly in settings that demand scalable and efficient algorithms for tasks such as classification, dimension reduction, and generation.
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > San Diego County > La Jolla (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > Massachusetts (0.04)
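The linearization idea behind a library like pyLOT can be sketched in a few lines: each point cloud is embedded by its optimal-transport map from a fixed reference measure, after which ordinary Euclidean machine learning (k-NN, PCA, linear classifiers) applies to the embeddings. This sketch does not use pyLOT itself; it exploits the fact that, for uniform measures over equally sized clouds, exact OT reduces to a linear assignment problem solvable with SciPy.

```python
# Minimal linearized-OT (LOT) embedding sketch for uniform, equal-size clouds.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def lot_embed(reference, cloud):
    """Return the OT map image of `reference` in `cloud` (one row per reference point)."""
    cost = cdist(reference, cloud) ** 2        # squared-Euclidean cost matrix
    rows, cols = linear_sum_assignment(cost)   # exact OT = assignment for uniform weights
    return cloud[cols[np.argsort(rows)]]       # T(x_i) for each reference point x_i

rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 2))
clouds = [rng.normal(loc=m, size=(50, 2)) for m in (0.0, 3.0)]
# Flattened embeddings live in one fixed Euclidean space, ready for standard ML.
embeddings = np.stack([lot_embed(ref, c).ravel() for c in clouds])
```

Because every cloud is mapped into the same reference-indexed coordinate system, Euclidean distances between embeddings approximate Wasserstein distances between the underlying measures, which is what makes downstream classification and dimension reduction cheap.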
Anomaly detection using Diffusion-based methods
Bhosale, Aryan, Mukherjee, Samrat, Banerjee, Biplab, Cuzzolin, Fabio
This paper explores the utility of diffusion-based models for anomaly detection, focusing on their efficacy in identifying deviations in both compact and high-resolution datasets. Diffusion-based architectures, including Denoising Diffusion Probabilistic Models (DDPMs) and Diffusion Transformers (DiTs), are evaluated for their performance using reconstruction objectives. By leveraging the strengths of these models, this study benchmarks their performance against traditional anomaly detection methods such as Isolation Forests, One-Class SVMs, and COPOD. The results demonstrate the superior adaptability, scalability, and robustness of diffusion-based methods in handling complex real-world anomaly detection tasks. Key findings highlight the role of reconstruction error in enhancing detection accuracy and underscore the scalability of these models to high-dimensional datasets. Future directions include optimizing encoder-decoder architectures and exploring multi-modal datasets to further advance diffusion-based anomaly detection.
- Asia > India > Maharashtra > Mumbai (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
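The reconstruction-error principle the abstract highlights can be sketched without a trained diffusion model: a model fitted to normal data reconstructs inliers well and anomalies poorly, so the reconstruction error itself is the anomaly score. A low-rank PCA reconstruction stands in for the DDPM/DiT denoiser here; the data are synthetic.

```python
# Sketch of reconstruction-error anomaly scoring (PCA as a denoiser stand-in).
import numpy as np

rng = np.random.default_rng(0)
# Inliers lie near a 2-D subspace of R^10; the single anomaly does not.
basis = rng.normal(size=(2, 10))
inliers = rng.normal(size=(200, 2)) @ basis
anomaly = rng.normal(size=(1, 10)) * 5.0
data = np.vstack([inliers, anomaly])

# "Denoiser": project onto the top-2 principal components of the inliers.
mean = inliers.mean(axis=0)
_, _, vt = np.linalg.svd(inliers - mean, full_matrices=False)
components = vt[:2]

def reconstruction_error(x):
    centered = x - mean
    recon = centered @ components.T @ components
    return np.linalg.norm(centered - recon, axis=-1)

scores = reconstruction_error(data)  # higher score => more anomalous
```

A diffusion-based detector replaces the projection step with noising followed by learned denoising, but the scoring logic (and its comparison against Isolation Forests, One-Class SVMs, or COPOD) is the same.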
Get3D: NVIDIA's New Generative AI Model For 3D Shapes - AI Summary
Get3D is a new generative AI model from NVIDIA that can create 3D shapes; the model was recently added to NVIDIA's marquee Omniverse platform. TheSequence is a no-hype, ML-oriented newsletter that takes five minutes to read, aiming to keep you up to date with machine learning advancements in a concise and easy-to-understand format. "NVIDIA's Get3D is a Generative AI Model for 3D Shapes" is published by Jesus Rodriguez.
Why Contextual AI Will Shape the Future of Advertising
As marketers hurtle towards a privacy-first future, the industry is being flooded with countless stories about new approaches to consumer engagement that will soon shake up the advertising world. What if we need to go back to basics? What if it's time to return to contextual targeting? Marketing pros would understandably have some concerns. Contextual targeting has been around for a while and fell out of favor with many in the industry.
- Marketing (0.36)
- Media (0.31)
- Leisure & Entertainment (0.31)
This Copyright Lawsuit Could Shape the Future of Generative AI
The tech industry might be reeling from a wave of layoffs, a dramatic crypto-crash, and ongoing turmoil at Twitter, but despite those clouds some investors and entrepreneurs are already eyeing a new boom built on artificial intelligence that can generate coherent text, captivating images, and functional computer code. But that new frontier has a looming cloud of its own. A class-action lawsuit filed in a federal court in California this month takes aim at GitHub Copilot, a powerful tool that automatically writes working code when a programmer starts typing. The lawsuit is at an early stage, and its prospects are unclear because the underlying technology is novel and has not faced much legal scrutiny. But legal experts say it may have a bearing on the broader trend of generative AI tools. AI programs that generate paintings, photographs, and illustrations from a prompt, as well as text for marketing copy, are all built with algorithms trained on previous work produced by humans.
- Media > Film (1.00)
- Leisure & Entertainment (1.00)