atch
SlotMatch: Distilling Object-Centric Representations for Unsupervised Video Segmentation
Grigore, Diana-Nicoleta, Madan, Neelu, Mogelmose, Andreas, Moeslund, Thomas B., Ionescu, Radu Tudor
Unsupervised video segmentation is a challenging computer vision task, especially due to the lack of supervisory signals coupled with the complexity of visual scenes. To overcome this challenge, state-of-the-art models based on slot attention often have to rely on large and computationally expensive neural architectures. To this end, we propose a simple knowledge distillation framework that effectively transfers object-centric representations to a lightweight student. The proposed framework, called SlotMatch, aligns corresponding teacher and student slots via the cosine similarity, requiring no additional distillation objectives or auxiliary supervision. The simplicity of SlotMatch is confirmed via theoretical and empirical evidence, both indicating that integrating additional losses is redundant. We conduct experiments on three datasets to compare the state-of-the-art teacher model, SlotContrast, with our distilled student. The results show that our student based on SlotMatch matches and even outperforms its teacher, while using 3.6x less parameters and running up to 2.7x faster. Moreover, our student surpasses all other state-of-the-art unsupervised video segmentation models.
- Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
- Europe > Denmark > North Jutland > Aalborg (0.04)
CleverCatch: A Knowledge-Guided Weak Supervision Model for Fraud Detection
Mozafari, Amirhossein, Hashemi, Kourosh, Shafagh, Erfan, Motamedi, Soroush, Tayebi, Azar Taheri, Tayebi, Mohammad A.
Healthcare fraud detection remains a critical challenge due to limited availability of labeled data, constantly evolving fraud tactics, and the high dimensionality of medical records. Traditional supervised methods are challenged by extreme label scarcity, while purely unsupervised approaches often fail to capture clinically meaningful anomalies. In this work, we introduce CleverCatch, a knowledge-guided weak supervision model designed to detect fraudulent prescription behaviors with improved accuracy and interpretability. Our approach integrates structured domain expertise into a neural architecture that aligns rules and data samples within a shared embedding space. By training encoders jointly on synthetic data representing both compliance and violation, CleverCatch learns soft rule embeddings that generalize to complex, real-world datasets. This hybrid design enables data-driven learning to be enhanced by domain-informed constraints, bridging the gap between expert heuristics and machine learning. Experiments on the large-scale real-world dataset demonstrate that CleverCatch outperforms four state-of-the-art anomaly detection baselines, yielding average improvements of 1.3\% in AUC and 3.4\% in recall. Our ablation study further highlights the complementary role of expert rules, confirming the adaptability of the framework. The results suggest that embedding expert rules into the learning process not only improves detection accuracy but also increases transparency, offering an interpretable approach for high-stakes domains such as healthcare fraud detection.
- North America > United States > New York > Albany County > Albany (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Canada > Ontario > Niagara Region > St. Catharines (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Burnaby (0.04)
PATCH: Mitigating PII Leakage in Language Models with Privacy-Aware Targeted Circuit PatcHing
Hughes, Anthony, Duddu, Vasisht, Asokan, N., Aletras, Nikolaos, Ma, Ning
Language models (LMs) may memorize personally identifiable information (PII) from training data, enabling adversaries to extract it during inference. Existing defense mechanisms such as differential privacy (DP) reduce this leakage, but incur large drops in utility. Based on a comprehensive study using circuit discovery to identify the computational circuits responsible PII leakage in LMs, we hypothesize that specific PII leakage circuits in LMs should be responsible for this behavior. Therefore, we propose PATCH (Privacy-Aware Targeted Circuit PatcHing), a novel approach that first identifies and subsequently directly edits PII circuits to reduce leakage. PATCH achieves better privacy-utility trade-off than existing defenses, e.g., reducing recall of PII leakage from LMs by up to 65%. Finally, PATCH can be combined with DP to reduce recall of residual leakage of an LM to as low as 0.01%. Our analysis shows that PII leakage circuits persist even after the application of existing defense mechanisms. In contrast, PATCH can effectively mitigate their impact.
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (12 more...)
- Information Technology > Security & Privacy (1.00)
- Law (0.68)
CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP
An, Na Min, Kang, Inha, Lee, Minhyun, Shim, Hyunjung
Spatial grounding is crucial for referring image segmentation (RIS), where the goal of the task is to localize an object described by language. Current foundational vision-language models (VLMs), such as CLIP, excel at aligning images and text but struggle with understanding spatial relationships. Within the language stream, most existing methods often focus on the primary noun phrase when extracting local text features, undermining contextual tokens. Within the vision stream, CLIP generates similar features for images with different spatial layouts, resulting in limited sensitivity to spatial structure. To address these limitations, we propose \textsc{CoPatch}, a zero-shot RIS framework that leverages internal model components to enhance spatial representations in both text and image modalities. For language, \textsc{CoPatch} constructs hybrid text features by incorporating context tokens carrying spatial cues. For vision, it extracts patch-level image features using our novel path discovered from intermediate layers, where spatial structure is better preserved. These enhanced features are fused into a clustered image-text similarity map, \texttt{CoMap}, enabling precise mask selection. As a result, \textsc{CoPatch} significantly improves spatial grounding in zero-shot RIS across RefCOCO, RefCOCO+, RefCOCOg, and PhraseCut (+ 2--7 mIoU) without requiring any additional training. Our findings underscore the importance of recovering and leveraging the untapped spatial knowledge inherently embedded in VLMs, thereby paving the way for opportunities in zero-shot RIS.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > Singapore (0.04)
- (3 more...)
SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations
Chen, Zhaorun, Pinto, Francesco, Pan, Minzhou, Li, Bo
With the rise of generative AI and rapid growth of high-quality video generation, video guardrails have become more crucial than ever to ensure safety and security across platforms. Current video guardrails, however, are either overly simplistic, relying on pure classification models trained on simple policies with limited unsafe categories, which lack detailed explanations, or prompting multimodal large language models (MLLMs) with long safety guidelines, which are inefficient and impractical for guardrailing real-world content. To bridge this gap, we propose SafeWatch, an efficient MLLM-based video guardrail model designed to follow customized safety policies and provide multi-label video guardrail outputs with content-specific explanations in a zero-shot manner. In particular, unlike traditional MLLM-based guardrails that encode all safety policies autoregressively, causing inefficiency and bias, SafeWatch uniquely encodes each policy chunk in parallel and eliminates their position bias such that all policies are attended simultaneously with equal importance. In addition, to improve efficiency and accuracy, SafeWatch incorporates a policy-aware visual token pruning algorithm that adaptively selects the most relevant video tokens for each policy, discarding noisy or irrelevant information. This allows for more focused, policy-compliant guardrail with significantly reduced computational overhead. Considering the limitations of existing video guardrail benchmarks, we propose SafeWatch-Bench, a large-scale video guardrail benchmark comprising over 2M videos spanning six safety categories which covers over 30 tasks to ensure a comprehensive coverage of all potential safety scenarios. SafeWatch outperforms SOTA by 28.2% on SafeWatch-Bench, 13.6% on benchmarks, cuts costs by 10%, and delivers top-tier explanations validated by LLM and human reviews.
- South America (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)
FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hua, Hang, Shi, Jing, Kafle, Kushal, Jenni, Simon, Zhang, Daoan, Collomosse, John, Cohen, Scott, Luo, Jiebo
Recent progress in large-scale pre-training has led to the development of advanced vision-language models (VLMs) with remarkable proficiency in comprehending and generating multimodal content. Despite the impressive ability to perform complex reasoning for VLMs, current models often struggle to effectively and precisely capture the compositional information on both the image and text sides. To address this, we propose FineMatch, a new aspect-based fine-grained text and image matching benchmark, focusing on text and image mismatch detection and correction. This benchmark introduces a novel task for boosting and evaluating the VLMs' compositionality for aspect-based fine-grained text and image matching. In this task, models are required to identify mismatched aspect phrases within a caption, determine the aspect's class, and propose corrections for an image-text pair that may contain between 0 and 3 mismatches. To evaluate the models' performance on this new task, we propose a new evaluation metric named ITM-IoU for which our experiments show a high correlation to human evaluation. In addition, we also provide a comprehensive experimental analysis of existing mainstream VLMs, including fully supervised learning and in-context learning settings. We have found that models trained on FineMatch demonstrate enhanced proficiency in detecting fine-grained text and image mismatches. Moreover, models (e.g., GPT-4V, Gemini Pro Vision) with strong abilities to perform multimodal in-context learning are not as skilled at fine-grained compositional image and text matching analysis. With FineMatch, we are able to build a system for text-to-image generation hallucination detection and correction.
McCatch: Scalable Microcluster Detection in Dimensional and Nondimensional Datasets
Vinces, Braulio V. Sánchez, Cordeiro, Robson L. F., Faloutsos, Christos
How could we have an outlier detector that works even with nondimensional data, and ranks together both singleton microclusters ('one-off' outliers) and nonsingleton microclusters by their anomaly scores? How to obtain scores that are principled in one scalable and 'hands-off' manner? Microclusters of outliers indicate coalition or repetition in fraud activities, etc.; their identification is thus highly desirable. This paper presents McCatch: a new algorithm that detects microclusters by leveraging our proposed 'Oracle' plot (1NN Distance versus Group 1NN Distance). We study 31 real and synthetic datasets with up to 1M data elements to show that McCatch is the only method that answers both of the questions above; and, it outperforms 11 other methods, especially when the data has nonsingleton microclusters or is nondimensional. We also showcase McCatch's ability to detect meaningful microclusters in graphs, fingerprints, logs of network connections, text data, and satellite imagery. For example, it found a 30-elements microcluster of confirmed 'Denial of Service' attacks in the network logs, taking only ~3 minutes for 222K data elements on a stock desktop.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (7 more...)
Gigs with Guarantees: Achieving Fair Wage for Food Delivery Workers
Nair, Ashish, Yadav, Rahul, Gupta, Anjali, Chakraborty, Abhijnan, Ranu, Sayan, Bagchi, Amitabha
With the increasing popularity of food delivery platforms, it has become pertinent to look into the working conditions of the 'gig' workers in these platforms, especially providing them fair wages, reasonable working hours, and transparency on work availability. However, any solution to these problems must not degrade customer experience and be cost-effective to ensure that platforms are willing to adopt them. We propose WORK4FOOD, which provides income guarantees to delivery agents, while minimizing platform costs and ensuring customer satisfaction. WORK4FOOD ensures that the income guarantees are met in such a way that it does not lead to increased working hours or degrade environmental impact. To incorporate these objectives, WORK4FOOD balances supply and demand by controlling the number of agents in the system and providing dynamic payment guarantees to agents based on factors such as agent location, ratings, etc. We evaluate WORK4FOOD on a real-world dataset from a leading food delivery platform and establish its advantages over the state of the art in terms of the multi-dimensional objectives at hand.
- Asia > India > West Bengal > Kolkata (0.04)
- Asia > Middle East > Republic of Türkiye (0.04)
- Asia > China (0.04)
- Transportation > Freight & Logistics Services (1.00)
- Information Technology > Services (0.92)
- Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.92)
- Banking & Finance > Economy (0.68)
FutureMatch: Combining Human Value Judgments and Machine Learning to Match in Dynamic Environments
Dickerson, John P. (Carnegie Mellon University) | Sandholm, Tuomas (Carnegie Mellon University)
The preferred treatment for kidney failure is a transplant; however, demand for donor kidneys far outstrips supply. Kidney exchange, an innovation where willing but incompatible patient-donor pairs can exchange organs- — via barter cycles and altruist-initiated chains —provides a life-saving alternative.Typically, fielded exchanges act myopically, considering only the current pool of pairs when planning the cycles and chains. Yet kidney exchange is inherently dynamic, with participants arriving and departing. Also, many planned exchange transplants do not go to surgery due to various failures. So, it is important to consider the future when matching. Motivated by our experience running the computational side of a large nationwide kidney exchange, we present FutureMatch, a framework for learning to match in a general dynamic model. FutureMatch takes as input a high-level objective (e.g., "maximize graft survival of transplants over time'') decided on by experts, then automatically (i) learns based on data how to make this objective concrete and (ii) learns the ``means'' to accomplish this goal — a task, in our experience, that humans handle poorly. It uses data from all live kidney transplants in the US since 1987 to learn the quality of each possible match; it then learns the potentials of elements of the current input graph offline (e.g., potentials of pairs based on features such as donor and patient blood types), translates these to weights, and performs a computationally feasible batch matching that incorporates dynamic, failure-aware considerations through the weights. We validate FutureMatch on real fielded exchange data. It results in higher values of the objective. Furthermore, even under economically inefficient objectives that enforce equity, it yields better solutions for the efficient objective (which does not incorporate equity) than traditional myopic matching that uses the efficiency objective.
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)