AITopics

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Neural Information Processing SystemsFeb-7-2026, 22:45:07 GMT

26d4b4313a7e5828856bc0791fca39a2-Supplemental.pdf

log 1, probability, transition, (12 more...)

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry: Government > Regional Government > North America Government > United States Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Neural Information Processing SystemsFeb-7-2026, 22:45:03 GMT

26d4b4313a7e5828856bc0791fca39a2-Paper.pdf

phase transition, probability, transition, (11 more...)

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.52)

arXiv.org Artificial IntelligenceJan-8-2025

Supervision-free Vision-Language Alignment

Giannone, Giorgio, Li, Ruoteng, Feng, Qianli, Perevodchikov, Evgeny, Chen, Rui, Martinez, Aleix

Vision-language models (VLMs) have demonstrated remarkable potential in integrating visual and linguistic information, but their performance is often constrained by the need for extensive, high-quality image-text training data. Curation of these image-text pairs is both time-consuming and computationally expensive. To address this challenge, we introduce SVP (Supervision-free Visual Projection), a novel framework that enhances vision-language alignment without relying on curated data or preference annotation. SVP leverages self-captioning and a pre-trained grounding model as a feedback mechanism to elicit latent information in VLMs. We evaluate our approach across six key areas: captioning, referring, visual question answering, multitasking, hallucination control, and object recall. Results demonstrate significant improvements, including a 14% average improvement in captioning tasks, up to 12% increase in object recall, and substantial reduction in hallucination rates. Notably, a small VLM using SVP achieves hallucination reductions comparable to a model five times larger, while a VLM with initially poor referring capabilities more than doubles its performance, approaching parity with a model twice its size.

large language model, llava-1, machine learning, (20 more...)

2501.04568

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > Manitoba > Westman Region > Brandon (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Sports > Tennis (0.93)
Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

arXiv.org Artificial IntelligenceDec-12-2024

Selective Visual Prompting in Vision Mamba

Yao, Yifeng, Liu, Zichen, Cui, Zhenyu, Peng, Yuxin, Zhou, Jiahuan

Pre-trained Vision Mamba (Vim) models have demonstrated exceptional performance across various computer vision tasks in a computationally efficient manner, attributed to their unique design of selective state space models. To further extend their applicability to diverse downstream vision tasks, Vim models can be adapted using the efficient fine-tuning technique known as visual prompting. However, existing visual prompting methods are predominantly tailored for Vision Transformer (ViT)-based models that leverage global attention, neglecting the distinctive sequential token-wise compression and propagation characteristics of Vim. Specifically, existing prompt tokens prefixed to the sequence are insufficient to effectively activate the input and forget gates across the entire sequence, hindering the extraction and propagation of discriminative information. To address this limitation, we introduce a novel Selective Visual Prompting (SVP) method specifically for the efficient fine-tuning of Vim. To prevent the loss of discriminative information during state space propagation, SVP employs lightweight selective prompters for token-wise prompt generation, ensuring adaptive activation of the update and forget gates within Mamba blocks to promote discriminative information propagation. Moreover, considering that Vim propagates both shared cross-layer information and specific inner-layer information, we further refine SVP with a dual-path structure: Cross-Prompting and Inner-Prompting. Cross-Prompting utilizes shared parameters across layers, while Inner-Prompting employs distinct parameters, promoting the propagation of both shared and specific information, respectively. Extensive experimental results on various large-scale benchmarks demonstrate that our proposed SVP significantly outperforms state-of-the-art methods. Our code is available at https://github.com/zhoujiahuan1991/AAAI2025-SVP.

artificial intelligence, information, machine learning, (15 more...)

2412.08947

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Neural Information Processing SystemsMar-15-2024, 15:43:05 GMT

Guaranteed Rank Minimization via Singular Value Projection

Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a restricted isometry property (RIP). Our method guarantees geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than the existing methods. We also introduce a Newton-step for our SVP framework to speed-up the convergence with substantial empirical gains. Next, we address a practically important application of ARMP - the problem of lowrank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem.

algorithm, matrix completion, svp, (13 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

arXiv.org Artificial IntelligenceJul-13-2023

RVD: A Handheld Device-Based Fundus Video Dataset for Retinal Vessel Segmentation

Khan, MD Wahiduzzaman, Sheng, Hongwei, Zhang, Hu, Du, Heming, Wang, Sen, Coroneo, Minas Theodore, Hajati, Farshid, Shariflou, Sahar, Kalloniatis, Michael, Phu, Jack, Agar, Ashish, Huang, Zi, Golzan, Mojtaba, Yu, Xin

Retinal vessel segmentation is generally grounded in image-based datasets collected with bench-top devices. The static images naturally lose the dynamic characteristics of retina fluctuation, resulting in diminished dataset richness, and the usage of bench-top devices further restricts dataset scalability due to its limited accessibility. Considering these limitations, we introduce the first video-based retinal dataset by employing handheld devices for data acquisition. The dataset comprises 635 smartphone-based fundus videos collected from four different clinics, involving 415 patients from 50 to 75 years old. It delivers comprehensive and precise annotations of retinal structures in both spatial and temporal dimensions, aiming to advance the landscape of vasculature segmentation. Specifically, the dataset provides three levels of spatial annotations: binary vessel masks for overall retinal structure delineation, general vein-artery masks for distinguishing the vein and artery, and fine-grained vein-artery masks for further characterizing the granularities of each artery and vein. In addition, the dataset offers temporal annotations that capture the vessel pulsation characteristics, assisting in detecting ocular diseases that require fine-grained recognition of hemodynamic fluctuation. In application, our dataset exhibits a significant domain shift with respect to data captured by bench-top devices, thus posing great challenges to existing methods. In the experiments, we provide evaluation metrics and benchmark results on our dataset, reflecting both the potential and challenges it offers for vessel segmentation tasks. We hope this challenging dataset would significantly contribute to the development of eye disease diagnosis and early prevention.

artificial intelligence, machine learning, segmentation, (18 more...)

2307.06577

Country:

North America > United States (0.14)
Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
(10 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Kaushik, Chiraag, McRae, Andrew D., Davenport, Mark A., Muthukumar, Vidya

New Equivalences Between Interpolation and SVMs: Kernels and Structured Features

arXiv.org Artificial IntelligenceMay-3-2023

Recent empirical and theoretical efforts in supervised machine learning have discovered a wide range of surprising phenomena that arise in the modern overparameterized regime (i.e., where the number of free parameters in the model is much larger than the number of training examples [13, 6]). For example, after it was observed that deep neural networks can perfectly fit noisy training data and still generalise well to new data (see, e.g., [35, 43]), several theoretical efforts have demonstrated that this "harmless interpolation" phenomenon can in fact occur even in the simpler settings of linear and kernel regression[8, 7, 5]. Aseparate, but equally surprising observation in this overparameterized regime is that training procedures that optimize different loss functions can still yield similar test performance. For example, the empirical studies of [36, 22, 26, 16] demonstrate that kernel machines and deep neural networks trained using the squared loss, which is traditionally reserved for regression problems with continuous labels, can result in comparable classification performance to those trained with the more popular cross-entropy loss. Motivated by these observations, recent work has sought to deepen theoretical understanding of the impact of the loss function in overparameterized classification tasks, starting with linear models.

artificial intelligence, deep learning, machine learning, (19 more...)

2305.02304

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > Massachusetts (0.04)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Hawley, Scott H., Morrison, Andrew C.

ConvNets for Counting: Object Detection of Transient Phenomena in Steelpan Drums

arXiv.org Artificial IntelligenceSep-6-2021

We train an object detector built from convolutional neural networks to count interference fringes in elliptical antinode regions in frames of high-speed video recordings of transient oscillations in Caribbean steelpan drums illuminated by electronic speckle pattern interferometry (ESPI). The annotations provided by our model aim to contribute to the understanding of time-dependent behavior in such drums by tracking the development of sympathetic vibration modes. The system is trained on a dataset of crowdsourced human-annotated images obtained from the Zooniverse Steelpan Vibrations Project. Due to the small number of human-annotated images and the ambiguity of the annotation task, we also evaluate the model on a large corpus of synthetic images whose properties have been matched to the real images by style transfer using a Generative Adversarial Network. Applying the model to thousands of unlabeled video frames, we measure oscillations consistent with audio recordings of these drum strikes. One unanticipated result is that sympathetic oscillations of higher-octave notes significantly precede the rise in sound intensity of the corresponding second harmonic tones; the mechanism responsible for this remains unidentified. This paper primarily concerns the development of the predictive model; further exploration of the steelpan images and deeper physical insights await its further application.

annotation, antinode, ring count, (17 more...)