- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- (7 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.93)
- Information Technology > Sensing and Signal Processing > Image Processing (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- North America > United States > California (0.04)
- Asia > Singapore (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Representation Learning via Consistent Assignment of Views over Random Partitions
CARP learns prototypes in an end-to-end online fashion using gradient descent without additional non-differentiable modules to solve the cluster assignment problem. CARP optimizes a new pretext task based on random partitions of prototypes that regularizes the model and enforces consistency between views' assignments.
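A rough PyTorch sketch of the random-partition consistency idea (assumed shapes and a symmetric cross-entropy term; not the authors' exact objective):

```python
import torch
import torch.nn.functional as F

def carp_loss(z1, z2, prototypes, n_blocks=4, temperature=0.1):
    """Hypothetical sketch of consistency over random prototype partitions.

    z1, z2: L2-normalized embeddings of two views of a batch, shape (B, D).
    prototypes: learnable prototype matrix, shape (K, D).
    The K prototypes are randomly split into n_blocks groups; within each
    group, the two views' assignment distributions are pushed to agree.
    """
    logits1 = z1 @ prototypes.t() / temperature  # (B, K)
    logits2 = z2 @ prototypes.t() / temperature
    K = prototypes.size(0)
    perm = torch.randperm(K, device=z1.device)
    loss = 0.0
    for block in perm.chunk(n_blocks):
        p1 = F.softmax(logits1[:, block], dim=1)  # assignments within block
        p2 = F.softmax(logits2[:, block], dim=1)
        # symmetric cross-view consistency: each view predicts the other's
        # (detached) assignment over this block's prototypes
        loss = loss - 0.5 * (p1.detach() * torch.log(p2 + 1e-8)).sum(1).mean()
        loss = loss - 0.5 * (p2.detach() * torch.log(p1 + 1e-8)).sum(1).mean()
    return loss / n_blocks
```

Intuitively, each random block turns the full assignment problem into a smaller classification problem, which regularizes training without any non-differentiable assignment step.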
- North America > Canada > Ontario > Toronto (0.14)
- South America > Brazil (0.04)
- Europe > Norway > Eastern Norway > Oslo (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States (0.04)
- Media > Film (0.46)
- Leisure & Entertainment (0.46)
ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation
Zhou, Yuan, Jin, Shilong, Hua, Litao, Lv, Wanjun, Duan, Haoran, Han, Jungong
Recent advances in zero-shot text-to-3D generation have revolutionised 3D content creation by enabling direct synthesis from textual descriptions. While state-of-the-art methods leverage 3D Gaussian Splatting with score distillation to enhance multi-view rendering through pre-trained text-to-image (T2I) models, they suffer from inherent prior view biases in T2I models. These biases lead to inconsistent 3D generation, particularly manifesting as the multi-face Janus problem, where objects exhibit conflicting features across views. To address this fundamental challenge, we propose ConsDreamer, a novel method that mitigates view bias by refining both the conditional and unconditional terms in the score distillation process: (1) a View Disentanglement Module (VDM) that eliminates viewpoint biases in conditional prompts by decoupling irrelevant view components and injecting precise view control; and (2) a similarity-based partial order loss that enforces geometric consistency in the unconditional term by aligning cosine similarities with azimuth relationships. Extensive experiments demonstrate that ConsDreamer can be seamlessly integrated into various 3D representations and score distillation paradigms, effectively mitigating the multi-face Janus problem.

3D generation technology plays a crucial role in various fields such as innovative industrial design, game development, and virtual reality. In particular, zero-shot text-to-3D generation [1], [2], [3], [4], [5] aims to generate 3D content without 3D training data, enabling the conversion from concept to reality. However, unlike text-to-image (T2I) tasks [10], [11], zero-shot text-to-3D generation [6], [7], [8], [9] is constrained by the inherent complexity of the real world and the scarcity of 3D data. From this perspective, generating high-quality 3D content from text remains a significant challenge.
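The similarity-based partial order loss is the more unusual ingredient; as a rough sketch of the idea (not the authors' exact formulation), the constraint "views closer in azimuth should be more similar" can be written as a hinge penalty over ordered view triples, assuming per-view feature vectors and azimuth angles are available:

```python
import torch
import torch.nn.functional as F

def partial_order_loss(feats, azimuths, margin=0.0):
    """Hypothetical sketch of a similarity-based partial order loss.

    feats: per-view feature vectors, shape (V, D).
    azimuths: view angles in degrees, float tensor of shape (V,).
    For an anchor view i and two other views j, k: if j is closer to i in
    azimuth than k is, its cosine similarity to i should be at least as
    large; violations of that ordering are penalized with a hinge.
    """
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t()                       # (V, V) cosine similarities
    # angular distance on the circle, in [0, 180]
    diff = (azimuths[None, :] - azimuths[:, None]).abs() % 360.0
    dist = torch.minimum(diff, 360.0 - diff)
    loss, count = feats.new_zeros(()), 0
    V = feats.size(0)
    for i in range(V):
        for j in range(V):
            for k in range(V):
                if i in (j, k) or j == k:
                    continue
                if dist[i, j] < dist[i, k]:       # j should resemble i more than k does
                    loss = loss + F.relu(sim[i, k] - sim[i, j] + margin)
                    count += 1
    return loss / max(count, 1)
```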
Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression
Jennings, Roy H., Paikin, Genady, Shaul, Roy, Soloveichik, Evgeny
Multimodal Large Language Models (MLLMs) show promise for image-based regression tasks, but current approaches face key limitations. Recent methods fine-tune MLLMs using preset output vocabularies and generic task-level prompts (e.g., "How would you rate this image?"), assuming this mimics human rating behavior. Our analysis reveals that these approaches provide no benefit over image-only training: models using preset vocabularies and generic prompts perform equivalently to image-only models, failing to leverage semantic understanding from textual input. We propose Regression via Transformer-Based Classification (RvTC), which replaces vocabulary-constrained classification with a flexible bin-based approach. Unlike approaches that address discretization errors through complex distributional modeling, RvTC eliminates manual vocabulary crafting by simply increasing the number of bins, achieving state-of-the-art performance on four image assessment datasets using only images. More importantly, we demonstrate that data-specific prompts dramatically improve performance: unlike generic task descriptions, prompts containing semantic information about specific images enable MLLMs to leverage cross-modal understanding. On the AVA dataset, adding challenge titles to prompts substantially improves our already state-of-the-art image-only baseline. Empirical evidence from the AVA and AGIQA-3k datasets shows that MLLMs benefit from semantic prompt information beyond mere statistical biases. We validate RvTC across two different MLLM architectures, demonstrating consistent improvements and method generalizability.
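A minimal sketch of what a bin-based regression head could look like (hypothetical names and score range; the paper's exact head may differ):

```python
import torch
import torch.nn as nn

class BinRegressionHead(nn.Module):
    """Hypothetical sketch of regression via bin-based classification.

    Instead of forcing the model to emit score words from a preset
    vocabulary, the score range [lo, hi] is divided into n_bins bins and
    the model classifies over them; the scalar prediction is the
    probability-weighted mean of the bin centers. Increasing n_bins
    shrinks discretization error without distributional modeling.
    """
    def __init__(self, d_model, n_bins=100, lo=1.0, hi=10.0):
        super().__init__()
        self.classifier = nn.Linear(d_model, n_bins)
        centers = lo + (torch.arange(n_bins, dtype=torch.float32) + 0.5) \
            * (hi - lo) / n_bins
        self.register_buffer("centers", centers)   # (n_bins,) bin midpoints

    def forward(self, h):                 # h: (B, d_model) pooled MLLM features
        probs = self.classifier(h).softmax(dim=-1)  # (B, n_bins)
        return probs @ self.centers                  # (B,) scalar scores
```

Training would presumably use cross-entropy against the bin containing the ground-truth score; that detail is an assumption here, not taken from the abstract.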
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
U-Mamba2: Scaling State Space Models for Dental Anatomy Segmentation in CBCT
Tan, Zhi Qin, Zhu, Xiatian, Addison, Owen, Li, Yunpeng
Cone-Beam Computed Tomography (CBCT) is a widely used 3D imaging technique in dentistry, providing volumetric information about the anatomical structures of jaws and teeth. Accurate segmentation of these anatomies is critical for clinical applications such as diagnosis and surgical planning, but remains time-consuming and challenging. In this paper, we present U-Mamba2, a new neural network architecture designed for multi-anatomy CBCT segmentation in the context of the ToothFairy3 challenge. U-Mamba2 integrates Mamba2 state space models into the U-Net architecture, enforcing stronger structural constraints for higher efficiency without compromising performance. In addition, we integrate interactive click prompts with cross-attention blocks, pre-train U-Mamba2 using self-supervised learning, and incorporate dental domain knowledge into the model design to address key challenges of dental anatomy segmentation in CBCT. Extensive experiments, including independent tests, demonstrate that U-Mamba2 is both effective and efficient, securing first place in both tasks of the ToothFairy3 challenge. In Task 1, U-Mamba2 achieved a mean Dice of 0.84 and an HD95 of 38.17 on the held-out test data, with an average inference time of 40.58 s. In Task 2, U-Mamba2 achieved a mean Dice of 0.87 and an HD95 of 2.15 on the held-out test data. The code is publicly available at https://github.com/zhiqin1998/UMamba2.
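A minimal sketch of how a Mamba2 layer can sit inside a 3D U-Net stage, in the spirit of the original U-Mamba design (assumptions: the mamba-ssm v2 API and its default head settings; not the authors' exact block):

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba2  # pip install mamba-ssm (assumption: v2 API)

class UMamba2Block(nn.Module):
    """Hypothetical sketch of a U-Mamba-style block with a Mamba2 mixer.

    A 3D feature map (B, C, D, H, W) is flattened to a sequence (B, L, C),
    passed through a Mamba2 state space layer with a residual connection,
    and reshaped back, so the SSM captures long-range context that the
    3x3x3 convolutions of a plain U-Net stage cannot.
    """
    def __init__(self, channels):
        super().__init__()
        # with default expand=2 and headdim=64, channels should be a
        # multiple of 32 so the head count divides evenly (assumption)
        self.norm = nn.LayerNorm(channels)
        self.mamba = Mamba2(d_model=channels, d_state=64, d_conv=4, expand=2)

    def forward(self, x):                      # x: (B, C, D, H, W)
        B, C, *spatial = x.shape
        seq = x.flatten(2).transpose(1, 2)     # (B, L, C), L = D*H*W
        seq = seq + self.mamba(self.norm(seq)) # residual SSM mixing
        return seq.transpose(1, 2).reshape(B, C, *spatial)
```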
- North America > United States > Tennessee > Davidson County > Nashville (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
- (7 more...)
Sepsis Prediction Using Graph Convolutional Networks over Patient-Feature-Value Triplets
Dan, Bozhi, Wu, Di, Xu, Ji, Liu, Xiang, Zhu, Yiziting, Shu, Xin, Li, Yujie, Yi, Bin
In the intensive care setting, sepsis continues to be a major contributor to patient illness and death; however, its timely detection is hindered by the complex, sparse, and heterogeneous nature of electronic health record (EHR) data. We propose Triplet-GCN, a single-branch graph convolutional model that represents each encounter as patient-feature-value triplets, constructs a bipartite EHR graph, and learns patient embeddings via a Graph Convolutional Network (GCN) followed by a lightweight multilayer perceptron (MLP). The pipeline applies type-specific preprocessing -- median imputation and standardization for numeric variables, effect coding for binary features, and mode imputation with low-dimensional embeddings for rare categorical attributes -- and initializes patient nodes with summary statistics, while retaining measurement values on edges to preserve "who measured what and by how much". In a retrospective, multi-center Chinese cohort (N = 648; 70/30 train-test split) drawn from three tertiary hospitals, Triplet-GCN consistently outperforms strong tabular baselines (KNN, SVM, XGBoost, Random Forest) across discrimination and balanced error metrics, yielding a more favorable sensitivity-specificity trade-off and improved overall utility for early warning. These findings indicate that encoding EHR as triplets and propagating information over a patient-feature graph produce more informative patient representations than feature-independent models, offering a simple, end-to-end blueprint for deployable sepsis risk stratification.
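A minimal sketch of the triplet-to-graph idea using PyTorch Geometric (hypothetical class and argument names; the type-specific preprocessing and node initialization described above are omitted):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv  # pip install torch-geometric (assumption)

class TripletGCN(nn.Module):
    """Hypothetical sketch of the patient-feature-value idea.

    Patients and features form the two node sets of a bipartite graph; each
    (patient, feature, value) triplet becomes an edge whose weight is the
    preprocessed measurement value, preserving "who measured what and by
    how much". A GCN propagates information over the graph and a small MLP
    scores each patient node for sepsis risk.
    """
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x, edge_index, edge_weight, patient_mask):
        # x: (N_nodes, in_dim) initial embeddings (summary statistics for
        # patient nodes, learned embeddings for feature nodes);
        # edge_weight carries the measurement values on the edges.
        h = torch.relu(self.conv1(x, edge_index, edge_weight))
        h = torch.relu(self.conv2(h, edge_index, edge_weight))
        return self.mlp(h[patient_mask]).squeeze(-1)  # per-patient logits
```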
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)