AITopics

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

Neural Information Processing Systems, Mar-18-2026, 00:16:22 GMT

IMAGPose: A Unified Conditional Framework for Pose-Guided Person Generation

Diffusion models represent a promising avenue for image generation, having demonstrated competitive performance in pose-guided person image generation. However, existing methods are limited to generating target images from a source image and a target pose, overlooking two critical user scenarios: generating multiple target images with different poses simultaneously and generating target images from multi-view source images. To overcome these limitations, we propose IMAGPose, a unified conditional framework for pose-guided image generation, which incorporates three pivotal modules: a feature-level conditioning (FLC) module, an image-level conditioning (ILC) module, and a cross-view attention (CVA) module. First, the FLC module combines the low-level texture feature from the VAE encoder with the high-level semantic feature from the image encoder, addressing the issue of missing detail information due to the absence of a dedicated person image feature extractor. Then, the ILC module aligns images and poses to adapt to flexible and diverse user scenarios by injecting a variable number of source image conditions and introducing a masking strategy. Finally, the CVA module decomposes cross-attention into global and local components, ensuring local fidelity and global consistency of the person image when multiple source images are given as prompts. The three modules of IMAGPose work together to unify the task of person image generation under various user scenarios. Extensive experimental results demonstrate the consistency and photorealism of our proposed IMAGPose under challenging user scenarios.

artificial intelligence, module, proceedings, (11 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence (1.00)

Neural Information Processing Systems, Feb-7-2026, 18:23:36 GMT

0bd32794b26cfc99214b89313764da8e-Paper-Conference.pdf

artificial intelligence, machine learning, source image, (18 more...)

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (1.00)

Industry: Media (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Neural Information Processing Systems, Oct-9-2025, 18:19:55 GMT

0bd32794b26cfc99214b89313764da8e-Paper-Conference.pdf

imagpose, source image, target image, (16 more...)

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (1.00)

Industry: Media (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Neural Information Processing Systems, May-26-2025, 15:57:20 GMT

IMAGPose: A Unified Conditional Framework for Pose-Guided Person Generation

Diffusion models represent a promising avenue for image generation, having demonstrated competitive performance in pose-guided person image generation. However, existing methods are limited to generating target images from a source image and a target pose, overlooking two critical user scenarios: generating multiple target images with different poses simultaneously and generating target images from multi-view source images. To overcome these limitations, we propose IMAGPose, a unified conditional framework for pose-guided image generation, which incorporates three pivotal modules: a feature-level conditioning (FLC) module, an image-level conditioning (ILC) module, and a cross-view attention (CVA) module. First, the FLC module combines the low-level texture feature from the VAE encoder with the high-level semantic feature from the image encoder, addressing the issue of missing detail information due to the absence of a dedicated person image feature extractor. Then, the ILC module aligns images and poses to adapt to flexible and diverse user scenarios by injecting a variable number of source image conditions and introducing a masking strategy. Finally, the CVA module decomposes cross-attention into global and local components, ensuring local fidelity and global consistency of the person image when multiple source images are given as prompts. The three modules of IMAGPose work together to unify the task of person image generation under various user scenarios. Extensive experimental results demonstrate the consistency and photorealism of our proposed IMAGPose under challenging user scenarios.

artificial intelligence, imagpose, module, (10 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence (1.00)

arXiv.org Artificial Intelligence, Oct-28-2024

Shallow Diffuse: Robust and Invisible Watermarking through Low-Dimensional Subspaces in Diffusion Models

Li, Wenda, Zhang, Huijie, Qu, Qing

Watermarking is a crucial technique for identifying AI-generated images and preventing their misuse. In this paper, we introduce Shallow Diffuse, a new watermarking technique that embeds robust and invisible watermarks into diffusion model outputs. Unlike existing approaches that integrate watermarking throughout the entire diffusion sampling process, Shallow Diffuse decouples the two by leveraging the presence of a low-dimensional subspace in the image generation process. This method ensures that a substantial portion of the watermark lies in the null space of this subspace, effectively separating it from the image generation process. Our theoretical and empirical analyses show that this decoupling strategy greatly enhances the consistency of data generation and the detectability of the watermark. Extensive experiments further validate that Shallow Diffuse outperforms existing watermarking methods in terms of robustness and consistency.
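The null-space idea in this abstract can be illustrated with a toy linear example. The sketch below is not the authors' algorithm: it assumes the low-dimensional structure is summarized by a small low-rank matrix `J` (a hypothetical stand-in), and all names and numbers are illustrative. It removes from a watermark `w` every component that `J` responds to, so that adding the remainder to an input leaves the output of `J` unchanged.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def row_space_basis(J: Matrix, tol: float = 1e-12) -> Matrix:
    """Orthonormal basis of the row space of J, built with Gram-Schmidt."""
    basis: Matrix = []
    for row in J:
        v = list(row)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > tol:  # skip rows that are linearly dependent on earlier ones
            basis.append([vi / norm for vi in v])
    return basis


def null_space_component(J: Matrix, w: Vector) -> Vector:
    """Part of w that J maps to zero (w minus its row-space projection)."""
    out = list(w)
    for b in row_space_basis(J):
        c = dot(out, b)
        out = [oi - c * bi for oi, bi in zip(out, b)]
    return out


def apply(J: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in J]


# Hypothetical rank-2 map on a 4-dimensional input (assumption for illustration).
J = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
x = [0.5, -1.0, 2.0, 0.3]   # toy "image"
w = [1.0, 2.0, 3.0, 4.0]    # toy watermark pattern

w_null = null_space_component(J, w)
x_marked = [xi + wi for xi, wi in zip(x, w_null)]
```

In this toy setting `apply(J, x_marked)` matches `apply(J, x)` up to floating-point error, while `x_marked` itself differs from `x`, which is the sense in which the watermark is separated from the map.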

artificial intelligence, machine learning, watermark, (18 more...)

2410.21088

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Michigan (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

Kumar, Ashok S., Nayak, Nancy, Kalyani, Sheetal, Suraweera, Himal A.

Energy Efficient Fair STAR-RIS for Mobile Users

arXiv.org Artificial Intelligence, Jul-9-2024

In this work, we propose a method to improve the energy efficiency and fairness of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) for mobile users, ensuring reduced power consumption while maintaining reliable communication. To achieve this, we introduce a new parameter known as the subsurface assignment variable, which determines the number of STAR-RIS elements allocated to each user. We then formulate a novel optimization problem by concurrently optimizing the phase shifts of the STAR-RIS and subsurface assignment variable. We leverage the deep reinforcement learning (DRL) technique to address this optimization problem. The DRL model predicts the phase shifts of the STAR-RIS and efficiently allocates elements of STAR-RIS to the users. Additionally, we incorporate a penalty term in the DRL model to facilitate intelligent deactivation of STAR-RIS elements when not in use to enhance energy efficiency. Through extensive experiments, we show that the proposed method can achieve fairly high and nearly equal data rates for all users in both the transmission and reflection spaces in an energy-efficient manner.
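The subsurface assignment variable and the penalty term can be pictured with a toy reward function. This is only an illustrative sketch: the reward shape (minimum user rate minus a per-element penalty), the names `shaped_reward` and `penalty_per_element`, and the numbers are assumptions, not the formulation used in the paper. Jain's index is included as one common fairness measure; the abstract does not say which measure the authors use.

```python
from typing import List


def active_elements(assignment: List[int], total_elements: int) -> int:
    """assignment[k] is the number of elements given to user k; the rest stay off."""
    if any(n < 0 for n in assignment) or sum(assignment) > total_elements:
        raise ValueError("assignment must be non-negative and fit the surface")
    return sum(assignment)


def jain_fairness(rates: List[float]) -> float:
    """Jain's fairness index: 1.0 when all rates are equal, 1/n in the worst case."""
    total = sum(rates)
    squares = sum(r * r for r in rates)
    return (total * total) / (len(rates) * squares) if squares > 0 else 0.0


def shaped_reward(rates: List[float], assignment: List[int],
                  total_elements: int, penalty_per_element: float = 0.01) -> float:
    """Hypothetical reward: favor the worst-off user, charge for every active element."""
    used = active_elements(assignment, total_elements)
    return min(rates) - penalty_per_element * used
```

Under this assumed reward, two assignments that achieve the same rates are ranked by how many elements they keep switched on, which is one simple way a penalty can push an agent toward deactivating unused elements.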

energy efficiency, star-ris, subsurface assignment variable, (12 more...)

2407.06868

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Sri Lanka (0.04)
Asia > India > Tamil Nadu > Chennai (0.04)

Genre: Research Report (1.00)

Industry:

Telecommunications (0.85)
Information Technology > Networks (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.55)

arXiv.org Artificial Intelligence, Jun-17-2024

MDCR: A Dataset for Multi-Document Conditional Reasoning

Chen, Peter Baile, Zhang, Yi, Liu, Chunwei, Gupta, Sejal, Kim, Yoon, Cafarella, Michael

The same real-life questions posed to different individuals may lead to different answers based on their unique situations. For instance, whether a student is eligible for a scholarship depends on eligibility conditions, such as the major or degree required. ConditionalQA was proposed to evaluate models' capability of reading a document and answering eligibility questions, considering unmentioned conditions. However, it is limited to questions on single documents, neglecting harder cases that may require cross-document reasoning and optimization, for example, "What is the maximum number of scholarships attainable?" Such questions over multiple documents are more challenging not only because there is more context to understand, but also because the model has to (1) explore all possible combinations of unmentioned conditions and (2) understand the relationship between conditions across documents, to reason about the optimal outcome. To evaluate models' capability of answering such questions, we propose a new dataset, MDCR, which can reflect real-world challenges and serve as a new test bed for complex conditional reasoning that requires optimization. We evaluate this dataset using the most recent LLMs and demonstrate their limitations in solving this task. We believe this dataset will facilitate future research in answering optimization questions with unknown conditions.
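The two requirements named in this abstract, exploring combinations of unmentioned conditions and respecting relationships between conditions across documents, can be sketched as a brute-force search. The scholarships, condition names, and the mutual-exclusion rule below are invented for illustration and are not taken from the MDCR dataset.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical unmentioned conditions a user might or might not satisfy.
CONDITIONS: List[str] = ["major_cs", "major_bio", "gpa_above_3_5"]

# Hypothetical eligibility rules, one per scholarship document.
REQUIREMENTS: Dict[str, FrozenSet[str]] = {
    "scholarship_a": frozenset({"major_cs"}),
    "scholarship_b": frozenset({"major_bio"}),
    "scholarship_c": frozenset({"gpa_above_3_5"}),
    "scholarship_d": frozenset({"major_cs", "gpa_above_3_5"}),
}

# Cross-document relationship: a student cannot hold both majors (assumption).
MUTUALLY_EXCLUSIVE: List[Tuple[str, str]] = [("major_cs", "major_bio")]


def is_consistent(held: FrozenSet[str]) -> bool:
    return not any(a in held and b in held for a, b in MUTUALLY_EXCLUSIVE)


def best_outcome() -> Tuple[FrozenSet[str], List[str]]:
    """Try every truth assignment and keep the one granting the most scholarships."""
    best: Tuple[FrozenSet[str], List[str]] = (frozenset(), [])
    for flags in product([False, True], repeat=len(CONDITIONS)):
        held = frozenset(c for c, on in zip(CONDITIONS, flags) if on)
        if not is_consistent(held):
            continue
        granted = sorted(n for n, req in REQUIREMENTS.items() if req <= held)
        if len(granted) > len(best[1]):
            best = (held, granted)
    return best


held, granted = best_outcome()
```

The search space doubles with every extra condition, which gives a sense of why such questions become harder as more documents and conditions are involved.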

condition relationship, conditional answer, user scenario, (14 more...)

2406.11784

Country:

Oceania > Palau (0.04)
Oceania > Micronesia (0.04)
Oceania > Marshall Islands (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Le, Huy Q., Nguyen, Minh N. H., Pandey, Shashi Raj, Zhang, Chaoning, Hong, Choong Seon

CDKT-FL: Cross-Device Knowledge Transfer using Proxy Dataset in Federated Learning

arXiv.org Artificial Intelligence, Jun-8-2024

In a practical setting, how to enable robust Federated Learning (FL) systems, both in terms of generalization and personalization abilities, is an important research question. It is a challenging issue due to the consequences of the non-i.i.d. properties of clients' data, often referred to as statistical heterogeneity, and small local data samples from the various data distributions. Therefore, to develop robust generalized global and personalized models, conventional FL methods need to redesign the knowledge aggregation from biased local models while considering the huge divergence of learning parameters due to skewed client data. In this work, we demonstrate that the knowledge transfer mechanism achieves these objectives and develop a novel knowledge distillation-based approach to study the extent of knowledge transfer between the global model and local models. Specifically, our method considers the suitability of transferring the outcome distribution and (or) the embedding vector of representation from trained models during cross-device knowledge transfer using a small proxy dataset in heterogeneous FL. In doing so, we alternately perform cross-device knowledge transfer following two general formulations: 1) global knowledge transfer and 2) on-device knowledge transfer. Through simulations on three federated datasets, we show the proposed method achieves significant speedups and high personalized performance of local models. Furthermore, the proposed approach offers a more stable algorithm than other baselines during training, with minimal communication data load when exchanging the trained model's outcomes and representation.
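Transferring the "outcome distribution" on a proxy dataset can be illustrated with a generic knowledge distillation loss. The sketch below is a standard temperature-softened KL term added to a task loss; the weighting `beta`, the temperature, and the function names are assumptions for illustration, not the exact objective from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_term(teacher_logits: Sequence[float],
                      student_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """How far the student's softened outputs are from the teacher's on one proxy sample."""
    return kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))


def combined_loss(task_loss: float,
                  proxy_pairs: List[Tuple[Sequence[float], Sequence[float]]],
                  beta: float = 0.5) -> float:
    """Task loss on private data plus the average distillation term over proxy samples."""
    kd = sum(distillation_term(t, s) for t, s in proxy_pairs) / len(proxy_pairs)
    return task_loss + beta * kd
```

Only model outputs on the shared proxy samples are needed to evaluate the distillation term, which is consistent with the abstract's point about exchanging outcomes and representations rather than full parameter sets. In the on-device direction the local model would play the student, and in the global direction the roles would be swapped.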

cdkt, dataset, global round, (15 more...)

2204.01542

Country:

Asia > Vietnam > Da Nang > Da Nang (0.14)
Europe > Denmark > North Jutland > Aalborg (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Education (0.93)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

arXiv.org Artificial Intelligence, Dec-19-2022

Bridging The Gap: Entailment Fused-T5 for Open-retrieval Conversational Machine Reading Comprehension

Zhang, Xiao, Huang, Heyan, Chi, Zewen, Mao, Xian-Ling

Open-retrieval conversational machine reading comprehension (OCMRC) simulates real-life conversational interaction scenes. Machines are required to make a decision of Yes/No/Inquire, or to generate a follow-up question when the decision is Inquire, based on retrieved rule texts, user scenario, user question, and dialogue history. Recent studies explored methods to reduce the information gap between decision-making and question generation and thus improve the performance of generation. However, the information gap still exists because these pipeline structures are still split into three stages: decision-making, span extraction, and question rephrasing. Decision-making and generation are reasoned separately, and the entailment reasoning utilized in decision-making is hard to share through all stages. To tackle the above problem, we proposed a novel one-stage end-to-end framework, called Entailment Fused-T5 (EFT), to bridge the information gap between decision-making and generation in a global understanding manner. The extensive experimental results demonstrate that our proposed framework achieves new state-of-the-art performance on the OR-ShARC benchmark.

[Figure 1: An example in the OCMRC dataset.]

machine learning, natural language, question answering, (20 more...)

2212.09353

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Beijing > Beijing (0.05)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Education > Assessment & Standards > Student Performance (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.56)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.49)