AITopics | Tang, Hao

Collaborating Authors

Tang, Hao

Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

Yang, Jiancheng, Shi, Rui, Jin, Liang, Huang, Xiaoyang, Kuang, Kaiming, Wei, Donglai, Gu, Shixuan, Liu, Jianying, Liu, Pengfei, Chai, Zhizhong, Xiao, Yongjie, Chen, Hao, Xu, Liming, Du, Bang, Yan, Xiangyi, Tang, Hao, Alessio, Adam, Holste, Gregory, Zhang, Jiapeng, Wang, Xiaoming, He, Jianye, Che, Lixuan, Pfister, Hanspeter, Li, Ming, Ni, Bingbing

arXiv.org Artificial Intelligence, Feb-14-2024

Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. Although prior efforts have addressed this problem, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmark dataset of over 5,000 rib fractures from 660 CT scans, with voxel-level instance mask annotations and diagnosis labels for four clinical categories (buckle, nondisplaced, displaced, or segmental). The challenge includes two tracks: a detection (instance segmentation) track evaluated by an FROC-style metric and a classification track evaluated by an F1-style metric. During the MICCAI 2020 challenge period, 243 results were evaluated, and seven teams were invited to participate in the challenge summary. The analysis revealed that several top rib fracture detection solutions achieved performance comparable to, or even better than, that of human experts. Nevertheless, current rib fracture classification solutions are still far from clinically applicable, which makes classification a promising direction for future research. As an active benchmark and research resource, the data and online evaluation of the RibFrac Challenge are available at the challenge website. As an independent contribution, we have also extended our previous internal baseline by incorporating recent advancements in large-scale pretrained networks and point-based rib segmentation techniques. The resulting FracNet+ demonstrates competitive performance in rib fracture detection, laying a foundation for further research and development in AI-assisted rib fracture detection and diagnosis.
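The detection track is scored with an FROC-style metric. The sketch below is a minimal, non-official illustration of that kind of metric: it averages sensitivity at fixed false-positives-per-scan operating points. The operating points (0.5, 1, 2, 4, 8), the function name `froc_score`, and the assumption that each detection has already been matched to at most one ground-truth fracture are ours, not taken from the challenge's evaluation code.

```python
from typing import List, Sequence, Tuple


def froc_score(
    detections: List[Tuple[float, bool]],
    num_fractures: int,
    num_scans: int,
    fp_rates: Sequence[float] = (0.5, 1, 2, 4, 8),  # assumed operating points
) -> float:
    """Average sensitivity at fixed false-positives-per-scan levels.

    detections: (confidence, is_true_positive) pairs pooled over all scans;
    matching detections to ground-truth fractures is assumed to be done already.
    """
    # Sweep the confidence threshold from high to low, tracing the FROC curve.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    curve = [(0.0, 0.0)]  # (false positives per scan, sensitivity)
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        curve.append((fp / num_scans, tp / num_fractures))

    sensitivities = []
    for rate in fp_rates:
        # Best sensitivity reachable without exceeding this FP-per-scan budget.
        sensitivities.append(max(s for f, s in curve if f <= rate))
    return sum(sensitivities) / len(sensitivities)


# Toy example: 2 scans, 4 true fractures, 5 pooled detections.
example = [(0.9, True), (0.8, False), (0.7, True), (0.6, False), (0.5, False)]
score = froc_score(example, num_fractures=4, num_scans=2)  # 0.5
```

Averaging over several operating points rewards detectors that stay sensitive across a range of false-positive budgets rather than at a single threshold.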

artificial intelligence, deep learning, machine learning

arXiv: 2402.09372

Country:

Asia > China (0.94)
Europe (0.67)
North America > United States > Texas > Travis County > Austin (0.14)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Orthopedics/Orthopedic Surgery (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Enlighten-Your-Voice: When Multimodal Meets Zero-shot Low-light Image Enhancement

Zhang, Xiaofeng, Xu, Zishan, Tang, Hao, Gu, Chaochen, Chen, Wei, Zhu, Shanying, Guan, Xinping

arXiv.org Artificial Intelligence, Feb-1-2024

Low-light image enhancement is a crucial visual task, and many unsupervised methods overlook the degradation of visible information in low-light scenes, which adversely affects the fusion of complementary information and hinders the generation of satisfactory results. To address this, our study introduces "Enlighten-Your-Voice", a multimodal enhancement framework that lets users guide enhancement through voice and textual commands. We see this approach as not only a technical advance but also a shift in how users engage with enhancement models. Our model is equipped with a Dual Collaborative Attention Module (DCAM) that handles content and color discrepancies separately, enabling finer-grained enhancement. In addition, we introduce a plug-and-play Semantic Feature Fusion (SFM) module that combines semantic context with low-light enhancement operations to improve the algorithm's effectiveness. Notably, "Enlighten-Your-Voice" generalizes well in unsupervised zero-shot scenarios. The source code is available at https://github.com/zhangbaijin/Enlighten-Your-Voice

artificial intelligence, human computer interaction, meet zero-shot low-light image enhancement

arXiv: 2312.10109

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (0.60)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.60)

Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces

Liu, Oli, Tang, Hao, Goldwater, Sharon

arXiv.org Artificial Intelligence, Dec-11-2023

Self-supervised speech representations are known to encode both speaker and phonetic information, but how they are distributed in the high-dimensional space remains largely unexplored. We hypothesize that they are encoded in orthogonal subspaces, a property that lends itself to simple disentanglement. Applying principal component analysis to representations of two predictive coding models, we identify two subspaces that capture speaker and phonetic variances, and confirm that they are nearly orthogonal.

In this work, we explicitly investigate how speaker and phonetic information are distributed in the representation space learned by SSL models. We hypothesize that a good representation (one that is efficient and works well for predicting speech) should implicitly disentangle these two sources of information, since they vary independently in the processes that generate the speech signal. If so, then the two types of information would be …
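The PCA analysis can be illustrated on toy data. This sketch averages representations per speaker and per phone, takes the top principal direction of each set of means (via power iteration, standing in for a full PCA), and measures how orthogonal the two directions are. The 4-dimensional synthetic vectors, the single principal component per factor, and every function name are assumptions for illustration; the paper's subspaces can be multi-dimensional and come from real predictive coding models.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def group_means(reps: Sequence[Vector], labels: Sequence[str]) -> List[Vector]:
    """Average the representations that share a label (speaker or phone)."""
    groups: Dict[str, List[Vector]] = defaultdict(list)
    for rep, label in zip(reps, labels):
        groups[label].append(rep)
    return [mean(vs) for vs in groups.values()]


def top_principal_direction(points: Sequence[Vector], iters: int = 100) -> Vector:
    """First principal component, found by power iteration on the covariance."""
    mu = mean(points)
    centered = [[x - m for x, m in zip(p, mu)] for p in points]
    dim = len(mu)
    cov = [[sum(p[i] * p[j] for p in centered) / len(points) for j in range(dim)]
           for i in range(dim)]
    rng = random.Random(0)  # fixed seed so the sketch is deterministic
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def abs_cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return abs(dot) / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical toy data: each frame = speaker vector + phone vector, with the
# two factors placed in disjoint dimensions by construction.
speakers = {"s1": [1.0, 0.0, 0.0, 0.0], "s2": [-1.0, 0.0, 0.0, 0.0], "s3": [0.0, 0.5, 0.0, 0.0]}
phones = {"p1": [0.0, 0.0, 2.0, 0.0], "p2": [0.0, 0.0, -2.0, 0.0], "p3": [0.0, 0.0, 0.0, 1.0]}
reps, spk_labels, phn_labels = [], [], []
for s, sv in speakers.items():
    for p, pv in phones.items():
        reps.append([a + b for a, b in zip(sv, pv)])
        spk_labels.append(s)
        phn_labels.append(p)

speaker_dir = top_principal_direction(group_means(reps, spk_labels))
phone_dir = top_principal_direction(group_means(reps, phn_labels))
overlap = abs_cosine(speaker_dir, phone_dir)  # close to 0 means nearly orthogonal
```

Averaging per speaker cancels phone variation (and vice versa), which is why the variance left in each set of means points along one factor only.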

artificial intelligence, information, machine learning

arXiv: 2305.12464

Country: Oceania > Australia > Queensland (0.14)

Genre: Research Report (0.64)

Industry: Law > Litigation (0.42)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Bipartite Graph Diffusion Model for Human Interaction Generation

Chopin, Baptiste, Tang, Hao, Daoudi, Mohamed

arXiv.org Artificial Intelligence, Nov-3-2023

The generation of natural human motion interactions is an active research topic in computer vision and computer animation. It is a challenging task due to the diversity of possible human motion interactions. Diffusion models, which have already shown remarkable generative capabilities in other domains, are a good candidate for this task. In this paper, we introduce a novel bipartite graph diffusion method (BiGraphDiff) to generate human motion interactions between two persons. Specifically, bipartite node sets are constructed to model the inherent geometric constraints between skeleton nodes during interactions. The interaction graph diffusion model is transformer-based and incorporates several state-of-the-art motion methods. We show that the proposed method achieves new state-of-the-art results on leading benchmarks for the human interaction generation task.
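One way to read "bipartite node sets" is sketched below: the joints of two skeletons form two node sets, and edges run only between the sets, so each person's joints can draw information from the other person's. The complete bipartite connectivity, the mean-aggregation step, and the function names are hypothetical; the abstract does not say how BiGraphDiff wires or weights these edges, and its transformer-based diffusion model is not reproduced here.

```python
from typing import List


def bipartite_adjacency(num_joints: int) -> List[List[int]]:
    """Adjacency over 2*num_joints nodes: 0..J-1 are person A, J..2J-1 are person B.

    Every A joint is linked to every B joint (a complete bipartite graph,
    assumed here); there are no edges inside either node set.
    """
    n = 2 * num_joints
    adj = [[0] * n for _ in range(n)]
    for a in range(num_joints):
        for b in range(num_joints, n):
            adj[a][b] = adj[b][a] = 1
    return adj


def cross_person_mean(features: List[List[float]], adj: List[List[int]]) -> List[List[float]]:
    """One message-passing step: each joint averages its bipartite neighbours' features."""
    out = []
    for i, row in enumerate(adj):
        nbrs = [j for j, edge in enumerate(row) if edge]
        dim = len(features[i])
        out.append([sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)])
    return out


# Two toy 3-joint skeletons with a single scalar feature per joint.
adj = bipartite_adjacency(3)
feats = [[1.0], [2.0], [3.0], [10.0], [20.0], [30.0]]
mixed = cross_person_mean(feats, adj)
```

Because intra-person edges are absent, every aggregated feature comes entirely from the other person, which is the property that makes the bipartite structure useful for modeling interaction constraints.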

artificial intelligence, deep learning, machine learning

arXiv: 2301.10134

Country:

Europe > France (0.28)
North America > United States > Louisiana (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

MelHuBERT: A simplified HuBERT on Mel spectrograms

Lin, Tzu-Quan, Lee, Hung-yi, Tang, Hao

arXiv.org Artificial Intelligence, Oct-27-2023

Self-supervised models have had great success in learning speech representations that generalize to various downstream tasks. However, most self-supervised models require a large amount of compute and multiple GPUs to train, significantly hampering the development of self-supervised learning. To reduce the computational cost of training, we revisit the training of HuBERT, a highly successful self-supervised model. We improve and simplify several key components, including the loss function, the input representation, and multi-stage training. Our model, MelHuBERT, achieves favorable performance against HuBERT on phone recognition, speaker identification, and automatic speech recognition, while saving 31.2% of the pre-training time, or equivalently 33.5% of the MACs per second of speech. The code and pre-trained models are available at https://github.com/nervjack2/MelHuBERT.
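HuBERT-style pre-training predicts discrete cluster IDs at masked frames. A minimal sketch of that masked-prediction objective, a cross-entropy averaged over masked frames only, follows. The abstract does not spell out MelHuBERT's simplified loss, so this shows the general objective under that assumption; the function name, the origin of the cluster targets, and the toy inputs are hypothetical.

```python
import math
from typing import Sequence


def masked_prediction_loss(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    mask: Sequence[bool],
) -> float:
    """Mean cross-entropy over masked frames.

    logits[t][k]: score of cluster k at frame t; targets[t]: cluster ID for
    frame t (assumed to come from offline clustering); mask[t]: True if masked.
    """
    losses = []
    for frame_logits, target, is_masked in zip(logits, targets, mask):
        if not is_masked:
            continue  # unmasked frames contribute nothing in this sketch
        m = max(frame_logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in frame_logits))
        losses.append(log_z - frame_logits[target])
    if not losses:
        raise ValueError("no masked frames to score")
    return sum(losses) / len(losses)


# Two frames, two clusters; only the first (uniform) frame is masked.
loss = masked_prediction_loss([[0.0, 0.0], [5.0, 0.0]], targets=[0, 1], mask=[True, False])
```

Restricting the loss to masked frames forces the model to infer the hidden content from context rather than copy it from its input.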

artificial intelligence, hubert, machine learning

arXiv: 2211.09944

Country: Europe > United Kingdom (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Towards Matching Phones and Speech Representations

Yang, Gene-Ping, Tang, Hao

arXiv.org Artificial Intelligence, Oct-26-2023

Learning phone types from phone instances is a long-standing problem that remains open. In this work, we revisit this problem in the context of self-supervised learning and pose it as the problem of matching cluster centroids to phone embeddings. We study two key properties that enable matching: whether cluster centroids of self-supervised representations reduce the variability of phone instances, and whether they respect the relationship among phones. We then use the matching result to produce pseudo-labels and introduce a new loss function for improving self-supervised representations. Our experiments show that the matching result captures the relationship among phones. Training with the new loss jointly with regular self-supervised losses, such as APC and CPC, significantly improves downstream phone classification.
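The matching step can be sketched as follows: map each cluster centroid to its most similar phone embedding by cosine similarity, then relabel every frame with the phone its cluster maps to, yielding pseudo-labels. Nearest-neighbour matching (rather than, say, a one-to-one assignment), the phone names, and all function names are assumptions; the paper's exact procedure may differ.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def match_centroids(centroids: Sequence[Sequence[float]],
                    phone_embeddings: Dict[str, Sequence[float]]) -> List[str]:
    """Map each cluster centroid to the most similar phone embedding."""
    return [max(phone_embeddings, key=lambda p: cosine(c, phone_embeddings[p]))
            for c in centroids]


def pseudo_labels(frame_clusters: Sequence[int], matching: Sequence[str]) -> List[str]:
    """Turn per-frame cluster IDs into phone pseudo-labels via the matching."""
    return [matching[k] for k in frame_clusters]


# Hypothetical 2-D centroids and phone embeddings.
centroids = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.0]]
phone_embeddings = {"aa": [1.0, 0.0], "iy": [0.0, 1.0]}
matching = match_centroids(centroids, phone_embeddings)
labels = pseudo_labels([0, 1, 2, 1], matching)
```

Several clusters can map to the same phone here, which is one reason a one-to-one assignment might be preferred when the numbers of clusters and phones match.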

artificial intelligence, machine learning, representation

arXiv: 2310.17558

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Blind quantum machine learning with quantum bipartite correlator

Li, Changhao, Li, Boning, Amer, Omar, Shaydulin, Ruslan, Chakrabarti, Shouvanik, Wang, Guoqing, Xu, Haowei, Tang, Hao, Schoch, Isidor, Kumar, Niraj, Lim, Charles, Li, Ju, Cappellaro, Paola, Pistoia, Marco

arXiv.org Artificial Intelligence, Oct-19-2023

Distributed quantum computing is a promising computational paradigm for performing computations that are beyond the reach of individual quantum devices. Privacy in distributed quantum computing is critical for maintaining confidentiality and protecting the data in the presence of untrusted computing nodes. In this work, we introduce novel blind quantum machine learning protocols based on the quantum bipartite correlator algorithm. Our protocols have reduced communication overhead while preserving the privacy of data from untrusted parties. We introduce robust algorithm-specific privacy-preserving mechanisms with low computational overhead that do not require complex cryptographic techniques. We then validate the effectiveness of the proposed protocols through complexity and privacy analysis. Our findings pave the way for advancements in distributed quantum computing, opening up new possibilities for privacy-aware machine learning applications in the era of quantum technologies.

artificial intelligence, machine learning, server

arXiv: 2310.12893

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.70)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Does Graph Distillation See Like Vision Dataset Counterpart?

Yang, Beining, Wang, Kai, Sun, Qingyun, Ji, Cheng, Fu, Xingcheng, Tang, Hao, You, Yang, Li, Jianxin

arXiv.org Artificial Intelligence, Oct-13-2023

Training on large-scale graphs has achieved remarkable results in graph representation learning, but its computational and storage costs have raised increasing concerns. Existing graph condensation methods primarily focus on optimizing the feature matrices of condensed graphs while overlooking the impact of the structure information from the original graphs. To investigate the impact of the structure information, we conduct an analysis in the spectral domain and empirically identify substantial Laplacian Energy Distribution (LED) shifts in previous works. Such shifts lead to poor performance in cross-architecture generalization and on specific tasks, including anomaly detection and link prediction. In this paper, we propose a novel Structure-broadcasting Graph Dataset Distillation (SGDD) scheme that broadcasts the original structure information to the generation of the synthetic graph, explicitly preventing the original structure information from being overlooked. Theoretically, the synthetic graphs produced by SGDD are expected to have smaller LED shifts than those of previous works, leading to superior performance in both cross-architecture settings and specific tasks. We validate the proposed SGDD across 9 datasets and achieve state-of-the-art results on all of them: for example, on the YelpChi dataset, our approach retains 98.6% of the test accuracy obtained by training on the original graph dataset while reducing the graph scale by a factor of 1,000. Moreover, we empirically observe reductions in LED shift of 17.6% to 31.4% across the 9 datasets. Extensive experiments and analysis verify the effectiveness and necessity of the proposed designs. The code is available in the GitHub repository: https://github.com/RingBDStack/SGDD.

artificial intelligence, data mining, machine learning

arXiv: 2310.09192

Country:

Europe (1.00)
North America > United States > New York (0.14)
North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry: Information Technology (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

3D-Aware Video Generation

Bahmani, Sherwin, Park, Jeong Joon, Paschalidou, Despoina, Tang, Hao, Wetzstein, Gordon, Guibas, Leonidas, Van Gool, Luc, Timofte, Radu

arXiv.org Artificial Intelligence, Aug-9-2023

Generative models have emerged as an essential building block for many image synthesis and editing tasks. Recent advances in this field have also enabled the generation of high-quality 3D or video content that exhibits either multi-view or temporal consistency. In this work, we explore 4D generative adversarial networks (GANs) that learn unconditional generation of 3D-aware videos. By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D video supervised only with monocular videos. We show that our method learns a rich embedding of decomposable 3D structures and motions that enables new visual effects of spatio-temporal renderings while producing imagery with quality comparable to that of existing 3D or video GANs.
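Neural implicit representations commonly pass input coordinates through a Fourier-feature (positional) encoding before the MLP. A minimal sketch for a space-time query (x, y, z, t) is below. Treating time as a fourth encoded coordinate and the `num_freqs` parameter are assumptions, not details confirmed by the abstract.

```python
import math
from typing import List, Sequence


def positional_encoding(coords: Sequence[float], num_freqs: int) -> List[float]:
    """NeRF-style encoding: [sin(2^k*pi*p), cos(2^k*pi*p)] for k < num_freqs, per coordinate p."""
    features = []
    for p in coords:
        for k in range(num_freqs):
            angle = (2.0 ** k) * math.pi * p
            features.extend([math.sin(angle), math.cos(angle)])
    return features


# A space-time query point: x, y, z plus a normalized time t in [0, 1].
query = [0.1, -0.4, 0.25, 0.5]
encoded = positional_encoding(query, num_freqs=6)  # 4 coords * 6 freqs * 2 = 48 values
```

The high-frequency terms let a small MLP represent fine spatial and temporal detail that it would struggle to fit from raw coordinates.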

artificial intelligence, computer vision, machine learning

arXiv: 2206.14797

Country:

Europe (0.46)
Asia > Japan > Honshū > Chūbu (0.14)

Genre: Research Report (1.00)

Industry:

Media (0.68)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Enlighten Anything: When Segment Anything Model Meets Low-Light Image Enhancement

Zhao, Qihan, Zhang, Xiaofeng, Tang, Hao, Gu, Chaochen, Zhu, Shanying

arXiv.org Artificial Intelligence, Jul-31-2023

Image restoration is a low-level visual task, and most CNN methods are designed as black boxes, lacking transparency and intrinsic aesthetics. Many unsupervised approaches ignore the degradation of visible information in low-light scenes, which seriously affects the aggregation of complementary information and prevents fusion algorithms from producing satisfactory results under extreme conditions. In this paper, we propose Enlighten Anything, which enhances low-light images and fuses them with the semantic intent of SAM segmentations to obtain fused images with good visual perception. The generalization ability of unsupervised learning is greatly improved, and experiments on the LOL dataset show that our method improves over the baseline by 3 dB in PSNR and by 8 in SSIM. Zero-shot learning with SAM provides a powerful aid for unsupervised low-light enhancement. The source code of Enlighten Anything can be obtained from https://github.com/zhangbaijin/enlighten-anything
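The reported gain is measured in PSNR, defined as 10·log10(MAX²/MSE). A minimal sketch for images flattened to lists of pixel values, assuming 8-bit data (MAX = 255), follows; the toy values are illustrative only.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Halving the MSE (4 -> 2) on a toy two-pixel image.
gain = psnr([0.0, 0.0], [2.0, 0.0]) - psnr([0.0, 0.0], [2.0, 2.0])
```

Because PSNR is logarithmic in MSE, halving the MSE raises PSNR by 10·log10(2) ≈ 3.01 dB, so a 3 dB gain corresponds to roughly halving the mean squared error.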

artificial intelligence, information management, model meet low-light image enhancement

arXiv: 2306.10286

Genre: Research Report (0.66)

Technology: Information Technology > Artificial Intelligence > Vision (0.40)
