Gao, Jian
A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts
Du, Wenzhuo, Wang, Gerun, Chen, Guancheng, Zhao, Hang, Li, Xin, Gao, Jian
With the exponential growth of user-generated content on video-sharing platforms, efficiently searching and browsing videos has garnered significant attention. To help users swiftly locate and review relevant videos, concise and informative video summaries have become increasingly important. Video-LLaMA is an effective tool for generating video summaries, but it cannot jointly model and optimize temporal and spatial features, and it requires substantial computational resources and training time. We therefore propose MiLoRA-ViSum, which captures the complex temporal dynamics and spatial relationships inherent in video data more efficiently while keeping the number of trainable parameters under control. By extending traditional Low-Rank Adaptation (LoRA) into a mixture-of-experts paradigm, MiLoRA-ViSum incorporates a dual temporal-spatial adaptation mechanism tailored specifically to video summarization. The approach dynamically integrates specialized LoRA experts, each fine-tuned to address distinct temporal or spatial dimensions. Extensive evaluations on the VideoXum and ActivityNet datasets demonstrate that MiLoRA-ViSum achieves the best summarization performance among state-of-the-art models while maintaining significantly lower computational costs. The mixture-of-experts strategy, combined with the dual adaptation mechanism, highlights the model's potential for large-scale applications that require both efficiency and precision.
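To make the mixture-of-LoRA-experts idea concrete, below is a minimal PyTorch sketch of such a layer. The class name, the softmax gating, and the hyperparameters (n_experts, rank, alpha) are illustrative assumptions rather than the paper's actual implementation; in MiLoRA-ViSum, experts would further specialize in temporal versus spatial dimensions depending on where the layer is inserted.

import torch
import torch.nn as nn

class MiLoRALinear(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, d_in, d_out, n_experts=4, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)   # frozen pretrained weight
        self.base.bias.requires_grad_(False)
        # One low-rank (A, B) pair per expert; only these and the gate train.
        self.A = nn.Parameter(torch.randn(n_experts, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, rank, d_out))  # zero init: delta starts at 0
        self.gate = nn.Linear(d_in, n_experts)    # routes each token to experts
        self.scale = alpha / rank

    def forward(self, x):                         # x: (batch, seq, d_in)
        w = torch.softmax(self.gate(x), dim=-1)   # (batch, seq, n_experts)
        # Each expert's low-rank update: x @ A_e @ B_e
        delta = torch.einsum("bsi,eir,ero->bseo", x, self.A, self.B)
        mix = torch.einsum("bse,bseo->bso", w, delta)
        return self.base(x) + self.scale * mix

Keeping the base weight frozen and training only the expert matrices and the gate is what keeps the trainable parameter count low, which is the point of the LoRA-based design.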
AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies
Gao, Jian, Cao, Weidong, Yang, Junyi, Zhang, Xuan
The massive, large-scale design of foundational semiconductor integrated circuits (ICs) is crucial to sustaining the advancement of many emerging and future technologies, such as generative AI, 5G/6G, and quantum computing. Excitingly, recent studies have shown the great capabilities of foundation models in expediting the design of digital ICs. Yet applying generative AI techniques to accelerate the design of analog ICs remains a significant challenge due to critical domain-specific issues, such as the lack of a comprehensive dataset and of effective representation methods for analog circuits. This paper proposes AnalogGenie, a generative engine for the automatic design and discovery of analog circuit topologies, the most challenging and creative task in the conventional manual design flow of analog ICs. Experimental results show the remarkable generation performance of AnalogGenie in broadening the variety of analog ICs, increasing the number of devices within a single design, and discovering unseen circuit topologies far beyond prior art. Our work paves the way to transform the long-standing, time-consuming manual design flow of analog ICs into an automatic, large-scale process powered by generative AI. Semiconductor ICs are the foundational hardware cornerstone of many emerging technologies such as generative AI, 5G/6G, and quantum computing. The demand for and scale of ICs are soaring to unprecedented levels with ever-increasing information and computing workloads (e.g., training foundation models with billions of parameters) (Achiam et al., 2023). Accelerating the design of advanced ICs is thus key to sustaining the development of future technologies. Excitingly, recent breakthroughs in generative AI present transformative opportunities to expedite conventional IC design flows. As an example, NVIDIA's ChipNeMo (Liu et al., 2023a), a powerful domain-adapted LLM, can rapidly generate valuable digital designs from just a few prompts.
Enhancing elusive clues in knowledge learning by contrasting attention of language models
Gao, Jian, Zhang, Xiao, Wu, Ji, Li, Miao
Causal language models acquire vast amounts of knowledge from general text corpora during pretraining, but the efficiency of knowledge learning is known to be unsatisfactory, especially when learning from knowledge-dense, small-sized corpora. The deficiency can stem from long-distance dependencies, which are hard for language models to capture, and from overfitting to co-occurrence patterns and distracting clues in the training text. To address these issues, this paper proposes a method to enhance knowledge learning during language model pretraining by amplifying elusive but important clues in the text that the language models discover themselves. We find that larger language models pay more attention to non-obvious but important clues, which smaller language models often overlook. We can therefore identify these clues by contrasting the attention weights of large and small language models. Using the identified clues to guide token-dropout data augmentation on the training text, we observe a significant boost in fact memorization for both small and large models. This shows that the behavioral contrast between more- and less-performant language models carries important clues for knowledge learning, and that it can be ``amplified'' for a straightforward improvement in knowledge-learning efficiency.
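As an illustration of the contrast-and-dropout idea, the following sketch scores each token by the attention it receives in a larger versus a smaller model, then keeps the high-contrast tokens. Using GPT-2 and GPT-2-large as the model pair, averaging over layers/heads/queries, and the fixed keep-ratio rule are all assumptions made for this sketch, not the paper's exact procedure.

import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("gpt2")       # gpt2 variants share a tokenizer
small = AutoModel.from_pretrained("gpt2")
large = AutoModel.from_pretrained("gpt2-large")

def attention_received(model, ids):
    """Mean attention each token receives, averaged over layers, heads, queries."""
    with torch.no_grad():
        att = model(ids, output_attentions=True).attentions
    # att: tuple of (batch, heads, query, key) tensors, one per layer
    return torch.stack(att).mean(dim=(0, 2, 3))   # -> (batch, key)

def contrast_dropout(text, keep_ratio=0.8):
    ids = tok(text, return_tensors="pt").input_ids
    # Tokens the large model attends to much more than the small one score high.
    score = attention_received(large, ids) - attention_received(small, ids)
    k = int(ids.shape[1] * keep_ratio)
    keep = score[0].topk(k).indices.sort().values  # keep original token order
    return tok.decode(ids[0, keep])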
Making Task-Oriented Dialogue Datasets More Natural by Synthetically Generating Indirect User Requests
Mannekote, Amogh, Nam, Jinseok, Li, Ziming, Gao, Jian, Boyer, Kristy Elizabeth, Dorr, Bonnie J.
Indirect User Requests (IURs), such as "It's cold in here" instead of "Could you please increase the temperature?" are common in human-human task-oriented dialogue and require world knowledge and pragmatic reasoning from the listener. While large language models (LLMs) can handle these requests effectively, smaller models deployed on virtual assistants often struggle due to resource constraints. Moreover, existing task-oriented dialogue benchmarks lack sufficient examples of complex discourse phenomena such as indirectness. To address this, we propose a set of linguistic criteria along with an LLM-based pipeline for generating realistic IURs to test natural language understanding (NLU) and dialogue state tracking (DST) models before deployment in a new domain. We also release IndirectRequests, a dataset of IURs based on the Schema-Guided Dialog (SGD) corpus, as a comparative testbed for evaluating the performance of smaller models in handling indirect requests.
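A minimal sketch of how such a criteria-guided generation pipeline could be wired is shown below, assuming a generic llm callable and a crude lexical filter; the criteria listed and the filtering rule are illustrative placeholders, not the released pipeline's actual interface.

CRITERIA = [
    "The utterance must not name the target action or slot directly.",
    "A cooperative listener with world knowledge can infer the intent.",
    "The utterance sounds natural in casual spoken dialogue.",
]

def generate_iurs(direct_request, llm, n=5):
    """Rewrite a direct request as candidate indirect requests, then filter."""
    prompt = (
        f"Rewrite the request below as an indirect hint.\n"
        f"Request: {direct_request}\n"
        + "".join(f"- {c}\n" for c in CRITERIA)
    )
    candidates = [llm(prompt) for _ in range(n)]
    # Cheap lexical filter: discard rewrites that leak the direct content words.
    banned = {w for w in direct_request.lower().split() if len(w) > 4}
    return [c for c in candidates if not banned & set(c.lower().split())]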
Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge
Nan, Yang, Xing, Xiaodan, Wang, Shiyi, Tang, Zeyu, Felder, Federico N, Zhang, Sheng, Ledda, Roberta Eufrasia, Ding, Xiaoliu, Yu, Ruiqi, Liu, Weiping, Shi, Feng, Sun, Tianyang, Cao, Zehong, Zhang, Minghui, Gu, Yun, Zhang, Hanxiao, Gao, Jian, Tang, Wen, Yu, Pengxin, Kang, Han, Chen, Junqiang, Lu, Xing, Zhang, Boyu, Mamalakis, Michail, Prinzi, Francesco, Carlini, Gianluca, Cuneo, Lisa, Banerjee, Abhirup, Xing, Zhaohu, Zhu, Lei, Mesbah, Zacharia, Jain, Dhruv, Mayet, Tsiry, Yuan, Hongyu, Lyu, Qing, Wells, Athol, Walsh, Simon LF, Yang, Guang
Airway-related quantitative imaging biomarkers (QIBs) are crucial for the examination, diagnosis, and prognosis of pulmonary diseases. However, manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made to improve airway modelling, publicly available datasets concentrate on lung diseases with moderate morphological variations. The intricate honeycombing patterns in the lung tissue of patients with fibrotic lung disease exacerbate the challenge, often leading to prediction errors. To address this issue, the 'Airway-Informed Quantitative CT Imaging Biomarker for Fibrotic Lung Disease 2023' (AIIB23) competition was organized in conjunction with the 2023 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). The airway structures were meticulously annotated by three experienced radiologists. Competitors were encouraged to develop automatic airway segmentation models with high robustness and generalization ability, and then to explore the QIB most correlated with mortality. A training set of 120 high-resolution computerised tomography (HRCT) scans was publicly released with expert annotations and mortality status. The online validation set incorporated 52 HRCT scans from patients with fibrotic lung disease, and the offline test set included 140 cases from patients with fibrosis or COVID-19. The results showed that airway trees can be extracted more reliably from patients with fibrotic lung disease by introducing a voxel-wise weighted general union loss and a continuity loss. In addition to competitive image biomarkers for prognosis, the challenge revealed a strong airway-derived biomarker for survival prognostication (hazard ratio > 1.5, p < 0.0001) compared with existing clinical measurements, clinician assessment, and AI-based biomarkers.
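To illustrate the voxel-wise weighting idea, here is a minimal sketch of a weighted soft-Dice-style overlap loss; it is a stand-in for, not a reproduction of, the general union loss used by the winning entries, and the continuity loss (which penalizes predictions that break connected branches) is omitted.

import torch

def weighted_overlap_loss(pred, target, weight, eps=1e-6):
    """pred, target, weight: (batch, 1, D, H, W) tensors. The weight map can
    upweight thin peripheral branches so they contribute to the loss as much
    as large structures like the trachea."""
    inter = (weight * pred * target).sum(dim=(2, 3, 4))
    denom = (weight * (pred + target)).sum(dim=(2, 3, 4))
    return (1.0 - (2.0 * inter + eps) / (denom + eps)).mean()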
An Intelligent Social Learning-based Optimization Strategy for Black-box Robotic Control with Reinforcement Learning
Yang, Xubo, Gao, Jian, Wang, Ting, He, Yaozhen
Implementing intelligent control of robots is difficult, especially for complex black-box systems, because of the lack of visibility into, and understanding of, how these robots work internally. This paper proposes an Intelligent Social Learning (ISL) algorithm to enable intelligent control of black-box robotic systems. Inspired by mutual learning among individuals in human social groups, ISL includes learning, imitation, and self-study styles. Individuals in the learning style use a Lévy-flight search strategy to learn from the best performer, with whom they form the closest relationship. In the imitation style, individuals mimic the best performer through a random perturbation strategy, maintaining a second-closest relationship. In the self-study style, individuals learn independently via normal-distribution sampling while keeping a distant relationship with the best performer. In each style, the individuals in the population act as autonomous intelligent agents: neural networks execute the strategic actions of the three styles to interact with the environment and the robot, iteratively optimizing the network policy. Overall, ISL builds on the principles of intelligent optimization and incorporates ideas from reinforcement learning; it offers strong search capability, fast computation, few hyperparameters, and insensitivity to sparse rewards. The proposed ISL algorithm is compared with four state-of-the-art methods on six continuous-control benchmarks in MuJoCo to verify its effectiveness and advantages. ISL is further validated in simulated and physical grasping tasks with a UR3 robot, where it yields satisfactory solutions.
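A minimal sketch of the three update styles on a toy black-box objective follows. The step sizes, the greedy acceptance rule, and the Lévy sampler (Mantegna's algorithm) are common conventions assumed for illustration; in the actual algorithm the individuals are neural-network policies interacting with the robot, not raw parameter vectors.

import numpy as np

def levy_step(dim, beta=1.5, rng=np.random):
    """Mantegna's algorithm for sampling a Lévy-stable step."""
    from math import gamma, sin, pi
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, dim)
    v = rng.normal(0, 1, dim)
    return u / np.abs(v) ** (1 / beta)

def isl_minimize(f, dim=8, pop=30, iters=200, rng=np.random.default_rng(0)):
    x = rng.normal(size=(pop, dim))
    fit = np.apply_along_axis(f, 1, x)
    for _ in range(iters):
        best = x[fit.argmin()]
        for i in range(pop):
            style = rng.integers(3)
            if style == 0:      # learning: Lévy flight toward the best performer
                cand = x[i] + 0.1 * levy_step(dim) * (best - x[i])
            elif style == 1:    # imitation: randomly perturbed copy of the best
                cand = best + 0.05 * rng.normal(size=dim)
            else:               # self-study: independent Gaussian exploration
                cand = x[i] + 0.3 * rng.normal(size=dim)
            fc = f(cand)
            if fc < fit[i]:     # greedy acceptance
                x[i], fit[i] = cand, fc
    return x[fit.argmin()], fit.min()

# Example: minimize the sphere function
x_best, f_best = isl_minimize(lambda v: float(np.sum(v ** 2)))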
Human-AI Interactions and Societal Pitfalls
Castro, Francisco, Gao, Jian, Martin, Sébastien
Generative artificial intelligence (AI) systems, particularly large language models (LLMs), have improved at a rapid pace. For example, ChatGPT recently showcased its advanced capacity to perform complex tasks and human-like behaviors (OpenAI 2023b), reaching 100 million users within two months of its 2022 launch (Hu 2023). This progress is not limited to text generation, as demonstrated by other recent generative AI systems such as Midjourney (Midjourney 2023) (a text-to-image generative AI) and GitHub Copilot (Github 2023) (an AI pair programmer that can autocomplete code). Eloundou et al. (2023) estimated that about 80% of the U.S. workforce could be affected by the introduction of LLMs, and 19% of the workers may have at least 50% of their tasks impacted. In particular, AI can make users more productive by generating complex content in seconds, while users can simply communicate their preferences. For example, Noy and Zhang (2023) highlighted that ChatGPT can substantially improve productivity in writing tasks, and GitHub claims that Copilot increases developer productivity by up to 55% (Kalliamvakou 2023). However, content generated with the help of AI is not exactly the same as content generated without AI. The boost in productivity may come at the expense of users' idiosyncrasies, such as personal style and tastes, preferences we would naturally express without AI. To let users express their preferences, many AI systems let users edit their prompt (e.g., Midjourney) or allow more
Human Transcription Quality Improvement
Gao, Jian, Sun, Hanbo, Cao, Cheng, Du, Zheng
High-quality transcription data is crucial for training automatic speech recognition (ASR) systems. However, existing industry-grade data collection pipelines are expensive for researchers, while the quality of crowdsourced transcription is low. In this paper, we propose a reliable method for collecting speech transcriptions. We introduce two mechanisms to improve transcription quality: confidence-estimation-based reprocessing at the labeling stage, and automatic word error correction at the post-labeling stage. We collect and release LibriCrowd, a large-scale crowdsourced dataset of audio transcriptions covering 100 hours of English speech. Experiments show that transcription WER is reduced by over 50%. We further investigate the impact of transcription errors on ASR model performance and find a strong correlation: the transcription quality improvement provides over 10% relative WER reduction for ASR models. We release the dataset and code to benefit the research community.
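For reference, transcription quality here is measured by word error rate (WER): the word-level Levenshtein distance (substitutions, insertions, deletions) normalized by reference length. A minimal implementation:

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

assert wer("the cat sat", "the cat sat") == 0.0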
HTEC: Human Transcription Error Correction
Sun, Hanbo, Gao, Jian, Wu, Xiaomin, Fang, Anjie, Cao, Cheng, Du, Zheng
High-quality human transcription is essential for training and improving Automatic Speech Recognition (ASR) models. A recent study~\cite{libricrowd} found that every 1% of additional transcription Word Error Rate (WER) increases ASR WER by approximately 2% when the transcriptions are used to train ASR models. Transcription errors are inevitable even for highly trained annotators. However, few studies have explored human transcription correction, and error correction methods for related problems, such as ASR error correction and grammatical error correction, do not perform well on this task. We therefore propose HTEC for Human Transcription Error Correction. HTEC consists of two stages: Trans-Checker, an error detection model that predicts and masks erroneous words, and Trans-Filler, a sequence-to-sequence generative model that fills the masked positions. We propose a holistic list of correction operations, including four novel operations for handling deletion errors, and a variant of embeddings that incorporates phoneme information into the input of the transformer. HTEC outperforms other methods by a large margin and surpasses human annotators by 2.2% to 4.5% in WER. Finally, we deployed HTEC to assist human annotators and showed that it is particularly effective as a co-pilot, improving transcription quality by 15.1% without sacrificing transcription velocity.
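A minimal sketch of HTEC's two-stage mask-then-fill shape follows, with stub callables standing in for Trans-Checker and Trans-Filler; the 0.5 threshold and the interfaces are illustrative assumptions, and the real models are trained transformers that also handle deletion errors and phoneme-aware embeddings.

MASK = "<mask>"

def correct_transcription(words, checker, filler):
    """words: list[str]; checker(words) -> list[float] error probabilities;
    filler(masked_text) -> str, the corrected sentence."""
    probs = checker(words)
    masked = [MASK if p > 0.5 else w for w, p in zip(words, probs)]
    return filler(" ".join(masked))

# Toy usage with stub models:
stub_checker = lambda ws: [1.0 if w == "teh" else 0.0 for w in ws]
stub_filler = lambda s: s.replace(MASK, "the")
print(correct_transcription("teh cat sat".split(), stub_checker, stub_filler))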
Quantifying the Benefit of Artificial Intelligence for Scientific Research
Gao, Jian, Wang, Dashun
The ongoing artificial intelligence (AI) revolution has the potential to change almost every line of work. As AI capabilities continue to improve in accuracy, robustness, and reach, AI may outperform and even replace human experts across many valuable tasks. Despite enormous efforts devoted to understanding AI's impact on labor and the economy, and its recent success in accelerating scientific discovery and progress, we lack a systematic understanding of how advances in AI may benefit scientific research across disciplines and fields. Here we develop a measurement framework to estimate both the direct use of AI and the potential benefit of AI in scientific research, applying natural language processing techniques to 87.6 million publications and 7.1 million patents. We find that the use of AI in research is widespread throughout the sciences, growing especially rapidly since 2015, and that papers using AI exhibit an impact premium: they are more likely to be highly cited both within and outside their disciplines. While almost every discipline contains some subfields that benefit substantially from AI, an analysis of 4.6 million course syllabi across educational disciplines reveals a systematic misalignment between AI education and AI's impact on research, suggesting that the supply of AI talent in scientific disciplines is not commensurate with AI research demands. Lastly, examining who benefits from AI within the scientific workforce, we find that disciplines with a higher proportion of women or Black scientists tend to benefit less, suggesting that AI's growing impact on research may further exacerbate existing inequalities in science. As the connection between AI and scientific research deepens, our findings may gain increasing value, with important implications for the equity and sustainability of the research enterprise.