Huang, Weilin
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM
Ji, Yatai, Zhang, Jiacheng, Wu, Jie, Zhang, Shilong, Chen, Shoufa, GE, Chongjian, Sun, Peize, Chen, Weifeng, Shao, Wenqi, Xiao, Xuefeng, Huang, Weilin, Luo, Ping
Text-to-video models have made remarkable advancements through optimization on high-quality text-video pairs, where the textual prompts play a pivotal role in determining the quality of the output videos. However, achieving the desired output often entails multiple revisions and iterative inference to refine user-provided prompts. Current automatic methods for refining prompts encounter challenges such as Modality-Inconsistency, Cost-Discrepancy, and Model-Unaware when applied to text-to-video diffusion models. To address these problems, we introduce an LLM-based prompt adaptation framework, termed Prompt-A-Video, which excels in crafting Video-Centric, Labor-Free and Preference-Aligned prompts tailored to a specific video diffusion model. Our approach involves a meticulously crafted two-stage optimization and alignment system. Initially, we conduct a reward-guided prompt evolution pipeline to automatically create an optimal prompt pool and leverage it for supervised fine-tuning (SFT) of the LLM. Multi-dimensional rewards are then employed to generate pairwise data for the SFT model, followed by the direct preference optimization (DPO) algorithm to further facilitate preference alignment. Through extensive experimentation and comparative analyses, we validate the effectiveness of Prompt-A-Video across diverse generation models, highlighting its potential to push the boundaries of video generation.
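The second stage described above (reward-ranked pairwise data, then DPO) can be summarized in code. The sketch below is illustrative only: `rewrite_prompt`, `generate_video`, and `score_video` are hypothetical stand-ins for the paper's LLM, video diffusion model, and multi-dimensional reward, not names from its implementation.

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    user_prompt: str
    chosen: str    # refined prompt whose generated video scored higher
    rejected: str  # refined prompt whose generated video scored lower

def build_dpo_pairs(user_prompts, rewrite_prompt, generate_video, score_video,
                    n_candidates=4):
    """Stage-2 sketch: sample several refined prompts per user prompt, rank
    them by the reward of the videos they produce, and keep the best/worst
    as a chosen/rejected pair for DPO training."""
    pairs = []
    for prompt in user_prompts:
        candidates = [rewrite_prompt(prompt) for _ in range(n_candidates)]
        ranked = sorted(candidates,
                        key=lambda c: score_video(generate_video(c), prompt))
        if len(ranked) >= 2:
            pairs.append(PreferencePair(prompt, chosen=ranked[-1],
                                        rejected=ranked[0]))
    return pairs
```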
Fast Prompt Alignment for Text-to-Image Generation
Mrini, Khalil, Lu, Hanlin, Yang, Linjie, Huang, Weilin, Wang, Heng
Text-to-image generation has advanced rapidly, yet aligning complex textual prompts with generated visuals remains challenging, especially with intricate object relationships and fine-grained details. This paper introduces Fast Prompt Alignment (FPA), a prompt optimization framework that leverages a one-pass approach, enhancing text-to-image alignment efficiency without the iterative overhead typical of current methods like OPT2I. FPA uses large language models (LLMs) for single-iteration prompt paraphrasing, followed by fine-tuning or in-context learning with optimized prompts to enable real-time inference, reducing computational demands while preserving alignment fidelity. Extensive evaluations on the COCO Captions and PartiPrompts datasets demonstrate that FPA achieves competitive text-image alignment scores at a fraction of the processing time, as validated through both automated metrics (TIFA, VQA) and human evaluation. A human study with expert annotators further reveals a strong correlation between human alignment judgments and automated scores, underscoring the robustness of FPA's improvements. The proposed method showcases a scalable, efficient alternative to iterative prompt optimization, enabling broader applicability in real-time, high-demand settings. The codebase is provided to facilitate further research: https://github.com/tiktok/fast_prompt_alignment
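As a rough illustration of the one-pass idea, the snippet below makes a single LLM call conditioned on a few (raw prompt, optimized prompt) in-context examples; `call_llm` and the example pair are placeholders and do not reflect the released codebase.

```python
# Hypothetical in-context example pair; not from the FPA training data.
FEW_SHOT = [
    ("a dog on a beach",
     "a golden retriever running along a sunlit beach at golden hour, "
     "wet sand, gentle waves in the background"),
]

def fast_prompt_alignment(raw_prompt: str, call_llm) -> str:
    """Single LLM call: paraphrase the raw prompt into a detailed one,
    conditioned on in-context examples -- no per-prompt scoring loop."""
    shots = "\n".join(f"Input: {a}\nOutput: {b}" for a, b in FEW_SHOT)
    instruction = (
        "Paraphrase the input into a detailed text-to-image prompt that "
        "keeps every object, attribute, and relationship explicit.\n"
        f"{shots}\nInput: {raw_prompt}\nOutput:"
    )
    return call_llm(instruction)
```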
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Bakas, Spyridon, Reyes, Mauricio, Jakab, Andras, Bauer, Stefan, Rempfler, Markus, Crimi, Alessandro, Shinohara, Russell Takeshi, Berger, Christoph, Ha, Sung Min, Rozycki, Martin, Prastawa, Marcel, Alberts, Esther, Lipkova, Jana, Freymann, John, Kirby, Justin, Bilello, Michel, Fathallah-Shaykh, Hassan, Wiest, Roland, Kirschke, Jan, Wiestler, Benedikt, Colen, Rivka, Kotrotsou, Aikaterini, Lamontagne, Pamela, Marcus, Daniel, Milchenko, Mikhail, Nazeri, Arash, Weber, Marc-Andre, Mahajan, Abhishek, Baid, Ujjwal, Kwon, Dongjin, Agarwal, Manu, Alam, Mahbubul, Albiol, Alberto, Albiol, Antonio, Alex, Varghese, Tran, Tuan Anh, Arbel, Tal, Avery, Aaron, B., Pranjal, Banerjee, Subhashis, Batchelder, Thomas, Batmanghelich, Kayhan, Battistella, Enzo, Bendszus, Martin, Benson, Eze, Bernal, Jose, Biros, George, Cabezas, Mariano, Chandra, Siddhartha, Chang, Yi-Ju, Chazalon, Joseph, Chen, Shengcong, Chen, Wei, Chen, Jefferson, Cheng, Kun, Christoph, Meinel, Chylla, Roger, Clérigues, Albert, Costa, Anthony, Cui, Xiaomeng, Dai, Zhenzhen, Dai, Lutao, Deutsch, Eric, Ding, Changxing, Dong, Chao, Dudzik, Wojciech, Estienne, Théo, Shin, Hyung Eun, Everson, Richard, Fabrizio, Jonathan, Fang, Longwei, Feng, Xue, Fidon, Lucas, Fridman, Naomi, Fu, Huan, Fuentes, David, Gering, David G., Gao, Yaozong, Gates, Evan, Gholami, Amir, Gong, Mingming, González-Villá, Sandra, Pauloski, J. Gregory, Guan, Yuanfang, Guo, Sheng, Gupta, Sudeep, Thakur, Meenakshi H., Maier-Hein, Klaus H., Han, Woo-Sup, He, Huiguang, Hernández-Sabaté, Aura, Herrmann, Evelyn, Himthani, Naveen, Hsu, Winston, Hsu, Cheyu, Hu, Xiaojun, Hu, Xiaobin, Hu, Yan, Hu, Yifan, Hua, Rui, Huang, Teng-Yi, Huang, Weilin, Huo, Quan, HV, Vivek, Isensee, Fabian, Islam, Mobarakol, Albiol, Francisco J., Wang, Chiatse J., Jambawalikar, Sachin, Jose, V. Jeya Maria, Jian, Weijian, Jin, Peter, Jungo, Alain, Nuechterlein, Nicholas K., Kao, Po-Yu, Kermi, Adel, Keutzer, Kurt, Khened, Mahendra, Kickingereder, Philipp, King, Nik, Knapp, Haley, Knecht, Urspeter, Kohli, Lisa, Kong, Deren, Kong, Xiangmao, Koppers, Simon, Kori, Avinash, Krishnamurthi, Ganapathy, Kumar, Piyush, Kushibar, Kaisar, Lachinov, Dmitrii, Lee, Joon, Lee, Chengen, Lee, Yuehchou, Lefkovits, Szidonia, Lefkovits, Laszlo, Li, Tengfei, Li, Hongwei, Li, Wenqi, Li, Hongyang, Li, Xiaochuan, Lin, Zheng-Shen, Lin, Fengming, Liu, Chang, Liu, Boqiang, Liu, Xiang, Liu, Mingyuan, Liu, Ju, Lladó, Xavier, Luo, Lin, Iftekharuddin, Khan M., Tsai, Yuhsiang M., Ma, Jun, Ma, Kai, Mackie, Thomas, Mahmoudi, Issam, Marcinkiewicz, Michal, McKinley, Richard, Mehta, Sachin, Mehta, Raghav, Meier, Raphael, Merhof, Dorit, Meyer, Craig, Mitra, Sushmita, Moiyadi, Aliasgar, Mrukwa, Grzegorz, Monteiro, Miguel A. B., Myronenko, Andriy, Carver, Eric N., Nalepa, Jakub, Ngo, Thuyen, Niu, Chen, Oermann, Eric, Oliveira, Arlindo, Oliver, Arnau, Ourselin, Sebastien, French, Andrew P., Pound, Michael P., Pridmore, Tony P., Serrano-Rubio, Juan Pablo, Paragios, Nikos, Paschke, Brad, Pei, Linmin, Peng, Suting, Pham, Bao, Piella, Gemma, Pillai, G. N., Piraud, Marie, Popli, Anmol, Prčkovska, Vesna, Puch, Santi, Puybareau, Élodie, Qiao, Xu, Suter, Yannick R., Scott, Matthew R., Rane, Swapnil, Rebsamen, Michael, Ren, Hongliang, Ren, Xuhua, Rezaei, Mina, Lorenzo, Pablo Ribalta, Rippel, Oliver, Robert, Charlotte, Choudhury, Ahana Roy, Jackson, Aaron S., Manjunath, B. S., Salem, Mostafa, Salvi, Joaquim, Sánchez, Irina, Schellingerhout, Dawid, Shboul, Zeina, Shen, Haipeng, Shen, Dinggang, Shenoy, Varun, Shi, Feng, Shu, Hai, Snyder, James, Han, Il Song, Soni, Mehul, Stawiaski, Jean, Subramanian, Shashank, Sun, Li, Sun, Roger, Sun, Jiawei, Sun, Kay, Sun, Yu, Sun, Guoxia, Sun, Shuang, Park, Moo Sung, Szilagyi, Laszlo, Talbar, Sanjay, Tao, Dacheng, Khadir, Mohamed Tarek, Thakur, Siddhesh, Tochon, Guillaume, Tran, Tuan, Tseng, Kuan-Lun, Turlapov, Vadim, Tustison, Nicholas, Shankar, B. Uma, Vakalopoulou, Maria, Valverde, Sergi, Vanguri, Rami, Vasiliev, Evgeny, Vercauteren, Tom, Vidyaratne, Lasitha, Vivekanandan, Ajeet, Wang, Guotai, Wang, Qian, Wang, Weichung, Wen, Ning, Wen, Xin, Weninger, Leon, Wick, Wolfgang, Wu, Shaocheng, Wu, Qiang, Xia, Yong, Xu, Yanwu, Xu, Xiaowen, Xu, Peiyuan, Yang, Tsai-Ling, Yang, Xiaoping, Yang, Hao-Yu, Yang, Junlin, Yang, Haojin, Yao, Hongdou, Young-Moxon, Brett, Yue, Xiangyu, Zhang, Songtao, Zhang, Angela, Zhang, Kun, Zhang, Xuejie, Zhang, Lichi, Zhang, Xiaoyue, Zhao, Sicheng, Zhao, Yu, Zheng, Yefeng, Zhong, Liming, Zhou, Chenhong, Zhou, Xiaobing, Zhu, Hongtu, Zong, Weiwei, Kalpathy-Cramer, Jayashree, Farahani, Keyvan, Davatzikos, Christos, van Leemput, Koen, Menze, Bjoern
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST criteria, and iii) predicting overall survival from the pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that, apart from being diverse at each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
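For reference, BraTS segmentation submissions are scored with overlap metrics such as the Dice coefficient; a minimal NumPy version for a single binary sub-region mask is sketched below (illustrative, not the official evaluation code).

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|P intersect T| / (|P| + |T|) over boolean voxel masks of one
    tumor sub-region (e.g., enhancing tumor)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum() + eps)
```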
Reading Scene Text in Deep Convolutional Sequences
He, Pan (Chinese Academy of Sciences) | Huang, Weilin (Chinese Academy of Sciences) | Qiao, Yu (Chinese Academy of Sciences) | Loy, Chen Change (The Chinese University of Hong Kong) | Tang, Xiaoou (The Chinese University of Hong Kong)
We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem. We leverage recent advances in deep convolutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem. A deep recurrent model, built on long short-term memory (LSTM), is then developed to robustly recognise the generated CNN sequences, departing from most existing approaches, which recognise each character independently. Our model has a number of appealing properties in comparison to existing scene text recognition methods: (i) it can recognise highly ambiguous words by leveraging meaningful context information, allowing it to work reliably without either pre- or post-processing; (ii) the deep CNN feature is robust to various image distortions; (iii) it retains the explicit order information of the word image, which is essential for discriminating word strings; (iv) the model does not depend on a pre-defined dictionary and can process unknown words and arbitrary strings. It achieves impressive results on several benchmarks, advancing the state of the art substantially.
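A generic version of this CNN-sequence-to-LSTM design is sketched below in PyTorch; the layer sizes, pooling scheme, and per-step character outputs (suitable for training with a CTC loss) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SeqReader(nn.Module):
    """Word image -> ordered CNN feature sequence -> bidirectional LSTM."""

    def __init__(self, n_classes: int = 37):  # 26 letters + 10 digits + blank
        super().__init__()
        self.cnn = nn.Sequential(              # collapse height, keep width as time
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((1, None)),   # -> (B, 128, 1, W')
        )
        self.rnn = nn.LSTM(128, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 1, H, W) grayscale word image
        feats = self.cnn(x).squeeze(2)         # (B, 128, W')
        feats = feats.permute(0, 2, 1)         # (B, W', 128), ordered left-to-right
        out, _ = self.rnn(feats)               # context over the whole sequence
        return self.fc(out)                    # (B, W', n_classes) per-step logits
```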