Deep Learning and Machine Learning -- Natural Language Processing: From Theory to Application
Chen, Keyu, Fei, Cheng, Bi, Ziqian, Liu, Junyu, Peng, Benji, Zhang, Sen, Pan, Xuanhe, Xu, Jiawei, Wang, Jinlang, Yin, Caitlyn Heqi, Zhang, Yichao, Feng, Pohsun, Wen, Yizhu, Wang, Tianyang, Li, Ming, Ren, Jintao, Niu, Qian, Chen, Silin, Hsieh, Weiche, Yan, Lawrence K. Q., Liang, Chia Xin, Xu, Han, Tseng, Hong-Ming, Song, Xinyuan, Liu, Ming
With a focus on natural language processing (NLP) and the role of large language models (LLMs), we explore the intersection of machine learning, deep learning, and artificial intelligence. As artificial intelligence continues to revolutionize fields from healthcare to finance, NLP techniques such as tokenization, text classification, and entity recognition are essential for processing and understanding human language. This paper discusses advanced data preprocessing techniques and the use of frameworks like Hugging Face for implementing transformer-based models. Additionally, it highlights challenges such as handling multilingual data, reducing bias, and ensuring model robustness. By addressing key aspects of data processing and model fine-tuning, this work aims to provide insights into deploying effective and ethically sound AI solutions.
DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation
Wanner, Miriam, Van Durme, Benjamin, Dredze, Mark
The decompose-then-verify strategy for verifying Large Language Model (LLM) generations decomposes generated text into claims that are then independently verified. Decontextualization augments text (claims) to ensure it can be verified outside of the original context, enabling reliable verification. While decomposition and decontextualization have been explored independently, their interactions in a complete system have not been investigated. Their conflicting purposes can create tensions: decomposition isolates atomic facts, while decontextualization inserts relevant information. Furthermore, a decontextualized subclaim presents a challenge to the verification step: which part of the augmented text should be verified, now that it contains multiple atomic facts? We evaluate different decomposition, decontextualization, and verification strategies and find that the choice of strategy affects the resulting factuality scores. Additionally, we introduce DnDScore, a decontextualization-aware verification method that validates subclaims in light of their added contextual information.
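To make the decompose-then-verify pipeline concrete, here is a minimal sketch, assuming a FActScore-style aggregate in which the factuality score is the fraction of subclaims a verifier judges supported. The `Subclaim` type, the `verify` callable, and the toy facts are hypothetical illustrations, not DnDScore's implementation; the sketch only shows the idea of handing the decontextualizing text to the verifier as background while checking the atomic fact alone.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Subclaim:
    atomic: str   # the single atomic fact isolated by decomposition
    context: str  # text added by decontextualization ("" if none)


def factuality_score(subclaims: List[Subclaim],
                     verify: Callable[[str, str], bool]) -> float:
    """Fraction of subclaims judged supported (a FActScore-style aggregate).

    `verify(claim, context)` is a hypothetical verifier. It receives the
    decontextualizing text separately, as background for interpreting the
    atomic claim, so that only the atomic fact itself is checked.
    """
    if not subclaims:
        return 0.0
    supported = sum(1 for sc in subclaims if verify(sc.atomic, sc.context))
    return supported / len(subclaims)


# Toy example: a set lookup stands in for a real retrieval-backed verifier.
facts = {"Paris is the capital of France", "The Seine flows through Paris"}
claims = [
    Subclaim("Paris is the capital of France", ""),
    Subclaim("The Seine flows through Paris", "Paris, the capital of France"),
    Subclaim("Paris has 50 million residents", ""),
]
score = factuality_score(claims, lambda claim, context: claim in facts)
print(f"{score:.3f}")  # 0.667
```

Keeping the context as a separate argument, rather than concatenating it into the claim, is one way to avoid the problem the abstract raises: the verifier is never asked to check the facts that decontextualization inserted.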
Context-DPO: Aligning Language Models for Context-Faithfulness
Bi, Baolong, Huang, Shaohan, Wang, Yiwei, Yang, Tianchi, Zhang, Zihan, Huang, Haizhen, Mei, Lingrui, Fang, Junfeng, Li, Zehao, Wei, Furu, Deng, Weiwei, Sun, Feng, Zhang, Qi, Liu, Shenghua
Reliable responses from large language models (LLMs) require adherence to user instructions and retrieved information. While alignment techniques help LLMs align with human intentions and values, improving context-faithfulness through alignment remains underexplored. To address this, we propose Context-DPO, the first alignment method specifically designed to enhance LLMs' context-faithfulness. We introduce ConFiQA, a benchmark that simulates Retrieval-Augmented Generation (RAG) scenarios with knowledge conflicts to evaluate context-faithfulness. By leveraging faithful and stubborn responses to questions with provided context from ConFiQA, our Context-DPO aligns LLMs through direct preference optimization. Extensive experiments demonstrate that our Context-DPO significantly improves context-faithfulness, achieving 35% to 280% improvements on popular open-source models. Further analysis demonstrates that Context-DPO preserves LLMs' generative capabilities while providing interpretable insights into context utilization. Our code and data are released at https://github.com/byronBBL/Context-DPO
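The abstract pairs a faithful (context-following) response with a stubborn (memory-following) one and optimizes them with direct preference optimization. A minimal sketch of the standard DPO loss for one such pair is below, assuming the faithful response plays the "chosen" role and the stubborn one the "rejected" role. The function name, argument names, and the default `beta=0.1` are assumptions for illustration, not values or code from the paper, and the log-probabilities in the usage line are made up.

```python
import math


def dpo_loss(logp_faithful: float, logp_stubborn: float,
             ref_logp_faithful: float, ref_logp_stubborn: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed token log-probability of a full response
    given the prompt and context, under either the trainable policy or the
    frozen reference model. beta=0.1 is an assumed default.
    """
    # Implicit reward of each response: how much more likely the policy
    # makes it relative to the reference model (a log-ratio).
    faithful_ratio = logp_faithful - ref_logp_faithful
    stubborn_ratio = logp_stubborn - ref_logp_stubborn
    margin = beta * (faithful_ratio - stubborn_ratio)
    # -log(sigmoid(margin)) equals softplus(-margin); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities, chosen only to illustrate the arithmetic.
print(dpo_loss(-5.0, -12.0, -8.0, -8.0))  # margin = 0.1 * (3 - (-4)) = 0.7
```

When the policy and reference agree (margin 0) the loss is log 2; it falls below that as the policy shifts probability toward the faithful response relative to the reference, which is the direction the preference data is meant to push.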
CharacterBench: Benchmarking Character Customization of Large Language Models
Zhou, Jinfeng, Huang, Yongkang, Wen, Bosi, Bi, Guanqun, Chen, Yuxuan, Ke, Pei, Chen, Zhuang, Xiao, Xiyao, Peng, Libiao, Tang, Kuntian, Zhang, Rongsheng, Zhang, Le, Lv, Tangjie, Hu, Zhipeng, Wang, Hongning, Huang, Minlie
Character-based dialogue (aka role-playing) enables users to freely customize characters for interaction. Because such dialogue often relies on LLMs, there is a need to evaluate LLMs' character customization capability. However, existing benchmarks fail to ensure a robust evaluation, as they often involve only a single character category or evaluate limited dimensions. Moreover, the sparsity of character features in responses makes feature-focused generative evaluation both ineffective and inefficient. To address these issues, we propose CharacterBench, the largest bilingual generative benchmark, with 22,859 human-annotated samples covering 3,956 characters from 25 detailed character categories. We define 11 dimensions across 6 aspects, classified as sparse or dense depending on whether the character features evaluated by a dimension manifest in every response. We enable effective and efficient evaluation by crafting tailored queries for each dimension that induce character responses related to that dimension. Further, we develop the CharacterJudge model for cost-effective and stable evaluation. Experiments show its superiority over SOTA automatic judges (e.g., GPT-4) and our benchmark's potential to optimize LLMs' character customization. Our repository is at https://github.com/thu-coai/CharacterBench.
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
Li, Xiaoxi, Jin, Jiajie, Zhou, Yujia, Wu, Yongkang, Li, Zhonghua, Ye, Qi, Dou, Zhicheng
Large language models (LLMs) exhibit remarkable generative capabilities but often suffer from hallucinations. Retrieval-augmented generation (RAG) offers an effective solution by incorporating external knowledge, but existing methods still face several limitations: additional deployment costs of separate retrievers, redundant input tokens from retrieved text chunks, and the lack of joint optimization of retrieval and generation. To address these issues, we propose RetroLLM, a unified framework that integrates retrieval and generation into a single, cohesive process, enabling LLMs to directly generate fine-grained evidence from the corpus with constrained decoding. Moreover, to mitigate false pruning in the process of constrained evidence generation, we introduce (1) hierarchical FM-Index constraints, which generate corpus-constrained clues to identify a subset of relevant documents before evidence generation, reducing irrelevant decoding space; and (2) a forward-looking constrained decoding strategy, which considers the relevance of future sequences to improve evidence accuracy. Extensive experiments on five open-domain QA datasets demonstrate RetroLLM's superior performance across both in-domain and out-of-domain tasks. The code is available at https://github.com/sunnynexus/RetroLLM.
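The core idea of corpus-constrained evidence generation is that, at each step, the model may only emit a token that keeps the generated span a verbatim substring of some document. A minimal sketch of that constraint with greedy decoding is below. It is an illustration under stated assumptions, not RetroLLM's method: a naive linear scan stands in for the FM-index, the hierarchical clue generation and forward-looking decoding are omitted, and the function names, toy corpus, and `prefs` scores are hypothetical.

```python
from typing import Callable, List, Sequence, Set


def allowed_next_tokens(corpus: List[List[str]],
                        prefix: Sequence[str]) -> Set[str]:
    """Tokens t such that prefix + [t] occurs verbatim in some document.

    This naive linear scan stands in for an FM-index, which answers the same
    "which tokens can extend this span?" query without scanning the corpus.
    """
    p = list(prefix)
    k = len(p)
    nxt: Set[str] = set()
    for doc in corpus:
        for i in range(len(doc) - k):
            if doc[i:i + k] == p:
                nxt.add(doc[i + k])
    return nxt


def constrained_greedy_decode(corpus: List[List[str]],
                              score: Callable[[List[str], str], float],
                              max_len: int) -> List[str]:
    """Greedy decoding restricted to spans that exist in the corpus."""
    out: List[str] = []
    for _ in range(max_len):
        candidates = allowed_next_tokens(corpus, out)
        if not candidates:
            break  # the span cannot be extended and still match the corpus
        # sorted() makes tie-breaking deterministic
        out.append(max(sorted(candidates), key=lambda t: score(out, t)))
    return out


corpus = [
    ["the", "eiffel", "tower", "is", "in", "paris"],
    ["the", "louvre", "is", "in", "paris"],
]
# Hypothetical model preferences: it would rather say "london" than "paris".
prefs = {"the": 3.0, "louvre": 2.0, "eiffel": 1.0, "london": 5.0}
evidence = constrained_greedy_decode(
    corpus, lambda ctx, tok: prefs.get(tok, 0.0), 10)
print(" ".join(evidence))  # the louvre is in paris
```

Even though the toy scorer prefers "london", the constraint never offers it, so the output is guaranteed to be copied from the corpus. The "false pruning" the abstract mentions arises because greedy choices like "louvre" over "eiffel" are made before the rest of the span is seen, which is what a forward-looking strategy tries to address.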
LLMs Can Simulate Standardized Patients via Agent Coevolution
Du, Zhuoyun, Zheng, Lujie, Hu, Renjun, Xu, Yuyang, Li, Xiawei, Sun, Ying, Chen, Wei, Wu, Jian, Cai, Haolei, Ying, Haohao
Training medical personnel using standardized patients (SPs) remains a complex challenge, requiring extensive domain expertise and role-specific practice. Most research on Large Language Model (LLM)-based simulated patients focuses on improving data retrieval accuracy or adjusting prompts through human feedback. However, this focus has overlooked the critical need for patient agents to learn a standardized presentation pattern that transforms data into human-like patient responses through unsupervised simulations. To address this gap, we propose EvoPatient, a novel simulated patient framework in which a patient agent and doctor agents simulate the diagnostic process through multi-turn dialogues, simultaneously gathering experience to improve the quality of both questions and answers, ultimately supporting the training of human doctors. Extensive experiments on various cases demonstrate that, given only overall SP requirements, our framework improves requirement alignment by more than 10% over existing reasoning methods and achieves better human preference, while reaching an optimal balance of resource consumption after evolving over 200 cases for 10 hours, with excellent generalizability. The code will be available at https://github.com/ZJUMAI/EvoPatient.
'Trump has been explicit about revenge': Asif Kapadia on his new film about the threat to democracy
It was some time in the early 2000s, and Asif Kapadia, already a successful film director (a wunderkind whose first feature, The Warrior (2001), won the Bafta for outstanding British film), was travelling back from New York. "I'm in a limo being taken to the airport. And I was taking photos of Manhattan because I was driving over Brooklyn Bridge and it's just all so cinematic and I became subconsciously aware of the driver watching me in the rear view mirror. I get to the airport and I'm in the Virgin lounge when my name is called out. And I thought: 'Have I left a bag or something?' But then five or six people come: homeland security. And they stop me in the lounge in front of everyone, the only person of colour in there, and empty out my bag, and they say: 'Someone's reported you.' And it's like: 'Who are you?'" An itinerary of his trip and its purpose proved his credentials, and he was eventually allowed to go and boarded his flight. But for nearly a decade afterwards, he found himself on a "watch list". "I would get stopped and interviewed two times before I got on a plane, pulled out in a room."
Segment-Level Diffusion: A Framework for Controllable Long-Form Generation with Diffusion Language Models
Zhu, Xiaochen, Karadzhov, Georgi, Whitehouse, Chenxi, Vlachos, Andreas
Diffusion models have shown promise in text generation but often struggle with generating long, coherent, and contextually accurate text. Token-level diffusion overlooks word-order dependencies and enforces short output windows, while passage-level diffusion struggles to learn robust representations for long-form text. To address these challenges, we propose Segment-Level Diffusion (SLD), a framework that enhances diffusion-based text generation through text segmentation, robust representation training with adversarial and contrastive learning, and improved latent-space guidance. By segmenting long-form outputs into separate latent representations and decoding them with an autoregressive decoder, SLD simplifies diffusion predictions and improves scalability. Experiments on XSum, ROCStories, DialogSum, and DeliData demonstrate that SLD achieves competitive or superior performance in fluency, coherence, and contextual compatibility across automatic and human evaluation metrics compared with other diffusion and autoregressive baselines. Ablation studies further validate the effectiveness of our segmentation and representation learning strategies.
The Game Awards 2024: The 15 biggest announcements and new trailers including The Witcher 4 and Elden Ring
Our review of Astro Bot earlier this year called it "one of the best games Sony has ever made," and it seems the industry and game-playing public agree: Astro Bot won Game of the Year. As always, the long, long stream was a hybrid award ceremony, advertising reel and game announcement marathon. There were countless announcements interspersed throughout the awards, including all-new games like Intergalactic: The Heretic Prophet from Naughty Dog, The Witcher 4 from CD Projekt RED and Split Fiction from Hazelight, the studio behind It Takes Two. It was also a show of revivals, with long-dormant franchises like Okami, Onimusha, Ninja Gaiden and Virtua Fighter returning. You can view all of the winners at the Game Awards' official site.
Former ByteDance Intern Accused of Sabotage Among Winners of Prestigious AI Award
A former ByteDance intern who was allegedly dismissed for professional misconduct, including sabotaging colleagues' work, was announced as a winner of one of the most prestigious annual awards for AI research this week. Keyu Tian, whose LinkedIn and Google Scholar pages list him as a master's student in computer science at Peking University, is the first author of one of two papers chosen Tuesday for the main "Best Paper Award" at the Neural Information Processing Systems (NeurIPS) conference, the largest gathering of machine learning researchers in the world. The paper, titled "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction," presents a new method for creating AI-generated images that Tian and four coauthors (all affiliated with either ByteDance or Peking University) claim is faster and more efficient than its predecessors. "The overall quality of the paper presentation, experimental validation and insights (scaling laws) give compelling reasons to experiment with this model," the NeurIPS Best Paper Award committee wrote in a statement. The committee's decision to grant the honor to Tian, whom ByteDance reportedly sued last month for over $1 million in damages, claiming deliberate sabotage of other company research projects, quickly became the focus of wider discussions online about how NeurIPS is run and the way top AI researchers evaluate the work of their colleagues.