
PARTONOMY: Large Multimodal Models with Part-Level Visual Understanding

Blume, Ansel, Kim, Jeonghwan, Ha, Hyeonjeong, Chatikyan, Elen, Jin, Xiaomeng, Nguyen, Khanh Duy, Peng, Nanyun, Chang, Kai-Wei, Hoiem, Derek, Ji, Heng

arXiv.org Artificial Intelligence

Real-world objects are composed of distinctive, object-specific parts. Identifying these parts is key to performing fine-grained, compositional reasoning; yet large multimodal models (LMMs) struggle to perform this seemingly straightforward task. In this work, we introduce PARTONOMY, an LMM benchmark designed for pixel-level part grounding. We construct PARTONOMY from existing part datasets and our own rigorously annotated set of images, encompassing 862 part labels and 534 object labels for evaluation. Unlike existing datasets that simply ask models to identify generic parts, PARTONOMY uses specialized concepts (e.g., agricultural airplane), and challenges models to compare objects' parts, consider part-whole relationships, and justify textual predictions with visual segmentations. Our experiments demonstrate significant limitations in state-of-the-art LMMs (e.g., LISA-13B achieves only 5.9% gIoU), highlighting a critical gap in their part grounding abilities. We note that existing segmentation-enabled LMMs (segmenting LMMs) have two key architectural shortcomings: they use special [SEG] tokens, unseen during pretraining, that induce distribution shift, and they discard predicted segmentations instead of using past predictions to guide future ones. To address these deficiencies, we train several part-centric LMMs and propose PLUM, a novel segmenting LMM that uses span tagging instead of segmentation tokens and that conditions on prior predictions in a feedback loop. We find that pretrained PLUM outperforms existing segmenting LMMs on reasoning segmentation, VQA, and visual hallucination benchmarks. In addition, PLUM finetuned on our proposed Explanatory Part Segmentation task is competitive with segmenting LMMs trained on significantly more segmentation data. Our work opens up new avenues towards enabling fine-grained, grounded visual understanding in LMMs.


PLUM: Adapting Pre-trained Language Models for Industrial-scale Generative Recommendations

He, Ruining, Heldt, Lukasz, Hong, Lichan, Keshavan, Raghunandan, Mao, Shifan, Mehta, Nikhil, Su, Zhengyang, Tsai, Alicia, Wang, Yueqi, Wang, Shao-Chuan, Yi, Xinyang, Baugher, Lexi, Cakici, Baykal, Chi, Ed, Goodrow, Cristos, Han, Ningren, Ma, He, Rosales, Romer, Van Soest, Abby, Tandon, Devansh, Wu, Su-Lin, Yang, Weilong, Zheng, Yilin

arXiv.org Artificial Intelligence

Large Language Models (LLMs) pose a new paradigm of modeling and computation for information tasks. Recommendation systems are a critical application domain poised to benefit significantly from the sequence modeling capabilities and world knowledge inherent in these large models. In this paper, we introduce PLUM, a framework designed to adapt pre-trained LLMs for industry-scale recommendation tasks. PLUM consists of item tokenization using Semantic IDs, continued pre-training (CPT) on domain-specific data, and task-specific fine-tuning for recommendation objectives. For fine-tuning, we focus particularly on generative retrieval, where the model is directly trained to generate Semantic IDs of recommended items based on user context. We conduct comprehensive experiments on large-scale internal video recommendation datasets. Our results demonstrate that PLUM achieves substantial improvements for retrieval compared to a heavily-optimized production model built with large embedding tables. We also present a scaling study for the model's retrieval performance, our learnings about CPT, a few enhancements to Semantic IDs, along with an overview of the training and inference methods that enable launching this framework to billions of users in YouTube.
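The abstract above describes generative retrieval, where a model emits an item's Semantic ID token by token. A practical detail is that generation must be constrained to IDs that actually exist in the catalog, which is commonly done with a prefix trie over the ID space. The sketch below illustrates that constraint mechanism only; the catalog, token values, and function names are illustrative, not from the paper, and a production system would combine this with beam search over model logits.

```python
def build_trie(semantic_ids):
    """Build a prefix trie over item Semantic IDs so that decoding can be
    restricted to token sequences corresponding to real catalog items."""
    root = {}
    for sid in semantic_ids:
        node = root
        for tok in sid:
            node = node.setdefault(tok, {})
        node["<end>"] = True  # marks a complete, valid Semantic ID
    return root

def valid_next_tokens(trie, prefix):
    """Return the tokens that may legally follow `prefix` during decoding."""
    node = trie
    for tok in prefix:
        node = node.get(tok, {})
    return [t for t in node if t != "<end>"]

# Toy catalog: each item is a 3-token Semantic ID.
catalog = [(3, 1, 4), (3, 1, 5), (2, 7, 1)]
trie = build_trie(catalog)
```

At each decoding step, the model's softmax would be masked to `valid_next_tokens(trie, prefix)`, guaranteeing the generated ID maps to a retrievable item.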


On the Way to LLM Personalization: Learning to Remember User Conversations

Magister, Lucie Charlotte, Metcalf, Katherine, Zhang, Yizhe, ter Hoeve, Maartje

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have quickly become invaluable assistants for a variety of tasks. However, their effectiveness is constrained by their ability to tailor responses to human preferences and behaviors via personalization. Prior work in LLM personalization has largely focused on style transfer or incorporating small factoids about the user, as knowledge injection remains an open challenge. In this paper, we explore injecting knowledge of prior conversations into LLMs to enable future work on less redundant, personalized conversations. We identify two real-world constraints: (1) conversations are sequential in time and must be treated as such during training, and (2) per-user personalization is only viable in parameter-efficient settings. To this end, we propose PLUM, a pipeline that performs data augmentation to up-sample conversations as question-answer pairs, which are then used to finetune a low-rank adaptation adapter with a weighted cross-entropy loss. Even in this first exploration of the problem, we perform competitively with baselines such as RAG, attaining an accuracy of 81.5% across 100 conversations.
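The pipeline above finetunes an adapter with a weighted cross-entropy loss over up-sampled QA pairs. As a minimal sketch of the loss shape (not the paper's exact formulation), the snippet below computes token-level cross entropy where some tokens, e.g. answer tokens, carry higher weight than context tokens; the weighting scheme shown is an assumption for illustration.

```python
import math

def weighted_cross_entropy(logprobs, weights):
    """Weighted token-level cross entropy: each token's negative log-probability
    is scaled by a per-token weight, then normalized by the total weight."""
    assert len(logprobs) == len(weights)
    total_w = sum(weights)
    return -sum(w * lp for w, lp in zip(logprobs, weights)) / total_w

# Four tokens, each assigned probability 0.5 by the model; the last two
# (hypothetically, answer tokens) are weighted twice as heavily.
logprobs = [math.log(0.5)] * 4
weights = [1.0, 1.0, 2.0, 2.0]
loss = weighted_cross_entropy(logprobs, weights)
```

Because every token has the same log-probability here, the weighted average equals the unweighted one (log 2 ≈ 0.693); the weights matter once answer tokens are harder or easier than context tokens.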


Plum: Prompt Learning using Metaheuristic

Pan, Rui, Xing, Shuo, Diao, Shizhe, Sun, Wenhe, Liu, Xiang, Shum, Kashun, Pi, Renjie, Zhang, Jipeng, Zhang, Tong

arXiv.org Artificial Intelligence

Since the emergence of large language models, prompt learning has become a popular method for optimizing and customizing these models. Special prompts, such as Chain-of-Thought, have even revealed previously unknown reasoning capabilities within these models. However, the progress of discovering effective prompts has been slow, driving a desire for general prompt optimization methods. Unfortunately, few existing prompt learning methods satisfy the criteria of being truly "general", i.e., automatic, discrete, black-box, gradient-free, and interpretable all at once. In this paper, we introduce metaheuristics, a branch of discrete non-convex optimization methods with over 100 options, as a promising approach to prompt learning. Within our paradigm, we test six typical methods: hill climbing, simulated annealing, genetic algorithms with and without crossover, tabu search, and harmony search, demonstrating their effectiveness in white-box and black-box prompt learning. Furthermore, we show that these methods can be used to discover more human-understandable prompts that were previously unknown in both reasoning and image generation tasks, opening the door to a cornucopia of possibilities in prompt optimization. We release all the code at \url{https://github.com/research4pan/Plum}.
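Of the six metaheuristics listed above, hill climbing is the simplest, and its structure illustrates why these methods fit black-box, gradient-free prompt learning: they only need a scorer, not gradients. The sketch below is a toy illustration with a made-up keyword-counting scorer standing in for task accuracy; the function names and vocabulary are assumptions, not the paper's implementation.

```python
import random

def hill_climb(prompt_words, score_fn, vocab, iterations=100, seed=0):
    """Gradient-free hill climbing over a discrete prompt: mutate one word
    at a time and keep the change only if the black-box score improves."""
    rng = random.Random(seed)
    best = list(prompt_words)
    best_score = score_fn(best)
    for _ in range(iterations):
        candidate = list(best)
        candidate[rng.randrange(len(candidate))] = rng.choice(vocab)
        s = score_fn(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

# Toy black-box scorer: counts target keywords (a stand-in for evaluating
# the prompt's downstream task accuracy, which is what a real scorer would do).
def toy_score(words):
    return sum(w in {"step", "think", "carefully"} for w in words)

vocab = ["step", "think", "carefully", "please", "answer"]
result, score = hill_climb(["please", "answer", "now"], toy_score, vocab)
```

Simulated annealing and tabu search differ mainly in the acceptance rule (occasionally accepting worse candidates, or forbidding recent moves), while the genetic variants maintain a population of prompts instead of a single incumbent.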


PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models

Zhang, Dylan, Diao, Shizhe, Zou, Xueyan, Peng, Hao

arXiv.org Artificial Intelligence

Instruction-finetuned code language models (LMs) have shown promise in various programming tasks. They are trained, using a language modeling objective, on pairs of natural language instructions and gold code snippets. Recent evidence suggests that these models, never exposed to incorrect solutions during training, often struggle to distinguish between correct and incorrect solutions. This observation raises our inquiry: can preference learning, which trains models to prefer correct solutions over incorrect ones, help push the boundaries of code LMs even further? We propose PLUM, a novel preference learning framework augmented with test cases tailored for code LMs. PLUM aims to investigate the key success factors and potential benefits of preference learning in code LMs, which remain elusive despite its success in aligning LMs with human values. PLUM consists of three stages: (1) generating test cases for natural language instructions, (2) sampling candidate solutions from the policy and evaluating them against the test cases to create a preference dataset, which is then used to (3) train the policy with a preference learning algorithm. Experiments demonstrate that PLUM substantially improves the performance of existing code LMs on established code generation benchmarks such as HumanEval (+) and MBPP (+), even for the state-of-the-art open-source language model CodeQwen-1.5-7B-Chat. PLUM complements the supervised fine-tuning (SFT) stage, demonstrating synergistic effects.
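Stage (2) above turns executable test cases into preference data: candidates that pass become "chosen" and candidates that fail become "rejected". The sketch below shows that splitting step under simplifying assumptions (a single `solve` entry point, naive `exec`-based checking, no sandboxing or timeouts, which a real pipeline would need); the helper name and candidate strings are illustrative.

```python
def build_preference_pairs(candidates, test_cases):
    """Evaluate candidate solutions against test cases and pair every
    passing solution (chosen) with every failing one (rejected)."""
    passed, failed = [], []
    for code in candidates:
        try:
            ns = {}
            exec(code, ns)  # NOTE: a real pipeline would sandbox this
            ok = all(ns["solve"](x) == y for x, y in test_cases)
        except Exception:
            ok = False
        (passed if ok else failed).append(code)
    return [(chosen, rejected) for chosen in passed for rejected in failed]

candidates = [
    "def solve(x):\n    return x * 2",  # passes the tests below
    "def solve(x):\n    return x + 2",  # fails on x = 3
]
pairs = build_preference_pairs(candidates, [(1, 2), (3, 6)])
```

The resulting (chosen, rejected) pairs feed directly into a preference learning objective such as DPO in stage (3).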


LLMs for Robotic Object Disambiguation

Jiang, Connie, Xu, Yiqing, Hsu, David

arXiv.org Artificial Intelligence

The advantages of pre-trained large language models (LLMs) are apparent in a variety of language processing tasks. But can a language model's knowledge be further harnessed to effectively disambiguate objects and navigate decision-making challenges within the realm of robotics? Our study reveals the LLM's aptitude for solving complex decision-making challenges of the kind previously modeled by Partially Observable Markov Decision Processes (POMDPs). A pivotal focus of our research is the object disambiguation capability of LLMs. We detail the integration of an LLM into a tabletop environment disambiguation task, a decision-making problem where the robot's task is to discern and retrieve a user's desired object from an arbitrarily large and complex cluster of objects. Despite multiple query attempts with zero-shot prompt engineering (details can be found in the Appendix), the LLM struggled to inquire about features not explicitly provided in the scene description. In response, we have developed a few-shot prompt engineering system to improve the LLM's ability to pose disambiguating queries. The result is a model capable of both using given features when they are available and inferring new relevant features when necessary, to successfully generate and navigate down a precise decision tree to the correct object, even when faced with identical options.


The surprising future of fintech

#artificialintelligence

Thanks to open banking, fintech early adopters likely already have accounts that round up transactions to boost savings or connect to third-party tools for loan applications, budget management and more. But the new wave of fintech startups is proving there's much more that can be done using open banking, the two-year-old mandate from UK regulators that required banks to easily allow their customers to share their data with third parties such as apps. "Open banking offers people the chance to get personalised, tailored support to help them manage their money by allowing regulated companies to securely analyse their bank data," says Lubaina Manji, senior programme manager at Nesta Challenges, one of the organisations behind the Open Up 2020 Challenge, alongside the Open Banking Implementation Entity (OBIE). "It's enabled the creation of new services and tools to help people with every aspect of money management – from budgeting to investing, and much, much more, all in a safe and secure way." And some of the innovations from finalists in the Open Up 2020 Challenge have surprised with their ingenuity and customer focus, she says, citing Sustainably's round-up tool for automated charity donations, and Kalgera's neuroscience-informed AI to help spot fraud targeting people with dementia – two projects that highlight the purpose-driven idea behind open banking and the aim to get financial support to those who need it most.


Amazon just upgraded the popular Echo Dot--is it worth buying?

USATODAY - Tech Top Stories

If you're considering getting or giving a smart speaker in the near future, the Echo Dot is a great place to start. It doesn't take up much real estate on the counter, it's relatively easy on the wallet, and it comes loaded with Alexa and her many, many capabilities. But now that there is a new generation of Echo Dots available, should you spring for the latest and greatest? Let's look at the new Dot, what makes it different, and whether it's worth the extra cash. The third-generation Echo Dot with Clock can also display timers and the weather.


AI's desire

#artificialintelligence

At the Artificial Intelligence Conference in New York, Kathryn Hume pointed me to Ellen Ullman's excellent book, Life in Code: A Personal History of Technology. In Part 3 of her book "Life, Artificial," Ullman talks about artificial intelligence, robotics, and the desire to create artificial life. What these views of human sentience have in common, and why they fail to describe us, is a certain disdain for the body: the utter lack of a body in early AI and in later formulations like Kurzweil's (the lonely cortex, scanned and downloaded, a brain-in-a-jar); and the disregard for this body, this mammalian flesh, in robotics and ALife [Artificial Life]. By connecting the poverty of AI with its denial of the body, Ullman follows an important thread in feminist theory: our thinking needs to be connected to bodies, to physical human process, to blood and meat. The male-dominated Western tradition is all about abstraction, for which Plato is the poster child.


Plum uses AI to hire people 'that never would have been discovered through a traditional hiring process'

#artificialintelligence

Recruiters have a bias problem. A 2017 meta study from Northwestern, Harvard, and the Institute of Social Research in Norway found that hiring prejudice against black candidates hasn't changed in the last 25 years, and that Latinos have only seen a "moderate" drop. It is not just races and ethnicities that employers are discriminating against -- according to a recent paper authored by Harvard and Stanford researchers, women earn 78 cents on the dollar compared to men and are less likely to advance to the top of their fields. The solution, Caitlin MacGregor says, is artificial intelligence. She's the CEO and founder of Waterloo, Ontario-based Plum.io, a hiring platform that emphasizes "raw talent," as opposed to skills and knowledge.