tuna


Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags

Qi, Daiqing, Zhao, Handong, Wei, Zijun, Li, Sheng

arXiv.org Artificial Intelligence

Despite recent advances in the general visual instruction-following ability of Multimodal Large Language Models (MLLMs), they still struggle with critical problems when required to provide a precise and detailed response to a visual instruction: (1) failure to identify novel objects or entities, (2) mention of non-existent objects, and (3) neglect of objects' attribute details. Intuitive solutions include improving the size and quality of data or using larger foundation models. They show effectiveness in mitigating these issues, but at the expensive cost of collecting a vast amount of new data and introducing a significantly larger model. Standing at the intersection of these approaches, we examine the three object-oriented problems from the perspective of the image-to-text mapping process performed by the multimodal connector. In this paper, we first identify the limitations of multimodal connectors stemming from insufficient training data. Driven by this, we propose to enhance the mapping with retrieval-augmented tag tokens, which contain rich object-aware information such as object names and attributes. With our Tag-grounded visual instruction tuning with retrieval Augmentation (TUNA), we outperform baselines that share the same language model and training data on 12 benchmarks. Furthermore, we show the zero-shot capability of TUNA when provided with specific datastores.
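The retrieval step behind those tag tokens can be sketched, in spirit, as a nearest-neighbour lookup over an image-embedding datastore. Everything below — the toy embeddings, the datastore entries, and the `retrieve_tags` helper — is an illustrative assumption, not TUNA's actual implementation:

```python
import math

# Hypothetical sketch (not TUNA's pipeline): embed the image, find the
# most similar datastore entries by cosine similarity, and prepend their
# object tags to the visual instruction.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_tags(image_vec, datastore, k=2):
    """Return the tags of the k datastore entries most similar to the image."""
    ranked = sorted(datastore, key=lambda entry: cosine(image_vec, entry[0]),
                    reverse=True)
    tags = []
    for _, entry_tags in ranked[:k]:
        tags.extend(entry_tags)
    return tags

# Toy datastore of (embedding, object tags) pairs.
datastore = [
    ([0.9, 0.1, 0.0], ["tuna", "silver fins"]),
    ([0.1, 0.9, 0.0], ["coral", "orange"]),
    ([0.8, 0.2, 0.1], ["shark", "grey"]),
]

image_vec = [1.0, 0.0, 0.0]
tags = retrieve_tags(image_vec, datastore, k=2)
prompt = "Tags: " + ", ".join(tags) + ". Describe the image in detail."
print(prompt)
```

Grounding the instruction in retrieved object names and attributes is what supplies the connector with explicit object-aware cues it could not learn from limited training data alone.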


Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

Toubal, Imad Eddine, Avinash, Aditya, Alldrin, Neil Gordon, Dlabal, Jan, Zhou, Wenlei, Luo, Enming, Stretcu, Otilia, Xiong, Hao, Lu, Chun-Ta, Zhou, Howard, Krishna, Ranjay, Fuxman, Ariel, Duerig, Tom

arXiv.org Artificial Intelligence

From content moderation to wildlife conservation, the number of applications that require models to recognize nuanced or subjective visual concepts is growing. Traditionally, developing classifiers for such concepts requires substantial manual effort measured in hours, days, or even months to identify and annotate the data needed for training. Even with recently proposed Agile Modeling techniques, which enable rapid bootstrapping of image classifiers, users must still spend 30 minutes or more of monotonous, repetitive data labeling just to train a single classifier. Drawing on Fiske's Cognitive Miser theory, we propose a new framework that alleviates manual effort by replacing human labeling with natural language interactions, reducing the total effort required to define a concept by an order of magnitude: from labeling 2,000 images to only 100 plus some natural language interactions. Our framework leverages recent advances in foundation models, both large language models and vision-language models, to carve out the concept space through conversation and by automatically labeling training data points. Most importantly, our framework eliminates the need for crowd-sourced annotations. Moreover, our framework ultimately produces lightweight classification models that are deployable in cost-sensitive scenarios. Across 15 subjective concepts and 2 public image classification datasets, our trained models outperform traditional Agile Modeling as well as state-of-the-art zero-shot classification models like ALIGN, CLIP, CuPL, and large visual question-answering models like PaLI-X.
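The labeling-by-LLM idea can be sketched minimally: a mock annotator stands in for the real LLM/VLM calls (`llm_label` is a placeholder, not the paper's API), and its labels train a lightweight nearest-centroid classifier standing in for the paper's deployable models:

```python
# Illustrative sketch only: llm_label is a stand-in for a real LLM/VLM
# annotation call, and the nearest-centroid classifier is a stand-in for
# the lightweight models the framework ultimately produces.

def llm_label(features):
    # Placeholder for an LLM judging a subjective concept; here the
    # "concept" is simply whether the first feature is large.
    return 1 if features[0] > 0.5 else 0

def train_centroids(unlabeled):
    """Auto-label the data with llm_label, then compute per-class centroids."""
    sums, counts = {}, {}
    for x in unlabeled:
        y = llm_label(x)
        counts[y] = counts.get(y, 0) + 1
        prev = sums.get(y, [0.0] * len(x))
        sums[y] = [s + v for s, v in zip(prev, x)]
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))

unlabeled = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]]
centroids = train_centroids(unlabeled)
print(predict(centroids, [0.85, 0.2]))  # classified without any human labels
```

The design point is that no human ever labels an image: the foundation model's judgments, elicited through conversation, are the only supervision the small deployed model sees.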


Tuna: Instruction Tuning using Feedback from Large Language Models

Li, Haoran, Liu, Yiran, Zhang, Xingxing, Lu, Wei, Wei, Furu

arXiv.org Artificial Intelligence

Instruction tuning of open-source large language models (LLMs) like LLaMA, using direct outputs from more powerful LLMs such as InstructGPT and GPT-4, has proven to be a cost-effective way to align model behaviors with human preferences. However, the instruction-tuned model has only seen one response per instruction, lacking the knowledge of potentially better responses. In this paper, we propose finetuning an instruction-tuned LLM using our novel probabilistic ranking and contextual ranking approaches to increase the likelihood of generating better responses. Probabilistic ranking enables the instruction-tuned model to inherit the relative rankings of high-quality and low-quality responses from the teacher LLM. Learning with contextual ranking, on the other hand, allows the model to refine its own response distribution using the contextual understanding ability of stronger LLMs. Furthermore, we apply probabilistic ranking and contextual ranking sequentially to the instruction-tuned LLM. The resulting model, which we call Tuna, consistently improves performance on Super Natural Instructions (119 test tasks), LMentry (25 test tasks), and Vicuna QA, and can even obtain better results than several strong reinforcement learning baselines. Our code and data are available at https://github.com/microsoft/LMOps.
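A pairwise margin loss over student log-likelihoods gives a flavour of what ranking-based finetuning optimizes. The margin value and the toy log-probabilities below are illustrative assumptions, not the paper's exact losses:

```python
# Hypothetical sketch in the spirit of probabilistic ranking: given
# responses ordered best-first by the teacher LLM, penalize the student
# whenever a lower-ranked response has a higher (length-normalized)
# log-likelihood than a higher-ranked one.

def ranking_loss(logprobs, margin=0.1):
    """logprobs: student avg token log-probs, ordered best-first by teacher."""
    loss = 0.0
    for i in range(len(logprobs)):
        for j in range(i + 1, len(logprobs)):
            # We want logprobs[i] >= logprobs[j] + margin.
            loss += max(0.0, margin - (logprobs[i] - logprobs[j]))
    return loss

# Student already agrees with the teacher's ordering: no penalty.
print(ranking_loss([-1.0, -1.5, -2.0]))  # 0.0
# Student inverts the teacher's ordering: large penalty.
print(ranking_loss([-2.0, -1.5, -1.0]))
```

Minimizing such a loss pushes the student's likelihoods toward the teacher's relative ordering of responses, which is exactly the extra signal a single-response instruction-tuned model lacks.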


Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee

Chen, George H.

arXiv.org Artificial Intelligence

Kernel survival analysis models estimate individual survival distributions with the help of a kernel function, which measures the similarity between any two data points. Such a kernel function can be learned using deep kernel survival models. In this paper, we present a new deep kernel survival model called a survival kernet, which scales to large datasets in a manner that is amenable to model interpretation and also theoretical analysis. Specifically, the training data are partitioned into clusters based on a recently developed training set compression scheme for classification and regression called kernel netting that we extend to the survival analysis setting. At test time, each data point is represented as a weighted combination of these clusters, and each such cluster can be visualized. For a special case of survival kernets, we establish a finite-sample error bound on predicted survival distributions that is, up to a log factor, optimal. Whereas scalability at test time is achieved using the aforementioned kernel netting compression strategy, scalability during training is achieved by a warm-start procedure based on tree ensembles such as XGBoost and a heuristic approach to accelerating neural architecture search. On four standard survival analysis datasets of varying sizes (up to roughly 3 million data points), we show that survival kernets are highly competitive compared to various baselines tested in terms of time-dependent concordance index. Our code is available at: https://github.com/georgehc/survival-kernets
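The test-time prediction described above — each point represented as a weighted combination of clusters — can be sketched with a kernel-weighted mixture of per-cluster survival curves. The cluster centers, curves, and Gaussian bandwidth below are toy assumptions, not the paper's exact estimator:

```python
import math

# Illustrative sketch (not the paper's estimator): a test point's
# survival curve is a kernel-weighted mixture of per-cluster curves,
# so each contributing cluster's weight and curve can be inspected.

def predict_survival(x, clusters, bandwidth=1.0):
    """clusters: list of (center, survival_curve) pairs."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, center))
                 / (2 * bandwidth ** 2))
        for center, _ in clusters
    ]
    total = sum(weights)
    horizon = len(clusters[0][1])
    return [
        sum(w * curve[t] for w, (_, curve) in zip(weights, clusters)) / total
        for t in range(horizon)
    ]

clusters = [
    ([0.0, 0.0], [1.0, 0.9, 0.7]),  # low-risk cluster
    ([3.0, 3.0], [1.0, 0.5, 0.2]),  # high-risk cluster
]

# A test point near the low-risk cluster inherits mostly its curve.
curve = predict_survival([0.2, 0.1], clusters)
print(curve)
```

Because the prediction decomposes into a handful of cluster weights and visualizable cluster curves, the model stays interpretable even as the training set is compressed.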


A Japanese conveyor-belt restaurant will use AI cameras to combat 'sushi terrorism'

Engadget

A viral video trend in Japan has sushi conveyor-belt restaurants racing to prevent food tampering. One chain, Kura Sushi, said it will use artificial intelligence to look for "suspicious opening and closing of sushi plate covers," Nikkei Asia reported this week. Kura Sushi plans to start upgrading its existing cameras, which are used to track the dishes customers take from conveyor belts to determine their bill, by early March. If the system detects suspicious behavior, it will alert employees. "We want to deploy our AI-operated cameras to monitor if customers put the sushi they picked up with their hands back on the plates," a spokesman told CNN. "We are confident we will be able to upgrade the systems we already have in place to deal with these kind of behaviors."


How do tuna schools associate to dFADs? A study using echo-sounder buoys to identify global patterns

Navarro-García, Manuel, Precioso, Daniel, Gavira-O'Neill, Kathryn, Torres-Barrán, Alberto, Gordo, David, Gallego, Víctor, Gómez-Ullate, David

arXiv.org Artificial Intelligence

As fishermen have noticed this behaviour, they have used both natural and man-made floating objects, or drifting Fish Aggregating Devices (dFADs), as a tool for finding and catching tropical tunas. The use of dFADs in tuna purse-seine fisheries has gradually increased since the 1980s to the present, with vessels using dFADs now contributing 36% of the world's total tropical tuna catch (Davies et al., 2014; Wain et al., 2021; ISSF, 2021). These widespread changes have highlighted the need to better understand the potential ecological effects of dFADs on tuna ecology and the marine environment, in order to ensure adequate management of fish stocks and dFAD usage. Indeed, the dynamics of both how and why tuna associate to dFADs are still poorly understood.

Regarding the reasons behind tuna aggregation to dFADs, a number of hypotheses have been suggested (Fréon and Dagorn, 2000; Dempster and Taquet, 2004; Castro et al., 2002). Of these, two have gained traction: the "meeting-point" hypothesis, which holds that dFADs facilitate encounters between individuals or schools, thus constituting larger schools that could benefit survival rates (Castro et al., 2002); and the "indicator-log" hypothesis, by which tunas may be safeguarding the survival of their eggs, larvae and juveniles by using drifting objects as indicators of areas where plankton and food are readily available (Hall et al., 1992). This scenario has led some authors to postulate that man-made dFADs could have detrimental effects on tuna populations by creating a so-called "ecological trap", which would lead tuna to remain associated to dFADs even as these drift into areas that could negatively affect the tuna's behaviour and biology (Marsac et al., 2000; Hallier and Gaertner, 2008). To the best of our knowledge, there is as yet no sufficient evidence to either confirm or reject this hypothesis (see Dagorn et al. (2012) and references therein).

Given the concerns around the widespread use of dFADs in tuna fisheries today, it is not surprising that a considerable amount of research has been devoted to characterizing the dynamics at play when tunas aggregate to dFADs.


To swim like a tuna, robotic fish need to change how stiff their tails are in real time

Robohub

Underwater vehicles haven't changed much since the submarines of World War II. They're rigid, fairly boxy and use propellers to move. And whether they are large manned vessels or small robots, most underwater vehicles have one cruising speed where they are most energy efficient. Fish take a very different approach to moving through water: Their bodies and fins are very flexible, and this flexibility allows them to interact with water more efficiently than rigid machines. Researchers have been designing and building flexible fishlike robots for years, but they still trail far behind real fish in terms of efficiency.


Understanding Uncertainty in Bayesian Deep Learning

Lorsung, Cooper

arXiv.org Machine Learning

Neural Linear Models (NLMs) are deep Bayesian models that produce predictive uncertainty by learning features from the data and then performing Bayesian linear regression over these features. Despite their popularity, few works have focused on formally evaluating the predictive uncertainties of these models. Furthermore, existing works point out the difficulties of encoding domain knowledge in models like NLMs, making them unsuitable for applications where interpretability is required. In this work, we show that traditional training procedures for NLMs can drastically underestimate uncertainty in data-scarce regions. We identify the underlying reasons for this behavior and propose a novel training method that can both capture useful predictive uncertainties and allow for the incorporation of domain knowledge.
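The Bayesian linear regression head of an NLM can be written in closed form. The sketch below assumes a single learned feature with the identity map phi(x) = x and toy prior/noise precisions alpha and beta, purely to show how the predictive variance depends on where the training data lie:

```python
# Minimal sketch of the Bayesian linear regression step of an NLM,
# assuming one learned feature phi(x) = x and toy precisions alpha
# (prior) and beta (observation noise). Not the paper's training method.

def posterior(xs, ys, alpha=1.0, beta=25.0):
    """Closed-form Gaussian posterior over the single regression weight."""
    s_inv = alpha + beta * sum(x * x for x in xs)      # posterior precision
    s = 1.0 / s_inv                                    # posterior variance
    m = beta * s * sum(x * y for x, y in zip(xs, ys))  # posterior mean
    return m, s

def predictive_var(x, s, beta=25.0):
    """Predictive variance: noise term plus weight-uncertainty term."""
    return 1.0 / beta + x * x * s

xs = [0.1, 0.2, 0.3]  # training inputs concentrated near the origin
ys = [0.2, 0.4, 0.6]
m, s = posterior(xs, ys)

# Predictive uncertainty should be larger far from the training data.
print(predictive_var(0.2, s) < predictive_var(3.0, s))  # True
```

The paper's point is that the learned features, not this closed-form step, are where standard NLM training goes wrong: if the feature map collapses in data-scarce regions, the weight-uncertainty term above can no longer inflate the variance there.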


Japanese seafood industry taps AI for fish selection

The Japan Times

Japan's fishing industry is starting to use artificial intelligence to select high-quality fish at markets and find good fishing grounds, areas where it has traditionally relied largely on experience and intuition. AI tools are drawing attention because they can easily replicate proficient skills, including those needed to evaluate tuna quality and determine good spots to catch saury. When judging the quality of fish, buyers look at how fresh and firm the meat is and how much fat it carries. "You need over 10 years of experience" to acquire an excellent eye, a fish market worker said. Advertising giant Dentsu Inc. and others jointly developed and put into practical use a smartphone app that enables users to easily pick out delicious tuna.


This New App Uses AI To Grade Tuna Freshness

#artificialintelligence

Sushi is only as good as the fish wrapped inside its barrel of rice and seaweed. If the tuna, yellowtail, or salmon isn't fresh, it not only looks gross, but renders the whole roll underwhelming in flavor and texture. To keep things from getting fishy, a Japanese company has developed a new mobile app that uses artificial intelligence to grade the freshness of cuts of tuna on sight. Aptly named Tuna Scope, the system uses thousands of cross-sectional images of tuna tails as training data to learn what good quality tuna looks like. According to the Tuna Scope website, trained fishmongers use the tuna tail as a "road map" detailing the fish's flavor, texture, freshness, and overall excellence.