AITopics | Zhao, Hongyu

Collaborating Authors

Zhao, Hongyu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

BenTo: Benchmark Task Reduction with In-Context Transferability

Zhao, Hongyu, Li, Ming, Sun, Lichao, Zhou, Tianyi

arXiv.org Artificial IntelligenceOct-21-2024

Evaluating large language models (LLMs) is costly: it requires the generation and examination of LLM outputs on a large-scale benchmark of various tasks. This paper investigates how to efficiently reduce the tasks used to benchmark LLMs without affecting the evaluation quality. Our study reveals that task transferability and relevance provide critical information to identify the most representative subset of tasks via optimizing a facility location function. We propose a practically efficient metric for estimating the transferability between two tasks via in-context learning (ICL). By analyzing the pairwise transferability, we can reduce tasks in a modern LLM benchmark (e.g., MMLU or FLAN) to 5% while inducing only a <4% difference to the evaluation on the original benchmark. Compared to prior works, our method is training-free, gradient-free, and highly efficient requiring ICL only.

benchmark, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2410.13804

Country: North America > Mexico (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Mosaic IT: Enhancing Instruction Tuning with Data Mosaics

Li, Ming, Chen, Pei, Wang, Chenguang, Zhao, Hongyu, Liang, Yijun, Hou, Yupeng, Liu, Fuxiao, Zhou, Tianyi

arXiv.org Artificial IntelligenceMay-22-2024

Finetuning large language models with a variety of instruction-response pairs has enhanced their capability to understand and follow instructions. Current instruction tuning primarily relies on teacher models or human intervention to generate and refine the instructions and responses, which are costly, non-sustainable, and may lack diversity. In this paper, we introduce Mosaic Instruction Tuning (Mosaic-IT), a human/model-free method that can efficiently create rich and diverse augmentations from existing instruction tuning data to enhance the finetuned LLM.Mosaic-IT randomly concatenates multiple instruction data into one and trains the model to produce the corresponding responses with predefined higher-level meta-instructions to strengthen its multi-step instruction-following and format-following skills. Our extensive evaluations demonstrate a superior performance and training efficiency of Mosaic-IT, which achieves consistent performance improvements over various benchmarks and an 80% reduction in training costs compared with original instruction tuning. Our codes and data are available at https://github.com/tianyi-lab/Mosaic-IT.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2405.13326

Country:

North America > United States (0.28)
Asia > Middle East > UAE (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning

Li, Ming, Zhang, Yong, He, Shwai, Li, Zhitao, Zhao, Hongyu, Wang, Jianzong, Cheng, Ning, Zhou, Tianyi

arXiv.org Artificial IntelligenceFeb-1-2024

Instruction tuning is critical to improve LLMs but usually suffers from low-quality and redundant data. Data filtering for instruction tuning has proved important in improving both the efficiency and performance of the tuning process. But it also leads to extra cost and computation due to the involvement of LLMs in this process. To reduce the filtering cost, we study Superfiltering: Can we use a smaller and weaker model to select data for finetuning a larger and stronger model? Despite the performance gap between weak and strong language models, we find their highly consistent capability to perceive instruction difficulty and data selection results. This enables us to use a much smaller and more efficient model to filter the instruction data used to train a larger language model. Not only does it largely speed up the data filtering, but the filtered-data-finetuned LLM achieves even better performance on standard benchmarks. Extensive experiments validate the efficacy and efficiency of our approach.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2402.0053

Country:

North America > Canada (0.14)
Europe > Italy (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Real-time object detection and robotic manipulation for agriculture using a YOLO-based learning approach

Zhao, Hongyu, Tang, Zezhi, Li, Zhenhong, Dong, Yi, Si, Yuancheng, Lu, Mingyang, Panoutsos, George

arXiv.org Artificial IntelligenceJan-28-2024

The optimisation of crop harvesting processes for commonly cultivated crops is of great importance in the aim of agricultural industrialisation. Nowadays, the utilisation of machine vision has enabled the automated identification of crops, leading to the enhancement of harvesting efficiency, but challenges still exist. This study presents a new framework that combines two separate architectures of convolutional neural networks (CNNs) in order to simultaneously accomplish the tasks of crop detection and harvesting (robotic manipulation) inside a simulated environment. Crop images in the simulated environment are subjected to random rotations, cropping, brightness, and contrast adjustments to create augmented images for dataset generation. The you only look once algorithmic framework is employed with traditional rectangular bounding boxes for crop localization. The proposed method subsequently utilises the acquired image data via a visual geometry group model in order to reveal the grasping positions for the robotic manipulation.

artificial intelligence, machine learning, neural network, (19 more...)

arXiv.org Artificial Intelligence

2401.15785

Country:

North America > United States > Iowa (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.50)

Industry: Food & Agriculture > Agriculture (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

WeaverBird: Empowering Financial Decision-Making with Large Language Model, Knowledge Base, and Search Engine

Xue, Siqiao, Zhou, Fan, Xu, Yi, Jin, Ming, Wen, Qingsong, Hao, Hongyan, Dai, Qingyang, Jiang, Caigao, Zhao, Hongyu, Xie, Shuo, He, Jianshan, Zhang, James, Mei, Hongyuan

arXiv.org Artificial IntelligenceDec-2-2023

We present WeaverBird, an intelligent dialogue system designed specifically for the finance domain. Our system harnesses a large language model of GPT architecture that has been tuned using extensive corpora of finance-related text. As a result, our system possesses the capability to understand complex financial queries, such as "How should I manage my investments during inflation?", and provide informed responses. Furthermore, our system incorporates a local knowledge base and a search engine to retrieve relevant information. The final responses are conditioned on the search results and include proper citations to the sources, thus enjoying an enhanced credibility. Through a range of finance-related questions, we have demonstrated the superior performance of our system compared to other models. To experience our system firsthand, users can interact with our live demo at https://weaverbird.ttic.edu, as well as watch our 2-min video illustration at https://www.youtube.com/watch?v=fyV2qQkX6Tc.

large language model, machine learning, weaverbird, (15 more...)

arXiv.org Artificial Intelligence

2308.05361

Country:

Asia (0.14)
Europe (0.14)
North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Banking & Finance > Trading (1.00)
Information Technology (0.94)
Banking & Finance > Financial Services (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Explicit Planning Helps Language Models in Logical Reasoning

Zhao, Hongyu, Wang, Kangrui, Yu, Mo, Mei, Hongyuan

arXiv.org Artificial IntelligenceNov-7-2023

Language models have been shown to perform remarkably well on a wide range of natural language processing tasks. In this paper, we propose LEAP, a novel system that uses language models to perform multi-step logical reasoning and incorporates explicit planning into the inference procedure. Explicit planning enables the system to make more informed reasoning decisions at each step by looking ahead into their future effects. Moreover, we propose a training strategy that safeguards the planning process from being led astray by spurious features. Our full system significantly outperforms other competing methods on multiple standard datasets. When using small T5 models as its core selection and deduction components, our system performs competitively compared to GPT-3 despite having only about 1B parameters (i.e., 175 times smaller than GPT-3). When using GPT-3.5, it significantly outperforms chain-of-thought prompting on the challenging PrOntoQA dataset. We have conducted extensive empirical studies to demonstrate that explicit planning plays a crucial role in the system's performance.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2303.15714

Country: North America > United States (0.28)

Genre:

Workflow (0.66)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.75)
(4 more...)

Add feedback

MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data

Liu, Tianyu, Wang, Yuge, Ying, Rex, Zhao, Hongyu

arXiv.org Artificial IntelligenceSep-29-2023

Discovering genes with similar functions across diverse biomedical contexts poses a significant challenge in gene representation learning due to data heterogeneity. In this study, we resolve this problem by introducing a novel model called Multimodal Similarity Learning Graph Neural Network, which combines Multimodal Machine Learning and Deep Graph Neural Networks to learn gene representations from single-cell sequencing and spatial transcriptomic data. Leveraging 82 training datasets from 10 tissues, three sequencing techniques, and three species, we create informative graph structures for model training and gene representations generation, while incorporating regularization with weighted similarity learning and contrastive learning to learn cross-data gene-gene relationships. This novel design ensures that we can offer gene representations containing functional similarity across different contexts in a joint space. Comprehensive benchmarking analysis shows our model's capacity to effectively capture gene function similarity across multiple modalities, outperforming state-of-the-art methods in gene representation learning by up to 97.5%. Moreover, we employ bioinformatics tools in conjunction with gene representations to uncover pathway enrichment, regulation causal networks, and functions of disease-associated or dosage-sensitive genes. Therefore, our model efficiently produces unified gene representations for the analysis of gene functions, tissue functions, diseases, and species evolution.

artificial intelligence, learning unified gene representation, machine learning, (2 more...)

arXiv.org Artificial Intelligence

2310.02275

Genre: Research Report > Promising Solution (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.44)

Add feedback

Robustness of Learning from Task Instructions

Gu, Jiasheng, Zhao, Hongyu, Xu, Hanzi, Nie, Liangyu, Mei, Hongyuan, Yin, Wenpeng

arXiv.org Artificial IntelligenceMay-23-2023

Traditional supervised learning mostly works on individual tasks and requires training on a large set of task-specific examples. This paradigm seriously hinders the development of task generalization since preparing a task-specific example set is costly. To build a system that can quickly and easily generalize to new tasks, task instructions have been adopted as an emerging trend of supervision recently. These instructions give the model the definition of the task and allow the model to output the appropriate answer based on the instructions and inputs. However, task instructions are often expressed in different forms, which can be interpreted from two threads: first, some instructions are short sentences and are pretrained language model (PLM) oriented, such as prompts, while other instructions are paragraphs and are human-oriented, such as those in Amazon MTurk; second, different end-users very likely explain the same task with instructions of different textual expressions. A robust system for task generalization should be able to handle any new tasks regardless of the variability of instructions. However, the system robustness in dealing with instruction-driven task generalization is still unexplored. This work investigates the system robustness when the instructions of new tasks are (i) manipulated, (ii) paraphrased, or (iii) from different levels of conciseness. To our knowledge, this is the first work that systematically studies how robust a PLM is when it is supervised by instructions with different factors of variability.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2212.03813

Country: North America > United States (0.68)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters

Zhao, Hongyu, Tan, Hao, Mei, Hongyuan

arXiv.org Artificial IntelligenceOct-18-2022

Adapter-tuning is a paradigm that transfers a pretrained language model to downstream tasks by adding and tuning a small number of new parameters. Previously proposed adapter architectures are all feed-forward neural networks. In this paper, we investigate the effectiveness of using tiny-attention -- i.e., attention with extremely small per-head dimensionality -- as adapters. Our tiny-attention adapter learns to modify the hidden states at each position directly conditioned on the hidden states at all the other positions, which is missed by the previously proposed adapters. Moreover, we view its multiple attention heads as a mixture of experts and propose to average their weights during deployment, which further reduces its inference computation cost. On the GLUE benchmark, our tiny-attention adapter outperforms the other parameter-efficient transfer learning methods as well as full fine-tuning while only updating 0.05% of the parameters. On the FewGLUE benchmark, its performance is comparable to that of GPT-3 and PET.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2211.01979

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

A Manifold Proximal Linear Method for Sparse Spectral Clustering with Application to Single-Cell RNA Sequencing Data Analysis

Wang, Zhongruo, Liu, Bingyuan, Chen, Shixiang, Ma, Shiqian, Xue, Lingzhou, Zhao, Hongyu

arXiv.org Machine LearningOct-30-2020

Spectral clustering is one of the fundamental unsupervised learning methods widely used in data analysis. Sparse spectral clustering (SSC) imposes sparsity to the spectral clustering and it improves the interpretability of the model. This paper considers a widely adopted model for SSC, which can be formulated as an optimization problem over the Stiefel manifold with nonsmooth and nonconvex objective. Such an optimization problem is very challenging to solve. Existing methods usually solve its convex relaxation or need to smooth its nonsmooth part using certain smoothing techniques. In this paper, we propose a manifold proximal linear method (ManPL) that solves the original SSC formulation. We also extend the algorithm to solve the multiple-kernel SSC problems, for which an alternating ManPL algorithm is proposed. Convergence and iteration complexity results of the proposed methods are established. We demonstrate the advantage of our proposed methods over existing methods via the single-cell RNA sequencing data analysis.

algorithm, health & medicine, optimization problem, (13 more...)

arXiv.org Machine Learning

2007.09524

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback