AITopics | Zhao, Bowen

Collaborating Authors

Zhao, Bowen

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Unknown Word Detection for English as a Second Language (ESL) Learners Using Gaze and Pre-trained Language Models

Ding, Jiexin, Zhao, Bowen, Wang, Yuntao, Liu, Xinyun, Hao, Rui, Chatterjee, Ishan, Shi, Yuanchun

arXiv.org Artificial IntelligenceFeb-14-2025

English as a Second Language (ESL) learners often encounter unknown words that hinder their text comprehension. Automatically detecting these words as users read can enable computing systems to provide just-in-time definitions, synonyms, or contextual explanations, thereby helping users learn vocabulary in a natural and seamless manner. This paper presents EyeLingo, a transformer-based machine learning method that predicts the probability of unknown words based on text content and eye gaze trajectory in real time with high accuracy. A 20-participant user study revealed that our method can achieve an accuracy of 97.6%, and an F1-score of 71.1%. We implemented a real-time reading assistance prototype to show the effectiveness of EyeLingo. The user study shows improvement in willingness to use and usefulness compared to baseline methods.

information retrieval, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.10378

Country:

North America > United States (1.00)
Europe (0.92)
Asia > Japan > Honshū (0.47)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry: Education > Focused Education > Reading & Literacy > English As A Second Language (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.36)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

CT2C-QA: Multimodal Question Answering over Chinese Text, Table and Chart

Zhao, Bowen, Cheng, Tianhao, Zhang, Yuejie, Cheng, Ying, Feng, Rui, Zhang, Xiaobo

arXiv.org Artificial IntelligenceOct-28-2024

Multimodal Question Answering (MMQA) is crucial as it enables comprehensive understanding and accurate responses by integrating insights from diverse data representations such as tables, charts, and text. Most existing researches in MMQA only focus on two modalities such as image-text QA, table-text QA and chart-text QA, and there remains a notable scarcity in studies that investigate the joint analysis of text, tables, and charts. In this paper, we present C$\text{T}^2$C-QA, a pioneering Chinese reasoning-based QA dataset that includes an extensive collection of text, tables, and charts, meticulously compiled from 200 selectively sourced webpages. Our dataset simulates real webpages and serves as a great test for the capability of the model to analyze and reason with multimodal data, because the answer to a question could appear in various modalities, or even potentially not exist at all. Additionally, we present AED (\textbf{A}llocating, \textbf{E}xpert and \textbf{D}esicion), a multi-agent system implemented through collaborative deployment, information interaction, and collective decision-making among different agents. Specifically, the Assignment Agent is in charge of selecting and activating expert agents, including those proficient in text, tables, and charts. The Decision Agent bears the responsibility of delivering the final verdict, drawing upon the analytical insights provided by these expert agents. We execute a comprehensive analysis, comparing AED with various state-of-the-art models in MMQA, including GPT-4. The experimental outcomes demonstrate that current methodologies, including GPT-4, are yet to meet the benchmarks set by our dataset.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3664647.3681053

2410.21414

Country: Oceania > Australia (0.15)

Genre: Research Report (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.44)

Add feedback

Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning?

Zhao, Bowen, Dirac, Leo Parker, Varshavskaya, Paulina

arXiv.org Artificial IntelligenceSep-25-2024

Large vision-language models (VLMs) have become state-of-the-art for many computer vision tasks, with in-context learning (ICL) as a popular adaptation strategy for new ones. But can VLMs learn novel concepts purely from visual demonstrations, or are they limited to adapting to the output format of ICL examples? We propose a new benchmark we call Spatial Visual Ambiguity Tasks (SVAT) that challenges state-of-the-art VLMs to learn new visuospatial tasks in-context. We find that VLMs fail to do this zero-shot, and sometimes continue to fail after finetuning. However, adding simpler data to the training by curriculum learning leads to improved ICL performance.

large language model, natural language, vlm, (14 more...)

arXiv.org Artificial Intelligence

2409.1708

Country:

North America > Canada > Quebec (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.53)

Add feedback

Set the Clock: Temporal Alignment of Pretrained Language Models

Zhao, Bowen, Brumbaugh, Zander, Wang, Yizhong, Hajishirzi, Hannaneh, Smith, Noah A.

arXiv.org Artificial IntelligenceJun-9-2024

Language models (LMs) are trained on web text originating from many points in time and, in general, without any explicit temporal grounding. This work investigates the temporal chaos of pretrained LMs and explores various methods to align their internal knowledge to a target time, which we call "temporal alignment." To do this, we first automatically construct a dataset containing 20K time-sensitive questions and their answers for each year from 2000 to 2023. Based on this dataset, we empirically show that pretrained LMs (e.g., LLaMa2), despite having a recent pretraining cutoff (e.g., 2022), mostly answer questions using earlier knowledge (e.g., in 2019). We then develop several methods, from prompting to finetuning, to align LMs to use their most recent knowledge when answering questions, and investigate various factors in this alignment. Our experiments demonstrate that aligning LLaMa2 to the year 2022 can enhance its performance by up to 62% according to that year's answers. This improvement occurs even without explicitly mentioning time information, indicating the possibility of aligning models' internal sense of time after pretraining. Finally, we find that alignment to a historical time is also possible, with up to 2.8$\times$ the performance of the unaligned LM in 2010 if finetuning models to that year. These findings hint at the sophistication of LMs' internal knowledge organization and the necessity of tuning them properly.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2402.16797

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.81)

Industry:

Media > Film (1.00)
Leisure & Entertainment > Sports > Soccer (1.00)
Leisure & Entertainment > Sports > Basketball (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

Zhao, Bowen, Hajishirzi, Hannaneh, Cao, Qingqing

arXiv.org Artificial IntelligenceJan-22-2024

Fine-tuning and inference with large Language Models (LM) are generally known to be expensive. Parameter-efficient fine-tuning over pretrained LMs reduces training memory by updating a small number of LM parameters but does not improve inference efficiency. Structured pruning improves LM inference efficiency by removing consistent parameter blocks, yet often increases training memory and time. To improve both training and inference efficiency, we introduce APT that adaptively prunes and tunes parameters for the LMs. At the early stage of fine-tuning, APT dynamically adds salient tuning parameters for fast and accurate convergence while discarding unimportant parameters for efficiency. Compared to baselines, our experiments show that APT maintains up to 98% task performance when pruning RoBERTa and T5 models with 40% parameters left while keeping 86.4% LLaMA models' performance with 70% parameters remained. Furthermore, APT speeds up LMs fine-tuning by up to 8x and reduces large LMs memory training footprint by up to 70%.

large language model, machine learning, pruning, (19 more...)

arXiv.org Artificial Intelligence

2401.122

Country:

Europe (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Controller-Guided Partial Label Consistency Regularization with Unlabeled Data

Wang, Qian-Wei, Zhao, Bowen, Zhu, Mingyan, Li, Tianxiang, Liu, Zimo, Xia, Shu-Tao

arXiv.org Artificial IntelligenceDec-14-2023

Partial label learning (PLL) learns from training examples each associated with multiple candidate labels, among which only one is valid. In recent years, benefiting from the strong capability of dealing with ambiguous supervision and the impetus of modern data augmentation methods, consistency regularization-based PLL methods have achieved a series of successes and become mainstream. However, as the partial annotation becomes insufficient, their performances drop significantly. In this paper, we leverage easily accessible unlabeled examples to facilitate the partial label consistency regularization. In addition to a partial supervised loss, our method performs a controller-guided consistency regularization at both the label-level and representation-level with the help of unlabeled data. To minimize the disadvantages of insufficient capabilities of the initial supervised model, we use the controller to estimate the confidence of each current prediction to guide the subsequent consistency regularization. Furthermore, we dynamically adjust the confidence thresholds so that the number of samples of each class participating in consistency regularization remains roughly equal to alleviate the problem of class-imbalance. Experiments show that our method achieves satisfactory performances in more practical situations, and its modules can be applied to existing PLL methods to enhance their capabilities.

artificial intelligence, deep learning, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2210.11194

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Large Language Models are Complex Table Parsers

Zhao, Bowen, Ji, Changkai, Zhang, Yuejie, He, Wen, Wang, Yingwen, Wang, Qing, Feng, Rui, Zhang, Xiaobo

arXiv.org Artificial IntelligenceDec-12-2023

With the Generative Pre-trained Transformer 3.5 (GPT-3.5) exhibiting remarkable reasoning and comprehension abilities in Natural Language Processing (NLP), most Question Answering (QA) research has primarily centered around general QA tasks based on GPT, neglecting the specific challenges posed by Complex Table QA. In this paper, we propose to incorporate GPT-3.5 to address such challenges, in which complex tables are reconstructed into tuples and specific prompt designs are employed for dialogues. Specifically, we encode each cell's hierarchical structure, position information, and content as a tuple. By enhancing the prompt template with an explanatory description of the meaning of each tuple and the logical reasoning process of the task, we effectively improve the hierarchical structure awareness capability of GPT-3.5 to better parse the complex tables. Extensive experiments and results on Complex Table QA datasets, i.e., the open-domain dataset HiTAB and the aviation domain dataset AIT-QA show that our approach significantly outperforms previous work on both datasets, leading to state-of-the-art (SOTA) performance.

header, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2312.11521

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine > Health Care Providers & Services (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

When Evolutionary Computation Meets Privacy

Zhao, Bowen, Chen, Wei-Neng, Li, Xiaoguo, Liu, Ximeng, Pei, Qingqi, Zhang, Jun

arXiv.org Artificial IntelligenceMar-22-2023

Recently, evolutionary computation (EC) has been promoted by machine learning, distributed computing, and big data technologies, resulting in new research directions of EC like distributed EC and surrogate-assisted EC. These advances have significantly improved the performance and the application scope of EC, but also trigger privacy leakages, such as the leakage of optimal results and surrogate model. Accordingly, evolutionary computation combined with privacy protection is becoming an emerging topic. However, privacy concerns in evolutionary computation lack a systematic exploration, especially for the object, motivation, position, and method of privacy protection. To this end, in this paper, we discuss three typical optimization paradigms (i.e., \textit{centralized optimization, distributed optimization, and data-driven optimization}) to characterize optimization modes of evolutionary computation and propose BOOM to sort out privacy concerns in evolutionary computation. Specifically, the centralized optimization paradigm allows clients to outsource optimization problems to the centralized server and obtain optimization solutions from the server. While the distributed optimization paradigm exploits the storage and computational power of distributed devices to solve optimization problems. Also, the data-driven optimization paradigm utilizes data collected in history to tackle optimization problems lacking explicit objective functions. Particularly, this paper adopts BOOM to characterize the object and motivation of privacy protection in three typical optimization paradigms and discusses potential privacy-preserving technologies balancing optimization performance and privacy guarantees in three typical optimization paradigms. Furthermore, this paper attempts to foresee some new research directions of privacy-preserving evolutionary computation.

data mining, evolutionary algorithm, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2304.01205

Country: Asia > China (0.69)

Genre: Research Report (0.65)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Add feedback

GazeReader: Detecting Unknown Word Using Webcam for English as a Second Language (ESL) Learners

Ding, Jiexin, Zhao, Bowen, Huang, Yuqi, Wang, Yuntao, Shi, Yuanchun

arXiv.org Artificial IntelligenceMar-18-2023

Automatic unknown word detection techniques can enable new applications for assisting English as a Second Language (ESL) learners, thus improving their reading experiences. However, most modern unknown word detection methods require dedicated eye-tracking devices with high precision that are not easily accessible to end-users. In this work, we propose GazeReader, an unknown word detection method only using a webcam. GazeReader tracks the learner's gaze and then applies a transformer-based machine learning model that encodes the text information to locate the unknown word. We applied knowledge enhancement including term frequency, part of speech, and named entity recognition to improve the performance. The user study indicates that the accuracy and F1-score of our method were 98.09% and 75.73%, respectively. Lastly, we explored the design scope for ESL reading and discussed the findings.

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2303.10443

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.72)

Genre: Research Report > New Finding (0.68)

Industry: Education > Focused Education > Reading & Literacy > English As A Second Language (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback

A Survey of Secure Computation Using Trusted Execution Environments

Li, Xiaoguo, Zhao, Bowen, Yang, Guomin, Xiang, Tao, Weng, Jian, Deng, Robert H.

arXiv.org Artificial IntelligenceFeb-23-2023

As an essential technology underpinning trusted computing, the trusted execution environment (TEE) allows one to launch computation tasks on both on- and off-premises data while assuring confidentiality and integrity. This article provides a systematic review and comparison of TEE-based secure computation protocols. We first propose a taxonomy that classifies secure computation protocols into three major categories, namely secure outsourced computation, secure distributed computation and secure multi-party computation. To enable a fair comparison of these protocols, we also present comprehensive assessment criteria with respect to four aspects: setting, methodology, security and performance. Based on these criteria, we review, discuss and compare the state-of-the-art TEE-based secure computation protocols for both general-purpose computation functions and special-purpose ones, such as privacy-preserving machine learning and encrypted database queries. To the best of our knowledge, this article is the first survey to review TEE-based secure computation protocols and the comprehensive comparison can serve as a guideline for selecting suitable protocols for deployment in practice. Finally, we also discuss several future research directions and challenges.

data mining, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2302.1215

Country:

Asia > China (0.28)
North America > United States (0.27)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Databases (1.00)
Information Technology > Cloud Computing (1.00)
(7 more...)

Add feedback