Collaborating Authors: Wu, Jason


ADMN: A Layer-Wise Adaptive Multimodal Network for Dynamic Input Noise and Compute Resources

arXiv.org Artificial Intelligence

Multimodal deep learning systems are deployed in dynamic scenarios due to the robustness afforded by multiple sensing modalities. Nevertheless, they struggle with varying compute resource availability (due to multi-tenancy, device heterogeneity, etc.) and fluctuating quality of inputs (from sensor feed corruption, environmental noise, etc.). Current multimodal systems employ static resource provisioning and cannot easily adapt when compute resources change over time. Additionally, their reliance on processing sensor data with fixed feature extractors is ill-equipped to handle variations in modality quality. Consequently, uninformative modalities, such as those with high noise, needlessly consume resources better allocated towards other modalities. We propose ADMN, a layer-wise Adaptive Depth Multimodal Network capable of tackling both challenges: it adjusts the total number of active layers across all modalities to meet compute resource constraints, and continually reallocates layers across input modalities according to their modality quality. Our evaluations show that ADMN can match the accuracy of state-of-the-art networks while reducing their floating-point operations by up to 75%.
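
To make the layer-allocation idea concrete, the sketch below shows one way a layer-wise adaptive-depth network could be structured in PyTorch: each modality has an encoder whose active depth is set at run time, and a simple heuristic splits a total layer budget across modalities according to an estimated input quality. The module names, quality scores, and allocation rule are illustrative assumptions, not ADMN's actual implementation.

    import torch.nn as nn

    class AdaptiveDepthEncoder(nn.Module):
        """Per-modality encoder whose active depth is chosen at run time."""
        def __init__(self, dim, max_layers):
            super().__init__()
            self.layers = nn.ModuleList(
                [nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
                 for _ in range(max_layers)]
            )

        def forward(self, x, active_layers):
            # Only the first `active_layers` blocks run; skipped blocks cost no FLOPs.
            for layer in self.layers[:active_layers]:
                x = layer(x)
            return x

    def allocate_layers(quality_scores, layer_budget):
        """Split a total layer budget across modalities in proportion to an
        estimated input quality (higher quality -> more layers). A real
        allocator would also enforce that the rounded counts sum exactly
        to the budget."""
        total = sum(quality_scores.values())
        return {m: max(1, round(layer_budget * q / total))
                for m, q in quality_scores.items()}

    # Example: with a 12-layer budget, a noisy audio feed receives fewer
    # layers than a clean video feed.
    print(allocate_layers({"video": 0.9, "audio": 0.3}, layer_budget=12))
    # -> {'video': 9, 'audio': 3}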


MMBind: Unleashing the Potential of Distributed and Heterogeneous Data for Multimodal Learning in IoT

arXiv.org Artificial Intelligence

Multimodal sensing systems are increasingly prevalent in various real-world applications. Most existing multimodal learning approaches heavily rely on training with a large amount of complete multimodal data. However, such a setting is impractical in real-world IoT sensing applications where data is typically collected by distributed nodes with heterogeneous data modalities, and is also rarely labeled. In this paper, we propose MMBind, a new framework for multimodal learning on distributed and heterogeneous IoT data. The key idea of MMBind is to construct a pseudo-paired multimodal dataset for model training by binding data from disparate sources and incomplete modalities through a sufficiently descriptive shared modality. We demonstrate that data of different modalities observing similar events, even captured at different times and locations, can be effectively used for multimodal training. Moreover, we propose an adaptive multimodal learning architecture capable of training models with heterogeneous modality combinations, coupled with a weighted contrastive learning approach to handle domain shifts among disparate data. Evaluations on ten real-world multimodal datasets highlight that MMBind outperforms state-of-the-art baselines under varying data incompleteness and domain shift, and holds promise for advancing multimodal foundation model training in IoT applications.
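
As a rough illustration of the binding idea, the sketch below pseudo-pairs two incomplete datasets through their shared modality (cosine nearest neighbour in a shared embedding space) and then trains with a contrastive loss weighted by the pairing confidence. The pairing rule and loss weighting are simplifying assumptions for illustration, not MMBind's exact formulation.

    import torch
    import torch.nn.functional as F

    def pseudo_pair(shared_a, shared_b):
        """Match each sample in dataset A to its nearest neighbour in dataset B
        using shared-modality embeddings; the similarity doubles as a pairing
        confidence weight."""
        a = F.normalize(shared_a, dim=-1)
        b = F.normalize(shared_b, dim=-1)
        sim = a @ b.t()                      # (Na, Nb) cosine similarities
        weight, idx = sim.max(dim=1)         # best match in B for every A sample
        return idx, weight

    def weighted_contrastive_loss(za, zb, weight, temperature=0.07):
        """InfoNCE over pseudo-paired embeddings, down-weighting pairs whose
        shared-modality match was weak (a crude guard against domain shift)."""
        za, zb = F.normalize(za, dim=-1), F.normalize(zb, dim=-1)
        logits = za @ zb.t() / temperature
        targets = torch.arange(za.size(0), device=za.device)
        per_pair = F.cross_entropy(logits, targets, reduction="none")
        return (weight * per_pair).mean()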


DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation

arXiv.org Artificial Intelligence

Enabling machines to understand structured visuals like slides and user interfaces is essential for making them accessible to people with disabilities. However, achieving such understanding computationally has required manual data collection and annotation, which is time-consuming and labor-intensive. To overcome this challenge, we present a method to generate synthetic, structured visuals with target labels using code generation. Our method allows people to create datasets with built-in labels and train models with a small number of human-annotated examples. We demonstrate performance improvements in three tasks for understanding slides and UIs: recognizing visual elements, describing visual content, and classifying visual content types.
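
The core trick, generating the visual and its labels from the same piece of code, can be sketched as follows; the HTML template and label scheme here are illustrative stand-ins, not the paper's generation prompts or renderer.

    import json
    import random

    ELEMENTS = ["title", "bullet_list", "image", "caption"]

    def generate_slide_html(seed=0):
        """Emit a toy slide as HTML together with its ground-truth labels;
        because the same code produces both, the annotations come for free."""
        rng = random.Random(seed)
        chosen = rng.sample(ELEMENTS, k=3)
        body, labels = [], []
        for i, kind in enumerate(chosen):
            body.append(f'<div class="{kind}" data-label="{kind}">{kind} {i}</div>')
            labels.append({"id": i, "type": kind})
        html = "<html><body>" + "\n".join(body) + "</body></html>"
        return html, labels

    html, labels = generate_slide_html()
    print(json.dumps(labels))   # labels are built into the generated visual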


MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

arXiv.org Artificial Intelligence

The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understanding of quantization's impact on performance across LLM tasks, LMM tasks, and, critically, trust and safety, and there is a lack of adequate tools for systematically testing these models on mobile devices. To address these gaps, we introduce MobileAIBench, a comprehensive benchmarking framework for evaluating mobile-optimized LLMs and LMMs. MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices. Our two-part open-source framework includes a library for running evaluations on desktops and an iOS app for on-device latency and hardware utilization measurements. Our thorough analysis aims to accelerate mobile AI research and deployment by providing insights into the performance and feasibility of deploying LLMs and LMMs on mobile platforms.
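
For a sense of the measurements involved, the sketch below times single-prompt latency and Python-level peak memory for an arbitrary callable model; it is a generic harness for illustration only, not the MobileAIBench API, and native memory used by the model runtime is not captured.

    import time
    import tracemalloc

    def benchmark(model, prompts):
        """Run each prompt once and record latency plus Python-heap peak
        memory (a rough proxy; real device measurements would also sample
        CPU, RAM, and battery counters)."""
        records = []
        for prompt in prompts:
            tracemalloc.start()
            start = time.perf_counter()
            _ = model(prompt)                     # one inference
            latency = time.perf_counter() - start
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            records.append({"latency_s": latency, "peak_python_bytes": peak})
        return records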


UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback

arXiv.org Artificial Intelligence

Large language models (LLMs) struggle to consistently generate UI code that compiles and produces visually relevant designs. Existing approaches to improve generation rely on expensive human feedback or distilling a proprietary model. In this paper, we explore the use of automated feedback (compilers and multi-modal models) to guide LLMs to generate high-quality UI code. Our method starts with an existing LLM and iteratively produces improved models by self-generating a large synthetic dataset with the original model, then applying automated tools to aggressively filter, score, and de-duplicate the data into a refined, higher-quality dataset. The original LLM is improved by finetuning on this refined dataset. We applied our approach to several open-source LLMs and compared the resulting performance to baseline models with both automated metrics and human preferences. Our evaluation shows the resulting models outperform all other downloadable baselines and approach the performance of larger proprietary models.
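
The iterative loop can be summarized as below. The helper functions (generate, compiles, visual_score, finetune) are hypothetical stand-ins for the compiler and multimodal-model feedback tools the abstract mentions.

    def self_improve(model, prompts, rounds=3, score_threshold=0.5):
        """Repeatedly self-generate UI code, keep only samples that compile,
        look visually relevant, and are not duplicates, then finetune on them."""
        for _ in range(rounds):
            dataset, seen = [], set()
            for prompt in prompts:
                code = generate(model, prompt)         # self-generated sample
                if not compiles(code):                 # compiler feedback
                    continue
                if visual_score(code, prompt) < score_threshold:
                    continue                           # multimodal-model feedback
                if hash(code) in seen:                 # crude de-duplication
                    continue
                seen.add(hash(code))
                dataset.append((prompt, code))
            model = finetune(model, dataset)           # train on the refined set
        return model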


FlexLoc: Conditional Neural Networks for Zero-Shot Sensor Perspective Invariance in Object Localization with Distributed Multimodal Sensors

arXiv.org Artificial Intelligence

Localization is a critical technology for various applications ranging from navigation and surveillance to assisted living. Localization systems typically fuse information from sensors viewing the scene from different perspectives to estimate the target location while also employing multiple modalities for enhanced robustness and accuracy. Recently, such systems have employed end-to-end deep neural models trained on large datasets due to their superior performance and ability to handle data from diverse sensor modalities. However, such neural models are often trained on data collected from a particular set of sensor poses (i.e., locations and orientations). During real-world deployments, slight deviations from these sensor poses can result in extreme inaccuracies. To address this challenge, we introduce FlexLoc, which employs conditional neural networks to inject node perspective information to adapt the localization pipeline. Specifically, a small subset of model weights are derived from node poses at run time, enabling accurate generalization to unseen perspectives with minimal additional overhead. Our evaluations on a multimodal, multiview indoor tracking dataset showcase that FlexLoc improves the localization accuracy by almost 50% in the zero-shot case (no calibration data available) compared to the baselines. The source code of FlexLoc is available at https://github.com/nesl/FlexLoc.
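
One common way to derive a small subset of weights from the node pose is a FiLM-style conditional layer, sketched below in PyTorch; FlexLoc's actual conditional layers and pose encoding may differ, so treat this as an assumption-laden illustration.

    import torch
    import torch.nn as nn

    class PoseConditionedLayer(nn.Module):
        def __init__(self, feat_dim, pose_dim=6):
            super().__init__()
            self.backbone = nn.Linear(feat_dim, feat_dim)
            # A tiny hypernetwork maps the node pose (location + orientation)
            # to per-channel scale and shift applied to the features.
            self.pose_net = nn.Sequential(
                nn.Linear(pose_dim, 64), nn.ReLU(), nn.Linear(64, 2 * feat_dim)
            )

        def forward(self, features, pose):
            scale, shift = self.pose_net(pose).chunk(2, dim=-1)
            return (1 + scale) * self.backbone(features) + shift

    layer = PoseConditionedLayer(feat_dim=128)
    feats = torch.randn(4, 128)               # features from one sensor node
    pose = torch.randn(4, 6)                  # x, y, z, roll, pitch, yaw
    out = layer(feats, pose)                  # features adapted to the pose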


Efficacy of ByT5 in Multilingual Translation of Biblical Texts for Underrepresented Languages

arXiv.org Artificial Intelligence

This study presents the development and evaluation of a ByT5-based multilingual translation model tailored for translating the Bible into underrepresented languages. Utilizing the comprehensive Johns Hopkins University Bible Corpus, we trained the model to capture the intricate nuances of character-based and morphologically rich languages. Our results, measured by the BLEU score and supplemented with sample translations, suggest the model can improve accessibility to sacred texts. It effectively handles the distinctive biblical lexicon and structure, thus bridging the linguistic divide. The study also discusses the model's limitations and suggests pathways for future enhancements, focusing on expanding access to sacred literature across linguistic boundaries.
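
For readers unfamiliar with ByT5, the snippet below shows how a ByT5 checkpoint is loaded and queried with the Hugging Face transformers library; the public google/byt5-small checkpoint and the prompt format stand in for the study's fine-tuned model, so the output is only illustrative.

    from transformers import AutoTokenizer, T5ForConditionalGeneration

    # Byte-level tokenization is what lets ByT5 handle morphologically rich,
    # low-resource languages without a fixed subword vocabulary.
    tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
    model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")

    text = "translate English to Target: In the beginning ..."
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))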


UIClip: A Data-driven Model for Assessing User Interface Design

arXiv.org Artificial Intelligence

User interface (UI) design is a difficult yet important task for ensuring the usability, accessibility, and aesthetic qualities of applications. In our paper, we develop a machine-learned model, UIClip, for assessing the design quality and visual relevance of a UI given its screenshot and natural language description. To train UIClip, we used a combination of automated crawling, synthetic augmentation, and human ratings to construct a large-scale dataset of UIs, collated by description and ranked by design quality. Through training on the dataset, UIClip implicitly learns properties of good and bad designs by i) assigning a numerical score that represents a UI design's relevance and quality and ii) providing design suggestions. In an evaluation that compared the outputs of UIClip and other baselines to UIs rated by 12 human designers, we found that UIClip achieved the highest agreement with ground-truth rankings. Finally, we present three example applications that demonstrate how UIClip can facilitate downstream applications that rely on instantaneous assessment of UI design quality: i) UI code generation, ii) UI design tips generation, and iii) quality-aware UI example search.
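
The scoring interface is essentially CLIP-style image-text matching; the sketch below uses a generic public CLIP checkpoint as a stand-in, so the numerical scale and calibration differ from the released UIClip model, and the screenshot path is a placeholder.

    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    screenshot = Image.open("ui_screenshot.png")        # placeholder path
    description = "a well designed login screen for a banking app"

    inputs = processor(text=[description], images=screenshot,
                       return_tensors="pt", padding=True)
    score = model(**inputs).logits_per_image.item()     # higher = more relevant
    print(score)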


AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System

arXiv.org Artificial Intelligence

The booming success of LLMs has spurred rapid development of LLM agents. Though the foundation of an LLM agent is the generative model, it is critical to devise optimal reasoning strategies and agent architectures. Accordingly, LLM agent research has advanced from simple chain-of-thought prompting to more complex ReAct and Reflection reasoning strategies; agent architectures have also evolved from single-agent generation to multi-agent conversation, as well as multi-LLM multi-agent group chat. However, with the existing intricate frameworks and libraries, creating and evaluating new reasoning strategies and agent architectures has become a complex challenge, which hinders research into LLM agents. Thus, we open-source a new AI agent library, AgentLite, which simplifies this process by offering a lightweight, user-friendly platform for innovating LLM agent reasoning, architectures, and applications with ease. AgentLite is a task-oriented framework designed to enhance the ability of agents to break down tasks and facilitate the development of multi-agent systems. Furthermore, we introduce multiple practical applications developed with AgentLite to demonstrate its convenience and flexibility. Get started now at https://github.com/SalesforceAIResearch/AgentLite.
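
As a library-agnostic illustration of the task-oriented pattern described above (a manager agent decomposes the task and dispatches sub-tasks to worker agents), consider the sketch below; manager_llm and workers are hypothetical callables, and none of this reproduces AgentLite's actual API.

    def run_task(task, manager_llm, workers):
        """Decompose a task, dispatch sub-tasks to workers, synthesize an answer."""
        # 1. The manager breaks the task into sub-tasks (one per line here).
        plan = manager_llm(f"Decompose into numbered steps: {task}")
        subtasks = [line for line in plan.splitlines() if line.strip()]
        # 2. Each sub-task is routed to a worker agent and results are pooled.
        results = [workers[i % len(workers)](sub) for i, sub in enumerate(subtasks)]
        # 3. The manager synthesizes a final answer from the workers' outputs.
        return manager_llm(f"Task: {task}\nResults: {results}\nFinal answer:")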


GDTM: An Indoor Geospatial Tracking Dataset with Distributed Multimodal Sensors

arXiv.org Artificial Intelligence

Constantly locating moving objects, i.e., geospatial tracking, is essential for autonomous building infrastructure. Accurate and robust geospatial tracking often leverages multimodal sensor fusion algorithms, which require large datasets with time-aligned, synchronized data from various sensor types. However, such datasets are not readily available. Hence, we propose GDTM, a nine-hour dataset for multimodal object tracking with distributed multimodal sensors and reconfigurable sensor node placements. Our dataset enables the exploration of several research problems, such as optimizing architectures for processing multimodal data, and investigating models' robustness to adverse sensing conditions and sensor placement variances. A GitHub repository containing the code, sample data, and checkpoints of this work is available at https://github.com/nesl/GDTM.
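
A small example of the time alignment such a dataset supports: match each sample in a reference stream to the nearest-in-time sample from another sensor, dropping matches outside a tolerance. The stream layout and tolerance are assumptions for illustration; this is not GDTM's released loader.

    import bisect

    def align(reference, other, tolerance=0.05):
        """Pair each (timestamp, sample) in `reference` with the nearest-in-time
        sample from `other` (assumed sorted by timestamp), keeping pairs that
        fall within `tolerance` seconds."""
        other_ts = [t for t, _ in other]
        pairs = []
        for t_ref, ref_sample in reference:
            i = bisect.bisect_left(other_ts, t_ref)
            candidates = [j for j in (i - 1, i) if 0 <= j < len(other)]
            if not candidates:
                continue
            j = min(candidates, key=lambda k: abs(other_ts[k] - t_ref))
            if abs(other_ts[j] - t_ref) <= tolerance:
                pairs.append((ref_sample, other[j][1]))
        return pairs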