AITopics | Wang, Shiyu

Wang, Shiyu

Leveraging sinusoidal representation networks to predict fMRI signals from EEG

Li, Yamin, Lou, Ange, Xu, Ziyuan, Wang, Shiyu, Chang, Catie

arXiv.org Artificial Intelligence, Jan-24-2024

In modern neuroscience, functional magnetic resonance imaging (fMRI) has been a crucial and irreplaceable tool that provides a non-invasive window into the dynamics of whole-brain activity. Nevertheless, fMRI is limited by hemodynamic blurring as well as high cost, immobility, and incompatibility with metal implants. Electroencephalography (EEG) is complementary to fMRI and can directly record the cortical electrical activity at high temporal resolution, but has more limited spatial resolution and is unable to recover information about deep subcortical brain structures. The ability to obtain fMRI information from EEG would enable cost-effective imaging across a wider set of brain regions. Further, beyond augmenting the capabilities of EEG, cross-modality models would facilitate the interpretation of fMRI signals. However, as both EEG and fMRI are high-dimensional and prone to artifacts, it is currently challenging to model fMRI from EEG. To address this challenge, we propose a novel architecture that can predict fMRI signals directly from multi-channel EEG without explicit feature engineering. Our model achieves this by implementing a Sinusoidal Representation Network (SIREN) to learn frequency information in brain dynamics from EEG, which serves as the input to a subsequent encoder-decoder to effectively reconstruct the fMRI signal from a specific brain region. We evaluate our model using a simultaneous EEG-fMRI dataset with 8 subjects and investigate its potential for predicting subcortical fMRI signals. The present results reveal that our model outperforms a recent state-of-the-art model, and indicate the potential of leveraging periodic activation functions in deep neural networks to model functional neuroimaging data.
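The periodic activation mentioned in the abstract above can be illustrated with a minimal sketch of a single sine-activated layer, computing sin(w0 * (W x + b)). This is not the authors' model: the function names, the frequency scale `w0 = 30.0`, the uniform initialization bounds (taken from the commonly cited SIREN initialization scheme), and the toy input are all assumptions made for illustration.

```python
import math
import random


def init_siren_weights(n_in, n_out, w0=30.0, first_layer=False, seed=0):
    """Draw weights uniformly; the bounds are an assumed SIREN-style scheme."""
    rng = random.Random(seed)
    # First layer: U(-1/n_in, 1/n_in); later layers: U(-sqrt(6/n_in)/w0, +sqrt(6/n_in)/w0).
    bound = 1.0 / n_in if first_layer else math.sqrt(6.0 / n_in) / w0
    return [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]


def siren_layer(x, weights, biases, w0=30.0):
    """Apply one sine-activated layer: sin(w0 * (W x + b))."""
    out = []
    for row, b in zip(weights, biases):
        pre_activation = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(math.sin(w0 * pre_activation))
    return out


# Hypothetical input: four samples standing in for one short EEG window.
x = [0.1, -0.3, 0.25, 0.0]
W = init_siren_weights(n_in=4, n_out=3, first_layer=True)
b = [0.0, 0.0, 0.0]
features = siren_layer(x, W, b)
```

Because the activation is a sine, every output lies in [-1, 1], and the scale `w0` controls how quickly the layer oscillates with respect to its input.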

artificial intelligence, fmri signal, machine learning, (15 more...)

2311.04234

Country: North America > United States (0.16)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

DeepSeek-AI: Bi, Xiao, Chen, Deli, Chen, Guanting, Chen, Shanhuang, Dai, Damai, Deng, Chengqi, Ding, Honghui, Dong, Kai, Du, Qiushi, Fu, Zhe, Gao, Huazuo, Gao, Kaige, Gao, Wenjun, Ge, Ruiqi, Guan, Kang, Guo, Daya, Guo, Jianzhong, Hao, Guangbo, Hao, Zhewen, He, Ying, Hu, Wenjie, Huang, Panpan, Li, Erhang, Li, Guowei, Li, Jiashi, Li, Yao, Li, Y. K., Liang, Wenfeng, Lin, Fangyun, Liu, A. X., Liu, Bo, Liu, Wen, Liu, Xiaodong, Liu, Xin, Liu, Yiyuan, Lu, Haoyu, Lu, Shanghao, Luo, Fuli, Ma, Shirong, Nie, Xiaotao, Pei, Tian, Piao, Yishi, Qiu, Junjie, Qu, Hui, Ren, Tongzheng, Ren, Zehui, Ruan, Chong, Sha, Zhangli, Shao, Zhihong, Song, Junxiao, Su, Xuecheng, Sun, Jingxiang, Sun, Yaofeng, Tang, Minghui, Wang, Bingxuan, Wang, Peiyi, Wang, Shiyu, Wang, Yaohui, Wang, Yongji, Wu, Tong, Wu, Y., Xie, Xin, Xie, Zhenda, Xie, Ziwei, Xiong, Yiliang, Xu, Hanwei, Xu, R. X., Xu, Yanhong, Yang, Dejian, You, Yuxiang, Yu, Shuiping, Yu, Xingkai, Zhang, B., Zhang, Haowei, Zhang, Lecong, Zhang, Liyue, Zhang, Mingchuan, Zhang, Minghua, Zhang, Wentao, Zhang, Yichao, Zhao, Chenggang, Zhao, Yao, Zhou, Shangyan, Zhou, Shunfeng, Zhu, Qihao, Zou, Yuheng

arXiv.org Artificial Intelligence, Jan-5-2024

The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling laws described in previous literature present varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large-scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5.

large language model, machine learning, natural language, (14 more...)

2401.02954

Country:

Europe (1.00)
North America > United States > New York (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Law (1.00)
Education (1.00)
Leisure & Entertainment > Sports (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models

Bai, Guangji, Chai, Zheng, Ling, Chen, Wang, Shiyu, Lu, Jiaying, Zhang, Nan, Shi, Tingwei, Yu, Ziyang, Zhu, Mengdan, Zhang, Yifei, Yang, Carl, Cheng, Yue, Zhao, Liang

arXiv.org Artificial Intelligence, Jan-3-2024

The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated models like OpenAI's ChatGPT, represents a significant advancement in artificial intelligence. These models, however, bring forth substantial challenges in the high consumption of computational, memory, energy, and financial resources, especially in environments with limited resource capabilities. This survey aims to systematically address these challenges by reviewing a broad spectrum of techniques designed to enhance the resource efficiency of LLMs. We categorize methods based on their optimization focus (computational, memory, energy, financial, and network resources) and their applicability across various stages of an LLM's lifecycle, including architecture design, pretraining, finetuning, and system design. Additionally, the survey introduces a nuanced categorization of resource efficiency techniques by their specific resource types, which uncovers the intricate relationships and mappings between various resources and corresponding optimization techniques. A standardized set of evaluation metrics and datasets is also presented to facilitate consistent and fair comparisons across different models and techniques. By offering a comprehensive overview of the current state of the art and identifying open research avenues, this survey serves as a foundational reference for researchers and practitioners, aiding them in developing more sustainable and efficient LLMs in a rapidly evolving landscape.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2401.00625

Country: North America > United States (0.28)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)

Industry:

Energy (0.93)
Health & Medicine (0.92)
Education (0.92)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Intelligent Virtual Assistants with LLM-based Process Automation

Guan, Yanchu, Wang, Dong, Chu, Zhixuan, Wang, Shiyu, Ni, Feiyue, Song, Ruihua, Li, Longfei, Gu, Jinjie, Zhuang, Chenyi

arXiv.org Artificial Intelligence, Dec-4-2023

While intelligent virtual assistants like Siri, Alexa, and Google Assistant have become ubiquitous in modern life, they still face limitations in their ability to follow multi-step instructions and accomplish complex goals articulated in natural language. However, recent breakthroughs in large language models (LLMs) show promise for overcoming existing barriers by enhancing natural language processing and reasoning capabilities. Though promising, applying LLMs to create more advanced virtual assistants still faces challenges like ensuring robust performance and handling variability in real-world user commands. This paper proposes a novel LLM-based virtual assistant that can automatically perform multi-step operations within mobile apps based on high-level user requests. The system represents an advance in assistants by providing an end-to-end solution for parsing instructions, reasoning about goals, and executing actions. LLM-based Process Automation (LLMPA) has modules for decomposing instructions, generating descriptions, detecting interface elements, predicting next actions, and error checking. Experiments demonstrate the system completing complex mobile operation tasks in Alipay based on natural language instructions. This showcases how large language models can enable automated assistants to accomplish real-world tasks. The main contributions are the novel LLMPA architecture optimized for app process automation, the methodology for applying LLMs to mobile apps, and demonstrations of multi-step task completion in a real-world environment. Notably, this work represents the first real-world deployment and extensive evaluation of a large language model-based virtual assistant in a widely used mobile application with an enormous user base numbering in the hundreds of millions.

instruction chain, large language model, machine learning, (14 more...)

2312.06677

Country:

Asia > China (0.29)
North America > United States (0.28)

Genre:

Research Report (0.82)
Workflow (0.69)
Overview (0.68)

Industry: Banking & Finance (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

Liu, Yong, Hu, Tengge, Zhang, Haoran, Wu, Haixu, Wang, Shiyu, Ma, Lintao, Long, Mingsheng

arXiv.org Artificial Intelligence, Dec-1-2023

The recent boom of linear forecasting models questions the ongoing passion for architectural modifications of Transformer-based forecasters. These forecasters leverage Transformers to model the global dependencies over temporal tokens of time series, with each token formed by multiple variates of the same timestamp. However, Transformers are challenged in forecasting series with larger lookback windows due to performance degradation and computation explosion. Besides, the embedding for each temporal token fuses multiple variates that represent potential delayed events and distinct physical measurements, which may fail in learning variate-centric representations and result in meaningless attention maps. In this work, we reflect on the competent duties of Transformer components and repurpose the Transformer architecture without any modification to the basic components. We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions. Specifically, the time points of individual series are embedded into variate tokens which are utilized by the attention mechanism to capture multivariate correlations; meanwhile, the feed-forward network is applied for each variate token to learn nonlinear representations. The iTransformer model achieves state-of-the-art performance on challenging real-world datasets, which further empowers the Transformer family with promoted performance, generalization ability across different variates, and better utilization of arbitrary lookback windows, making it a nice alternative as the fundamental backbone of time series forecasting.
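The "inverted dimensions" idea in the abstract above can be sketched in a few lines: instead of one token per timestamp, each variate's whole lookback window becomes one token, and attention then mixes information across variates. The code below is a simplified illustration, not the published implementation. The helper names, the toy data, the fixed embedding matrix, and the use of identity query/key/value projections are all assumptions.

```python
import math


def to_variate_tokens(series):
    """Transpose a (T timestamps x N variates) series into N tokens of length T."""
    return [list(column) for column in zip(*series)]


def embed(tokens, weight):
    """Linearly map each length-T token to D dimensions (weight is D x T)."""
    return [[sum(w * v for w, v in zip(row, token)) for row in weight] for token in tokens]


def self_attention(tokens):
    """Scaled dot-product attention across tokens, with identity Q/K/V (a simplification)."""
    d = len(tokens[0])
    mixed = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        mixed.append([sum(a * value[j] for a, value in zip(attn, tokens)) for j in range(d)])
    return mixed


# Hypothetical lookback window: T = 4 timestamps, N = 3 variates.
series = [
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
    [4.0, 40.0, 400.0],
]
variate_tokens = to_variate_tokens(series)  # 3 tokens, each of length 4
embedding = [[0.01, 0.01, 0.01, 0.01], [0.01, -0.01, 0.01, -0.01]]  # D = 2, assumed values
embedded = embed(variate_tokens, embedding)
mixed = self_attention(embedded)  # attention weights relate variates, not timestamps
```

The attention map here has one row and one column per variate, so its size depends on the number of series rather than on the length of the lookback window.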

artificial intelligence, machine learning, natural language, (18 more...)

2310.06625

Country:

Asia > China (0.28)
North America > United States > California (0.14)

Genre: Research Report (0.64)

Industry: Energy (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

TpopT: Efficient Trainable Template Optimization on Low-Dimensional Manifolds

Yan, Jingkai, Wang, Shiyu, Wei, Xinyu Rain, Wang, Jimmy, Márka, Zsuzsanna, Márka, Szabolcs, Wright, John

arXiv.org Artificial Intelligence, Oct-15-2023

In scientific and engineering scenarios, a recurring task is the detection of low-dimensional families of signals or patterns. A classic family of approaches, exemplified by template matching, aims to cover the search space with a dense template bank. While simple and highly interpretable, it suffers from poor computational efficiency due to unfavorable scaling in the signal space dimensionality. In this work, we study TpopT (TemPlate OPTimization) as an alternative scalable framework for detecting low-dimensional families of signals which maintains high interpretability. We provide a theoretical analysis of the convergence of Riemannian gradient descent for TpopT, and prove that it has a superior dimension scaling to covering. We also propose a practical TpopT framework for nonparametric signal sets, which incorporates techniques of embedding and kernel interpolation, and is further configurable into a trainable network architecture by unrolled optimization. The proposed trainable TpopT exhibits significantly improved efficiency-accuracy tradeoffs for gravitational wave detection, where matched filtering is currently a method of choice. We further illustrate the general applicability of this approach with experiments on handwritten digit data.
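The dense template bank baseline that the abstract above contrasts with TpopT can be sketched as follows: every template in the bank is correlated with the observed signal and the best-scoring one is reported. This illustrates the covering approach only, not TpopT itself. The sinusoidal signal family, the bank of integer frequencies, and all function names are assumptions chosen to keep the example small.

```python
import math


def normalize(vector):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def make_template(freq, n=64):
    """Unit-norm sinusoid with `freq` cycles over n samples (assumed signal family)."""
    return normalize([math.sin(2 * math.pi * freq * t / n) for t in range(n)])


def best_match(signal, bank):
    """Return the bank key with the highest normalized correlation, and that score."""
    unit_signal = normalize(signal)
    scores = {
        key: sum(a * b for a, b in zip(unit_signal, template))
        for key, template in bank.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Dense bank covering a one-parameter family: frequencies 1 through 8.
bank = {freq: make_template(freq) for freq in range(1, 9)}

# Hypothetical observation: a scaled copy of the frequency-3 member, without noise.
signal = [2.5 * math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
best_freq, score = best_match(signal, bank)  # best_freq is 3 here
```

Each query costs one correlation per template, so the work grows with the size of the bank; this is the scaling that the abstract describes as unfavorable when the signal family has more parameters.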

artificial intelligence, machine learning, tpopt, (18 more...)

2310.10039

Country:

Europe > United Kingdom (0.14)
North America > United States (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Controllable Data Generation Via Iterative Data-Property Mutual Mappings

Pan, Bo, Qin, Muran, Wang, Shiyu, Zhang, Yifei, Zhao, Liang

arXiv.org Artificial Intelligence, Oct-11-2023

Deep generative models have been widely used for their ability to generate realistic data samples in various areas, such as images, molecules, text, and speech. One major goal of data generation is controllability, namely to generate new data with desired properties. Despite growing interest in the area of controllable generation, significant challenges remain, including 1) disentangling desired properties from unrelated latent variables, 2) out-of-distribution property control, and 3) objective optimization for out-of-distribution property control. To address these challenges, in this paper, we propose a general framework to enhance VAE-based data generators with property controllability and ensure disentanglement. Our proposed objective can be optimized on both data seen and unseen in the training set. We propose a training procedure to train the objective in a semi-supervised manner by iteratively conducting mutual mappings between the data and properties. The proposed framework is implemented on four VAE-based controllable generators to evaluate its performance on property error, disentanglement, generation quality, and training time. The results indicate that our proposed framework enables more precise control over the properties of generated samples in a short training time, ensuring the disentanglement and keeping the validity of the generated samples.

artificial intelligence, iterative data-property mutual mapping, machine learning

2310.07683

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Survey on Knowledge Graphs for Healthcare: Resources, Applications, and Promises

Cui, Hejie, Lu, Jiaying, Wang, Shiyu, Xu, Ran, Ma, Wenjing, Yu, Shaojun, Yu, Yue, Kan, Xuan, Ling, Chen, Zhao, Liang, Ho, Joyce, Wang, Fei, Yang, Carl

arXiv.org Artificial Intelligence, Aug-25-2023

Healthcare knowledge graphs (HKGs) have emerged as a promising tool for organizing medical knowledge in a structured and interpretable way, which provides a comprehensive view of medical concepts and their relationships. However, challenges such as data heterogeneity and limited coverage remain, emphasizing the need for further research in the field of HKGs. This survey paper serves as the first comprehensive overview of HKGs. We summarize the pipeline and key techniques for HKG construction (i.e., from scratch and through integration), as well as the common utilization approaches (i.e., model-free and model-based). To provide researchers with valuable resources, we organize existing HKGs (The resource is available at https://github.com/lujiaying/Awesome-HealthCare-KnowledgeBase) based on the data types they capture and application domains, supplemented with pertinent statistical information. In the application section, we delve into the transformative impact of HKGs across various healthcare domains, spanning from fine-grained basic science research to high-level clinical decision support. Lastly, we shed light on the opportunities for creating comprehensive and accurate HKGs in the era of large language models, presenting the potential to revolutionize healthcare delivery and enhance the interpretability and reliability of clinical prediction.

knowledge graph, natural language, survey article, (3 more...)

2306.04802

Genre:

Research Report (0.69)
Overview (0.53)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.60)
Information Technology > Artificial Intelligence > Natural Language (0.53)

End-to-End Modeling Hierarchical Time Series Using Autoregressive Transformer and Conditional Normalizing Flow based Reconciliation

Wang, Shiyu, Zhou, Fan, Sun, Yinbo, Ma, Lintao, Zhang, James, Zheng, Yangfei, Zheng, Bo, Lei, Lei, Hu, Yun

arXiv.org Artificial Intelligence, Jun-2-2023

Multivariate time series forecasting with hierarchical structure is pervasive in real-world applications, demanding not only predicting each level of the hierarchy, but also reconciling all forecasts to ensure coherency, i.e., the forecasts should satisfy the hierarchical aggregation constraints. Moreover, the disparities of statistical characteristics between levels can be huge, worsened by non-Gaussian distributions and non-linear correlations. To this end, we propose a novel end-to-end hierarchical time series forecasting model, based on conditioned normalizing flow-based autoregressive transformer reconciliation, to represent complex data distribution while simultaneously reconciling the forecasts to ensure coherency. Unlike other state-of-the-art methods, we achieve the forecasting and reconciliation simultaneously without requiring any explicit post-processing step. In addition, by harnessing the power of deep models, we do not rely on any assumption such as unbiased estimates or Gaussian distribution. Our evaluation experiments are conducted on four real-world hierarchical datasets from different industrial domains (three public ones and a dataset from the application servers of Alipay's data center) and the preliminary results demonstrate the efficacy of our proposed method.
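The coherency requirement in the abstract above (forecasts must satisfy the hierarchical aggregation constraints) can be made concrete with a summing matrix. The sketch below only checks the constraint and shows plain bottom-up aggregation. It is not the paper's flow-based reconciliation, and the two-leaf hierarchy, the matrix `S`, and the function names are assumptions for illustration.

```python
def aggregate(summing_matrix, bottom):
    """Compute every level of the hierarchy from bottom-level values: y = S b."""
    return [sum(s * b for s, b in zip(row, bottom)) for row in summing_matrix]


def is_coherent(summing_matrix, forecasts, n_bottom, tol=1e-9):
    """Check that forecasts equal the aggregation of their own bottom-level entries."""
    bottom = forecasts[-n_bottom:]  # assumes bottom-level series are listed last
    implied = aggregate(summing_matrix, bottom)
    return all(abs(a - b) <= tol for a, b in zip(implied, forecasts))


# Hypothetical hierarchy: total = A + B, with series ordered as [total, A, B].
S = [
    [1, 1],  # total
    [1, 0],  # A
    [0, 1],  # B
]

independent = [8.0, 3.0, 4.0]  # per-level forecasts made separately: 8 != 3 + 4
bottom_up = aggregate(S, independent[-2:])  # [7.0, 3.0, 4.0], coherent by construction
```

Forecasts produced independently per level generally fail this check, which is why reconciliation is needed either as a post-processing step or, as the abstract proposes, inside the model itself.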

artificial intelligence, data mining, machine learning, (17 more...)

2212.13706

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Industry:

Consumer Products & Services > Travel (0.48)
Information Technology > Services (0.34)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Domain Generalization Deep Graph Transformation

Wang, Shiyu, Bai, Guangji, Zhu, Qingyang, Qin, Zhaohui, Zhao, Liang

arXiv.org Artificial Intelligence, May-23-2023

Graph transformation that predicts graph transition from one mode to another is an important and common problem. Despite much progress in developing advanced graph transformation techniques in recent years, the fundamental assumption typically required in machine-learning models that the testing and training data preserve the same distribution does not always hold. As a result, domain generalization graph transformation that predicts graphs not available in the training data is under-explored, with multiple key challenges to be addressed including (1) the extreme space complexity when training on all input-output mode combinations, (2) difference of graph topologies between the input and the output modes, and (3) how to generalize the model to (unseen) target domains that are not in the training data. To fill the gap, we propose a multi-input, multi-output, hypernetwork-based graph neural network (MultiHyperGNN) that employs an encoder and a decoder to encode topologies of both input and output modes and semi-supervised link prediction to enhance the graph transformation task. Instead of training on all mode combinations, MultiHyperGNN preserves a constant space complexity with the encoder and the decoder produced by two novel hypernetworks. Comprehensive experiments show that MultiHyperGNN outperforms competing models in both prediction and domain generalization tasks.

artificial intelligence, deep learning, machine learning, (17 more...)

2305.11389

Genre: Research Report (0.50)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
