AITopics | Jiang, Fan

Collaborating Authors

Jiang, Fan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Analysis of Learning-based Offshore Wind Power Prediction Models with Various Feature Combinations

Fang, Linhan, Jiang, Fan, Toms, Ann Mary, Li, Xingpeng

arXiv.org Artificial IntelligenceMar-10-2025

Accurate wind speed prediction is crucial for designing and selecting sites for offshore wind farms. This paper investigates the effectiveness of various machine learning models in predicting offshore wind power for a site near the Gulf of Mexico by analyzing meteorological data. After collecting and preprocessing meteorological data, nine different input feature combinations were designed to assess their impact on wind power predictions at multiple heights. The results show that using wind speed as the output feature improves prediction accuracy by approximately 10% compared to using wind power as the output. In addition, the improvement of multi-feature input compared with single-feature input is not obvious mainly due to the poor correlation among key features and limited generalization ability of models. These findings underscore the importance of selecting appropriate output features and highlight considerations for using machine learning in wind power forecasting, offering insights that could guide future wind power prediction models and conversion techniques.

artificial intelligence, machine learning, wind speed, (13 more...)

arXiv.org Artificial Intelligence

2503.13493

Country:

Asia (1.00)
North America > United States > Texas (0.15)

Genre: Research Report > New Finding (0.69)

Industry: Energy > Renewable > Wind (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.98)

Add feedback

Few-Shot Multilingual Open-Domain QA from 5 Examples

Jiang, Fan, Drummond, Tom, Cohn, Trevor

arXiv.org Artificial IntelligenceFeb-26-2025

Recent approaches to multilingual open-domain question answering (MLODQA) have achieved promising results given abundant language-specific training data. However, the considerable annotation cost limits the application of these methods for underrepresented languages. We introduce a \emph{few-shot learning} approach to synthesise large-scale multilingual data from large language models (LLMs). Our method begins with large-scale self-supervised pre-training using WikiData, followed by training on high-quality synthetic multilingual data generated by prompting LLMs with few-shot supervision. The final model, \textsc{FsModQA}, significantly outperforms existing few-shot and supervised baselines in MLODQA and cross-lingual and monolingual retrieval. We further show our method can be extended for effective zero-shot adaptation to new languages through a \emph{cross-lingual prompting} strategy with only English-supervised data, making it a general and applicable solution for MLODQA tasks without costly large-scale annotation.

computational linguistic, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2502.19722

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > UAE (0.14)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.93)
Government > Regional Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Franken-Adapter: Cross-Lingual Adaptation of LLMs by Embedding Surgery

Jiang, Fan, Yu, Honglin, Chung, Grace, Cohn, Trevor

arXiv.org Artificial IntelligenceFeb-11-2025

The capabilities of Large Language Models (LLMs) in low-resource languages lag far behind those in English, making their universal accessibility a significant challenge. To alleviate this, we present $\textit{Franken-Adapter}$, a modular language adaptation approach for decoder-only LLMs with embedding surgery. Our method begins by creating customized vocabularies for target languages and performing language adaptation through embedding tuning on multilingual data. These pre-trained embeddings are subsequently integrated with LLMs that have been instruction-tuned on English alignment data to enable zero-shot cross-lingual transfer. Our experiments on $\texttt{Gemma2}$ models with up to 27B parameters demonstrate improvements of up to 20% across 96 languages, spanning both discriminative and generative tasks, with minimal regressions ($<$1%) in English. Further in-depth analysis reveals the critical role of customizing tokenizers in enhancing language adaptation, while boosting inference efficiency. Additionally, we show the versatility of our method by achieving a 14% improvement over a math-optimized LLM across 20 languages, offering a modular solution to transfer reasoning abilities across languages post hoc.

franken-adapter, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.08037

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Add feedback

Human-In-the-Loop Software Development Agents

Takerngsaksiri, Wannita, Pasuksmit, Jirat, Thongtanunam, Patanamon, Tantithamthavorn, Chakkrit, Zhang, Ruixiong, Jiang, Fan, Li, Jing, Cook, Evan, Chen, Kun, Wu, Ming

arXiv.org Artificial IntelligenceJan-9-2025

Recently, Large Language Models (LLMs)-based multi-agent paradigms for software engineering are introduced to automatically resolve software development tasks (e.g., from a given issue to source code). However, existing work is evaluated based on historical benchmark datasets, rarely considers human feedback at each stage of the automated software development process, and has not been deployed in practice. In this paper, we introduce a Human-in-the-loop LLM-based Agents framework (HULA) for software development that allows software engineers to refine and guide LLMs when generating coding plans and source code for a given task. We design, implement, and deploy the HULA framework into Atlassian JIRA for internal uses. Through a multi-stage evaluation of the HULA framework, Atlassian software engineers perceive that HULA can minimize the overall development time and effort, especially in initiating a coding plan and writing code for straightforward tasks. On the other hand, challenges around code quality remain a concern in some cases. We draw lessons learned and discuss opportunities for future work, which will pave the way for the advancement of LLM-based agents in software development.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.12924

Country: Oceania > Australia (0.14)

Genre:

Workflow (1.00)
Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.46)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

Sun, Xingwu, Chen, Yanfeng, Huang, Yiqing, Xie, Ruobing, Zhu, Jiaqi, Zhang, Kai, Li, Shuaipeng, Yang, Zhen, Han, Jonny, Shu, Xiaobo, Bu, Jiahao, Chen, Zhongzhi, Huang, Xuemeng, Lian, Fengzong, Yang, Saiyong, Yan, Jianfeng, Zeng, Yuyuan, Ren, Xiaoqin, Yu, Chao, Wu, Lulu, Mao, Yue, Xia, Jun, Yang, Tao, Zheng, Suncong, Wu, Kan, Jiao, Dian, Xue, Jinbao, Zhang, Xipeng, Wu, Decheng, Liu, Kai, Wu, Dengpeng, Xu, Guanghui, Chen, Shaohua, Chen, Shuang, Feng, Xiao, Hong, Yigeng, Zheng, Junqiang, Xu, Chengcheng, Li, Zongwei, Kuang, Xiong, Hu, Jianglu, Chen, Yiqi, Deng, Yuchi, Li, Guiyang, Liu, Ao, Zhang, Chenchen, Hu, Shihui, Zhao, Zilong, Wu, Zifan, Ding, Yao, Wang, Weichao, Liu, Han, Wang, Roberts, Fei, Hao, Yu, Peijie, Zhao, Ze, Cao, Xun, Wang, Hai, Xiang, Fusheng, Huang, Mengyuan, Xiong, Zhiyuan, Hu, Bin, Hou, Xuebin, Jiang, Lei, Ma, Jianqiang, Wu, Jiajia, Deng, Yaping, Shen, Yi, Wang, Qian, Liu, Weijie, Liu, Jie, Chen, Meng, Dong, Liang, Jia, Weiwen, Chen, Hu, Liu, Feifei, Yuan, Rui, Xu, Huilin, Yan, Zhenxiang, Cao, Tengfei, Hu, Zhichao, Feng, Xinhua, Du, Dong, Yu, Tinghao, Tao, Yangyu, Zhang, Feng, Zhu, Jianchen, Xu, Chengzhong, Li, Xirui, Zha, Chong, Ouyang, Wen, Xia, Yinben, Li, Xiang, He, Zekun, Chen, Rongpeng, Song, Jiawei, Chen, Ruibin, Jiang, Fan, Zhao, Chongqing, Wang, Bo, Gong, Hao, Gan, Rong, Hu, Winston, Kang, Zhanhui, Yang, Yong, Liu, Yuhong, Wang, Di, Jiang, Jie

arXiv.org Artificial IntelligenceNov-6-2024

In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture of experts model, with a total of 389 billion parameters and 52 billion activation parameters, capable of handling up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large's superior performance across various benchmarks including language understanding and generation, logical reasoning, mathematical problem-solving, coding, long-context, and aggregated tasks, where it outperforms LLama3.1-70B and exhibits comparable performance when compared to the significantly larger LLama3.1-405B model. Key practice of Hunyuan-Large include large-scale synthetic data that is orders larger than in previous literature, a mixed expert routing strategy, a key-value cache compression technique, and an expert-specific learning rate strategy. Additionally, we also investigate the scaling laws and learning rate schedule of mixture of experts models, providing valuable insights and guidances for future model development and optimization. The code and checkpoints of Hunyuan-Large are released to facilitate future innovations and applications. Codes: https://github.com/Tencent/Hunyuan-Large Models: https://huggingface.co/tencent/Tencent-Hunyuan-Large

hunyuan-large, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2411.02265

Genre:

Research Report (0.50)
Workflow (0.46)

Industry:

Education (0.68)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Pre-training Cross-lingual Open Domain Question Answering with Large-scale Synthetic Supervision

Jiang, Fan, Drummond, Tom, Cohn, Trevor

arXiv.org Artificial IntelligenceJun-16-2024

Cross-lingual open domain question answering (CLQA) is a complex problem, comprising cross-lingual retrieval from a multilingual knowledge base, followed by answer generation in the query language. Both steps are usually tackled by separate models, requiring substantial annotated datasets, and typically auxiliary resources, like machine translation systems to bridge between languages. In this paper, we show that CLQA can be addressed using a single encoder-decoder model. To effectively train this model, we propose a self-supervised method based on exploiting the cross-lingual link structure within Wikipedia. We demonstrate how linked Wikipedia pages can be used to synthesise supervisory signals for cross-lingual retrieval, through a form of cloze query, and generate more natural questions to supervise answer generation. Together, we show our approach, \texttt{CLASS}, outperforms comparable methods on both supervised and zero-shot language adaptation settings, including those using machine translation.

large language model, machine learning, question answering, (20 more...)

arXiv.org Artificial Intelligence

2402.16508

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Washington > King County > Seattle (0.14)
Oceania > Australia > Victoria > Melbourne (0.14)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer

Liu, Ji, Tang, Dehua, Huang, Yuanxian, Zhang, Li, Zeng, Xiaocheng, Li, Dong, Lu, Mingjie, Peng, Jinzhang, Wang, Yu, Jiang, Fan, Tian, Lu, Sirasao, Ashish

arXiv.org Artificial IntelligenceJan-12-2024

Traditional channel-wise pruning methods by reducing network channels struggle to effectively prune efficient CNN models with depth-wise convolutional layers and certain efficient modules, such as popular inverted residual blocks. Prior depth pruning methods by reducing network depths are not suitable for pruning some efficient models due to the existence of some normalization layers. Moreover, finetuning subnet by directly removing activation layers would corrupt the original model weights, hindering the pruned model from achieving high performance. To address these issues, we propose a novel depth pruning method for efficient models. Our approach proposes a novel block pruning strategy and progressive training method for the subnet. Additionally, we extend our pruning method to vision transformer models. Experimental results demonstrate that our method consistently outperforms existing depth pruning methods across various pruning configurations. We obtained three pruned ConvNeXtV1 models with our method applying on ConvNeXtV1, which surpass most SOTA efficient models with comparable inference performance. Our method also achieves state-of-the-art pruning performance on the vision transformer model.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2401.06426

Country: Asia > China (0.14)

Genre:

Research Report > Promising Solution (0.34)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Boot and Switch: Alternating Distillation for Zero-Shot Dense Retrieval

Jiang, Fan, Xu, Qiongkai, Drummond, Tom, Cohn, Trevor

arXiv.org Artificial IntelligenceNov-27-2023

Neural 'dense' retrieval models are state of the art for many datasets, however these models often exhibit limited domain transfer ability. Existing approaches to adaptation are unwieldy, such as requiring explicit supervision, complex model architectures, or massive external models. We present $\texttt{ABEL}$, a simple but effective unsupervised method to enhance passage retrieval in zero-shot settings. Our technique follows a straightforward loop: a dense retriever learns from supervision signals provided by a reranker, and subsequently, the reranker is updated based on feedback from the improved retriever. By iterating this loop, the two components mutually enhance one another's performance. Experimental results demonstrate that our unsupervised $\texttt{ABEL}$ model outperforms both leading supervised and unsupervised retrievers on the BEIR benchmark. Meanwhile, it exhibits strong adaptation abilities to tasks and domains that were unseen during training. By either fine-tuning $\texttt{ABEL}$ on labelled data or integrating it with existing supervised dense retrievers, we achieve state-of-the-art results.\footnote{Source code is available at \url{https://github.com/Fantabulous-J/BootSwitch}.}

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2311.15564

Country:

Europe (1.00)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Noisy Self-Training with Synthetic Queries for Dense Retrieval

Jiang, Fan, Drummond, Tom, Cohn, Trevor

arXiv.org Artificial IntelligenceNov-27-2023

Although existing neural retrieval models reveal promising results when training data is abundant and the performance keeps improving as training data increases, collecting high-quality annotated data is prohibitively costly. To this end, we introduce a novel noisy self-training framework combined with synthetic queries, showing that neural retrievers can be improved in a self-evolution manner with no reliance on any external models. Experimental results show that our method improves consistently over existing methods on both general-domain (e.g., MS-MARCO) and out-of-domain (i.e., BEIR) retrieval benchmarks. Extra analysis on low-resource settings reveals that our method is data efficient and outperforms competitive baselines, with as little as 30% of labelled training data. Further extending the framework for reranker training demonstrates that the proposed method is general and yields additional gains on tasks of diverse domains.\footnote{Source code is available at \url{https://github.com/Fantabulous-J/Self-Training-DPR}}

artificial intelligence, machine learning, query, (17 more...)

arXiv.org Artificial Intelligence

2311.15563

Country:

Europe (1.00)
Oceania > Australia > Victoria > Melbourne (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Robust Indoor Localization with Ranging-IMU Fusion

Jiang, Fan, Caruso, David, Dhekne, Ashutosh, Qu, Qi, Engel, Jakob Julian, Dong, Jing

arXiv.org Artificial IntelligenceSep-15-2023

Indoor wireless ranging localization is a promising approach for low-power and high-accuracy localization of wearable devices. A primary challenge in this domain stems from non-line of sight propagation of radio waves. This study tackles a fundamental issue in wireless ranging: the unpredictability of real-time multipath determination, especially in challenging conditions such as when there is no direct line of sight. We achieve this by fusing range measurements with inertial measurements obtained from a low cost Inertial Measurement Unit (IMU). For this purpose, we introduce a novel asymmetric noise model crafted specifically for non-Gaussian multipath disturbances. Additionally, we present a novel Levenberg-Marquardt (LM)-family trust-region adaptation of the iSAM2 fusion algorithm, which is optimized for robust performance for our ranging-IMU fusion problem. We evaluate our solution in a densely occupied real office environment. Our proposed solution can achieve temporally consistent localization with an average absolute accuracy of $\sim$0.3m in real-world settings. Furthermore, our results indicate that we can achieve comparable accuracy even with infrequent (1Hz) range measurements.

artificial intelligence, localization, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2309.08803

Country:

Europe (1.00)
North America > Canada (0.28)
Asia > Japan > Honshū (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.87)

Add feedback