AITopics | Yang, Shiming

Collaborating Authors

Yang, Shiming

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Yi-Lightning Technical Report

Wake, Alan, Chen, Bei, Lv, C. X., Li, Chao, Huang, Chengen, Cai, Chenglin, Zheng, Chujie, Cooper, Daniel, Zhou, Fan, Hu, Feng, Wang, Guoyin, Ji, Heng, Qiu, Howard, Zhu, Jiangcheng, Tian, Jun, Su, Katherine, Zhang, Lihuan, Li, Liying, Song, Ming, Li, Mou, Liu, Peng, Hu, Qicheng, Wang, Shawn, Zhou, Shijun, Yang, Shiming, Li, Shiyong, Zhu, Tianhang, Xie, Wen, He, Xiang, Chen, Xiaobo, Hu, Xiaohui, Ren, Xiaoyi, Niu, Xinyao, Li, Yanpeng, Zhao, Yongke, Luo, Yongzhen, Xu, Yuchi, Sha, Yuxuan, Yan, Zhaodong, Liu, Zhiyuan, Zhang, Zirui, Dai, Zonghong

arXiv.org Artificial IntelligenceDec-20-2024

This technical report presents Yi-Lightning, our latest flagship large language model (LLM). It achieves exceptional performance, ranking 6th overall on Chatbot Arena, with particularly strong results (2nd to 4th place) in specialized categories including Chinese, Math, Coding, and Hard Prompts. Yi-Lightning leverages an enhanced Mixture-of-Experts (MoE) architecture, featuring advanced expert segmentation and routing mechanisms coupled with optimized KV-caching techniques. Our development process encompasses comprehensive pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF), where we devise deliberate strategies for multi-stage training, synthetic data construction, and reward modeling. Furthermore, we implement RAISE (Responsible AI Safety Engine), a four-component framework to address safety issues across pre-training, post-training, and serving phases. Empowered by our scalable super-computing infrastructure, all these innovations substantially reduce training, deployment and inference costs while maintaining high-performance standards. With further evaluations on public academic benchmarks, Yi-Lightning demonstrates competitive performance against top-tier LLMs, while we observe a notable disparity between traditional, static benchmark results and real-world, dynamic human preferences. This observation prompts a critical reassessment of conventional benchmarks' utility in guiding the development of more intelligent and powerful AI systems for practical applications. Yi-Lightning is now available through our developer platform at https://platform.lingyiwanwu.com.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2412.01253

Country: North America (0.28)

Genre: Research Report (0.40)

Industry: Education > Educational Setting > Online (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Yi: Open Foundation Models by 01.AI

AI, 01., :, null, Young, Alex, Chen, Bei, Li, Chao, Huang, Chengen, Zhang, Ge, Zhang, Guanwei, Li, Heng, Zhu, Jiangcheng, Chen, Jianqun, Chang, Jing, Yu, Kaidong, Liu, Peng, Liu, Qiang, Yue, Shawn, Yang, Senbin, Yang, Shiming, Yu, Tao, Xie, Wen, Huang, Wenhao, Hu, Xiaohui, Ren, Xiaoyi, Niu, Xinyao, Nie, Pengcheng, Xu, Yuchi, Liu, Yudong, Wang, Yue, Cai, Yuxuan, Gu, Zhenyu, Liu, Zhiyuan, Dai, Zonghong

arXiv.org Artificial IntelligenceMar-7-2024

We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU, and our finetuned chat models deliver strong human preference rate on major evaluation platforms like AlpacaEval and Chatbot Arena. Building upon our scalable super-computing infrastructure and the classical transformer architecture, we attribute the performance of Yi models primarily to its data quality resulting from our data-engineering efforts. For pretraining, we construct 3.1 trillion tokens of English and Chinese corpora using a cascaded data deduplication and quality filtering pipeline. For finetuning, we polish a small scale (less than 10K) instruction dataset over multiple iterations such that every single instance has been verified directly by our machine learning engineers. For vision-language, we combine the chat language model with a vision transformer encoder and train the model to align visual representations to the semantic space of the language model. We further extend the context length to 200K through lightweight continual pretraining and demonstrate strong needle-in-a-haystack retrieval performance. We show that extending the depth of the pretrained checkpoint through continual pretraining further improves performance. We believe that given our current results, continuing to scale up model parameters using thoroughly optimized data will lead to even stronger frontier models.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2403.04652

Country: Asia (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.92)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.90)

Add feedback

ESGReveal: An LLM-based approach for extracting structured data from ESG reports

Zou, Yi, Shi, Mengying, Chen, Zhongjie, Deng, Zhu, Lei, ZongXiong, Zeng, Zihan, Yang, Shiming, Tong, HongXiang, Xiao, Lei, Zhou, Wenwen

arXiv.org Artificial IntelligenceDec-25-2023

ESGReveal is an innovative method proposed for efficiently extracting and analyzing Environmental, Social, and Governance (ESG) data from corporate reports, catering to the critical need for reliable ESG information retrieval. This approach utilizes Large Language Models (LLM) enhanced with Retrieval Augmented Generation (RAG) techniques. The ESGReveal system includes an ESG metadata module for targeted queries, a preprocessing module for assembling databases, and an LLM agent for data extraction. Its efficacy was appraised using ESG reports from 166 companies across various sectors listed on the Hong Kong Stock Exchange in 2022, ensuring comprehensive industry and market capitalization representation. Utilizing ESGReveal unearthed significant insights into ESG reporting with GPT-4, demonstrating an accuracy of 76.9% in data extraction and 83.7% in disclosure analysis, which is an improvement over baseline models. This highlights the framework's capacity to refine ESG data analysis precision. Moreover, it revealed a demand for reinforced ESG disclosures, with environmental and social data disclosures standing at 69.5% and 57.2%, respectively, suggesting a pursuit for more corporate transparency. While current iterations of ESGReveal do not process pictorial information, a functionality intended for future enhancement, the study calls for continued research to further develop and compare the analytical capabilities of various LLMs. In summary, ESGReveal is a stride forward in ESG data processing, offering stakeholders a sophisticated tool to better evaluate and advance corporate sustainability efforts. Its evolution is promising in promoting transparency in corporate reporting and aligning with broader sustainable development aims.

esgreveal, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2312.17264

Country: Asia > China > Hong Kong (0.26)

Genre: Research Report > New Finding (1.00)

Industry:

Energy > Oil & Gas (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback