Generative AI
FinRobot: AI Agent for Equity Research and Valuation with Large Language Models
Zhou, Tianyu, Wang, Pinqiao, Wu, Yilin, Yang, Hongyang
As financial markets grow increasingly complex, there is a rising need for automated tools that can effectively assist human analysts in equity research, particularly within sell-side research. While Generative AI (GenAI) has attracted significant attention in this field, existing AI solutions often fall short due to their narrow focus on technical factors and limited capacity for discretionary judgment. These limitations hinder their ability to adapt to new data in real-time and accurately assess risks, which diminishes their practical value for investors. This paper presents FinRobot, the first AI agent framework specifically designed for equity research. FinRobot employs a multi-agent Chain of Thought (CoT) system, integrating both quantitative and qualitative analyses to emulate the comprehensive reasoning of a human analyst. The system is structured around three specialized agents: the Data-CoT Agent, which aggregates diverse data sources for robust financial integration; the Concept-CoT Agent, which mimics an analysts reasoning to generate actionable insights; and the Thesis-CoT Agent, which synthesizes these insights into a coherent investment thesis and report. FinRobot provides thorough company analysis supported by precise numerical data, industry-appropriate valuation metrics, and realistic risk assessments. Its dynamically updatable data pipeline ensures that research remains timely and relevant, adapting seamlessly to new financial information. Unlike existing automated research tools, such as CapitalCube and Wright Reports, FinRobot delivers insights comparable to those produced by major brokerage firms and fundamental research vendors. We open-source FinRobot at \url{https://github. com/AI4Finance-Foundation/FinRobot}.
Generative AI for Data Augmentation in Wireless Networks: Analysis, Applications, and Case Study
Wen, Jinbo, Kang, Jiawen, Niyato, Dusit, Zhang, Yang, Wang, Jiacheng, Sikdar, Biplab, Zhang, Ping
Data augmentation is a powerful technique to mitigate data scarcity. However, owing to fundamental differences in wireless data structures, traditional data augmentation techniques may not be suitable for wireless data. Fortunately, Generative Artificial Intelligence (GenAI) can be an effective alternative to wireless data augmentation due to its excellent data generation capability. This article systemically explores the potential and effectiveness of GenAI-driven data augmentation in wireless networks. We first briefly review data augmentation techniques, discuss their limitations in wireless networks, and introduce generative data augmentation, including reviewing GenAI models and their applications in data augmentation. We then explore the application prospects of GenAI-driven data augmentation in wireless networks from the physical, network, and application layers, which provides a GenAI-driven data augmentation architecture for each application. Subsequently, we propose a general generative diffusion model-based data augmentation framework for Wi-Fi gesture recognition, which uses transformer-based diffusion models to generate high-quality channel state information data. Furthermore, we develop residual neural network models for Wi-Fi gesture recognition to evaluate the role of augmented data and conduct a case study based on a real dataset. Simulation results demonstrate the effectiveness of the proposed framework. Finally, we discuss research directions for generative data augmentation.
Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
He, Yancheng, Li, Shilong, Liu, Jiaheng, Tan, Yingshui, Wang, Weixun, Huang, Hui, Bu, Xingyuan, Guo, Hangyu, Hu, Chengwei, Zheng, Boren, Lin, Zhuoran, Liu, Xuepeng, Sun, Dekai, Lin, Shirong, Zheng, Zhicheng, Zhu, Xiaoyong, Su, Wenbo, Zheng, Bo
New LLM evaluation benchmarks are important to align with the rapid development of Large Language Models (LLMs). In this work, we present Chinese SimpleQA, the first comprehensive Chinese benchmark to evaluate the factuality ability of language models to answer short questions, and Chinese SimpleQA mainly has five properties (i.e., Chinese, Diverse, High-quality, Static, Easy-to-evaluate). Specifically, first, we focus on the Chinese language over 6 major topics with 99 diverse subtopics. Second, we conduct a comprehensive quality control process to achieve high-quality questions and answers, where the reference answers are static and cannot be changed over time. Third, following SimpleQA, the questions and answers are very short, and the grading process is easy-to-evaluate based on OpenAI API. Based on Chinese SimpleQA, we perform a comprehensive evaluation on the factuality abilities of existing LLMs. Finally, we hope that Chinese SimpleQA could guide the developers to better understand the Chinese factuality abilities of their models and facilitate the growth of foundation models.
Students' Perceptions and Use of Generative AI Tools for Programming Across Different Computing Courses
Keuning, Hieke, Alpizar-Chacon, Isaac, Lykourentzou, Ioanna, Beehler, Lauren, Kรถppe, Christian, de Jong, Imke, Sosnovsky, Sergey
Investigation of students' perceptions and opinions on the use of generative artificial intelligence (GenAI) in education is a topic gaining much interest. Studies addressing this are typically conducted with large heterogeneous groups, at one moment in time. However, how students perceive and use GenAI tools can potentially depend on many factors, including their background knowledge, familiarity with the tools, and the learning goals and policies of the courses they are taking. In this study we explore how students following computing courses use GenAI for programming-related tasks across different programs and courses: Bachelor and Master, in courses in which learning programming is the learning goal, courses that require programming as a means to achieve another goal, and in courses in which programming is optional, but can be useful. We are also interested in changes over time, since GenAI capabilities are changing at a fast pace, and users are adopting GenAI increasingly. We conducted three consecutive surveys (fall `23, winter `23, and spring `24) among students of all computing programs of a large European research university. We asked questions on the use in education, ethics, and job prospects, and we included specific questions on the (dis)allowed use of GenAI tools in the courses they were taking at the time. We received 264 responses, which we quantitatively and qualitatively analyzed, to find out how students have employed GenAI tools across 59 different computing courses, and whether the opinion of an average student about these tools evolves over time. Our study contributes to the emerging discussion of how to differentiate GenAI use across different courses, and how to align its use with the learning goals of a computing course.
Baidu announces its own pair of AI smart glasses
Baidu, which is often called China's answer to Google, has launched its own pair of AI-powered smart glasses at its annual World Conference event in Shanghai. The device will run on the company's ERNIE generative AI technology and was designed to "become a private assistant," according to the Financial Times. Users will reportedly be able to interact with the device using their voice and ask it questions about what it sees in their current environment. They can also tell it to play music and even track their calories consumption. And since the glasses are equipped with cameras, they can ask it to snap photos or take videos.
The Download: parkour for robot dogs, and Africa's AI ambitions
Teaching robots to navigate new environments is tough. You can train them on physical, real-world data taken from recordings made by humans, but that's scarce, and expensive to collect. Digital simulations are a rapid, scalable way to teach them to do new things, but the robots often fail when they're pulled out of virtual worlds and asked to do the same tasks in the real one. Now, there's potentially a better option: a new system that uses generative AI models in conjunction with a physics simulator to develop virtual training grounds that more accurately mirror the physical world. Robots trained using this method worked with a higher success rate than those trained using more traditional techniques during real-world tests.
Generative AI taught a robot dog to scramble around a new environment
Researchers used the system, called LucidSim, to train a robot dog in parkour, getting it to scramble over a box and climb stairs even though it had never seen any real-world data. The approach demonstrates how helpful generative AI could be when it comes to teaching robots to do challenging tasks. It also raises the possibility that we could ultimately train them in entirely virtual worlds. The research was presented at the Conference on Robot Learning (CoRL) last week. "We're in the middle of an industrial revolution for robotics," says Ge Yang, a postdoc at MIT's Computer Science and Artificial Intelligence Laboratory, who worked on the project. "This is our attempt at understanding the impact of these [generative AI] models outside of their original intended purposes, with the hope that it will lead us to the next generation of tools and models."
Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering
Sammour, Farouq, Xu, Jia, Wang, Xi, Hu, Mo, Zhang, Zhenyu
Construction remains one of the most hazardous sectors. Recent advancements in AI, particularly Large Language Models (LLMs), offer promising opportunities for enhancing workplace safety. However, responsible integration of LLMs requires systematic evaluation, as deploying them without understanding their capabilities and limitations risks generating inaccurate information, fostering misplaced confidence, and compromising worker safety. This study evaluates the performance of two widely used LLMs, GPT-3.5 and GPT-4o, across three standardized exams administered by the Board of Certified Safety Professionals (BCSP). Using 385 questions spanning seven safety knowledge areas, the study analyzes the models' accuracy, consistency, and reliability. Results show that both models consistently exceed the BCSP benchmark, with GPT-4o achieving an accuracy rate of 84.6% and GPT-3.5 reaching 73.8%. Both models demonstrate strengths in safety management systems and hazard identification and control, but exhibit weaknesses in science, mathematics, emergency response, and fire prevention. An error analysis identifies four primary limitations affecting LLM performance: lack of knowledge, reasoning flaws, memory issues, and calculation errors. Our study also highlights the impact of prompt engineering strategies, with variations in accuracy reaching 13.5% for GPT-3.5 and 7.9% for GPT-4o. However, no single prompt configuration proves universally effective. This research advances knowledge in three ways: by identifying areas where LLMs can support safety practices and where human oversight remains essential, by offering practical insights into improving LLM implementation through prompt engineering, and by providing evidence-based direction for future research and development. These contributions support the responsible integration of AI in construction safety management toward achieving zero injuries.
Evaluating AI-Generated Essays with GRE Analytical Writing Assessment
Zhong, Yang, Hao, Jiangang, Fauss, Michael, Li, Chen, Wang, Yuan
The recent revolutionary advance in generative AI enables the generation of realistic and coherent texts by large language models (LLMs). Despite many existing evaluation metrics on the quality of the generated texts, there is still a lack of rigorous assessment of how well LLMs perform in complex and demanding writing assessments. This study examines essays generated by ten leading LLMs for the analytical writing assessment of the Graduate Record Exam (GRE). We assessed these essays using both human raters and the e-rater automated scoring engine as used in the GRE scoring pipeline. Notably, the top-performing Gemini and GPT-4o received an average score of 4.78 and 4.67, respectively, falling between "generally thoughtful, well-developed analysis of the issue and conveys meaning clearly" and "presents a competent analysis of the issue and conveys meaning with acceptable clarity" according to the GRE scoring guideline. We also evaluated the detection accuracy of these essays, with detectors trained on essays generated by the same and different LLMs.
Using Generative AI and Multi-Agents to Provide Automatic Feedback
Guo, Shuchen, Latif, Ehsan, Zhou, Yifan, Huang, Xuan, Zhai, Xiaoming
This study investigates the use of generative AI and multi-agent systems to provide automatic feedback in educational contexts, particularly for student constructed responses in science assessments. The research addresses a key gap in the field by exploring how multi-agent systems, called AutoFeedback, can improve the quality of GenAI-generated feedback, overcoming known issues such as over-praise and over-inference that are common in single-agent large language models (LLMs). The study developed a multi-agent system consisting of two AI agents: one for generating feedback and another for validating and refining it. The system was tested on a dataset of 240 student responses, and its performance was compared to that of a single-agent LLM. Results showed that AutoFeedback significantly reduced the occurrence of over-praise and over-inference errors, providing more accurate and pedagogically sound feedback. The findings suggest that multi-agent systems can offer a more reliable solution for generating automated feedback in educational settings, highlighting their potential for scalable and personalized learning support. These results have important implications for educators and researchers seeking to leverage AI in formative assessments, offering a pathway to more effective feedback mechanisms that enhance student learning outcomes.