AITopics | Zhang, Yitong


Zhang, Yitong

DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models

Huang, Yiming, Luo, Jianwen, Yu, Yan, Zhang, Yitong, Lei, Fangyu, Wei, Yifan, He, Shizhu, Huang, Lifu, Liu, Xiao, Zhao, Jun, Liu, Kang

arXiv.org Artificial Intelligence, Oct-10-2024

We introduce DA-Code, a code generation benchmark specifically designed to assess LLMs on agent-based data science tasks. The benchmark has three core features. First, the tasks in DA-Code are inherently challenging, which sets them apart from traditional code generation tasks and demands advanced coding skills in grounding and planning. Second, all examples in DA-Code are based on real and diverse data, covering a wide range of complex data wrangling and analytics tasks. Third, to solve the tasks, the models must use complex data science programming languages to perform intricate data processing and derive the answers. We set up the benchmark in a controllable, executable, and scalable environment that aligns with real-world data analysis scenarios. The annotators meticulously designed the evaluation suite to ensure the accuracy and robustness of the evaluation. We also develop the DA-Agent baseline. Experiments show that although the baseline performs better than other existing frameworks, it achieves only 30.5% accuracy even with the current best LLMs, leaving ample room for improvement. We release our benchmark at https://da-code-bench.github.io.

large language model, machine learning, natural language

2410.07331

Country:

Asia (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology (1.00)
Transportation > Ground > Road (0.67)
Transportation > Electric Vehicle (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
SurgT challenge: Benchmark of Soft-Tissue Trackers for Robotic Surgery

Cartucho, Joao, Weld, Alistair, Tukra, Samyakh, Xu, Haozheng, Matsuzaki, Hiroki, Ishikawa, Taiyo, Kwon, Minjun, Jang, Yong Eun, Kim, Kwang-Ju, Lee, Gwang, Bai, Bizhe, Kahrs, Lueder, Boecking, Lars, Allmendinger, Simeon, Muller, Leopold, Zhang, Yitong, Jin, Yueming, Bano, Sophia, Vasconcelos, Francisco, Reiter, Wolfgang, Hajek, Jonas, Silva, Bruno, Lima, Estevao, Vilaca, Joao L., Queiros, Sandro, Giannarou, Stamatia

arXiv.org Artificial Intelligence, Aug-30-2023

This paper introduces the "SurgT: Surgical Tracking" challenge, which was organised in conjunction with MICCAI 2022. The challenge had two purposes: (1) to establish the first standardised benchmark for the research community to assess soft-tissue trackers; and (2) to encourage the development of unsupervised deep learning methods, given the lack of annotated data in surgery. A dataset of 157 stereo endoscopic videos from 20 clinical cases, along with stereo camera calibration parameters, has been provided. Participants were assigned the task of developing algorithms to track the movement of soft tissues, represented by bounding boxes, in stereo endoscopic videos. At the end of the challenge, the developed methods were assessed on a previously hidden test subset. This assessment uses benchmarking metrics that were purposely developed for this challenge, to verify the efficacy of unsupervised deep learning algorithms in tracking soft tissue. The metric used for ranking the methods was the Expected Average Overlap (EAO) score, which measures the average overlap between a tracker's bounding boxes and the ground-truth bounding boxes. First place went to the deep learning submission by ICVS-2Ai, with an EAO score of 0.617. This method employs ARFlow to estimate unsupervised dense optical flow from cropped images, using photometric and regularization losses. Second place went to Jmees, with an EAO of 0.583, which uses deep learning for surgical tool segmentation on top of a non-deep learning baseline method, CSRT. CSRT by itself scores a similar EAO of 0.563. The results from this challenge show that non-deep learning methods are currently still competitive. The dataset and benchmarking tool created for this challenge have been made publicly available at https://surgt.grand-challenge.org/.
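To make the idea of "average overlap" between tracked and ground-truth bounding boxes concrete, here is a minimal Python sketch. It is a simplification and not the challenge's actual EAO implementation: it assumes boxes are given as (x, y, width, height), measures overlap as intersection over union, and averages over the frames of a single sequence. The names `iou` and `average_overlap` are hypothetical and chosen for illustration only.

```python
from typing import Sequence, Tuple

# Assumed box layout for this sketch: (x, y, width, height)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the intersection rectangle (zero if disjoint)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(pred: Sequence[Box], truth: Sequence[Box]) -> float:
    """Mean per-frame IoU over one sequence (a simplified stand-in for EAO)."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of frames")
    if not pred:
        return 0.0
    return sum(iou(p, t) for p, t in zip(pred, truth)) / len(pred)
```

A score near 1 would mean the tracker's boxes almost coincide with the ground truth in every frame, while a tracker that loses the target in half the frames would score roughly half as much under this simplified measure.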

deep learning, machine learning, soft-tissue tracker

2302.03022

Genre: Research Report (0.40)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
