Ma, Lin
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment
Xu, Xiaoxu, Yuan, Yitian, Zhang, Qiudan, Wu, Wenhui, Jie, Zequn, Ma, Lin, Wang, Xu
Learning to ground natural language queries to target objects or regions in 3D point clouds is essential for 3D scene understanding. Nevertheless, existing 3D visual grounding approaches require a substantial number of bounding box annotations for text queries, which are time-consuming and labor-intensive to obtain. In this paper, we propose \textbf{3D-VLA}, a weakly supervised approach for \textbf{3D} visual grounding based on \textbf{V}isual \textbf{L}inguistic \textbf{A}lignment. Our 3D-VLA exploits the superior ability of current large-scale vision-language models (VLMs) to align the semantics of texts and 2D images, as well as the naturally existing correspondences between 2D images and 3D point clouds, and thus implicitly constructs correspondences between texts and 3D point clouds with no need for fine-grained box annotations during training. At inference time, the learned text-3D correspondence helps ground text queries to 3D target objects even without 2D images. To the best of our knowledge, this is the first work to investigate 3D visual grounding in a weakly supervised manner by involving large-scale vision-language models, and extensive experiments on the ReferIt3D and ScanRefer datasets demonstrate that our 3D-VLA achieves results comparable and even superior to those of fully supervised methods.
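Conceptually, the weak supervision reduces to a cross-modal contrastive objective. Below is a minimal, hypothetical sketch of such an alignment loss, assuming 3D proposal features and frozen VLM (e.g., CLIP) features of the matching 2D crops are extracted upstream; the names and the InfoNCE form are our assumptions, not taken from the paper.

```python
# Hypothetical sketch: symmetric InfoNCE aligning 3D object features with
# frozen VLM image features, so text-3D grounding is learned transitively
# (text <-> 2D via the VLM, 2D <-> 3D via geometry) without box labels.
import torch
import torch.nn.functional as F

def align_loss(feat_3d, feat_2d, temperature=0.07):
    """feat_3d: (N, D) 3D proposal features; feat_2d: (N, D) matched 2D
    VLM features. Matched pairs sit on the diagonal of the logits."""
    feat_3d = F.normalize(feat_3d, dim=-1)
    feat_2d = F.normalize(feat_2d, dim=-1)
    logits = feat_3d @ feat_2d.t() / temperature          # (N, N)
    targets = torch.arange(feat_3d.size(0), device=feat_3d.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```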
FastPillars: A Deployment-friendly Pillar-based 3D Detector
Zhou, Sifan, Tian, Zhi, Chu, Xiangxiang, Zhang, Xinyu, Zhang, Bo, Lu, Xiaobo, Feng, Chengjian, Jie, Zequn, Chiang, Patrick Yin, Ma, Lin
The deployment of 3D detectors remains one of the major challenges in real-world self-driving scenarios. Existing BEV-based (i.e., Bird's-Eye-View) detectors favor sparse convolutions (known as SPConv) to speed up training and inference, which poses a hard barrier to deployment, especially for on-device applications. In this paper, to tackle the challenge of efficient 3D object detection from an industry perspective, we devise a deployment-friendly pillar-based 3D detector, termed FastPillars. First, we introduce a novel lightweight Max-and-Attention Pillar Encoding (MAPE) module designed specifically to enhance small 3D objects. Second, we propose a simple yet effective principle for designing a backbone for pillar-based 3D detection. We construct FastPillars based on these designs, achieving high performance and low latency without SPConv. Extensive experiments on two large-scale datasets demonstrate the effectiveness and efficiency of FastPillars for on-device 3D detection in terms of both performance and speed. Specifically, FastPillars delivers state-of-the-art accuracy on the Waymo Open Dataset with a 1.8X speedup and a 3.8 mAPH/L2 improvement over CenterPoint (SPConv-based). Our code is publicly available at: https://github.com/StiphyJay/FastPillars.
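The MAPE module is not specified in detail here; the following is a speculative sketch of a max-plus-attention pillar pooling of the kind the name suggests, with the layer sizes and the additive fusion being assumptions of ours.

```python
# Speculative sketch of Max-and-Attention Pillar Encoding: each pillar's
# point features are reduced by max pooling and by attention pooling, then
# fused additively. All shapes and the fusion rule are assumptions.
import torch
import torch.nn as nn

class MAPE(nn.Module):
    def __init__(self, in_dim=64):
        super().__init__()
        self.score = nn.Linear(in_dim, 1)  # per-point attention logit

    def forward(self, points, mask):
        """points: (P, N, C) point features per pillar; mask: (P, N) bool
        marking valid points (pillars are zero-padded to N points)."""
        neg_inf = torch.finfo(points.dtype).min
        max_feat = points.masked_fill(~mask.unsqueeze(-1), neg_inf).max(1).values
        logits = self.score(points).squeeze(-1).masked_fill(~mask, neg_inf)
        attn = torch.softmax(logits, dim=1).unsqueeze(-1)   # (P, N, 1)
        attn_feat = (attn * points).sum(dim=1)              # (P, C)
        return max_feat + attn_feat                         # (P, C)
```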
EKGNet: A 10.96{\mu}W Fully Analog Neural Network for Intra-Patient Arrhythmia Classification
Haghi, Benyamin, Ma, Lin, Lale, Sahin, Anandkumar, Anima, Emami, Azita
We present an integrated approach combining analog computing and deep learning for electrocardiogram (ECG) arrhythmia classification. The ECG is crucial for monitoring heart health in medical practice [1], [2]. However, accurately detecting and categorizing different waveforms and morphologies in ECG signals is challenging, similar to other time-series data, and manual analysis is time-consuming and prone to errors. Given the prevalence and potential lethality of irregular heartbeats, achieving accurate and cost-effective diagnosis of arrhythmic heartbeats is crucial for effectively managing and preventing cardiovascular conditions [3], [4]. Despite the challenges associated with analog circuits, such as susceptibility to noise and device variation, they can be effectively utilized for inferring neural network algorithms; indeed, the inherent system noise of analog circuits can be leveraged to enhance robustness and improve classification accuracy, aligning with the desirable properties of AI algorithms [24]-[26]. In this paper, we propose EKGNet, a fully analog neural network with low power consumption (10.96 μW). Experimental evaluations on PhysioNet's MIT-BIH and PTB Diagnostics datasets demonstrate the effectiveness of the proposed approach.
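As a software analogue of the noise-robustness argument, one can emulate analog system noise during training by perturbing activations; the sketch below is purely illustrative, does not reproduce the paper's analog hardware, and uses an assumed, untuned noise magnitude.

```python
# Illustrative sketch: Gaussian noise injection into activations during
# training, mimicking the inherent analog system noise that the paper
# argues can improve robustness. sigma is an assumption.
import torch
import torch.nn as nn

class NoisyReLU(nn.Module):
    def __init__(self, sigma=0.05):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        if self.training:                       # perturb only in training
            x = x + self.sigma * torch.randn_like(x)
        return torch.relu(x)
```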
LMEye: An Interactive Perception Network for Large Language Models
Li, Yunxin, Hu, Baotian, Chen, Xinyu, Ma, Lin, Xu, Yong, Zhang, Min
Training a Multimodal Large Language Model (MLLM) from scratch, like GPT-4, is resource-intensive. Regarding Large Language Models (LLMs) as the core processor for multimodal information, our paper introduces LMEye, a human-like eye with a plug-and-play interactive perception network, designed to enable dynamic interaction between LLMs and external vision information. Previous methods incorporate visual information into LLMs with a simple visual mapping network or the Q-Former from BLIP-2. Such networks project the image feature once and do not consider the interaction between the image and the human input query. Hence, the obtained visual information, not being connected to human intention, may be inadequate for LLMs to generate intention-following responses; we refer to it as static visual information. LMEye addresses this issue by allowing the LLM to request the desired visual information aligned with various human instructions, which we term dynamic visual information interaction. Specifically, LMEye consists of a simple visual mapping network that provides the basic perception of an image for LLMs. It also contains additional modules responsible for acquiring requests from LLMs, performing request-based visual information interaction, and transmitting the resulting interacted visual information to LLMs, respectively. In this way, LLMs understand the human query, deliver the corresponding request to the request-based visual information interaction module, and generate the response based on the interleaved multimodal information. We evaluate LMEye through extensive experiments on several multimodal benchmarks, demonstrating that it significantly improves zero-shot performance on various multimodal tasks compared to previous methods, with fewer parameters.
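The request-based interaction can be pictured as a single cross-attention step in which an LLM-produced request vector queries the image features; the sketch below is a hypothetical rendering of that idea, with dimensions and module names assumed rather than taken from the paper.

```python
# Hypothetical sketch of request-based visual information interaction:
# a request vector emitted by the LLM cross-attends over image features,
# and the attended result is handed back to the LLM.
import torch
import torch.nn as nn

class RequestInteraction(nn.Module):
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, request, image_feats):
        """request: (B, 1, D) from the LLM; image_feats: (B, L, D)."""
        interacted, _ = self.attn(request, image_feats, image_feats)
        return interacted                        # (B, 1, D)
```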
SoccerNet 2023 Challenges Results
Cioppa, Anthony, Giancola, Silvio, Somers, Vladimir, Magera, Floriane, Zhou, Xin, Mkhallati, Hassan, Deliège, Adrien, Held, Jan, Hinojosa, Carlos, Mansourian, Amir M., Miralles, Pierre, Barnich, Olivier, De Vleeschouwer, Christophe, Alahi, Alexandre, Ghanem, Bernard, Van Droogenbroeck, Marc, Kamal, Abdullah, Maglo, Adrien, Clapés, Albert, Abdelaziz, Amr, Xarles, Artur, Orcesi, Astrid, Scott, Atom, Liu, Bin, Lim, Byoungkwon, Chen, Chen, Deuser, Fabian, Yan, Feng, Yu, Fufu, Shitrit, Gal, Wang, Guanshuo, Choi, Gyusik, Kim, Hankyul, Guo, Hao, Fahrudin, Hasby, Koguchi, Hidenari, Ardö, Håkan, Salah, Ibrahim, Yerushalmy, Ido, Muhammad, Iftikar, Uchida, Ikuma, Be'ery, Ishay, Rabarisoa, Jaonary, Lee, Jeongae, Fu, Jiajun, Yin, Jianqin, Xu, Jinghang, Nang, Jongho, Denize, Julien, Li, Junjie, Zhang, Junpei, Kim, Juntae, Synowiec, Kamil, Kobayashi, Kenji, Zhang, Kexin, Habel, Konrad, Nakajima, Kota, Jiao, Licheng, Ma, Lin, Wang, Lizhi, Wang, Luping, Li, Menglong, Zhou, Mengying, Nasr, Mohamed, Abdelwahed, Mohamed, Liashuha, Mykola, Falaleev, Nikolay, Oswald, Norbert, Jia, Qiong, Pham, Quoc-Cuong, Song, Ran, Hérault, Romain, Peng, Rui, Chen, Ruilong, Liu, Ruixuan, Baikulov, Ruslan, Fukushima, Ryuto, Escalera, Sergio, Lee, Seungcheon, Chen, Shimin, Ding, Shouhong, Someya, Taiga, Moeslund, Thomas B., Li, Tianjiao, Shen, Wei, Zhang, Wei, Li, Wei, Dai, Wei, Luo, Weixin, Zhao, Wending, Zhang, Wenjie, Yang, Xinquan, Ma, Yanbiao, Joo, Yeeun, Zeng, Yingsen, Gan, Yiyang, Zhu, Yongqiang, Zhong, Yujie, Ruan, Zheng, Li, Zhiheng, Huang, Zhijian, Meng, Ziyu
The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, focusing on retrieving all timestamps related to global actions in soccer, (2) ball action spotting, focusing on retrieving all timestamps related to changes of state of the soccer ball, and (3) dense video captioning, focusing on describing the broadcast with natural language and anchored timestamps. The second theme, field understanding, relates to the single task of (4) camera calibration, focusing on retrieving the intrinsic and extrinsic camera parameters from images. The third and last theme, player understanding, is composed of three low-level tasks related to extracting information about the players: (5) re-identification, focusing on retrieving the same players across multiple views, (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams, and (7) jersey number recognition, focusing on recognizing the jersey number of players from tracklets. Compared to the previous editions of the SoccerNet challenges, tasks (2-3-7) are novel, including new annotations and data, task (4) was enhanced with more data and annotations, and task (6) now focuses on end-to-end approaches. More information on the tasks, challenges, and leaderboards is available at https://www.soccer-net.org. Baselines and development kits can be found at https://github.com/SoccerNet.
Leveraging Global Binary Masks for Structure Segmentation in Medical Images
Kazemimoghadam, Mahdieh, Yang, Zi, Ma, Lin, Chen, Mingli, Lu, Weiguo, Gu, Xuejun
Deep learning (DL) models for medical image segmentation are highly influenced by intensity variations of input images and lack generalization because they primarily rely on pixels' intensity information for inference. Acquiring sufficient training data is another challenge limiting models' applications. We propose to leverage the consistency of organs' anatomical shape and position information in medical images. We introduce a framework leveraging recurring anatomical patterns through global binary masks for organ segmentation. Two scenarios were studied. 1) Global binary masks were the model's (i.e., U-Net) only input, forcing the model to exclusively encode organs' position and shape information for segmentation/localization. 2) Global binary masks were incorporated as an additional channel, functioning as position/shape cues to mitigate training data scarcity. Two datasets of brain and heart CT images with their ground truth were split into (26:10:10) and (12:3:5) for training, validation, and test, respectively. Training exclusively on global binary masks led to Dice scores of 0.77(0.06) and 0.85(0.04), with average Euclidean distances of 3.12(1.43) mm and 2.5(0.93) mm relative to the center of mass of the ground truth for the brain and heart structures, respectively. The outcomes indicate that a surprising degree of position and shape information is encoded through global binary masks. Incorporating global binary masks led to significantly higher accuracy than the model trained on only CT images in small subsets of the training data; performance improved by 4.3-125.3% and 1.3-48.1% for 1-8 training cases of the brain and heart datasets, respectively. The findings imply the advantages of utilizing global binary masks for building generalizable models and for compensating for training data scarcity.
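The second scenario amounts to stacking the binary mask as an extra input channel; a minimal sketch, assuming a standard 2D U-Net (external to this snippet) whose first convolution accepts two channels:

```python
# Minimal sketch of scenario 2: concatenate the global binary mask to the
# CT slice as a second input channel. The U-Net itself is assumed external.
import torch

def make_input(ct, mask):
    """ct: (B, 1, H, W) intensity image; mask: (B, 1, H, W) binary mask."""
    return torch.cat([ct, mask.float()], dim=1)   # (B, 2, H, W)

# logits = unet(make_input(ct, mask))  # unet built with in_channels=2
```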
Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding
Jiao, Yang, Jie, Zequn, Chen, Jingjing, Ma, Lin, Jiang, Yu-Gang
Recently, one-stage visual grounders have attracted considerable attention due to their comparable accuracy but significantly higher efficiency than two-stage grounders. However, inter-object relation modeling has not been well studied for one-stage grounders. Inter-object relationship modeling, though important, need not be performed among all objects, as only a subset of them are related to the text query and may confuse the model. We call these objects suspected objects. However, exploring their relationships in the one-stage paradigm is non-trivial for two reasons. First, no object proposals are available as the basis on which to select suspected objects and perform relationship modeling. Second, suspected objects are more confusing than others, as they may share similar semantics or be entangled with certain relationships, and can thereby more easily mislead the model's prediction. Toward this end, we propose a Suspected Object Transformation (SOT) mechanism, which can be seamlessly integrated into existing CNN- and Transformer-based one-stage visual grounders to encourage target object selection among the suspected ones. Suspected objects are dynamically discovered from a learned activation map adapted to the model's current discrimination ability during training. Afterward, on top of the suspected objects, a Keyword-Aware Discrimination module (KAD) and an Exploration by Random Connection strategy (ERC) are concurrently proposed to help the model rethink its initial prediction. On the one hand, KAD leverages keywords that contribute most to discriminating among suspected objects. On the other hand, ERC allows the model to seek the correct object rather than being trapped into always exploiting its current false prediction. Extensive experiments demonstrate the effectiveness of our proposed method.
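The dynamic discovery step can be approximated as picking the top-k peaks of the grounding activation map; the sketch below is an assumption-laden illustration (the value of k, the flattening, and the absence of non-maximum suppression are our choices, not the paper's).

```python
# Illustrative sketch: select "suspected" locations as the top-k cells of a
# learned activation map; features at these indices would then feed SOT.
import torch

def suspected_locations(act_map, k=5):
    """act_map: (B, H, W) activation map. Returns flat indices (B, k) and
    their scores; callers can gather features at these positions."""
    flat = act_map.flatten(1)                  # (B, H*W)
    scores, idx = flat.topk(k, dim=1)          # highest-activation cells
    return idx, scores
```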
Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields
Hu, Wenbo, Wang, Yuling, Ma, Lin, Yang, Bangbang, Gao, Lin, Liu, Xiao, Ma, Yuewen
Despite the tremendous progress in neural radiance fields (NeRF), we still face a trade-off between quality and efficiency: e.g., MipNeRF presents fine-detailed and anti-aliased renderings but takes days to train, while Instant-ngp can accomplish reconstruction in a few minutes but suffers from blurring or aliasing when rendering at various distances or resolutions because it ignores the sampling area. To this end, we propose a novel Tri-Mip encoding that enables both instant reconstruction and anti-aliased, high-fidelity rendering for neural radiance fields. The key is to factorize the pre-filtered 3D feature space into three orthogonal mipmaps. In this way, we can efficiently perform 3D area sampling by taking advantage of 2D pre-filtered feature maps, which significantly elevates rendering quality without sacrificing efficiency. To cope with the novel Tri-Mip representation, we propose a cone-casting rendering technique that efficiently samples anti-aliased 3D features with the Tri-Mip encoding, considering both pixel imaging and observation distance. Extensive experiments on both synthetic and real-world datasets demonstrate that our method achieves state-of-the-art rendering quality and reconstruction speed while maintaining a compact representation that reduces model size by 25% compared with Instant-ngp.
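At a single mip level, querying the representation reduces to projecting each 3D sample onto the three orthogonal planes and sampling them bilinearly; the sketch below shows only this step, assuming concatenation as the fusion rule and omitting the mip-level selection/interpolation that the cone-cast sample radius drives in the actual method.

```python
# Simplified sketch of a Tri-Mip-style query at one mip level: project the
# 3D point onto the XY, XZ and YZ feature planes and bilinearly sample.
# Fusion by concatenation and the coordinate convention are assumptions.
import torch
import torch.nn.functional as F

def query_planes(planes, xyz):
    """planes: (3, C, H, W) feature planes at one level; xyz: (N, 3) in
    [-1, 1]. Returns (N, 3*C) concatenated plane features."""
    grids = torch.stack([xyz[:, [0, 1]],   # XY plane
                         xyz[:, [0, 2]],   # XZ plane
                         xyz[:, [1, 2]]])  # YZ plane -> (3, N, 2)
    out = F.grid_sample(planes, grids.unsqueeze(1), align_corners=True)
    return out.squeeze(2).permute(2, 0, 1).reshape(xyz.size(0), -1)
```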
PaVa: a novel Path-based Valley-seeking clustering algorithm
Ma, Lin, Liu, Conan, Ma, Tiefeng, Liu, Shuangzhe
Clustering methods are being applied to a wider range of scenarios involving more complex datasets, where the shapes of clusters tend to be arbitrary. In this paper, we propose PaVa, a novel Path-based Valley-seeking clustering algorithm for arbitrarily shaped clusters. This work aims to seek the valleys among clusters and then extract the clusters individually. Three vital techniques are used in this algorithm. First, the path distance (minmax distance) is employed to transform the irregular boundaries among clusters, that is, the density valleys, into perfect spherical shells. Second, a suitable density measurement, $k$-distance, is employed to adjust the Minimum Spanning Tree, from which a robust minmax distance is calculated. Third, we seek the transformed density valleys by determining their centers and radii. These techniques yield three advantages. First, the clusters are wrapped in spherical shells after the distance transformation, making the extraction process efficient even for clusters of arbitrary shape. Second, the adjusted Minimum Spanning Tree enhances the robustness of the minmax distance under different kinds of noise. Last, the number of clusters does not need to be specified or decided manually, owing to the individual extraction process. Applied to several commonly used synthetic datasets, the proposed Path-based Valley-seeking algorithm proves accurate and efficient. Since the algorithm is based on the dissimilarity of objects, it can be applied to a wide range of fields; its performance on real-world datasets illustrates its versatility.
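The minmax (path) distance itself is standard and can be computed exactly from a Minimum Spanning Tree: the distance between two points is the largest edge on their unique MST path. A small self-contained sketch follows (the paper's $k$-distance adjustment of the MST is omitted here):

```python
# Sketch of the minmax (path) distance via Kruskal-style merging over MST
# edges: the edge that first joins two components is the minimax distance
# between every pair of points straddling them. O(n^2) for clarity.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def minimax_distances(X):
    """X: (n, d) data. Returns the (n, n) minimax path distance matrix."""
    D = squareform(pdist(X))
    mst = minimum_spanning_tree(D).toarray()
    n = len(X)
    rows, cols = np.nonzero(mst)
    edges = sorted(zip(mst[rows, cols], rows, cols))
    out = np.zeros((n, n))
    parent = list(range(n))
    comps = {i: [i] for i in range(n)}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for w, i, j in edges:                   # ascending edge weights
        ri, rj = find(int(i)), find(int(j))
        for a in comps[ri]:
            for b in comps[rj]:
                out[a, b] = out[b, a] = w
        comps[ri].extend(comps[rj])
        parent[rj] = ri
        del comps[rj]
    return out
```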
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
Li, Yunxin, Hu, Baotian, Chen, Xinyu, Ding, Yuxin, Ma, Lin, Zhang, Min
Conditional inference on joint textual and visual clues is a multi-modal reasoning task in which textual clues provide prior permutation or external knowledge, complementary to the visual content and pivotal to deducing the correct option. Previous methods utilizing pretrained vision-language models (VLMs) have achieved impressive performance, yet they lack multimodal context reasoning capability, especially for text-modal information. To address this issue, we propose a Multi-modal Context Reasoning approach, named ModCR. Unlike VLMs that perform reasoning via cross-modal semantic alignment, ModCR regards the given abstract textual semantics and objective image information as pre-context information and embeds them into the language model to perform context reasoning. Different from recent vision-aided language models used in natural language processing, ModCR incorporates multi-view semantic alignment information between language and vision by introducing a learnable alignment prefix between image and text into the pretrained language model. This makes the language model well suited to such multi-modal reasoning scenarios on joint textual and visual clues. We conduct extensive experiments on two corresponding datasets, and the results show significantly improved performance (an exact gain of 4.8% on the PMR test set) compared to previous strong baselines. Code Link: \url{https://github.com/YunxinLi/Multimodal-Context-Reasoning}.
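The learnable alignment prefix can be pictured as a handful of vectors projected from the image representation and prepended to the language model's token embeddings; the sketch below is a hypothetical rendering with assumed dimensions and prefix length.

```python
# Hypothetical sketch of a learnable alignment prefix: map image features
# to a few prefix vectors and prepend them to the LM's token embeddings.
import torch
import torch.nn as nn

class AlignmentPrefix(nn.Module):
    def __init__(self, img_dim=512, lm_dim=768, prefix_len=4):
        super().__init__()
        self.proj = nn.Linear(img_dim, prefix_len * lm_dim)
        self.prefix_len, self.lm_dim = prefix_len, lm_dim

    def forward(self, img_feat, token_embeds):
        """img_feat: (B, img_dim); token_embeds: (B, T, lm_dim)."""
        prefix = self.proj(img_feat).view(-1, self.prefix_len, self.lm_dim)
        return torch.cat([prefix, token_embeds], dim=1)  # (B, P+T, lm_dim)
```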