Chen, Cheng
AI Coach Assist: An Automated Approach for Call Recommendation in Contact Centers for Agent Coaching
Laskar, Md Tahmid Rahman, Chen, Cheng, Fu, Xue-Yong, Azizi, Mahsa, Bhushan, Shashi, Corston-Oliver, Simon
In recent years, the use of Artificial Intelligence (AI) in the contact center industry has been on the rise. One area where AI can have a significant impact is the coaching of contact center agents. By analyzing call transcripts with Natural Language Processing (NLP) techniques, it is possible to quickly determine which calls are most relevant for coaching purposes. In this paper, we present AI Coach Assist, which leverages pre-trained transformer-based language models to determine whether a given call is coachable based on the quality assurance (QA) questions asked by contact center managers or supervisors. The system was trained and evaluated on a large dataset collected from real-world contact centers and provides an effective way to recommend to contact center managers the calls most likely to contain coachable moments. Our experimental findings demonstrate the potential of AI Coach Assist to improve the coaching process, thereby enhancing the performance of contact center agents.
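For intuition, here is a minimal sketch of the kind of transformer-based coachability classifier the abstract describes, assuming a generic pre-trained encoder (bert-base-uncased as a stand-in) and hypothetical QA-question/transcript inputs; the paper's actual model, data format, and labels are not public.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Stand-in encoder; in practice the classification head would be fine-tuned
# on calls labeled coachable / not coachable for each QA question.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

qa_question = "Did the agent confirm the customer's issue was resolved?"  # hypothetical
transcript = "Agent: Thanks for calling ... Customer: My issue is ..."    # hypothetical

# Encode the QA question and the call transcript as a sentence pair.
inputs = tokenizer(qa_question, transcript, truncation=True,
                   max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(f"P(coachable) = {torch.softmax(logits, dim=-1)[0, 1].item():.2f}")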
FedMR: Federated Learning via Model Recombination
Hu, Ming, Yue, Zhihao, Ling, Zhiwei, Huang, Yihao, Chen, Cheng, Wei, Xian, Liu, Yang, Chen, Mingsong
Although Federated Learning (FL) enables global model training across clients without compromising their raw data, existing Federated Averaging (FedAvg)-based methods suffer from low inference performance, especially when data are unevenly distributed among clients. This is mainly because i) FedAvg initializes all client models with the same global model, which makes it hard for local training to escape poor local optima; and ii) by averaging model parameters in a coarse manner, FedAvg eclipses the individual characteristics of local models. To address these issues, which strongly limit the inference capability of FL, we propose a novel and effective FL paradigm named FedMR (Federated Model Recombination). Unlike conventional FedAvg-based methods, the cloud server in FedMR shuffles each layer of the collected local models and recombines them into new models for local training on clients. Thanks to the diversified initialization models for clients coupled with fine-grained model recombination, FedMR converges to a well-generalized global model for all clients, leading to superior inference performance. Experimental results show that, compared with state-of-the-art FL methods, FedMR significantly improves inference accuracy and converges more quickly, without exposing client privacy.
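The server-side recombination step can be sketched in a few lines; this assumes PyTorch state_dicts from architecturally identical clients and illustrates the layer-shuffling idea rather than the paper's exact implementation.

import random

def recombine(client_state_dicts):
    """FedMR-style recombination: each layer of each new model is drawn
    from a randomly permuted client, giving diversified initializations."""
    n = len(client_state_dicts)
    recombined = [{} for _ in range(n)]
    for name in client_state_dicts[0]:          # iterate over layers
        perm = random.sample(range(n), n)       # fresh permutation per layer
        for dst, src in enumerate(perm):
            recombined[dst][name] = client_state_dicts[src][name].clone()
    return recombined  # one recombined model per client; no averaging

Each recombined state_dict is then sent back to a client for the next round of local training, in place of FedAvg's single averaged model.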
Adversary-Aware Partial Label Learning with Label Distillation
Chen, Cheng, Lyu, Yueming, Tsang, Ivor W.
To protect the privacy of data collected from human subjects, rival labels are deliberately introduced to conceal the information provided by participants. The corresponding learning task can be formulated as a noisy partial-label learning problem. However, conventional partial-label learning (PLL) methods remain vulnerable to a high ratio of noisy partial labels, especially in a large labelling space. To learn a more robust model, we present Adversary-Aware Partial Label Learning, which introduces the $\textit{rival}$, a set of noisy labels, into the collection of candidate labels for each instance. By introducing the rival label, the predictive distribution of PLL is factorised such that a handy predictive label is obtained with less uncertainty from the transition matrix, assuming the rival generation process is known. Nonetheless, the predictive accuracy is still insufficient to produce a sufficiently accurate positive sample set for leveraging the clustering effect of the contrastive loss function. Moreover, the inclusion of rivals also introduces an inconsistency issue for the classifier and risk function due to the intractability of the transition matrix. Consequently, we propose an adversarial teacher within momentum (ATM) disambiguation algorithm to cope with this situation, allowing us to obtain a provably consistent classifier and risk function. In addition, our method shows high resilience to the choice of the label-noise transition matrix. Extensive experiments demonstrate that our method achieves promising results on the CIFAR10, CIFAR100 and CUB200 datasets.
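As a concrete illustration of the rival mechanism, the sketch below injects rival labels into each candidate set via a known transition matrix, as the abstract assumes; the matrix and sampling scheme here are hypothetical placeholders, and the factorised predictive distribution and ATM algorithm are not reproduced.

import numpy as np

rng = np.random.default_rng(0)
num_classes = 5
# Hypothetical rival transition matrix T: T[y, r] is the chance that
# rival r is attached when the true label is y (uniform off-diagonal here).
T = np.full((num_classes, num_classes), 0.25 / (num_classes - 1))
np.fill_diagonal(T, 0.0)

def make_candidate_set(true_label, num_rivals=2):
    """Candidate set = true label plus rivals drawn from the T row."""
    row = T[true_label] / T[true_label].sum()
    rivals = rng.choice(num_classes, size=num_rivals, replace=False, p=row)
    return {true_label, *rivals.tolist()}

print(make_candidate_set(3))  # e.g. {1, 3, 4}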
Diffusion Model based Semi-supervised Learning on Brain Hemorrhage Images for Efficient Midline Shift Quantification
Gong, Shizhan, Chen, Cheng, Gong, Yuqi, Chan, Nga Yan, Ma, Wenao, Mak, Calvin Hoi-Kwan, Abrigo, Jill, Dou, Qi
Brain midline shift (MLS) is one of the most critical factors considered in clinical diagnosis and treatment decision-making for intracranial hemorrhage. Existing computational methods for MLS quantification not only require labor-intensive millimeter-level annotation but also suffer from poor performance due to their dependence on specific landmarks or simplified anatomical assumptions. In this paper, we propose a novel semi-supervised framework to accurately measure the scale of MLS from head CT scans. We formulate MLS measurement as a deformation estimation problem and solve it using a few MLS slices with sparse labels. Meanwhile, with the help of diffusion models, we are able to use a large amount of unlabeled MLS data and 2793 non-MLS cases for representation learning and regularization. The extracted representation reflects how an image differs from a non-MLS image, and the regularization plays an important role in the sparse-to-dense refinement of the deformation field. Our method achieves state-of-the-art performance on a real clinical brain hemorrhage dataset and generates interpretable deformation fields.
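To make the formulation concrete, the toy sketch below reads an MLS value off a predicted horizontal deformation field; taking the midline as the central image column and the maximum displacement along it is an illustrative assumption, not the paper's exact measurement protocol.

import numpy as np

def mls_from_deformation(dx, spacing_mm=0.5):
    """dx: (H, W) horizontal displacement field in pixels; returns MLS in mm."""
    midline_col = dx.shape[1] // 2
    return np.abs(dx[:, midline_col]).max() * spacing_mm

dx = np.zeros((256, 256))
dx[100:140, 128] = 12.0                            # toy deformation near the midline
print(f"MLS = {mls_from_deformation(dx):.1f} mm")  # -> MLS = 6.0 mm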
Finding Second-Order Stationary Points in Nonconvex-Strongly-Concave Minimax Optimization
Luo, Luo, Li, Yujun, Chen, Cheng
We study the smooth minimax optimization problem $\min_{\bf x}\max_{\bf y} f({\bf x},{\bf y})$, where $f$ is $\ell$-smooth, strongly-concave in ${\bf y}$ but possibly nonconvex in ${\bf x}$. Most existing works focus on finding first-order stationary points of the function $f({\bf x},{\bf y})$ or its primal function $P({\bf x})\triangleq \max_{\bf y} f({\bf x},{\bf y})$, but few focus on achieving second-order stationary points. In this paper, we propose a novel approach for minimax optimization, called Minimax Cubic Newton (MCN), which finds an $\big(\varepsilon,\kappa^{1.5}\sqrt{\rho\varepsilon}\,\big)$-second-order stationary point of $P({\bf x})$ using ${\mathcal O}\big(\kappa^{1.5}\sqrt{\rho}\varepsilon^{-1.5}\big)$ calls to second-order oracles and $\tilde{\mathcal O}\big(\kappa^{2}\sqrt{\rho}\varepsilon^{-1.5}\big)$ calls to first-order oracles, where $\kappa$ is the condition number and $\rho$ is the Lipschitz constant of the Hessian of $f({\bf x},{\bf y})$. In addition, we propose an inexact variant of MCN for high-dimensional problems to avoid calling expensive second-order oracles. Instead, our method solves the cubic sub-problem inexactly via gradient descent and matrix Chebyshev expansion. This strategy still obtains the desired approximate second-order stationary point with high probability, but requires only $\tilde{\mathcal O}\big(\kappa^{1.5}\ell\varepsilon^{-2}\big)$ Hessian-vector oracle calls and $\tilde{\mathcal O}\big(\kappa^{2}\sqrt{\rho}\varepsilon^{-1.5}\big)$ first-order oracle calls. To the best of our knowledge, this is the first work to consider the non-asymptotic convergence behavior of finding second-order stationary points for minimax problems without convex-concave assumptions.
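For reference, the standard notions invoked above can be stated as follows (following the usual cubic-regularization literature; the paper's exact constants may differ):

$$\text{${\bf x}$ is an $(\varepsilon,\delta)$-second-order stationary point of } P \iff \|\nabla P({\bf x})\| \le \varepsilon \ \text{ and } \ \lambda_{\min}\big(\nabla^2 P({\bf x})\big) \ge -\delta, \quad \text{here } \delta = \kappa^{1.5}\sqrt{\rho\varepsilon}.$$

$$\text{Cubic-regularized Newton step:}\quad {\bf s}_t = \mathop{\arg\min}_{\bf s}\ \big\langle \nabla P({\bf x}_t), {\bf s}\big\rangle + \tfrac{1}{2}\,{\bf s}^{\top}\nabla^2 P({\bf x}_t)\,{\bf s} + \tfrac{M}{6}\,\|{\bf s}\|^3, \qquad {\bf x}_{t+1} = {\bf x}_t + {\bf s}_t.$$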
Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval
Wen, Keyu, Tan, Zhenshan, Cheng, Qingrong, Chen, Cheng, Gu, Xiaodong
Recently, cross-modal pre-training has become a research hotspot because of its wide application in various downstream tasks, including retrieval, captioning, and question answering. However, existing methods adopt a single-stream pre-training model to learn a unified vision-language representation for cross-modal retrieval, which suffers from prohibitive computational cost. Moreover, although conventional double-stream structures are quite efficient, they still lack vital cross-modal interactions, resulting in lower performance. Motivated by these challenges, we put forward a Contrastive Cross-Modal Knowledge Sharing Pre-training (COOKIE) method to learn joint text-image representations. Structurally, COOKIE adopts the traditional double-stream structure because of its acceptable time consumption. To overcome the inherent deficiencies of the double-stream structure mentioned above, we elaborately design two effective modules. Concretely, the first module is a weight-sharing transformer built on top of the visual and textual encoders, aiming to semantically align text and image. This design enables the visual and textual paths to focus on the same semantics. The other is three specially designed contrastive learning objectives, aiming to share knowledge between different models. The shared cross-modal knowledge greatly advances unimodal representation learning, thereby promoting single-modal retrieval tasks. Extensive experiments on multi-modal matching tasks, including cross-modal retrieval, text matching, and image retrieval, demonstrate the superiority of our pre-training model in terms of both computational efficiency and retrieval performance.
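The generic image-text alignment term underlying such contrastive pre-training can be sketched as a symmetric InfoNCE loss; the three specific objectives in COOKIE are not spelled out in the abstract, so this shows only the standard cross-modal alignment idea.

import torch
import torch.nn.functional as F

def cross_modal_infonce(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss: matched image-text pairs sit on the
    diagonal of the similarity matrix and are pulled together."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature                 # (B, B) similarities
    targets = torch.arange(img.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +     # image -> text
                  F.cross_entropy(logits.t(), targets))  # text -> image

loss = cross_modal_infonce(torch.randn(8, 256), torch.randn(8, 256))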
Federated Learning Enables Big Data for Rare Cancer Boundary Detection
Pati, Sarthak, Baid, Ujjwal, Edwards, Brandon, Sheller, Micah, Wang, Shih-Han, Reina, G Anthony, Foley, Patrick, Gruzdev, Alexey, Karkada, Deepthi, Davatzikos, Christos, Sako, Chiharu, Ghodasara, Satyam, Bilello, Michel, Mohan, Suyash, Vollmuth, Philipp, Brugnara, Gianluca, Preetha, Chandrakanth J, Sahm, Felix, Maier-Hein, Klaus, Zenk, Maximilian, Bendszus, Martin, Wick, Wolfgang, Calabrese, Evan, Rudie, Jeffrey, Villanueva-Meyer, Javier, Cha, Soonmee, Ingalhalikar, Madhura, Jadhav, Manali, Pandey, Umang, Saini, Jitender, Garrett, John, Larson, Matthew, Jeraj, Robert, Currie, Stuart, Frood, Russell, Fatania, Kavi, Huang, Raymond Y, Chang, Ken, Balana, Carmen, Capellades, Jaume, Puig, Josep, Trenkler, Johannes, Pichler, Josef, Necker, Georg, Haunschmidt, Andreas, Meckel, Stephan, Shukla, Gaurav, Liem, Spencer, Alexander, Gregory S, Lombardo, Joseph, Palmer, Joshua D, Flanders, Adam E, Dicker, Adam P, Sair, Haris I, Jones, Craig K, Venkataraman, Archana, Jiang, Meirui, So, Tiffany Y, Chen, Cheng, Heng, Pheng Ann, Dou, Qi, Kozubek, Michal, Lux, Filip, Michálek, Jan, Matula, Petr, Keřkovský, Miloš, Kopřivová, Tereza, Dostál, Marek, Vybíhal, Václav, Vogelbaum, Michael A, Mitchell, J Ross, Farinhas, Joaquim, Maldjian, Joseph A, Yogananda, Chandan Ganesh Bangalore, Pinho, Marco C, Reddy, Divya, Holcomb, James, Wagner, Benjamin C, Ellingson, Benjamin M, Cloughesy, Timothy F, Raymond, Catalina, Oughourlian, Talia, Hagiwara, Akifumi, Wang, Chencai, To, Minh-Son, Bhardwaj, Sargam, Chong, Chee, Agzarian, Marc, Falcão, Alexandre Xavier, Martins, Samuel B, Teixeira, Bernardo C A, Sprenger, Flávia, Menotti, David, Lucio, Diego R, LaMontagne, Pamela, Marcus, Daniel, Wiestler, Benedikt, Kofler, Florian, Ezhov, Ivan, Metz, Marie, Jain, Rajan, Lee, Matthew, Lui, Yvonne W, McKinley, Richard, Slotboom, Johannes, Radojewski, Piotr, Meier, Raphael, Wiest, Roland, Murcia, Derrick, Fu, Eric, Haas, Rourke, Thompson, John, Ormond, David Ryan, Badve, Chaitra, Sloan, Andrew E, Vadmal, Vachan, Waite, Kristin, Colen, Rivka R, Pei, Linmin, Ak, Murat, Srinivasan, Ashok, Bapuraj, J Rajiv, Rao, Arvind, Wang, Nicholas, Yoshiaki, Ota, Moritani, Toshio, Turk, Sevcan, Lee, Joonsang, Prabhudesai, Snehal, Morón, Fanny, Mandel, Jacob, Kamnitsas, Konstantinos, Glocker, Ben, Dixon, Luke V M, Williams, Matthew, Zampakis, Peter, Panagiotopoulos, Vasileios, Tsiganos, Panagiotis, Alexiou, Sotiris, Haliassos, Ilias, Zacharaki, Evangelia I, Moustakas, Konstantinos, Kalogeropoulou, Christina, Kardamakis, Dimitrios M, Choi, Yoon Seong, Lee, Seung-Koo, Chang, Jong Hee, Ahn, Sung Soo, Luo, Bing, Poisson, Laila, Wen, Ning, Tiwari, Pallavi, Verma, Ruchika, Bareja, Rohan, Yadav, Ipsa, Chen, Jonathan, Kumar, Neeraj, Smits, Marion, van der Voort, Sebastian R, Alafandi, Ahmed, Incekara, Fatih, Wijnenga, Maarten MJ, Kapsas, Georgios, Gahrmann, Renske, Schouten, Joost W, Dubbink, Hendrikus J, Vincent, Arnaud JPE, Bent, Martin J van den, French, Pim J, Klein, Stefan, Yuan, Yading, Sharma, Sonam, Tseng, Tzu-Chi, Adabi, Saba, Niclou, Simone P, Keunen, Olivier, Hau, Ann-Christin, Vallières, Martin, Fortin, David, Lepage, Martin, Landman, Bennett, Ramadass, Karthik, Xu, Kaiwen, Chotai, Silky, Chambless, Lola B, Mistry, Akshitkumar, Thompson, Reid C, Gusev, Yuriy, Bhuvaneshwar, Krithika, Sayah, Anousheh, Bencheqroun, Camelia, Belouali, Anas, Madhavan, Subha, Booth, Thomas C, Chelliah, Alysha, Modat, Marc, Shuaib, Haris, Dragos, Carmen, Abayazeed, Aly, Kolodziej, Kenneth, Hill, Michael, Abbassy, Ahmed, Gamal, Shady, Mekhaimar, Mahmoud, Qayati,
Mohamed, Reyes, Mauricio, Park, Ji Eun, Yun, Jihye, Kim, Ho Sung, Mahajan, Abhishek, Muzi, Mark, Benson, Sean, Beets-Tan, Regina G H, Teuwen, Jonas, Herrera-Trujillo, Alejandro, Trujillo, Maria, Escobar, William, Abello, Ana, Bernal, Jose, Gómez, Jhon, Choi, Joseph, Baek, Stephen, Kim, Yusung, Ismael, Heba, Allen, Bryan, Buatti, John M, Kotrotsou, Aikaterini, Li, Hongwei, Weiss, Tobias, Weller, Michael, Bink, Andrea, Pouymayou, Bertrand, Shaykh, Hassan F, Saltz, Joel, Prasanna, Prateek, Shrestha, Sampurna, Mani, Kartik M, Payne, David, Kurc, Tahsin, Pelaez, Enrique, Franco-Maldonado, Heydy, Loayza, Francis, Quevedo, Sebastian, Guevara, Pamela, Torche, Esteban, Mendoza, Cristobal, Vera, Franco, Ríos, Elvis, López, Eduardo, Velastin, Sergio A, Ogbole, Godwin, Oyekunle, Dotun, Odafe-Oyibotha, Olubunmi, Osobu, Babatunde, Shu'aibu, Mustapha, Dorcas, Adeleye, Soneye, Mayowa, Dako, Farouk, Simpson, Amber L, Hamghalam, Mohammad, Peoples, Jacob J, Hu, Ricky, Tran, Anh, Cutler, Danielle, Moraes, Fabio Y, Boss, Michael A, Gimpel, James, Veettil, Deepak Kattil, Schmidt, Kendall, Bialecki, Brian, Marella, Sailaja, Price, Cynthia, Cimino, Lisa, Apgar, Charles, Shah, Prashant, Menze, Bjoern, Barnholtz-Sloan, Jill S, Martin, Jason, Bakas, Spyridon
Although machine learning (ML) has shown promise in numerous domains, there are concerns about its generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even infeasible) due to various limitations. Federated ML (FL) provides an alternative for training accurate and generalizable ML models by sharing only numerical model updates. Here we present findings from the largest FL study to date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model in delineating the surgically targetable tumor, and a 23% improvement for the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations; 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release; and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
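The numerical-update sharing at the heart of such a federation reduces, in its simplest form, to weighted averaging of client models; the sketch below shows that step only (weighting by sample count is a standard assumption here, and the study's actual multi-site orchestration is far more involved).

import torch

def federated_average(state_dicts, num_samples):
    """Aggregate client models, weighting each by its local sample count;
    raw patient data never leaves a site."""
    total = float(sum(num_samples))
    return {name: sum(sd[name] * (n / total)
                      for sd, n in zip(state_dicts, num_samples))
            for name in state_dicts[0]}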
Prediction of liquid fuel properties using machine learning models with Gaussian processes and probabilistic conditional generative learning
Freitas, Rodolfo S. M., Lima, Ágatha P. F., Chen, Cheng, Rochinha, Fernando A., Mira, Daniel, Jiang, Xi
Accurate determination of the fuel properties of complex mixtures over a wide range of pressure and temperature conditions is essential to utilizing alternative fuels. The present work aims to construct cheap-to-compute machine learning (ML) models that act as closure equations for predicting the physical properties of alternative fuels. These models can be trained on databases from MD simulations and/or experimental measurements in a multi-fidelity data-fusion approach. Here, Gaussian Process (GP) and probabilistic generative models are adopted. GP is a popular non-parametric Bayesian approach for building surrogate models, mainly due to its capacity to handle aleatory and epistemic uncertainties, and generative models have shown that deep neural networks can be employed with the same intent. In this work, the ML analysis focuses on a particular property, the fuel density, but it can also be extended to other physicochemical properties. This study explores the versatility of the ML models in handling multi-fidelity data. The results show that the ML models can accurately predict fuel properties over a wide range of pressure and temperature conditions.
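A minimal GP surrogate of the kind described, mapping (pressure, temperature) to density, can be sketched with scikit-learn; the training points below are illustrative placeholders, not values from the paper's MD or experimental databases.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy training data: pressure [MPa], temperature [K] -> density [kg/m^3].
X = np.array([[0.1, 300.0], [10.0, 300.0], [0.1, 500.0], [10.0, 500.0]])
y = np.array([740.0, 748.0, 590.0, 610.0])      # illustrative values only

kernel = RBF(length_scale=[5.0, 100.0]) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# The predictive standard deviation quantifies the model's uncertainty.
mean, std = gp.predict(np.array([[5.0, 400.0]]), return_std=True)
print(f"density = {mean[0]:.0f} +/- {std[0]:.0f} kg/m^3")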
Learning to Bridge Metric Spaces: Few-shot Joint Learning of Intent Detection and Slot Filling
Hou, Yutai, Lai, Yongkui, Chen, Cheng, Che, Wanxiang, Liu, Ting
In this paper, we investigate few-shot joint learning for dialogue language understanding. Most existing few-shot models learn a single task at a time with only a few examples. However, dialogue language understanding contains two closely related tasks, i.e., intent detection and slot filling, and often benefits from learning the two tasks jointly. This calls for new few-shot learning techniques that are able to capture task relations from only a few examples and jointly learn multiple tasks. To achieve this, we propose a similarity-based few-shot learning scheme, named Contrastive Prototype Merging network (ConProm), that learns to bridge the metric spaces of intent and slot on data-rich domains and then adapts the bridged metric space to a specific few-shot domain. Experiments on two public datasets, Snips and FewJoint, show that our model significantly outperforms strong baselines in one-shot and five-shot settings.
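The prototypical-network step underlying such similarity-based schemes can be sketched as follows; ConProm's contrastive merging of the intent and slot metric spaces is not reproduced, only the generic compute-prototypes-and-classify-by-distance pattern.

import torch

def prototypes(support_emb, support_labels, num_classes):
    """Mean embedding per class over the few-shot support set."""
    return torch.stack([support_emb[support_labels == c].mean(dim=0)
                        for c in range(num_classes)])

def classify(query_emb, protos):
    """Assign each query to its nearest prototype in the metric space."""
    return torch.cdist(query_emb, protos).argmin(dim=-1)

support = torch.randn(10, 64)
labels = torch.tensor([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # 5-shot, 2 classes
preds = classify(torch.randn(4, 64), prototypes(support, labels, 2))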
Temporal Memory Relation Network for Workflow Recognition from Surgical Video
Jin, Yueming, Long, Yonghao, Chen, Cheng, Zhao, Zixu, Dou, Qi, Heng, Pheng-Ann
Automatic surgical workflow recognition is a key component for developing context-aware computer-assisted systems in the operating theatre. Previous works either jointly modeled spatial features with short, fixed-range temporal information, or separately learned visual and long-range temporal cues. In this paper, we propose a novel end-to-end temporal memory relation network (TMRNet) for relating long-range and multi-scale temporal patterns to augment the present features. We establish a long-range memory bank to serve as a memory cell storing rich supportive information. Through our designed temporal variation layer, the supportive cues are further enhanced by multi-scale temporal-only convolutions. To effectively incorporate the two types of cues without disturbing the joint learning of spatio-temporal features, we introduce a non-local bank operator to attentively relate the past to the present. In this way, TMRNet enables the current feature to attend to long-range temporal dependencies, as well as to tolerate complex temporal extents. We have extensively validated our approach on two benchmark surgical video datasets, the M2CAI challenge dataset and the Cholec80 dataset. Experimental results demonstrate the outstanding performance of our method, consistently exceeding state-of-the-art methods by a large margin (e.g., 67.0% vs. 78.9% Jaccard on the Cholec80 dataset).
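The non-local bank operator can be sketched as attention from the present feature over the memory bank; the dimensions and linear projections below are illustrative assumptions rather than TMRNet's published configuration.

import torch
import torch.nn as nn

class NonLocalBankOperator(nn.Module):
    """Attentively relate the current feature to a long-range memory bank."""
    def __init__(self, dim=512):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, current, memory_bank):
        # current: (B, D) present feature; memory_bank: (B, T, D) past features.
        q = self.query(current).unsqueeze(1)                    # (B, 1, D)
        k, v = self.key(memory_bank), self.value(memory_bank)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
        return current + (attn @ v).squeeze(1)                  # residual fusion

out = NonLocalBankOperator()(torch.randn(2, 512), torch.randn(2, 30, 512))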