AITopics | afnet

c6e954799a0218f6d341ad5cbfb58999-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-11-2026, 20:49:18 GMT

afnet, dataset, recognition, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.98)


c6e954799a0218f6d341ad5cbfb58999-Paper-Conference.pdf

Neural Information Processing Systems, Feb-11-2026, 20:49:15 GMT

In video recognition, we need to sample multiple frames to represent each video, which makes the computational cost scale proportionally to the number of sampled frames. In most cases, a small proportion of all the frames is sampled for each input, which only contains limited information of the original video.

afnet, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.69)


Look More but Care Less in Video Recognition

Neural Information Processing Systems, Dec-25-2025, 06:10:16 GMT

Existing action recognition methods typically sample a few frames to represent each video to avoid the enormous computation, which often limits the recognition performance. To tackle this problem, we propose Ample and Focal Network (AFNet), which is composed of two branches to utilize more frames but with less computation. Specifically, the Ample Branch takes all input frames to obtain abundant information with condensed computation and provides the guidance for Focal Branch by the proposed Navigation Module; the Focal Branch squeezes the temporal size to only focus on the salient frames at each convolution block; in the end, the results of two branches are adaptively fused to prevent the loss of information. With this design, we can introduce more frames to the network but cost less computation. Besides, we demonstrate AFNet can utilize fewer frames while achieving higher accuracy as the dynamic selection in intermediate features enforces implicit temporal modeling. Further, we show that our method can be extended to reduce spatial redundancy with even less cost. Extensive experiments on five datasets demonstrate the effectiveness and efficiency of our method.

computation, name change, video recognition, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.40)


c6e954799a0218f6d341ad5cbfb58999-Supplemental-Conference.pdf

Neural Information Processing Systems, Aug-18-2025, 20:12:51 GMT

afnet, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.98)


Look More but Care Less in Video Recognition

Neural Information Processing Systems, Aug-18-2025, 20:12:47 GMT

With this design, we can introduce more frames to the network but cost less computation.

artificial intelligence, machine learning, navigation module, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)


The Wilhelm Tell Dataset of Affordance Demonstrations

Ringe, Rachel, Pomarlan, Mihai, Tsiogkas, Nikolaos, De Giorgis, Stefano, Hedblom, Maria, Malaka, Rainer

arXiv.org Artificial Intelligence, Jul-24-2025

Affordances - i.e. possibilities for action that an environment or objects in it provide - are important for robots operating in human environments to perceive. Existing approaches train such capabilities on annotated static images or shapes. This work presents a novel dataset for affordance learning of common household tasks. Unlike previous approaches, our dataset consists of video sequences demonstrating the tasks from first- and third-person perspectives, along with metadata about the affordances that are manifested in the task, and is aimed towards training perception systems to recognize affordance manifestations. The demonstrations were collected from several participants and in total record about seven hours of human activity. The variety of task performances also allows studying preparatory maneuvers that people may perform for a task, such as how they arrange their task space, which is also relevant for collaborative service robots.

affordance, artificial intelligence, dataset, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/HRI61500.2025.10973984

2507.17401

Country: Europe > Germany > Bremen > Bremen (0.29)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)


Look More but Care Less in Video Recognition

Neural Information Processing Systems, Jan-18-2025, 20:22:49 GMT

Existing action recognition methods typically sample a few frames to represent each video to avoid the enormous computation, which often limits the recognition performance. To tackle this problem, we propose Ample and Focal Network (AFNet), which is composed of two branches to utilize more frames but with less computation. Specifically, the Ample Branch takes all input frames to obtain abundant information with condensed computation and provides the guidance for Focal Branch by the proposed Navigation Module; the Focal Branch squeezes the temporal size to only focus on the salient frames at each convolution block; in the end, the results of two branches are adaptively fused to prevent the loss of information. With this design, we can introduce more frames to the network but cost less computation. Besides, we demonstrate AFNet can utilize fewer frames while achieving higher accuracy as the dynamic selection in intermediate features enforces implicit temporal modeling. Further, we show that our method can be extended to reduce spatial redundancy with even less cost.

computation, information, video recognition, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.43)


Look More but Care Less in Video Recognition

Zhang, Yitian, Bai, Yue, Wang, Huan, Xu, Yi, Fu, Yun

arXiv.org Artificial Intelligence, Nov-17-2022

Existing action recognition methods typically sample a few frames to represent each video to avoid the enormous computation, which often limits the recognition performance. To tackle this problem, we propose Ample and Focal Network (AFNet), which is composed of two branches to utilize more frames but with less computation. Specifically, the Ample Branch takes all input frames to obtain abundant information with condensed computation and provides the guidance for Focal Branch by the proposed Navigation Module; the Focal Branch squeezes the temporal size to only focus on the salient frames at each convolution block; in the end, the results of two branches are adaptively fused to prevent the loss of information. With this design, we can introduce more frames to the network but cost less computation. Besides, we demonstrate AFNet can utilize fewer frames while achieving higher accuracy as the dynamic selection in intermediate features enforces implicit temporal modeling. Further, we show that our method can be extended to reduce spatial redundancy with even less cost. Extensive experiments on five datasets demonstrate the effectiveness and efficiency of our method.
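The two-branch idea in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the function names (select_salient, two_branch_cost, fuse), the one-shot top-k selection, the fixed fusion weight, and the per-frame cost units are all assumptions made for illustration. In the abstract, selection happens at each convolution block and fusion is adaptive; here both are reduced to their simplest stand-ins.

```python
from typing import List, Sequence


def select_salient(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return indices of the highest-scoring frames, restored to temporal order.

    Hypothetical stand-in for the Navigation Module: it keeps the top
    `keep_ratio` share of frames according to precomputed saliency scores.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def two_branch_cost(num_frames: int, keep_ratio: float,
                    ample_unit: float, focal_unit: float) -> float:
    """Toy cost model: a cheap pass over all frames plus a pass over kept frames.

    `ample_unit` and `focal_unit` are assumed per-frame costs, expressed
    relative to one frame processed by a single full-cost network.
    """
    kept = max(1, round(num_frames * keep_ratio))
    return num_frames * ample_unit + kept * focal_unit


def fuse(ample_out: Sequence[float], focal_out: Sequence[float],
         weight: float) -> List[float]:
    """Fixed-weight blend of the two branch outputs (the paper's fusion is adaptive)."""
    return [weight * a + (1.0 - weight) * f for a, f in zip(ample_out, focal_out)]


if __name__ == "__main__":
    # Made-up saliency scores for an 8-frame clip.
    scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6]
    print(select_salient(scores, keep_ratio=0.5))  # [1, 3, 5, 7]

    # 12 frames through both branches versus 8 frames at full cost (8.0 units).
    # The unit costs are arbitrary and do not come from the paper.
    print(two_branch_cost(12, keep_ratio=0.5, ample_unit=0.125, focal_unit=0.5))
```

Under these assumed unit costs, the toy model sees 12 frames for fewer cost units than a single full-cost network spends on 8, which is the trade-off the abstract describes in words. The actual savings depend on the real architecture and are not estimated here.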

afnet, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2211.09992

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)



Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis
