Perception-Oriented Video Frame Interpolation via Asymmetric Blending

Guangyang Wu1, Xin Tao2, Changlin Li3, Wenyi Wang4, Xiaohong Liu1+, Qingqing Zheng5+,
1Shanghai Jiao Tong University, 2Kuaishou Technology, 3SeeKoo,
4University of Electronic Science and Technology of China,
5Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences,


Previous methods for Video Frame Interpolation (VFI) have encountered persistent challenges, most notably blur and ghosting artifacts. These issues can be traced back to two pivotal factors: unavoidable motion errors and misalignment in supervision. In practice, motion estimates often prove error-prone, resulting in misaligned features. Furthermore, the reconstruction loss tends to produce blurry results, particularly in misaligned regions.

To mitigate these challenges, we propose a new paradigm called PerVFI (Perception-oriented Video Frame Interpolation). Our approach incorporates an Asymmetric Synergistic Blending module (ASB) that utilizes features from both sides to synergistically blend intermediate features. One reference frame emphasizes primary content, while the other contributes complementary information. To impose a stringent constraint on the blending process, we introduce a self-learned sparse quasi-binary mask which effectively mitigates ghosting and blur artifacts in the output. Additionally, we employ a normalizing flow-based generator and utilize the negative log-likelihood loss to learn the conditional distribution of the output, which further facilitates the generation of clear and fine details. Experimental results validate the superiority of PerVFI, demonstrating significant improvements in perceptual quality compared to existing methods.
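To make the asymmetric blending idea concrete, here is a minimal illustrative sketch (not the actual PerVFI implementation; function names, the temperature parameter `tau`, and the sigmoid-based mask are our own assumptions). One reference frame's features act as the primary content, and the second frame's features are admitted only where a sparse, quasi-binary mask allows:

```python
import numpy as np

def quasi_binary_mask(logits, tau=0.1):
    # A low-temperature sigmoid pushes values toward {0, 1},
    # yielding a nearly binary (and, with sparse logits, sparse) mask.
    # This is an illustrative stand-in for the self-learned mask.
    return 1.0 / (1.0 + np.exp(-logits / tau))

def asymmetric_blend(feat_primary, feat_complement, logits):
    # Asymmetric blending: the primary features supply most of the
    # content; complementary features fill in only where the mask
    # admits them, constraining where ghosting could appear.
    m = quasi_binary_mask(logits)
    return (1.0 - m) * feat_primary + m * feat_complement
```

Because the mask saturates to 0 or 1 almost everywhere, each output location is dominated by a single source frame, which is the intuition behind suppressing ghosting from misaligned features.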

Demo Video on DAVIS

The videos above show sample results from our experiments. In each pair, the video on the left is the low-frame-rate input, and the video on the right is the output after temporal upsampling by our algorithm. The interpolated videos exhibit noticeably smoother motion with sharp, artifact-free frames.

Competitive Methods

Video frame interpolation methods often suffer from blurred people and scenes. In the comparison above, the top-left video is the ground truth, the second and third videos are interpolated by the EMA-VFI and AMT models respectively, and the bottom-right video is the result of our model. Our model produces visibly sharper results than the competing methods, substantially reducing blur.

Manuscript (Accepted by CVPR 2024)

@article{wu2024perception,
  title     = {Perception-Oriented Video Frame Interpolation via Asymmetric Blending},
  author    = {Wu, Guangyang and Tao, Xin and Li, Changlin and Wang, Wenyi and Liu, Xiaohong and Zheng, Qingqing},
  journal   = {arXiv preprint arXiv:2404.06692},
  year      = {2024}
}