Space-Time Memory Network for Sounding Object Localization in Videos


Leveraging temporal synchronization and association within sight and sound is an essential step towards robust localization of sounding objects. To this end, we propose a space-time memory network for sounding object localization in videos.

BMVC, 2021


  author    = {Sizhe Li and
               Yapeng Tian and
               Chenliang Xu},
  title     = {Space-Time Memory Network for Sounding Object Localization in Videos},
  journal   = {CoRR},
  volume    = {abs/2111.05526},
  year      = {2021},
  url       = {},
  eprinttype = {arXiv},
  eprint    = {2111.05526},
  timestamp = {Tue, 16 Nov 2021 12:12:31 +0100},
  biburl    = {},
  bibsource = {dblp computer science bibliography}
Sizhe Lester Li

My research interests span robot learning, vision, and physics simulation. Currently, I develop methods for robots to learn to interact with deformable objects with challenging dynamics.