v-iashin/SpecVQGAN: Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)