The power of pretraining in atypical use cases: a BLIP VL-VQA model pretrained on a generic object dataset and fine-tuned for only 2 epochs on a visual navigation task matches the performance of a randomly initialized model trained for 18 epochs on the same data. #VOXReality https://lnkd.in/dSj436Qn