VLMs
Vision language models (VLMs) learn jointly from images and text, which lets a single model tackle many tasks, from visual question answering to image captioning.
# Resources
- An Introduction to VLMs: The Future of Computer Vision Models, by Ro Isachenko (Towards Data Science, Nov 2024)
- Vision Language Models Explained