CarlosGG's Knowledge Garden 🪴


VLMs

Last updated Nov 14, 2024

Vision-language models (VLMs) are models that learn jointly from images and text, which lets them tackle many tasks, from visual question answering to image captioning.

# Resources

# Code

# References

#PAPER An Introduction to Vision-Language Modeling (2024)
#PAPER Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling (2024)