CarlosGG's Knowledge Garden đŸȘŽ


Transformers

Last updated Mar 20, 2023

The transformer is a deep learning model that uses self-attention to process sequential input data, such as natural language, all at once. It was introduced in 2017 by a team at Google Brain and has since been used primarily in AI/NLP and AI/Computer Vision/Computer vision. Unlike AI/Deep learning/RNNs, transformers process the entire input in parallel and can learn long-range dependencies between input and output sequences more efficiently.

The transformer architecture follows an encoder-decoder structure but relies on neither recurrence nor convolutions to generate an output. Instead, it uses multi-headed attention to directly model relationships between all tokens in a sequence, regardless of their position. In machine translation, for example, the encoder compresses an input string from the source language into a vector that represents the words and their relations to each other, and the decoder transforms that encoded vector into a string of text in the target language.

Multi-headed attention runs an attention mechanism several times in parallel. Each of these parallel computations is called an attention head, and the independent head outputs are concatenated and linearly projected to the expected dimension. This lets the network control how information mixes between pieces of the input sequence, which improves performance on NLP tasks such as machine translation and text summarization.
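The multi-head mechanism described above can be sketched in plain NumPy. This is a minimal illustration, not a production implementation: the function and variable names are my own, the weights are random, and real transformers add masking, bias terms, and batching.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    return softmax(scores) @ V

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    # X: (seq_len, d_model); each W_*: (d_model, d_model).
    seq_len, d_model = X.shape
    d_head = d_model // num_heads

    def project(W):
        # Project, then split the model dimension into independent heads:
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return (X @ W).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = project(W_q), project(W_k), project(W_v)
    heads = scaled_dot_product_attention(Q, K, V)  # each head attends independently
    # Concatenate the heads back to (seq_len, d_model), then apply the
    # final linear projection W_o.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o

rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 16, 5, 4
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out = multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads)
print(out.shape)  # (5, 16): one output vector per input token
```

Note how the output has the same shape as the input: this is what lets attention layers be stacked, with residual connections and feed-forward blocks between them, to form the full encoder and decoder.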

# Resources

# Courses

# Code

# References

# For NLP

# For Computer Vision

# Self-supervised vision transformers

# Vision transformers with convolutions

# Multi-modal transformers

See AI/Deep learning/Multimodal learning

# For RL

See “Decision transformer” in AI/Reinforcement learning