Text-Image Global Contrastive Alignment and Token-Patch Local Alignment

Jina AI
CLIP can be used to visualize token-patch similarities. However, this is more of a post-hoc interpretability trick than a robust or official "attention" mechanism of the model. Here’s why.
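Before the reasons, it may help to see what such a visualization typically computes. The following is a minimal, dependency-free sketch, not the model's own mechanism: it assumes you have already extracted per-token text embeddings and per-patch image embeddings projected into the same space. The function name `token_patch_similarity`, the random stand-in embeddings, and the 7x7 patch grid are all hypothetical choices made for illustration.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def token_patch_similarity(token_emb, patch_emb):
    """Cosine similarity between every text token and every image patch.

    token_emb: list of num_tokens vectors, patch_emb: list of num_patches
    vectors, all of the same dimension. Returns a num_tokens x num_patches
    nested list.
    """
    tokens = [l2_normalize(v) for v in token_emb]
    patches = [l2_normalize(v) for v in patch_emb]
    return [
        [sum(a * b for a, b in zip(t, p)) for p in patches]
        for t in tokens
    ]


# Stand-in embeddings (assumption): random vectors in place of real model outputs.
random.seed(0)
dim, num_tokens, grid = 64, 5, 7
token_emb = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]
patch_emb = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(grid * grid)]

sim = token_patch_similarity(token_emb, patch_emb)

# Reshape one token's row of similarities into the patch grid to get a heatmap.
row = sim[2]
heatmap = [row[r * grid:(r + 1) * grid] for r in range(grid)]
```

Note that nothing in this computation comes from the model's attention layers: the heatmap is simply a similarity table built after the fact from two sets of embeddings, which is the sense in which the visualization is post-hoc.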