Text-Image Global Contrastive Alignment and Token-Patch Local Alignment

2025-01-07 11:23:50 +0000 UTC | Jina AI | Default

CLIP can be used to visualize token-patch similarities; however, this is more of a post-hoc interpretability trick than a robust or official "attention" mechanism of the model. Here's why.