The What and Why of Text-Image Modality Gap in CLIP Models

Jina AI
You can't just use a CLIP model to retrieve text and images and sort the results by score. Why? Because of the modality gap. What is it, and where does it come from?