In LLMs, tokenization means splitting input text into smaller units, called tokens, before processing. So why are embeddings billed by the token?
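To make the idea concrete, here is a minimal sketch of tokenization and token counting, assuming OpenAI's tiktoken library and its cl100k_base encoding (the scheme used by recent OpenAI embedding models); other providers use different tokenizers, so counts will vary:

```python
# A minimal sketch, assuming tiktoken is installed (pip install tiktoken)
# and that the model uses the cl100k_base encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization splits input text into smaller parts."
tokens = enc.encode(text)

print(len(tokens))                          # token count -- the billed quantity
print(tokens)                               # the integer token IDs
print([enc.decode([t]) for t in tokens])    # the text span each token covers
```

Running this shows that a token is often a word fragment rather than a whole word, which is why the billed token count usually exceeds the word count.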