Author:
Benjamin Elizalde
License:
MIT
Summary:
CLAP (Contrastive Language-Audio Pretraining) is a model that learns acoustic concepts from natural-language supervision and enables zero-shot inference. The model has been extensively evaluated on 26 downstream audio tasks, achieving state-of-the-art (SoTA) results on several of them, including classification, retrieval, and captioning.
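Zero-shot inference with a contrastive model like CLAP amounts to embedding the audio clip and a set of candidate text prompts in the shared space, then ranking prompts by cosine similarity. Below is a minimal numpy sketch of that scoring step; the mock 4-dimensional embeddings and prompt texts are illustrative stand-ins for real model outputs, not the package's API.

```python
import numpy as np

def zero_shot_scores(audio_emb, text_embs):
    """Score one audio embedding against candidate text-prompt embeddings."""
    # L2-normalize so the dot product equals cosine similarity,
    # as in contrastive pretraining
    a = audio_emb / np.linalg.norm(audio_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = t @ a                       # cosine similarity per prompt
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Mock embeddings for illustration only (real CLAP embeddings are learned)
audio = np.array([0.9, 0.1, 0.0, 0.2])
prompts = np.array([
    [0.8, 0.2, 0.1, 0.1],  # e.g. "this is the sound of a dog barking"
    [0.0, 0.9, 0.3, 0.0],  # e.g. "this is the sound of rain"
])
probs = zero_shot_scores(audio, prompts)
best = probs.argmax()  # index of the best-matching prompt
```

The predicted class is simply the prompt with the highest similarity; no task-specific training is needed, which is what makes the inference "zero-shot".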
Latest version:
1.3.4
Required dependencies:
librosa | numpy | pandas | pyyaml | scikit-learn | torch | torchaudio | torchlibrosa | tqdm | transformers
Downloads last day:
208
Downloads last week:
1,874
Downloads last month:
4,294