Author:
Benjamin Elizalde
License:
MIT
Summary:
CLAP (Contrastive Language-Audio Pretraining) is a model that learns acoustic concepts from natural-language supervision and enables zero-shot inference. The model has been extensively evaluated on 26 downstream audio tasks, achieving state-of-the-art (SoTA) results on several of them, including classification, retrieval, and captioning.
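Zero-shot inference with a contrastive model like CLAP amounts to embedding the audio clip and a set of candidate text prompts in the shared space, then ranking prompts by cosine similarity. Below is a minimal numpy sketch of that scoring step; the mock 4-dimensional embeddings and prompt texts are illustrative stand-ins for real model outputs, not the package's API.

```python
import numpy as np

def zero_shot_scores(audio_emb, text_embs):
    """Score one audio embedding against candidate text-prompt embeddings."""
    # L2-normalize so the dot product equals cosine similarity,
    # as in contrastive pretraining
    a = audio_emb / np.linalg.norm(audio_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = t @ a                       # cosine similarity per prompt
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Mock embeddings for illustration only (real CLAP embeddings are learned)
audio = np.array([0.9, 0.1, 0.0, 0.2])
prompts = np.array([
    [0.8, 0.2, 0.1, 0.1],  # e.g. "this is the sound of a dog barking"
    [0.0, 0.9, 0.3, 0.0],  # e.g. "this is the sound of rain"
])
probs = zero_shot_scores(audio, prompts)
best = probs.argmax()  # index of the best-matching prompt
```

The predicted class is simply the prompt with the highest similarity; no task-specific training is needed, which is what makes the inference "zero-shot".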
Latest version:
1.3.4
Required dependencies:
librosa | numpy | pandas | pyyaml | scikit-learn | torch | torchaudio | torchlibrosa | tqdm | transformers
Downloads last day:
208
Downloads last week:
1,874
Downloads last month:
4,294