diff --git a/doc/sphinxdoc/_templates/sphinxdoc_mtg/static/docs.css b/doc/sphinxdoc/_templates/sphinxdoc_mtg/static/docs.css index ffd44ae7d..4c067824e 100644 --- a/doc/sphinxdoc/_templates/sphinxdoc_mtg/static/docs.css +++ b/doc/sphinxdoc/_templates/sphinxdoc_mtg/static/docs.css @@ -333,3 +333,102 @@ section h3:hover .headerlink, .highlight .il { color: #40a070; } /* Literal.Number.Integer.Long */ + +/* Collapsible sections styling (W3Schools-like) */ +/* Target all details elements generated by sphinx_toolbox.collapse */ +details[class^="summary-"] { + margin-bottom: 10px; +} + +details[class^="summary-"] > summary { + background-color: #087e8b; + color: white; + cursor: pointer; + padding: 12px 18px; + width: 100%; + border: none; + text-align: left; + outline: none; + font-size: 15px; + font-weight: 500; + border-radius: 4px; + transition: background-color 0.3s ease; + list-style: none; + display: block; +} + +details[class^="summary-"] > summary::-webkit-details-marker { + display: none; +} + +details[class^="summary-"] > summary::before { + content: '\002B'; /* + sign */ + color: white; + font-weight: bold; + float: right; + margin-left: 10px; +} + +details[class^="summary-"][open] > summary::before { + content: '\2212'; /* - sign */ +} + +details[class^="summary-"] > summary:hover { + background-color: #055861; +} + +details[class^="summary-"][open] > summary { + background-color: #055861; + border-radius: 4px 4px 0 0; +} + +/* Content area - use details itself as the box container */ +details[class^="summary-"][open] { + background-color: #f9f9f9; + border: 1px solid #ddd; + border-radius: 4px; + margin-bottom: 15px; +} + +details[class^="summary-"][open] > summary { + border-radius: 4px 4px 0 0; + margin: -1px -1px 0 -1px; + width: calc(100% + 2px); +} + +details[class^="summary-"][open] > *:not(summary) { + padding: 5px 15px; + margin: 0; +} + +details[class^="summary-"][open] > *:not(summary):last-child { + padding-bottom: 15px; +} + +/* Hide the empty 
line-block spacer and reduce top spacing */ +details[class^="summary-"][open] > .line-block:first-of-type { + display: none; +} + +details[class^="summary-"][open] > *:not(summary):not(.line-block):first-of-type, +details[class^="summary-"][open] > .line-block + * { + padding-top: 10px; +} + +/* Model items - gray style for individual models */ +details[class^="summary-"]:not(.summary-reference-bibtex) > summary { + background-color: #f1f1f1; + color: #333; +} + +details[class^="summary-"]:not(.summary-reference-bibtex) > summary:hover { + background-color: #e0e0e0; +} + +details[class^="summary-"]:not(.summary-reference-bibtex)[open] > summary { + background-color: #e0e0e0; +} + +details[class^="summary-"]:not(.summary-reference-bibtex) > summary::before { + color: #333; +} diff --git a/doc/sphinxdoc/models.rst b/doc/sphinxdoc/models.rst index a77be66ba..c965459ad 100644 --- a/doc/sphinxdoc/models.rst +++ b/doc/sphinxdoc/models.rst @@ -15,9 +15,9 @@ Some of our models can work in real-time, opening many possibilities for audio d -.. highlight:: none +If you use any of the models in your research, please cite the following paper: -If you use any of the models in your research, please cite the following paper:: +.. code-block:: bibtex @inproceedings{alonso2020tensorflow, title={Tensorflow Audio Models in {Essentia}}, @@ -26,8 +26,6 @@ If you use any of the models in your research, please cite the following paper:: year={2020} } -.. highlight:: default - Feature extractors @@ -37,105 +35,142 @@ Feature extractors AudioSet-VGGish ^^^^^^^^^^^^^^^ -Audio embedding model accompanying the AudioSet dataset, trained in a supervised manner using tag information for YouTube videos. +Audio embedding model accompanying the AudioSet dataset, trained in a supervised manner using tag information from YouTube videos. + +**Models** + +.. collapse:: audioset-vggish + + | + + ⬇️ `Weights `__ 📄 `Metadata `__ -Models: + Python code for embedding extraction: - .. 
collapse:: ⬇️ audioset-vggish + .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/vggish/audioset-vggish-3_embeddings.py - | +**References** - [`weights `_, `metadata `_] +.. list-table:: + :widths: auto + :header-rows: 0 - Python code for embedding extraction: + * - 📄 `Paper `__ + - 💻 `TensorFlow Models `__ + - 🌐 `AudioSet `__ - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/vggish/audioset-vggish-3_embeddings.py +.. code-block:: bibtex + + @inproceedings{hershey2017cnn, + title={{CNN} Architectures for Large-Scale Audio Classification}, + author={Hershey, Shawn and Chaudhuri, Sourish and Ellis, Daniel P. W. and Gemmeke, Jort F. and Jansen, Aren and Moore, R. Channing and Plakal, Manoj and Platt, Devin and Saurous, Rif A. and Seybold, Bryan and Slaney, Malcolm and Weiss, Ron J. and Wilson, Kevin}, + booktitle={International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, + year={2017} + } Discogs-EffNet ^^^^^^^^^^^^^^ -Audio embedding models trained with classification and contrastive learning objectives using an in-house dataset annotated with Discogs metadata. +Audio embedding models trained with classification and contrastive learning objectives on an in-house dataset annotated with Discogs metadata. The classification model was trained to predict music style labels. -The contrastive learning models were trained to learn music similarity capable of grouping audio tracks coming from the same artist, ``label`` (record label), ``release`` (album), or segments of the same ``track`` itself (self-supervised learning). +The contrastive learning models were trained to capture music similarity by attracting audio tracks coming from the same artist, ``label`` (record label), ``release`` (album), or segments of the same ``track`` itself (self-supervised learning). Additionally, ``multi`` was trained on multiple similarity targets simultaneously. -Models: +**Models** - .. 
collapse:: ⬇️ discogs-effnet-bs64 +.. collapse:: discogs-effnet-bs64 - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Model trained with a multi-label classification objective targeting 400 Discogs styles. + Model trained with a multi-label classification objective targeting 400 Discogs styles. - Python code for embedding extraction: + Python code for embedding extraction: - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/discogs-effnet/discogs-effnet-bs64-1_embeddings.py + .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/discogs-effnet/discogs-effnet-bs64-1_embeddings.py - .. collapse:: ⬇️ discogs_artist_embeddings-effnet-bs64 +.. collapse:: discogs_artist_embeddings-effnet-bs64 - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Model trained with a contrastive learning objective targeting artist associations. + Model trained with a contrastive learning objective targeting artist associations. - Python code for embedding extraction: + Python code for embedding extraction: - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/discogs-effnet/discogs_artist_embeddings-effnet-bs64-1_embeddings.py + .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/discogs-effnet/discogs_artist_embeddings-effnet-bs64-1_embeddings.py - .. collapse:: ⬇️ discogs_label_embeddings-effnet-bs64 +.. collapse:: discogs_label_embeddings-effnet-bs64 - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Model trained with a contrastive learning objective targeting record label associations. + Model trained with a contrastive learning objective targeting record label associations. - Python code for embedding extraction: + Python code for embedding extraction: - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/discogs-effnet/discogs_label_embeddings-effnet-bs64-1_embeddings.py + .. 
literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/discogs-effnet/discogs_label_embeddings-effnet-bs64-1_embeddings.py - .. collapse:: ⬇️ discogs_multi_embeddings-effnet-bs64 +.. collapse:: discogs_multi_embeddings-effnet-bs64 - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Model trained with a contrastive learning objective targeting aritst and track associations in a multi-task setup. + Model trained with a contrastive learning objective targeting artist and track associations in a multi-task setup. - Python code for embedding extraction: + Python code for embedding extraction: - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/discogs-effnet/discogs_multi_embeddings-effnet-bs64-1_embeddings.py + .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/discogs-effnet/discogs_multi_embeddings-effnet-bs64-1_embeddings.py - .. collapse:: ⬇️ discogs_release_embeddings-effnet-bs64 +.. collapse:: discogs_release_embeddings-effnet-bs64 - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Model trained with a contrastive learning objective targeting release (album) associations. + Model trained with a contrastive learning objective targeting release (album) associations. - Python code for embedding extraction: + Python code for embedding extraction: - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/discogs-effnet/discogs_release_embeddings-effnet-bs64-1_embeddings.py + .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/discogs-effnet/discogs_release_embeddings-effnet-bs64-1_embeddings.py - .. collapse:: ⬇️ discogs_track_embeddings-effnet-bs64 +.. collapse:: discogs_track_embeddings-effnet-bs64 - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Model trained with a contrastive learning objective targeting track (self-supervised) associations. 
+ Model trained with a contrastive learning objective targeting track (self-supervised) associations. - Python code for embedding extraction: + Python code for embedding extraction: - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/discogs-effnet/discogs_track_embeddings-effnet-bs64-1_embeddings.py + .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/discogs-effnet/discogs_track_embeddings-effnet-bs64-1_embeddings.py *Note: We provide models operating with a fixed batch size of 64 samples since it was not possible to port the version with dynamic batch size from ONNX to TensorFlow. Additionally, an ONNX version of the model with* `dynamic batch `_ *size is provided.* +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Paper `__ + - 🌐 `Discogs `__ + +.. code-block:: bibtex + + @inproceedings{alonso2022music, + title={Music Representation Learning Based on Editorial Metadata from Discogs}, + author={Alonso-Jim{\'e}nez, Pablo and Serra, Xavier and Bogdanov, Dmitry}, + booktitle={International Society for Music Information Retrieval Conference (ISMIR)}, + year={2022} + } + MAEST ^^^^^ @@ -150,110 +185,127 @@ To train downstream models, we recommend using the embeddings from the ``CLS`` In the following examples, we extract embeddings from the 7th layer of the transformer since this is what performed the best in our downstream classification tasks. To extract embeddings from other layers, change the ``output`` parameter according to the layer names provided in the metadata files. +**Models** -Models: - - .. collapse:: ⬇️ discogs-maest-30s-pw-519l +.. collapse:: discogs-maest-30s-pw-519l - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Model trained with a multi-label classification objective targeting 519 Discogs styles on an extended dataset of 4M tracks. 
+ Model trained with a multi-label classification objective targeting 519 Discogs styles on an extended dataset of 4M tracks. - Python code for embedding extraction: + Python code for embedding extraction: - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-30s-pw-519l-2_embeddings.py + .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-30s-pw-519l-2_embeddings.py - .. collapse:: ⬇️ discogs-maest-30s-pw +.. collapse:: discogs-maest-30s-pw - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Model trained with a multi-label classification objective targeting 400 Discogs styles. + Model trained with a multi-label classification objective targeting 400 Discogs styles. - Python code for embedding extraction: + Python code for embedding extraction: - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-30s-pw-2_embeddings.py + .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-30s-pw-2_embeddings.py - .. collapse:: ⬇️ discogs-maest-30s-pw-ts +.. collapse:: discogs-maest-30s-pw-ts - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Model trained with a multi-label classification objective targeting 400 Discogs styles. + Model trained with a multi-label classification objective targeting 400 Discogs styles. - Python code for embedding extraction: + Python code for embedding extraction: - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-30s-pw-ts-2_embeddings.py + .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-30s-pw-ts-2_embeddings.py - .. collapse:: ⬇️ discogs-maest-20s-pw +.. 
collapse:: discogs-maest-20s-pw - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Model trained with a multi-label classification objective targeting 400 Discogs styles. + Model trained with a multi-label classification objective targeting 400 Discogs styles. - Python code for embedding extraction: + Python code for embedding extraction: - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-20s-pw-2_embeddings.py + .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-20s-pw-2_embeddings.py - .. collapse:: ⬇️ discogs-maest-10s-pw +.. collapse:: discogs-maest-10s-pw - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Model trained with a multi-label classification objective targeting 400 Discogs styles. + Model trained with a multi-label classification objective targeting 400 Discogs styles. - Python code for embedding extraction: + Python code for embedding extraction: - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-10s-pw-2_embeddings.py + .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-10s-pw-2_embeddings.py - .. collapse:: ⬇️ discogs-maest-10s-fs +.. collapse:: discogs-maest-10s-fs - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Model trained with a multi-label classification objective targeting 400 Discogs styles. + Model trained with a multi-label classification objective targeting 400 Discogs styles. - Python code for embedding extraction: + Python code for embedding extraction: - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-10s-fs-2_embeddings.py + .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-10s-fs-2_embeddings.py - .. collapse:: ⬇️ discogs-maest-10s-dw +.. 
collapse:: discogs-maest-10s-dw - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Model trained with a multi-label classification objective targeting 400 Discogs styles. + Model trained with a multi-label classification objective targeting 400 Discogs styles. - Python code for embedding extraction: + Python code for embedding extraction: - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-10s-dw-2_embeddings.py + .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-10s-dw-2_embeddings.py - .. collapse:: ⬇️ discogs-maest-5s-pw +.. collapse:: discogs-maest-5s-pw - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Model trained with a multi-label classification objective targeting 400 Discogs styles. + Model trained with a multi-label classification objective targeting 400 Discogs styles. - Python code for embedding extraction: + Python code for embedding extraction: - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-5s-pw-2_embeddings.py + .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-5s-pw-2_embeddings.py *Note:* ``discogs-maest-30s-pw-519l`` *is an updated version of MAEST trained on a larger dataset of 4M tracks and 519 music style labels. It is expected to show slightly better performance.* *Note: We provide TensorFlow models operating with a fixed batch size of 1. Additionally, ONNX versions of the models supporting dynamic batch sizes are provided.* +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Paper `__ + - 💻 `GitHub `__ + +.. 
code-block:: bibtex + + @inproceedings{alonso2023efficient, + title={Efficient Supervised Training of Audio Transformers for Music Representation Learning}, + author={Alonso-Jim{\'e}nez, Pablo and Serra, Xavier and Bogdanov, Dmitry}, + booktitle={International Society for Music Information Retrieval Conference (ISMIR)}, + year={2023} + } + OpenL3 ^^^^^^ @@ -261,101 +313,136 @@ OpenL3 Audio embedding models trained on audio-visual correspondence in a self-supervised manner. There are different versions of OpenL3 trained on environmental sound (``env``) or music (``music``) datasets, using 128 (``mel128``) or 256 (``mel256``) mel-bands, and with 512 (``emb512``) or 6144 (``emb6144``) embedding dimensions. -Models: +**Models** + +.. collapse:: openl3-env-mel128-emb512 + + | + + ⬇️ `Weights `__ 📄 `Metadata `__ + + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. + +.. collapse:: openl3-env-mel128-emb6144 - .. collapse:: ⬇️ openl3-env-mel128-emb512 + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. +.. collapse:: openl3-env-mel256-emb512 - .. collapse:: ⬇️ openl3-env-mel128-emb6144 + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. +.. collapse:: openl3-env-mel256-emb6144 - .. 
collapse:: ⬇️ openl3-env-mel256-emb512 + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. +.. collapse:: openl3-music-mel128-emb512 - .. collapse:: ⬇️ openl3-env-mel256-emb6144 + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. +.. collapse:: openl3-music-mel128-emb6144 - .. collapse:: ⬇️ openl3-music-mel128-emb512 + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. +.. collapse:: openl3-music-mel256-emb512 - .. collapse:: ⬇️ openl3-music-mel128-emb6144 + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. +.. collapse:: openl3-music-mel256-emb6144 - .. collapse:: ⬇️ openl3-music-mel256-emb512 + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + We do not have a dedicated algorithm to extract embeddings with this model. 
For now, OpenL3 embeddings can be extracted using this `script `_. - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. +**References** - .. collapse:: ⬇️ openl3-music-mel256-emb6144 +.. list-table:: + :widths: auto + :header-rows: 0 - | + * - 📄 `Paper `__ + - 💻 `GitHub `__ - [`weights `_, `metadata `_] +.. code-block:: bibtex - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. + @inproceedings{cramer2019look, + title={Look, Listen and Learn More: Design Choices for Deep Audio Embeddings}, + author={Cramer, Jason and Wu, Ho-Hsiang and Salamon, Justin and Bello, Juan Pablo}, + booktitle={International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, + year={2019} + } MSD-MusiCNN ^^^^^^^^^^^ -A Music embedding extractor based on auto-tagging with the 50 most common tags of the `Million Song Dataset `_. +Audio music embedding extractor based on auto-tagging using the 50 most common tags of the `Last.fm/Million Song Dataset `_. + +**Models** +.. collapse:: msd-musicnn -Models: + | - .. collapse:: ⬇️ msd-musicnn + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for embedding extraction: - [`weights `_, `metadata `_] + .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/musicnn/msd-musicnn-1_embeddings.py - Python code for embedding extraction: +**References** - .. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/musicnn/msd-musicnn-1_embeddings.py +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Paper `__ + - 💻 `GitHub `__ + +.. 
code-block:: bibtex + + @inproceedings{pons2019musicnn, + title={musicnn: Pre-trained convolutional neural networks for music audio tagging}, + author={Pons, Jordi and Serra, Xavier}, + booktitle={Late-Breaking Demo, International Society for Music Information Retrieval Conference (ISMIR)}, + year={2019} + } Classifiers ----------- -Classification and regression models based on embeddings. -Instead of working with mel-spectrograms, these models require embeddings as input. -The name of these models is a combination of the classification/regression task and the name of the :ref:`embedding model` that should be used to extract embeddings (``-``). +Classification and regression models built on top of audio embeddings. +Unlike models that operate directly on audio or mel-spectrograms, these models take precomputed embeddings as input. +Model names follow the pattern ``-``, where ```` is the classification or regression objective and ```` refers to the :ref:`embedding model ` used to generate the embeddings. -*Note: TensorflowPredict2D has to be configured with the correct output layer name for each classifier. Check the attached JSON file to find the name of the output layer on each case.* +*Note: TensorflowPredict2D must be configured with the correct output layer name for each model. Refer to the attached JSON file to find the appropriate output layer for each case.* Music genre and style @@ -387,87 +474,106 @@ Music style classification by 400 styles from the Discogs taxonomy:: .. highlight:: default -Models: +**Models** + +.. collapse:: genre_discogs400-discogs-effnet - .. collapse:: ⬇️ genre_discogs400-discogs-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - [`weights `_, `metadata `_, `demo `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-effnet-1_predictions.py - .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-effnet-1_predictions.py +.. collapse:: genre_discogs400-discogs-maest-5s-pw - .. collapse:: ⬇️ genre_discogs400-discogs-maest-5s-pw + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - [`weights `_, `metadata `_, `demo `_] + Python code for predictions: - python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-maest-5s-pw-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-maest-5s-pw-1_predictions.py +.. collapse:: genre_discogs400-discogs-maest-10s-pw - .. collapse:: ⬇️ genre_discogs400-discogs-maest-10-pw + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - [`weights `_, `metadata `_, `demo `_] + Python code for predictions: - python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-maest-10s-pw-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-maest-10s-pw-1_predictions.py +.. collapse:: genre_discogs400-discogs-maest-10s-fs - .. collapse:: ⬇️ genre_discogs400-discogs-maest-10s-fs + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - [`weights `_, `metadata `_, `demo `_] + Python code for predictions: - python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-maest-10s-fs-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-maest-10s-fs-1_predictions.py +.. collapse:: genre_discogs400-discogs-maest-10s-dw - .. 
collapse:: ⬇️ genre_discogs400-discogs-maest-30s-dw + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - [`weights `_, `metadata `_, `demo `_] + Python code for predictions: - python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-maest-10s-dw-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-maest-10s-dw-1_predictions.py +.. collapse:: genre_discogs400-discogs-maest-20s-pw - .. collapse:: ⬇️ genre_discogs400-discogs-maest-20s-pw + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - [`weights `_, `metadata `_, `demo `_] + Python code for predictions: - python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-maest-20s-pw-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-maest-20s-pw-1_predictions.py +.. collapse:: genre_discogs400-discogs-maest-30s-pw - .. collapse:: ⬇️ genre_discogs400-discogs-maest-30s-pw + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - [`weights `_, `metadata `_, `demo `_] + Python code for predictions: - python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-maest-30s-pw-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-maest-30s-pw-1_predictions.py +.. collapse:: genre_discogs400-discogs-maest-30s-pw-ts - .. collapse:: ⬇️ genre_discogs400-discogs-maest-30s-pw-ts + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - [`weights `_, `metadata `_, `demo `_] + Python code for predictions: - python code for predictions: + .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-maest-30s-pw-ts-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs400/genre_discogs400-discogs-maest-30s-pw-ts-1_predictions.py + +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Paper `__ + - 💻 `GitHub `__ + +.. code-block:: bibtex + + @inproceedings{alonso2023efficient, + title={Efficient Supervised Training of Audio Transformers for Music Representation Learning}, + author={Alonso-Jim{\'e}nez, Pablo and Serra, Xavier and Bogdanov, Dmitry}, + booktitle={International Society for Music Information Retrieval Conference (ISMIR)}, + year={2023} + } Genre Discogs519 @@ -496,17 +602,36 @@ Music style classification by 519 styles from the Discogs taxonomy:: .. highlight:: default -Models: +**Models** + +.. collapse:: genre_discogs519 - .. collapse:: ⬇️ genre_discogs519 + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - [`weights `_, `metadata `_, `demo `_] + Python code for predictions: - python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs519/genre_discogs519-discogs-maest-30s-pw-519l-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/genre_discogs519/genre_discogs519-discogs-maest-30s-pw-519l-1_predictions.py + +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Paper `__ + - 💻 `GitHub `__ + +.. 
code-block:: bibtex + + @inproceedings{alonso2023efficient, + title={Efficient Supervised Training of Audio Transformers for Music Representation Learning}, + author={Alonso-Jim{\'e}nez, Pablo and Serra, Xavier and Bogdanov, Dmitry}, + booktitle={International Society for Music Information Retrieval Conference (ISMIR)}, + year={2023} + } MTG-Jamendo genre @@ -526,67 +651,89 @@ Multi-label classification with the genre subset of MTG-Jamendo Dataset (87 clas .. highlight:: default -Models: +**Models** + +.. collapse:: mtg_jamendo_genre-discogs-effnet + + | + + ⬇️ `Weights `__ 📄 `Metadata `__ + + Python code for predictions: + + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_genre/mtg_jamendo_genre-discogs-effnet-1_predictions.py + +.. collapse:: mtg_jamendo_genre-discogs_artist_embeddings-effnet + + | - .. collapse:: ⬇️ mtg_jamendo_genre-discogs-effnet + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_genre/mtg_jamendo_genre-discogs_artist_embeddings-effnet-1_predictions.py - Python code for predictions: +.. collapse:: mtg_jamendo_genre-discogs_label_embeddings-effnet - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_genre/mtg_jamendo_genre-discogs-effnet-1_predictions.py + | - .. collapse:: ⬇️ mtg_jamendo_genre-discogs_artist_embeddings-effnet + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_genre/mtg_jamendo_genre-discogs_artist_embeddings-effnet-1_predictions.py + | - .. collapse:: ⬇️ mtg_jamendo_genre-discogs_label_embeddings-effnet + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_genre/mtg_jamendo_genre-discogs_multi_embeddings-effnet-1_predictions.py - Python code for predictions: +.. collapse:: mtg_jamendo_genre-discogs_release_embeddings-effnet - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_genre/mtg_jamendo_genre-discogs_label_embeddings-effnet-1_predictions.py + | - .. collapse:: ⬇️ mtg_jamendo_genre-discogs_multi_embeddings-effnet + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_genre/mtg_jamendo_genre-discogs_release_embeddings-effnet-1_predictions.py - Python code for predictions: +.. collapse:: mtg_jamendo_genre-discogs_track_embeddings-effnet - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_genre/mtg_jamendo_genre-discogs_multi_embeddings-effnet-1_predictions.py + | - .. collapse:: ⬇️ mtg_jamendo_genre-discogs_release_embeddings-effnet + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_genre/mtg_jamendo_genre-discogs_track_embeddings-effnet-1_predictions.py - Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_genre/mtg_jamendo_genre-discogs_release_embeddings-effnet-1_predictions.py +**References** - .. 
collapse:: ⬇️ mtg_jamendo_genre-discogs_track_embeddings-effnet +.. list-table:: + :widths: auto + :header-rows: 0 - | + * - 📄 `Dataset paper `__ + - 💻 `GitHub `__ - [`weights `_, `metadata `_] +.. code-block:: bibtex - Python code for predictions: + @conference{bogdanov2019mtg, + author = "Bogdanov, Dmitry and Won, Minz and Tovstogan, Philip and Porter, Alastair and Serra, Xavier", + title = "The MTG-Jamendo Dataset for Automatic Music Tagging", + booktitle = "Machine Learning for Music Discovery Workshop, International Conference on Machine Learning (ICML 2019)", + year = "2019", + address = "Long Beach, CA, United States", + url = "http://hdl.handle.net/10230/42015" + } - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_genre/mtg_jamendo_genre-discogs_track_embeddings-effnet-1_predictions.py Moods and context @@ -598,37 +745,37 @@ Approachability Music approachability predicts whether the music is likely to be accessible to the general public (e.g., belonging to common mainstream music genres vs. niche and experimental genres). The models output either two (``approachability_2c``) or three (``approachability_3c``) levels of approachability, or continuous values (``approachability_regression``). -Models: +**Models** - .. collapse:: ⬇️ approachability_2c-discogs-effnet +.. collapse:: approachability_2c-discogs-effnet - | + | - [`weights `_, `metadata `_, `demo `_] + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/approachability/approachability_2c-discogs-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/approachability/approachability_2c-discogs-effnet-1_predictions.py - .. collapse:: ⬇️ approachability_3c-discogs-effnet +.. 
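The classifier heads return frame-wise activations rather than a single clip-level answer. A minimal NumPy sketch of aggregating them into one result; the array shapes and the class order are illustrative assumptions, not Essentia's API, so check each model's metadata JSON for the actual class names:

```python
import numpy as np

# Hypothetical frame-wise activations: shape (n_frames, 2) softmax output
# for a two-class head such as approachability_2c, and (n_frames, 1) for
# the regression head. Real shapes depend on the model.
activations_2c = np.array([[0.8, 0.2], [0.6, 0.4], [0.7, 0.3]])
activations_reg = np.array([[0.61], [0.55], [0.70]])

# Class order is an assumption here; read it from the model's metadata.
labels = ["approachable", "not_approachable"]

# Average activations over time, then take the dominant class.
mean_probs = activations_2c.mean(axis=0)
clip_label = labels[int(np.argmax(mean_probs))]

# For the regression head, use the temporal mean as the clip-level value.
clip_value = float(activations_reg.mean())

print(clip_label, clip_value)
```

Averaging before thresholding keeps short noisy frames from flipping the clip-level decision.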
collapse:: approachability_3c-discogs-effnet - | + | - [`weights `_, `metadata `_, `demo `_] + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/approachability/approachability_3c-discogs-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/approachability/approachability_3c-discogs-effnet-1_predictions.py - .. collapse:: ⬇️ approachability_regression-discogs-effnet +.. collapse:: approachability_regression-discogs-effnet - | + | - [`weights `_, `metadata `_, `demo `_] + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/approachability/approachability_regression-discogs-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/approachability/approachability_regression-discogs-effnet-1_predictions.py @@ -638,37 +785,37 @@ Engagement Music engagement predicts whether the music evokes active attention of the listener (high-engagement "lean forward" active listening vs. low-engagement "lean back" background listening). The models output either two (``engagement_2c``) or three (``engagement_3c``) levels of engagement, or continuous regression values (``engagement_regression``). -Models: +**Models** - .. collapse:: ⬇️ engagement_2c-discogs-effnet +.. collapse:: engagement_2c-discogs-effnet - | + | - [`weights `_, `metadata `_, `demo `_] + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/engagement/engagement_2c-discogs-effnet-1_predictions.py + .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/engagement/engagement_2c-discogs-effnet-1_predictions.py - .. collapse:: ⬇️ engagement_3c-discogs-effnet +.. collapse:: engagement_3c-discogs-effnet - | + | - [`weights `_, `metadata `_, `demo `_] + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/engagement/engagement_3c-discogs-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/engagement/engagement_3c-discogs-effnet-1_predictions.py - .. collapse:: ⬇️ engagement_regression-discogs-effnet +.. collapse:: engagement_regression-discogs-effnet - | + | - [`weights `_, `metadata `_, `demo `_] + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/engagement/engagement_regression-discogs-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/engagement/engagement_regression-discogs-effnet-1_predictions.py @@ -679,27 +826,47 @@ Music arousal and valence regression with the `DEAM `__ 📄 `Metadata `__ 🎸 `Demo `__ - .. collapse:: ⬇️ deam-msd-musicnn + Python code for predictions: - | + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/deam/deam-msd-musicnn-2_predictions.py - [`weights `_, `metadata `_, `demo `_] +.. collapse:: deam-audioset-vggish - Python code for predictions: + | - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/deam/deam-msd-musicnn-2_predictions.py + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - .. collapse:: ⬇️ deam-audioset-vggish + Python code for predictions: - | + .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/deam/deam-audioset-vggish-2_predictions.py - [`weights `_, `metadata `_, `demo `_] +**References** - Python code for predictions: +.. list-table:: + :widths: auto + :header-rows: 0 - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/deam/deam-audioset-vggish-2_predictions.py + * - 📄 `Dataset paper `__ + - 💻 `GitHub `__ + +.. code-block:: bibtex + + @conference{bogdanov2022mtg, + author = "Bogdanov, Dmitry and Lizarraga-Seijas, Xavier and Alonso-Jiménez, Pablo and Serra, Xavier", + title = "MusAV: A dataset of relative arousal-valence annotations for validation of audio models", + booktitle = "International Society for Music Information Retrieval Conference (ISMIR 2022)", + year = "2022", + address = "Bengaluru, India", + url = "http://hdl.handle.net/10230/54181" + } @@ -710,27 +877,47 @@ Music arousal and valence regression with the `emoMusic emomusic-msd-musicnn +.. collapse:: emomusic-msd-musicnn - | + | - [`weights `_, `metadata `_, `demo `_] + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/emomusic/emomusic-msd-musicnn-2_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/emomusic/emomusic-msd-musicnn-2_predictions.py - .. collapse:: ⬇️ emomusic-audioset-vggish +.. collapse:: emomusic-audioset-vggish - | + | - [`weights `_, `metadata `_, `demo `_] + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/emomusic/emomusic-audioset-vggish-2_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/emomusic/emomusic-audioset-vggish-2_predictions.py + +**References** + +.. 
list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + - 💻 `GitHub `__ + +.. code-block:: bibtex + + @conference{bogdanov2022mtg, + author = "Bogdanov, Dmitry and Lizarraga-Seijas, Xavier and Alonso-Jiménez, Pablo and Serra, Xavier", + title = "MusAV: A dataset of relative arousal-valence annotations for validation of audio models", + booktitle = "International Society for Music Information Retrieval Conference (ISMIR 2022)", + year = "2022", + address = "Bengaluru, India", + url = "http://hdl.handle.net/10230/54181" + } @@ -741,27 +928,47 @@ Music arousal and valence regression with the `MuSE muse-msd-musicnn + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - [`weights `_, `metadata `_, `demo `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/muse/muse-msd-musicnn-2_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/muse/muse-msd-musicnn-2_predictions.py +.. collapse:: muse-audioset-vggish - .. collapse:: ⬇️ muse-audioset-vggish + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ 🎸 `Demo `__ - [`weights `_, `metadata `_, `demo `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/muse/muse-audioset-vggish-2_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/muse/muse-audioset-vggish-2_predictions.py +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + - 💻 `GitHub `__ + +.. 
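The DEAM, emoMusic, and MuSE heads above are arousal/valence regression models. A small sketch of averaging frame-wise (valence, arousal) predictions into a clip-level pair and rescaling it; the 1–9 output range is an assumption based on the dataset annotation scales, so verify it against each model's metadata:

```python
import numpy as np

# Hypothetical frame-wise (valence, arousal) predictions from one of the
# regression heads; real outputs come from the model's prediction script.
predictions = np.array([[5.8, 6.4], [6.2, 6.0], [6.0, 6.2]])

# Clip-level estimate: average over time.
clip_va = predictions.mean(axis=0)

# Map the assumed 1-9 annotation scale onto [-1, 1].
normalized = (clip_va - 5.0) / 4.0

print(clip_va, normalized)
```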
code-block:: bibtex + + @conference{bogdanov2022mtg, + author = "Bogdanov, Dmitry and Lizarraga-Seijas, Xavier and Alonso-Jiménez, Pablo and Serra, Xavier", + title = "MusAV: A dataset of relative arousal-valence annotations for validation of audio models", + booktitle = "International Society for Music Information Retrieval Conference (ISMIR 2022)", + year = "2022", + address = "Bengaluru, India", + url = "http://hdl.handle.net/10230/54181" + } @@ -772,55 +979,55 @@ Music danceability (2 classes):: danceable, not_danceable -Models: +**Models** - .. collapse:: ⬇️ danceability-audioset-vggish +.. collapse:: danceability-audioset-vggish - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/danceability/danceability-audioset-vggish-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/danceability/danceability-audioset-vggish-1_predictions.py - .. collapse:: ⬇️ danceability-audioset-yamnet +.. collapse:: danceability-audioset-yamnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/danceability/danceability-audioset-yamnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/danceability/danceability-audioset-yamnet-1_predictions.py - .. collapse:: ⬇️ danceability-discogs-effnet +.. collapse:: danceability-discogs-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/danceability/danceability-discogs-effnet-1_predictions.py + .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/danceability/danceability-discogs-effnet-1_predictions.py - .. collapse:: ⬇️ danceability-msd-musicnn +.. collapse:: danceability-msd-musicnn - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/danceability/danceability-msd-musicnn-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/danceability/danceability-msd-musicnn-1_predictions.py - .. collapse:: ⬇️ danceability-openl3-music-mel128-emb512 +.. collapse:: danceability-openl3-music-mel128-emb512 - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. @@ -832,55 +1039,74 @@ Music classification by mood (2 classes):: aggressive, non_aggressive -Models: +**Models** + +.. collapse:: mood_aggressive-audioset-vggish + + | - .. collapse:: ⬇️ mood_aggressive-audioset-vggish + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_aggressive/mood_aggressive-audioset-vggish-1_predictions.py - Python code for predictions: +.. collapse:: mood_aggressive-audioset-yamnet - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_aggressive/mood_aggressive-audioset-vggish-1_predictions.py + | - .. collapse:: ⬇️ mood_aggressive-audioset-yamnet + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. 
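Because the same head (e.g., danceability) is trained on several embeddings (VGGish, YAMNet, Discogs-EffNet, MSD-MusiCNN), a simple late-fusion option is to average their clip-level probabilities before deciding. A sketch with made-up numbers, not outputs of the actual models:

```python
import numpy as np

# Hypothetical clip-level "danceable" probabilities, one per backbone.
votes = {
    "audioset-vggish": 0.81,
    "audioset-yamnet": 0.74,
    "discogs-effnet": 0.66,
    "msd-musicnn": 0.43,
}

# Late fusion: average the probabilities, then apply a 0.5 threshold.
mean_prob = float(np.mean(list(votes.values())))
decision = "danceable" if mean_prob > 0.5 else "not_danceable"

print(decision, mean_prob)
```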
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_aggressive/mood_aggressive-audioset-yamnet-1_predictions.py - Python code for predictions: +.. collapse:: mood_aggressive-discogs-effnet - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_aggressive/mood_aggressive-audioset-yamnet-1_predictions.py + | - .. collapse:: ⬇️ mood_aggressive-discogs-effnet + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_aggressive/mood_aggressive-discogs-effnet-1_predictions.py - Python code for predictions: +.. collapse:: mood_aggressive-msd-musicnn - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_aggressive/mood_aggressive-discogs-effnet-1_predictions.py + | - .. collapse:: ⬇️ mood_aggressive-msd-musicnn + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_aggressive/mood_aggressive-msd-musicnn-1_predictions.py - Python code for predictions: +.. collapse:: mood_aggressive-openl3-music-mel128-emb512 - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_aggressive/mood_aggressive-msd-musicnn-1_predictions.py + | - .. collapse:: ⬇️ mood_aggressive-openl3-music-mel128-emb512 + ⬇️ `Weights `__ 📄 `Metadata `__ - | + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. - [`weights `_, `metadata `_] +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + +.. 
code-block:: bibtex + + @inproceedings{laurier2008multimodal, + author={Laurier, Cyril and Grivolla, Jens and Herrera, Perfecto}, + title={Multimodal Music Mood Classification Using Audio and Lyrics}, + booktitle={2008 Seventh International Conference on Machine Learning and Applications}, + year={2008}, + doi={10.1109/ICMLA.2008.96} + } - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. Mood Happy @@ -890,55 +1116,74 @@ Music classification by mood (2 classes):: happy, non_happy -Models: +**Models** + +.. collapse:: mood_happy-audioset-vggish - .. collapse:: ⬇️ mood_happy-audioset-vggish + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_happy/mood_happy-audioset-vggish-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_happy/mood_happy-audioset-vggish-1_predictions.py +.. collapse:: mood_happy-audioset-yamnet - .. collapse:: ⬇️ mood_happy-audioset-yamnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_happy/mood_happy-audioset-yamnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_happy/mood_happy-audioset-yamnet-1_predictions.py +.. collapse:: mood_happy-discogs-effnet - .. collapse:: ⬇️ mood_happy-discogs-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_happy/mood_happy-discogs-effnet-1_predictions.py - .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_happy/mood_happy-discogs-effnet-1_predictions.py +.. collapse:: mood_happy-msd-musicnn - .. collapse:: ⬇️ mood_happy-msd-musicnn + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_happy/mood_happy-msd-musicnn-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_happy/mood_happy-msd-musicnn-1_predictions.py +.. collapse:: mood_happy-openl3-music-mel128-emb512 - .. collapse:: ⬇️ mood_happy-openl3-music-mel128-emb512 + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. + +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + +.. code-block:: bibtex + + @inproceedings{laurier2008multimodal, + author={Laurier, Cyril and Grivolla, Jens and Herrera, Perfecto}, + title={Multimodal Music Mood Classification Using Audio and Lyrics}, + booktitle={2008 Seventh International Conference on Machine Learning and Applications}, + year={2008}, + doi={10.1109/ICMLA.2008.96} + } - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. Mood Party @@ -948,55 +1193,74 @@ Music classification by mood (2 classes):: party, non_party -Models: +**Models** + +.. collapse:: mood_party-audioset-vggish - .. collapse:: ⬇️ mood_party-audioset-vggish + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_party/mood_party-audioset-vggish-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_party/mood_party-audioset-vggish-1_predictions.py +.. collapse:: mood_party-audioset-yamnet - .. collapse:: ⬇️ mood_party-audioset-yamnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_party/mood_party-audioset-yamnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_party/mood_party-audioset-yamnet-1_predictions.py +.. collapse:: mood_party-discogs-effnet - .. collapse:: ⬇️ mood_party-discogs-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_party/mood_party-discogs-effnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_party/mood_party-discogs-effnet-1_predictions.py +.. collapse:: mood_party-msd-musicnn - .. collapse:: ⬇️ mood_party-msd-musicnn + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_party/mood_party-msd-musicnn-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_party/mood_party-msd-musicnn-1_predictions.py +.. collapse:: mood_party-openl3-music-mel128-emb512 - .. 
collapse:: ⬇️ mood_party-openl3-music-mel128-emb512 + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. + +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + +.. code-block:: bibtex + + @inproceedings{laurier2008multimodal, + author={Laurier, Cyril and Grivolla, Jens and Herrera, Perfecto}, + title={Multimodal Music Mood Classification Using Audio and Lyrics}, + booktitle={2008 Seventh International Conference on Machine Learning and Applications}, + year={2008}, + doi={10.1109/ICMLA.2008.96} + } - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. Mood Relaxed @@ -1006,55 +1270,74 @@ Music classification by mood (2 classes):: relaxed, non_relaxed -Models: +**Models** + +.. collapse:: mood_relaxed-audioset-vggish - .. collapse:: ⬇️ mood_relaxed-audioset-vggish + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_relaxed/mood_relaxed-audioset-vggish-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_relaxed/mood_relaxed-audioset-vggish-1_predictions.py +.. collapse:: mood_relaxed-audioset-yamnet - .. collapse:: ⬇️ mood_relaxed-audioset-yamnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_relaxed/mood_relaxed-audioset-yamnet-1_predictions.py - .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_relaxed/mood_relaxed-audioset-yamnet-1_predictions.py +.. collapse:: mood_relaxed-discogs-effnet - .. collapse:: ⬇️ mood_relaxed-discogs-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_relaxed/mood_relaxed-discogs-effnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_relaxed/mood_relaxed-discogs-effnet-1_predictions.py +.. collapse:: mood_relaxed-msd-musicnn - .. collapse:: ⬇️ mood_relaxed-msd-musicnn + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_relaxed/mood_relaxed-msd-musicnn-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_relaxed/mood_relaxed-msd-musicnn-1_predictions.py +.. collapse:: mood_relaxed-openl3-music-mel128-emb512 - .. collapse:: ⬇️ mood_relaxed-openl3-music-mel128-emb512 + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. + +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + +.. 
code-block:: bibtex + + @inproceedings{laurier2008multimodal, + author={Laurier, Cyril and Grivolla, Jens and Herrera, Perfecto}, + title={Multimodal Music Mood Classification Using Audio and Lyrics}, + booktitle={2008 Seventh International Conference on Machine Learning and Applications}, + year={2008}, + doi={10.1109/ICMLA.2008.96} + } - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. Mood Sad @@ -1064,55 +1347,74 @@ Music classification by mood (2 classes):: sad, non_sad -Models: +**Models** + +.. collapse:: mood_sad-audioset-vggish - .. collapse:: ⬇️ mood_sad-audioset-yvggish + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_sad/mood_sad-audioset-vggish-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_sad/mood_sad-audioset-vggish-1_predictions.py +.. collapse:: mood_sad-audioset-yamnet - .. collapse:: ⬇️ mood_sad-audioset-yamnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_sad/mood_sad-audioset-yamnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_sad/mood_sad-audioset-yamnet-1_predictions.py +.. collapse:: mood_sad-discogs-effnet - .. collapse:: ⬇️ mood_sad-discogs-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_sad/mood_sad-discogs-effnet-1_predictions.py - .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_sad/mood_sad-discogs-effnet-1_predictions.py +.. collapse:: mood_sad-msd-musicnn - .. collapse:: ⬇️ mood_sad-msd-musicnn + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_sad/mood_sad-msd-musicnn-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_sad/mood_sad-msd-musicnn-1_predictions.py +.. collapse:: mood_sad-openl3-music-mel128-emb512 - .. collapse:: ⬇️ mood_sad-openl3-music-mel128-emb512 + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. + +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + +.. code-block:: bibtex + + @inproceedings{laurier2008multimodal, + author={Laurier, Cyril and Grivolla, Jens and Herrera, Perfecto}, + title={Multimodal Music Mood Classification Using Audio and Lyrics}, + booktitle={2008 Seventh International Conference on Machine Learning and Applications}, + year={2008}, + doi={10.1109/ICMLA.2008.96} + } - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. Moods MIREX @@ -1130,28 +1432,28 @@ Music classification by mood with the MIREX Audio Mood Classification Dataset (5 .. highlight:: default -Models: +**Models** - .. collapse:: ⬇️ moods_mirex-msd-musicnn +.. collapse:: moods_mirex-msd-musicnn - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. 
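The five binary mood heads (aggressive, happy, party, relaxed, sad) can be combined into a single mood profile for a track. A sketch using hypothetical positive-class probabilities:

```python
# Hypothetical positive-class probabilities from the five binary mood
# heads run on the same track (not real model outputs).
mood_probs = {
    "aggressive": 0.12,
    "happy": 0.78,
    "party": 0.64,
    "relaxed": 0.31,
    "sad": 0.09,
}

# Keep every mood whose head fires above the chosen operating point.
threshold = 0.5
active_moods = sorted(mood for mood, p in mood_probs.items() if p >= threshold)

print(active_moods)
```

The 0.5 operating point is only a starting value; in practice it is tuned per head on validation data.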
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/moods_mirex/moods_mirex-msd-musicnn-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/moods_mirex/moods_mirex-msd-musicnn-1_predictions.py - .. collapse:: ⬇️ moods_mirex-audioset-vggish +.. collapse:: moods_mirex-audioset-vggish - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/moods_mirex/moods_mirex-audioset-vggish-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/moods_mirex/moods_mirex-audioset-vggish-1_predictions.py MTG-Jamendo mood and theme @@ -1165,67 +1467,88 @@ Multi-label classification with mood and theme subset of the MTG-Jamendo Dataset powerful, relaxing, retro, romantic, sad, sexy, slow, soft, soundscape, space, sport, summer, trailer, travel, upbeat, uplifting -Models: +**Models** - .. collapse:: ⬇️ mtg_jamendo_moodtheme-discogs-effnet +.. collapse:: mtg_jamendo_moodtheme-discogs-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_moodtheme/mtg_jamendo_moodtheme-discogs-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_moodtheme/mtg_jamendo_moodtheme-discogs-effnet-1_predictions.py - .. collapse:: ⬇️ mtg_jamendo_moodtheme-discogs_artist_embeddings-effnet +.. collapse:: mtg_jamendo_moodtheme-discogs_artist_embeddings-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_moodtheme/mtg_jamendo_moodtheme-discogs_artist_embeddings-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_moodtheme/mtg_jamendo_moodtheme-discogs_artist_embeddings-effnet-1_predictions.py - .. collapse:: ⬇️ mtg_jamendo_moodtheme-discogs_label_embeddings-effnet +.. collapse:: mtg_jamendo_moodtheme-discogs_label_embeddings-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_moodtheme/mtg_jamendo_moodtheme-discogs_label_embeddings-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_moodtheme/mtg_jamendo_moodtheme-discogs_label_embeddings-effnet-1_predictions.py - .. collapse:: ⬇️ mtg_jamendo_moodtheme-discogs_multi_embeddings-effnet +.. collapse:: mtg_jamendo_moodtheme-discogs_multi_embeddings-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_moodtheme/mtg_jamendo_moodtheme-discogs_multi_embeddings-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_moodtheme/mtg_jamendo_moodtheme-discogs_multi_embeddings-effnet-1_predictions.py - .. collapse:: ⬇️ mtg_jamendo_moodtheme-discogs_release_embeddings-effnet +.. collapse:: mtg_jamendo_moodtheme-discogs_release_embeddings-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_moodtheme/mtg_jamendo_moodtheme-discogs_release_embeddings-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_moodtheme/mtg_jamendo_moodtheme-discogs_release_embeddings-effnet-1_predictions.py - .. collapse:: ⬇️ mtg_jamendo_moodtheme-discogs_track_embeddings-effnet +.. collapse:: mtg_jamendo_moodtheme-discogs_track_embeddings-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: + + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_moodtheme/mtg_jamendo_moodtheme-discogs_track_embeddings-effnet-1_predictions.py + +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + - 💻 `GitHub `__ + +.. code-block:: bibtex + + @conference{bogdanov2019mtg, + author = "Bogdanov, Dmitry and Won, Minz and Tovstogan, Philip and Porter, Alastair and Serra, Xavier", + title = "The MTG-Jamendo Dataset for Automatic Music Tagging", + booktitle = "Machine Learning for Music Discovery Workshop, International Conference on Machine Learning (ICML 2019)", + year = "2019", + address = "Long Beach, CA, United States", + url = "http://hdl.handle.net/10230/42015" + } - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_moodtheme/mtg_jamendo_moodtheme-discogs_track_embeddings-effnet-1_predictions.py @@ -1245,67 +1568,88 @@ Multi-label classification using the instrument subset of the MTG-Jamendo Datase viola, violin, voice -Models: +**Models** - .. collapse:: ⬇️ mtg_jamendo_instrument-discogs-effnet +.. collapse:: mtg_jamendo_instrument-discogs-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_instrument/mtg_jamendo_instrument-discogs-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_instrument/mtg_jamendo_instrument-discogs-effnet-1_predictions.py - .. collapse:: ⬇️ mtg_jamendo_instrument-discogs_artist_embeddings-effnet +.. collapse:: mtg_jamendo_instrument-discogs_artist_embeddings-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_instrument/mtg_jamendo_instrument-discogs_artist_embeddings-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_instrument/mtg_jamendo_instrument-discogs_artist_embeddings-effnet-1_predictions.py - .. collapse:: ⬇️ mtg_jamendo_instrument-discogs_label_embeddings-effnet +.. collapse:: mtg_jamendo_instrument-discogs_label_embeddings-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_instrument/mtg_jamendo_instrument-discogs_label_embeddings-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_instrument/mtg_jamendo_instrument-discogs_label_embeddings-effnet-1_predictions.py - .. collapse:: ⬇️ mtg_jamendo_instrument-discogs_multi_embeddings-effnet +.. collapse:: mtg_jamendo_instrument-discogs_multi_embeddings-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_instrument/mtg_jamendo_instrument-discogs_multi_embeddings-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_instrument/mtg_jamendo_instrument-discogs_multi_embeddings-effnet-1_predictions.py - .. collapse:: ⬇️ mtg_jamendo_instrument-discogs_release_embeddings-effnet +.. collapse:: mtg_jamendo_instrument-discogs_release_embeddings-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_instrument/mtg_jamendo_instrument-discogs_release_embeddings-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_instrument/mtg_jamendo_instrument-discogs_release_embeddings-effnet-1_predictions.py - .. collapse:: ⬇️ mtg_jamendo_instrument-discogs_track_embeddings-effnet +.. collapse:: mtg_jamendo_instrument-discogs_track_embeddings-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: + + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_instrument/mtg_jamendo_instrument-discogs_track_embeddings-effnet-1_predictions.py + +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + - 💻 `GitHub `__ + +.. 
code-block:: bibtex + + @conference{bogdanov2019mtg, + author = "Bogdanov, Dmitry and Won, Minz and Tovstogan, Philip and Porter, Alastair and Serra, Xavier", + title = "The MTG-Jamendo Dataset for Automatic Music Tagging", + booktitle = "Machine Learning for Music Discovery Workshop, International Conference on Machine Learning (ICML 2019)", + year = "2019", + address = "Long Beach, CA, United States", + url = "http://hdl.handle.net/10230/42015" + } - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_instrument/mtg_jamendo_instrument-discogs_track_embeddings-effnet-1_predictions.py Music loop instrument role @@ -1315,17 +1659,35 @@ Classification of music loops by their instrument role using the `Freesound Loop bass, chords, fx, melody, percussion -Models: +**Models** - .. collapse:: ⬇️ fs_loop_ds-msd-musicnn +.. collapse:: fs_loop_ds-msd-musicnn - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/fs_loop_ds/fs_loop_ds-msd-musicnn-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/fs_loop_ds/fs_loop_ds-msd-musicnn-1_predictions.py + +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + - 🌐 `Zenodo `__ + +.. code-block:: bibtex + + @inproceedings{ramires2020freesound, + title={The Freesound Loop Dataset and Annotation Tool}, + author={Ramires, Ant{\'o}nio and Chandna, Pritish and Favory, Xavier and G{\'o}mez, Emilia and Serra, Xavier}, + booktitle={International Society for Music Information Retrieval Conference (ISMIR)}, + year={2020} + } Mood Acoustic @@ -1335,55 +1697,73 @@ Music classification by type of sound (2 classes):: acoustic, non_acoustic -Models: +**Models** + +.. collapse:: mood_acoustic-audioset-vggish - .. 
collapse:: ⬇️ mood_acoustic-audioset-vggish + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_acoustic/mood_acoustic-audioset-vggish-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_acoustic/mood_acoustic-audioset-vggish-1_predictions.py +.. collapse:: mood_acoustic-audioset-yamnet - .. collapse:: ⬇️ mood_acoustic-audioset-yamnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_acoustic/mood_acoustic-audioset-yamnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_acoustic/mood_acoustic-audioset-yamnet-1_predictions.py +.. collapse:: mood_acoustic-discogs-effnet - .. collapse:: ⬇️ mood_acoustic-discogs-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_acoustic/mood_acoustic-discogs-effnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_acoustic/mood_acoustic-discogs-effnet-1_predictions.py +.. collapse:: mood_acoustic-msd-musicnn - .. collapse:: ⬇️ mood_acoustic-msd-musicnn + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_acoustic/mood_acoustic-msd-musicnn-1_predictions.py - .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_acoustic/mood_acoustic-msd-musicnn-1_predictions.py +.. collapse:: mood_acoustic-openl3-music-mel128-emb512 - .. collapse:: ⬇️ mood_acoustic-openl3-music-mel128-emb512 + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + +.. code-block:: bibtex + + @inproceedings{laurier2008multimodal, + author={Laurier, Cyril and Grivolla, Jens and Herrera, Perfecto}, + title={Multimodal Music Mood Classification Using Audio and Lyrics}, + booktitle={2008 Seventh International Conference on Machine Learning and Applications}, + year={2008}, + doi={10.1109/ICMLA.2008.96} + } Mood Electronic @@ -1393,56 +1773,73 @@ Music classification by type of sound (2 classes):: electronic, non_electronic -Models: +**Models** + +.. collapse:: mood_electronic-audioset-vggish - .. collapse:: ⬇️ mood_electronic-audioset-vggish + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_electronic/mood_electronic-audioset-vggish-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_electronic/mood_electronic-audioset-vggish-1_predictions.py +.. collapse:: mood_electronic-audioset-yamnet - .. collapse:: ⬇️ mood_electronic-audioset-yamnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_electronic/mood_electronic-audioset-yamnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_electronic/mood_electronic-audioset-yamnet-1_predictions.py +.. collapse:: mood_electronic-discogs-effnet - .. collapse:: ⬇️ mood_electronic-discogs-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_electronic/mood_electronic-discogs-effnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_electronic/mood_electronic-discogs-effnet-1_predictions.py +.. collapse:: mood_electronic-msd-musicnn - .. collapse:: ⬇️ mood_electronic-msd-musicnn + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_electronic/mood_electronic-msd-musicnn-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mood_electronic/mood_electronic-msd-musicnn-1_predictions.py +.. collapse:: mood_electronic-openl3-music-mel128-emb512 - .. collapse:: ⬇️ mood_electronic-openl3-music-mel128-emb512 + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. +**References** +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + +.. 
code-block:: bibtex + + @inproceedings{laurier2008multimodal, + author={Laurier, Cyril and Grivolla, Jens and Herrera, Perfecto}, + title={Multimodal Music Mood Classification Using Audio and Lyrics}, + booktitle={2008 Seventh International Conference on Machine Learning and Applications}, + year={2008}, + doi={10.1109/ICMLA.2008.96} + } Voice/instrumental @@ -1452,55 +1849,55 @@ Classification of music by presence or absence of voice (2 classes):: instrumental, voice -Models: +**Models** - .. collapse:: ⬇️ voice_instrumental-audioset-vggish +.. collapse:: voice_instrumental-audioset-vggish - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/voice_instrumental/voice_instrumental-audioset-vggish-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/voice_instrumental/voice_instrumental-audioset-vggish-1_predictions.py - .. collapse:: ⬇️ voice_instrumental-audioset-yamnet +.. collapse:: voice_instrumental-audioset-yamnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/voice_instrumental/voice_instrumental-audioset-yamnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/voice_instrumental/voice_instrumental-audioset-yamnet-1_predictions.py - .. collapse:: ⬇️ voice_instrumental-discogs-effnet +.. collapse:: voice_instrumental-discogs-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/voice_instrumental/voice_instrumental-discogs-effnet-1_predictions.py + .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/voice_instrumental/voice_instrumental-discogs-effnet-1_predictions.py - .. collapse:: ⬇️ voice_instrumental-msd-musicnn +.. collapse:: voice_instrumental-msd-musicnn - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/voice_instrumental/voice_instrumental-msd-musicnn-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/voice_instrumental/voice_instrumental-msd-musicnn-1_predictions.py - .. collapse:: ⬇️ voice_instrumental-openl3-music-mel128-emb512 +.. collapse:: voice_instrumental-openl3-music-mel128-emb512 - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. Voice gender @@ -1510,55 +1907,55 @@ Classification of music by singing voice gender (2 classes):: female, male -Models: +**Models** - .. collapse:: ⬇️ gender-audioset-vggish +.. collapse:: gender-audioset-vggish - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/gender/gender-audioset-vggish-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/gender/gender-audioset-vggish-1_predictions.py - .. collapse:: ⬇️ gender-audioset-yamnet +.. 
collapse:: gender-audioset-yamnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/gender/gender-audioset-yamnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/gender/gender-audioset-yamnet-1_predictions.py - .. collapse:: ⬇️ gender-discogs-effnet +.. collapse:: gender-discogs-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/gender/gender-discogs-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/gender/gender-discogs-effnet-1_predictions.py - .. collapse:: ⬇️ gender-msd-musicnn +.. collapse:: gender-msd-musicnn - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/gender/gender-msd-musicnn-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/gender/gender-msd-musicnn-1_predictions.py - .. collapse:: ⬇️ gender-openl3-music-mel128-emb512 +.. collapse:: gender-openl3-music-mel128-emb512 - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. + We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. @@ -1569,17 +1966,17 @@ Classification of music by timbre color (2 classes):: bright, dark -Models: +**Models** - .. collapse:: ⬇️ timbre-discogs-effnet +.. 
collapse:: timbre-discogs-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/timbre/timbre-discogs-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/timbre/timbre-discogs-effnet-1_predictions.py Nsynth acoustic/electronic @@ -1589,17 +1986,35 @@ Classification of monophonic sources into acoustic or electronic origin using th acoustic, electronic -Models: +**Models** + +.. collapse:: nsynth_acoustic_electronic-discogs-effnet + + | + + ⬇️ `Weights `__ 📄 `Metadata `__ - .. collapse:: ⬇️ nsynth_acoustic_electronic-discogs-effnet + Python code for predictions: - | + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/nsynth_acoustic_electronic/nsynth_acoustic_electronic-discogs-effnet-1_predictions.py - [`weights `_, `metadata `_] +**References** - Python code for predictions: +.. list-table:: + :widths: auto + :header-rows: 0 - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/nsynth_acoustic_electronic/nsynth_acoustic_electronic-discogs-effnet-1_predictions.py + * - 📄 `Dataset paper `__ + - 🌐 `Website `__ + +.. code-block:: bibtex + + @inproceedings{engel2017neural, + title={Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders}, + author={Engel, Jesse and Resnick, Cinjon and Roberts, Adam and Dieleman, Sander and Norouzi, Mohammad and Eck, Douglas and Simonyan, Karen}, + booktitle={International Conference on Machine Learning (ICML)}, + year={2017} + } Nsynth bright/dark @@ -1609,17 +2024,35 @@ Classification of monophonic sources by timbre color using the `Nsynth nsynth_bright_dark-discogs-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/nsynth_bright_dark/nsynth_bright_dark-discogs-effnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/nsynth_bright_dark/nsynth_bright_dark-discogs-effnet-1_predictions.py +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + - 🌐 `Website `__ + +.. code-block:: bibtex + + @inproceedings{engel2017neural, + title={Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders}, + author={Engel, Jesse and Resnick, Cinjon and Roberts, Adam and Dieleman, Sander and Norouzi, Mohammad and Eck, Douglas and Simonyan, Karen}, + booktitle={International Conference on Machine Learning (ICML)}, + year={2017} + } Nsynth instrument @@ -1629,17 +2062,35 @@ Classification of monophonic sources by instrument family using the `Nsynth nsynth_instrument-discogs-effnet + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/nsynth_instrument/nsynth_instrument-discogs-effnet-1_predictions.py - Python code for predictions: +**References** - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/nsynth_instrument/nsynth_instrument-discogs-effnet-1_predictions.py +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + - 🌐 `Website `__ + +.. 
code-block:: bibtex + + @inproceedings{engel2017neural, + title={Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders}, + author={Engel, Jesse and Resnick, Cinjon and Roberts, Adam and Dieleman, Sander and Norouzi, Mohammad and Eck, Douglas and Simonyan, Karen}, + booktitle={International Conference on Machine Learning (ICML)}, + year={2017} + } Nsynth reverb @@ -1649,19 +2100,35 @@ Detection of reverb in monophonic sources using the `Nsynth nsynth_reverb-discogs-effnet + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/nsynth_reverb/nsynth_reverb-discogs-effnet-1_predictions.py - Python code for predictions: +**References** - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/nsynth_reverb/nsynth_reverb-discogs-effnet-1_predictions.py +.. list-table:: + :widths: auto + :header-rows: 0 + * - 📄 `Dataset paper `__ + - 🌐 `Website `__ +.. code-block:: bibtex + + @inproceedings{engel2017neural, + title={Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders}, + author={Engel, Jesse and Resnick, Cinjon and Roberts, Adam and Dieleman, Sander and Norouzi, Mohammad and Eck, Douglas and Simonyan, Karen}, + booktitle={International Conference on Machine Learning (ICML)}, + year={2017} + } Tonality @@ -1676,55 +2143,55 @@ Music classification by tonality (2 classes):: atonal, tonal -Models: +**Models** - .. collapse:: ⬇️ tonal_atonal-audioset-vggish +.. collapse:: tonal_atonal-audioset-vggish - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/tonal_atonal/tonal_atonal-audioset-vggish-1_predictions.py + .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/tonal_atonal/tonal_atonal-audioset-vggish-1_predictions.py - .. collapse:: ⬇️ tonal_atonal-audioset-yamnet +.. collapse:: tonal_atonal-audioset-yamnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/tonal_atonal/tonal_atonal-audioset-yamnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/tonal_atonal/tonal_atonal-audioset-yamnet-1_predictions.py - .. collapse:: ⬇️ tonal_atonal-discogs-effnet +.. collapse:: tonal_atonal-discogs-effnet - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/tonal_atonal/tonal_atonal-discogs-effnet-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/tonal_atonal/tonal_atonal-discogs-effnet-1_predictions.py - .. collapse:: ⬇️ tonal_atonal-msd-musicnn +.. collapse:: tonal_atonal-msd-musicnn - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/tonal_atonal/tonal_atonal-msd-musicnn-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/tonal_atonal/tonal_atonal-msd-musicnn-1_predictions.py - .. collapse:: ⬇️ tonal_atonal-openl3-music-mel128-emb512 +.. collapse:: tonal_atonal-openl3-music-mel128-emb512 - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. 
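The classifier heads listed throughout this page output frame-wise activations that are typically averaged over time before reading off class labels. As a minimal plain-Python sketch of that post-processing step (the label names, activation values, and the helper `aggregate_predictions` are hypothetical illustrations, not part of Essentia; the actual end-to-end prediction code is in the scripts included above via `literalinclude`):

```python
def aggregate_predictions(activations, labels, threshold=0.5):
    """Average frame-wise activations per class and return the
    (label, mean_score) pairs whose mean exceeds the threshold.

    activations: list of frames, each a list of per-class scores
    labels: class names, in the same order as the scores
    """
    n_frames = len(activations)
    means = [
        sum(frame[i] for frame in activations) / n_frames
        for i in range(len(labels))
    ]
    return [
        (label, score)
        for label, score in zip(labels, means)
        if score >= threshold
    ]

# Hypothetical frame-wise activations for a 2-class head such as
# tonal/atonal: three frames, two scores per frame.
frames = [[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]]
print(aggregate_predictions(frames, ["tonal", "atonal"]))
```

For multi-label heads (e.g. the MTG-Jamendo tag models) the same averaging applies, with the threshold selecting every tag whose mean activation is high enough rather than a single winning class.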
+ We do not have a dedicated algorithm to extract embeddings with this model. For now, OpenL3 embeddings can be extracted using this `script `_. @@ -1744,57 +2211,78 @@ Music automatic tagging using the top-50 tags of the MTG-Jamendo Dataset:: electricpiano, guitar, keyboard, piano, strings, synthesizer, violin, voice, emotional, energetic, film, happy, relaxing -Models: +**Models** + +.. collapse:: mtg_jamendo_top50tags-discogs-effnet - .. collapse:: ⬇️ mtg_jamendo_top50tags-discogs-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_top50tags/mtg_jamendo_top50tags-discogs-effnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_top50tags/mtg_jamendo_top50tags-discogs-effnet-1_predictions.py +.. collapse:: mtg_jamendo_top50tags-discogs_label_embeddings-effnet - .. collapse:: ⬇️ mtg_jamendo_top50tags-discogs_label_embeddings-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_top50tags/mtg_jamendo_top50tags-discogs_label_embeddings-effnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_top50tags/mtg_jamendo_top50tags-discogs_label_embeddings-effnet-1_predictions.py +.. collapse:: mtg_jamendo_top50tags-discogs_multi_embeddings-effnet - .. collapse:: ⬇️ mtg_jamendo_top50tags-discogs_multi_embeddings-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_top50tags/mtg_jamendo_top50tags-discogs_multi_embeddings-effnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_top50tags/mtg_jamendo_top50tags-discogs_multi_embeddings-effnet-1_predictions.py +.. collapse:: mtg_jamendo_top50tags-discogs_release_embeddings-effnet - .. collapse:: ⬇️ mtg_jamendo_top50tags-discogs_release_embeddings-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_top50tags/mtg_jamendo_top50tags-discogs_release_embeddings-effnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_top50tags/mtg_jamendo_top50tags-discogs_release_embeddings-effnet-1_predictions.py +.. collapse:: mtg_jamendo_top50tags-discogs_track_embeddings-effnet - .. collapse:: ⬇️ mtg_jamendo_top50tags-discogs_track_embeddings-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_top50tags/mtg_jamendo_top50tags-discogs_track_embeddings-effnet-1_predictions.py + +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + - 💻 `GitHub `__ + +.. 
code-block:: bibtex + + @conference{bogdanov2019mtg, + author = "Bogdanov, Dmitry and Won, Minz and Tovstogan, Philip and Porter, Alastair and Serra, Xavier", + title = "The MTG-Jamendo Dataset for Automatic Music Tagging", + booktitle = "Machine Learning for Music Discovery Workshop, International Conference on Machine Learning (ICML 2019)", + year = "2019", + address = "Long Beach, CA, United States", + url = "http://hdl.handle.net/10230/42015" + } - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtg_jamendo_top50tags/mtg_jamendo_top50tags-discogs_track_embeddings-effnet-1_predictions.py MagnaTagATune @@ -1807,67 +2295,83 @@ Music automatic tagging with the top-50 tags of the MagnaTagATune dataset:: vocal, no vocals, no voice, opera, piano, pop, quiet, rock, singing, sitar, slow, soft, solo, strings, synth, techno, violin, vocal, vocals, voice, weird, woman -Models: +**Models** + +.. collapse:: mtt-discogs-effnet - .. collapse:: ⬇️ mtt-discogs-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtt/mtt-discogs-effnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtt/mtt-discogs-effnet-1_predictions.py +.. collapse:: mtt-discogs_artist_embeddings-effnet - .. collapse:: ⬇️ mtt-discogs_artist_embeddings-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtt/mtt-discogs_artist_embeddings-effnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtt/mtt-discogs_artist_embeddings-effnet-1_predictions.py +.. collapse:: mtt-discogs_label_embeddings-effnet - .. 
collapse:: ⬇️ mtt-discogs_label_embeddings-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtt/mtt-discogs_label_embeddings-effnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtt/mtt-discogs_label_embeddings-effnet-1_predictions.py +.. collapse:: mtt-discogs_multi_embeddings-effnet - .. collapse:: ⬇️ mtt-discogs_multi_embeddings-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtt/mtt-discogs_multi_embeddings-effnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtt/mtt-discogs_multi_embeddings-effnet-1_predictions.py +.. collapse:: mtt-discogs_release_embeddings-effnet - .. collapse:: ⬇️ mtt-discogs_release_embeddings-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtt/mtt-discogs_release_embeddings-effnet-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtt/mtt-discogs_release_embeddings-effnet-1_predictions.py +.. collapse:: mtt-discogs_track_embeddings-effnet - .. collapse:: ⬇️ mtt-discogs_track_embeddings-effnet + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtt/mtt-discogs_track_embeddings-effnet-1_predictions.py - .. 
literalinclude :: ../../src/examples/python/models/scripts/classification-heads/mtt/mtt-discogs_track_embeddings-effnet-1_predictions.py +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + +.. code-block:: bibtex + + @inproceedings{law2009evaluation, + title={Evaluation of Algorithms Using Games: The Case of Music Tagging}, + author={Law, Edith and West, Kris and Mandel, Michael I and Bay, Mert and Downie, J Stephen}, + booktitle={International Society for Music Information Retrieval Conference (ISMIR)}, + year={2009} + } Million Song Dataset @@ -1875,7 +2379,7 @@ Million Song Dataset .. highlight:: none -Music automatic tagging using the top-50 tags of the `LastFM/Million Song Dataset `_:: +Music automatic tagging using the top-50 tags of the `Last.fm/Million Song Dataset `_:: rock, pop, alternative, indie, electronic, female vocalists, dance, 00s, alternative rock, jazz, beautiful, metal, chillout, male vocalists, classic rock, soul, indie rock, Mellow, electronica, 80s, folk, 90s, chill, instrumental, punk, .. highlight:: default -Models: +**Models** - .. collapse:: ⬇️ msd-msd-musicnn +.. collapse:: msd-msd-musicnn - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/msd/msd-msd-musicnn-1_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/classification-heads/msd/msd-msd-musicnn-1_predictions.py +**References** + +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Dataset paper `__ + - 🌐 `Website `__ + +.. 
code-block:: bibtex + + @inproceedings{bertin2011million, + title={The Million Song Dataset}, + author={Bertin-Mahieux, Thierry and Ellis, Daniel PW and Whitman, Brian and Lamere, Paul}, + booktitle={International Society for Music Information Retrieval Conference (ISMIR)}, + year={2011} + } Audio event recognition @@ -1967,29 +2488,48 @@ Audio event recognition (520 audio event classes):: .. highlight:: default -Models: +**Models** + +.. collapse:: audioset-yamnet + + | - .. collapse:: ⬇️ audioset-yamnet + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/audio-event-recognition/yamnet/audioset-yamnet-1_predictions.py - Python code for predictions: + Python code for embedding extraction: - .. literalinclude :: ../../src/examples/python/models/scripts/audio-event-recognition/yamnet/audioset-yamnet-1_predictions.py + .. literalinclude:: ../../src/examples/python/models/scripts/audio-event-recognition/yamnet/audioset-yamnet-1_embeddings.py - Python code for embedding extraction: +**References** - .. literalinclude:: ../../src/examples/python/models/scripts/audio-event-recognition/yamnet/audioset-yamnet-1_embeddings.py +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Paper `__ + - 💻 `TensorFlow Models `__ + - 🌐 `AudioSet `__ + +.. code-block:: bibtex + + @inproceedings{gemmeke2017audio, + title={Audio Set: An ontology and human-labeled dataset for audio events}, + author={Gemmeke, Jort F. and Ellis, Daniel P. W. and Freedman, Dylan and Jansen, Aren and Lawrence, Wade and Moore, R. Channing and Plakal, Manoj and Ritter, Marvin}, + booktitle={International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, + year={2017} + } FSD-SINet ^^^^^^^^^ -.. highlight:: none - Audio event recognition using the `FSD50K `_ dataset targeting 200 classes drawn from the `AudioSet Ontology `_. + +.. 
highlight:: none The Shift Invariant Network (SINet) is offered in two different model sizes. ``vgg42`` is a variation of ``vgg41`` with twice the number of filters for each convolutional layer. Also, the shift-invariance technique may be trainable low-pass filters (``tlpf``), adaptative polyphase sampling (``aps``), or both (``tlpf_aps``):: @@ -2019,63 +2559,94 @@ Also, the shift-invariance technique may be trainable low-pass filters (``tlpf`` .. highlight:: default -Models: +**Models** + +.. collapse:: fsd-sinet-vgg41-tlpf - .. collapse:: ⬇️ fsd-sinet-vgg41-tlpf + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg41-tlpf-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg41-tlpf-1_predictions.py + Python code for embedding extraction: - Python code for embedding extraction: + .. literalinclude:: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg41-tlpf-1_embeddings.py - .. literalinclude:: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg41-tlpf-1_embeddings.py +.. collapse:: fsd-sinet-vgg42-aps - .. collapse:: ⬇️ fsd-sinet-vgg42-aps + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg42-aps-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg42-aps-1_predictions.py + Python code for embedding extraction: - Python code for embedding extraction: + .. 
literalinclude:: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg42-aps-1_embeddings.py - .. literalinclude:: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg42-aps-1_embeddings.py +.. collapse:: fsd-sinet-vgg42-tlpf_aps - .. collapse:: ⬇️ fsd-sinet-vgg42-tlpf_aps + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg42-tlpf_aps-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg42-tlpf_aps-1_predictions.py + Python code for embedding extraction: - Python code for embedding extraction: + .. literalinclude:: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg42-tlpf_aps-1_embeddings.py - .. literalinclude:: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg42-tlpf_aps-1_embeddings.py +.. collapse:: fsd-sinet-vgg42-tlpf - .. collapse:: ⬇️ fsd-sinet-vgg42-tlpf + | - | + ⬇️ `Weights `__ 📄 `Metadata `__ - [`weights `_, `metadata `_] + Python code for predictions: - Python code for predictions: + .. literalinclude :: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg42-tlpf-1_predictions.py - .. literalinclude :: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg42-tlpf-1_predictions.py + Python code for embedding extraction: - Python code for embedding extraction: + .. literalinclude:: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg42-tlpf-1_embeddings.py - .. literalinclude:: ../../src/examples/python/models/scripts/audio-event-recognition/fsd-sinet/fsd-sinet-vgg42-tlpf-1_embeddings.py +**References** + +.. 
list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `FSD50K Paper `__ + - 📄 `SINet Paper `__ + - 📊 `FSD50K Dataset `__ + + +.. code-block:: bibtex + + @inproceedings{fonseca2021shift, + title={Shift-Invariance for Sound Event Detection}, + author={Fonseca, Eduardo and Ferraro, Andres and Serra, Xavier}, + booktitle={International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, + year={2021} + } + +.. code-block:: bibtex + + @article{fonseca2022fsd50k, + title={{FSD50K}: An Open Dataset of Human-Labeled Sound Events}, + author={Fonseca, Eduardo and Favory, Xavier and Pons, Jordi and Font, Frederic and Serra, Xavier}, + journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, + volume={30}, + pages={829--852}, + year={2022} + } @@ -2088,57 +2659,75 @@ CREPE Monophonic pitch detection (360 20-cent pitch bins, C1-B7) trained on the RWC-synth and the MDB-stem-synth datasets. CREPE is offered with different model sizes ranging from ``tiny`` to ``full``. A larger model is expected to perform better at the expense of additional computational costs. -Models: +**Models** + +.. collapse:: crepe-full + + | - .. collapse:: ⬇️ crepe-full + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/pitch/crepe/crepe-full-1_predictions.py - Python code for predictions: +.. collapse:: crepe-large - .. literalinclude :: ../../src/examples/python/models/scripts/pitch/crepe/crepe-full-1_predictions.py + | - .. collapse:: ⬇️ crepe-large + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/pitch/crepe/crepe-large-1_predictions.py - Python code for predictions: +.. collapse:: crepe-medium - .. literalinclude :: ../../src/examples/python/models/scripts/pitch/crepe/crepe-large-1_predictions.py + | - .. 
collapse:: ⬇️ crepe-medium + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/pitch/crepe/crepe-medium-1_predictions.py - Python code for predictions: +.. collapse:: crepe-small - .. literalinclude :: ../../src/examples/python/models/scripts/pitch/crepe/crepe-medium-1_predictions.py + | - .. collapse:: ⬇️ crepe-small + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/pitch/crepe/crepe-small-1_predictions.py - Python code for predictions: +.. collapse:: crepe-tiny - .. literalinclude :: ../../src/examples/python/models/scripts/pitch/crepe/crepe-small-1_predictions.py + | - .. collapse:: ⬇️ crepe-tiny + ⬇️ `Weights `__ 📄 `Metadata `__ - | + Python code for predictions: - [`weights `_, `metadata `_] + .. literalinclude :: ../../src/examples/python/models/scripts/pitch/crepe/crepe-tiny-1_predictions.py - Python code for predictions: +**References** - .. literalinclude :: ../../src/examples/python/models/scripts/pitch/crepe/crepe-tiny-1_predictions.py +.. list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Paper `__ + - 💻 `GitHub `__ + +.. code-block:: bibtex + + @inproceedings{kim2018crepe, + title={{CREPE}: A Convolutional Representation for Pitch Estimation}, + author={Kim, Jong Wook and Salamon, Justin and Li, Peter and Bello, Juan Pablo}, + booktitle={International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, + year={2018} + } @@ -2151,105 +2740,126 @@ Spleeter Source separation into 2, 4, or 5 stems. Spleeter can separate music in different numbers of stems: ``2`` (vocals and accompaniment), ``4`` (vocals, drums, bass, and other separation), or ``5`` (vocals, drums, bass, piano, and other separation). -Models: +**Models** + +.. collapse:: spleeter-2s + + | + + ⬇️ `Weights `__ 📄 `Metadata `__ - .. 
collapse:: ⬇️ speeter-2s + Python code for source separation: - | + .. code-block:: python - [`weights `_, `metadata `_] + from essentia.standard import AudioLoader, TensorflowPredict + from essentia import Pool + import numpy as np - Python code for source separation: + # Input should be audio @44.1kHz. + audio, sr, _, _, _, _ = AudioLoader(filename="audio.wav")() - .. code-block:: python + pool = Pool() + # The input needs to have 4 dimensions so that it is interpreted as an Essentia tensor. + pool.set("waveform", audio[..., np.newaxis, np.newaxis]) - from essentia.standard import AudioLoader, TensorflowPredict - from essentia import Pool - import numpy as np + model = TensorflowPredict( + graphFilename="spleeter-2s-3.pb", + inputs=["waveform"], + outputs=["waveform_vocals", "waveform_accompaniment"] + ) - # Input should be audio @41kHz. - audio, sr, _, _, _, _ = AudioLoader(filename="audio.wav")() + out_pool = model(pool) + vocals = out_pool["waveform_vocals"].squeeze() + accompaniment = out_pool["waveform_accompaniment"].squeeze() - pool = Pool() - # The input needs to have 4 dimensions so that it is interpreted as an Essentia tensor. - pool.set("waveform", audio[..., np.newaxis, np.newaxis]) +.. collapse:: spleeter-4s - model = TensorflowPredict( - graphFilename="spleeter-2s-3.pb", - inputs=["waveform"], - outputs=["waveform_vocals", "waveform_accompaniment"] - ) + | - out_pool = model(pool) - vocals = out_pool["waveform_vocals"].squeeze() - accompaniment = out_pool["waveform_accompaniment"].squeeze() + ⬇️ `Weights `__ 📄 `Metadata `__ - .. collapse:: ⬇️ speeter-4s + Python code for source separation: - | + .. code-block:: python - [`weights `_, `metadata `_] + from essentia.standard import AudioLoader, TensorflowPredict + from essentia import Pool + import numpy as np - Python code for source separation: + # Input should be audio @44.1kHz. + audio, sr, _, _, _, _ = AudioLoader(filename="audio.wav")() - .. 
code-block:: python + pool = Pool() + # The input needs to have 4 dimensions so that it is interpreted as an Essentia tensor. + pool.set("waveform", audio[..., np.newaxis, np.newaxis]) - from essentia.standard import AudioLoader, TensorflowPredict - from essentia import Pool - import numpy as np + model = TensorflowPredict( + graphFilename="spleeter-4s-3.pb", + inputs=["waveform"], + outputs=["waveform_vocals", "waveform_drums", "waveform_bass", "waveform_other"] + ) - # Input should be audio @41kHz. - audio, sr, _, _, _, _ = AudioLoader(filename="audio.wav")() + out_pool = model(pool) + vocals = out_pool["waveform_vocals"].squeeze() + drums = out_pool["waveform_drums"].squeeze() + bass = out_pool["waveform_bass"].squeeze() + other = out_pool["waveform_other"].squeeze() - pool = Pool() - # The input needs to have 4 dimensions so that it is interpreted as an Essentia tensor. - pool.set("waveform", audio[..., np.newaxis, np.newaxis]) +.. collapse:: spleeter-5s - model = TensorflowPredict( - graphFilename="spleeter-4s-3.pb", - inputs=["waveform"], - outputs=["waveform_vocals", "waveform_drums", "waveform_bass", "waveform_other"] - ) + | - out_pool = model(pool) - vocals = out_pool["waveform_vocals"].squeeze() - drums = out_pool["waveform_drums"].squeeze() - bass = out_pool["waveform_bass"].squeeze() - other = out_pool["waveform_other"].squeeze() + ⬇️ `Weights `__ 📄 `Metadata `__ - .. collapse:: ⬇️ speeter-5s + Python code for source separation: - | + .. code-block:: python - [`weights `_, `metadata `_] + from essentia.standard import AudioLoader, TensorflowPredict + from essentia import Pool + import numpy as np - Python code for source separation: + # Input should be audio @44.1kHz. + audio, sr, _, _, _, _ = AudioLoader(filename="audio.wav")() - .. code-block:: python + pool = Pool() + # The input needs to have 4 dimensions so that it is interpreted as an Essentia tensor. 
+ pool.set("waveform", audio[..., np.newaxis, np.newaxis]) - from essentia.standard import AudioLoader, TensorflowPredict - from essentia import Pool - import numpy as np + model = TensorflowPredict( + graphFilename="spleeter-5s-3.pb", + inputs=["waveform"], + outputs=["waveform_vocals", "waveform_drums", "waveform_bass", "waveform_piano", "waveform_other"] + ) - # Input should be audio @41kHz. - audio, sr, _, _, _, _ = AudioLoader(filename="audio.wav")() + out_pool = model(pool) + vocals = out_pool["waveform_vocals"].squeeze() + drums = out_pool["waveform_drums"].squeeze() + bass = out_pool["waveform_bass"].squeeze() + piano = out_pool["waveform_piano"].squeeze() + other = out_pool["waveform_other"].squeeze() - pool = Pool() - # The input needs to have 4 dimensions so that it is interpreted as an Essentia tensor. - pool.set("waveform", audio[..., np.newaxis, np.newaxis]) +**References** - model = TensorflowPredict( - graphFilename="spleeter-5s-3.pb", - inputs=["waveform"], - outputs=["waveform_vocals", "waveform_drums", "waveform_bass", "waveform_piano", "waveform_other"] + ) +.. list-table:: + :widths: auto + :header-rows: 0 - out_pool = model(pool) - vocals = out_pool["waveform_vocals"].squeeze() - drums = out_pool["waveform_drums"].squeeze() - bass = out_pool["waveform_bass"].squeeze() - bass = out_pool["waveform_piano"].squeeze() - other = out_pool["waveform_other"].squeeze() + * - 📄 `Paper `__ + - 💻 `GitHub `__ + +.. 
code-block:: bibtex + + @article{hennequin2020spleeter, + title={Spleeter: a fast and efficient music source separation tool with pre-trained models}, + author={Hennequin, Romain and Khlif, Anis and Voituret, Felix and Moussallam, Manuel}, + journal={Journal of Open Source Software}, + volume={5}, + number={50}, + pages={2154}, + year={2020} + } @@ -2265,34 +2875,52 @@ Tempo classification (256 BPM classes, 30-286 BPM) trained on the Extended Ballr TempoCNN may feature square filters (``deepsquare``) or longitudinal ones (``deeptemp``) and a model size factor of 4 (``k4``) or 16 (``k16``). A larger model is expected to perform better at the expense of additional computational costs. -Models: +**Models** - .. collapse:: ⬇️ deepsquare-k16 +.. collapse:: deepsquare-k16 - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/tempo/tempocnn/deepsquare-k16-3_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/tempo/tempocnn/deepsquare-k16-3_predictions.py - .. collapse:: ⬇️ deeptemp-k4 +.. collapse:: deeptemp-k4 - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/tempo/tempocnn/deeptemp-k4-3_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/tempo/tempocnn/deeptemp-k4-3_predictions.py - .. collapse:: ⬇️ deeptemp-k16 +.. collapse:: deeptemp-k16 - | + | - [`weights `_, `metadata `_] + ⬇️ `Weights `__ 📄 `Metadata `__ - Python code for predictions: + Python code for predictions: - .. literalinclude :: ../../src/examples/python/models/scripts/tempo/tempocnn/deeptemp-k16-3_predictions.py + .. literalinclude :: ../../src/examples/python/models/scripts/tempo/tempocnn/deeptemp-k16-3_predictions.py + +**References** + +.. 
list-table:: + :widths: auto + :header-rows: 0 + + * - 📄 `Paper `__ + - 💻 `GitHub `__ + +.. code-block:: bibtex + + @inproceedings{schreiber2018singlestep, + title={A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network}, + author={Schreiber, Hendrik and M{\"u}ller, Meinard}, + booktitle={International Society for Music Information Retrieval Conference (ISMIR)}, + year={2018} + }
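The class-to-BPM decoding implied by the TempoCNN description above can be sketched in plain NumPy. This is a minimal illustration, not Essentia's API: the `activations` vector below is synthetic, and the one-class-per-integer-BPM mapping over the half-open range [30, 286) is an assumption inferred from the class count stated above; in practice the bundled `*_predictions.py` scripts and Essentia's TempoCNN wrapper handle this decoding for you.

```python
import numpy as np

# Hypothetical 256-way activation vector, standing in for the output of a
# TempoCNN classification head (one unit per BPM class).
activations = np.zeros(256, dtype=np.float32)
activations[90] = 1.0  # pretend the network is most confident about class 90

# Assumed class-to-BPM mapping: 256 integer BPM values covering [30, 286).
bpm_values = np.arange(30, 286)

# The global tempo estimate is the BPM value of the most activated class.
global_tempo = int(bpm_values[np.argmax(activations)])  # 30 + 90 = 120 BPM
```

The same argmax-over-classes decoding applies to any of the ``deepsquare``/``deeptemp`` variants, since they differ only in filter shape and model size, not in the output layout.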