#980839 RFP: rnnoise -- noise suppression library based on a recurrent neural network

Package:
wnpp
Source:
wnpp
Submitter:
Petter Reinholdtsen
Date:
2025-11-29 16:48:12 UTC
Severity:
wishlist
#980839#5
Date:
2021-01-22 23:31:23 UTC
* Package name    : rnnoise
  Version         : n/a (git repository only)
  Upstream Author : Jean-Marc Valin <jmvalin@jmvalin.ca>
* URL             : https://gitlab.xiph.org/xiph/rnnoise
* License         : BSD
  Programming Lang: C
  Description     : noise suppression library based on a recurrent neural network

This library is used by one of the VLC modules, see
<URL: http://git.videolan.org/?p=vlc.git;a=blob;f=modules/audio_filter/rnnoise.c;h=34229f0012487ffe00f932238b8d990e861e3b22;hb=HEAD#l77 >.

#980839#10
Date:
2021-01-22 23:55:56 UTC
I discovered in
<URL: https://github.com/xiph/rnnoise/issues/137 > that both Mumble and
OBS Studio can use rnnoise.  So three Debian packages would use it.

#980839#17
Date:
2021-02-01 08:19:03 UTC
It has been made clear in this Hacker News subthread that the RNNoise
model has been trained in part using proprietary data:

https://news.ycombinator.com/item?id=25978309

#980839#22
Date:
2021-02-01 17:54:57 UTC
respect to that thread and the Debian Machine Learning and Software
Freedom policy proposal:


https://salsa.debian.org/deeplearning-team/ml-policy/-/blob/master/ML-Policy.rst

There was some confusion because Mozilla collected public data
submissions for the project. According to author Jean-Marc Valin, these
data were *not used* to train the currently-published rnnoise model,
which was instead trained on other free and non-free data sets.

The crowdsourced data set was published under a CC0 license and is
available for further work, but it needs cleaning and characterization
before it can be directly useful for training.

Data download: https://media.xiph.org/rnnoise/
Original click-through license agreement presented to submitters:
https://web.archive.org/web/20171003052023/https://people.xiph.org/~jm/demo/rnnoise/donate.html

So, rnnoise falls under the "toxic candy" model classification in the
policy proposal.

It's good to have names for these situations, and definitely good to
ask for public data for training models, but I don't think it would be
reasonable to block packaging rnnoise based on this criterion.
Compression technologies, whether for voice, music, images, or video,
have all been tested and tuned against source data which is not all
publicly redistributable. For example, the codebooks of the speex
codec, part of Debian since 2002, were trained on some of the same
proprietary datasets as the default rnnoise model. Even the Linux
kernel is tuned using proprietary workloads.

Recent interest in machine learning has made better tools for model
training available, bringing us closer to applying the modification
aspect of software freedom to parameter sets used to configure
software. That's a step forward. Deciding that new models must reach a
higher bar than established code would be a step back.

#980839#27
Date:
2024-04-15 07:14:55 UTC
available data.  Time to see if rnnoise can go into Debian?