• Re: How to check whether audio bytes contain empty noise or actual voic

    From Stefan Ram@21:1/5 to marc nicole on Sat Oct 26 11:16:13 2024
    marc nicole <mk1853387@gmail.com> wrote or quoted:
    I have a hard time finding a way to check whether audio data samples are containing empty noise or actual significant voice/noise.

    Or, you could have a human do a quick listen to some audio files to
    gauge the "empty-noise ratio," then use that number as the filename
    as a float, and finally train up a neural net on this. E.g.,

    0.99.wav # very empty
    0.992.wav # very empty file #2
    0.993.wav # very empty file #3

    0.00.wav # very not empty file
    0.002.wav # very not empty file #2

    One possible approach:

    import os
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.optimizers import Adam
    import librosa

    ## Data Preparation

    # Function to extract audio features (mean of 13 MFCCs over all frames)
    def extract_features(file_path):
        audio, sr = librosa.load(file_path)
        mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
        return np.mean(mfccs.T, axis=0)

    # Load data from directory
    directory = 'd' # for example
    X = []
    y = []

    for filename in os.listdir(directory):
        if filename.endswith('.wav'):
            file_path = os.path.join(directory, filename)
            X.append(extract_features(file_path))
            y.append(float(filename[:-4]))  # Assuming filename is the p value

    X = np.array(X)
    y = np.array(y)

    # Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Feature scaling
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    ## Neural Network Model

    model = Sequential([
        Dense(64, activation='relu', input_shape=(13,)),
        Dense(32, activation='relu'),
        Dense(1)
    ])

    model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')

    ## Training

    model.fit(X_train_scaled, y_train, epochs=100, batch_size=32, validation_split=0.2, verbose=1)

    ## Evaluation

    test_loss = model.evaluate(X_test_scaled, y_test, verbose=0)
    print(f"Test Loss: {test_loss}")

    ## Prediction Function

    def predict_p(audio_file):
        features = extract_features(audio_file)
        scaled_features = scaler.transform(features.reshape(1, -1))
        prediction = model.predict(scaled_features)
        return prediction[0][0]

    # Example usage
    new_audio_file = 'path/to/new/audio/file.wav'
    predicted_p = predict_p(new_audio_file)
    print(f"Predicted p value: {predicted_p}")

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MRAB@21:1/5 to marc nicole via Python-list on Sat Oct 26 16:35:47 2024
    On 2024-10-25 17:25, marc nicole via Python-list wrote:
    Hello Python fellows,

    I hope this question is not very far from the main topic of this list, but
    I have a hard time finding a way to check whether audio data samples are containing empty noise or actual significant voice/noise.

    I am using PyAudio to collect the sound through my PC mic as follows:

    import pyaudio

    FRAMES_PER_BUFFER = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 48000
    RECORD_SECONDS = 2

    audio = pyaudio.PyAudio()
    stream = audio.open(format=FORMAT,
                        channels=CHANNELS,
                        rate=RATE,
                        input=True,
                        frames_per_buffer=FRAMES_PER_BUFFER,
                        input_device_index=2)
    data = stream.read(FRAMES_PER_BUFFER)


    I want to know whether data contains voice signals or empty sound.
    Note that the variable always contains bytes (empty or sound) if I print it.

    Is there a straightforward "easy way" to check whether data is filled with empty noise or whether somebody has made noise or spoken?

    Thanks.

    If you do a spectral analysis and find peaks at certain frequencies,
    then there might be a "significant" sound.
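
    A minimal sketch of that idea, assuming `data` is a mono paInt16 buffer like the one in the question and that numpy is available. The function name, the 80-4000 Hz band, and the ratio of 20 are illustrative assumptions that would need tuning on real recordings:

```python
import numpy as np

def has_spectral_peak(data, rate=48000, band=(80.0, 4000.0), ratio=20.0):
    """Return True if some frequency bin inside `band` clearly stands out
    from the median bin power of the whole spectrum."""
    # Interpret the raw bytes as signed 16-bit samples (paInt16, mono)
    samples = np.frombuffer(data, dtype=np.int16).astype(np.float64)
    if samples.size == 0:
        return False
    samples -= samples.mean()                  # remove any DC offset
    window = np.hanning(samples.size)          # reduce spectral leakage
    power = np.abs(np.fft.rfft(samples * window)) ** 2
    freqs = np.fft.rfftfreq(samples.size, d=1.0 / rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    floor = np.median(power) + 1e-12           # rough noise-floor estimate
    return bool(power[in_band].max() > ratio * floor)
```

    With the setup from the question this could be called as has_spectral_peak(data, RATE). A single 1024-sample buffer at 48 kHz is only about 21 ms long, so checking several consecutive buffers is likely to be more reliable than checking one.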

  • From Thomas Passin@21:1/5 to marc nicole via Python-list on Sat Oct 26 12:07:10 2024
    It's not always so easy. The Fast Fourier Transform will be your
    friend. The most straightforward way would be to do an autocorrelation
    on the recorded interval, possibly with some pre-filtering to enhance
    the typical vocal frequency range. If the data is only noise, the
    autocorrelation will show a large value at lag 0 and only small,
    obviously noisy numbers everywhere else. There are practical aspects
    that make things less clear. For example, voices tend to be spiky and
    erratic, so you need to use short intervals to have a better chance of
    catching an interval with a good S/N ratio, but a short interval holds
    fewer samples, so the autocorrelation estimate itself gets noisier.
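
    As a rough sketch of the autocorrelation test, assuming the samples are already a numeric array (for example from np.frombuffer on the PyAudio bytes). The function name and the 75-400 Hz pitch range are assumptions made for illustration, not values from this thread:

```python
import numpy as np

def autocorr_peak(samples, rate=48000, fmin=75.0, fmax=400.0):
    """Largest normalized autocorrelation value at lags that correspond
    to periods between fmax and fmin Hz. Lag 0 is normalized to 1.0."""
    x = np.asarray(samples, dtype=np.float64)
    x = x - x.mean()
    energy = np.dot(x, x)
    if energy == 0.0:
        return 0.0                       # digital silence
    n = x.size
    # Autocorrelation via the FFT, zero-padded to avoid circular wrap-around
    nfft = 1 << (2 * n - 1).bit_length()
    spec = np.fft.rfft(x, nfft)
    ac = np.fft.irfft(spec * np.conj(spec), nfft)[:n] / energy
    lo = int(rate / fmax)                # shortest period of interest, in samples
    hi = min(int(rate / fmin), n - 1)    # longest period of interest
    if lo >= hi:
        return 0.0                       # interval too short for this range
    return float(ac[lo:hi + 1].max())
```

    Values near 0 suggest noise only, while a clearly periodic sound gives a value much closer to 1. Where to put the threshold is something to find by recording your own signal chain.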

    Human speech is produced with various statistical regularities and these
    can sometimes be detected with various means, including the autocorrelation.

    You also will need to test-record your entire signal chain because it
    might be producing artifacts that could fool some tests. And background
    sounds could fool some tests as well.

    Here are some Python libraries that could be very helpful:

    librosa (I have not worked with this but it sounds right on target);
    scipy.signal (I have used scipy but not specifically scipy.signal);
    python-speech-features (another I haven't used):
    https://python-speech-features.readthedocs.io/en/latest/

    Other people will know of others.

  • From Lars Liedtke@21:1/5 to All on Mon Oct 28 09:57:09 2024
    There are also the concepts of Cepstrum (https://en.wikipedia.org/wiki/Cepstrum) and Quefrency, which are derivatives of Spectrum and Frequency, with which you can even do speaker-recognition, but also detection of events.
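
    For illustration, a small sketch of the real cepstrum (the inverse FFT of the log magnitude spectrum) used as a voicing check. A peak at a quefrency inside the typical pitch-period range hints at a periodic, voice-like sound. The function name and the 75-400 Hz range are assumptions, not something taken from the linked article:

```python
import numpy as np

def cepstral_peak(samples, rate=48000, fmin=75.0, fmax=400.0):
    """Return (peak_height, pitch_hz) of the real cepstrum, searched over
    quefrencies that correspond to pitches between fmin and fmax."""
    x = np.asarray(samples, dtype=np.float64)
    x = (x - x.mean()) * np.hanning(x.size)
    # Real cepstrum: inverse FFT of the log magnitude spectrum
    log_mag = np.log(np.abs(np.fft.rfft(x)) + 1e-10)
    cep = np.fft.irfft(log_mag, n=x.size)
    lo = int(rate / fmax)                      # shortest pitch period in samples
    hi = min(int(rate / fmin), x.size // 2)    # longest pitch period in samples
    if lo >= hi:
        return 0.0, 0.0                        # interval too short to search
    q = lo + int(np.argmax(cep[lo:hi]))        # quefrency of the largest peak
    return float(cep[q]), rate / q
```

    The interval has to be long enough to cover the longest pitch period of interest, so at 48 kHz several PyAudio buffers would need to be joined before calling this.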


  • From Stefan Ram@21:1/5 to marc nicole on Fri Oct 25 16:43:11 2024
    marc nicole <mk1853387@gmail.com> wrote or quoted:
    I hope this question is not very far from the main topic of this list, but
    I have a hard time finding a way to check whether audio data samples are containing empty noise or actual significant voice/noise.

    The Spectral Flatness Measure (SFM), also called Wiener entropy, can
    separate the wheat from the chaff when it comes to how noise-like
    a signal is. This measure runs the gamut from 0 to 1, where:
    1 means you've hit pay dirt with perfect white noise (flat spectrum),
    0 is as pure as a Napa Valley Chardonnay (single frequency).
    (Everything in between is just different shades of gnarly.)

    import numpy as np
    from scipy.signal import welch

    def noiseness(signal, fs):
        # Compute the power spectral density
        f, psd = welch(signal, fs, nperseg=min(len(signal), 256))

        # Compute geometric mean of PSD
        geometric_mean = np.exp(np.mean(np.log(psd + 1e-10)))

        # Compute arithmetic mean of PSD
        arithmetic_mean = np.mean(psd)

        # Calculate Spectral Flatness Measure
        sfm = geometric_mean / arithmetic_mean

        return sfm

  • From Stefan Ram@21:1/5 to Stefan Ram on Fri Oct 25 17:00:14 2024
    ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
    marc nicole <mk1853387@gmail.com> wrote or quoted:
    I hope this question is not very far from the main topic of this list, but
    I have a hard time finding a way to check whether audio data samples are containing empty noise or actual significant voice/noise.

    The Spectral Flatness Measure (SFM), also called Wiener entropy, can
    separate the wheat from the chaff when it comes to how noise-like
    a signal is.

    You can also peep the envelope flatness (the flatness of the
    volume). If you've got some white noise that's not bringing much to
    the table, that envelope should be flatter than a pancake at IHOP.

    import librosa
    import numpy as np

    def measure_volume_flatness(audio_path, sr=None):
        # Load the audio file
        y, sr = librosa.load(audio_path, sr=sr)

        # Calculate the root mean square (RMS) energy for each frame
        frame_length = 2048
        hop_length = 512
        rms = librosa.feature.rms(y=y, frame_length=frame_length, hop_length=hop_length)[0]

        # Calculate the dynamic range
        db_range = librosa.amplitude_to_db(np.max(rms)) - librosa.amplitude_to_db(np.min(rms))

        # Normalize the dynamic range to a 0-1 scale
        # Assuming a maximum possible dynamic range of 120 dB
        flatness = 1 - (db_range / 120)

        return flatness
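
    The same envelope idea could also be tried directly on the bytes that stream.read() returns, without going through a file or librosa. This is only a sketch under the assumption of mono paInt16 buffers; the helper names and the -100 dB floor are made up for illustration:

```python
import numpy as np

def rms_dbfs(data):
    """RMS level of one paInt16 buffer in dB relative to full scale."""
    samples = np.frombuffer(data, dtype=np.int16).astype(np.float64) / 32768.0
    if samples.size == 0:
        return -100.0
    rms = np.sqrt(np.mean(samples ** 2))
    return 20.0 * np.log10(max(rms, 1e-5))   # clamp silence to -100 dB

def envelope_range_db(buffers):
    """Spread between the loudest and the quietest buffer, in dB.
    A small spread means a flat envelope, as with steady background noise."""
    levels = [rms_dbfs(b) for b in buffers]
    return max(levels) - min(levels)
```

    For example, collecting about two seconds of buffers from the stream and checking whether envelope_range_db(buffers) exceeds some threshold would flag recordings in which something louder than the background happened. The threshold itself depends on the microphone and the room.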
