python - How can I extract the audio embeddings (features) from Google’s AudioSet?
I'm talking about the audio features dataset available at https://research.google.com/audioset/download.html as a tar.gz archive consisting of frame-level audio tfrecords.
Extracting everything else from the tfrecord files works fine (I can extract the keys: video_id, start_time_seconds, end_time_seconds, labels), but the actual embeddings needed for training do not seem to be there at all. When I iterate over the contents of any tfrecord file from the dataset, only the four keys video_id, start_time_seconds, end_time_seconds, and labels are printed.
This is the code I'm using:
import tensorflow as tf
import numpy as np

def readTfRecordSamples(tfrecords_filename):
    record_iterator = tf.python_io.tf_record_iterator(path=tfrecords_filename)

    for string_record in record_iterator:
        example = tf.train.Example()
        example.ParseFromString(string_record)
        print(example)  # this prints the abovementioned 4 keys but not audio_embeddings

        # the first label can then be parsed like this:
        label = (example.features.feature['labels'].int64_list.value[0])
        print('label 1: ' + str(label))

        # this, however, does not work:
        #audio_embedding = (example.features.feature['audio_embedding'].bytes_list.value[0])

readTfRecordSamples('embeddings/01.tfrecord')
Is there any trick to extracting the 128-dimensional embeddings? Or are they really not in this dataset?
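Solved it: the tfrecord files need to be read as sequence examples, not as examples. The above code works if the line
example = tf.train.Example()
is replaced by
example = tf.train.SequenceExample()
With a SequenceExample, the four keys sit under example.context and the embeddings under example.feature_lists, so the label lookup has to read example.context.feature rather than example.features. The embeddings and all other content can then be viewed by running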
print(example)
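To go one step further than printing, here is a minimal sketch of pulling the embeddings out as NumPy arrays. It rests on a few assumptions that are not in the post above: TensorFlow 2.x with eager execution (so `tf.data.TFRecordDataset` is used in place of `tf.python_io.tf_record_iterator`), and each frame of `audio_embedding` being stored as a single byte string of 128 8-bit values. The function names `decode_embedding_frames` and `read_audioset_embeddings` are made up for this illustration.

```python
import numpy as np

EMBEDDING_SIZE = 128  # per the question: one 128-dimensional embedding per frame


def decode_embedding_frames(raw_frames):
    """Turn a list of raw byte strings into a (num_frames, 128) uint8 array.

    Assumes each frame is one byte string holding 8-bit values.
    """
    frames = [np.frombuffer(raw, dtype=np.uint8) for raw in raw_frames]
    if not frames:
        return np.empty((0, EMBEDDING_SIZE), dtype=np.uint8)
    return np.stack(frames)


def read_audioset_embeddings(tfrecords_filename):
    """Yield (video_id, labels, embeddings) for every record in one file."""
    # Imported here so the decoding helper above works without TensorFlow.
    # Assumption: TensorFlow 2.x, eager execution enabled.
    import tensorflow as tf

    for raw_record in tf.data.TFRecordDataset(tfrecords_filename):
        example = tf.train.SequenceExample()
        example.ParseFromString(raw_record.numpy())

        # Clip-level keys live in the context of a SequenceExample.
        context = example.context.feature
        video_id = context['video_id'].bytes_list.value[0].decode('utf-8')
        labels = list(context['labels'].int64_list.value)

        # Frame-level embeddings live in the feature lists, one entry per frame.
        frame_features = example.feature_lists.feature_list['audio_embedding'].feature
        raw_frames = [f.bytes_list.value[0] for f in frame_features]

        yield video_id, labels, decode_embedding_frames(raw_frames)
```

Looping over `read_audioset_embeddings('embeddings/01.tfrecord')` and printing `embeddings.shape` for each record should show one row per frame. On TensorFlow 1.x, the record iterator from the question would have to be used in place of `tf.data.TFRecordDataset`.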