Extending Gentle Aligner
Week 2
Decoding & Word-Phoneme Alignment using voxforge_ru Russian ASR model
-
Preparing Kaldi input files: wav.scp, utt2spk, spk2utt.
wav.scp: utterance /path_to_wav_files utt2spk: utterance speaker spk2utt: [creating this file manually casuses issues later]
Use: utils/utt2spk.pl /path_to _wav.scp_and_utt2spk eg: ./utt2spk_to_spk2utt.pl data/
-
Decoding using steps/decode.sh script:
$cmd --num-threads $num_threads JOB=1:$nj $dir/log/decode.JOB.log \ gmm-latgen-faster$thread_string --max-active=$max_active --beam=$beam --lattice-beam=$lattice_beam \ --acoustic-scale=$acwt --allow-partial=true --word-symbol-table=$graphdir/words.txt $decode_extra_opts \ $model $graphdir/HCLG.fst "$feats" "ark:|gzip -c > $dir/lat.JOB.gz" "ark,t:$dir/words.JOB" "ark,t:$dir/alignments.JOB" || exit 1;
-
On getting the transcripts - time aligning them to words and phonemes
Phoneme Alignement:
./ali-to-phones --ctm-output ../../egs/recipes/voxforge_ru/exp/tri2a/final.mdl ark:../../egs/recipes/voxforge_ru/exp/tri2a/manual_transcript_decode/alignments.1 1.ctm .py script to map integers to phonetic symbols.
Word Alignment:
./steps/get_ctm.sh ./transcript ./data/lang ./exp/tri2a/manual_transcript_decode/
- Next step: To visualize the time-aligned transcript data. Week 3
- Prev: Running Voxforge_ru (Russian ASR model) recipe Week 1
- main page
Tools: Kaldi, Python, C, Bash Scripting
Link to GSoC Project Repository