GSoC

Extending Gentle Aligner

Week 2

Decoding & Word-Phoneme Alignment using voxforge_ru Russian ASR model

Preparing Kaldi input files: wav.scp, utt2spk, spk2utt.

wav.scp: utterance /path_to_wav_file

utt2spk: utterance speaker

spk2utt: speaker utterance1 utterance2 ... (creating this file manually causes issues later, so generate it from utt2spk with the script below)
```
  Use: utils/utt2spk_to_spk2utt.pl /path_to_utt2spk > /path_to_spk2utt
  eg:  ./utt2spk_to_spk2utt.pl data/utt2spk > data/spk2utt
```
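The three files can also be generated together from a single list of recordings, which keeps them consistent with each other. A minimal sketch in Python, assuming the recordings are available as (utterance ID, speaker ID, wav path) tuples; the function name `build_kaldi_files` and the example IDs are hypothetical, not part of Kaldi:

```python
from collections import defaultdict


def build_kaldi_files(records):
    """Build wav.scp, utt2spk and spk2utt lines.

    records: iterable of (utt_id, spk_id, wav_path) tuples (assumed input format).
    Returns three lists of text lines, one list per file.
    """
    # Kaldi expects these files to be sorted by their first field.
    records = sorted(records)
    wav_scp = [f"{utt} {wav}" for utt, _, wav in records]
    utt2spk = [f"{utt} {spk}" for utt, spk, _ in records]

    # Invert utt2spk: one line per speaker, followed by all of its utterances.
    spk_to_utts = defaultdict(list)
    for utt, spk, _ in records:
        spk_to_utts[spk].append(utt)
    spk2utt = [f"{spk} {' '.join(utts)}" for spk, utts in sorted(spk_to_utts.items())]
    return wav_scp, utt2spk, spk2utt


if __name__ == "__main__":
    # Hypothetical example data.
    demo = [
        ("spk1_utt2", "spk1", "/path/to/spk1_utt2.wav"),
        ("spk1_utt1", "spk1", "/path/to/spk1_utt1.wav"),
    ]
    for lines in build_kaldi_files(demo):
        print("\n".join(lines))
        print("---")
```

Using the speaker ID as a prefix of the utterance ID, as in the example data, helps keep the sort order of utt2spk and spk2utt consistent.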

Decoding using steps/decode.sh script:

  $cmd --num-threads $num_threads JOB=1:$nj $dir/log/decode.JOB.log \
  gmm-latgen-faster$thread_string --max-active=$max_active --beam=$beam --lattice-beam=$lattice_beam \
  --acoustic-scale=$acwt --allow-partial=true --word-symbol-table=$graphdir/words.txt $decode_extra_opts \
  $model $graphdir/HCLG.fst "$feats" "ark:|gzip -c > $dir/lat.JOB.gz" "ark,t:$dir/words.JOB" "ark,t:$dir/alignments.JOB" || exit 1;

Once the transcripts are obtained, they are time-aligned to words and phonemes.

Phoneme Alignment:

  ./ali-to-phones --ctm-output ../../egs/recipes/voxforge_ru/exp/tri2a/final.mdl ark:../../egs/recipes/voxforge_ru/exp/tri2a/manual_transcript_decode/alignments.1 1.ctm

A Python script then maps the integer phone IDs in 1.ctm to phonetic symbols.
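
The mapping step could look roughly like the sketch below. It is not the original script; it assumes the CTM lines have the form `<utt> <channel> <start> <duration> <phone-id>` and that the symbol table is a Kaldi-style phones.txt with one `<symbol> <integer-id>` pair per line. The function names are hypothetical.

```python
def load_symbol_table(lines):
    """Parse symbol-table lines of the form '<symbol> <integer-id>' into {id: symbol}."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            table[int(parts[1])] = parts[0]
    return table


def map_phone_ctm(ctm_lines, id2phone):
    """Replace the integer phone ID (5th CTM field) with its phonetic symbol."""
    mapped = []
    for line in ctm_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        utt, channel, start, dur, phone_id = fields[:5]
        mapped.append(f"{utt} {channel} {start} {dur} {id2phone[int(phone_id)]}")
    return mapped


if __name__ == "__main__":
    # Hypothetical symbol table and CTM content, for illustration only.
    phones = load_symbol_table(["<eps> 0", "SIL 1", "a 2"])
    for out in map_phone_ctm(["utt1 1 0.000 0.120 1", "utt1 1 0.120 0.080 2"], phones):
        print(out)
```

In a real run the two inputs would be read from files, for example `open("data/lang/phones.txt")` and `open("1.ctm")`, assuming phones.txt sits in the lang directory used by the recipe. Kaldi's `utils/int2sym.pl` with `-f 5` should give an equivalent result from the shell.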

Word Alignment:

  ./steps/get_ctm.sh ./transcript ./data/lang ./exp/tri2a/manual_transcript_decode/
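
With both a word CTM and a phone CTM available, the phones can be grouped under the word they fall in. A minimal sketch, assuming both files use the `<utt> <channel> <start> <duration> <label>` layout with times in seconds, and assuming a phone belongs to the word whose time span contains the phone's midpoint. That assignment rule and the function names are illustrative choices, not something the Kaldi scripts do.

```python
def parse_ctm(lines):
    """Parse CTM lines into (utt, start, duration, label) tuples."""
    entries = []
    for line in lines:
        f = line.split()
        if len(f) >= 5:
            entries.append((f[0], float(f[2]), float(f[3]), f[4]))
    return entries


def attach_phones_to_words(word_ctm_lines, phone_ctm_lines):
    """Group phones under the word whose interval contains the phone midpoint."""
    words = [
        {"utt": utt, "word": label, "start": start, "end": start + dur, "phones": []}
        for utt, start, dur, label in parse_ctm(word_ctm_lines)
    ]
    for utt, start, dur, phone in parse_ctm(phone_ctm_lines):
        mid = start + dur / 2.0
        for w in words:
            if w["utt"] == utt and w["start"] <= mid < w["end"]:
                w["phones"].append(phone)
                break  # phones outside every word (e.g. silence) are dropped
    return words


if __name__ == "__main__":
    # Hypothetical CTM content, for illustration only.
    word_ctm = ["utt1 1 0.10 0.30 privet"]
    phone_ctm = ["utt1 1 0.00 0.10 SIL", "utt1 1 0.10 0.15 p", "utt1 1 0.25 0.15 r"]
    for w in attach_phones_to_words(word_ctm, phone_ctm):
        print(w["word"], w["start"], w["end"], w["phones"])
```

The midpoint rule tolerates small boundary mismatches between the two CTM files, at the cost of ignoring phones that fall between words.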

Next step: visualizing the time-aligned transcript data (Week 3)

Previous: running the voxforge_ru (Russian ASR model) recipe (Week 1)

Tools: Kaldi, Python, C, Bash Scripting

Link to GSoC Project Repository

GSoC 2019 | Red Hen Lab

Google Summer of Code 2019 | Red Hen Lab | Robert Ochshorn | Shreya
