GSoC

Extending Gentle Aligner

Week 7

Decoding using a customized language model

Here, proto_dir is kaldi/egs/gentle/data, where gentle is the package cloned from git.

1. run initialize.py 
    python3 scripts/initialize.py proto_dir model_dir path_to_lexicon.txt

This script ensures that proto_dir/ and model_dir/ have the correct directory structure and files, and that lexicon.txt is in place.
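
The kind of check described above can be sketched roughly as follows. This is not the code of initialize.py: the helper name missing_paths and the entries in the usage note are hypothetical, and the real script may check different files.

```python
from pathlib import Path


def missing_paths(root, required):
    """Return the entries of `required` that do not exist under `root`.

    `required` is a list of relative paths supplied by the caller; which
    files Gentle actually needs is NOT encoded here (assumption).
    """
    root = Path(root)
    return [rel for rel in required if not (root / rel).exists()]
```

It could be called as missing_paths("proto_dir", ["lexicon.txt"]), where the list of required entries is a placeholder; an empty result means everything listed was found.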

2. run generateLM.py 
    python3 generateLM.py path_to_utterance proto_dir kaldi_path

This script makes sure that the correct language inputs, such as lexicon.txt and nonsilence_phones.txt, are created, prepares L.fst and
related files using Kaldi, and provides these inputs to Gentle for generating the decoding graph (HCLG.fst).
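
To illustrate how one of these inputs relates to another, here is a minimal sketch of deriving the contents of nonsilence_phones.txt from lexicon.txt. It assumes the usual Kaldi lexicon layout (a word followed by its phones on each line) and assumes SIL is the only silence phone; the function name is hypothetical and this is not taken from generateLM.py.

```python
def nonsilence_phones(lexicon_lines, silence_phones=("SIL",)):
    """Collect the sorted set of non-silence phones used in a lexicon.

    Each line is assumed to look like: WORD PHONE1 PHONE2 ...
    The default silence phone set ("SIL",) is an assumption.
    """
    silence = set(silence_phones)
    phones = set()
    for line in lexicon_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # fields[0] is the word; the rest are its phones
        phones.update(p for p in fields[1:] if p not in silence)
    return sorted(phones)
```

Writing one phone per line from the returned list gives a file in the shape Kaldi expects for nonsilence_phones.txt.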

3. run automateDecoding.py
    python3 scripts/automateDecoding.py kaldi_path audio_file utterance_path proto_dir 

This script extracts speech features from the audio input and generates GMM-based lattices using the customized decoding graph.
The best path through these lattices, i.e. the best word sequence for the utterance, is then time-aligned
using the nbest-to-ctm script, giving us the phone-level and word-level time alignments for the given utterance. This script also
produces JSON inputs for alignment visualization.
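
As a rough sketch of the last step, CTM lines can be turned into JSON-ready records like this. It assumes the standard CTM column order (utterance id, channel, start time, duration, label) and assumes labels have already been mapped from integer ids to symbols; the function name and the JSON field names are hypothetical, not the format automateDecoding.py emits.

```python
import json


def ctm_to_entries(ctm_lines):
    """Parse CTM lines into dicts with start and end times in seconds."""
    entries = []
    for line in ctm_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        utt, _channel, start, duration, label = fields[:5]
        start = float(start)
        end = round(start + float(duration), 3)  # end = start + duration
        entries.append(
            {"utterance": utt, "label": label, "start": start, "end": end}
        )
    return entries


if __name__ == "__main__":
    sample = ["utt1 1 0.50 0.25 hello", "utt1 1 0.75 0.50 world"]
    print(json.dumps(ctm_to_entries(sample), indent=2))
```

The same function works for both word-level and phone-level CTM files, since only the label column differs.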

Sample output looks like this:

Tools: Kaldi, Python, C, Bash Scripting

Link to GSoC Project Repository