Extending Gentle Aligner
Week 6
Generating a Language Model
-
Preparing for integration with Gentle Aligner
Acoustic model:
* Pick final.mdl and tree for the language that you want to build a language model for.
* Place final.mdl inside data/tdnn_7b_chain_online/
* Place tree inside data/tdnn_7b_chain_online/

Lexicon:
* Place words.txt at data/tdnn_7b_chain_online/graph_pp/words.txt
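The placement steps above can be sketched in Python. This is only an illustration and not part of Gentle or of generateLM.py: the function name place_model_files and the src_dir argument (a folder assumed to already hold final.mdl, tree and words.txt for the target language) are hypothetical.

```python
import shutil
from pathlib import Path


def place_model_files(src_dir, data_dir):
    """Copy final.mdl, tree and words.txt into the layout described above.

    src_dir is a hypothetical folder holding the three files for the
    target language; data_dir is the Gentle data directory.
    """
    src = Path(src_dir)
    model_dir = Path(data_dir) / "tdnn_7b_chain_online"
    graph_dir = model_dir / "graph_pp"
    # Creating graph_pp with parents=True also creates tdnn_7b_chain_online.
    graph_dir.mkdir(parents=True, exist_ok=True)

    placed = []
    for name, dest_dir in (
        ("final.mdl", model_dir),
        ("tree", model_dir),
        ("words.txt", graph_dir),
    ):
        dest = dest_dir / name
        shutil.copy2(src / name, dest)  # raises FileNotFoundError if missing
        placed.append(dest)
    return placed
```

Calling place_model_files("path-to/my-language-model", "path-to/data") would return the three destination paths, or raise an error if one of the source files is absent.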
-
Did we get HCLG.fst?
Run the script generateLM.py with three arguments:
1. path to utterance.txt
2. proto_dir: path to the data directory containing langdir and tdnn_7b_chain_online
3. path to a pristine copy of Kaldi

Ex: python3 generateLM.py path-to/utterance.txt path-to/data

This script calls the language_model functions of Gentle, which generate a bigram grammar G.fst and compile L.fst, C.fst and G.fst into HCLG.fst.
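To make the idea of a bigram grammar concrete, here is a simplified sketch of which word-to-word transitions such a grammar would allow for one utterance. It is not Gentle's implementation and does not write an FST. The function name bigram_successors and the sentence boundary markers `<s>` and `</s>` are assumptions made for illustration.

```python
from collections import defaultdict


def bigram_successors(transcript):
    """Map each word to the set of words that directly follow it.

    A simplified illustration of a bigram grammar, not Gentle's code.
    "<s>" and "</s>" are assumed markers for utterance start and end.
    """
    successors = defaultdict(set)
    prev = "<s>"
    for word in transcript.split():
        successors[prev].add(word)
        prev = word
    successors[prev].add("</s>")  # the last word may end the utterance
    return dict(successors)


grammar = bigram_successors("the cat saw the dog")
# "the" may be followed by "cat" or "dog"; "dog" may only end the utterance.
```

A grammar built this way only permits word sequences whose adjacent pairs occur in the transcript, which is what makes it specific to one utterance.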
Inputs taken by Gentle are:
* langdir/L.fst
* langdir/L_disambig.fst
* langdir/phones/disambig.int
* tdnn_7b_chain_online/final.mdl
* tdnn_7b_chain_online/tree
* tdnn_7b_chain_online/graph_pp/words.txt
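Before running the script, it can help to confirm that every input listed above is present. The following is a minimal sketch, assuming the paths are relative to the data directory passed as proto_dir. The helper missing_inputs is hypothetical and is not part of Gentle or generateLM.py.

```python
from pathlib import Path

# Paths taken from the list of inputs above, relative to the data directory.
REQUIRED_INPUTS = (
    "langdir/L.fst",
    "langdir/L_disambig.fst",
    "langdir/phones/disambig.int",
    "tdnn_7b_chain_online/final.mdl",
    "tdnn_7b_chain_online/tree",
    "tdnn_7b_chain_online/graph_pp/words.txt",
)


def missing_inputs(data_dir):
    """Return the required input paths that do not exist under data_dir."""
    root = Path(data_dir)
    return [rel for rel in REQUIRED_INPUTS if not (root / rel).is_file()]
```

An empty list from missing_inputs("path-to/data") means all six files are in place. Otherwise the returned entries name what still needs to be copied.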
-
After this step you will have generated temp_HCLG.fst, which gets saved as tdnn_7b_chain_online/graph_pp/HCLG.fst. This HCLG.fst will be used in the next step of decoding utterances and producing their word- and phoneme-level alignments.

- Next: Decoding using an utterance-level-customized language model Week 7
- Prev: Groundwork for Generating a Language Model Week 5
- main page
Tools: Kaldi, Python, C, Bash Scripting
Link to GSoC Project Repository