GSoC

Extending Gentle Aligner

Week 6

Generating a Language Model

Preparing for integration with Gentle Aligner
Acoustic model:

 * Pick final.mdl and tree for the langauge that you want to build langauge model for.
 * Place final.mdl inside data/tdnn_7b_chain_online/
 * Place tree inside data/tdnn_7b_chain_online/

Lexicon: * Place words.txt inside data/tdnn_7b_chain_online/graph_pp/words.txt

Did we get HCLG.fst?

run script generateLM.py with three arguments, 1. path to the utterance.txt 2. proto_dir: path to the data directory containing langdir and tdnn_7b_chain_online 3. path toa pristine copy of Kaldi

Ex: python3 generateLM.py path-to/utterance.txt path-to/data This script calls langauge_model functions of Gentle which generate a bigram grammar G.fst, and compiles L.fst, C.fst and G.fst into HCLG.fst.
```
  Inputs taken by Gentle are:
     langdir/L.fst
     langdir/L_disambig.fst
     langdir/phones/disambig.int
     tdnn_7b_chain_online/final.mdl
     tdnn_7b_chain_online/tree
     tdnn_7b_chain_online/graph_pp/words.txt
```
After this step you would have generated temp_HCLG.fst which gets saved as tdnn_7b_chain_online/graph_pp/HCLG.fst. This HCLG.fst will be used in the next step of decoding utterances and producing their word & phoneme level alignments.
Next: Decoding using an utterance-level-customized langauge model Week 7
Prev: Groundwork for Generating a Language Model Week 5
main page

Tools: Kaldi, Python, C, Bash Scripting

Link to GSoC Project Repository

GSoC 2019 | Red Hen Lab

Google Summer of Code 2019 | Red Hen Lab | Robert Ochshorn | Shreya

Extending Gentle Aligner

Week 6

Generating a Language Model