Extending Gentle Aligner
Week 10
Customized German Language Model
Using the German tuda-de Kaldi recipe
- create lexicon.txt
- phoneGeneration.py
Change into kaldi/egs/gentle/german/ and run:
python3 ~/gentle/gentle/phoneGeneration.py proto_dir
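For reference, a Kaldi lexicon.txt holds one pronunciation per line: a word followed by its space-separated phones. Since prepare_lang.sh is later run with --phone-symbol-table, the phones in the lexicon need to agree with that phones.txt. The sketch below is not phoneGeneration.py; it is a hypothetical helper (function names and sample entries are invented for illustration) that reports lexicon phones missing from the table. It assumes that position-dependent phones in phones.txt carry the suffixes _B, _I, _E or _S, which it strips before comparing.

```python
import re


def read_phone_table(lines):
    """Collect base phone names from Kaldi phones.txt lines ("symbol id")."""
    phones = set()
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        symbol = parts[0]
        # Skip epsilon and disambiguation symbols such as #0, #1.
        if symbol == "<eps>" or symbol.startswith("#"):
            continue
        # Assumption: word-position suffixes are _B, _I, _E, _S.
        phones.add(re.sub(r"_[BIES]$", "", symbol))
    return phones


def find_unknown_phones(lexicon_lines, known_phones):
    """Return {word: [phones not in known_phones]} for lexicon.txt lines."""
    problems = {}
    for line in lexicon_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # blank line, or a word without a pronunciation
        word, phones = parts[0], parts[1:]
        missing = [p for p in phones if p not in known_phones]
        if missing:
            problems.setdefault(word, []).extend(missing)
    return problems


# Made-up sample data, only to show the call pattern.
sample_table = ["<eps> 0", "SIL 1", "h_B 2", "a_I 3", "l_I 4", "o_E 5", "#0 6"]
sample_lexicon = ["hallo h a l o", "welt v E l t", "<UNK> SIL"]
print(find_unknown_phones(sample_lexicon, read_phone_table(sample_table)))
```

On real data the two functions would be fed the open file handles for data/dict/lexicon.txt and the model's phones.txt; an empty result suggests the lexicon and the phone table are compatible.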
UPDATES [POST-GSOC]
- Built lm.arpa successfully for the sentence text in corpus.txt:
~/kaldi/tools/srilm/lm/bin/i686-m64/ngram-count -text corpus.txt -order 3 -limit-vocab -vocab data/dict/lexicon.txt -lm lm.arpa
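Because the command restricts the vocabulary to lexicon.txt, corpus words that are missing from the lexicon should not survive as words in lm.arpa (this is my reading of SRILM's -vocab handling and is worth checking against its documentation). A minimal sketch for listing such out-of-vocabulary tokens before running ngram-count follows; the function names and sample sentences are hypothetical, and it assumes whitespace tokenization with exact, case-sensitive matching.

```python
from collections import Counter


def load_vocab(lexicon_lines):
    """Take the first field of each lexicon.txt line as a vocabulary word."""
    return {line.split()[0] for line in lexicon_lines if line.strip()}


def count_oov(corpus_lines, vocab):
    """Count corpus tokens that are missing from the vocabulary."""
    oov = Counter()
    for line in corpus_lines:
        for token in line.split():
            if token not in vocab:
                oov[token] += 1
    return oov


# Made-up sample data, only to show the call pattern.
sample_vocab = load_vocab(["hallo h a l o", "welt v E l t"])
print(count_oov(["hallo welt", "hallo Welt und welt"], sample_vocab))
```

Tokens reported here can either be added to lexicon.txt with a pronunciation or accepted as unknown words.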
- Generated language files using utils/prepare_lang.sh successfully:
./utils/prepare_lang.sh --phone-symbol-table de_400k_nnet3chain_tdnn1f_2048_sp_bi/phones.txt german/data/dict '<UNK>' german/data/lang_test/ german/data/lang
- Building grammar:
gzip lm.arpa
./utils/format_lm.sh german/data/lang german/lm.arpa.gz german/data/dict/lexicon.txt german/G.fst
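Before handing the ARPA file to format_lm.sh, it can help to confirm that it actually contains 1-, 2- and 3-gram entries. An ARPA file begins with a \data\ header listing the number of n-grams per order; the sketch below reads just that header. The function name and the sample lines are hypothetical.

```python
import re


def read_arpa_counts(lines):
    """Return {order: count} parsed from the header of an ARPA file."""
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue
        match = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line.startswith("\\"):
            break  # a section marker such as \1-grams: ends the header
    return counts


# Made-up header, only to show the call pattern.
sample = ["", "\\data\\", "ngram 1=5", "ngram 2=7", "ngram 3=2", "", "\\1-grams:"]
print(read_arpa_counts(sample))
```

For the gzipped file, the lines can come from a handle opened with gzip.open("german/lm.arpa.gz", "rt"). An order with a count of zero, or a missing order, would suggest that corpus.txt was too small or that the vocabulary filter removed most of it.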
- Compiling HCLG.fst:
./utils/mkgraph.sh --self-loop-scale 1.0 german/G.fst german/model german/
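A quick way to confirm that graph compilation finished is to look for its output files. The sketch below is a hypothetical check, assuming that mkgraph.sh leaves HCLG.fst, words.txt and phones/silence.csl in its output directory (german/ in the command above); that file list is my assumption from reading the script and may differ between Kaldi versions.

```python
from pathlib import Path

# Assumption: files mkgraph.sh is expected to leave in its output directory.
EXPECTED_GRAPH_FILES = ("HCLG.fst", "words.txt", "phones/silence.csl")


def missing_graph_files(graph_dir, expected=EXPECTED_GRAPH_FILES):
    """List expected files that are absent or empty under graph_dir."""
    root = Path(graph_dir)
    missing = []
    for name in expected:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling missing_graph_files("german") after mkgraph.sh should return an empty list when the graph was written; anything else points to the step that needs rerunning.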
- Decoding using the newly compiled HCLG.fst and the existing final.mdl [TO DO]
- Getting time alignments and converting them into a visualization format [TO DO]
Tools: Kaldi, Python, C, Bash Scripting