GSoC

Extending Gentle Aligner

This project aims to extend the existing Gentle Aligner to other languages by integrating language-specific models into the Gentle Aligner tool.

Milestones [Next 4-8 weeks]

1. Running a Kaldi ASR recipe to generate an ASR model.
2. Providing test data to the generated model to obtain timing information for the test audio.
3. Using the transcript to generate a new language model.
4. Combining the language model with the previously used acoustic model and dictionary to obtain a new FST (decoding graph).
5. Using the new decoding graph in the Gentle tool to adapt Gentle to the new language.
6. Repeating steps 1-5 for another language.
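The language-model and FST milestones above revolve around the grammar FST G, which Kaldi composes with the lexicon (L), context-dependency (C) and HMM (H) transducers to form the decoding graph HCLG. The sketch below is a hypothetical illustration, not this project's code: it estimates unsmoothed maximum-likelihood bigrams from transcript sentences and writes them in OpenFst's text format, using costs of -ln P as Kaldi does for G.fst. The function names bigram_counts and bigram_fst_text and the example sentences are invented for this example.

```python
import math
from collections import Counter, defaultdict


def bigram_counts(sentences):
    """Count (previous word, next word) pairs, padding each sentence with <s> and </s>."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def bigram_fst_text(sentences):
    """Return OpenFst text-format lines for an unsmoothed bigram acceptor.

    Each state remembers the previous word; state 0 is the start state (<s>).
    Arc lines are "src dst ilabel olabel cost" with cost = -ln P(word | prev).
    P(</s> | prev) becomes a final-state line "state cost".
    """
    counts = bigram_counts(sentences)
    # Assign one state per history word, in order of first appearance.
    states = {"<s>": 0}
    for sentence in sentences:
        for word in sentence.split():
            states.setdefault(word, len(states))

    lines = []
    for prev, src in states.items():
        followers = counts[prev]
        total = sum(followers.values())
        for word, n in followers.items():
            cost = math.log(total / n)  # equals -ln(n / total), avoids printing -0.0
            if word == "</s>":
                lines.append(f"{src}\t{cost:.4f}")  # final weight of this state
            else:
                lines.append(f"{src}\t{states[word]}\t{word}\t{word}\t{cost:.4f}")
    return lines


if __name__ == "__main__":
    # Hypothetical transcript sentences standing in for real training text.
    print("\n".join(bigram_fst_text(["hello world", "hello there"])))
```

Saved to a file, output like this could be compiled with OpenFst, for example `fstcompile --isymbols=words.txt --osymbols=words.txt G.txt G.fst`, given a words.txt symbol table covering every word (file names here are hypothetical). Because it has no backoff, this G accepts only word sequences whose bigrams occur in the transcript; a general-purpose model would instead be trained with smoothing, stored in ARPA format and converted with Kaldi's arpa2fst.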

Week 1: Running the voxforge_ru recipe to generate an ASR model; figuring out how to extract timing information from the generated model
Week 2: Decoding & Word-Phoneme Alignment using the voxforge_ru ASR model
Week 3: Word & Phoneme Alignment Visualization
Week 4: Automating Decoding
Week 5: Groundwork for Generating a Language Model
Week 6: Generating a Language Model
Week 7: Decoding using a customized language model
Week 8: Walkthrough of the process: from installation to decoding
Week 9: Exploring German Language Models
Week 10: Trial: Customized German Language Model
Week 11: Putting everything together in a Singularity Container
Week 12: Summary
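Weeks 1 to 4 centre on pulling word timings out of decoded output. A common interchange format for this in Kaldi is CTM: one line per word giving the utterance id, channel, start time, duration, the word and an optional confidence, as written by scripts such as steps/get_ctm.sh. The sketch below assumes exactly that layout, with times in seconds; WordTiming, parse_ctm and the sample lines are hypothetical and not part of Gentle or Kaldi.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class WordTiming:
    word: str
    start: float  # seconds
    end: float    # seconds


def parse_ctm(lines: Iterable[str]) -> Dict[str, List[WordTiming]]:
    """Group CTM lines by utterance id, converting start + duration into start/end."""
    alignments: Dict[str, List[WordTiming]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        utt, _channel, start, duration, word = fields[:5]  # ignore optional confidence
        begin = float(start)
        # Round to milliseconds to hide floating-point noise in the sum.
        end = round(begin + float(duration), 3)
        alignments.setdefault(utt, []).append(WordTiming(word, begin, end))
    # CTM files are normally time-ordered already; sort to be safe.
    for words in alignments.values():
        words.sort(key=lambda w: w.start)
    return alignments


if __name__ == "__main__":
    sample = [
        "utt1 1 0.12 0.35 privet",
        "utt1 1 0.50 0.40 mir",
        "utt2 1 1.00 0.25 da",
    ]
    for utt, words in parse_ctm(sample).items():
        for w in words:
            print(f"{utt}\t{w.word}\t{w.start:.2f}\t{w.end:.2f}")
```

A real CTM file can be passed directly, since an open text file iterates over its lines, e.g. `parse_ctm(open(path, encoding="utf-8"))`.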

Tools: Kaldi, Python, C, Bash scripting


GSoC 2019 | Red Hen Lab

Google Summer of Code 2019 | Red Hen Lab | Robert Ochshorn | Shreya
