SanitizeFunction
- class montreal_forced_aligner.tokenization.simple.SanitizeFunction(word_table, clitic_marker, clitic_cleanup_regex, clitic_quote_regex, punctuation_regex, word_break_regex, bracket_regex, bracket_sanitize_regex, cutoff_regex, ignore_case=True)
Bases: object
Callable class that sanitizes text and strips punctuation.
- Parameters:
punctuation (list[str]) – List of characters to treat as punctuation
compound_markers (list[str]) – Characters that mark compound words
brackets (list[tuple[str, str]]) – List of bracket sets to not strip from the ends of words
ignore_case (bool) – Flag for whether all items should be converted to lower case, defaults to True
quote_markers (list[str], optional) – Quotation markers to use when parsing text
word_break_markers (list[str], optional) – Word break markers to use when parsing text
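The behaviour described by the parameters above can be illustrated with a minimal, standalone sketch. This is not MFA's implementation: `make_sanitizer` and `sanitize` are hypothetical names, the sketch takes marker lists as in the Parameters list rather than the compiled regexes in the constructor signature, and it covers only punctuation stripping, bracket preservation and case folding.

```python
import re


def make_sanitizer(punctuation, brackets, ignore_case=True):
    """Build a word-level sanitizer (hypothetical helper, not part of MFA)."""
    # Escape each character so it is safe inside a regex character class.
    punct_class = "".join(re.escape(ch) for ch in punctuation)
    # Match runs of punctuation at the start or the end of a word only.
    edge_punct = re.compile(rf"^[{punct_class}]+|[{punct_class}]+$") if punct_class else None

    def sanitize(word):
        if ignore_case:
            word = word.lower()
        # Keep fully bracketed tokens such as "[laughter]" intact.
        for left, right in brackets:
            if word.startswith(left) and word.endswith(right):
                return word
        if edge_punct is None:
            return word
        return edge_punct.sub("", word)

    return sanitize


sanitize = make_sanitizer(
    punctuation=list(".,!?\"'"),
    brackets=[("[", "]"), ("<", ">")],
)
words = [sanitize(w) for w in 'Hello, "World!" [Laughter] don\'t'.split()]
print(words)
```

Because only the edges of each word are stripped, word-internal characters such as the apostrophe in "don't" survive, which is the kind of case clitic markers are meant to handle.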