SanitizeFunction

class montreal_forced_aligner.tokenization.simple.SanitizeFunction(word_table, clitic_marker, clitic_cleanup_regex, clitic_quote_regex, punctuation_regex, word_break_regex, bracket_regex, bracket_sanitize_regex, ignore_case=True)

Bases: object

Class for functions that sanitize text and strip punctuation.

Parameters:
  • punctuation (list[str]) – List of characters to treat as punctuation

  • clitic_markers (list[str]) – Characters that mark clitics

  • compound_markers (list[str]) – Characters that mark compound words

  • brackets (list[tuple[str, str]]) – List of bracket sets to not strip from the ends of words

  • ignore_case (bool) – Flag for whether all items should be converted to lower case, defaults to True

  • quote_markers (list[str], optional) – Quotation markers to use when parsing text

  • word_break_markers (list[str], optional) – Word break markers to use when parsing text
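The roles of punctuation, brackets, and ignore_case can be illustrated with a minimal standalone sketch. This is not MFA's implementation and does not call SanitizeFunction: the helper names sanitize_word and sanitize_text are hypothetical, and the sketch assumes simple whitespace tokenization with punctuation stripped only from the ends of each word.

```python
def sanitize_word(word, punctuation, brackets, ignore_case=True):
    """Hypothetical helper illustrating the documented parameters (not MFA code)."""
    if ignore_case:
        # ignore_case: convert the item to lower case
        word = word.lower()
    # brackets: leave fully bracketed tokens such as "[laughter]" untouched
    for left, right in brackets:
        if word.startswith(left) and word.endswith(right):
            return word
    # punctuation: strip listed characters from both ends of the word only,
    # so word-internal characters (e.g. a clitic apostrophe) are kept
    return word.strip("".join(punctuation))


def sanitize_text(text, punctuation, brackets, ignore_case=True):
    """Split on whitespace (an assumption) and drop words that end up empty."""
    words = (
        sanitize_word(token, punctuation, brackets, ignore_case)
        for token in text.split()
    )
    return [word for word in words if word]


if __name__ == "__main__":
    print(
        sanitize_text(
            '"Hello, world!" [Laughter] don\'t',
            punctuation=list(',.!?"'),
            brackets=[("[", "]")],
        )
    )
```

The real class takes precompiled regular expressions (punctuation_regex, bracket_regex, and so on) rather than character lists, so treat the sketch only as an aid for reading the parameter descriptions.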