Portuguese Sign Language Reference Corpus: An annotated signed corpus with diachronic foundation
The lack of structured collections of linguistic data representing natural sign language use limits linguistic understanding, language preservation, educational opportunities, accessibility, and technological advancement, and it also threatens the cultural heritage of the deaf community. A well-documented Portuguese Sign Language (LGP) corpus is needed to ensure that LGP thrives and continues to be recognized as a legitimate and fully developed language. To meet this need, we built the LGP Reference Corpus. In this article, we describe the construction of the first machine-readable digital reference corpus for LGP. It comprises 112 hours and 51 minutes of LGP recordings, with associated metadata, collected between 1992 and 2019, which we catalogued, digitized, and archived. We also present the annotation structure and conventions we developed to annotate this corpus at several linguistic levels (phonological, lexical, morphological, syntactic, and semantic). The corpus's diachronic foundation, together with its data on dialectal and social variation, will support future studies of LGP grammar and variation. Furthermore, the LGP Reference Corpus has served as the foundation for various linguistic resources. It was used to calculate sign frequency indices, which supported the inclusion of signs in the Fundamental LGP Vocabulary dictionary, and it aided the analysis and extraction of the grammatical rules implemented in the LGP Translator (M. Gonçalves et al., 2021).
New in the Journal of Portuguese Linguistics: "Portuguese Sign Language Reference Corpus: An annotated signed corpus with diachronic foundation" by Mara Moita et al.: doi.org/10.16995/jpl...
#PortugueseLinguistics #Linguistics