

In the fields of computational linguistics and applied linguistics, a morphological dictionary is a linguistic resource that contains correspondences between surface forms and lexical forms of words. Surface forms of words are those found in natural language text. The corresponding lexical form of a surface form is the lemma followed by grammatical information (for example the part of speech, gender and number). In English, give, gives, giving, gave and given are surface forms of the verb give. There are two kinds of morphological dictionaries: morpheme-aligned dictionaries and full-form (non-aligned) dictionaries.

Notable examples and formalisms

Universal Morphologies

Inspired by the success of the Universal Dependencies for cross-linguistic annotation of syntactic dependencies, similar efforts have emerged for morphology, e.g., UniMorph and UDer. These feature simple tab-separated formats with one form per row, together with its derivation (UDer) or inflection information (UniMorph). In UDer, additional information (part of speech) is encoded within the columns, which are BASE, DERIVED and RULE. At the time of writing (2021), all of these are non-aligned morphological dictionaries. Their simple format is particularly well suited to machine learning techniques, and UniMorph in particular has been the subject of numerous shared tasks.

Finite State Transducers (FSTs) are a popular technique for the computational handling of morphology, especially inflectional morphology. In rule-based morphological parsers, both lexicon and rules are normally formalized as finite state automata and subsequently combined. They thus require morphological dictionaries with specific processing instructions (which often have a linguistic interpretation but, technically, are treated like arbitrary string symbols). Popular FST packages such as SFST (as available from the fst package in Debian and Ubuntu) allow users to define application-specific file formats for morphological lexica that bundle different pieces of morphological information with every individual morpheme. These are thus aligned morphological dictionaries, but very rich (and also idiosyncratic) in structure. SMOR, a German SFST grammar, is one example.

Interlinear Glossed Text (IGT) is a popular formalism in language documentation, linguistic typology and other branches of linguistics and the philologies. Although IGT can be created with nothing more than a conventional editor, specialized software has been developed for it, with notable examples such as Toolbox, the FieldWorks Language Explorer (FLEx) and open source alternatives such as Xigt. Toolbox and FLEx support semi-automated annotation by means of an internal morphological dictionary. Whenever a morphological segment is encountered for which an annotation can be found in the dictionary, this annotation is applied. Whenever a morphological segment is newly annotated, the annotation is stored in the dictionary.

FLEx and Xigt are based on XML formats, whereas Toolbox uses a plain text format with idiosyncratic "markers". FLEx and Toolbox provide different editor functionalities for annotating text and editing dictionaries, so that additional information beyond that found in annotations can be added, but at their core, their formats provide aligned morphological dictionaries. FLEx and Toolbox are not directly interoperable with each other, but a semi-automated converter from Toolbox to FLEx does exist. Xigt comes with FLEx and Toolbox importers, but is less widely used than either FLEx or Toolbox. The formats of FLEx and Toolbox are not intended for human consumption, nor are they well supported by any processing software other than their native tools.

OntoLex-Morph: A community standard for morphological dictionaries

OntoLex is a community standard for machine-readable dictionaries on the web. In 2019, the OntoLex-Morph module was proposed to facilitate data modelling of morphology in lexicography, as well as to provide a data model for morphological dictionaries for Natural Language Processing. OntoLex-Morph supports both aligned and non-aligned morphological dictionaries. A specific goal is to establish interoperability between and among IGT dictionaries, FST lexicons and morphological dictionaries used for machine learning.
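
The correspondence between surface forms and lexical forms in a full-form (non-aligned) dictionary can be illustrated with a minimal Python sketch. It assumes a tab-separated layout with lemma, surface form and semicolon-separated features, in the spirit of UniMorph; the sample rows, the feature tags and the function names `load_full_form_dictionary` and `analyse` are hypothetical and chosen only for illustration, not taken from any released resource.

```python
from collections import defaultdict

# Illustrative rows only (assumed layout: LEMMA <tab> FORM <tab> FEATURES).
# The feature tags are made up for this sketch.
SAMPLE_TSV = (
    "give\tgives\tV;PRS;3;SG\n"
    "give\tgiving\tV;V.PTCP;PRS\n"
    "give\tgave\tV;PST\n"
    "give\tgiven\tV;V.PTCP;PST\n"
)


def load_full_form_dictionary(tsv_text):
    """Map each surface form to a list of lexical forms (lemma, features)."""
    dictionary = defaultdict(list)
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        lemma, form, features = line.split("\t")
        dictionary[form].append((lemma, tuple(features.split(";"))))
    return dict(dictionary)


def analyse(surface_form, dictionary):
    """Return all lexical forms recorded for a surface form (empty if unknown)."""
    return dictionary.get(surface_form, [])


full_form = load_full_form_dictionary(SAMPLE_TSV)
print(analyse("gave", full_form))  # [('give', ('V', 'PST'))]
```

Because each entry stores the whole word form without segmenting it into morphemes, this is a non-aligned dictionary; a morpheme-aligned dictionary would instead pair individual segments with their annotations.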
