Apertiumpp: various utilities for writing rule-based machine translators for the Apertium project
1 Installation
Here are instructions on how to install apertiumpp:
Install the Racket language.
Clone this repository: git clone https://github.com/taruen/apertiumpp.
cd to the apertiumpp/apertiumpp directory.
Install the apertiumpp package: raco pkg install.
Later on, when there are any updates in the Git repository, you should run raco setup -p apertiumpp to compile the latest version.
Alternatively, if you don’t plan to hack on apertiumpp itself, you can install Racket and then run the following command in your terminal:
raco pkg install https://github.com/taruen/apertiumpp.git?path=apertiumpp
2 Reference
2.1 Glossary
(require apertiumpp/glossary)  package: apertiumpp
explain returns the description of a grammatical symbol in the language lang. It raises exn:unk-sym if the symbol is not in the glossary, and exn:no-desc if the symbol is in the glossary but has no description in lang.
> (explain "n" 'eng)
"Common noun"
> (explain "a-made-up-non-existing-tag" 'eng)
uncaught exception: #<exn:unk-sym>
> (explain "n" 'a-made-up-non-existing-lang)
uncaught exception: #<exn:no-desc>
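Callers that prefer a fallback value to an uncaught exception can wrap the lookup in with-handlers. The sketch below is self-contained and does not load apertiumpp: the glossary table, the two exception structs, and the helper explain-or are hypothetical stand-ins that only mirror the behaviour described above. With the real package, the handlers would presumably test for the package's own exception types instead.

```racket
#lang racket/base

;; Stand-in exception types mirroring the two failure modes described above.
;; (Hypothetical local definitions, not the ones exported by apertiumpp.)
(struct exn:unk-sym exn:fail ())
(struct exn:no-desc exn:fail ())

;; Stand-in glossary: symbol string -> (language code -> description).
(define GLOSSARY
  (hash "n" (hash 'eng "Common noun")))

;; Stand-in for explain: look up sym, then its description in lang.
(define (explain sym lang)
  (define descs
    (hash-ref GLOSSARY sym
              (lambda ()
                (raise (exn:unk-sym (format "unknown symbol: ~a" sym)
                                    (current-continuation-marks))))))
  (hash-ref descs lang
            (lambda ()
              (raise (exn:no-desc (format "no ~a description for ~a" lang sym)
                                  (current-continuation-marks))))))

;; Hypothetical helper: return the description, or fallback if either
;; of the two exceptions is raised.
(define (explain-or sym lang fallback)
  (with-handlers ([exn:unk-sym? (lambda (x) fallback)]
                  [exn:no-desc? (lambda (x) fallback)])
    (explain sym lang)))

(explain-or "n" 'eng "?")
(explain-or "a-made-up-non-existing-tag" 'eng "?")
(explain-or "n" 'a-made-up-non-existing-lang "?")
```

Handling the two exceptions separately, rather than catching exn:fail in general, keeps unrelated errors visible.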
2.2 Dictionary
(require apertiumpp/dictionary)  package: apertiumpp
struct
(struct dictionary (lang alphabet sdefs pardefs sections attrs))
lang : (listof symbol?)
alphabet : string?
sdefs : (listof sdef?)
pardefs : (listof pardef?)
sections : (listof section?)
attrs : (hash/c symbol? string?)
dictionary-lang : ISO 639-3 code(s) of the language(s)
dictionary-alphabet : relevant for a monolingual dictionary only
dictionary-sdefs : grammatical symbols used in dictionary entries
dictionary-pardefs : paradigm definitions
dictionary-sections : lists of dictionary entries
dictionary-attrs : user-defined attributes with name and value
value
(define D-0 (dictionary '(eng) "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" '() '() '() (hash)))
value
(define D-B-0 (dictionary '(eng tat) "" '() '() '() (hash)))
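Since dictionary is declared as a struct, its fields can be read with the accessors listed above and updated functionally with struct-copy. The sketch below redeclares the struct locally so that it runs without apertiumpp installed; the #:transparent option is an assumption made here for printing and equal?, and bilingual? is a hypothetical helper, not part of the package.

```racket
#lang racket/base

;; Local stand-in with the same field order as documented above.
(struct dictionary (lang alphabet sdefs pardefs sections attrs)
  #:transparent)

(define D-0
  (dictionary '(eng)
              "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
              '() '() '() (hash)))

;; Hypothetical helper: a dictionary with two language codes is bilingual.
(define (bilingual? d)
  (= 2 (length (dictionary-lang d))))

;; struct-copy builds a new dictionary; D-0 itself is left unchanged.
(define D-B-0
  (struct-copy dictionary D-0
               [lang '(eng tat)]
               [alphabet ""]))

(bilingual? D-0)
(bilingual? D-B-0)
(dictionary-alphabet D-B-0)
```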
sdef-n : grammatical symbol used in dictionary pardefs or entries.
sdef-attrs : user-defined attributes with name and value
pardef-n : the name of the paradigm
pardef-con : its content
pardef-attrs : any additional attributes
struct
(struct section (n type con attrs))
n : string?
type : (or/c STANDARD PREBLANK POSTBLANK INCONDITIONAL)
con : (listof e?)
attrs : (hash/c symbol? string?)
section-n : name/id of the section
section-type : A quote from the official documentation: “The value of the attribute type is used to express the kind of string tokenization applied in each dictionary section: the possible values of this attribute are: "standard", for almost all the forms of the dictionary (conditional mode), "preblank" and "postblank", for the forms that require an unconditional tokenisation and the placing of a blank (before and after, respectively), and "inconditional" for the rest of forms that require unconditional tokenization.”
section-con : content (entries)
section-attrs : any additional attributes
struct
(struct e (o re lm l r par attrs))
o : (or/c #f NA NG)
re : (or/c #f string?)
lm : (or/c #f string?)
l : string?
r : string?
par : (or/c #f string?)
attrs : (hash/c symbol? any/c)
e-o : usage restriction, either NA (‘do Not Analyse’, only generate this form) or NG (‘do Not Generate’, only analyse this form) or #f (not set)
e-re : regular expression or #f (not set)
e-lm : lemma or #f (not set)
e-l : left/upper/lexical string
e-r : right/lower/surface string
e-par : name of the (inflection) paradigm or #f (not set)
e-attrs : any additional attributes
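The o field decides in which direction an entry may be used. As a sketch of how a caller might act on it, the code below redeclares section and e locally so that it runs without apertiumpp, and represents NA, NG and STANDARD as plain symbols, which is an assumption about how the package encodes them. The helpers generatable-entries and analysable-entries are hypothetical, and the entries themselves are made-up examples.

```racket
#lang racket/base

;; Local stand-ins with the same field order as documented above.
(struct section (n type con attrs) #:transparent)
(struct e (o re lm l r par attrs) #:transparent)

(define main-section
  (section "main" 'STANDARD
           (list (e #f  #f "house"  "house"  "house"  "house__n" (hash))
                 (e 'NG #f "colour" "colour" "color"  #f         (hash))
                 (e 'NA #f "e-mail" "e-mail" "e-mail" #f         (hash)))
           (hash)))

;; Entries usable for generation: everything not marked NG.
(define (generatable-entries s)
  (filter (lambda (entry) (not (eq? (e-o entry) 'NG)))
          (section-con s)))

;; Entries usable for analysis: everything not marked NA.
(define (analysable-entries s)
  (filter (lambda (entry) (not (eq? (e-o entry) 'NA)))
          (section-con s)))

(map e-lm (generatable-entries main-section))
(map e-lm (analysable-entries main-section))
```

Entries with o set to #f pass both filters, which matches the description of #f as "not set".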
procedure
(parse-dix bidix) → dictionary?
bidix : string?
procedure
(parse-lexc lexc) → dictionary?
lexc : string?
procedure
(dictionary->dix d) → string?
d : dictionary?
procedure
(dictionary->lexc d) → string?
d : dictionary?
2.3 Apertium-pkg
(require apertiumpp/apertium-pkg)  package: apertiumpp
struct
(struct apkg (id gitattributes gitignore authors copying changelog news autogen.sh))
id : string?
gitattributes : string?
gitignore : string?
authors : string?
copying : string?
changelog : string?
news : string?
autogen.sh : string?
id : by convention ISO 639-3 code of the language, e.g. kaz, or a pair of ISO 639-3 codes, e.g. kaz-tat
value