Apertiumpp: various utilities for writing Apertium project’s rule-based machine translators
Version 0.0


1 Installation

Here are instructions on how to install apertiumpp:

Later on, when there are any updates in the Git repository, you should run raco setup -p apertiumpp to compile the latest version.

Alternatively, if you don’t plan to hack on apertiumpp itself, you can install Racket and then run the following command in your terminal:

raco pkg install https://github.com/taruen/apertiumpp.git?path=apertiumpp

2 Reference

2.1 Glossary

 (require apertiumpp/glossary) package: apertiumpp

procedure

(explain tag lang) → (or/c string? exn:unk-sym? exn:no-desc?)

  tag : string?
  lang : symbol?
Return a description, in language "lang" (an ISO 639-3 code), of what the given Apertium symbol "tag" stands for.

Raise "exn:unk-sym" if the symbol is not in the glossary, and "exn:no-desc" if it is in the glossary but has no description in language "lang".

Examples:
> (explain "n" 'eng)

"Common noun"

> (explain "a-made-up-non-existing-tag" 'eng)

uncaught exception: #<exn:unk-sym>

> (explain "n" 'a-made-up-non-existing-lang)

uncaught exception: #<exn:no-desc>
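The lookup behaviour described above can be sketched in a few lines. This is not apertiumpp’s implementation: the two-entry glossary is made up, and the exception values are assumed to be plain (non-exn) structs, which is what the printed #<exn:unk-sym> output suggests. The helper explain/fallback is hypothetical; it shows how a caller might tell the two failure cases apart with with-handlers.

```racket
#lang racket/base
;; Sketch of the documented `explain` semantics, NOT apertiumpp's code.
;; Assumption: the exceptions are plain structs raised with `raise`.
(struct exn:unk-sym ())   ; the symbol is not in the glossary
(struct exn:no-desc ())   ; the symbol is known but not described in `lang`

;; Hypothetical glossary: tag -> (hash ISO-639-3 code -> description)
(define glossary
  (hash "n"  (hash 'eng "Common noun")
        "pl" (hash 'eng "Plural")))

(define (explain tag lang)
  (define descs (hash-ref glossary tag #f))
  (cond
    [(not descs) (raise (exn:unk-sym))]
    [(hash-ref descs lang #f) => values]
    [else (raise (exn:no-desc))]))

;; Hypothetical caller: turn each failure mode into a readable string.
(define (explain/fallback tag lang)
  (with-handlers ([exn:unk-sym? (lambda (e) (format "unknown tag: ~a" tag))]
                  [exn:no-desc? (lambda (e)
                                  (format "no ~a description for ~a" lang tag))])
    (explain tag lang)))

(explain/fallback "n" 'eng)    ; => "Common noun"
(explain/fallback "xyz" 'eng)  ; => "unknown tag: xyz"
(explain/fallback "n" 'tat)    ; => "no tat description for n"
```

Keeping the two exception types distinct lets a caller fall back to another language only when the tag itself is known.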

2.2 Dictionary

struct

(struct dictionary (lang alphabet sdefs pardefs sections attrs))

  lang : (listof symbol?)
  alphabet : string?
  sdefs : (listof sdef?)
  pardefs : (listof pardef?)
  sections : (listof section?)
  attrs : (hash/c symbol? string?)
interpretation: an Apertium dictionary (read: contents of a .dix or .lexc file).

value

D-0 : dictionary?

An example of an empty English dictionary:
(define D-0
  (dictionary
   '(eng)
   "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
   '() '() '() (hash)))
An empty English-Tatar bilingual dictionary:
(define D-B-0
  (dictionary
   '(eng tat)
   "" '() '() '() (hash)))

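The reference does not say which module exports dictionary, so the sketch below re-declares the struct with the documented fields so that D-0 and D-B-0 can be built on their own. The helpers bilingual? and empty-dictionary? are hypothetical. bilingual? relies only on the convention visible in D-B-0, where a bilingual dictionary lists two language codes.

```racket
#lang racket/base
;; Self-contained stand-in for apertiumpp's `dictionary` struct,
;; declared with the fields listed in the reference above.
(struct dictionary (lang alphabet sdefs pardefs sections attrs) #:transparent)

(define D-0
  (dictionary '(eng)
              "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
              '() '() '() (hash)))

(define D-B-0
  (dictionary '(eng tat) "" '() '() '() (hash)))

;; Hypothetical: a bilingual dictionary names two languages, as in D-B-0.
(define (bilingual? d)
  (= (length (dictionary-lang d)) 2))

;; Hypothetical: no symbol definitions, paradigms or sections yet.
(define (empty-dictionary? d)
  (and (null? (dictionary-sdefs d))
       (null? (dictionary-pardefs d))
       (null? (dictionary-sections d))))

(bilingual? D-0)           ; => #f
(bilingual? D-B-0)         ; => #t
(empty-dictionary? D-B-0)  ; => #t
```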
struct

(struct sdef (n attrs))

  n : string?
  attrs : (hash/c symbol? string?)
interpretation: a symbol definition.
  • sdef-n : grammatical symbol used in dictionary pardefs or entries.

  • sdef-attrs : user-defined attributes with name and value

value

SDEF-0 : sdef?

A noun symbol.

(define SDEF-0 (sdef "n" (hash 'c "Noun")))
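One way to picture an sdef is as the .dix <sdef> element it stands for, with its attrs becoming extra XML attributes, so that SDEF-0 corresponds to <sdef n="n" c="Noun"/>. The function sdef->xml below is a hypothetical sketch, not part of apertiumpp; attribute names are sorted so the output does not depend on hash ordering, and values are not XML-escaped.

```racket
#lang racket/base
;; Stand-in for apertiumpp's `sdef` struct (fields as documented).
(struct sdef (n attrs) #:transparent)

(define SDEF-0 (sdef "n" (hash 'c "Noun")))

;; Hypothetical serialiser: n first, then the other attributes in
;; sorted order. No XML escaping is done in this sketch.
(define (sdef->xml s)
  (define extra
    (for/list ([k (sort (hash-keys (sdef-attrs s)) symbol<?)])
      (format " ~a=\"~a\"" k (hash-ref (sdef-attrs s) k))))
  (format "<sdef n=\"~a\"~a/>" (sdef-n s) (apply string-append extra)))

(sdef->xml SDEF-0)  ; => "<sdef n=\"n\" c=\"Noun\"/>"
```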

struct

(struct pardef (n con attrs))

  n : string?
  con : (listof e?)
  attrs : (hash/c symbol? string?)
interpretation: a paradigm definition.

struct

(struct section (n type con attrs))

  n : string?
  type : (or/c STANDARD PREBLANK POSTBLANK INCONDITIONAL)
  con : (listof e?)
  attrs : (hash/c symbol? string?)
interpretation: a section of a dictionary, containing entries.
  • section-n : name/id of the section

  • section-type : A quote from the official documentation: “The value of the attribute type is used to express the kind of string tokenization applied in each dictionary section: the possible values of this attribute are: "standard", for almost all the forms of the dictionary (conditional mode), "preblank" and "postblank", for the forms that require an unconditional tokenisation and the placing of a blank (before and after, respectively), and "inconditional" for the rest of forms that require unconditional tokenization.”

  • section-con : content (entries)

  • section-attrs : any additional attributes
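The four section types correspond to the values of the .dix type attribute quoted above. The sketch below assumes, since the reference does not say, that STANDARD, PREBLANK, POSTBLANK and INCONDITIONAL are bound to Racket symbols; section-type->string and section-open-tag are hypothetical helpers that build the opening <section> tag, using section-n as the id.

```racket
#lang racket/base
;; Assumption: the section-type constants are plain symbols.
(define STANDARD 'standard)
(define PREBLANK 'preblank)
(define POSTBLANK 'postblank)
(define INCONDITIONAL 'inconditional)

;; Stand-in for apertiumpp's `section` struct (fields as documented).
(struct section (n type con attrs) #:transparent)

;; Hypothetical: the string used for the .dix `type` attribute.
(define (section-type->string t)
  (if (memq t (list STANDARD PREBLANK POSTBLANK INCONDITIONAL))
      (symbol->string t)
      (raise-argument-error 'section-type->string "section type" t)))

;; Hypothetical: opening tag of the section (entries are omitted).
(define (section-open-tag s)
  (format "<section id=\"~a\" type=\"~a\">"
          (section-n s) (section-type->string (section-type s))))

(section-open-tag (section "main" STANDARD '() (hash)))
; => "<section id=\"main\" type=\"standard\">"
```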

struct

(struct e (o re lm l r par attrs))

  o : (or/c #f NA NG)
  re : (or/c #f string?)
  lm : (or/c #f string?)
  l : string?
  r : string?
  par : (or/c #f string?)
  attrs : (hash/c symbol? any/c)
interpretation: an entry in a dictionary paradigm or section.
  • e-o : usage restriction, either NA (‘do Not Analyse’, only generate this form) or NG (‘do Not Generate’, only analyse this form) or #f (not set)

  • e-re : regular expression or #f (not set)

  • e-lm : lemma or #f (not set)

  • e-l : left/upper/lexical string

  • e-r : right/lower/surface string

  • e-par : name of the (inflection) paradigm or #f (not set)

  • e-attrs : any additional attributes
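The e-o restriction decides which entries an analyser or a generator should use: NA entries are skipped when analysing, NG entries when generating, and unrestricted entries are used in both directions. In this sketch NA and NG are assumed to be the symbols 'NA and 'NG, usable-for? and lemmas-for are hypothetical helpers, and the entries are made up (only their lemmas matter, so l and r are left empty).

```racket
#lang racket/base
;; Assumption: NA and NG are plain symbols.
(define NA 'NA)  ; do Not Analyse: entry used for generation only
(define NG 'NG)  ; do Not Generate: entry used for analysis only

;; Stand-in for apertiumpp's `e` struct (fields as documented).
(struct e (o re lm l r par attrs) #:transparent)

;; Hypothetical: mode is 'analysis or 'generation.
(define (usable-for? mode entry)
  (case mode
    [(analysis)   (not (eq? (e-o entry) NA))]
    [(generation) (not (eq? (e-o entry) NG))]
    [else (raise-argument-error 'usable-for? "(or/c 'analysis 'generation)" mode)]))

;; Made-up entries: one unrestricted, one NG, one NA.
(define entries
  (list (e #f #f "a" "" "" #f (hash))
        (e NG #f "b" "" "" #f (hash))
        (e NA #f "c" "" "" #f (hash))))

;; Lemmas of the entries that take part in the given direction.
(define (lemmas-for mode es)
  (for/list ([x es] #:when (usable-for? mode x))
    (e-lm x)))

(lemmas-for 'analysis entries)    ; => '("a" "b")
(lemmas-for 'generation entries)  ; => '("a" "c")
```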

procedure

(parse-dix bidix) → dictionary?

  bidix : string?
WISHLIST ITEM. Parse a .dix file and return a dictionary.

procedure

(parse-lexc lexc) → dictionary?

  lexc : string?
WISHLIST ITEM. Parse a .lexc file and return a dictionary.

procedure

(dictionary->dix d) → string?

  d : dictionary?
WISHLIST ITEM. Convert a dictionary into the contents of a .dix file.
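Since dictionary->dix is still a wishlist item, here is a deliberately partial sketch of what it could produce. It is not apertiumpp code: it only emits the <alphabet> and <sdefs> parts of a .dix file and omits pardefs, sections, sdef attributes other than n, and XML escaping. The structs are re-declared with the documented fields so the block runs on its own.

```racket
#lang racket/base
;; Stand-ins for apertiumpp's structs (fields as documented).
(struct dictionary (lang alphabet sdefs pardefs sections attrs) #:transparent)
(struct sdef (n attrs) #:transparent)

;; Partial, hypothetical serialiser: alphabet and sdef names only.
(define (dictionary->dix d)
  (string-append
   "<dictionary>\n"
   "  <alphabet>" (dictionary-alphabet d) "</alphabet>\n"
   "  <sdefs>\n"
   (apply string-append
          (for/list ([s (dictionary-sdefs d)])
            (string-append "    <sdef n=\"" (sdef-n s) "\"/>\n")))
   "  </sdefs>\n"
   "</dictionary>\n"))

(display
 (dictionary->dix
  (dictionary '(eng) "abc"
              (list (sdef "n" (hash 'c "Noun")) (sdef "pl" (hash)))
              '() '() (hash))))
```

The display call prints an <alphabet> element containing abc, followed by an <sdefs> block with one <sdef> line each for n and pl.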

procedure

(dictionary->lexc d) → string?

  d : dictionary?
WISHLIST ITEM. Convert a dictionary into the contents of a .lexc file.
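For the lexc direction, the sdefs map naturally onto a Multichar_Symbols block. The sketch below is a hypothetical, partial dictionary->lexc that produces only that block. Writing each tag as %<tag%> and turning the sdef’s c attribute into a ! comment follows common Apertium lexc practice, but treat both as assumptions here; lexicons built from pardefs and sections are not attempted.

```racket
#lang racket/base
;; Stand-ins for apertiumpp's structs (fields as documented).
(struct dictionary (lang alphabet sdefs pardefs sections attrs) #:transparent)
(struct sdef (n attrs) #:transparent)

;; Assumed convention: %<tag%>, optionally followed by "! comment".
(define (sdef->lexc s)
  (define comment (hash-ref (sdef-attrs s) 'c #f))
  (string-append "%<" (sdef-n s) "%>"
                 (if comment (string-append " ! " comment) "")
                 "\n"))

;; Partial, hypothetical converter: Multichar_Symbols only.
(define (dictionary->lexc d)
  (string-append "Multichar_Symbols\n\n"
                 (apply string-append (map sdef->lexc (dictionary-sdefs d)))))

(display
 (dictionary->lexc
  (dictionary '(eng) ""
              (list (sdef "n" (hash 'c "Noun")) (sdef "pl" (hash)))
              '() '() (hash))))
```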

2.3 Apertium-pkg

struct

(struct apkg (id
    gitattributes
    gitignore
    authors
    copying
    changelog
    news
    autogen.sh))
  id : string?
  gitattributes : string?
  gitignore : string?
  authors : string?
  copying : string?
  changelog : string?
  news : string?
  autogen.sh : string?
An abstract linguistic data package from the Apertium project.
  • id : by convention ISO 639-3 code of the language, e.g. kaz, or a pair of ISO 639-3 codes, e.g. kaz-tat
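Each string field of apkg appears to hold the contents of one top-level file of a package. The sketch below pairs every field with a file name and derives a directory name from id. The file names (.gitattributes, .gitignore, AUTHORS, COPYING, ChangeLog, NEWS, autogen.sh) and the apertium-<id> directory naming are assumptions inferred from the field names and Apertium’s repository naming, not something this reference specifies; apkg-files and apkg-directory are hypothetical helpers.

```racket
#lang racket/base
;; Stand-in for apertiumpp's `apkg` struct (fields as documented).
;; Note that `autogen.sh` is a valid Racket identifier.
(struct apkg (id gitattributes gitignore authors copying changelog news autogen.sh)
  #:transparent)

;; Hypothetical: (file-name . contents) pairs; names are assumptions.
(define (apkg-files p)
  (list (cons ".gitattributes" (apkg-gitattributes p))
        (cons ".gitignore"     (apkg-gitignore p))
        (cons "AUTHORS"        (apkg-authors p))
        (cons "COPYING"        (apkg-copying p))
        (cons "ChangeLog"      (apkg-changelog p))
        (cons "NEWS"           (apkg-news p))
        (cons "autogen.sh"     (apkg-autogen.sh p))))

;; Hypothetical: directory name built from the ISO 639-3 id(s).
(define (apkg-directory p)
  (string-append "apertium-" (apkg-id p)))

(define pkg (apkg "kaz-tat" "" "" "" "" "" "" "#!/bin/sh\n"))

(apkg-directory pkg)        ; => "apertium-kaz-tat"
(length (apkg-files pkg))   ; => 7
```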

value

APKG-COMMON

Content common to all Apertium linguistic data packages. Corresponds to any-module of apertium-init.