Methodology


Transcription

The linguistic material is represented graphically in two ways in order to meet the opposing demands of faithfulness to the sources and easy comparability:

(1) Input version in original transcription
The VA portal brings together sources that come from the traditions of different disciplines (Romance studies, German studies, Slavonic studies) and that represent different historical stages of dialectological research. Some of the dictionary data were collected at the beginning of the last century (GPSR), others only a few years ago (ALD). For reasons of the history of science it is therefore necessary to respect the original transcription to the greatest possible extent. For technical reasons, however, certain conventions cannot be kept unchanged. This is especially true of the vertical combination of base characters ('letters') and diacritical marks, for example when a symbol for stress accent is positioned over a symbol for length, over a vowel, over a symbol for closure. These conventions are converted into linear sequences of characters in technical transcriptions that are defined individually in each case and use ASCII characters exclusively (so-called Beta code). The Beta encoding makes the most of graphic resemblances between the original diacritic and its ASCII equivalent; these are, to a certain degree, intuitively understandable and therefore easy to memorise.

(2) Output version in IPA
Outputting the data in a uniform transcription is desirable for reasons of comparability and user-friendliness. Therefore, all Beta codes are converted to IPA characters by means of specific substitution routines. A few incompatibilities are inevitable where two different base characters in IPA correspond to one base character that is specified by diacritics in the input transcription. This is especially the case for the degrees of vowel height: in the palatal row, the two base characters <i> and <e>, combined with the diacritics closure dot and one or two opening ticks, allow six degrees of vowel height to be depicted. In Beta encoding these vowels are the following: i – i( – i(( – e? – e – e( – e((. In IPA, there are only four base characters for these vowels: i – ɪ – e – ɛ.

(auct. Thomas Krefeld – trad. Susanne Oberholzer)

Tags: Linguistics Information technology



Transcription rules

We distinguish between base characters and diacritics.

Base characters are located on the baseline. All characters that are not on the baseline are considered diacritics. Purely typographic variations of a base character are also treated as diacritics in the broader sense, e.g. if the base character is displayed smaller than the others.

Base characters


Base characters that exist in the ASCII table are retained (= all Latin characters; not German umlauts!). All other base characters are transcribed by a combination of a letter and a numeral (see table).

Diacritics

Diacritics are always placed after the base character to which they are assigned. If there are several diacritics on one base character, the following sequence must be observed:

  • First, diacritics that mark a typographic variation of a base character are written, e.g. if the base character is set higher or lower. These diacritics are shown in yellow in the table.
  • Then diacritics below and above the base character are written down from bottom to top. In particular, the diacritics below a base character (marked in green) must always come before those above a base character (marked in blue).
  • Finally, diacritics that come after the base character are written, e.g. a length sign or an apostrophe after a base character. These are marked in orange in the table.
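The ordering rule in the list above can be sketched in Python as follows. This is only an illustration: the function name, the category names and the assignment of codes to categories are assumptions read off the descriptions in the diacritics table, and only a small subset of codes is covered.

```python
# Category ranks follow the order given in the list above:
# typographic variation, below, above, after the base character.
CATEGORY_RANK = {"typographic": 0, "below": 1, "above": 2, "after": 3}

# Hypothetical subset; the categories are inferred from the table descriptions.
DIACRITIC_CATEGORY = {
    "0": "typographic", "8": "typographic", "9": "typographic",
    "?": "below", "(": "below", "_": "below", "+": "below",
    "?1": "above", ":": "above", "/": "above", "\\": "above",
    "-": "above", "~": "above", "^": "above",
    "'": "after", "=": "after", ":2": "after",
}


def attach_diacritics(base: str, diacritics: list[str]) -> str:
    """Return the base character followed by its diacritics in rule order.

    sorted() is stable, so diacritics of the same category keep the order
    in which the caller supplied them (expected: bottom to top).
    """
    ordered = sorted(
        diacritics, key=lambda d: CATEGORY_RANK[DIACRITIC_CATEGORY[d]]
    )
    return base + "".join(ordered)
```

For instance, an e carrying an acute and a tick underneath is written with the lower diacritic first: `attach_diacritics("e", ["/", "("])` gives `e(/`, which matches the sequence used in the placeholder examples.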

Each character used for the transcription of a diacritic may only occur once per base character. There are special rules for the repetition of the same diacritic, e.g. : for two points above a base character or \2 for a double grave accent.
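A check for the "only once per base character" rule might look like the sketch below. The grammar is a simplifying assumption, not the project's validator: a base character is taken to be a lowercase letter plus an optional digit 1-7, a diacritic is either one of the typographic digits 0, 8, 9 or a symbol plus an optional digit 1-3, and masked characters, brackets and blanks are not handled.

```python
import re

# Symbols used as diacritic codes (taken from the diacritics table).
SYMBOLS = re.escape("?:()/\\-_~+!%@|&$^\">*'=")
DIACRITIC = rf"[089]|[{SYMBOLS}][1-3]?"
UNIT = re.compile(rf"([a-z][1-7]?)((?:{DIACRITIC})*)")


def split_units(beta: str) -> list[tuple[str, list[str]]]:
    """Split one lowercase word into (base character, [diacritic codes])."""
    units = []
    pos = 0
    while pos < len(beta):
        m = UNIT.match(beta, pos)
        if m is None:
            raise ValueError(f"cannot parse {beta!r} at position {pos}")
        units.append((m.group(1), re.findall(DIACRITIC, m.group(2))))
        pos = m.end()
    return units


def repeated_symbols(beta: str) -> list[tuple[str, str]]:
    """Return (base, symbol) pairs where a diacritic symbol occurs twice."""
    problems = []
    for base, diacritics in split_units(beta):
        symbols = [d[0] for d in diacritics]
        for s in sorted(set(symbols)):
            if symbols.count(s) > 1:
                problems.append((base, s))
    return problems
```

With this sketch, an a followed by two backslashes is reported as a repetition, whereas the dedicated code `\2` for the double grave accent passes.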

If a diacritic refers to two or more characters, e.g. a͠e, the base characters are placed in square brackets, in this case [ae]~.

Brackets and Comments


Comments (whether in brackets or not) are placed in angle brackets after the attestation to which they refer, e.g. (m.) → <m.>. If the entire attestation is bracketed, the attestation is transcribed without brackets and the remark "in brackets" is added in angle brackets.
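The two bracket cases can be illustrated with a short Python sketch. It is an assumption-laden simplification: the function name is hypothetical, only a remark in round brackets at the end of the attestation is recognised, and comments that are not bracketed in the source still have to be identified by hand.

```python
import re


def move_remark(attestation: str) -> str:
    """Rewrite a trailing bracketed remark as an angle-bracket comment."""
    m = re.fullmatch(r"(.*?)\s*\(([^()]*)\)\s*", attestation)
    if m is None:
        # No trailing round brackets: nothing to do.
        return attestation
    form, inner = m.group(1), m.group(2)
    if not form:
        # The whole attestation was bracketed.
        return f"{inner} <in brackets>"
    return f"{form} <{inner}>"
```

The input is assumed to be a single attestation, before conversion of the phonetic characters to Beta code.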

Separators


Possible morphosyntactic variants of an attestation such as singular and plural forms are separated by commas, different word forms are separated by semicolons. This corresponds to the representation of the attestations in the atlas AIS. If the attestations are separated by other separators (e.g. / or -) in the source, these must be replaced accordingly by commas and semicolons in the transcription. Any numbering of different variants is omitted.
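Read back programmatically, the two separators yield a two-level structure. The following Python sketch shows one way to split a transcribed field; the function name is hypothetical, and the sketch assumes that commas and semicolons do not occur inside angle-bracket comments.

```python
def parse_attestations(transcription: str) -> list[list[str]]:
    """Split on ';' into word forms, then on ',' into variants."""
    forms = []
    for form in transcription.split(";"):
        variants = [v.strip() for v in form.split(",") if v.strip()]
        if variants:
            forms.append(variants)
    return forms
```

Each inner list then holds the morphosyntactic variants of one word form, in source order.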

Typified attestations


If the source contains both a single attestation and an already typified variant for an informant, only the single attestation is transcribed. The typified variant is transcribed only if this is not possible; it is then marked as "phonetic type" or "morpho-lexical type" via the corresponding selection menu. Unlike single attestations, types may also contain capital letters; otherwise the same transcription rules apply.

If both single attestations and typified attestations exist for an informant, two separate lines must be created for the transcription, in which the transcripts are marked accordingly as type or as attestation.

Special characters in the source


All characters used as diacritics in the transcription (including numerals) must be masked by prefixing them with two backslashes, e.g. * → \\*, if they appear as original characters. This only applies to characters that are part of the phonetic transcription of the single attestation in the source. For characters that have a certain meaning, this meaning must instead be written as a remark in angle brackets after the attestation. For example, the character † stands for an obsolete form in the AIS and must be marked with a corresponding remark in angle brackets in the transcription. Brackets are always replaced (see brackets and comments and placeholders).
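The masking step could be sketched in Python as below. The set of reserved characters is an assumption compiled from the diacritics table plus the numerals, and the function name is hypothetical; the sketch simply follows the rule as stated, with two backslashes in front of every reserved character.

```python
# Characters that double as diacritic codes in the Beta code
# (assumed subset: symbols from the diacritics table, brackets, numerals).
RESERVED = set("?:()/\\-_~+!%@|&$^\">*'=[]{}0123456789")


def mask_literal(text: str) -> str:
    """Prefix every reserved character with two backslashes."""
    return "".join("\\\\" + ch if ch in RESERVED else ch for ch in text)
```

The function is meant for characters that belong to the phonetic transcription itself; symbols with a fixed meaning in the source, such as †, are turned into remarks rather than masked.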

The following characters from the AIS can simply be omitted: ℗, ○, P, S, +

Placeholders


All forms of placeholders or shortened spellings must be replaced by the character string they represent. If an attestation with comments is split into multiple attestations, the comments must be repeated for each of them. The following table gives some examples:

Attestation from the source	Transcription
u kā́ni; i ~	u ka-/ni; i ka-/ni
(Alm)hütte	Almhu:tte; Hu:tte
(um bé̜l) pašọ́ɳ (selten)	um be(/l pas^o?/n1 <selten>; pas^o?/n1 <selten>


There is an exception for small phonetic variations in already typified attestations, e.g. the morpho-lexical type "Sänn(e)hütte" can be transcribed as "Sa:nn\(e\)hu:tte".
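The expansion of an optional bracketed first element, as in the (Alm)hütte row of the table, might be sketched in Python as follows. The function name is hypothetical, comments are not handled, and the capitalisation of the short form is an assumption inferred from that single example.

```python
import re


def expand_optional_prefix(attestation: str) -> list[str]:
    """Return the long and the short reading of '(prefix)rest'."""
    m = re.fullmatch(r"\(([^()]+)\)(\s*)(.+)", attestation)
    if m is None:
        # No leading optional element: keep the attestation as it is.
        return [attestation]
    prefix, gap, rest = m.groups()
    full = prefix + (" " if gap else "") + rest
    # Assumption: a capitalised prefix passes its capital on to the short form.
    short = rest[0].upper() + rest[1:] if prefix[0].isupper() else rest
    return [full, short]
```

The two readings would then be transcribed separately and joined with a semicolon.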


Transcription not possible


The "vacat" button is used for informants for whom no data is entered in a map. If the transcription of an attestation is problematic (e.g. because it is not possible or unclear according to these rules), the "problem" button is used.

Transcription preview


When you enter the transcription, a preview of how the attestation will look after the reconversion is displayed behind the corresponding text field for comparison purposes. If the text "Not valid" appears, the attestation is transcribed incorrectly and cannot be entered. If individual characters appear highlighted in red in the beta code, this means that the attestation is valid, but the character cannot yet be converted. This is mainly the case with characters that have not yet occurred in this form. In this case, the attestation can be entered as usual.

Base characters


Character	Description	Beta code	Comment
α	Greek alpha	a1
ɒ	mirror-inverted a	a2
æ	ligature ae	a3
β	Greek beta	b1
ƀ	crossed out b	b2
χ	Greek chi	c1
ҁ	sign for glottis closure	c2
c	crossed out c	c3
ɕ		c4
δ	Greek delta	d1
đ	crossed out d	d2
ð	eth	d3
ə	schwa	e1
	tick to the left of the e	e2
ε	Greek epsilon	e3
φ	Greek phi	f1
ƒ	labiodental fortis	f2
ɣ	Greek gamma	g1
	open g on the right	g2
	g with bottom line	g3
ʔ	glottal stop	g4
ɥ		h1
	i with slanted line	i1
ı	i without dot	i2
ɨ	i with horizontal line	i3
ɪ		i4
ɟ		j1
ł	crossed out l	l1
	l with strongly curved line	l2
	l with two curved lines	l3
λ	Greek lambda	l4
ʎ		l5
ɱ		m1
ɳ	sign for velar "n" (German: kling)	n1
ŋ	velar nasal	n2
ɲ		n3
œ	ligature oe	o1
ɔ	open o on the left	o2
ơ	o with tick at the upper right margin	o3
ǫ	o with ogonek	o4
ø	o with diagonal line	o5
ω	Greek omega	o6
π	the number pi	p1
þ	thorn	p2
	q with horizontal line	q1
ʀ	upper case letter R at the height of a lower case letter	r1
ɹ		r2
ɾ		r3
ʃ	esh	s1
	s with diagonal stroke left	s2
ʂ		s3
ϑ	Greek theta	t1
	stronger curved u	u1
ʊ		u2
ʒ	ezh	z1
ʑ		z2
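For illustration, the letter-plus-numeral codes of the table can be resolved back to their characters with a simple lookup. The Python sketch below contains only a handful of rows from the table; the function name is hypothetical, and masked numerals or the typographic digits are not treated specially.

```python
import re

# Small excerpt of the base character table (Beta code -> character).
BASE_CODES = {
    "a1": "α", "a3": "æ", "b1": "β", "d3": "ð", "e1": "ə", "e3": "ε",
    "g1": "ɣ", "n2": "ŋ", "o1": "œ", "o5": "ø", "p2": "þ", "s1": "ʃ",
    "z1": "ʒ",
}


def decode_base_characters(beta: str) -> str:
    """Replace every known letter+numeral code; leave everything else."""
    def lookup(match: re.Match) -> str:
        code = match.group(0)
        return BASE_CODES.get(code, code)

    return re.sub(r"[a-z][1-9]", lookup, beta)
```

Codes that are missing from the excerpt are left unchanged, so the sketch never drops input.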

Diacritics


Character	Description	Beta code	Comment	Example
	dot under base character	?		s?
ė	dot above base character	?1		e?1
ä	two dots above base character	:		a:
	two dots under base character	:1		u:1
	tick open to the right under base character	(		o(
	two ticks open to the right under base character	(1		e(1
	semicircle open to the left (spiritus lenis) above base character	)		r)
	semicircle open to the left under base character	)1		o)1
ç	cedilla	)2		c(2
ó	acute on base character	/		o/
	double acute on base character	/2		o/2
à	gravis on base character	\		a\
	double gravis on base character	\2		a\2
	gravis with dot at the upper end on base character	\3		u\3
ā	horizontal line above base character	-	minus sign -	a-
ā̄	two horizontal lines above base character	-2	minus sign -	a-2
	horizontal line under base character	_	underscore _	n_
	double horizontal line under base character	_1		n_1
	tilde ABOVE base character	~		e~
	stronger curved tilde ABOVE base character	~1
	tilde UNDER base character	+		e+
	semicircle opened to the TOP ABOVE base character	!		a!
	semicircle opened to the BOTTOM ABOVE base character	%		a%
	semicircle opened to the BOTTOM UNDER base character	@		a@
	semicircle opened to the TOP UNDER base character	@1		k@1
	circle ABOVE base character	|		u|
	circle UNDER base character	&		s&
	vertical line under base character	$		e$
	hacek	^		g^
ĝ	circumflex	^1		g^1
	"circumflex" under base character	^2		o^2
	"hacek" under base character	^3		d^3
u	infinity symbol above base character	"		u"
	"greater-than symbol" above base character	>		n>
	cross under base character	*		a*
	cross above base character	*1		a*1
g’	apostrophe after base character	'	on the #-key	g'
	inverted apostrophe after base character	'1	on the #-key	a'1
	elevated vertical line after base character	'2	on the #-key	g'2
	tick after base character	=		k=
	superscript number after base character	\<n>0	mask number with \ and put 0 after it	c\20
	IPA length character	:2		a:2
	half IPA length character	:3		a:3
ᵃb	base character above the baseline	0		a0b
	base character on the baseline, smaller than all other characters	8		n8d
ᵢn	base character below the baseline	9		i9n
	upper or lower diacritics in brackets	[<d>]	diacritic in brackets between square brackets	u[:] or e[?]
	base character above base character	{<z>}	elevated base character between braces	a{o}
	base character below base character	{1<z>}		a{1o}
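As an illustration of how the two tables work together, the following Python sketch converts Unicode text to Beta code by canonical decomposition. It is not the project's conversion routine: the function name is hypothetical, both lookup tables are small excerpts, and the pairing of Unicode combining marks with Beta codes is an assumption based on the table descriptions. Canonical decomposition (NFD) places marks below the base character before marks above it, which coincides with the required order.

```python
import unicodedata

# Unicode combining mark -> Beta code (assumed pairing, excerpt only).
COMBINING = {
    "\u0323": "?",    # dot below
    "\u0307": "?1",   # dot above
    "\u0308": ":",    # two dots above
    "\u0301": "/",    # acute
    "\u0300": "\\",   # grave
    "\u0304": "-",    # horizontal line above
    "\u0303": "~",    # tilde above
    "\u030c": "^",    # hacek
    "\u0302": "^1",   # circumflex
}

# Non-ASCII base character -> Beta code (excerpt of the base table).
BASE = {"ɳ": "n1", "ʃ": "s1", "ə": "e1", "ŋ": "n2"}


def to_beta(text: str) -> str:
    """Convert decomposable Unicode text to Beta code."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in COMBINING:
            out.append(COMBINING[ch])
        elif ch in BASE:
            out.append(BASE[ch])
        elif ch.isascii():
            # ASCII is copied as-is; masking of reserved characters is omitted.
            out.append(ch)
        else:
            raise ValueError(f"no Beta code known for {ch!r}")
    return "".join(out)
```

Under these assumptions the sketch reproduces two transcriptions from the placeholder examples: hütte becomes `hu:tte` and pašọ́ɳ becomes `pas^o?/n1`.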

Special characters

In principle, these characters are equivalent to base characters, except that they cannot be combined with diacritics.

Character	Description	Beta code	Example
·e̜kọ́ɳ	A dot, before or after the base character. Higher than the baseline.	.1	.1e(ko?/n1

Special Blanks
(Regular blanks are represented by the character ␣ in this table)

Character	Description	Beta code	Example
w‿d	blank with curve	{␣}	w{␣}d


(auct. Stephan Lücke | Florian Zacherl – trad. Christina Mutter)

Tags: Information technology



Typification

Typification of the geocoded linguistic data is one of the fundamental aims of VerbaAlpina. In a first step, tokens ('single words') are extracted from the input data after transcription and, where possible, recorded in the database field of the same name.

VerbaAlpina focuses on the morphological typification of the collected linguistic material. A morphological type is defined by agreement in the following properties: language family – part of speech – single word vs. affixed word – gender – lexical basic type. The form used to cite a morphological type is based on the lemmas of selected reference dictionaries (see below).
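The definition by agreement of properties can be pictured as a record whose equality is decided field by field. The Python sketch below is only an illustration: the class name, field names and value encodings are assumptions and do not reflect the actual database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MorphType:
    """Hypothetical record of the five defining properties."""
    language_family: str  # e.g. "gem", "roa" or "sla"
    part_of_speech: str
    affixed: bool         # single word (False) vs. affixed word (True)
    gender: str
    basic_type: str


def same_morph_type(a: MorphType, b: MorphType) -> bool:
    """Two attestations share a type if all five properties agree."""
    return a == b
```

Because the dataclass is frozen, instances are hashable and can serve directly as dictionary keys when attestations are grouped by type.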

The unity of all related morpho-lexical types becomes clear through their assignment to a common lexical basic type, even across language borders. In this way, the following nouns and verbs (not described in detail here) can be assigned to the single basic type malga: malga (MOUNTAIN PASTURE, HERD), malgaro (ALPINE DAIRYMAN), malghese (HERDER), immalgare (TO MOVE ON THE MOUNTAIN PASTURE), dismalgare (TO LEAVE THE MOUNTAIN PASTURE). The lexical basic type, however, says nothing about the word history of an individual morpho-lexical type. It has to be established case by case whether a type with a Latin-Romance etymon that is attested today in the Germanic or Slovene language area (e.g. Slovene bajta 'simple house') goes back to an old local substratum or to more recent Romance language contact. For this reason, the term "etymon" is avoided in this context, as it refers in principle to the immediate historical precursor of a word, even though the lexical basic type does in many cases correspond to the etymon of a morpho-lexical type.

The morpho-lexical types form the leading category for the management of linguistic data. They are comparable to the lemmas of lexicography. Using the robust and easily operationalised criteria mentioned above, the four phonetic types barga, bark, margun, bargun with the meaning ALPINE HERDSMEN'S HUT, ALPINE STABLE can, for example, be reduced to three morpho-lexical types:





The assignment of morpho-lexical types to language families (gem., roa., sla.) depends on the respective source. For data from atlases or dictionaries it follows automatically from the respective informants and is recorded accordingly in the database. For data that VerbaAlpina itself collects through crowdsourcing, the informants state the language or dialect to which they belong, and this claim is ideally confirmed quantitatively; the number of confirming informants thus becomes an instrument of data validation.

Each morpho-lexical type is limited to one language family. It has to be decided by which form a morpho-lexical type should be represented in the search function of the interactive map. For the Germanic and Slavonic language families the answer is fairly simple, as each is represented by only one standardised individual language ('German' [deu] / 'Slovenian' [slo]). The morpho-lexical types can be represented by their standard variants, provided that the type has an equivalent in the standard language. In this way, for example, all phonetic types of Alemannic and Bavarian that are variants of the standard form 'cheese' can be retrieved under this standard form. If there is no such standard variant, the lemmas of the major reference dictionaries (Idiotikon, WBÖ) are used as a reference.

The situation is much more complex for the Romance language family because of its numerous small languages, some of which are not sufficiently standardised. For pragmatic reasons, the following procedure has been chosen: all morpho-lexical types are represented by the French and Italian standard forms, where these exist. All phonetic types that are variants of beurre/burro 'butter' can be retrieved under these two forms. The reference dictionaries include TLF and Treccani. If only one of the two standard languages has an appropriate variant, only that one is given, as in the case of ricotta (membership of Italian is marked by the notation convention -/ricotta). If neither of the two Romance reference languages has a variant of the type, an entry from a dialectal reference dictionary, for instance LSI, is used instead. If there are no reliable entries in dialect dictionaries, VerbaAlpina proposes a basic type together with a graphic representation ('VA').
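The fallback order for Romance types can be summarised in a short Python sketch. The function and parameter names are hypothetical, and the notation for a French-only form (form followed by /-) is an assumption made by analogy with -/ricotta; the source only documents the Italian-only case.

```python
from typing import Optional


def reference_label(
    french: Optional[str] = None,
    italian: Optional[str] = None,
    dialect_entry: Optional[str] = None,
    va_suggestion: Optional[str] = None,
) -> str:
    """Pick the citation form following the fallback order in the text."""
    if french or italian:
        # Standard forms first; a missing language is marked with '-'.
        return f"{french or '-'}/{italian or '-'}"
    if dialect_entry:
        # Next: an entry of a dialectal reference dictionary (e.g. LSI).
        return dialect_entry
    if va_suggestion:
        # Last resort: the form proposed by VerbaAlpina itself.
        return va_suggestion
    raise ValueError("no citation form available")
```

With both standard forms present the result is `beurre/burro`; with only the Italian form it is `-/ricotta`.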

Phonetic typification of the linguistic material is provided for in the overall concept and in the technical implementation, but it is peripheral and therefore not carried out consistently. The corresponding category is indispensable mainly because linguistic atlases (e.g. SDS and VALTS) and dictionaries sometimes document phonetic types only. When VerbaAlpina typifies phonetically, the tokens are grouped into phonetic types according to criteria of historical phonetics (database field 'phon_typ'). We are examining whether phonetic typification can be automated on the basis of Levenshtein and Soundex algorithms. If automation proves feasible, we will put it into practice.
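For readers unfamiliar with the Levenshtein distance, the following Python sketch shows the standard dynamic-programming formulation. It only illustrates the measure; how VerbaAlpina would apply it to phonetic typification (thresholds, weighting of sound classes) is not specified in the text and is not modelled here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]
```

Applied to the phonetic types mentioned above, margun and bargun differ by one edit, while barga and bark differ by two.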

Typification (the formation of classes) makes the diversity of the data increasingly clear. The following rule holds: number of tokens > number of phonetic types > number of morpho-lexical types > number of basic types. There can, however, be the extreme case of a single attestation (hapax) that corresponds to one token, one phonetic type and one morpho-lexical type as the only representative of a basic type. It may make sense to filter out such hapax forms in the display.
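The rule can be checked mechanically by counting distinct values per level. In the Python sketch below, the rows are invented for illustration: the token spellings and the assignment of the four phonetic types to three morpho-lexical types are hypothetical and are not taken from the database. Equality between levels is allowed so that the hapax case passes.

```python
# (token, phonetic type, morpho-lexical type, basic type): hypothetical rows.
ROWS = [
    ("ba/rga", "barga", "type-1", "basic-1"),
    ("ba/rgo", "barga", "type-1", "basic-1"),
    ("bark", "bark", "type-2", "basic-1"),
    ("margu/n", "margun", "type-3", "basic-1"),
    ("bargu/n", "bargun", "type-3", "basic-1"),
]


def level_counts(rows: list[tuple[str, str, str, str]]) -> tuple[int, ...]:
    """Count tokens (rows) and distinct values on the three type levels."""
    distinct = [len({row[i] for row in rows}) for i in (1, 2, 3)]
    return (len(rows), *distinct)


def is_consistent(counts: tuple[int, ...]) -> bool:
    """Each level may have at most as many members as the one before it."""
    return all(x >= y for x, y in zip(counts, counts[1:]))
```

For the invented rows the counts are 5 tokens, 4 phonetic types, 3 morpho-lexical types and 1 basic type; a single hapax row yields 1 on every level.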


(auct. Thomas Krefeld | Stephan Lücke – trad. Susanne Oberholzer)

Tags: Linguistics