There is currently no full-text search for the entries published under Methodology. Such a search is under development and will be available in one of the next versions of VerbaAlpina. In the meantime, the browser's own full-text search (usually Ctrl+F) can be used after clicking on "Show all entries".



API

API stands for "application programming interface". VerbaAlpina provides such an interface at a dedicated web address; detailed documentation of the syntax to be used there can be found in the article "API documentation". The API allows specific content from the VA database (VA_DB) to be retrieved in defined formats via a browser. Both the selection of the data and the output format are controlled by URL parameters.
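To illustrate how URL parameters can control data selection and output format, here is a minimal Python sketch that assembles such a request URL. The base address and the parameter names (`action`, `id`, `format`) are hypothetical placeholders, not VerbaAlpina's actual API; the real syntax is described in the API documentation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base address; the real API address is given in the API documentation.
API_BASE = "https://example.org/va-api/"


def build_api_url(base: str, **params: str) -> str:
    """Append data-selection and output-format parameters as a URL query string."""
    return base + "?" + urlencode(params)


# Hypothetical parameters: which record to fetch and in which format to return it.
url = build_api_url(API_BASE, action="getRecord", id="123", format="xml")
print(url)

# A server would read the same parameters back from the query string.
params = parse_qs(urlsplit(url).query)
```

`urlencode` also escapes special characters in parameter values, so the resulting URL can be opened directly in a browser.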

(auct. Stephan Lücke – trad. Christina Mutter)

Tags: Information technology

Beta Code

(auct. Thomas Krefeld | Stephan Lücke)

Tags: Linguistics Information technology

Code Page

(auct. Stephan Lücke)

Tags: Linguistics Information technology

Concept Description

(auct. Giorgia Grimaldi | Thomas Krefeld)

Tags: Information technology

Crowdsourcing

Although a large amount of relevant linguistic data on VerbaAlpina's fields of investigation already exists (especially in atlases and dictionaries), one aim of the project is to collect new data. This new collection is intended (1) to even out inconsistencies between the existing sources, (2) to fill gaps and correct inaccuracies, and (3) to mark outdated designations and devices as such. However, the new data are not to be collected with the traditional methods of field research, but with the means now offered by social media. The corresponding methods are often subsumed under the term crowdsourcing.

The reference to the crowd is open to misunderstanding in some respects, not least because many people associate the term with arbitrariness, amateurishness and insufficient reliability. These reservations are not entirely unjustified, as the methods in question are indeed directed at a vague and anonymous crowd of potentially interested persons. Fundamental problems arise from two directions: 1) from the scientific provider of the project, and 2) from the target group, which may, but need not, consist of linguistic laypeople. The offer has to be sufficiently 'visible' and attractive, and the target group has to have sufficient linguistic competence and sufficient knowledge of the specific subject. There are different strategies for dealing with this. For example, one can try to increase the offer's attractiveness by designing it in an entertaining way, with game-like interfaces (cf. the project alliance play4science). Competence can be assessed with specific knowledge questions, but it is unquestionably more reliable to have the submitted data confirmed and validated by other speakers from the same places.

(auct. Thomas Krefeld | Stephan Lücke – trad. Susanne Oberholzer)

Tags: Functional areas Information technology

Data Access Layer

(auct. Stephan Lücke)

Tags: Information technology

Digital Humanities

The project VerbaAlpina has been designed for the web from the outset, as it aims to contribute to transferring an established humanities tradition (more precisely, geolinguistics) into the digital humanities.
This means the following:
(1) The empirical basis of the research consists of data (cf. Schöch 2013), i.e. digitally encoded and structured units, or at least units that can be structured. Some of the data the project deals with have already been published and are retro-digitised (e.g. the older material from atlases); others are new data that still have to be collected. For the relevant concepts, the new data are to be as extensive as possible. The method is therefore quantitative and, to a great extent, inductive.
(2) Research communication takes place under the media conditions of the internet. This makes it possible to interlink different media (text, images, video and sound) hypertextually. Furthermore, the people participating in the project, as researchers (especially as project partners) and/or as informants, can communicate and cooperate with each other continuously.
(3) Interested researchers are invited to collaborate on developing this collaborative research platform, which is based on the project. This perspective is useful and advances the project in at least two respects: it makes it possible to integrate different sites and to make progress in combining information technology and linguistic geography by using public resources, i.e. without having to fall back on the (legally and economically difficult) support of private IT companies.
(4) The knowledge relevant to the project can also be accumulated and modified continuously over a fairly long period, although guaranteeing lasting availability is still technically difficult (cf. the important research infrastructure CLARIN-D). In any case, publishing the results on physical media (books, CDs, DVDs) is no longer a fundamental requirement. Nevertheless, a secondary print option is being set up, a solution occasionally offered in online lexicography, e.g. by the exemplary Tesoro della Lingua Italiana delle Origini.

(auct. Thomas Krefeld – trad. Susanne Oberholzer)

Tags: Information technology

Digitisation

Within the context of VerbaAlpina, the term digitisation is not only used to describe the simple use of computers for electronic data processing. Essentially, the term describes the in-depth digital processing of the material by *structuring* and categorising it systematically and transparently.

VerbaAlpina works almost exclusively with the relational data model, which in principle organises the data in the form of tables. The tables consist of rows (= records, tuples) and columns (= attributes, fields, properties). Every table can be extended by adding further rows and columns. Between the tables there are logical relations which allow two or more tables to be linked and displayed together (the so-called "joins"). At present, VerbaAlpina uses the database management system MySQL to manage the tables. However, the tables are not bound to this system; they can be exported at any time, for example as delimited text, for which unambiguous separators must be defined for both field and record boundaries. They are exported together with the column names and the documentation of the logical relations (entity-relationship model). XML, which is currently widely used, is not used in VerbaAlpina's day-to-day operations, but it is part of the interface concept as an export format.
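The relational principles described above (tables, a logical relation, a join and a delimited-text export) can be sketched in a few lines. This is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL; the table and column names are hypothetical and do not reflect the actual schema of VA_DB.

```python
import csv
import io
import sqlite3

# SQLite stands in for MySQL here; table and column names are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT, lat REAL, lon REAL);
    CREATE TABLE informants (id INTEGER PRIMARY KEY,
                             birthplace_id INTEGER REFERENCES places(id));
    INSERT INTO places VALUES (1, 'Trento', 46.07, 11.12), (2, 'Innsbruck', 47.27, 11.39);
    INSERT INTO informants VALUES (10, 2), (11, 1);
""")

# A join combines rows of two tables via their logical relation (birthplace_id -> places.id).
cur = con.execute("""
    SELECT i.id, p.name, p.lat, p.lon
    FROM informants AS i JOIN places AS p ON i.birthplace_id = p.id
    ORDER BY i.id
""")
header = [col[0] for col in cur.description]
rows = cur.fetchall()

# Export as delimited text: a tab separates fields, a newline separates records,
# and the first line holds the column names.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
export = buf.getvalue()
print(export)
```

Tab and newline are a common choice because they rarely occur inside field values; whatever separators are chosen, they must be documented together with the export.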

Besides the logical structuring of data, character encoding is the second important concept associated with the term "digitisation". Handling it correctly is of fundamental importance for the long-term archiving of the data. As far as possible, VerbaAlpina follows the code table and the guidelines of the Unicode Consortium. Characters that have not yet been included in the Unicode table are captured primarily by serialisation: each such character is represented by a sequence of characters from the code range x21 to x7E (within the ASCII range). The corresponding allocations are documented in special tables; this makes it possible to convert the sequences into Unicode code points that may become available at a later date.
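The serialisation rule and the allocation tables can be sketched as follows. This minimal sketch checks that a sequence only uses characters from x21 to x7E and converts it to Unicode where an allocation is documented. The table is a hypothetical excerpt: the first two entries follow the base-character table further below (a1 = Greek alpha, o6 = Greek omega), while the `None` entry merely illustrates a character without a code point yet.

```python
from typing import Optional

# Serialisation range from the text: x21 ('!') to x7E ('~'), printable ASCII without the space.
SERIAL_MIN, SERIAL_MAX = 0x21, 0x7E

# Hypothetical excerpt of an allocation table: serialised sequence -> Unicode character,
# or None where no suitable code point is (yet) available.
ALLOCATIONS: dict[str, Optional[str]] = {
    "a1": "\u03b1",  # Greek alpha
    "o6": "\u03c9",  # Greek omega
    "l6": None,      # assumed here to have no Unicode mapping yet
}


def is_serialisable(seq: str) -> bool:
    """True if seq is non-empty and every character lies in the range x21 to x7E."""
    return bool(seq) and all(SERIAL_MIN <= ord(ch) <= SERIAL_MAX for ch in seq)


def to_unicode(seq: str) -> str:
    """Convert a serialised sequence if an allocation exists; otherwise keep it unchanged."""
    if not is_serialisable(seq):
        raise ValueError(f"not a valid serialised sequence: {seq!r}")
    mapped = ALLOCATIONS.get(seq)
    return mapped if mapped is not None else seq
```

Keeping unmapped sequences unchanged means nothing is lost: once a code point becomes available, only the allocation table has to be updated.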

(auct. Stephan Lücke – trad. Susanne Oberholzer)

Tags: Linguistics Information technology

Entity-Relationship

In principle, data can be classified into so-called "entities". These are classes of data, each of which has a particular kind and number of specific features. For example, the cities Trento, Innsbruck and Lucerne can form a class "places", characterised by the features "place name", "longitude", "latitude", "state" and "number of inhabitants". The individual members of such a class differ from each other in the values of the features that characterise the class.
In a relational database, each entity is ideally stored in its own table, with the values of one specific feature in each column. The table rows contain the individual members of the data class (entity). In most cases (including VerbaAlpina), a relational database represents a collection of different entities (and hence tables) between which there are logical relations. For example, the entity "informant", defined by the features "age", "sex", "birthplace" and "place of residence", is logically linked to the entity "places" in such a way that the values of the features "birthplace" and "place of residence" have a correspondence in the entity "places". Relations between members of these two entities result from matching values of features of the same kind in each entity. In this case, identical values could theoretically be used to assign the geographical coordinates of the birthplace indirectly to an informant. This example makes it easy to see that homonyms could cause problems. To avoid them, integers are usually used as identifiers (briefly: "ID") that mark the members of an entity unambiguously.
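The homonym problem and its solution by IDs can be shown with a minimal Python sketch. The class and field names are hypothetical, and the two places called "Neudorf" with their coordinates are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    id: int
    name: str
    lat: float
    lon: float


@dataclass(frozen=True)
class Informant:
    id: int
    birthplace_id: int  # refers to Place.id, not to the place name


# Two different places sharing one name (invented data).
places = [
    Place(1, "Neudorf", 47.00, 11.00),
    Place(2, "Neudorf", 46.50, 12.00),
]
places_by_id = {p.id: p for p in places}


def places_named(name: str) -> list[Place]:
    """Lookup by name: ambiguous as soon as homonyms exist."""
    return [p for p in places if p.name == name]


def birthplace_coordinates(informant: Informant) -> tuple[float, float]:
    """Lookup by ID: always resolves to exactly one place."""
    place = places_by_id[informant.birthplace_id]
    return place.lat, place.lon


informant = Informant(id=10, birthplace_id=2)
print(len(places_named("Neudorf")), birthplace_coordinates(informant))
```

The name lookup returns two candidates, whereas the ID leads to exactly one place, which is why the relation is stored as an ID rather than as a name.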
This system of entities and their logical relations, as sketched above, is called entity-relationship. Data stored in a relational database can hardly be understood and used without an explanation of the dependencies between the data within the database. The entity-relationship model is usually illustrated in the form of a graphic scheme.
The entity-relationship model is continually adapted (and thus changed) during the cyclic development phases of VerbaAlpina (cf. Versioning). Each filed version of VerbaAlpina will be stored together with the corresponding entity-relationship model of the underlying database version in the form of an ER diagram. This diagram is created with the program yEd and saved both in GraphML format and as a PDF document. The following chart is based on the entities and links of the database VA_XXX as of 20/03/125; it does not reproduce the database completely and is to be understood as an illustrative example.

(auct. Stephan Lücke – trad. Susanne Oberholzer)

Tags: Information technology

Geocoding

Geocoding is a fundamental ordering criterion for the data managed by VerbaAlpina; latitude and longitude are used for geocoding. The precision of this coding varies depending on the data type; VerbaAlpina aims at coding that is as precise as possible, to within a metre. For linguistic data from atlases and dictionaries, generally only an approximate coding based on the place name is possible, whereas archaeological data, for example, can actually be geocoded to within a metre. Points, lines (such as roads, rivers etc.) and areas can be stored. Geocoding mainly uses the so-called WKT format (Well-Known Text), which is converted into a specific MySQL format in the VA database by means of the function geomfromtext() and stored in that form. Output in WKT is produced with the MySQL function astext().
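The three geometry types mentioned above correspond to the WKT keywords POINT, LINESTRING and POLYGON. The following minimal sketch builds such strings in Python; the function names are hypothetical, and it assumes x = longitude and y = latitude. Which axis order the database expects depends on the spatial reference system and database configuration, so this has to be checked before storing data.

```python
def _coord_list(points):
    # WKT separates x and y by a space and coordinate pairs by commas.
    return ", ".join(f"{x} {y}" for x, y in points)


def point_wkt(lon, lat):
    return f"POINT({lon} {lat})"


def linestring_wkt(points):
    if len(points) < 2:
        raise ValueError("a line needs at least two points")
    return f"LINESTRING({_coord_list(points)})"


def polygon_wkt(ring):
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT polygon rings must be closed
    if len(ring) < 4:
        raise ValueError("a polygon ring needs at least three distinct points")
    return f"POLYGON(({_coord_list(ring)}))"


print(point_wkt(11.12, 46.07))
```

A string such as `POINT(11.12 46.07)` is what would be passed to geomfromtext() for storage; astext() returns geometries in the same WKT notation.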
The reference grid for geocoding is the network of municipalities in the Alpine region, which can be output as areas or as points, as required. It is based on the municipal boundaries from around 2014, which VerbaAlpina received from its partner "Alpine Convention". These data often change due to administrative reforms, but constant updating is unnecessary because they merely form a geographical reference frame. The point representation of the municipality grid is derived algorithmically from the municipal boundaries and is therefore secondary. The calculated municipality points are the geometric centres of the municipal areas and coincide only by chance with the actual centre of the municipality. If necessary, any data can be projected, individually or aggregated, onto the calculated municipality point. This is the case for linguistic data from atlases and dictionaries.
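How a municipality point can be derived from a boundary polygon is shown in the following sketch. The text does not specify VerbaAlpina's algorithm; this sketch assumes the area-weighted centroid computed with the shoelace formula on planar coordinates, which ignores the curvature of the earth. The second example illustrates why a geometric centre need not mark an actual centre: for a U-shaped area, the centroid lies outside the area.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon (planar shoelace formula)."""
    pts = list(ring)
    if pts[0] == pts[-1]:
        pts = pts[:-1]  # accept closed rings as well
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    # Centroid = sum / (6 * area) = sum / (3 * area2); the orientation sign cancels out.
    return cx / (3 * area2), cy / (3 * area2)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
u_shape = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]
print(polygon_centroid(square))   # (0.5, 0.5)
print(polygon_centroid(u_shape))  # lies inside the notch, i.e. outside the U
```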
In addition, there will be a honeycomb grid that is quasi-geocoded: it reproduces the approximate positions of the municipalities relative to each other, but at the same time assigns each municipal territory an idealised area of identical shape and size. Users are thus offered two alternative methods of mapping. Both have advantages and disadvantages, and both have a certain suggestive potential because of their figurativeness. Thanks to its precision, the topographic depiction gives better insight into concrete spatiality (with its very particular terrain profile, individual transitions, valley courses, inaccessible valley exits etc.). The honeycomb map, by contrast, allows more abstract visualisations of the data, as it evens out differences in the size of municipal areas and between agglomerations and scattered settlements. This is especially useful for quantitative maps, because the perceived size of an area instinctively creates an impression of quantitative weight.
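The text does not say how the honeycomb grid is constructed. One common technique, shown here purely as an assumption, is a hexagonal grid with axial coordinates (q, r): each municipality point, given in projected planar coordinates, is assigned to the nearest hexagon by cube rounding. The function names are hypothetical. In a real layout, municipalities that fall into the same cell would still have to be redistributed so that each receives its own cell; that step is omitted here.

```python
import math


def cube_round(q: float, r: float) -> tuple[int, int]:
    """Round fractional axial coordinates to the nearest hexagon (q + r + s = 0)."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the component with the largest rounding error from the other two.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def point_to_hex(x: float, y: float, size: float) -> tuple[int, int]:
    """Nearest pointy-top hexagon (cell radius `size`) for a planar point."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return cube_round(q, r)


def hex_center(q: int, r: int, size: float) -> tuple[float, float]:
    """Planar centre of hexagon (q, r): the idealised position shown on the map."""
    return size * math.sqrt(3) * (q + r / 2), size * 1.5 * r


def assign_cells(points: dict[str, tuple[float, float]], size: float) -> dict[tuple[int, int], list[str]]:
    """Group municipality names by hexagon; lists with several names mark collisions."""
    cells: dict[tuple[int, int], list[str]] = {}
    for name, (x, y) in points.items():
        cells.setdefault(point_to_hex(x, y, size), []).append(name)
    return cells
```

Because every cell has the same shape and size, a value drawn into a cell carries the same visual weight regardless of the real extent of the municipality.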

(auct. Thomas Krefeld | Stephan Lücke – trad. Susanne Oberholzer)

Tags: Linguistics Information technology Extralinguistic context

Long-term Archiving

(auct. Stephan Lücke)

Tags: Information technology

Modules

Cf. Versioning

Tags: Information technology

Transcription

The linguistic material is represented graphically in two ways in order to meet the conflicting demands of faithfulness to the sources and easy comparability:

(1) Input version in original transcription
In the VA portal, sources are brought together that come from different disciplinary traditions (Romance studies, German studies, Slavonic studies) and represent different historical stages of dialectological research. Some of the dictionary data were collected at the beginning of the last century (GPSR), others only a few years ago (ALD). For reasons of the history of science, it is therefore necessary to respect the original transcription to the greatest possible extent. For technical reasons, however, certain conventions cannot be kept unchanged. This applies especially to the vertical combination of base characters ('letters') and diacritical marks, e.g. when a stress accent is placed above a length mark above a vowel, which in turn sits above a closure mark (cf. Beta Code). These conventions are converted into linear character sequences in specifically defined technical transcriptions that use exclusively ASCII characters (so-called Beta code). Beta encoding can exploit, to a certain degree, graphic resemblances between the original diacritic and its ASCII equivalent; these resemblances are intuitively understandable and easy to remember.

(2) Output version in IPA
Output in a uniform transcription is desirable for comparability and user-friendliness. Therefore, all Beta codes are converted into IPA characters using specific substitution routines. A few incompatibilities are unavoidable in cases where two different base characters in IPA correspond to one base character that is differentiated by diacritics in the input transcription. This is especially the case for the degrees of vowel height: in the palatal series, the two base characters <i> and <e>, combined with the diacritic closure dot and one or two opening ticks, can depict six degrees of vowel height. In Beta encoding these vowels are: i – i( – i(( – e? – e – e( – e((. IPA has only four base characters for these vowels: i – ɪ – e – ɛ.
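A substitution routine of this kind can be sketched as a longest-match replacement over a lookup table. The Beta codes below are the palatal vowels listed above, but the IPA assignments are hypothetical and chosen only to show the many-to-one collapse; they are NOT VerbaAlpina's actual conversion table.

```python
# Hypothetical excerpt of a Beta-code-to-IPA table (illustrative assignments only).
BETA_TO_IPA = {
    "i": "i", "i(": "ɪ", "i((": "ɪ",
    "e?": "e", "e": "e", "e(": "ɛ", "e((": "ɛ",
}

# Try longer codes first so that "i((" is not read as "i" followed by "((".
_CODES = sorted(BETA_TO_IPA, key=len, reverse=True)


def beta_to_ipa(beta: str) -> str:
    out, pos = [], 0
    while pos < len(beta):
        for code in _CODES:
            if beta.startswith(code, pos):
                out.append(BETA_TO_IPA[code])
                pos += len(code)
                break
        else:
            out.append(beta[pos])  # characters without a table entry pass through unchanged
            pos += 1
    return "".join(out)


def collapsed_codes() -> dict[str, list[str]]:
    """Group Beta codes by IPA result; groups with several codes mark distinctions lost in IPA."""
    groups: dict[str, list[str]] = {}
    for code, ipa in BETA_TO_IPA.items():
        groups.setdefault(ipa, []).append(code)
    return groups


print(collapsed_codes())
```

Grouping the table by its IPA values makes the incompatibilities explicit: every group with more than one Beta code marks a distinction of the input transcription that the IPA output cannot reproduce.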

(auct. Thomas Krefeld – trad. Susanne Oberholzer)

Tags: Linguistics Information technology

Transcription rules

We distinguish between base characters and diacritics.

Base characters are located on the baseline. All characters that are not on the baseline are considered diacritics. Purely typographic variations of a base character are also treated as diacritics in the broader sense, e.g. if the base character is displayed smaller than the others.

Base characters

Base characters that exist in the ASCII table are retained (i.e. all Latin letters; German umlauts are not included). All other base characters are transcribed as a combination of a letter and a numeral (see table).

Diacritics

Diacritics are always placed after the base character to which they are assigned. If there are several diacritics on one base character, the following sequence must be observed:

  • First, diacritics that mark a typographic variation of a base character are written, e.g. if the base character is set higher or lower. These diacritics are shown in yellow in the table.
  • Then diacritics below and above the base character are written from bottom to top. In particular, diacritics below a base character (marked in green) must always come before those above it (marked in blue).
  • Finally, diacritics that follow the base character are written, e.g. a length sign or an apostrophe after a base character. These are marked in orange in the table.

Each character used for the transcription of a diacritic may only occur once per base character. There are special rules for the repetition of the same diacritic, e.g. : for two points above a base character or \2 for a double grave accent.
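The ordering and uniqueness rules can be checked mechanically. The sketch below covers only a subset of the diacritic codes from the table further below, grouped as the rules require (typographic variation, below, above, after). It assumes that a base character is a letter optionally followed by a digit from 1 to 7, as in the base-character table, and the function names are hypothetical. It checks the order of the groups and that no code repeats; it cannot check the bottom-to-top order within the "below" or "above" group, which depends on the source. Masked characters, bracket constructs and superscript numbers are not handled.

```python
import re

# Subset of diacritic codes, grouped in the order required by the rules.
GROUPS = [
    {"0", "8", "9"},                                          # typographic variation
    {"?", ":1", "(", "(1", ")1", "_", "_1", "+", "@", "@1",
     "&", "$", "^2", "^3", "*"},                              # below the base character
    {"?1", ":", ")", "/", "/2", "\\", "\\2", "\\3", "-", "-2",
     "~", "~1", "!", "%", "|", '"', ">", "*1"},               # above the base character
    {"'", "'1", "'2", "=", ":2", ":3"},                       # after the base character
]
RANK = {code: rank for rank, group in enumerate(GROUPS) for code in group}
CODES = sorted(RANK, key=len, reverse=True)  # longest match first: "(1" before "("
BASE = re.compile(r"[A-Za-z][1-7]?")


def split_cluster(cluster: str) -> tuple[str, list[str]]:
    """Split one base character with its diacritics into (base, [codes])."""
    m = BASE.match(cluster)
    if m is None:
        raise ValueError(f"no base character at start of {cluster!r}")
    rest, codes = cluster[m.end():], []
    while rest:
        for code in CODES:
            if rest.startswith(code):
                codes.append(code)
                rest = rest[len(code):]
                break
        else:
            raise ValueError(f"unknown diacritic code at {rest!r}")
    return m.group(), codes


def is_well_ordered(cluster: str) -> bool:
    """Groups must appear in rule order, and no code may occur twice."""
    _, codes = split_cluster(cluster)
    ranks = [RANK[c] for c in codes]
    return ranks == sorted(ranks) and len(codes) == len(set(codes))
```

For example, `e?-/` (closure dot below, then length line and acute above) passes, while `e/(` fails because a diacritic above precedes one below.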

If a diacritic refers to two or more characters, e.g. a͠e, the base characters are placed in square brackets, in this case [ae]~.

Brackets and Comments

Comments (whether in brackets or not) are placed in angle brackets after the attestation to which they refer, e.g. (m.) → <m.>. If the entire attestation is in brackets, it is transcribed without brackets and the remark "in brackets" is added in angle brackets.


Possible morphosyntactic variants of an attestation, such as singular and plural forms, are separated by commas; different word forms are separated by semicolons. This corresponds to the representation of the attestations in the atlas AIS. If the attestations are separated by other separators in the source (e.g. / or -), these must be replaced accordingly by commas and semicolons in the transcription. Any numbering of different variants is omitted.
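A transcription line written according to these conventions can be split back into its parts. In this minimal sketch, semicolons separate word forms, commas separate variants of one word form, and comments in angle brackets are attached to the variant in which they occur; separators inside a comment are ignored. The function names are hypothetical, attaching comments to single variants is an assumption of the sketch, and masked characters (see "Special characters in the source") are not handled.

```python
import re


def split_top_level(text: str, sep: str) -> list[str]:
    """Split on sep, but not inside <...> comments."""
    parts, buf, depth = [], [], 0
    for ch in text:
        if ch == "<":
            depth += 1
        elif ch == ">" and depth > 0:  # a '>' outside a comment is a diacritic, not a closer
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return parts


_COMMENT = re.compile(r"<([^<>]*)>")


def parse_attestations(text: str) -> list[list[tuple[str, list[str]]]]:
    """Return word forms, each as a list of (variant, [comments])."""
    result = []
    for word_form in split_top_level(text, ";"):
        variants = []
        for variant in split_top_level(word_form, ","):
            comments = _COMMENT.findall(variant)
            variants.append((_COMMENT.sub("", variant).strip(), comments))
        result.append(variants)
    return result


print(parse_attestations("um be(/l pas^o?/n1 <selten>; pas^o?/n1 <selten>"))
```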

Typified attestations

If the source contains both a single attestation and an already typified variant for an informant, only the single attestation is transcribed. Only if this is not possible is the typified variant transcribed and marked as "phonetic type" or "morpho-lexical type" via the corresponding selection menu. Unlike single attestations, types may also contain capital letters; otherwise the same transcription rules apply.

If both single attestations and typified attestations exist for an informant, two separate lines must be created for the transcription, in which the transcriptions are marked accordingly as type or as attestation.

Special characters in the source

All characters used as diacritics in the transcription (including numerals) must be masked by prefixing them with two backslashes, e.g. * → *, if they appear as original characters in the source. This only applies to characters that are part of the phonetic transcription of the single attestation in the source. For characters that carry a particular meaning, this meaning must instead be written as a remark in angle brackets after the attestation. For example, the character † marks an obsolete form in the AIS and must be rendered by a corresponding remark in angle brackets in the transcription. Brackets are always replaced (see "Brackets and Comments" and "Placeholders").

The following characters from the AIS can simply be omitted: ℗, ○, P, S, +

Placeholders

All forms of placeholders or abbreviated spellings must be replaced by the character string they represent. If an attestation with comments is split into several attestations, the comments must be repeated for each of them. The following table gives some examples:

Attestation from the source → Transcription
u kā́ni; i ~ → u ka-/ni; i ka-/ni
(Alm)hütte → Almhu:tte; Hu:tte
(um bé̜l) pašọ́ɳ (selten) → um be(/l pas^o?/n1 <selten>; pas^o?/n1 <selten>

There is an exception for small phonetic variations in already typified attestations, e.g. the morpho-lexical type "Sänn(e)hütte" can be transcribed as "Sa:nn\(e\)hu:tte".

Transcription not possible

The "vacat" button is used for informants for whom no data is entered in a map. If the transcription of an attestation is problematic (e.g. because it is not possible or unclear according to these rules), the "problem" button is used.

Transcription preview

When you enter a transcription, a preview of how the attestation will look after reconversion is displayed next to the corresponding text field for comparison. If the text "Not valid" appears, the attestation is transcribed incorrectly and cannot be entered. If individual characters in the beta code are highlighted in red, the attestation is valid, but these characters cannot yet be converted. This is mainly the case for characters that have not yet occurred in this form. In this case, the attestation can be entered as usual.

Base characters

Description: Beta code
Greek alpha: a1
mirror-inverted a: a2
ligature ae: a3
Greek beta: b1
crossed out b: b2
Greek chi: c1
sign for glottis closure: c2
crossed out c: c3
Greek delta: d1
crossed out d: d2
tick to the left of the e: e2
Greek epsilon: e3
Greek phi: f1
labiodental fortis: f2
Greek gamma: g1
open g on the right: g2
g with bottom line: g3
glottal beat: g4
i with slanted line: i1
i without dot: i2
i with horizontal line: i3
l with diagonal stroke: l1
l with strongly curved line: l2
l with two curved lines: l3
l with curved line: l6
l with horizontal stroke: l7
sign for velar "n" (German: kling): n1
velar nasals: n2
ligature oe: o1
open o on the left: o2
o with tick at the upper right margin: o3
o with ogonek: o4
o with diagonal line: o5
Greek omega: o6
the number pi: p1
q with horizontal line: q1
upper case letter R at the height of a lower case letter: r1
s with diagonal stroke left: s2
Greek theta: t1
more strongly curved u: u1

Diacritics

Description: Beta code; comment; example
dot under base character: ?; example: s?
dot above base character: ?1; example: e?1
two dots above base character: :; example: a:
two dots under base character: :1; example: u:1
tick open to the right under base character: (; example: o(
two ticks open to the right under base character: (1; example: e(1
semicircle open to the left (spiritus lenis) above base character: ); example: r)
semicircle open to the left under base character: )1; example: o)1
acute on base character: /; example: o/
double acute on base character: /2; example: o/2
grave on base character: \; example: a\
double grave on base character: \2; example: a\2
grave with dot at the upper end on base character: \3; example: u\3
horizontal line above base character: -; comment: minus sign; example: a-
two horizontal lines above base character: -2; comment: minus sign; example: a-2
horizontal line under base character: _; comment: underscore; example: n_
double horizontal line under base character: _1; example: n_1
tilde ABOVE base character: ~; example: e~
more strongly curved tilde ABOVE base character: ~1
tilde UNDER base character: +; example: e+
semicircle opened to the TOP ABOVE base character: !; example: a!
semicircle opened to the BOTTOM ABOVE base character: %; example: a%
semicircle opened to the BOTTOM UNDER base character: @; example: a@
semicircle opened to the TOP UNDER base character: @1; example: k@1
circle ABOVE base character: |; example: u|
circle UNDER base character: &; example: s&
vertical line under base character: $; example: e$
"circumflex" under base character: ^2; example: o^2
"hacek" under base character: ^3; example: d^3
infinity symbol above base character: "; example: u"
"greater-than symbol" above base character: >; example: n>
cross under base character: *; example: a*
cross above base character: *1; example: a*1
apostrophe after base character: '; comment: on the #-key; example: g'
inverted apostrophe after base character: '1; comment: on the #-key; example: a'1
elevated vertical line after base character: '2; comment: on the #-key; example: g'2
tick after base character: =; example: k=
superscript number after base character: \<n>0; comment: mask number with \ and put 0 after it; example: c\20
IPA length character: :2; example: a:2
half IPA length character: :3; example: a:3
base character above the baseline: 0; example: a0b
base character on the baseline, smaller than all other characters: 8; example: n8d
base character below the baseline: 9; example: i9n
upper or lower diacritics in brackets: [<d>]; comment: diacritic in brackets between square brackets; example: u[:] or e[?]
base character above base character: {<z>}; comment: elevated base character between braces; example: a{o}
base character below base character: {1<z>}; example: a{1o}

Special characters

In principle, these characters are equivalent to base characters, except that they cannot be combined with diacritics.

Description: Beta code; example
a dot before or after the base character, higher than the baseline: .1; example: .1e(ko?/n1

Special Blanks
(Regular blanks are represented by the character ␣ in this table)

Description: Beta code; example
blank with curve: {␣}; example: w{␣}d

(auct. Stephan Lücke | Florian Zacherl – trad. Christina Mutter)

Tags: Information technology

Versioning

VerbaAlpina is composed of the following modules:

- VA_DB: data stock in the (MySQL) project database (va_xxx)
- VA_WEB: programme code of the project portal's web interface, along with the accompanying WordPress database (va_wp)
- VA_MT: media files (photographs, films, text documents, sound recordings) in the media library of the web interface

All three modules form a coherent whole with mutual links and dependencies and therefore cannot be separated from each other. During the project term, the current state of the modules VA_DB and VA_WEB will be "frozen" simultaneously at regular intervals in the form of an electronic copy. These frozen copies receive a version number according to the scheme [calendar year]/[serial number] (e.g. 15/1). The productive version of VA is labelled XXX.
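The version-label scheme lends itself to a small helper. This sketch assumes that labels have exactly the form of the example 15/1 (two-digit calendar year, slash, serial number) and that the productive version is labelled XXX; the function names are hypothetical.

```python
import re

PRODUCTIVE = "XXX"
VERSION_LABEL = re.compile(r"(\d{2})/(\d+)")


def parse_version(label: str) -> tuple[int, int]:
    """Split a frozen-version label such as '15/1' into (year, serial number)."""
    m = VERSION_LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"not a frozen version label: {label!r}")
    return int(m.group(1)), int(m.group(2))


def is_citable(label: str) -> bool:
    """Only frozen versions are citable; the productive version XXX is not."""
    return label != PRODUCTIVE and VERSION_LABEL.fullmatch(label) is not None


def sort_versions(labels: list[str]) -> list[str]:
    return sorted(labels, key=parse_version)


print(sort_versions(["16/1", "15/10", "15/9"]))
```

Serial numbers are compared as integers rather than as text, so 15/10 correctly sorts after 15/9.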

Creating copies of the VA media center (VA_MT) is not feasible because of the generally enormous size of media files. For this reason, no copy of this module is created during version control. As a consequence, items that have been filed in the media center cannot be removed from it as soon as even a single VA version refers to them.

The project portal makes it possible to switch between the "productive" VA version (subject to constant change) and the filed ("frozen") versions. In the portal itself, a corresponding colouring of the background or of certain user interface elements indicates whether the productive version or one of the filed versions of VA is currently active. *Only* the filed versions of VA are citable.

(auct. Stephan Lücke – trad. Susanne Oberholzer)

Tags: Information technology