Methodology



chrono referencing



(auct. Katharina Knapp | Thomas Krefeld | Stephan Lücke)

Tags: Web page



Citation method



(auct. Stephan Lücke | Susanne Oberholzer)

Tags: Web page



Code Page



(auct. Stephan Lücke)

Tags: Linguistics Information technology



Concept



(auct. Thomas Krefeld)

Tags: Linguistics



Concept Description



(auct. Giorgia Grimaldi | Thomas Krefeld)

Tags: Information technology



Continuity

The question of the continuity of tradition is of fundamental importance when reconstructing multilingual communication spaces. For a data-driven, inductive approach, it has to be addressed in an interdisciplinary way. But even with the combined efforts of several disciplines, it would not be reasonable to expect many answers regarding the pre-Roman substrata of the Alpine region. The starting point regarding the Romance substratum in the now German- and Slovenian-speaking subareas is much better, however. The language shift from Romance to German is a historical constant that can still be observed today in the Grisons. The process begins as early as the collapse of the Roman infrastructure (in 476); the period immediately following it is of the greatest interest for linguistic history. It is, however, extremely sparsely documented in writing, so that cooperation with other historical disciplines, especially archaeology, is imperative. Although there are still large research gaps, there is at least the work of Weindauer 2014, which studies the archaeological and onomastic sources (6th-8th century) of southern Upper Bavaria, the Salzburg area and the Tyrolean Inntal. According to this study, one can exclude a longer, fundamental interruption of settlement between the Roman era and the time of the Bajuwaren ("eine längere, grundlegende Siedlungsunterbrechung zwischen Römer- und Bajuwarenzeit", Weindauer 2014, 248), as all evidence is in favour of a fluid transition of the settlement structure from Late Antiquity to the Early Middle Ages ("einen fließenden Übergang der Besiedlungsstruktur von der Spätantike zum Frühmittelalter", Weindauer 2014, 248).
Nevertheless, the areas mentioned still differ in degree as to how well this is empirically substantiated: "Was bezüglich des Zusammenhangs spätantiker und frühmittelalterlicher Fundstellen für das oberbayerische Alpenvorland noch überwiegend theoretisch galt [...], findet in den österreichischen Gebieten seine nachweisliche Bestätigung: Die frühmittelalterlichen Ortsgründungen des 6. Jhs. orientieren sich fast ausschließlich an spätrömischer Infrastruktur bzw. – soweit noch vorhanden – an der romanischen Siedlungsstruktur." (Weindauer 2014, 257; translation: What still held mainly in theory for the Alpine foothills of Upper Bavaria regarding the connection between late antique and early medieval find sites [...] is demonstrably confirmed in the Austrian areas: the early medieval settlement foundations of the 6th century are oriented almost exclusively towards late Roman infrastructure or, where it still existed, towards the Romance settlement structure.)

(auct. Thomas Krefeld – trad. Susanne Oberholzer)

Tags: Linguistics



Cooperation



(auct. Thomas Krefeld | Stephan Lücke)

Tags: Functional areas



Crowdsourcing

Although much of the relevant data analysed by VerbaAlpina already exists (especially in atlases and dictionaries), there are also plans to collect more. The goal is to (1) find and correct inconsistencies within the pre-existing sources, (2) fill in gaps and resolve inaccuracies, and (3) identify traditional terms and tools that have been passed down from generation to generation. The new data, however, will not be collected with the classical methods of field study, but with the means provided by social media. This method is usually referred to as crowdsourcing.
"Crowdsourcing ist eine interaktive Form der Leistungserbringung, die kollaborativ oder wettbewerbsorientiert organisiert ist und eine große Anzahl extrinsisch oder intrinsisch motivierter Akteure unterschiedlichen Wissensstands unter Verwendung moderner IuK-Systeme auf Basis des Web 2.0 einbezieht." (Martin/Lessmann/Voß 2008; translation: Crowdsourcing is an interactive form of service provision that is organised collaboratively or competitively. It involves a large number of extrinsically or intrinsically motivated participants with varying levels of knowledge and relies on modern information and communication systems based on the Web 2.0.)
The term crowd can at times be misleading, as it is often associated with arbitrariness, amateurishness and a lack of reliability. These concerns are not entirely unjustified, since the method has to rely on an unknown and anonymous number of potentially interested parties. Issues can arise both on the side of the researchers offering the project and on the side of the addressees (who may be unaffiliated with the project, but do not have to be): the offer has to be sufficiently visible and attractive, and the addressees have to be sufficiently knowledgeable in the field studied. There are several strategies for addressing this. The attractiveness of the project can be increased through games that make it entertaining; one example is play4science. The experience gained in that project, however, also indicates that communicating to the addressees that the findings of the study will directly influence their language skills and expertise is much more effective in making the offer enticing (see citizen science projects). The competence of the addressees can be assessed through targeted tests, but it is without doubt more effective to confirm the validity of data by comparing it with other people’s answers. A successful geolinguistic pilot project that made use of crowdsourcing is Stephan Elspaß and Robert Möller’s Atlas zur deutschen Alltagssprache (AdA); this project marks a milestone in the development of digital geolinguistics.
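The idea of confirming data by comparing it with other people's answers can be sketched as a simple agreement check. This is only an illustration and not VerbaAlpina's actual procedure: the function name `accepted_answer`, the thresholds `min_votes` and `min_share`, and the sample forms are all assumptions made for this example.

```python
from collections import Counter


def accepted_answer(answers, min_votes=2, min_share=0.5):
    """Return the most frequent answer if enough crowders agree on it.

    `min_votes` and `min_share` are hypothetical thresholds: the winning
    answer needs at least `min_votes` votes and strictly more than
    `min_share` of all submitted answers. Otherwise None is returned,
    meaning the item would need more answers or a manual check.
    """
    cleaned = [a.strip() for a in answers if a.strip()]
    if not cleaned:
        return None
    top, votes = Counter(cleaned).most_common(1)[0]
    if votes >= min_votes and votes / len(cleaned) > min_share:
        return top
    return None


# Invented sample answers from three crowders for one map entry.
print(accepted_answer(["siero", "siero ", "sero"]))
print(accepted_answer(["siero", "sero"]))
```

In the first call two of three crowders agree, so their answer is accepted; in the second call no answer reaches two votes, so the result is `None`.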
The goal of VerbaAlpina is to transcribe data from printed sources, especially dictionaries and linguistic atlases, and collect them in a structured database; to examine and, where necessary, correct transcriptions that are already available; and to typify already transcribed data and assign lexical lemmata to them. Comments on matters such as the origin and spread of words are also welcome. VerbaAlpina is also very interested in current linguistic material that has not yet been documented in written sources. Anyone who has knowledge of a dialect of the Alpine region is therefore welcome to submit expressions used in that dialect and contribute to the VerbaAlpina database. Collecting data from various sources makes it possible both to enrich the data already available in print and to understand the dynamic processes involved in linguistic change. This works best if many people are involved. Pictures of typically Alpine objects, but also of pastures, huts, flora, fauna, mountains and landscapes, are also welcome. They will be saved in the media library.

Alongside the collaboration with VerbaAlpina, each user has the opportunity to create their own research environment, in which they can collect language data (or any other data). The only condition is that the data is georeferenced. Users can encrypt the data and keep it for their personal use only, or share it with other users in order to discuss and comment on it. The potential of the database, however, relies on a large amount of data being accessible to the public.
VerbaAlpina documents the vitality of the crowdsourcing tools on a separate page. The experience gained with the crowdsourcing tools over the last two years has shown that publicity is essential for a successful implementation of this method. Whenever the crowdsourcing tool is mentioned in public, crowd activity increases.

Alongside WordPress and the crowdsourcing module developed within that platform, VerbaAlpina also uses Zooniverse’s platform. Zooniverse is a citizen science portal that invites volunteers on the Internet to carry out specific tasks. The tool developed for this purpose can be found here: https://www.zooniverse.org/projects/filip-hr/verbaalpina/classify. The original idea was to use the free, already available crowdsourcing tool provided by Zooniverse and thus free up time for time-sensitive and elaborate in-house development. Furthermore, Zooniverse’s existing community offered a promising way to reach a large number of volunteer crowders.
Originally, the first phase of the AIS was meant to translate crowder transcriptions into maps and language atlases. In the course of development, which was led primarily by Filip Hristov, however, it became clear that the original expectations of Zooniverse would not be met. Even the preparation of the Zooniverse software proved to be more complicated than expected. In addition, Zooniverse moderators contributed to the concept and concrete set-up of the tool and demanded changes to them. All this delayed the launch of the tool until 31 March 2021.

Concerns as to whether voluntary and inexperienced crowders would be capable of carrying out the rather complex process of transcription were raised as early as the development phase. This led to a change in the task assigned to crowders. Transcription was introduced as an optional additional task. The priority was shifted instead towards the more intuitive manual assignment of written words to numbers on the map. The numbers stand for the informants who uttered the words. In other words, crowders are asked to mark text on a map by enclosing it in a rectangle and then to assign that text box to the correct informant number. The coordinates of the text boxes are then saved in the VerbaAlpina database. With this data, the corresponding image sections can be cut out of a map automatically and passed to an OCR program.
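How stored text-box coordinates could be used to cut image sections out of a map can be sketched as follows. This is a minimal illustration under stated assumptions, not the VerbaAlpina implementation: the class `TextBox`, its field names, the pixel-coordinate convention and the representation of the scan as a nested list of pixel values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBox:
    """A rectangle drawn by a crowder, in pixel coordinates of the map scan.

    The field names are assumptions made for this sketch; `right` and
    `bottom` are exclusive bounds.
    """
    informant_number: int
    left: int
    top: int
    right: int
    bottom: int


def crop(image, box):
    """Return the pixels inside `box` from a row-major 2D list of pixels."""
    if not (0 <= box.left < box.right and 0 <= box.top < box.bottom):
        raise ValueError("box must have positive width and height")
    if box.bottom > len(image) or box.right > len(image[0]):
        raise ValueError("box lies outside the image")
    return [row[box.left:box.right] for row in image[box.top:box.bottom]]


def crops_by_informant(image, boxes):
    """Group the cropped image sections by informant number."""
    result = {}
    for box in boxes:
        result.setdefault(box.informant_number, []).append(crop(image, box))
    return result


# Toy 4x6 "scan" in which each pixel value encodes its row and column.
scan = [[row * 10 + col for col in range(6)] for row in range(4)]
boxes = [TextBox(informant_number=23, left=1, top=1, right=4, bottom=3)]
print(crops_by_informant(scan, boxes))
# {23: [[[11, 12, 13], [21, 22, 23]]]}
```

Each cropped section would then be handed to the OCR step together with its informant number, so that the recognised text is already attached to the right informant.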





Zooniverse: VerbaAlpina’s crowdsourcing tool. The crowders’ task is to select entries with a rectangle and assign the text to the correct informant number.

The actual problem with using OCR to read maps from linguistic atlases is the exact assignment of the entries to the informant numbers: the numbers can be placed very densely, and the text is often written not on the number but beside it.
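Why this assignment is hard to automate can be illustrated with a naive nearest-number heuristic. The sketch below is not a method used by VerbaAlpina; the function `candidate_informants`, the `tolerance` parameter and all coordinates are invented for illustration. It returns every informant number that is almost as close to a text entry as the closest one, so a result with more than one number signals an ambiguous case.

```python
import math


def candidate_informants(text_center, number_positions, tolerance=1.25):
    """Return the informant numbers that plausibly belong to a text entry.

    `number_positions` maps informant numbers to (x, y) positions on the
    map. A number counts as a candidate if its distance to the text is at
    most `tolerance` times the smallest distance (a hypothetical cut-off).
    Candidates are returned in order of increasing distance.
    """
    if not number_positions:
        raise ValueError("no informant numbers given")
    distances = sorted(
        (math.dist(text_center, position), number)
        for number, position in number_positions.items()
    )
    smallest = distances[0][0]
    return [number for d, number in distances if d <= smallest * tolerance]


# Invented positions: 22 and 23 are close together, 31 is far away.
numbers = {22: (100, 100), 23: (140, 100), 31: (400, 300)}

print(candidate_informants((105, 98), numbers))   # [22]
print(candidate_informants((118, 104), numbers))  # [22, 23]
```

In the second call the text lies between two densely placed numbers, so the heuristic cannot decide; this is the kind of case in which the crowders' manual assignment is needed.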




Section of the map AIS 1218 ("il siero del formaggio; il siero della ricotta")

The LMU has looked for solutions to this problem before: in a master's thesis at the Institute of Informatics (advisor: Prof. Kranzlmüller), an attempt was made to use AI methods (Nguyen). Although the efforts and results yielded significant insights, they provided no tools or practical solutions.

The OCR procedure for transcribing the text could be carried out, for example, with the program Abbyy FineReader. As early as ten years ago, the ITG successfully tested and used FineReader, which makes it possible to transcribe even the most exotic writing systems as ASCII sequences. The details of this procedure are documented in Lücke/Riepl/Trautmann 2017, pp. 125-129.




Speech bubble of the OCR program Abbyy FineReader. The Greek letter theta (θ; https://www.fileformat.info/info/unicode/char/03b8/index.htm) has been assigned the HTML-conforming ASCII sequence &theta;. The method can be used for any writing system, for example the Böhmer-Ascoli system of the AIS. (Fig.: Lücke/Riepl/Trautmann 2017, p. 126, fig. 39)
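The principle of representing a non-ASCII character by an HTML-conforming ASCII sequence can be sketched with Python's standard library. This only illustrates the encoding idea and is not how FineReader is configured; the function name `to_ascii_entities` is an assumption made for this example.

```python
import html
from html.entities import codepoint2name


def to_ascii_entities(text):
    """Replace every non-ASCII character by an HTML character reference.

    Named references (such as the one for theta) are used where HTML
    defines one; all other characters fall back to a numeric reference.
    """
    parts = []
    for char in text:
        code_point = ord(char)
        if code_point < 128:
            parts.append(char)
        elif code_point in codepoint2name:
            parts.append(f"&{codepoint2name[code_point]};")
        else:
            parts.append(f"&#{code_point};")
    return "".join(parts)


encoded = to_ascii_entities("θ")
print(encoded)                 # &theta;
print(html.unescape(encoded))  # θ
```

Because the result consists of ASCII characters only, it can be stored and processed without regard to code pages and converted back to the original characters with `html.unescape`.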




Speech bubble of the OCR program. Transcription of a Greek text in HTML. (Fig.: Lücke/Riepl/Trautmann 2017, p. 128, fig. 42)

The use of OCR programs also creates some issues. For one, text often overlaps, especially on densely labelled maps. Repeated words in the AIS are also replaced with symbols such as (~) or (-). Another example can be seen in the image above: the plural e(r s?u:/me(z?e( is transcribed as e(r -z?e(. These cases require manual post-processing. The goal and hope is nevertheless to reduce the workload by implementing an automated OCR method for transcription.

After the VerbaAlpina-Zooniverse collaboration started at the end of March 2021, it was found that voluntary users were increasingly willing to take part in the optional transcription part of their task as well. This, however, led to inconsistencies and mistakes in the transcriptions, which in turn increased the workload. Ultimately, this forced a change in the rules and instructions for the crowders.


(auct. Thomas Krefeld | Stephan Lücke – trad. Alessia Cortina | Susanne Oberholzer)

Tags: Functional areas Information technology