Research Center for Applied Linguistics (RCAL)

About the RCAL

The Research Center for Applied Linguistics (RCAL) is a project initiated by and associated with the Bonn Applied English Linguistics (BAEL) department. Empirical pragmatics involves the study of language in use through analysis of actual language data. This data can be collected using questionnaires, interviews or experiments, but may also be sampled from large collections of texts such as corpora.

Student workspaces at the RCAL

The RCAL gives students the opportunity to conduct their own empirical research. At the centre, students have access to:

a wide range of corpora
technical equipment
software for data collection and analysis
research lab tools for interviews or role-plays
a quiet room to conduct their own experiments

Available Corpora

ACE - Australian Corpus of English

Developer: Pam Peters, Peter Collins and David Blair at Macquarie University, Sydney

Sampling period: 1986

Size: 1 million words; 500 text samples of approx. 2,000 words

Contents: written and spoken language; modelled on LOB and BROWN

Variety sampled: Australian English

Annotation: untagged

Availability: available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Manual of ACE

BROWN Corpus

Developer: Nelson Francis and Henry Kucera at Brown University, Providence, Rhode Island

Sampling period: early 1960s

Size: 1 million words

Contents: written language; 500 text samples of approx. 2,000 words; 15 text categories

Variety sampled: American English

Annotation: untagged version and POS-tagged version

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Manual of the BROWN Corpus

Buckeye Corpus

Developer: Eric Fosler-Lussier, Elizabeth Hume, Keith Johnson and Mark Pitt at Ohio State University

Sampling period: 2000

Size: 300,000 words

Contents: interviews with 40 people, each approx. one hour long

Variety sampled: American English, "long-term residents of Ohio"

Annotation: phonetic/phonemic transcription, word labels

Availability: Online Access through BAEL Licence

Homepage: Website of the Buckeye Corpus

CEECS - Corpus of Early English Correspondence Sampler

Developer: M. Rissanen, O. Ihalainen and M. Kytö at the Department of English, University of Helsinki

Sampling period: 1418-1680

Size: 450,000 words

Contents: personal letters

Variety sampled: British English

Annotation: no annotation

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Manual of the CEECS Corpus

COLT - Bergen Corpus of London Teenage Language

Developer: University of Bergen, Norway

Sampling period: 1993

Size: 500,000

Contents: transcripts of spoken language of London teenagers (COLT is part of the BNC)

Variety sampled: British English

Annotation: POS tagging

Availability: available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Manual of the COLT Corpus

FLOB - Freiburg-LOB Corpus of British English

Developer: Christian Mair at the University of Freiburg

Sampling period: 1990s

Size: 1 million words

Contents: written language; 500 text samples of approx. 2,000 words; 15 text categories (matches the original LOB corpus)

Variety sampled: British English

Annotation: untagged

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Manual of the FLOB corpus

FROWN - Freiburg BROWN Corpus of American English

Developer: Christian Mair at the University of Freiburg

Sampling period: 1990s

Size: 1 million words

Contents: 500 text samples of approx. 2,000 words; 15 text categories (matches the BROWN Corpus)

Variety sampled: American English

Annotation: untagged

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Manual of the FROWN corpus

Helsinki Corpus of English Texts: Diachronic Part

Developer: M. Rissanen, O. Ihalainen and M. Kytö at the Department of English, University of Helsinki

Sampling period: ca. 750 to 1700

Size: 1.5 million words

Contents: samples of Old, Middle and Early Modern English texts

Variety sampled: British English

Annotation: untagged

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Manual of the Helsinki Corpus

Helsinki Corpus of Older Scots

Developer: M. Rissanen, O. Ihalainen and M. Kytö at the Department of English, University of Helsinki

Sampling period: 1450-1700

Size: 830,000 words

Contents: Old, Middle and Early Modern English texts covering 15 prose genres

Variety sampled: Northern British English

Annotation: untagged

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Bibliography of the Helsinki Corpus of Older Scots (no specific manual available online)

ICAMET - Innsbruck Computer Archive of Machine-readable English Texts

Developer: University of Innsbruck

Sampling period: Middle English Prose: 1100 - 1500; Middle/Early Modern English Letters: 1386 - 1688; Middle/Modern English Texts: in progress

Size: Middle English Prose: 182,000; Middle/Early Modern English Letters: 110,000; Middle/Modern English texts: in progress

Contents:

Prose
Letters
Varia

Variety sampled: Middle English, Early Modern English, Modern English

Annotation: Middle English Prose: untagged; Middle/Early Modern English Letters: untagged; Middle/Modern English texts: mix of tagged/normalized/translated/manipulated texts

Availability: Available for students at the RCEP

Homepage: Manual information for ICAMET

ICE - International Corpus of English

+ SPICE-Ireland - Systems of Pragmatic annotations for the spoken component of ICE-Ireland

Developer: Jeffrey L. Kallen and John M. Kirk

Sampling period: 1990s

Size: 500 texts, each 2,000 words (1 million words)

Contents: 500 texts, spoken and written language (spoken part 60%):

Spoken (300)
    Dialogue (180)
        Private (100)
        Public (80)
    Monologue (120)
        Unscripted (70)
        Scripted (50)

Written (200)
    Non-printed (50)
        Non-professional writing (20)
        Correspondence (30)
    Printed (150)
        Informational (learned) (40)
        Informational (popular) (40)
        Informational (reportage) (20)
        Instructional (20)
        Persuasive (10)
        Creative (20)

(Figures adapted from Kennedy (1998: 55))
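
The text counts listed above can be checked with a few lines of Python. This is only an illustrative sketch: the nested-dictionary layout and the names ICE_DESIGN, WORDS_PER_TEXT and total_texts are assumptions introduced here and are not part of any ICE tool. The figures themselves are the ones given above.

```python
# Text counts per category, as listed above (adapted from Kennedy 1998: 55).
# The nested-dict layout and all names here are illustrative assumptions.
ICE_DESIGN = {
    "Spoken": {
        "Dialogue": {"Private": 100, "Public": 80},
        "Monologue": {"Unscripted": 70, "Scripted": 50},
    },
    "Written": {
        "Non-printed": {"Non-professional writing": 20, "Correspondence": 30},
        "Printed": {
            "Informational (learned)": 40,
            "Informational (popular)": 40,
            "Informational (reportage)": 20,
            "Instructional": 20,
            "Persuasive": 10,
            "Creative": 20,
        },
    },
}

WORDS_PER_TEXT = 2_000  # approximate length of each text sample


def total_texts(node):
    """Sum the text counts of a category, recursing into subcategories."""
    if isinstance(node, int):
        return node
    return sum(total_texts(child) for child in node.values())


if __name__ == "__main__":
    for mode, categories in ICE_DESIGN.items():
        print(mode, total_texts(categories))
    texts = total_texts(ICE_DESIGN)
    print("Texts:", texts, "Words (approx.):", texts * WORDS_PER_TEXT)
```

Summing the leaf categories gives 300 spoken and 200 written texts, i.e. 500 texts of about 2,000 words each, which matches the stated size of roughly 1 million words and a spoken share of 60%.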

SPICE-Ireland provides pragmatic and discourse annotation and a prosodic transcription for 100 of the 300 texts of the spoken component of the ICE-Ireland Corpus.

Variety sampled: Aim is to sample all varieties of English

Annotation: Textual markup, word class tagging, syntactic parsing (+ additional tags in some components)

Availability:

Hong Kong, East Africa, India, Philippines, Singapore, Jamaica, USA written, Canada, Ireland, SPICE-Ireland, Great Britain, Nigeria, Sri Lanka, Ghana, New Zealand

RCEP: All subcorpora are available at the RCEP

IAAK corpus computer: Great Britain and East Africa are available for students on the corpus computer in the IAAK library

Homepage: Homepage of the ICE corpus

ICLE - International Corpus of Learner English

Developer: Centre for English Corpus Linguistics (CECL), Université catholique de Louvain; Project director: Prof. Sylviane Granger

Sampling period: 1990 - 2000

Size: 3.7 million words

Contents: Subcorpora (learners of English):
Bulgarian
Chinese
Czech
Dutch
Finnish
French
German
Italian
Japanese
Norwegian
Polish
Russian
Spanish
Swedish
Tswana
Turkish

Variety sampled: Learners of English

Annotation: word form/lemma/POS tagged

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Homepage of the ICLE

Kolhapur Corpus

Developer: S. K. Verma at University of Lancaster and Shivaji University, Kolhapur

Sampling period: 1978

Size: 1 million words, 500 text samples of approx. 2,000 words

Contents: written language; modelled on BROWN and LOB

Variety sampled: Indian English

Annotation: untagged

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Manual of the Kolhapur Corpus

Lampeter Corpus of Early Modern English Tracts

Developer: Josef Schmied, Claudia Claridge and Rainer Siemund at TU Chemnitz

Sampling period: 1640-1740

Size: 1.1 million words

Contents: non-literary prose texts of Early Modern English (various genres)

Variety sampled: British English

Annotation: textual markup

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Homepage of the Lampeter Corpus

LINDSEI - Louvain International Database of Spoken English Interlanguage

Developer: Gaëtanelle Gilquin, Sylvie De Cock and Sylviane Granger (eds.)

Sampling period: 1995 - 2010

Size: 1 million words, c. 50 interviews per subcorpus, each interview ~ 2000 words

Contents: spoken language, interviews with learners of English

National subcorpus: Bulgarian, Chinese, Dutch, French, German, Greek, Italian, Japanese, Polish, Spanish, Swedish

Variety sampled: Interlanguage

Annotation: untagged

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Homepage of the LINDSEI Corpus

LLC - London-Lund Corpus of Spoken English

Developer: Randolph Quirk and Sidney Greenbaum at University College London; Jan Svartvik at Lund University

Sampling period: 1960s, 1975-81, 1985-88

Size: 500,000 words

Contents: spoken language, based on the Survey of English Usage (SEU, 1959, University College London) and on the Survey of Spoken English (SSE, 1975, Lund University)

Variety sampled: British English

Annotation: prosodic and discourse annotation

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Manual of the LLC

LOB - Lancaster / Oslo-Bergen Corpus

Developer: Geoffrey Leech, University of Lancaster, and Stig Johansson, University of Oslo, in collaboration with Knut Hofland, Norwegian Computing Centre for the Humanities, Bergen

Sampling period: 1961

Size: 1 million words

Contents: written language; 500 text samples of approx. 2,000 words; 15 text categories; British counterpart of Brown corpus

Variety sampled: British English

Annotation: untagged version and POS-tagged version

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Manual of the LOB Corpus

Newdigate Newsletter Corpus

Developer: Philip Hines, Jr., Norfolk, Virginia

Sampling period: 1692

Size: 750,000 words

Contents: a series of more than 2,000 newsletters in the Newdigate series (most of which are addressed to Sir Richard Newdigate, Warwickshire)

Variety sampled: British English

Annotation: untagged

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Manual of the Newdigate Corpus

PoW - Polytechnic of Wales Corpus

Developer: The Computational Linguistics Unit at the University of Wales College of Cardiff

Sampling period: 1978-1984

Size: 65,000 words

Contents: transcripts of spoken child language

Variety sampled: British English

Annotation: POS tagging, syntactic parsing

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Manual of the PoW Corpus

(SB)CSAE - Santa Barbara Corpus of Spoken American English

Developer: John W. Du Bois, Wallace L. Chafe, Sandra A. Thompson, Charles Meyer, Robert Englebretson

Sampling period: 1990s

Size: 249,000 words

Contents: transcripts and audio files of naturally occurring interaction from all over the US (mostly face-to-face conversations)

Variety sampled: American English

Annotation: transcripts are time-stamped, overlap indicated; marked-up version on talkbank.org

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library (Parts 1-4)

Homepage: Homepage of the Santa Barbara Corpus of Spoken American English

SEC - Lancaster / IBM Spoken English Corpus

Developer: University of Lancaster and IBM Scientific Centre

Sampling period: 1984-87

Size: 52,000 words

Contents: spoken language; transcripts from radio-broadcasts, recordings made at University of Lancaster

Variety sampled: British English

Annotation: prosodic markup, POS tagged

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Manual of the SEC

Wellington Corpus of Written New Zealand English

Developer: Laurie Bauer at Victoria University, Wellington

Sampling period: 1986-90

Size: 1 million words; 500 text samples of approx. 2,000 words

Contents: written language; modelled on BROWN and LOB

Variety sampled: New Zealand English

Annotation: untagged

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Manual of the Wellington Corpus (written)

Wellington Corpus of Spoken New Zealand English

Developer: Janet Holmes, Bernadette Vine and Gary Johnson at Victoria University, Wellington

Sampling period: 1988-94

Size: 1 million words; 500 text samples of approx. 2,000 words

Contents: spoken language; formal, semi-formal and informal speech

Variety sampled: New Zealand English

Annotation: discourse markup

Availability: Available for students at the RCEP and on the corpus computer in the IAAK library

Homepage: Manual of the Wellington Corpus (spoken)

Hardware and software available at the RCAL

At the RCAL, students have access to the following research tools:

Hardware

Action Cams (2): Use at RCAL, Can be borrowed

Webcam (2): Use at RCAL, Can be borrowed

Digital Voice Recorder (2): Use at RCAL, Can be borrowed

Tabletop Microphone (3): Use at RCAL, Can be borrowed

Headset (2): Use at RCAL, Can be borrowed

USB Footpedal (4): Use at RCAL, Can be borrowed

Software

AntConc: Use at RCAL, Freely downloadable online

Audacity: Use at RCAL, Freely downloadable online

Camtasia: Use at RCAL

ICECUP: Use at RCAL

f4: Use at RCAL, Can be borrowed

- for Microsoft, etc. (3)

- for macOS (1)

MAXQDA: Use at RCAL, Can be borrowed

OpenSesame: Use at RCAL, Freely downloadable online

Translog-II: Use at RCAL, Freely downloadable online

WordSmith: Use at RCAL, Can be borrowed

Want to know more?

If you would like to use the resources we offer, you can register for the RCAL office hours by emailing the RCAL Mentor, Alyson Wong, at rcal[at]uni-bonn.de. During the winter term 2024/25, office hours are held on Wednesdays from 11:00 to 12:00, by prior notification only; other times, or a Zoom call, can be arranged by email. To book an appointment or ask a question, please write to the same address. Alyson can advise you on which research tools might suit your research question and can help with access to corpora, data analysis tools, and more.

Contact

Alyson Wong

rcal@uni-bonn.de

Address:

Research Center for Applied Linguistics
Genscherallee 3
53113 Bonn, Germany
Room 3.015 / 3.016 (third floor)

Phone: +49 (0)228 73-4481
Email: rcal[at]uni-bonn.de