Sunday, November 11, 2018

Corpus (Corpora) and Corpus Linguistics

Corpus Linguistics:

Linguistics is the scientific study of language and its structure. Corpus linguistics is the study of language 'on the basis of text corpora'.

Corpus linguistics studies language by examining a collection of natural language data, often in the form of texts.

"Corpus linguistics thus is the analysis of naturally occurring language on the basis of computerized corpora. Usually, the analysis is performed with the help of the computer, i.e. with specialized software, and takes into account the frequency of the phenomena investigated" (Nesselhauf, 2005).
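As a minimal sketch of what "taking into account the frequency of the phenomena investigated" can look like in practice, the Python snippet below counts word frequencies in a tiny made-up sample. The sample sentences and the naive regular-expression tokenizer are assumptions for illustration only; real corpus software uses far more careful tokenization.

```python
import re
from collections import Counter

# A tiny, made-up "corpus" used only for illustration.
corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "A cat and a dog met on the mat.",
]

def tokenize(text):
    """Lower-case the text and split it into simple word tokens (a naive assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(texts):
    """Count how often each word form occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

freqs = word_frequencies(corpus)
print(freqs["the"])  # 5
print(freqs["on"])   # 3
```

Even on three sentences, the counts show that grammatical words such as "the" and "on" outnumber content words, a pattern that corpus studies observe on a much larger scale.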

Because corpus linguistics focuses on language data, it is usually considered more of an approach to linguistics than a linguistic theory. It is a highly principled approach: it studies language as it is expressed in corpora of 'real world' text.

Corpus or Corpora:

The word corpus, derived from the Latin word meaning 'body', may be used to refer to any text in written or spoken form. However, in modern linguistics the term refers to a large collection of texts that represents a sample of a particular variety or use of language and is presented in machine-readable form.
Corpus is the singular form and corpora is the plural form. Corpora are now largely compiled by automated processes.

Definition:

A corpus is a collection of written or spoken material stored on a computer and used to find out how language is used. It is a body of naturally occurring language, collected according to explicit linguistic criteria with a particular purpose in mind, and structured with a view to its representativeness.

Computer-readable corpora can consist of raw text only. However, many corpora have been enriched with some kind of linguistic information, which is called markup or annotation.
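To illustrate the difference between raw and annotated text, the sketch below assumes a simple "word_TAG" format in which a part-of-speech tag is attached to each word. The format and the tag labels (DET, NOUN, VERB, ADP) are assumptions chosen for illustration, not the scheme of any particular corpus.

```python
from collections import Counter

# Raw text: just the words.
raw = "The cat sat on the mat"

# The same sentence with hypothetical part-of-speech annotation
# in an assumed "word_TAG" format (tag labels are illustrative).
annotated = "The_DET cat_NOUN sat_VERB on_ADP the_DET mat_NOUN"

def parse_annotated(text):
    """Split each 'word_TAG' token into a (word, tag) pair."""
    pairs = []
    for token in text.split():
        # rpartition splits at the last underscore, so words may contain "_".
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs

pairs = parse_annotated(annotated)
tag_counts = Counter(tag for _, tag in pairs)

# Annotation makes questions answerable that raw text cannot answer directly,
# e.g. "which words are used as nouns?"
nouns = [word for word, tag in pairs if tag == "NOUN"]
print(nouns)  # ['cat', 'mat']
```

Stripping the tags gives back the raw text, so annotation adds information without losing the original words.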

Types of corpora:

Four common types of corpora are:

a) General corpora
b) Specialized corpora
c) Learner corpora
d) Parallel corpora

1. General Corpora: General corpora consist of general texts, i.e. texts that do not belong to a single text type, subject field or register. These corpora are usually much larger than specialized corpora.
Such corpora are sometimes called reference corpora.

Example: British National Corpus (BNC)

2. Specialized Corpora: A specialized corpus is a corpus of texts of a particular type. Specialized corpora reflect the type of language a researcher wants to explore. The corpus may also be restricted to a particular social setting or to a given topic.

Example:
Corpora of editorials, academic articles, lectures, essays, etc.

3. Learner Corpora: Learner corpora are electronic collections of language data produced by L2 learners, that is, second or foreign language learners.
A learner corpus is a computerized textual database of the language produced by foreign language learners (Leech, 1998).

The first computerized learner corpora were collected in the 1990s, when several learner corpus projects were launched: the Longman Learners' Corpus, the Cambridge Learner Corpus, the Hong Kong University Learner Corpus and the International Corpus of Learner English. Learner corpora are used to identify differences among learners and the frequency and types of their mistakes, among other things.

Example:

i) Longman Learners' Corpus
ii) Cambridge Learner Corpus
iii) Hong Kong University Learner Corpus
iv) International Corpus of Learner English
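Many learner corpora are error-tagged so that mistakes can be counted and compared across learners. The sketch below assumes a hypothetical inline markup, {TYPE:learner form=>target form}, with invented error-type labels (VF for verb form, PREP for preposition, ART for article); error-tagged learner corpora use their own, more detailed schemes.

```python
import re
from collections import Counter, defaultdict

# Hypothetical error-tagged learner sentences. Both the markup
# {TYPE:learner form=>target form} and the labels are invented for illustration.
learner_texts = {
    "learner_A": "She {VF:go=>goes} to school {PREP:in=>on} Monday.",
    "learner_B": "I have {VF:study=>studied} English {PREP:since=>for} three years.",
    "learner_C": "My brother {VF:work=>works} in {ART:=>a} bank.",
}

# Group 1: error type; group 2: learner form; group 3: target (corrected) form.
ERROR_TAG = re.compile(r"\{([A-Z]+):([^=}]*)=>([^}]*)\}")

def error_types(text):
    """Return the error-type label of every tagged error in a text."""
    return [m.group(1) for m in ERROR_TAG.finditer(text)]

def corrected(text):
    """Replace each tagged error with its target (corrected) form."""
    return ERROR_TAG.sub(lambda m: m.group(3), text)

overall = Counter()                # frequency of each error type overall
per_learner = defaultdict(Counter) # error profile of each learner
for learner, text in learner_texts.items():
    types = error_types(text)
    overall.update(types)
    per_learner[learner].update(types)

print(overall.most_common())  # [('VF', 3), ('PREP', 2), ('ART', 1)]
print(corrected(learner_texts["learner_C"]))  # My brother works in a bank.
```

The overall counts give the frequency of each type of mistake, while the per-learner profiles allow differences among learners to be compared.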

4. Parallel Corpora: A parallel corpus consists of a collection of texts together with their translations into one or more other languages. Translators and learners can use it to find potential equivalents in each language and to investigate differences between languages.

Example:

i) The Canadian Hansard (parliamentary proceedings in English and French)
ii) The PEDANT project, Gothenburg (Göteborg), Sweden
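A parallel corpus is usually aligned, for example sentence by sentence, so that each sentence is paired with its translation. The sketch below assumes a tiny, made-up English-French aligned sample (not taken from the Hansard) and shows one crude way to look for potential equivalents: retrieve the pairs whose English side contains a word, then count the French words that co-occur with it.

```python
from collections import Counter

# A tiny, made-up English-French parallel corpus, aligned sentence by sentence.
aligned = [
    ("The house is red.", "La maison est rouge."),
    ("I bought a house.", "J'ai acheté une maison."),
    ("The car is red.", "La voiture est rouge."),
]

def tokens(sentence):
    """Naive tokenizer: split on spaces, strip punctuation, lower-case."""
    return [t.strip(".,;:!?").lower() for t in sentence.split()]

def find_pairs(word, pairs):
    """Return aligned pairs whose English side contains the word."""
    word = word.lower()
    return [(en, fr) for en, fr in pairs if word in tokens(en)]

def candidate_translations(word, pairs):
    """Count French words co-occurring with the English word in aligned pairs.
    Frequent co-occurrence is only a rough hint, not a real word aligner."""
    counts = Counter()
    for _, fr in find_pairs(word, pairs):
        counts.update(tokens(fr))
    return counts

for en, fr in find_pairs("house", aligned):
    print(f"{en}  ->  {fr}")
print(candidate_translations("house", aligned).most_common(1))  # [('maison', 2)]
```

On real data, function words such as "la" co-occur with almost everything, so practical tools rely on statistical association measures or dedicated word-alignment models rather than raw counts.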

Features of Corpora:

--Quantity
--Quality
--Representation
--Simplicity
--Retrievability
--Verifiability

Advantages of Corpora:

Some advantages of corpora are:

1) Short time
2) Flexibility of time and place
3) Drag-and-drop option
4) Some unique test items
5) Multimedia integration
6) Instant feedback
7) Accessibility
8) Data can be stored and reused
9) Continuous assessment techniques

Disadvantages of Corpora:

Some limitations of corpora are:

1) Reliability
2) No human interaction
3) Creating, managing and administering computer-adaptive tests
4) Teacher training
5) Digital literacy

References:

Corpus linguistics. Retrieved from www.slideshare.com

Corpus linguistics. Retrieved from https://en.m.wikipedia.org/wiki/Corpus_linguistics

Corpus linguistics. Retrieved from https://slideplayer.com/slide/4740460/

TESOL International Association (2017). Corpus Linguistics and Language Learning and Teaching: Basic Introduction.

Personal experience and class lectures
