Application for Use of Sinica Balanced Corpus
The Sinica Balanced Corpus (Sinica Corpus) is the first balanced Chinese corpus with part-of-speech tagging. The corpus (Sinica 4.0) is open to the research community through the WWW (http://www.sinica.edu.tw/SinicaCorpus/). It contains ten million words. Each text in the corpus is classified and marked according to five criteria: genre, style, mode, topic, and source. The feature values for these criteria are organized in a hierarchy. Subcorpora can be defined by a specific set of attributes to serve different research purposes. Texts in the corpus are segmented according to the word segmentation standard proposed by the ROC Computational Linguistic Society, and each segmented word is tagged with its part of speech. Linguistic patterns and language structures can be extracted from the tagged corpus with a corpus inspection program that can filter the data, generate statistics, sort results, and identify collocations.
Please complete the required documents listed below and send them to ACLCLP at the following address:
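To make the subcorpus and inspection ideas concrete, here is a minimal Python sketch. It is not the corpus inspection program itself: the `Text` record layout, the attribute values, the tag names, and the sample tokens are all hypothetical placeholders, and adjacent-pair counting is used only as a crude stand-in for collocation identification.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Text:
    # Hypothetical record layout; the actual corpus format may differ.
    attrs: dict   # classification attributes, e.g. {"genre": ..., "mode": ...}
    tokens: list  # list of (word, pos_tag) pairs


def select_subcorpus(texts, **criteria):
    """Keep only the texts whose attributes match every given criterion."""
    return [t for t in texts
            if all(t.attrs.get(k) == v for k, v in criteria.items())]


def tag_frequencies(texts):
    """Count how often each part-of-speech tag occurs."""
    return Counter(tag for t in texts for _, tag in t.tokens)


def adjacent_pairs(texts):
    """Count adjacent word pairs within each text (a crude collocation proxy)."""
    pairs = Counter()
    for t in texts:
        words = [w for w, _ in t.tokens]
        pairs.update(zip(words, words[1:]))
    return pairs


# Made-up toy data with placeholder tags; not taken from the corpus.
TEXTS = [
    Text({"genre": "report", "mode": "written"},
         [("we", "N"), ("read", "V"), ("books", "N")]),
    Text({"genre": "report", "mode": "written"},
         [("they", "N"), ("read", "V"), ("books", "N")]),
    Text({"genre": "speech", "mode": "spoken"},
         [("read", "V"), ("slowly", "ADV")]),
]

written = select_subcorpus(TEXTS, mode="written")
print(tag_frequencies(written))
print(adjacent_pairs(written).most_common(1))
```

Selecting by keyword arguments mirrors the idea of defining a subcorpus through a set of attribute values; a hierarchical attribute scheme would need a richer matching rule than the simple equality test used here.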
The Association for Computational Linguistics and Chinese Language Processing
c/o Institute of Information Science, Academia Sinica
128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan
An official statement from the applicant's affiliated institution certifying his/her status at this institution.
A written statement from the applicant or his/her affiliated institution affirming that the corpus will be used for research only, and not for any
The original copy of the Agreement.
(Please send two copies of the Agreement, one for you and the other for our records.)
The license fee:
- Nonprofit Institutions (for 2-10 users): US$1,000.-
- Nonprofit Institutions (for 11 or more users): US$2,500.-
Payment: please fill in the payment form
Address: c/o IIS, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan
Tel: 886-2-27883799*1502, Fax: 886-2-27881638, E-mail: firstname.lastname@example.org