ACLCLP


Application for Use of Sinica Balanced Corpus


The Sinica Balanced Corpus (Sinica Corpus) is the first balanced Chinese corpus with part-of-speech tagging. This version of the corpus (Sinica 4.0) is open to the research community through the WWW (http://www.sinica.edu.tw/SinicaCorpus/). The corpus contains ten million words. Each text in the corpus is classified and marked according to five criteria: genre, style, mode, topic, and source. The feature values of these classifications are assigned hierarchically. Subcorpora can be defined by a specific set of attributes to serve different research purposes. Texts in the corpus are segmented according to the word segmentation standard proposed by the ROC Computational Linguistic Society, and each segmented word is tagged with its part of speech. Linguistic patterns and language structures can be extracted from the tagged corpus with a corpus inspection program that can filter the data, generate statistics, sort results, and identify collocations.

Please complete the required documents listed below and send them to ACLCLP at the following address:

The Association for Computational Linguistics and Chinese Language Processing
℅Institute of Information Science, Academia Sinica
128, Sec. 2, Academia Rd., Nankang, Taipei 115, Taiwan


Required documents:

License fee (an institutional license covers 1-10 users):



Payment: please fill in the payment form.


Address: c/o IIS, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan
Tel: 886-2-27883799 ext. 1502, Fax: 886-2-27881638, E-mail: aclclp@aclclp.org.tw; aclclp@hp.iis.sinica.edu.tw
This website is maintained by Qi Huang. Send your comments and suggestions to jessie@iis.sinica.edu.tw