International Journal of Computational Linguistics & Chinese Language Processing
Vol. 19, No. 4, December 2014

POI Extraction from the Web: Store Name Recognition and Address Matching
Lin Yu-Yang and Chang Chia-Hui
Public Opinion Toward CSSTA: A Text Mining Approach
Yi-An Wu and Shu-Kai Hsieh
Automatic Move Analysis of Research Articles for Assisting Writing
Guan-Cheng Huang, Jian-Cheng Wu, Hsiang-Ling Hsu, Tzu-Hsi Yen, and Jason S. Chang
Exploring Concept Information for Mandarin Large Vocabulary Continuous Speech Recognition
Po-Han Hao, Ssu-Cheng Chen, and Berlin Chen
Some Prosodic Characteristics of Taiwan English Accent
Chao-yu Su, Chiu-yu Tseng and Jyh-Shing Roger Jang
Quantitative Assessment of Cry in Term and Preterm Infants: Long-Time Average Spectrum Analysis
Li-mei Chen

Title:
POI Extraction from the Web: Store Name Recognition and Address Matching

Author:
Lin Yu-Yang and Chang Chia-Hui

Abstract:
Mobility is one of the major trends of 2014. According to a report by IDC (International Data Corporation), worldwide tablet shipments exceeded PC shipments in the fourth quarter of 2013, while smartphones had already surpassed other devices in unit shipments and market share. With this trend, many location-based services (LBS) have been proposed, for example navigation and searching for restaurants or gas stations. How to construct a large POI (Point-of-Interest) database is therefore the key problem. In this paper, we address three problems: Taiwan address normalization, store name extraction, and the matching of addresses and store names. To train a statistical model for store name extraction, we make use of existing store-address pairs to prepare training data for sequence labeling. The model is trained using common characteristics of store names in addition to POS tags. When tested on search snippets, the model obtains an F-measure of 0.791 for store name recognition.

Keywords:
POI, Store Name Extraction, Name-address Matching, Sequence Labeling, Conditional Random Field
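
The training-data preparation mentioned in the abstract (using known store-address pairs to label text for sequence labeling) can be illustrated with a minimal sketch. It assumes character-level B/I/O tags and exact string matching of a known store name inside a snippet; the function name `bio_tags` and the example strings are hypothetical, and the paper's actual features (store-name characteristics, POS tags) and CRF training are not shown.

```python
def bio_tags(snippet, store_name):
    """Label each character of `snippet` with B/I/O tags.

    Assumption: a known store name is located by exact substring match,
    and every occurrence is tagged B (first character) then I (the rest).
    """
    tags = ["O"] * len(snippet)
    if not store_name:
        # Nothing to match; every character stays outside ("O").
        return list(zip(snippet, tags))
    start = snippet.find(store_name)
    while start != -1:
        tags[start] = "B"
        for i in range(start + 1, start + len(store_name)):
            tags[i] = "I"
        # Continue searching after the occurrence just labeled.
        start = snippet.find(store_name, start + len(store_name))
    return list(zip(snippet, tags))


# Hypothetical snippet and store name, for illustration only.
labeled = bio_tags("Visit Joe Cafe at 1 Main St", "Joe Cafe")
```

Sequences labeled this way could then be passed to any sequence labeler, such as a conditional random field, together with whatever per-token features are chosen.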

Title:
Public Opinion Toward CSSTA: A Text Mining Approach

Author:
Yi-An Wu and Shu-Kai Hsieh

Abstract:
Extracting policy positions from social media texts has become an important technique, since it reveals the public's immediate responses to political news and can also be used to predict electoral behavior. The recent, highly debated Cross-Strait Service Trade Agreement (CSSTA) generated a large amount of text, giving us an opportunity to gauge people's stances using text mining. We use keywords associated with each position to perform binary classification of the texts and compute a score indicating how positive or negative the attitude toward CSSTA is. We further conduct a trend analysis to show how the support rate fluctuates in response to events. This approach saves the human labor required by traditional content analysis and makes the judgment standard more objective.

Keywords:
Policy Position, Opinion Mining, Politics, Social Media, Trend Analysis
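
The keyword-based classification and trend analysis described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the keyword sets, the scoring rule (count of supporting keywords minus count of opposing keywords), the handling of ties, and all function names are hypothetical, since the abstract does not specify them.

```python
from collections import Counter, defaultdict

# Hypothetical keyword lists; the paper's actual keywords are not given.
PRO_KEYWORDS = {"support", "benefit", "opportunity"}
CON_KEYWORDS = {"oppose", "protest", "threat"}


def stance_score(tokens):
    """Return (# supporting keywords) - (# opposing keywords) in a text."""
    counts = Counter(tokens)
    pro = sum(counts[w] for w in PRO_KEYWORDS)
    con = sum(counts[w] for w in CON_KEYWORDS)
    return pro - con


def classify(tokens):
    """Binary stance label; texts with a zero score are left unlabeled (assumption)."""
    score = stance_score(tokens)
    if score > 0:
        return "pro"
    if score < 0:
        return "con"
    return None


def support_rate_by_day(posts):
    """posts: iterable of (day, tokens). Returns {day: share of labeled posts that are 'pro'}."""
    tallies = defaultdict(Counter)
    for day, tokens in posts:
        label = classify(tokens)
        if label is not None:
            tallies[day][label] += 1
    return {
        day: c["pro"] / (c["pro"] + c["con"])
        for day, c in sorted(tallies.items())
    }


# Toy, made-up posts for illustration only.
posts = [
    ("day-1", ["support", "benefit"]),
    ("day-1", ["oppose"]),
    ("day-2", ["protest", "threat", "support"]),
    ("day-2", ["weather"]),
]
rates = support_rate_by_day(posts)
```

Plotting `rates` against a timeline of events would give the kind of trend view the abstract refers to.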

Title:
Automatic Move Analysis of Research Articles for Assisting Writing

Author:
Guan-Cheng Huang, Jian-Cheng Wu, Hsiang-Ling Hsu, Tzu-Hsi Yen, and Jason S. Chang

Abstract:
Rhetorical moves are a useful framework for analyzing the hidden rhetorical organization of research papers and for teaching academic writing. We propose a method for learning to classify the moves of a given set of sentences in an academic paper. In our approach, we learn a set of move-specific common patterns, which are characteristic of moves, to help annotate sentences with moves. The method involves using a statistical method to find common patterns in a corpus of research papers, assigning moves to the patterns, using the patterns to annotate sentences in a corpus, and training a move classifier on the annotated sentences. At run-time, sentences are transformed into feature vectors to predict the moves of the given sentences. We present a prototype system, MoveTagger, that applies the method to a corpus of research papers. The proposed method achieves significantly higher accuracy than previous work.

Keywords:
Academic English Writing, Computer-assisted Language Learning, Rhetoric, Context Analysis

Title:
Exploring Concept Information for Mandarin Large Vocabulary Continuous Speech Recognition

Author:
Po-Han Hao, Ssu-Cheng Chen, and Berlin Chen

Abstract:
Language modeling (LM) is part and parcel of automatic speech recognition (ASR), since it helps ASR constrain the acoustic analysis, guide the search through multiple candidate word strings, and quantify the acceptability of the final output hypothesis given an input utterance. This paper investigates and develops language model adaptation techniques for use in ASR, and its main contribution is two-fold. First, we propose a novel concept language modeling (CLM) approach to rendering the relationships between a search history and an upcoming word. Second, the instantiations of CLM are constructed with different levels of lexical granularity, such as words and document clusters. In addition, we explore the incorporation of word proximity cues into the model formulation of CLM, getting around the "bag-of-words" assumption. A series of experiments conducted on a Mandarin large vocabulary continuous speech recognition (LVCSR) task demonstrates that our proposed language models offer substantial improvements over the baseline N-gram system and achieve performance competitive with, or better than, some state-of-the-art language model adaptation methods.

Keywords:
Speech Recognition, Language Model, Concept Information, Model Adaptation

Title:
Some Prosodic Characteristics of Taiwan English Accent

Author:
Chao-yu Su, Chiu-yu Tseng and Jyh-Shing Roger Jang

Abstract:
The present study examines prosodic characteristics of Taiwan (TW) English in relation to native (L1) English and TW speakers' mother tongue, Mandarin. The aim is to investigate 1) how TW second-language (L2) English differs from L1 English in terms of integrated prosodic features, 2) whether any transfer effect from L2 speakers' mother tongue contributes to the L2 accent, and 3) how similar or different L1 and L2 are in the prosodic patterns of words and sentences. Results show that the prosody of TW L2 English is distinct from that of L1 English; however, TW L2 English and TW Mandarin share common prosodic characteristics that differentiate them from L1 English. Analysis by individual prosodic feature shows distinct L2 features of TW English that might be attributed to prosodic transfer from Mandarin. One feature is less tempo contrast within sentences, which contributes to a different rhythm; another is a narrower loudness range for word stress, which contributes to a weaker strong/weak distinction. By examining prosodic patterns of words and sentences, similarity analysis suggests that L1 and L2 speakers each produce prosodic patterns with great within-group consistency, but that each group's patterns are distinct from those of the other group. One such pattern is sentence loudness; another is the timing/pitch patterns of words. The above prosodic transfer effect and the distinct TW L2 prosodic patterns are found in relation to syntax-induced narrow focus and lexicon-defined word stress, which echoes our previous studies of TW L2 English and could be applied to CALL development.

Keywords:
Prosody, L1, L2, Mandarin, English, Contrast, Lexical Prosody, Narrow Focus

Title:
Quantitative Assessment of Cry in Term and Preterm Infants: Long-Time Average Spectrum Analysis

Author:
Li-mei Chen

Abstract:
Long-time average spectrum (LTAS) analysis was used to examine the cry utterances of 26 infants under four months old; 16 of them were full-term and the other 10 were preterm. First spectral peak (FSP), mean spectral energy (MSE), spectral tilt (ST), and high frequency energy (HFE) were used to compare cry production between term and preterm infants. In addition, cry duration and percent phonation were compared. According to previous studies, the cry production of term and preterm infants shows significant differences because of the immature neurological development of preterm infants. The major findings of this study are: 1) no significant difference in unedited cry duration across groups; 2) no significant difference in percentage of cry utterance across groups; 3) no significant difference in FSP across groups, although FSP was higher in term infants; 4) no significant difference in MSE across groups, and a decrease of MSE in both groups over time; 5) no significant difference in ST across groups, and a quicker reduction of energy with larger ST in preterm infants over time; 6) no significant difference in HFE across groups, and a significant decline of HFE over time in both groups. Systematic characterization of infant cry can help to estimate the health condition of infants in order to provide appropriate care.

Keywords:
Long-time Average Spectrum, Infant Cry, Preterm Infants
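
As a rough illustration of what a long-time average spectrum is, the sketch below averages the power spectra of overlapping, Hann-windowed frames of a signal. It is a minimal pure-Python example under stated assumptions: the frame length, hop size, window choice, and function names are hypothetical, and `peak_frequency` (the strongest non-DC bin) is only a simple stand-in, not the paper's definition of FSP.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum (bins 0..n/2) of one Hann-windowed frame via a direct DFT."""
    n = len(frame)  # assumed to be at least 2
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def ltas(signal, frame_len=64, hop=32):
    """Average the per-frame power spectra over the whole signal."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = [signal[s:s + frame_len] for s in starts]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    total = [0.0] * (frame_len // 2 + 1)
    for frame in frames:
        for k, p in enumerate(power_spectrum(frame)):
            total[k] += p
    return [t / len(frames) for t in total]


def peak_frequency(ltas_values, sample_rate, frame_len):
    """Frequency (Hz) of the strongest non-DC bin; a stand-in, not the paper's FSP."""
    k = max(range(1, len(ltas_values)), key=lambda i: ltas_values[i])
    return k * sample_rate / frame_len


# Synthetic test tone (not infant cry data): 1000 Hz sine sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spectrum = ltas(tone, frame_len=64, hop=32)
peak = peak_frequency(spectrum, sample_rate, 64)
```

Measures such as mean spectral energy or spectral tilt would be computed from the averaged spectrum in a similar way. A direct DFT is used here only to keep the sketch dependency-free; an FFT library would be the practical choice for real recordings.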
