A Corpus-Based Investigation of English Majors’ Use of Cohesive Devices in Argumentative Writing

Abstract: Many factors affect the quality of English writing, and among them discourse cohesion has been an important topic in writing research. This study investigates the use of local, global, and text cohesion features in Chinese English majors’ argumentative writing and examines how EFL learners of higher proficiency (Grade 4) have developed their written discourse competence compared with learners of lower proficiency (Grade 1). The Written English Corpus of Chinese Learners (WECCL 2.0) was adopted as the main corpus. The results show that connectives and lexical overlaps are the most frequently used cohesive devices, and that only a small minority of cohesive devices show growth across the two grade levels. The findings offer insights for researchers and teachers in second language acquisition, English writing development, and English writing instruction.


Introduction
The Theory of Cohesion
The introduction of cohesion by Halliday and Hasan (1976) has engendered a vast body of research, much of which has explored the potential role of the cohesive system in text analysis and language teaching (McCarthy, 1991). On the one hand, this theory helps to analyze the relationship between a text and its context, or the way in which a text is organized; on the other, it helps language learners grasp how a text unfolds by virtue of the semantic network formed by the cohesive ties within it, thus raising learners' awareness of the whole text as a holistic semantic entity. In other words, textual cohesion, although a seemingly surface-level linguistic feature, is "a critical aspect of successful language processing and comprehension and is premised on building connections between ideas in text" (Crossley & McNamara, 2009). Because of these important functions, the theory of cohesion has broadened the horizons of ESL/EFL researchers and opened new avenues for writing instruction and research.

Cohesive Devices in Writing
The use and effects of cohesive devices in student writing have been of interest for some time (McCutchen & Perfetti, 1982). For instance, the presence of local cohesive devices in writing produced by adult first language (L1) writers is often associated with judgments of lower writing quality (Crossley & McNamara, 2010). In contrast, a number of studies examining adult second language (L2) writing report positive correlations between the presence of local cohesive devices and writing quality (Jafarpur, 1991; Yang & Sun, 2012). One possible explanation for this discrepancy lies in differences in how writing quality relates to the production of local, global, and text cohesive devices. Studies of L1 writers have reported that local cohesion is negatively related to writing quality, whereas global cohesion is positively related to it (Crossley & McNamara, 2011). However, no studies have explicitly examined differences between local, global, and text cohesive devices in L2 writing. Understanding these differences may help to explain L2 writing proficiency and the differing expectations that expert raters hold for L2 writers.
Beyond examining the relations between cohesive devices and writing quality, researchers have also been interested in the longitudinal development of cohesive devices in both L1 learners (Berninger, Fuller, & Whitaker, 1996; Hayes & Flower, 1980; Myhill, 2008) and L2 learners.

Cohesive Devices in ESL/EFL Writing
Over the last three decades, a number of studies (Connor, 1984; Green, Christopher, & Jacquelyn, 2000; Johnson, 1992) have examined cohesive devices in ESL/EFL writing, and some have also analyzed the relationship between the use of cohesive items and writing quality. Their findings, however, have been somewhat conflicting because of differences in research focus and approach. Some scholars claimed that cohesion was not an element of writing quality (Todd, Khongput, & Darasawang, 2007; Zhang, 2000), whereas others found that the use of cohesive devices correlated significantly with the quality of compositions (Chiang, 1999; Liu & Braine, 2005). The controversy over the relationship between cohesion and writing quality may stem from methodological flaws or from the restricted focus of these earlier investigations. Thus, the positive or negative correlations between the use of cohesion and writing scores reported by these researchers may not hold when participants at different proficiency levels are examined. Given these observations, more rigorous studies are needed to establish how well cohesive networks predict writing quality.
In addition to the correlation between cohesion and the quality of compositions, researchers have also identified problems with cohesion in the writing of ESL/EFL learners. Kang (2005) compared the cohesive devices in compositions by American students with those by Korean EFL learners and showed that the Korean students overused some reference devices. Kang argued that L1 interference might be responsible for the redundant or inappropriate use of certain cohesive devices in ESL/EFL writing. Taken together, these mixed empirical results show that existing findings on cohesive devices in writing are inconsistent and far from conclusive, and that more research is needed to resolve these disputes.
Overall, research on the relationship between cohesive ties and the quality of English writing has made great strides, giving us a general understanding of how Chinese EFL learners use cohesive devices. However, research on the correlations between the subcategories of the five types of cohesive ties and English writing quality remains relatively scarce. In addition, only a few studies have examined how students' use of cohesive devices changes, and differs, across grade levels.

Research Questions
Given these gaps, the present study addresses the following questions:
1) What is the overall use of cohesive devices in Chinese English majors' argumentative writing?
2) Which, if any, of the cohesive devices used by Chinese English majors in argumentative writing differ across grade levels?

Methodology
Corpus
The main corpus adopted in this study is WECCL 2.0, a sub-corpus of SWECCL 2.0 (Spoken and Written English Corpus of Chinese Learners) developed under the leadership of Qiufang Wen. WECCL 2.0 contains a total of 4,950 essays collected from more than 20 universities. All essays in WECCL are tagged with information about the writer, such as STU1 for English majors and STU2 for non-English majors, and the writer's grade (1, 2, 3, or 4), which allows the sub-corpus generator to select the desired subset easily.
Using the sub-corpus generator, 100 essays were first selected from WECCL 2.0, 50 from each of the two grades, all written on the topic "Nowadays, more and more college students rent apartments and live outside campus. Is it appropriate?" Essays shorter than 250 words or longer than 400 words were then removed. Because this study extracts global cohesion indices between the paragraphs of an essay, the remaining essays were also screened, and all essays consisting of only one paragraph were removed. Finally, 40 essays from each grade (Grade 1 and Grade 4) were selected from the remaining essays, yielding a research corpus of 80 essays.
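The screening steps above can be expressed as a small filter. The following Python sketch is illustrative only: the `Essay` record, the assumption that paragraphs are separated by blank lines, whitespace-based word counting, and the use of seeded random sampling for the final 40 essays per grade are all hypothetical choices, not a description of the WECCL sub-corpus generator or of how the essays were actually picked.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Essay:
    essay_id: str  # hypothetical identifier
    grade: int     # writer's grade level, e.g. 1 or 4
    text: str      # assumption: paragraphs are separated by blank lines


def word_count(text: str) -> int:
    # Whitespace tokenization; counts from real tools may differ slightly.
    return len(text.split())


def paragraphs(text: str) -> List[str]:
    # Split on blank lines and drop empty chunks.
    return [p for p in text.split("\n\n") if p.strip()]


def passes_screening(essay: Essay, min_words: int = 250, max_words: int = 400) -> bool:
    # Keep essays of 250-400 words that contain more than one paragraph,
    # since paragraph-level (global) indices need at least two paragraphs.
    n = word_count(essay.text)
    return min_words <= n <= max_words and len(paragraphs(essay.text)) > 1


def select_corpus(essays: List[Essay], per_grade: int = 40,
                  grades=(1, 4), seed: int = 0) -> List[Essay]:
    # Hypothetical: draw per_grade screened essays from each grade at random.
    rng = random.Random(seed)
    chosen: List[Essay] = []
    for grade in grades:
        pool = [e for e in essays if e.grade == grade and passes_screening(e)]
        if len(pool) < per_grade:
            raise ValueError(f"only {len(pool)} usable essays for grade {grade}")
        chosen.extend(rng.sample(pool, per_grade))
    return chosen
```

Given 50 usable essays per grade, `select_corpus` returns 80 essays, 40 from each grade, which matches the corpus size described in this section.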

Tools
Two main tools are used in this study: Coh-Metrix and TAACO. Both are advanced computational tools for analyzing the cohesion features of English essays. Coh-Metrix 3.0 automatically analyzes 106 lexical-grammatical and semantic features of a text, grouped into 11 dimensions such as descriptive indices, latent semantic analysis (LSA), and connectives. The present study focuses on the indices that measure the cohesion features of a text. TAACO provides about 150 cohesion indices, including a number of type-token ratio indices (covering specific parts of speech, lemmas, bigrams, trigrams, and more), adjacent overlap indices (at both the sentence and paragraph level), and connective indices.

Selected Cohesion Indices
To explore the cohesion of a text, computational cohesion indices were selected from Coh-Metrix 3.0 and TAACO 1.5.2, both of which can be used to examine the cohesive features of texts (Graesser et al., 2004). Coh-Metrix mainly provides two kinds of cohesion indices: local cohesion and text cohesion. Local cohesion refers to cohesion at the level of adjacent sentences, that is, cohesion between smaller chunks of text, such as noun overlap between sentences or the linking of sentences through connectives (Crossley, Kyle & McNamara, 2016). Local cohesion indices are the most commonly used measures of cohesion and can capture features such as reference, substitution, and connection. In this paper, 14 local cohesion indices were selected: 12 from Coh-Metrix and 2 from TAACO. Likewise, several global cohesion indices were selected from TAACO and Coh-Metrix. These indices measure cohesion at the paragraph level and include LSA overlap, synonym overlap, and lexical overlap. A number of text cohesion indices from Coh-Metrix and TAACO were also extracted to measure cohesion at the level of the whole text. Detailed information is presented in Table 2.

Data Collection And Analysis
To obtain cohesion data for the 80 English argumentative compositions, the selected essays were first entered into Coh-Metrix one by one, and 17 target indices that measure the cohesion features of texts were selected. Because Coh-Metrix reports many cohesion features as both a mean and a standard deviation, only the means were used. The results were saved in Excel. The 13 cohesion indices from TAACO were then extracted and saved in the same way.
Coh-Metrix and TAACO provide scores for different kinds of cohesion indices. In Coh-Metrix, an incidence score is the number of occurrences of a given unit per 1,000 words; for example, the score for a connective index is the number of connectives per 1,000 words. An overlap score is the proportion of adjacent pairs of units (e.g., adjacent sentences) that share an item of the same category, and a ratio score compares two categories of units. In TAACO, the synonym overlap score between adjacent sentences (e.g., syn_overlap_sent_noun) is calculated as the total number of synonyms shared by adjacent sentences divided by the number of sentences minus one (i.e., excluding the last sentence). Among the adjacent-paragraph indices, synonym overlap is calculated in the same way, as the total number of synonyms shared by adjacent paragraphs divided by the number of paragraphs minus one (i.e., excluding the last paragraph). The other adjacent-paragraph overlap scores (e.g., adjacent_overlap_cw_para) are computed as the number of words repeated between adjacent paragraphs divided by the total number of words considered. The pronoun_density score is the ratio of the number of third-person pronouns to the total number of words. The scores for repeated_content_lemmas and repeated_content_and_pronoun_lemmas are computed in the same way as pronoun_density.
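The score definitions above can be made concrete with a few small functions. The Python sketch below is a hedged illustration, not the Coh-Metrix or TAACO source code: it assumes the text has already been split into sentences or paragraphs and reduced to lemmas (or to nouns, for noun overlap), it substitutes a caller-supplied synonym dictionary for the lexical database the real tools consult, it counts a word as a synonym overlap when it has a listed synonym in the next unit (one possible reading of "number of synonyms"), and it uses a fixed third-person pronoun list. All function names are hypothetical.

```python
from typing import Dict, List, Set

# Assumed list of third-person pronouns; the tools' own lists may differ.
THIRD_PERSON = {"he", "him", "his", "she", "her", "hers", "it", "its",
                "they", "them", "their", "theirs"}


def incidence_per_1000(count: int, n_words: int) -> float:
    """Incidence score: occurrences of a unit per 1,000 words."""
    return count / n_words * 1000


def binary_adjacent_overlap(units: List[Set[str]]) -> float:
    """Proportion of adjacent unit pairs (e.g. sentences) sharing at least one item."""
    pairs = list(zip(units, units[1:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a & b) / len(pairs)


def adjacent_synonym_overlap(units: List[List[str]],
                             synonyms: Dict[str, Set[str]]) -> float:
    """Synonym overlaps between adjacent units divided by (number of units - 1)."""
    if len(units) < 2:
        return 0.0
    total = 0
    for current, following in zip(units, units[1:]):
        following_set = set(following)
        # Count words in the current unit that have a listed synonym in the next unit.
        total += sum(1 for w in current if synonyms.get(w, set()) & following_set)
    return total / (len(units) - 1)


def pronoun_density(tokens: List[str]) -> float:
    """Ratio of third-person pronouns to all words."""
    return sum(1 for t in tokens if t.lower() in THIRD_PERSON) / len(tokens)
```

For example, a 300-word essay containing 16 connectives has an incidence score of 16 / 300 × 1,000 ≈ 53.3, close to the all-connectives mean reported for this corpus.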
To examine the overall use of cohesive devices in English majors' compositions, the means and standard deviations of the three kinds of cohesive devices in the 80 compositions were calculated using the Statistical Package for the Social Sciences (SPSS, version 22.0). The second aim was to examine differences in the use of cohesive devices between the two groups. For this analysis, a one-way ANOVA was carried out on each of the indices selected from Coh-Metrix and TAACO. These ANOVAs show which cohesion indices differ significantly between the two grade levels.
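For readers who want to see what the one-way ANOVA computes for a single cohesion index, here is a minimal pure-Python sketch. It returns only the F statistic and its degrees of freedom; the p-value that SPSS reports comes from the F distribution (for instance via `scipy.stats.f.sf(F, df_between, df_within)`), which is omitted here to keep the example dependency-free. The function name and the sample values are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: group size times squared deviation of the group mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: deviations of scores from their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical index scores for Grade 1 and Grade 4 essays.
grade1 = [0.21, 0.25, 0.30, 0.27]
grade4 = [0.31, 0.35, 0.29, 0.38]
f_stat, df_b, df_w = one_way_anova(grade1, grade4)
```

With only two groups, F equals the square of the pooled-variance independent-samples t statistic, so the one-way ANOVA and an independent-samples t test yield the same p-value.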

The Overall Use of Cohesive Devices in Argumentative Writing of WECCL
To examine the overall use of cohesive devices, the means and standard deviations of the 30 selected cohesion indices across the 80 compositions were computed in SPSS 22.0. The results are shown in Table 3.
The results show that Chinese English majors employ a variety of cohesive devices in their English writing. Among the 14 local cohesion indices, the noun overlap score is .27, which means that about 27 percent of adjacent sentence pairs share at least one noun, and about 51 percent of adjacent sentence pairs share at least one argument. The stem overlap score between adjacent sentences is .32, so about 32 percent of adjacent sentence pairs share at least one word stem, while the content word overlap score of .11 indicates that about 11 percent of the content words in adjacent sentences overlap. For connective indices, the all-connectives score shows that about 53 connectives occur per 1,000 words, and causal connectives occur 17 times per 1,000 words. In addition, negative connectives occur markedly more often than positive connectives, which suggests that Chinese English majors tend to use more negative than positive connectives in their argumentative compositions. For synonym overlap between adjacent sentences, the mean numbers of noun synonyms and verb synonyms per sentence (excluding the last sentence) are .44 and 1.45, respectively. With regard to global cohesive devices, the LSA score between adjacent paragraphs is .30. As for the 4 lexical cohesion indices, the pronoun overlap score, i.e., the ratio of overlapping pronouns between adjacent paragraphs to the total number of words in a text, is .45, which is higher than the content word, function word, noun, and verb overlap scores. The noun and verb synonym overlap scores between adjacent paragraphs indicate that each paragraph (excluding the last) contains on average about 4.03 noun synonym overlaps and 15.42 verb synonym overlaps. As for the overall use of text cohesion indices, the LSA score across all sentences is .69, the repetition score for tense and aspect is .68, and the ratio of causal particles to causal verbs is .09. For third-person pronouns, the ratio of third-person pronouns to the total number of words in a text is .04, and the ratio of third-person pronouns to nouns is .20.
To examine the distribution of the differences on the nine local cohesion indices that differed significantly (p < .05), multiple comparisons were conducted.

The Differences on Global Cohesive Devices
One-way ANOVAs were then conducted to test whether the two grade levels differed significantly on the global cohesion indices. The results are shown in Table 5.

The Differences on Text Cohesive Devices
The results (Table 6) show that the two grade levels differ significantly in temporal cohesion, i.e., tense and aspect repetition (p < .001), and in pronoun_density (p = .021). The other six text cohesive devices show no significant differences.

Discussion
The above results indicate that Chinese English majors tend to use a variety of cohesive devices. Regarding the three kinds of cohesive devices, Table 3 suggests that Chinese students frequently use local lexical overlaps and connectives. These findings are consistent with those of Zeng (2014), who examined the use of cohesive devices in Chinese non-English majors' English argumentative essays and found that students tend to use various kinds of cohesive devices in English writing; among the five subcategories of cohesive devices, lexical cohesive devices were used most often, followed by reference and connectives. Regarding global cohesive devices, the LSA score between adjacent sentences (.15) is lower than that between adjacent paragraphs, suggesting that, as measured by LSA, cohesion between paragraphs is stronger than cohesion between sentences. For the other global cohesion indices, the content word overlap score between adjacent paragraphs is .21, and the function word overlap score is nearly twice as high. This suggests that Chinese English majors rely more on function words than on content words to achieve cohesion in their essays. As for text cohesion, the scores for LSA given/new sentences, pronoun density, the pronoun-noun ratio, and repeated content and pronoun lemmas show that the givenness of the essays is at a relatively intermediate level, indicating that Chinese English majors cannot yet use global cohesive devices skillfully to achieve cohesion at the text level.
Of the 14 local cohesion indices, seven are connective indices. The one-way ANOVAs show no significant differences between Grade 1 and Grade 4 in the use of connectives, except for adversative and contrastive connectives, positive connectives, and negative connectives. Crossley, Kyle & McNamara (2016) likewise reported that most connective indices did not show significant growth patterns across four terms. This suggests that L2 writers do not show clear development in their use of explicit cohesive links such as connectives. Lexical overlap is an important means of discourse cohesion: Liu and Braine (2005) pointed out that lexical overlap, especially keyword overlap, can effectively construct a coherent discourse. The one-way ANOVAs also show that noun overlap, stem overlap, and syn_overlap_sent_noun differ significantly between the two levels. These results suggest that Chinese English majors' ability to construct a coherent discourse through lexical overlap improves over three years of study, a finding in line with previous research (Crossley, Kyle & McNamara, 2016). This improvement may partly reflect vocabulary growth and better English reading ability as study deepens. Similarly, Cherry and Cooper (cited in Xu Yucheng, 2003) pointed out that as students advance through grade levels, they tend to use lexical cohesive devices more often than connectives.
As for global cohesion, the results show no significant differences between the two grade levels on the global cohesion indices, except for LSA overlap and verb overlap between adjacent paragraphs. These results differ slightly from those of Crossley, Kyle & McNamara (2016), who found three global cohesion indices that differed significantly across semesters: LSA overlap between adjacent paragraphs, adjacent_overlap_noun_para, and syn_overlap_para_noun.
Finally, for text cohesive devices, the results show no significant differences between the compositions of the two grade levels except in temporal cohesion (tense and aspect repetition) and pronoun_density. This suggests that, compared with their local cohesion, Chinese English majors' ability to create text-level cohesive links remains underdeveloped.

Conclusion
This study provides evidence for the usefulness of computational tools in investigating the employment of cohesive devices. Its primary purpose was to investigate L2 learners' overall use of cohesive devices, and its secondary purpose was to examine how the use of cohesive devices differs between grade levels (Grade 1 and Grade 4) in Chinese English majors' writing. Regarding the first research question, the statistical analysis shows that, on the whole, Chinese English majors use a variety of cohesive ties in their English writing and are more likely to employ local cohesive devices (e.g., connectives and lexical overlaps) than global and text cohesive ties. Regarding the second research question, the results indicate that the scores for noun overlap, stem overlap, LSA overlap, adversative and contrastive connectives incidence, positive connectives incidence, negative connectives incidence, and synonym overlap (both noun and verb synonyms), all of which are local in nature, differ significantly between the two grade levels. For global cohesion, only LSA overlap between adjacent paragraphs differs significantly; no significant differences are found for the remaining global cohesion indices.
Despite these findings, the present study has several limitations that call for further research. First, the present study is synchronic (cross-sectional). A diachronic (longitudinal) study that collects essays written by the same group of participants from their first to their final year of college would be preferable and could offer more reliable implications for the use and development of cohesive devices. Second, the sample of 80 essays written by undergraduate Chinese English majors is still small; future studies should examine a larger sample to ensure the reliability and validity of the findings. Third, it would be useful to compare Chinese EFL learners with native English speakers in their use of cohesive devices. Future studies could build a comparable corpus of native-speaker writing for such a comparative study.
In Coh-Metrix, however, global cohesion indices are rare. Compared with Coh-Metrix, TAACO provides more global cohesion indices, as well as synonym overlap indices and part-of-speech (POS) tagged cohesion indices. Coh-Metrix, unlike TAACO, uses Latent Semantic Analysis (LSA) to measure semantic overlap.

Table 2. The selected cohesion indices

Table 3. The means and standard deviations of the selected cohesion indices

Table 4. One-way ANOVA: differences in local cohesion indices

Table 5. One-way ANOVA of global cohesion indices

Table 6. One-way ANOVA of text cohesion indices