Brantmeier, Strube, & Yu: Scoring recalls for L2 readers of English in China 115 Reading in a Foreign Language 26(1)

reliance on multiple choice tests in China because it may impede the development and growth of the reader. With multiple choice items, the questions and answers are predetermined, and consequently there is no room for the reader to generate inferences or think critically about the reading. Pang also included very specific directions for future research that include the examination of word level issues, extensive reading, strategies, and testing. All of these suggestions for future research involve a measure of comprehension. Thus, the present study of learners of English in China examines the grading of an assessment task that is regarded by reading researchers in the USA (Bernhardt, 1991; Brantmeier, 2006) to be the most revealing of readers' comprehension: the written recall protocol.

The written recall protocol is a common procedure utilized to measure reading comprehension in both first language (L1) (Fitzgerald & Spiegel, 1983; Pearson & Camperell, 1981; Rand, 1984; Snyder & Downey, 1983) and second language (L2) investigations (Bernhardt, 1983; Brantmeier, 2002; Carrell, 1983a; Lee, 1986; Young & Oxford, 1997). In the written recall task students are asked to read a text and write down everything they can remember about what they just read without looking back at the text. Bernhardt (1991) initially postulated that the written recall was the purest measure of L2 reading comprehension, and to date this protocol continues to be used in many investigations that involve L2 reading comprehension. With the written recall there is no interference by the instructor who is proctoring the test and there are no retrieval cues provided.
Prior L2 Research and the Written Recall Protocol

Prior L2 research that examines variables involved in comprehension assessment tasks has primarily highlighted language of assessment as a key factor (Hock & Poh, 1979; Shohamy, 1984), and investigations have specifically examined language as a key variable in written recall protocol research (Brantmeier, 2006; Lee, 1986). Findings of these studies have guided important methodological decisions for empirical research on L2 reading. With students from beginning and intermediate levels of Spanish in the USA, Lee (1986) reported that written recall of a text was significantly better when completed in L1 rather than L2. Alderson (2000) also claimed that "recall should be completed in the test taker's L1 because otherwise it becomes a test of writing rather than reading" (p. 230). Bernhardt (2005) reinforced Alderson's recommendation, but only until learners reached the advanced L2 proficiency/fluency levels. With advanced learners of Spanish in the USA, Brantmeier (2006) investigated the extent to which L2 reading comprehension is a function of language of the recall task and/or former L2 reading performance. Findings indicated that, overall, language of recall does not matter with learners from advanced levels of language instruction. In that experiment, language of recall accounted for only 3% of variance in written recall. These findings underscore Bernhardt's (1991) earlier assertion that written recalls be completed in the native language at the beginning and intermediate stages of acquisition, but not at the advanced levels.

Review of codification schemes for written recall

In the two studies highlighted above that utilize learners of Spanish in the USA (Brantmeier, 2006; Lee, 1986), the scoring scheme used to codify written recalls was the pausal unit. Research with English as a Foreign Language (EFL) learners who are native Chinese speakers has also
explored factors that affect reading recall. One such study, by Chu, Swaffar, and Charney (2002), examined the impact of culture specific conventions on written recalls. They reported that different rhetorical conventions had significant effects on both immediate and delayed recalls. More specifically, the researchers reported that most Taiwanese EFL students were not aware of cohesive devices when reading English texts, as these devices occur less often in Chinese. English language learners in Taiwan rarely use cohesive devices for integrating textual information, and the authors contend that the difficulty in identifying cohesive ties and finding out the relationships among these devices in a text may negatively impact reading in English. In this study, the researchers also used the pausal unit to codify recalls.

In all three studies (Brantmeier, 2006; Chu et al., 2002; Lee, 1986), researchers described the pausal unit as the natural pause during oral reading by a native speaker. In other words, a native speaker reads the passage out loud and inserts a bracket wherever there is a natural pause in speech. All information within a bracket is a pausal unit. This scoring process was first developed by Johnson (1970) and then validated by Bernhardt (1991) in a comparison of pausal units and the Meyer (1975) system, a protocol that includes hierarchically based content structures. Bernhardt (1991) argued that both systems tap the same L2 reading behaviors, but that the pausal unit is more efficient and less time consuming to grade. Consequently, many L2 reading researchers utilize the pausal unit as a system to score written recalls.

Lee and Ballman (1987) blended Johnson's (1970) definition of a pausal unit with Carrell's (1983b) notion of an idea unit, a unit that corresponds to a proposition or phrase, to divide an expository passage into units.
They then ranked the importance of each unit. Their experiments specifically examined the effects of exposure to grammar on written recall. Findings revealed a low representation of important units in the L2 recalls.

Since then, some L2 reading researchers have used a criterion from Riley and Lee (1996) whereby the text is divided into 'units of analysis,' which consist of ideas, propositions, and constituent structures. When dividing a reading into idea units, a small to medium grain size is used to determine the number of idea units within each passage. Each piece of information that adds meaning to the recall is classified as an idea unit. A list of idea units is agreed upon among raters. With this procedure, the entire written recall is analyzed by a researcher and an additional rater to identify the units, and then these units are compared to those of the text to ensure that the information in the written recall either appeared in the reading passage or was implied in it. The total number of ideas recalled correctly is used for the recall score.

Alderson (2000) pointed out that an idea unit was somewhat difficult to define, and that this issue has rarely been addressed in reading research. He was also careful to say that the objectivity of scoring with the idea unit depends on the rater reliability index and that high reliability is necessary.

One mechanism that merits further research is the scoring system used to codify written recalls, specifically with learners of English in China. To date, it appears that no study has examined the type of unit (pausal or idea) that is utilized to codify written recalls for native Chinese readers of English. The type of unit can also be described as the nature of the divisions within the written recall. The present study examines the relationship between recall scores obtained under the two conditions of unit (idea and pausal).
It also investigates the possible effects of moderating variables, such as the length of time spent studying English, the amount of pleasure reading done in English, and the type of passage used. These moderating variables were selected after discussions with the professors who teach the English courses at the university in China. These
professors spent one full year studying and doing research at a university in the USA, where they attended a course that examined the research and theory that drives L2 reading. After completing course readings on the interacting variables involved in reading, the professors felt that students who have studied English for a long time and frequently read in English for pleasure would not be affected by the condition of the unit. However, they also thought that the type of passage (passages with or without inserted questions) would impact recalls for all learners in the study.

For the present study, the pausal unit is considered to be the natural pause produced by a native speaker while reading the passage orally (Bernhardt, 1991). The idea unit is an idea, proposition, or constituent structure (Riley & Lee, 1996). In studies that use the pausal or idea unit protocol, the total number of correct units recalled is traditionally used as the dependent variable for comprehension. For the present investigation, the idea units are not weighted. In other words, no hierarchy was placed on the units recalled. The recalls were graded by counting the number of correct pausal units and the number of correct idea units.

The present study is motivated in part by prior findings from Brantmeier, Callender, Yu, and McDaniel (2012). In that study the written recalls were codified with the pausal unit protocol, and the scoring was completed by two different researchers in China, both native speakers of Chinese and professors of English. The Chinese researchers had not used the pausal unit in prior experiments, and after completing the scoring of all recalls they questioned the rubric because they thought it might have impacted scores.
That investigation specifically examined the effects of textual enhancements on reading comprehension with adult native Chinese speakers learning English at the university. An adjunct was inserted into the readings that consisted of a 'what' question along with instructions to either 'pause and consider' or 'pause and write.' Participants achieved almost the same scores on recall and sentence completion for versions with no adjuncts and versions with 'pause and consider' adjuncts, and they scored significantly lower on the assessment tasks for the passage versions with 'pause and write' adjuncts. Could the scoring system used to codify recalls have made a difference in the outcomes? In other words, if recalls from that study had been codified with idea units, would different findings have emerged? The present study used a subset of data from the prior experiment to answer the following research questions. The prior investigation did not specifically address the research questions in the present study, as these new questions emerged after making methodological changes to the earlier study.

Research Questions

The following overall research questions guide the present study:

1. What is the relationship between pausal and idea units used to codify the written recalls of high intermediate learners of English whose native language is Chinese?
2. Does the strength of the relationship between pausal and idea units depend on other variables, such as the length of time studying in English, the amount of pleasure reading done in English, or the version of passage used?
Method

Participants

Participants included 185 students (aged 19 to 24 years) enrolled in a third year English course at a large university in the northeastern region of China. All participants (24 male and 161 female) declared English as their primary major. All students enrolled in this course participated in the experiment, and therefore the gender distribution of the participants is not balanced. The English language program classified the learners as upper intermediate, and all participants were enrolled in a required Advanced Reading course for English majors. The course goals involved development of the ability to analyze and paraphrase a variety of readings that included passages from professional textbooks as well as authentic texts written by native English speakers for English speakers. Through the introduction of different cultural knowledge, the course also examined rhetorical devices and writing features. Only those students who met the following criteria were included in the final data analysis: (a) students whose native language was Chinese, and (b) students who completed all tasks during data collection.

During their previous two years at the university, participants had been enrolled in various reading related classes, such as Intensive Reading, Extensive Reading, and Journal Reading. In all these courses, students were required to read extensively outside of class. A reading list was provided before each semester, and multiple choice tests of vocabulary and comprehension were conducted once a month. Students were not familiar with the written recall protocol as a measure of reading comprehension.
It is important to note that before enrolling in the course, all participants were required to have passed the national proficiency test called TEM 4 (Test for English Majors, Band 4) administered by the Ministry of Education of China.

Procedures

Data collection procedures. Data collection took place toward the end of the fall semester of 2010. The university is known for the outstanding quality of its language programs. Participants attending the Advanced Reading course were invited to take part in the experiment. Everyone agreed and then signed a consent form to that effect. Students had the option of extra credit points or monetary compensation for participation, and all participants preferred points. When participants were invited to participate in the experiment, the researcher told them that the investigation would examine variables involved in the reading of English, but no specific details were provided. The researcher was present during data collection to ensure that students did not turn back to previous pages while completing the tasks. The experiment was held on a single day outside of class time.

Demographic Questionnaire. A demographic questionnaire was developed in order to examine other factors that may be involved in L2 reading comprehension as measured via the written recall protocol. Questions such as 'What is your native language?' and 'How many years have you studied English?' were included in the questionnaire. The complete questionnaire is included in Appendix A. As indicated earlier, participants completed the demographic questionnaire prior to the readings as part of the data collection packet. Data was then entered
directly into the SPSS database, as the questions were direct and no codification process was necessary.

Reading Passages and Embedded Questions. The two reading passages for the study were taken from social psychology textbooks and from passages utilized in prior L1 reading research (Callender & McDaniel, 2007) and L2 reading research (Brantmeier et al., 2012). The first passage included details about theories of first impressions, the primacy effect, and schemas. The second passage detailed implicit personality theory and attribution theory. Table 1 lists the passage length, total number of sentences, and total number of embedded clauses, all of which are factors related to text difficulty. For the sake of consistency, the authors of the present study used these factors to determine passage difficulty (Brantmeier et al., 2012). A future study of this type may also analyze text difficulty with Flesch Reading Ease scales to further analyze the effects of text difficulty.

Table 1. Text difficulty by passage

Passage    Content                                                                          Length     Number of Sentences  Number of Embedded Clauses
Passage 1  First impressions, the primacy effect, and schemas                               525 words  21                   15
Passage 2  Implicit personality theories with a detailed explanation of attribution theory  646 words  22                   16

Appendix B includes the breakdown of pausal and idea units for each passage, and it includes both the English and Chinese translations. For Passage 1, the total number of pausal units was 26 and the total number of idea units was 52. For Passage 2, the total number of pausal units was 24 and the total number of idea units was 48. It is important to note that, generally speaking, the breaks for pausal units do not occur in the same locations as the breaks for idea units.
Three different versions of each passage were included in the study (Brantmeier et al., 2012). Version 1 included no inserted questions. Version 2 included inserted 'what' questions that required learners to pause and consider, and Version 3 included inserted 'what' questions that asked learners to pause and write.

Recall. As discussed in detail earlier, the instructions for the written recall asked: 'Please write down as much as you can remember about what you just read.' The third author of this study, a native speaker of Mandarin Chinese, scored recalls first for the presence or absence of pausal units, then for the presence or absence of idea units. For inter rater reliability, a second native Chinese speaker was trained to score recalls for the two types of units, and then scored a randomly selected set of written recalls. The inter rater reliability coefficient was .97 for pausal units and .96 for idea units. The total number of possible pausal units was 26 for Passage 1, and 24 for Passage 2. The total number of possible idea units was 51 for Passage 1, and 48 for Passage 2.

Results

Means, standard deviations, and correlations for total pausal and idea units recalled for the combined readings are listed in Table 2. Across passages, participants scored higher on the
recalls that were codified according to idea units than they did on the recalls that used pausal units. Recall across passages was strongly correlated for both pausal units and idea units; within passages, pausal and idea unit recall was even more strongly correlated. To simplify subsequent analyses, the pausal unit recall scores for the two passages were averaged (after standardization). Similarly, the idea unit recall scores for the two passages were also averaged. The percentage of correct pausal units was 81% for Passage 1 and 82% for Passage 2. The percentage of correct idea units was 75% for Passage 1 and 65% for Passage 2.

Table 2. Means, standard deviations and correlations for pausal and idea unit recall

                 Pausal Units            Idea Units
                 Passage 1   Passage 2   Passage 1   Passage 2
Pausal Units
  Passage 1      1.00
  Passage 2      0.66        1.00
Idea Units
  Passage 1      0.94        0.68        1.00
  Passage 2      0.66        0.95        0.70        1.00
Mean             21.37       19.97       37.92       30.81
SD               11.90       11.97       19.34       17.31

Note. All correlations are statistically significant, p < .001.

To illustrate the strong linear relationship between pausal unit and idea unit recall, Figure 1 shows the scatter diagram for the averaged data. As depicted in Figure 1, the relationship is very strong, with over 90% of the variance in pausal unit recall accounted for by idea unit recall.
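The standardize-then-average step and the variance figure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six readers' scores are invented, the helper names (zscore, composite, variance_explained) are hypothetical, and the use of the sample standard deviation is a choice the article does not specify. The proportion of variance accounted for is the squared Pearson correlation, so the correlation of .96 reported for the two recall measures corresponds to .96 squared, or roughly .92.

```python
import numpy as np

def zscore(x):
    """Standardize scores to mean 0 and SD 1 (sample SD, an assumption)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def composite(passage1, passage2):
    """Average the standardized scores from two passages, reader by reader."""
    return (zscore(passage1) + zscore(passage2)) / 2.0

def variance_explained(x, y):
    """Return the Pearson correlation and its square for two score vectors."""
    r = np.corrcoef(x, y)[0, 1]
    return r, r ** 2

# Hypothetical unit counts for six readers (invented, not the study's data).
pausal_p1 = [20, 12, 25, 8, 17, 22]
pausal_p2 = [18, 10, 23, 9, 15, 21]
idea_p1 = [38, 24, 47, 15, 33, 41]
idea_p2 = [30, 19, 40, 14, 27, 36]

pausal = composite(pausal_p1, pausal_p2)
idea = composite(idea_p1, idea_p2)
r, r2 = variance_explained(pausal, idea)
print(f"r = {r:.2f}, r^2 = {r2:.2f}")
```

Standardizing before averaging keeps the passage with more possible units from dominating the composite score.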
Figure 1. Relationship between pausal and idea units

A close inspection of the scatterplot reveals somewhat greater deviations from the regression line near the upper range of recall, suggesting the possible presence of moderating variables. To explore this possibility, multiple regression analyses were conducted with total pausal unit recall as the outcome, total idea unit recall as one predictor, an additional variable as another predictor (e.g., length of time studying in English, amount of leisure reading done in English, or version of passage read), and the product of idea unit recall and the additional predictor included to test the interaction (i.e., presence of a moderator). The only significant factor that emerged was version of passage read, F(2, 179) = 4.46, p = .013. The factors of length of study and amount of pleasure reading did not emerge as significant variables in the present study. Figure 2 illustrates the slightly different relations between pausal unit recall and idea unit recall for the three different versions, with slight attenuation for Version 1 (no inserted questions) and Version 2 (pause and consider) relative to Version 3 (pause and write). Note, however, that these differences are rather small against the backdrop of a strong overall relationship between the two recall measures (r = .96).
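A moderator test of this kind can be sketched as a comparison of two nested regressions: one with idea unit recall and passage version as predictors, and one that adds their products. The sketch below is an assumption-laden illustration, not the authors' analysis: the data are simulated, the function name interaction_f_test is hypothetical, version is dummy coded with Version 1 as the reference level, and NumPy is swapped in for the statistical package used in the study. With 185 readers and three versions, this coding gives 2 and 179 degrees of freedom, which is consistent with the reported F test.

```python
import numpy as np

def interaction_f_test(idea, pausal, version):
    """F test for whether passage version moderates the idea -> pausal slope.

    version holds integer codes 1, 2, 3; Version 1 is the reference level.
    """
    idea = np.asarray(idea, dtype=float)
    pausal = np.asarray(pausal, dtype=float)
    version = np.asarray(version)
    n = len(pausal)

    # Dummy codes for Versions 2 and 3.
    d2 = (version == 2).astype(float)
    d3 = (version == 3).astype(float)

    # Reduced model: intercept, idea recall, version main effects.
    reduced = np.column_stack([np.ones(n), idea, d2, d3])
    # Full model: adds the idea-by-version product terms.
    full = np.column_stack([reduced, idea * d2, idea * d3])

    def rss(X):
        """Residual sum of squares from an ordinary least squares fit."""
        beta, *_ = np.linalg.lstsq(X, pausal, rcond=None)
        resid = pausal - X @ beta
        return float(resid @ resid)

    rss_reduced, rss_full = rss(reduced), rss(full)
    df_num = full.shape[1] - reduced.shape[1]
    df_den = n - full.shape[1]
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den

# Simulated standardized scores for 185 readers (not the study's data).
rng = np.random.default_rng(0)
n = 185
version = rng.integers(1, 4, size=n)
idea = rng.normal(0.0, 1.0, size=n)
slope = np.where(version == 3, 1.0, 0.9)  # slightly steeper for Version 3
pausal = slope * idea + rng.normal(0.0, 0.25, size=n)

f_stat, df_num, df_den = interaction_f_test(idea, pausal, version)
print(f"F({df_num}, {df_den}) = {f_stat:.2f}")
```

A large F relative to its reference distribution would indicate that the slope relating the two recall measures differs across versions; the p value is left out here because it would require a statistics library beyond NumPy.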
[Figure: idea unit recall and pausal unit recall, plotted separately for Version 1, Version 2, and Version 3]

Figure 2. The effect of version on the relation between idea unit recall and pausal unit recall

Discussion

Overall, the present study found a strong correlation between idea units and pausal units for written recalls. This correlation appears to underscore prior findings by Bernhardt (1991) with native English speakers learning to read in German. Bernhardt provides examples of a weighted propositional analysis of recalls, the Meyer system, and compares this to the simple pausal unit analysis of recalls. In her study, correlations between the Meyer system and the pausal system were .96 for one text. In the second text, however, the correlation was only .54. With the second text Bernhardt followed with additional weighted scoring that brought the correlation up to .84. In the end, Bernhardt demonstrated that the pausal unit was just as effective as the idea unit, and that it was far more efficient to use it, as it was less time consuming for the researchers. This was also the case for the present experiment with native Chinese learners of English.

The strong correlation between idea and pausal unit scores in the present study underscores Bernhardt's (1991) earlier assertion that the pausal and idea units tap the same behavior. More specifically, even though the number of possible idea units for each passage was higher than the number of possible pausal units, the findings echoed Bernhardt (1991). The present study did not scale or rate the types of idea units recalled, and it did not find that the idea units were a better system for grading. While coding the recall tasks, the Chinese researchers questioned the use of pausal units and asked if idea units would better serve
these readers. The researchers were not accustomed to using the written recall protocol as a measure of comprehension, probably because multiple choice questions are more commonly used in China. The findings demonstrate that the pausal unit grading system is appropriate for English learners in China.

It is important to discuss the fact that although the high correlation between the two measures suggests that they are interchangeable, this is only true in a relative sense, not in an absolute sense. The percentage of correct idea units was 75% for Passage 1 and 65% for Passage 2. The idea unit approach seems to produce systematically lower scores.

A follow up study could examine more qualitative information within recalls, such as misunderstandings, and how these misunderstandings impact comprehension. For example, Young and Nakuma (2009) used a subset of data from an earlier experiment (Young, 1999) to examine the misunderstandings present in recalls of Spanish learners. Their second year Spanish students read culturally unfamiliar texts and were asked to recall as much as they could about what they had read in their native language. The researchers categorized the incorrectly recalled units as either linguistically based misunderstandings, where students' inadequate knowledge of the language code may explain misunderstandings, or cognitively related misunderstandings that are attributable to propositional deficits. The misunderstandings consisted of students rewriting the text to fit their personal assumptions. Findings revealed that when readers at this stage of acquisition read a text with little background knowledge, they became more dependent on the literal language of the text.
The researchers found that the readers used more word level approaches to comprehend and that their linguistic deficits also strongly influenced misunderstandings of the reading passages. The recalls provided in the present study could be examined for misunderstandings with a linguistically based and cognitively related rubric in order to find out whether the students of English in China were using this type of compensatory behavior because of deficiencies in linguistic abilities or lack of familiar content located within domain specific texts. The result of such analysis could hold strong implications for the teaching of reading in China because the nature of misunderstandings may be more easily revealed with one type of unit than the other. The present study indicated that from an overall performance standpoint it does not matter which type of grading unit is used, but it may matter for a more fine grained analysis. This question awaits additional research.

Brantmeier et al. (2012) discussed reasons for decreased performance in the 'pause and write' condition by explaining that the question and writing process may have caused the readers to focus on the information addressed in the question, rather than generating inferences and other details from the text. The questions for 'pause and write' were predetermined retrieval cues. Consequently, the readers may have thought this information was of most importance. As inferences are included in the idea unit rubric but not in the pausal unit protocol, we could assert that the embedded adjuncts did not help the reader with the idea units. When writing the recall, the readers may have been influenced by the nature of these questions. A close examination of the recalls revealed that participants did not include other information from the text; this may have had an impact on the nature of units recalled.

Scholars and instructors should be cautious about how far the current findings are to be applied.
The results indicated that pausal units are indeed a fine proxy for idea unit recall and are easier to use. However, this finding applies to the overall performance on the recall task. Future research