Tuesday, August 28, 2007

Expanding the motivation construct in language learning

Tremblay, P., & Gardner, R. (1995). Expanding the motivation construct in language learning. The Modern Language Journal, 79(4), 505-518.

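Tremblay and Gardner (1995) first introduced the field of motivation research, referring mainly to Gardner's own work. They pointed out, however, that although motivation research had advanced in educational psychology in recent years and various theories (e.g., self-efficacy, goal-setting theory) had been proposed, these theories had not been incorporated into the socio-educational model developed by Gardner. The purpose of the study was therefore to add these recently proposed theories to the socio-educational model. The participants were 75 high school students in Ontario. Questionnaires and tests comprising 25 scales were administered. The Cronbach's alpha reliability coefficients of the questionnaire scales ranged from .26 for the lowest scale to .92 for the highest, so reliability can be described as fairly high. To measure French proficiency, an essay writing test and French course grades were used. Structural equation modeling was used to express the relationships among the latent variables as a causal model, and all path coefficients were significant. The model fit indices GFI and AGFI were low, at .70 and .63 respectively, and the authors attributed the low fit to the complexity of the model. Other possible reasons for the low fit include the small sample size and the use of only a single observed variable for a latent variable. The model is unique in that it incorporates several theories of motivation, and it brought a new perspective to motivation research.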

Monday, August 27, 2007

Assessing motivational factors in foreign language learning: Cultural variation in key constructs

Rueda, R., & Chen, C. (2005). Assessing motivational factors in foreign language learning: Cultural variation in key constructs. Educational Assessment, 10(3), 209-229.

Rueda and Chen (2005) argued that cultural differences in motivation had not been taken into account in foreign language learning and criticized existing models of motivation for having been developed from a Western perspective only. Thus, the purpose of their study was to investigate the extent to which motivational factors affect the learning of Asian heritage and non-Asian heritage learners of Chinese. One hundred fifty university students, comprising 116 Asian heritage and 34 non-Asian heritage learners, completed two instruments. The first was a motivation questionnaire. No scree plot, factor loadings, or eigenvalues were reported, but seven factors were extracted: instrumentality, intrinsic motivation, passivity toward requirements, task value, belief about effort, self-efficacy, and effort devoted to target language learning. The second was a learning outcomes questionnaire. No descriptive statistics were reported, and the assumptions underlying the t-tests, correlations, and structural equation modeling were not mentioned. The results of the t-test showed a significant difference in the passivity toward requirements factor. The causal relationships among the seven factors were displayed, but no chi-square value or fit indices were reported. The model showed that task value was the best predictor of the effort devoted factor. In conclusion, they stated that there was no significant difference in motivational beliefs, but argued that motivational constructs were influenced by cultural factors.

Sunday, August 26, 2007

Score reliability and placement testing

Westrick, P. (2005). Score reliability and placement testing. JALT Journal, 27(1), 71-94.

Westrick (2005) discussed three reasons for implementing the Quick Placement Test-Pen and Paper Test (QPT-PPT) within a curriculum and reported the results of a placement test administered to 161 Japanese university students. The first reason was status. The QPT-PPT was developed by a prestigious institution, and some believed that implementing it could somehow improve the image of the curriculum. The second reason was that it was extremely difficult to develop in-house placement tests. The third reason was paucity of time. Placement tests are usually administered during a busy period of the academic year, and it is difficult to find time to administer tests during an orientation period. In addition, administrators have to report the results quickly in order to announce the classes in which students have been placed. Thus, the tests have to be brief and easy to administer. As for the results, the K-R 20 internal consistency reliability coefficient was .66 for the 120 items when 161 students took the cloze and multiple-choice test of reading, grammar, and vocabulary skills. Westrick concluded that the QPT-PPT might be effective with other groups but not with his participants, and urged the development of in-house placement tests connected to curricular goals and objectives.
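
For readers unfamiliar with the K-R 20 coefficient mentioned above, here is a minimal sketch of how it is usually computed for dichotomously scored items. The response matrix is made-up illustration data, not Westrick's, and the sketch assumes the population variance of total scores (some textbooks use the sample variance instead).

```python
def kr20(responses):
    """K-R 20 for a list of examinee rows, each a list of 0/1 item scores."""
    n_people = len(responses)
    n_items = len(responses[0])

    # Variance of examinees' total scores (population form: divide by N).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people

    # Sum over items of p * q, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: 5 examinees, 4 items.
toy_data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(toy_data), 2))  # 0.8
```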

Using a commercially produced proficiency test in a one-year core EFL curriculum in Japan for placement purposes

Culligan, B., & Gorsuch, G. (1999). Using a commercially produced proficiency test in a one-year core EFL curriculum in Japan for placement purposes. JALT Journal, 21(1), 7-25.

Culligan and Gorsuch (1999) discussed the adequacy of employing commercially produced proficiency tests for making placement decisions. The Secondary Level English Proficiency test (SLEP), composed of reading and listening tests, was administered twice to 487 Japanese university students as a pretest and a posttest. Based on a norm-referenced item analysis, known as item discrimination (ID), it was discovered that fewer than half of the items did not discriminate between high- and low-scoring students. The result of a criterion-referenced item analysis, referred to as the difference index (DI), indicated that students learned only one-third of the items in the program. The researchers concluded that the SLEP should not be used for making placement decisions because reliability was only .81 for the entire 150-item test and the standard error of measurement was wide. No sectional reliability coefficients were reported. They also mentioned that the SLEP was inadequate because it did not estimate their students’ speaking proficiency, which was the major goal of the program. They suggested that only items with a certain item discrimination value be used. They also recommended the use of item response theory to make more precise placement decisions (2000).
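
The two item statistics named above can be sketched as follows. This is only an illustration under common textbook definitions, not the authors' procedure: ID is taken as item facility in the top-scoring group minus item facility in the bottom-scoring group (groups here are the top and bottom thirds, an assumption), and DI as posttest item facility minus pretest item facility. All data are hypothetical.

```python
def item_facility(item_scores):
    """Proportion of examinees who answered the item correctly."""
    return sum(item_scores) / len(item_scores)


def item_discrimination(responses, item, groups=3):
    """ID = facility in the upper group minus facility in the lower group.

    Examinees are ranked by total score; the upper and lower groups are
    the top and bottom 1/groups of the ranking (thirds by default).
    """
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, len(ranked) // groups)
    upper = [row[item] for row in ranked[:k]]
    lower = [row[item] for row in ranked[-k:]]
    return item_facility(upper) - item_facility(lower)


def difference_index(pretest, posttest, item):
    """DI = posttest facility minus pretest facility for one item."""
    post = item_facility([row[item] for row in posttest])
    pre = item_facility([row[item] for row in pretest])
    return post - pre


# Hypothetical data: 6 examinees x 3 items (single administration).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
# Hypothetical pretest/posttest data: 4 examinees x 2 items.
pre = [[0, 1], [0, 0], [1, 1], [0, 1]]
post = [[1, 1], [1, 0], [1, 1], [0, 1]]

print(item_discrimination(scores, 0))  # 1.0
print(difference_index(pre, post, 0))  # 0.5
```

An item with an ID near zero does little to separate strong from weak examinees, and an item with a DI near zero shows little change between pretest and posttest.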

Translation as a language testing procedure: Does it work?

Buck, G. (1992). Translation as a language testing procedure: Does it work? Language Testing, 9(2), 123-148.

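Citing Klein-Braley (1987), Buck (1992) gave three reasons why translation is not considered a valid testing method. First, even learners with high second language ability cannot necessarily produce a good translation. Translating does of course require advanced second language (L2) and first language (L1) ability, but accurate interpreting and translation require training, so translating is regarded as an ability separate from L2 ability. Second, a translation test is a kind of performance test that requires raters, and raters' scores are not necessarily identical; they lack consistency, which poses a reliability problem. Third, a translation test, like a cloze test, is an integrative test in which a single item measures several abilities. Unlike a discrete-point test, in which a collection of many items measures a single construct, its validity is difficult to verify. To examine the validity of translation tests, 121 Japanese learners of English were given items based on texts of the same content: 36 cloze items and 2 translation items as integrative items, and 23 multiple-choice reading comprehension items as discrete-point items. Seven raters scored the translations on a 6-point scale without training. The raters differed in severity, and their scores differed significantly, but the correlations among the raters' scores were high. Inter-rater reliability was therefore high, and the test was judged to be reliable. In addition, the correlations among the three test scores were high, and Buck concluded that validity was also high, although it is unclear whether the tests measure reading ability.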
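
The finding that raters differed significantly in severity while their scores still correlated highly is not a contradiction, because a correlation coefficient ignores differences in mean level. A minimal sketch with made-up scores (not Buck's data) on a 6-point scale:

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical scores given by two raters to the same five translations.
lenient_rater = [2, 3, 4, 5, 6]
severe_rater = [1, 2, 3, 4, 5]  # always exactly one point lower

r = pearson_r(lenient_rater, severe_rater)
mean_gap = sum(lenient_rater) / 5 - sum(severe_rater) / 5
print(r, mean_gap)  # 1.0 1.0
```

Here the two raters rank the translations identically (r = 1.0) even though one is a full point more severe, which is why a high inter-rater correlation alone does not show that raters award the same scores.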

Relationships among IRT item discrimination and item fit indices in criterion-referenced language testing

Hudson, T. (1991). Relationships among IRT item discrimination and item fit indices in criterion-referenced language testing. Language Testing, 8(2), 160-181.

Hudson (1991) applied item response theory (IRT) to the analysis of criterion-referenced tests (CRTs). The Rasch model, or one-parameter logistic model, was compared with the two-parameter logistic model and used to analyze two forms of the General Tests of English Language Proficiency (GTELP). He reported Cronbach's alpha, an internal consistency reliability estimate normally used for norm-referenced tests, and found all the subtests to be highly reliable. The results indicated strong correlations among the point-biserial, infit, outfit, and slope parameter values. He recommended the two-parameter logistic model over the Rasch model because the slope parameter was easier to interpret; the infit/outfit statistics could be a substitute for the slope parameter. He concluded that highly discriminant items should be omitted in the test development. Item difficulty should also be taken into account: items with difficulty near the cut-off point should be included to arrive at more dependable pass/fail decisions.
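
As a rough illustration of the two models compared above, the sketch below uses the standard logistic item response function, where the Rasch model is the special case with the slope fixed at 1. It also shows why items with difficulty near the cut-off point help pass/fail decisions: item information peaks where ability equals item difficulty. The function names and parameter values are my own, chosen for illustration, and the scaling constant D = 1.7 is omitted.

```python
import math


def p_correct(theta, b, a=1.0):
    """Probability of a correct response under the 2PL model.

    theta: examinee ability, b: item difficulty, a: slope (discrimination).
    With a = 1.0 this reduces to the Rasch (one-parameter) model.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, b, a=1.0):
    """Item information for the 2PL model: a^2 * P * (1 - P)."""
    p = p_correct(theta, b, a)
    return a * a * p * (1.0 - p)


cut_score = 0.0  # hypothetical pass/fail point on the ability scale

# An item located at the cut score is more informative there
# than an item that is two logits harder.
info_near = item_information(cut_score, b=0.0)
info_far = item_information(cut_score, b=2.0)
print(info_near > info_far)  # True
```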

Improving ESL placement tests using two perspectives

Brown, J. D. (1989). Improving ESL placement tests using two perspectives. TESOL Quarterly, 23(1), 65-83.

Brown (1989) pointed out the difference between norm-referenced tests (NRTs) and criterion-referenced tests (CRTs) and argued that placement tests were NRTs that should spread examinees out along a continuum but that also had to be curriculum-specific. In other words, placement tests should include items that can discriminate between groups of high- and low-scoring examinees within the curriculum. He computed both norm-referenced and criterion-referenced item statistics, such as item facility, item discrimination, and the difference index, in order to select 35 out of 60 reading items that were sensitive to curriculum content and could discriminate between examinees. He reported the Kuder-Richardson formula 20 (K-R20) internal consistency reliability coefficient and found the test to be reliable at .89 with 60 items. After the revision, reliability remained high at .85 with 35 items. To support the construct validity of the test, Brown reported the pretest/posttest score gain and found it to be statistically significant.
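
One way to put the drop from .89 to .85 in context is the Spearman-Brown prophecy formula, which predicts reliability when a test is shortened with items of comparable quality. This calculation is my own addition and is not reported in the summary above; it only uses the two figures given there (60 items at .89, shortened to 35).

```python
def spearman_brown(reliability, length_ratio):
    """Predicted reliability when test length is multiplied by length_ratio."""
    return (length_ratio * reliability) / (1 + (length_ratio - 1) * reliability)


# Figures from the summary: 60 items with K-R20 = .89, revised to 35 items.
predicted = spearman_brown(0.89, 35 / 60)
print(round(predicted, 2))  # 0.83
```

Under this assumption a random 35-item subset would be expected to reach about .83, so the observed .85 is consistent with the item selection having kept the better-functioning items.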