Practical Language Testing
Fulcher (2010)
Two paradigms in educational
measurement and language testing
1) Norm-referenced testing: The meaning of the score on
a test is derived from the position of an individual in
relation to the group. It discriminates between test takers
and separates them out (i.e., distributes them) very effectively.
Decision making with norm-referenced tests involves
value judgments about the meaning of scores in terms
of the intended effect of the test.
Two paradigms in educational measurement
and language testing (Cont.)
2) Criterion-referenced testing: The aim is to make a decision
about whether an individual test taker has achieved a pre-
specified criterion, or standard, that is required for a particular
decision context.
What is a standardized test?
 A standardized test is a form of NRT that
1) requires all test takers to answer the same questions, or a
selection of questions from a common bank of questions,
in the same way;
2) is scored in a “standard” or consistent manner, which
makes it possible to compare the relative performance of
individual students or groups of students.
 The term is primarily associated with large-scale tests
administered to large populations of students
Why testing is viewed as a ‘science’
 The early scientific use of tests was initiated by the
introduction of statistical analysis into testing during
the First World War
 Greenwood (1919): “When you can measure what you
are speaking about and express it in numbers, you know
something about it, but when you cannot measure it,
when you cannot express it in numbers, your knowledge
is of a meagre and unsatisfactory kind” (p. 186)
 Fulcher (2010): “tests, like scientific instruments,
provide the means by which we can observe and
measure consistencies in human ability”.
Why testing is viewed as a ‘science’ (Cont.)
Shohamy (2001): “Testing is perceived as a scientific
discipline because it is experimental, statistical and uses
numbers. It therefore enjoys the prestige granted to science
and is viewed as objective, fair, true and trustworthy” (p.
21) which are key features of the “power of testing”.
Lipman (1922): Strong trait theory is untenable. In fact,
most of the traits or constructs that we work with are
extremely difficult to define, and if we are not able to
define them, measurement is even more problematic.
The curve and score meaning
 In NRT, the meaning of a score is directly related to its place in
the curve of the distribution (or a bell curve) from which it is
drawn.
[Figure: normal (bell) curve with the x-axis marked from -3 SD to +3 SD around the mean (0)]
Central tendency
Central tendency: The most typical behavior of the
group
 Mode: The score that occurs most frequently. A distribution is
bimodal when it has two peaks and trimodal when it has three.
 Median: The point below which 50 percent of the
scores fall and above which 50 percent fall.
 Midpoint: The point halfway between the highest
score and the lowest score on the test: (high + low) / 2
 Mean: The arithmetic average of the scores, $M = \frac{\sum X}{N}$
(The midpoint for NRT is the mean)
Dispersion
Dispersion: How the individual performances vary from the central
tendency.
 Range: The number of points between the highest score and the
lowest one, plus 1.
 Standard deviation (SD): A sort of average of the differences of
all scores from the mean (the square root of the sum of the
squared deviation scores divided by N – 1).
Deviation score: The score obtained by subtracting the
mean from each individual score ($X - M$). The mean of
these scores is always zero.
Dispersion (Cont.)
SD formula: $SD = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$
Divide by N – 1 for a sample
Divide by N for a whole population
SD is better than the range because it results from an
averaging process, which lessens the effect of extreme
scores not attributable to performance on the test.
Variance: The squared value of SD
Example
Score (X)   Mean (M)   X - M   (X - M)²
77          71          6      36
75          71          4      16
72          71          1      1
72          71          1      1
70          71         -1      1
65          71         -6      36
66          71         -5      25
Central tendency
Mode = 72
Median = 72
Midpoint = (77 + 65) / 2 = 71
Mean = (77 + 75 + 72 + 72 + 70 + 65 + 66) / 7 = 71
Dispersion
Range = 77 - 65 + 1 = 13
SD: $\sqrt{\frac{36 + 16 + 1 + 1 + 1 + 36 + 25}{7}} = \sqrt{\frac{116}{7}} \approx 4$ (dividing by N)
Variance: $s^2 \approx 4^2 = 16$
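The figures in this example can be checked with a few lines of Python. This is only a sketch using the seven scores from the table; the variable names are illustrative. Note that the lowest score in the table is 65, so the midpoint comes out at 71 and the range at 13, and that the SD of about 4 is obtained by dividing by N rather than N – 1.

```python
import statistics

# The seven scores from the example table.
scores = [77, 75, 72, 72, 70, 65, 66]

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
midpoint = (max(scores) + min(scores)) / 2

# Dispersion: the range counts both end points, hence the "+ 1".
score_range = max(scores) - min(scores) + 1

# Sum of squared deviation scores (X - M)^2.
sum_of_squares = sum((x - mean) ** 2 for x in scores)

# Dividing by N treats the group as a population; N - 1 treats it as a sample.
sd_population = (sum_of_squares / len(scores)) ** 0.5
sd_sample = (sum_of_squares / (len(scores) - 1)) ** 0.5
variance_population = sum_of_squares / len(scores)

print(mean, median, mode, midpoint, score_range)
print(round(sd_population, 2), round(sd_sample, 2))
```

With only seven scores the choice between N and N – 1 makes a visible difference, which is why the two SD values are computed separately.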
Example (cont.) (with raw score)
In the normal curve, mean, mode, midpoint, and median are all the
same.
Score 76: 50% +34.13% = 84.13% (Percentile: The total percentage
of students who scored equal to or below a given point in normal
distribution)
[Figure: normal curve with raw scores 60, 64, 68, 72, 76, 80, 84 on the x-axis (mean = 72, SD = 4)]
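The percentile for a raw score can be sketched from the normal curve directly. The function below is illustrative (its name is not from the source) and assumes a mean of 72 and an SD of 4, the values implied by the score axis of the figure; it uses the standard normal cumulative distribution, written with `math.erf`.

```python
import math

def percentile_in_normal_curve(raw_score, mean, sd):
    """Percentage of a normal distribution at or below raw_score."""
    z = (raw_score - mean) / sd
    # Standard normal CDF expressed through the error function.
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A raw score of 76 is one SD above the mean of 72.
print(round(percentile_in_normal_curve(76, 72, 4), 2))
```

This only holds to the extent that the score distribution really is normal; for small or skewed groups the percentile should be counted from the actual scores.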
Standardized tests: a) z scores
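 A z score: The raw score expressed in standard deviation units.
 Z score formula: $z = \frac{X - M}{SD}$
The mean of z scores is always zero.
The SD of z scores is always 1.
In practice, $-3 \le z \le +3$ for almost all scores.
Example: $z = \frac{70 - 72}{4} = -0.5$, i.e. half an SD below the mean.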
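A short sketch of the z score conversion, assuming the mean of 72 and SD of 4 used in the example. The helper names are illustrative. The second function converts a whole set of scores and can be used to confirm that the resulting z scores have a mean of zero and an SD of 1 (here the SD is computed by dividing by N).

```python
import statistics

def z_score(raw_score, mean, sd):
    """Express one raw score in standard deviation units."""
    return (raw_score - mean) / sd

def z_scores(raw_scores):
    """Convert a whole group of raw scores, using the group's own mean and SD."""
    mean = statistics.fmean(raw_scores)
    sd = statistics.pstdev(raw_scores)  # population SD (divides by N)
    return [(x - mean) / sd for x in raw_scores]

print(z_score(70, 72, 4))

group = z_scores([77, 75, 72, 72, 70, 65, 66])
print(round(statistics.fmean(group), 6), round(statistics.pstdev(group), 6))
```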
Standardized tests: a) z scores (Cont.)
 Three problems of z scores:
1. They are relatively small, ranging from -3 to +3.
2. They can turn out to be negative and positive.
3. They turn out to include several decimal places.
Reporting scores in the form of z scores can be demotivating
for the students.
To overcome these problems, z scores should be transformed
onto a standardized scale.
Standardized tests: b)T scores
Main formula of standardized scales (linear transformation
of z scores): standardized score = (new SD × z) + new mean
 T score formula: T = 10z + 50
Mean = 50, SD = 10, range = 20-80 (for -3 ≤ z ≤ +3)
 Example: raw score = 70, z score = -0.5
T score = 10 × (-0.5) + 50 = 45
Standardized tests: c) CEEB scores
 CEEB (College Entrance Examination Board) scores are the standardised
scale used for reporting tests such as the SAT, GRE, TOEFL, etc.
 CEEB formula: CEEB = 100z + 500
Mean = 500, SD = 100, range = 200-800 (for -3 ≤ z ≤ +3)
 Example: raw score = 70, z score = -0.5
CEEB score = 100 × (-0.5) + 500 = 450
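Both T scores and CEEB scores are the same linear transformation of z with a different mean and SD, which a small sketch makes explicit. The function names here are illustrative, not taken from the source.

```python
def rescale(z, new_mean, new_sd):
    """Linear transformation of a z score onto a scale with the given mean and SD."""
    return new_sd * z + new_mean

def t_score(z):
    return rescale(z, new_mean=50, new_sd=10)

def ceeb_score(z):
    return rescale(z, new_mean=500, new_sd=100)

z = -0.5  # raw score 70 with mean 72 and SD 4
print(t_score(z), ceeb_score(z))
```

Because the transformation is linear, the rank order of test takers and the shape of the distribution are unchanged; only the reporting scale differs.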
Item analysis
 Item facility/item easiness/item difficulty/facility index: The
statistic used to examine the percentage of students who
correctly answer a given item.
IF formula: $IF = N_{correct} / N_{total}$
 Item discrimination (ID): The degree to which an item
separates the students who performed well from those who
did poorly on the test as a whole.
ID formula: $ID = IF_{upper} - IF_{lower}$
Statistic   Range            Acceptable       Best
IF          0 ≤ IF ≤ 1       .3 ≤ IF ≤ .7     IF = .5
ID          -1 ≤ ID ≤ +1     ID ≥ .4          ID = 1
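The two item statistics can be sketched as follows. The response matrix is invented for illustration (rows are test takers, columns are items, 1 = correct), and the size of the upper and lower groups is an assumption, since the slides do not say how the groups are formed.

```python
def item_facility(responses, item_index):
    """Proportion of all test takers who answered the item correctly."""
    return sum(row[item_index] for row in responses) / len(responses)

def item_discrimination(responses, item_index, group_size):
    """IF of the top-scoring group minus IF of the bottom-scoring group."""
    ranked = sorted(responses, key=sum, reverse=True)  # rank by total score
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    if_upper = sum(row[item_index] for row in upper) / group_size
    if_lower = sum(row[item_index] for row in lower) / group_size
    return if_upper - if_lower

# Hypothetical data: six test takers, four dichotomously scored items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

for i in range(4):
    print(i, round(item_facility(responses, i), 2),
          item_discrimination(responses, i, group_size=2))
```

In practice the upper and lower groups are often taken as a fixed fraction of the cohort; whichever convention is used should be reported alongside the ID values.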
Reliability
 Reliability: Consistency of scores under different
circumstances.
 Reliability differs from scorability
 Reliability indicates the degree to which the observed
score and true score match.
 The observed score (X) is made up of the ‘true’ score of
an individual’s ability on what the test measures (T),
plus the error (E) that can come from a variety of
sources.
Threats to reliability (Lado)
1. Variation in conditions of administration: Fluctuation of scores over time,
in different places or under slightly different conditions (such as a
different room, or with a different invigilator)
2. The quality of the test itself: Problems with sampling what language to
test – as we can’t test everything in a single test. If a test consists of items
that test very different things, reliability is also reduced. This is because
in standardised tests any group of items from which responses are added
together to create a single score are assumed to test the same ability, skill
or knowledge. The technical term for this is item homogeneity.
3. Variability in scoring: If humans are scoring multiple-choice items they
may become fatigued and make mistakes, or transfer marks inaccurately
from scripts to computer records. However, there is more room for
variation when humans are asked to make judgments.
Calculating reliability
 The method we use to calculate reliability depends upon
what kind of error we wish to focus on.
 The notion of correlation is at the very center of the
notion of reliability.
 A reliability coefficient is calculated that ranges from 0
(randomness) to 1, and no test is ‘perfectly’ reliable.
There is always error of measurement.
Calculating reliability
1. Variation in conditions of administration
 The statistical technique of correlation used is Pearson Product
Moment Correlation.
 Assumptions: 1. Interval scale, 2. Independence: each pair of scores is
independent from all other pairs, 3. Normally distributed, 4. Linearity
 -1 ≤ r ≤ +1:
1. –1 : There is an inverse relationship between the scores
2. 0 : There is no relation between the two sets of scores
3. 1 : The scores are exactly the same on both administrations of the test.
The closer the result is to 1, the more test–retest reliability we have
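A test-retest coefficient of this kind can be sketched by computing the Pearson product-moment correlation by hand. The second set of scores below is invented purely for illustration, and the function name is an assumption.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviation scores.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

first_administration = [77, 75, 72, 72, 70, 65, 66]
second_administration = [75, 76, 70, 73, 69, 66, 64]  # hypothetical retest scores

r = pearson_r(first_administration, second_administration)
print(round(r, 2))
```

The function assumes both lists have some variance; if every test taker gets the same score on one administration, the denominator is zero and the correlation is undefined.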
Coefficient of determination
 Statistical significance is a necessary precondition for a
meaningful correlation but not sufficient in itself.
 Coefficient of determination is simply correlation
coefficient squared (r2), and represents the proportion of
overlapping variance between two sets of scores (i.e., as the
score on one test increases, so it increases proportionally on
the other test)
0 ≤ r ≤ .60: low (up to about one third overlapping variance)
.60 ≤ r ≤ .80: moderate (about one third to two thirds overlapping variance)
.80 ≤ r ≤ 1.00: high (about two thirds to complete overlapping variance)
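The link between the correlation and the overlapping variance is a single squaring step, sketched below. The bands are read here as cut points on r of .60 and .80 (whose squares are roughly one third and two thirds); that reading, and the choice to put a boundary value in the higher band, are assumptions.

```python
def overlapping_variance(r):
    """Coefficient of determination: proportion of shared variance."""
    return r ** 2

def describe_correlation(r):
    """Rough verbal label, assuming cut points of .60 and .80 on r."""
    size = abs(r)
    if size < 0.60:
        return "low"
    if size < 0.80:
        return "moderate"
    return "high"

for r in (0.30, 0.70, 0.90):
    print(r, round(overlapping_variance(r), 2), describe_correlation(r))
```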
2. The quality of the test itself (internal
consistency)
 Reliability is addressed in terms of homogeneity of items (they
must all be highly correlated).
 Requirements:
1. Parallelism: Two tests should be parallel (with the same means,
variances, and the same correlation with another well-established
measure of that construct)
2. Independence: The response to any specific item must be
independent of the response to any other item; put another way,
the test taker should not get one item correct because they have
got some other item correct. The technical term for this is the
stochastic independence of items.
 Statistics used: Split-half methods and methods based on item
variance
Split-half method
 Main procedure: Split the test into two equal halves, calculate the
correlation between the two halves.
1. Spearman-Brown split-half reliability estimate: Since reliability is
directly related to the length of a test, correct the correlation for
length via the Spearman-Brown correction formula (parallelism and
independence are required)
2. Guttman split-half reliability estimate (parallelism is not required
but independence is required)
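The Spearman-Brown procedure can be sketched as follows. The odd/even split and the response data are assumptions made for illustration; the correction itself is the standard Spearman-Brown formula for doubling test length, $r_{full} = \frac{2 r_{half}}{1 + r_{half}}$.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two paired lists of half-test scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def spearman_brown(r_half):
    """Correct a half-test correlation up to full test length."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(responses):
    """Odd/even split (an assumed way of halving), then Spearman-Brown."""
    first_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(first_half, second_half))

# Hypothetical data: six test takers, four dichotomously scored items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

print(round(split_half_reliability(responses), 3))
```

A different way of halving the test will generally give a different estimate, which is one motivation for the item-variance methods.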
Methods based on item variances
 Estimates based on item variances (parallelism and independence
are required)
1. Cronbach's coefficient alpha for dichotomously scored items
(scored 'right' or 'wrong')
2. K-R20 / K-R21
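An item-variance estimate can be sketched with the usual form of coefficient alpha, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum s_i^2}{s_X^2}\right)$, where k is the number of items, $s_i^2$ the variance of item i and $s_X^2$ the variance of total scores. The data are invented, and population variances (dividing by N) are an assumption; with items scored 0/1 and that choice, the result coincides with K-R20.

```python
import statistics

def cronbach_alpha(responses):
    """Coefficient alpha from a test-taker-by-item score matrix."""
    k = len(responses[0])
    # Variance of each item across test takers (population variance).
    item_variances = [
        statistics.pvariance([row[i] for row in responses]) for i in range(k)
    ]
    # Variance of the total scores.
    total_variance = statistics.pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: six test takers, four dichotomously scored items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

print(round(cronbach_alpha(responses), 3))
```

The same function works for partial-credit scores, since nothing in it requires the item scores to be 0 or 1.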
3. Variability in scoring (grading and
marking)
 Whatever rater is making the judgment should be a matter of
indifference to the test taker
 Inter-rater reliability: Our concern is with variation between raters
because some raters are more lenient than others, or some raters
may rate some test takers higher than others (perhaps because
they are familiar with the first language and are more sympathetic
to errors).
 Intra-rater reliability: Our concern is with variation within one
rater over time.
 Statistics: Cronbach’s alpha for partial credit judgments
Standard Error of Measurement (SEM)
 One of the most important tools in standardised testing is the standard
error of measurement.
 While the reliability coefficient tells us how much error there might be
in the measurement, it is the standard error of measurement that tells us
what this might mean for a specific observed score. It is therefore more
informative for interpreting the practical implications of reliability.
 SEM formula: $SEM = SD\sqrt{1 - r}$, where r is the reliability coefficient
 Confidence interval: SEM gives us a confidence interval around an
observed test score, which tells us by how much the true score may be
above or below the observed score that the test taker has actually got on
our test.
Example
SD = 4, r = .64, so $SEM = 4\sqrt{1 - .64} = 4 \times .6 = 2.4$
Raw score = 74, SEM = 2.4
Confidence     Band       True score
68%            ±1 SEM     71.6 ≤ true score ≤ 76.4
95%            ±2 SEM     69.2 ≤ true score ≤ 78.8
99%            ±3 SEM     66.8 ≤ true score ≤ 81.2
nearly 100%    ±4 SEM     64.4 ≤ true score ≤ 83.6
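The bands in this example follow mechanically from the SEM, as the sketch below shows for SD = 4, r = .64 and an observed score of 74. The function names are illustrative. Note that four SEMs on either side of 74 gives 64.4 to 83.6.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def true_score_band(observed, sem, n_sems):
    """Interval of n_sems standard errors either side of the observed score."""
    return observed - n_sems * sem, observed + n_sems * sem

sem = standard_error_of_measurement(sd=4, reliability=0.64)
for n_sems in (1, 2, 3, 4):
    low, high = true_score_band(74, sem, n_sems)
    print(n_sems, round(low, 1), round(high, 1))
```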
Reliability and test length
 In standardised tests with many items, each item provides a piece of
information about the ability of the test taker, therefore, as we increase
the number of items, the reliability will increase.
 Formula for looking at the relationship between reliability and test
length: $A = \frac{r_{AA}(1 - r_{11})}{r_{11}(1 - r_{AA})}$
A: The proportion by which you would have to lengthen the test to get the desired
reliability
$r_{AA}$: The desired reliability
$r_{11}$: The reliability of the current test.
 However, the best way to increase reliability is to produce better items
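The lengthening factor can be sketched with the rearranged Spearman-Brown relation $A = \frac{r_{AA}(1 - r_{11})}{r_{11}(1 - r_{AA})}$, using the symbols defined above. The function names and the example values (a current reliability of .64 and a target of .80) are illustrative assumptions.

```python
def lengthening_factor(current_reliability, desired_reliability):
    """Factor A by which the test must be lengthened to reach the target."""
    r11, raa = current_reliability, desired_reliability
    return (raa * (1 - r11)) / (r11 * (1 - raa))

def reliability_after_lengthening(current_reliability, factor):
    """Spearman-Brown prediction for a test lengthened by the given factor."""
    r11 = current_reliability
    return (factor * r11) / (1 + (factor - 1) * r11)

a = lengthening_factor(0.64, 0.80)
print(round(a, 2))
print(round(reliability_after_lengthening(0.64, a), 2))
```

The second function simply runs the relation in the other direction as a consistency check. Both assume the added items are of the same quality as the existing ones, which is why writing better items is usually the more efficient route.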
Relationships with other measures
 One key part of standardised testing: The comparison of
two measures of the same construct.
 If two different measures are highly correlated, this
provides evidence of validity. This aspect of external
validity is criterion-related evidence, or evidence that
shows one test is highly correlated with a criterion that
is already known to be a valid measure of its construct
(called evidence for convergent validity).
 Measurement as understood in Classical Test Theory
How to Add Button in Chatter in Odoo 18 - Odoo Slides
Celine George
 
114P_English.pdf114P_English.pdf114P_English.pdf
114P_English.pdf114P_English.pdf114P_English.pdf114P_English.pdf114P_English.pdf114P_English.pdf
114P_English.pdf114P_English.pdf114P_English.pdf
paulinelee52
 
Unit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptx
Unit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptxUnit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptx
Unit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptx
Mayuri Chavan
 
PUBH1000 Slides - Module 12: Advocacy for Health
PUBH1000 Slides - Module 12: Advocacy for HealthPUBH1000 Slides - Module 12: Advocacy for Health
PUBH1000 Slides - Module 12: Advocacy for Health
JonathanHallett4
 
MCQS (EMERGENCY NURSING) DR. NASIR MUSTAFA
MCQS (EMERGENCY NURSING) DR. NASIR MUSTAFAMCQS (EMERGENCY NURSING) DR. NASIR MUSTAFA
MCQS (EMERGENCY NURSING) DR. NASIR MUSTAFA
Dr. Nasir Mustafa
 
"Bridging Cultures Through Holiday Cards: 39 Students Celebrate Global Tradit...
"Bridging Cultures Through Holiday Cards: 39 Students Celebrate Global Tradit..."Bridging Cultures Through Holiday Cards: 39 Students Celebrate Global Tradit...
"Bridging Cultures Through Holiday Cards: 39 Students Celebrate Global Tradit...
AlionaBujoreanu
 
materi 3D Augmented Reality dengan assemblr
materi 3D Augmented Reality dengan assemblrmateri 3D Augmented Reality dengan assemblr
materi 3D Augmented Reality dengan assemblr
fatikhatunnajikhah1
 
Peer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docx
Peer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docxPeer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docx
Peer Assessment_ Unit 2 Skills Development for Live Performance - for Libby.docx
19lburrell
 
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptxU3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
Mayuri Chavan
 
MICROBIAL GENETICS -tranformation and tranduction.pdf
MICROBIAL GENETICS -tranformation and tranduction.pdfMICROBIAL GENETICS -tranformation and tranduction.pdf
MICROBIAL GENETICS -tranformation and tranduction.pdf
DHARMENDRA SAHU
 
MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)
MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)
MCQ PHYSIOLOGY II (DR. NASIR MUSTAFA) MCQS)
Dr. Nasir Mustafa
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 9 THEO ĐƠN VỊ BÀI HỌC - GLOBAL SUCCESS - CẢ NĂM (TỪ...
BÀI TẬP BỔ TRỢ TIẾNG ANH 9 THEO ĐƠN VỊ BÀI HỌC - GLOBAL SUCCESS - CẢ NĂM (TỪ...BÀI TẬP BỔ TRỢ TIẾNG ANH 9 THEO ĐƠN VỊ BÀI HỌC - GLOBAL SUCCESS - CẢ NĂM (TỪ...
BÀI TẬP BỔ TRỢ TIẾNG ANH 9 THEO ĐƠN VỊ BÀI HỌC - GLOBAL SUCCESS - CẢ NĂM (TỪ...
Nguyen Thanh Tu Collection
 
Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...
Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...
Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...
parmarjuli1412
 
Search Matching Applicants in Odoo 18 - Odoo Slides
Search Matching Applicants in Odoo 18 - Odoo SlidesSearch Matching Applicants in Odoo 18 - Odoo Slides
Search Matching Applicants in Odoo 18 - Odoo Slides
Celine George
 
Cyber security COPA ITI MCQ Top Questions
Cyber security COPA ITI MCQ Top QuestionsCyber security COPA ITI MCQ Top Questions
Cyber security COPA ITI MCQ Top Questions
SONU HEETSON
 

Practical Language Testing by Fulcher (2010)

  • 2. Two paradigms in educational measurement and language testing: 1) Norm-referenced testing (NRT): The meaning of a test score is derived from the position of an individual in relation to the group. It discriminates between test takers and separates them out (i.e., distributes them) very effectively. Decision making with norm-referenced tests involves value judgments about the meaning of scores in terms of the intended effect of the test.
  • 3. Two paradigms in educational measurement and language testing (Cont.): 2) Criterion-referenced testing: The aim is to make a decision about whether an individual test taker has achieved a pre-specified criterion, or standard, that is required for a particular decision context.
  • 4. What is a standardized test? A standardized test is a form of NRT that 1) requires all test takers to answer the same questions, or a selection of questions from a common bank of questions, in the same way; 2) is scored in a "standard" or consistent manner, which makes it possible to compare the relative performance of individual students or groups of students. The term is primarily associated with large-scale tests administered to large populations of students.
  • 5. Why testing is viewed as a 'science': The early scientific use of tests was initiated by the introduction of statistical analysis into testing during the First World War. Greenwood (1919): "When you can measure what you are speaking about and express it in numbers, you know something about it, but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind" (p. 186). Fulcher (2010): "tests, like scientific instruments, provide the means by which we can observe and measure consistencies in human ability".
  • 6. Why testing is viewed as a 'science' (Cont.): Shohamy (2001): "Testing is perceived as a scientific discipline because it is experimental, statistical and uses numbers. It therefore enjoys the prestige granted to science and is viewed as objective, fair, true and trustworthy" (p. 21); these are key features of the "power of testing". Lipman (1922): Strong trait theory is untenable. In fact, most of the traits or constructs that we work with are extremely difficult to define, and if we are not able to define them, measurement is even more problematic.
  • 7. The curve and score meaning: In NRT, the meaning of a score is directly related to its place in the curve of the distribution (a bell curve) from which it is drawn. [Figure: normal curve with the horizontal axis marked at -3SD, -2SD, -1SD, 0, +1SD, +2SD, +3SD]
  • 8. Central tendency: The most typical behavior of the group. Mode: the score that occurs most frequently (a distribution with two peaks is bimodal; with three peaks, trimodal). Median: the point below which 50 percent of the scores fall and above which 50 percent fall. Midpoint: the point halfway between the highest score and the lowest score on the test, (high + low) / 2. Mean: the arithmetic average of the scores; in a normal distribution it coincides with the midpoint.
  • 9. Dispersion: How the individual performances vary from the central tendency. Range: the number of points between the highest score and the lowest one, plus 1. Standard deviation (SD): a sort of average of the differences of all scores from the mean (the square root of the result of dividing the sum of the squared deviation scores by N - 1). Deviation score: the score obtained by subtracting the mean from each individual score (X - M); the mean of these scores is always zero.
  • 10. Dispersion (Cont.): SD formula: SD = √( Σ(X - M)² / (N - 1) ) for a sample; divide by N instead of N - 1 for a population group. SD is better than the range since it is the result of an averaging process and lessens the effects of extreme scores not attributable to performance on the test. Variance: the squared value of SD.
  • 11. Example
      Score   Mean   X-M   (X-M)²
      77      71      6     36
      75      71      4     16
      72      71      1      1
      72      71      1      1
      70      71     -1      1
      65      71     -6     36
      66      71     -5     25
      Central tendency: Mode = 72; Median = 72; Midpoint = (77 + 65)/2 = 71; Mean = (77 + 75 + 72 + 72 + 70 + 65 + 66)/7 = 71
      Dispersion: Range = 77 - 65 + 1 = 13; SD = √((36 + 16 + 1 + 1 + 1 + 36 + 25)/7) = √(116/7) ≈ 4; Variance = s² ≈ 4² = 16
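The worked example above can be checked with a short script. This is only a sketch using Python's standard library; the function name `describe` is an illustrative choice, and the SD divides by N (population form) because that is what the example does. The midpoint and range are computed from the highest score (77) and the lowest score in the list (65).

```python
from statistics import mean, median, mode, pstdev

# The seven scores from the example table.
scores = [77, 75, 72, 72, 70, 65, 66]


def describe(values):
    """Return the central tendency and dispersion statistics from the slides."""
    high, low = max(values), min(values)
    return {
        "mode": mode(values),
        "median": median(values),
        "midpoint": (high + low) / 2,   # halfway between highest and lowest
        "mean": mean(values),
        "range": high - low + 1,        # highest minus lowest, plus 1
        "sd": pstdev(values),           # divides by N, as in the example
    }


summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Swapping `pstdev` for `statistics.stdev` would give the N - 1 (sample) version of the SD.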
  • 12. Example (cont.) (with raw scores): In the normal curve, the mean, mode, midpoint, and median are all the same. [Figure: normal curve with the raw-score axis marked at 60, 64, 68, 72, 76, 80, 84] Score 76: 50% + 34.13% = 84.13%. (Percentile: the total percentage of students who scored at or below a given point in the normal distribution.)
  • 13. Standardized tests: a) z scores. A z score is the raw score expressed in standard deviation units. z score formula: z = (X - M) / SD. The mean of z scores is always zero and their SD is 1. In practice, -3 ≤ z ≤ +3. Example: z = (70 - 72)/4 = -0.5 SD.
  • 14. Standardized tests: a) z scores (Cont.). Three problems of z scores: 1. They are relatively small, ranging from -3 to +3. 2. They can turn out to be negative as well as positive. 3. They often include several decimal places. Reporting scores in the form of z scores can be demotivating for students. To overcome these problems, z scores should be transformed to a standardized scale.
  • 15. Standardized tests: b) T scores. Main formula of standardized scales (a linear transformation of z scores): standard score = (new SD × z) + new mean. T score formula: T = 10z + 50 (mean = 50, SD = 10, range = 10-90). Example: raw score = 70, z score = -0.5, T score = 10 × (-0.5) + 50 = 45.
  • 16. Standardized tests: c) CEEB scores. The CEEB (College Entrance Examination Board) scale is the standardized scale used for the SAT, GRE, TOEFL, etc. CEEB formula: CEEB = 100z + 500 (mean = 500, SD = 100, range = 100-900). Example: raw score = 70, z score = -0.5, CEEB score = 100 × (-0.5) + 500 = 450.
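The three score conversions above (z, T, and CEEB) are all linear transformations, so they can be sketched in a few lines. The function names here are illustrative, not taken from the slides; the mean of 72 and SD of 4 are the values used in the z score example.

```python
def z_score(raw, mean, sd):
    """Express a raw score in standard deviation units."""
    return (raw - mean) / sd


def t_score(z):
    """T scale: mean 50, SD 10."""
    return 10 * z + 50


def ceeb_score(z):
    """CEEB scale: mean 500, SD 100."""
    return 100 * z + 500


# Raw score 70 on a test with mean 72 and SD 4, as in the slides.
z = z_score(70, 72, 4)
print(z, t_score(z), ceeb_score(z))
```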
  • 17. Item analysis: Item facility (IF), also called item easiness, item difficulty, or facility index: the statistic used to examine the percentage of students who correctly answer a given item. IF formula: IF = Ncorrect / Ntotal. Item discrimination (ID): the degree to which an item separates the students who performed well from those who did poorly on the test as a whole. ID formula: ID = IF upper - IF lower.
      Statistic   Range           Acceptable       Best
      IF          0 ≤ IF ≤ 1      .3 ≤ IF ≤ .7     IF = .5
      ID          -1 ≤ ID ≤ +1    .4 ≤ ID          ID = 1
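As a rough illustration of the two item statistics above, the sketch below computes IF and ID for a single item. The response data are invented for the example (1 = correct, 0 = incorrect), and the split into an upper and a lower group is assumed to have been made already on the basis of total test scores.

```python
def item_facility(responses):
    """Proportion of test takers who answered the item correctly."""
    return sum(responses) / len(responses)


def item_discrimination(upper, lower):
    """IF of the upper-scoring group minus IF of the lower-scoring group."""
    return item_facility(upper) - item_facility(lower)


# Hypothetical responses to one item from five high scorers and five low scorers.
upper_group = [1, 1, 1, 0, 1]
lower_group = [0, 1, 0, 0, 0]

if_value = item_facility(upper_group + lower_group)
id_value = item_discrimination(upper_group, lower_group)
print(f"IF = {if_value:.2f}, ID = {id_value:.2f}")
```

With these invented data the item would sit at the "best" facility value and above the acceptable discrimination threshold in the table.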
  • 18. Reliability: Consistency of scores under different circumstances. Reliability differs from scorability. Reliability indicates the degree to which the observed score and the true score match. The observed score (X) is made up of the 'true' score of an individual's ability on what the test measures (T), plus the error (E) that can come from a variety of sources: X = T + E.
  • 19. Threats to reliability (Lado): 1. Variation in conditions of administration: fluctuation of scores over time, in different places, or under slightly different conditions (such as a different room, or with a different invigilator). 2. The quality of the test itself: problems with sampling what language to test, as we cannot test everything in a single test. If a test consists of items that test very different things, reliability is also reduced. This is because in standardised tests any group of items from which responses are added together to create a single score is assumed to test the same ability, skill or knowledge. The technical term for this is item homogeneity. 3. Variability in scoring: if humans are scoring multiple-choice items they may become fatigued and make mistakes, or transfer marks inaccurately from scripts to computer records. However, there is more room for variation when humans are asked to make judgments.
  • 20. Calculating reliability: The method we use to calculate reliability depends upon what kind of error we wish to focus on. The notion of correlation is at the very center of the notion of reliability. A reliability coefficient is calculated that ranges from 0 (randomness) to 1, and no test is 'perfectly' reliable. There is always error of measurement.
  • 21. Calculating reliability: 1. Variation in conditions of administration. The statistical technique of correlation used is the Pearson Product Moment Correlation. Assumptions: 1. Interval scale; 2. Independence: each pair of scores is independent from all other pairs; 3. Normally distributed scores; 4. Linearity. -1 ≤ r ≤ +1: r = -1: there is an inverse relationship between the scores; r = 0: there is no relation between the two sets of scores; r = 1: the scores are exactly the same on both administrations of the test. The closer the result is to 1, the more test-retest reliability we have.
  • 22. Coefficient of determination: Statistical significance is a necessary precondition for a meaningful correlation but is not sufficient in itself. The coefficient of determination is simply the correlation coefficient squared (r²), and represents the proportion of overlapping variance between two sets of scores (i.e., as the score on one test increases, so it increases proportionally on the other test). Interpreting the correlation: 0 ≤ r ≤ .60: low (up to about one third overlapping variance, since .60² = .36); .60 ≤ r ≤ .80: moderate (one third to two thirds overlapping variance, since .80² = .64); .80 ≤ r ≤ 1.00: high (two thirds to complete overlapping variance).
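The test-retest correlation and its coefficient of determination can be sketched as follows. The first list reuses the seven scores from the earlier example; the second administration is invented purely for illustration, and `pearson_r` is a plain implementation written for this sketch rather than a library call.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(ss_x * ss_y)


first_administration = [77, 75, 72, 72, 70, 65, 66]
second_administration = [75, 76, 70, 73, 69, 66, 64]  # hypothetical retest scores

r = pearson_r(first_administration, second_administration)
r_squared = r ** 2  # proportion of overlapping variance
print(f"r = {r:.2f}, r squared = {r_squared:.2f}")
```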
  • 23. 2. The quality of the test itself (internal consistency): Reliability is addressed in terms of the homogeneity of items (they must all be highly correlated). Requirements: 1. Parallelism: two tests should be parallel (with the same means, the same variances, and the same correlation with another well-established measure of that construct). 2. Independence: the response to any specific item must be independent of the response to any other item; put another way, the test taker should not get one item correct because they have got some other item correct. The technical term for this is the stochastic independence of items. Statistics used: split-half methods and methods based on item variance.
  • 24. Split-half method: Main procedure: split the test into two equal halves and calculate the correlation between the two halves. 1. Spearman-Brown split-half reliability estimate: since reliability is directly related to the length of a test, correct the correlation for length via the Spearman-Brown correction formula (parallelism and independence are required). 2. Guttman split-half reliability estimate (parallelism is not required but independence is required).
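The slide names the two split-half estimates without giving formulas. The sketch below uses their standard textbook forms, which are an addition here rather than something stated in the source: the Spearman-Brown correction r_full = 2r / (1 + r), and the Guttman estimate 2 × (1 - (var_a + var_b) / var_total). The half-test scores are invented for the example.

```python
from statistics import pvariance


def spearman_brown(r_half):
    """Correct a half-test correlation up to full test length."""
    return 2 * r_half / (1 + r_half)


def guttman_split_half(half_a, half_b):
    """Guttman split-half estimate from each test taker's two half scores."""
    totals = [a + b for a, b in zip(half_a, half_b)]
    return 2 * (1 - (pvariance(half_a) + pvariance(half_b)) / pvariance(totals))


# Hypothetical half-test scores (e.g. odd vs even items) for six test takers.
odd_half = [10, 8, 9, 6, 5, 7]
even_half = [9, 8, 10, 5, 6, 6]

print(f"Spearman-Brown for r = .60: {spearman_brown(0.60):.2f}")
print(f"Guttman split-half: {guttman_split_half(odd_half, even_half):.2f}")
```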
  • 25. Methods based on item variances: Estimates based on item variances (parallelism and independence are required): 1. Cronbach's coefficient alpha for dichotomously scored items (scored 'right' or 'wrong'); 2. K-R20 / K-R21.
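The slides list these coefficients without formulas, so the following is a sketch based on their standard definitions, not on the source: alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), and K-R20 = k/(k - 1) × (1 - Σpq / variance of total scores). The response matrix is invented (rows are test takers, columns are items). For right/wrong items the two coefficients coincide when population variances are used, which the example illustrates.

```python
from statistics import pvariance


def cronbach_alpha(matrix):
    """Coefficient alpha from a test-taker by item score matrix."""
    k = len(matrix[0])
    item_variances = [pvariance(column) for column in zip(*matrix)]
    total_variance = pvariance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def kr20(matrix):
    """K-R20 for dichotomous (0/1) items, using p * q as each item's variance."""
    k, n = len(matrix[0]), len(matrix)
    sum_pq = 0.0
    for column in zip(*matrix):
        p = sum(column) / n          # proportion answering correctly
        sum_pq += p * (1 - p)
    total_variance = pvariance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum_pq / total_variance)


# Hypothetical responses: five test takers, four right/wrong items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
kr20_value = kr20(responses)
print(f"alpha = {alpha:.2f}, K-R20 = {kr20_value:.2f}")
```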
  • 26. 3. Variability in scoring (grading and marking): Which rater makes the judgment should be a matter of indifference to the test taker. Inter-rater reliability: our concern is with variation between raters, because some raters are more lenient than others, or some raters may rate some test takers higher than others (perhaps because they are familiar with the first language and are more sympathetic to errors). Intra-rater reliability: our concern is with variation within one rater over time. Statistics: Cronbach's alpha for partial credit judgments.
  • 27. Standard Error of Measurement (SEM): One of the most important tools in standardised testing is the standard error of measurement. While the reliability coefficient tells us how much error there might be in the measurement, it is the standard error of measurement that tells us what this might mean for a specific observed score, which makes it more informative for interpreting the practical implications of reliability. SEM formula: SEM = SD × √(1 - r), where r is the reliability coefficient. Confidence interval: SEM gives us a confidence interval around an observed test score, which tells us by how much the true score may be above or below the observed score that the test taker has actually got on our test.
  • 28. Example: SD = 4, r = .64, SEM = 4 × √(1 - .64) = 4 × .6 = 2.4. Raw score = 74, SEM = 2.4. 68% (between -1 SEM and +1 SEM): 71.6 ≤ true score ≤ 76.4. 95% (between -2 SEM and +2 SEM): 69.2 ≤ true score ≤ 78.8. 99% (between -3 SEM and +3 SEM): 66.8 ≤ true score ≤ 81.2. Approximately 100% (between -4 SEM and +4 SEM): 64.4 ≤ true score ≤ 83.6.
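The SEM example above can be reproduced with a few lines of code. This is a minimal sketch; the function names are illustrative, and the inputs (SD = 4, r = .64, observed score = 74) are the ones used in the example.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r)."""
    return sd * sqrt(1 - reliability)


def confidence_band(observed, sem, n_sem):
    """Interval of n_sem standard errors either side of the observed score."""
    return observed - n_sem * sem, observed + n_sem * sem


sem = standard_error_of_measurement(4, 0.64)
for n_sem in (1, 2, 3, 4):
    low, high = confidence_band(74, sem, n_sem)
    print(f"+/-{n_sem} SEM: {low:.1f} to {high:.1f}")
```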
  • 29. Reliability and test length: In standardised tests with many items, each item provides a piece of information about the ability of the test taker; therefore, as we increase the number of items, the reliability will increase. Formula for the relationship between reliability and test length: A = rAA (1 - r11) / ( r11 (1 - rAA) ), where A is the proportion by which you would have to lengthen the test to get the desired reliability, rAA is the desired reliability, and r11 is the reliability of the current test. However, the best way to increase reliability is to produce better items.
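To make the test-length relationship concrete, the sketch below computes the lengthening factor A using the standard rearrangement of the Spearman-Brown prophecy formula, A = rAA (1 - r11) / (r11 (1 - rAA)), and then checks it by projecting the reliability of the lengthened test. The reliabilities .64 and .80 are example values chosen for illustration, and the function names are not from the slides.

```python
def lengthening_factor(current_reliability, desired_reliability):
    """Factor A by which the test must be lengthened to reach the desired reliability."""
    r11, raa = current_reliability, desired_reliability
    return raa * (1 - r11) / (r11 * (1 - raa))


def projected_reliability(current_reliability, factor):
    """Spearman-Brown projection of reliability for a test `factor` times as long."""
    r11 = current_reliability
    return factor * r11 / (1 + (factor - 1) * r11)


# Example values: current reliability .64, target reliability .80.
factor = lengthening_factor(0.64, 0.80)
print(f"Lengthen the test by a factor of {factor:.2f}")
print(f"Projected reliability: {projected_reliability(0.64, factor):.2f}")
```

A factor of 2.25 means a 40-item test would need roughly 90 items of comparable quality, which is why writing better items is usually the more practical route.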
  • 30. Relationships with other measures: One key part of standardised testing is the comparison of two measures of the same construct. If two different measures are highly correlated, this provides evidence of validity. This aspect of external validity is criterion-related evidence, or evidence that shows one test is highly correlated with a criterion that is already known to be a valid measure of its construct (called evidence for convergent validity). Measurement as understood in Classical Test Theory.