I came up with this metaphor four or five years ago, while helping my master's supervisor with his book, and I still think it might be a good one. So I have extracted the passage I wrote (reviewed, revised and edited by Prof Li) here.
Validity can be classified into different types depending on the classification criteria used. In general, it is divided into face validity, content validity, construct validity and criterion validity.
To illustrate these four concepts with a simple example, suppose you want to measure people’s English proficiency and create an “English proficiency scale” (commonly known as an “English test” 🙂) for your subjects to fill out. If the scale consists entirely of math questions like “1+1=?” and not a single English word appears on it, then a “first glance” tells you that it is not measuring English proficiency. A scale that does not “look like” it measures English proficiency lacks face validity. Face validity is the degree to which a measure, on its surface, appears to match the concept you want to measure.
Content validity is the degree to which a measure covers the concept completely. Suppose the English proficiency scale no longer contains “1+1=?” questions but consists only of reading comprehension questions from beginning to end. According to the usual “listening, speaking, reading, and writing” breakdown, fully reflecting a person’s English ability may also require examining listening, speaking, and writing skills. Therefore, if only reading ability is measured, the measurement of the concept is clearly incomplete and the scale does not have good content validity.
Construct validity refers to the degree to which the relationship between the measurement of a concept and the measurement of other concepts is consistent with established theoretical expectations. Suppose research has shown a positive relationship between vocabulary size and English proficiency. If the results of the “English proficiency scale” you have constructed are also positively correlated with the subjects’ vocabulary size, as theory predicts, then the scale can be considered to have convergent validity, one form of construct validity.
Criterion validity refers to the degree to which the results of a measure correlate with established criteria or important behaviors outside the measure. For example, there are many well-established measures of English proficiency, such as CET-4 and CET-6, TOEFL, IELTS, and so on. You can use scores on these tests to check the criterion validity of the English proficiency scale you have developed. Construct validity and criterion validity are complex concepts: the former can be divided into convergent validity and discriminant validity, and the latter into concurrent validity and predictive validity. They are not elaborated here; if you are interested, please refer to other books.
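As a rough illustration that is not part of the book extract, the correlation checks described for convergent and criterion validity could be sketched in Python as follows. The function name `pearson_r`, the choice of the Pearson coefficient, and all the scores are assumptions made up for this example; a real validation study would use actual subject data and report significance as well.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for six subjects, invented purely for illustration.
new_scale = [52, 61, 70, 74, 83, 90]               # the new "English proficiency scale"
ielts_band = [5.0, 5.5, 6.0, 6.5, 7.0, 8.0]        # an established external criterion
vocabulary = [3000, 4200, 5000, 5600, 7000, 8500]  # a theoretically related concept

criterion_r = pearson_r(new_scale, ielts_band)
convergent_r = pearson_r(new_scale, vocabulary)
print(f"criterion validity evidence:  r = {criterion_r:.2f}")
print(f"convergent validity evidence: r = {convergent_r:.2f}")
```

A strong positive `r` in both cases would count as evidence for criterion and convergent validity respectively. Discriminant validity would instead expect a weak correlation with a concept that theory says is unrelated.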
(Extracted from Li, W. et al. (2020) 学位论文写作与学术规范 [Introduction to Thesis Writing & Academic Norms] (2nd Edition). Beijing, China: Peking University Press.)
(Originally translated with http://www.DeepL.com/Translator (free version). Revised by Ai Pengya.)
依据不同的划分标准有不同的类型,通常来说,效度分为表面效度、内容效度、建构效度和效标效度。
举一个浅显的例子说明这四个概念,假设你想要测量人们的英语能力,于是创建了一份“英语能力量表”(俗称“出了一份英语试卷”)让被试填写。如果这份量表中都类似是“1+1=?”的题目,从头至尾都没有出现一个英文单词,那么“乍一看”这份量表就知道并不是在测量英语能力,也就是没有表面效度——表面效度衡量的是测量结果在“表面上”与我们想要测量的概念的吻合程度。内容效度考察的是测量所能反映概念的完整程度。现在这份“英语能力量表”中没有了“1+1=?”,但是从头到尾都只有阅读理解题目。按照一般“听说读写”的说法,要全面反映一个人的英语能力,可能还需要对听力、口语、写作能力的考察。因此,如果只测量阅读能力,显然对于概念的测量是不够完整的,也就不具备良好的内容效度。
建构效度是指对某个概念的测量结果与其他概念的测量结果之间的关系,与既有理论预期的一致性程度。假设已有研究表明词汇量与英语能力呈正相关关系,若用你构建的“英语能力量表”所测量得到的结果与被试的词汇量同样呈现正相关的关系,符合理论预期,那么就可认为这个“英语能力量表”具有建构效度中的聚合效度。效标效度是指对某个概念的测量结果与该测量外的既定标准或重要行为的相关程度。例如现有针对英语能力已有很多成熟的衡量标准,四、六级,托福,雅思等等,你就可以利用这些考试的分数来验证你所开发的“英语能力量表”的效标效度。由于建构效度和效标效度的概念较为复杂,前者又区分为聚合效度和区分效度,后者又可细化为同步效度和预测效度,这里不再展开阐述。
摘录自 李武 等 (2020)《学位论文写作与学术规范》 北京大学出版社