Part 1
Inter-rater reliability
Inter-rater reliability measures the extent to which different people or raters assessing the
same variable agree. When researchers collect data, they assign categories, ratings, or scores to
the variables under study (McDonald et al., 2019). People are generally subjective. Thus, their
judgment and perceptions of the same phenomena may differ. The aim of reliability in research
is to minimize subjectivity so that other researchers can replicate the study. When
measuring inter-rater reliability, a correlation is calculated between the sets of results provided
by the different raters.
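As an illustration of this calculation, the sketch below correlates two raters' scores in Python (the scores are invented, and the scipy library is assumed to be available):

    # Minimal sketch: inter-rater reliability as the correlation between
    # two raters' scores on the same subjects. All scores are invented.
    from scipy.stats import pearsonr

    rater_a = [4, 2, 5, 3, 1, 4, 5, 2]  # hypothetical scores from rater A
    rater_b = [5, 2, 4, 3, 2, 4, 5, 3]  # hypothetical scores from rater B

    r, p = pearsonr(rater_a, rater_b)
    print(f"Inter-rater correlation: r = {r:.2f} (p = {p:.3f})")

A high correlation suggests the two raters rank the subjects similarly; for categorical ratings, an agreement index such as Cohen's kappa can complement the correlation.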
For example, in a study, a total of 53 vocabulary rating pairs were evaluated (Stolarova et
al., 2014). The researchers first assessed inter-rater reliability across and within subgroups
using the intraclass correlation coefficient (ICC). The study involved four researchers, whose
observations were analyzed and correlated to minimize the subjectivity of individual raters.
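As a sketch of how such a coefficient can be computed, the code below implements the two-way random-effects ICC(2,1) of Shrout and Fleiss for four raters; the ratings matrix is invented and does not come from Stolarova et al. (2014):

    # Sketch: two-way random-effects ICC(2,1) for four raters.
    import numpy as np

    # Rows = rated targets, columns = raters. Invented ratings.
    X = np.array([
        [4, 5, 4, 5],
        [2, 2, 3, 2],
        [5, 5, 4, 5],
        [3, 4, 3, 3],
        [1, 2, 1, 2],
    ], dtype=float)

    n, k = X.shape
    grand = X.mean()

    # Mean squares from the two-way ANOVA decomposition.
    ss_rows = k * np.sum((X.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((X.mean(axis=0) - grand) ** 2)
    ss_error = np.sum((X - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc_2_1 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    print(f"ICC(2,1) = {icc_2_1:.2f}")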
Test-retest reliability/Repeated measures reliability
Test-retest reliability is also known as repeated measures reliability because it involves
administering the same test twice to a group of individuals. It is a measure of reliability where a
researcher aims to establish the consistency of results when one test is repeated on the same
group of individuals or sample. Good test-retest reliability demonstrates that a test is stable:
consistent results over a period of time show the test's ability to yield similar results over time.
According to Noble et al. (2019), the test-retest reliability of a test is influenced by the length of
the interval between test and retest as well as the dynamic nature of the construct that the
researcher is measuring.
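As a minimal sketch, test-retest reliability is often summarized as the correlation between the two administrations (the scores below are invented):

    # Minimal sketch: test-retest reliability as a correlation.
    from scipy.stats import pearsonr

    test_1 = [23, 31, 28, 19, 35, 27, 22, 30]  # hypothetical first administration
    test_2 = [25, 30, 27, 20, 34, 28, 21, 31]  # hypothetical retest

    r, _ = pearsonr(test_1, test_2)
    print(f"Test-retest reliability: r = {r:.2f}")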
For example, researchers in one study wanted to assess the perceived risk of falls among
older adults living with type 2 diabetes. The researchers used the Risk Perception
Questionnaire (RPQ) and were interested in the reliability of the RPQ in measuring the risk of
falls due to diabetes-related complications when administered via phone. The study involved a
group of 30 community-dwelling older adults, aged 55 years and older, who had been clinically
diagnosed with type 2 diabetes (Gravesande et al., 2019). During the first test, the researchers
measured physical activity, perceived risk of falling, and fear of falling. At least two days later,
they repeated the test and measured all three constructs. They then conducted the test a third
time after about six weeks, and the last test was repeated two days after the third test
(Gravesande et al., 2019). The aim was to measure the test-retest reliability of the RPQ. They
found that the test had higher reliability when conducted one on one than by phone.
Face validity
Face validity refers to the extent to which a test appears, at face value or subjectively, to
measure the construct that it purports to measure (Litvak et al., 2019). It simply refers to
whether a test 'looks like' it measures what it claims to measure, that is, the apparent relevance
of the test to the participants. It is a rather subjective measure of validity, and while it is
important, it cannot be used alone to draw a conclusion about a test's validity. This is because it
is determined based on whether a test looks like it can measure what it is supposed to measure
as opposed to whether it has actually been proven to work.
For example, in the study by Gravesande et al. (2019), I would determine the face
validity of the RPQ by looking at it and seeing whether it actually tests fear of falling,
perceived risk of falling, and physical activity. I would look at the terms used in the test. It
measures the risk of falling, so I would check whether it asks about the agility of the
participants, their fear or anxiety when walking on uneven terrain or when going up a
staircase, and their general level of fitness. If the RPQ uses these terms, then I would consider
it to have high face validity. If it uses completely different terms that are not related to the
perceived risk of falling, physical activity, or fear of falling, then I would consider such a test
to have low face validity.
Predictive validity
Predictive validity refers to the degree to which scores of a test provide an accurate
prediction of a criterion measure (Rogge et al., 2019), for example, the extent to which college
admission test scores accurately predict college GPA (grade point average). Predictive validity
is established by calculating the correlation coefficient between test scores and the target
behavior. This correlation enables researchers, policymakers, or scholars to predict the
probability of academic success in the future. For
instance, a student who scores highly on a college admission test is likely to achieve a high GPA.
In this case, their results in the admission test are used to predict their performance at the end of
college.
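A minimal sketch of this calculation, with invented admission scores and GPAs:

    # Minimal sketch: predictive validity as the correlation between a
    # predictor (admission test score) and a later criterion (college GPA).
    from scipy.stats import pearsonr

    admission_scores = [1200, 1350, 1100, 1450, 1280, 1180, 1400, 1320]
    college_gpa = [3.1, 3.6, 2.8, 3.9, 3.3, 3.0, 3.7, 3.4]

    r, _ = pearsonr(admission_scores, college_gpa)
    print(f"Predictive validity coefficient: r = {r:.2f}")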
For example, I would determine the predictive validity of an interview test by checking
the extent to which the results predict a candidate’s future performance in the workplace. An
interview test that contains questions relating to background, past situations, and behavior, as
well as job knowledge, would have high predictive validity if a candidate who scores highly
also performs well after getting the job.
Concurrent validity
Concurrent validity is the degree to which the results of a measure correlate with the results
of another, already established measure of the same construct taken within the same time frame
(Webber et al., 2020). The comparison measure may target the same construct or one closely
related to it. Thus, in determining the validity of a new test, the scores are assessed to see the
extent to which they correlate with scores of an already existing, validated test.
For example, I would determine the concurrent validity of a new scale by correlating test
scores from participants with scores from the same participants on a different scale whose
validity has already been established. Thus, I would administer the new scale to a group of
participants, then score the test. I would then administer the already existing scale to the same
group and score the test. Then, I would conduct a correlation of the two sets of scores. If
participants' scores on the two tests correlate weakly, the new test would have low
concurrent validity.
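A sketch of that procedure in code, with invented scores for both scales:

    # Minimal sketch: concurrent validity as the correlation between a new
    # scale and an established scale taken by the same participants within
    # the same time frame. All scores are invented.
    from scipy.stats import pearsonr

    new_scale = [14, 22, 18, 9, 25, 17, 12, 20]
    established_scale = [15, 21, 19, 10, 24, 18, 11, 22]

    r, _ = pearsonr(new_scale, established_scale)
    print(f"Concurrent validity coefficient: r = {r:.2f}")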
Part 2
In the study, Hammoudi (2020) used the Motivated Strategies for Learning Questionnaire
(MSLQ) to measure motivation as a construct. The researcher was interested in understanding
students' motivation in mathematics and self-concept in institutions of higher learning. First of
all, the MSLQ had face validity: the instrument contained terms such as academic self-efficacy,
learning goals orientation, and intrinsic motivation, so at face value it did appear to be
measuring motivation. Further, the researcher ensured the validity and reliability of the
instrument by looking at similar studies that had used it, citing several studies that confirm the
MSLQ is a highly reliable and valid instrument.
Hammoudi (2020) also used the Intrinsic Motivation Inventory (IMI), and just like in the
case of the MSLQ, the instrument has been proven valid and reliable in multiple past studies.
The instrument was tailored to meet the unique needs of the study: the researcher adapted some
of the questions from the original instrument, whose validity and reliability had already been
established. Also, by comparing results from other studies with those of the new study,
Hammoudi ensured that the instrument had concurrent validity. Hammoudi (2020) also used the
Self-Description Questionnaire (SDQ), adopting the instrument as it was used by the developer.
Its level of reliability is already established, with a median alpha of 0.89 (Hammoudi, 2020),
and research studies have also proven that the instrument has a very high level of both validity
and reliability. In sum, Hammoudi ensured the validity and reliability of the instruments used in
the study by adopting instruments whose validity and reliability had already been established,
or by adapting instruments and then ascertaining their validity by correlating them with the
originally developed ones or with results reported by previous scholars.
In a different study, Andrade and Rodríguez (2018) used the Profile of Mood States
questionnaire to collect data from a sample of 700 participants. The researchers wanted to
establish changes in mood over time and used Cronbach's alpha to estimate the reliability of the
research instrument. The instrument was found to be highly reliable, with values that ranged
between 0.77 and 0.87. Further, the researchers established the validity of the instrument by
comparing results from different previous studies. They adopted a tool that has been used in the
field for a long time; because many researchers have adopted it in their studies, it was easy to
establish its validity by comparing results and conclusions from previous studies.
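As a sketch of the statistic behind these reliability estimates, Cronbach's alpha can be computed from the item variances and the variance of the total score; the item responses below are invented:

    # Minimal sketch: Cronbach's alpha for a k-item questionnaire.
    # alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
    import numpy as np

    # Rows = participants, columns = questionnaire items. Invented data.
    items = np.array([
        [3, 4, 3, 5],
        [2, 2, 3, 2],
        [4, 5, 4, 4],
        [3, 3, 2, 3],
        [5, 4, 5, 5],
    ], dtype=float)

    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
    print(f"Cronbach's alpha = {alpha:.2f}")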
References
Andrade Fernández, E. M., & Rodríguez Salgado, D. (2018). Factor structure of mood over time
frames and circumstances of measurement: Two studies on the Profile of Mood States
questionnaire.
Gravesande, J., Richardson, J., Griffith, L., & Scott, F. (2019). Test-retest reliability, internal
consistency, construct validity and factor structure of a falls risk perception
questionnaire in older adults with type 2 diabetes mellitus: a prospective cohort
study. Archives of Physiotherapy, 9(1), 1-11.
Hammoudi, M. M. (2020). Measurement of students’ mathematics motivation and self-concept at
institutions of higher education: evidence of reliability and validity. International Journal
of Mathematical Education in Science and Technology, 51(1), 63-86.
Litvak, V., Jafarian, A., Zeidman, P., Tibon, R., Henson, R. N., & Friston, K. (2019, October).
There’s no such thing as a ‘true’ model: the challenge of assessing face validity. In 2019
IEEE International Conference on Systems, Man and Cybernetics (SMC) (pp. 4403-
4408). IEEE.
McDonald, N., Schoenebeck, S., & Forte, A. (2019). Reliability and inter-rater reliability in
qualitative research: Norms and guidelines for CSCW and HCI practice. Proceedings
of the ACM on Human-Computer Interaction, 3(CSCW), 1-23.
Noble, S., Scheinost, D., & Constable, R. T. (2019). A decade of test-retest reliability of
functional connectivity: A systematic review and meta-analysis. NeuroImage, 203,
116157.
Rogge, R. D., Daks, J. S., Dubler, B. A., & Saint, K. J. (2019). It’s all about the process:
Examining the convergent validity, conceptual coverage, unique predictive validity,
and clinical utility of ACT process measures. Journal of Contextual Behavioral
Science, 14, 90-102.
Stolarova, M., Wolf, C., Rinker, T., & Brielmann, A. (2014). How to assess and compare
inter-rater reliability, agreement and correlation of ratings: an exemplary analysis of
mother-father and parent-teacher expressive vocabulary rating pairs. Frontiers in
Psychology, 5, 509.
Webber, T. A., Critchfield, E. A., & Soble, J. R. (2020). Convergent, discriminant, and
concurrent validity of nonmemory-based performance validity
tests. Assessment, 27(7), 1399-1415.