Reliability in Psychological Testing

Reliability in psychology refers to how consistent and dependable a measurement or test is in producing the same results under consistent conditions. 

It's like having a reliable friend you can count on; in psychology, we want our tests to be dependable too. If a test is reliable, we can trust it to produce consistent results over time.

This reliability helps psychologists make better decisions based on the data they collect. Whether it's measuring intelligence, personality traits, or the effectiveness of a therapy, reliability ensures that the results are trustworthy and not just a matter of chance.

Reliability in Psychology Definition  

(Anne Anastasi)

"Reliability of a test refers to the consistency of scores obtained by the same person on the same test across different administrations and occasions."


 

Concept of Reliability in Psychology

Reliability in psychological testing means that the test consistently produces similar results under consistent conditions. 

In short, it is the repeatability of your measurement. A measure is considered reliable if a person obtains a similar score when given the same test twice.

It is important to remember that reliability is not measured; it is estimated.

Types of Reliability 

Test-Retest Reliability in Psychology

(Anne Anastasi)

"In the test-retest method, the reliability coefficient is simply the correlation between the scores obtained by the same persons on two administrations of the same test."

Test-retest is the more conservative method of estimating reliability. Simply put, the idea behind test-retest is that you should get the same score on test 1 as you do on test 2.

The three main components of the test-retest reliability method are as follows:

  • Implement your measurement instrument at two separate times for each subject. 
  • Compute the correlation between the two separate measurements.
  • Assume there is no change in the underlying condition (or trait you are trying to measure) between test 1 and test 2. 
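The three steps above can be sketched in Python. The score lists are hypothetical illustrative data, and the plain Pearson correlation is written out by hand to show the computation:

```python
# Test-retest reliability: correlate the same people's scores
# from two administrations of the same test.

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for six subjects, measured twice.
time1 = [12, 15, 11, 18, 14, 16]   # first administration
time2 = [13, 14, 10, 19, 15, 16]   # same people, second administration

retest_reliability = pearson_r(time1, time2)
```

A high correlation (close to 1) suggests the scores are stable across the two occasions, provided the underlying trait really did not change in between.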

Factors Affecting Test-Retest Reliability in Psychology

  • Interval between the two administrations
  • Experience (practice effects)
  • Errors due to the condition of the test takers
  • Errors due to uncontrolled test conditions

Parallel or Alternate Form Reliability in Psychology

In the parallel form procedure, two tests with equivalent content but different items are administered to the same examinee. These tests are matched in difficulty level but feature distinct questions. (Aiken)

Formulation of Second Form

The second form should contain the same number of items, and the items should be expressed in the same form and cover the same type of content.

The range and level of difficulty of the items should also be equal. Instructions, time limits, and illustrative examples should likewise be equivalent. (Anne Anastasi, Psychological Testing)

Internal Reliability in Psychology

Internal consistency estimates reliability by grouping questions in a questionnaire that measure the same concept.

For instance, you might create two sets of three questions assessing class participation. After gathering responses, you can analyze the correlation between these question sets to gauge the reliability of your measurement. 

This correlation indicates whether your instrument consistently captures the concept of class participation.

Split Half Reliability in Psychology

Cronbach's alpha, in effect, splits the questions on your instrument every possible way and computes correlation values for all the splits (a computer program does this part).

In the end, your computer output generates one number for Cronbach's alpha. Similar to a correlation coefficient, the closer it is to one, the higher the reliability estimate of your instrument. 

Cronbach's alpha is a less conservative estimate of reliability than test/retest.

I. Split-Half Reliability Steps

  1. Divide the test into equivalent halves.
  2. Compute a Pearson r between scores on the two halves of the test.
  3. Adjust the half-test reliability using the Spearman-Brown formula.

The Spearman-Brown formula is used to estimate how much a test's reliability will increase when the test is lengthened by adding parallel items:

rL = (L × r) / (1 + (L − 1) × r)

where L is the number of times longer the new test will be and r is the reliability of the original test. An estimate of the SEM for different test lengths can also be obtained using Cronbach's alpha (α).

If we compute one split-half reliability and then randomly divide the items into another set of split halves and recompute, we can keep doing this until we have computed all possible split half estimates of reliability. 

Cronbach's Alpha is the geek equivalent to the average of all possible split-half estimates (although that's not how we do it.)

In saying we compute all possible split-half estimates, it doesn't mean that each time we go and actually measure a new sample! Instead, we calculate all split-half estimates from the same sample.

Because we measured all of our sample on each of the six items, all we have to do is have the computer analyze the random subsets of items and compute the resulting correlations.

The figure shows several of the split-half estimates for our six item example and lists them as SH with a subscript. Just keep in mind that although Cronbach's Alpha is equivalent to the average of all possible split half correlations we would never actually calculate it that way.
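A minimal sketch of computing alpha directly from its defining formula, rather than by averaging split halves. The six-item score table is hypothetical illustrative data:

```python
# Cronbach's alpha from an items-by-respondents score table.
# Formula: alpha = k/(k-1) * (1 - sum(item variances) / variance(totals)).
from statistics import pvariance

def cronbach_alpha(items):
    """items: list of k lists, each holding one item's scores for all respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # each person's total score
    item_var_sum = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))

# Hypothetical scores: six items (rows) for six respondents (columns).
six_items = [
    [3, 4, 2, 5, 4, 1],
    [3, 5, 2, 4, 4, 2],
    [2, 4, 3, 5, 3, 1],
    [3, 4, 2, 4, 5, 2],
    [4, 4, 1, 5, 4, 1],
    [3, 3, 2, 5, 4, 2],
]
alpha = cronbach_alpha(six_items)
```

Like a correlation coefficient, the closer alpha is to one, the higher the reliability estimate; perfectly consistent items yield alpha = 1.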

Rulon's Split-half Method

  1. Split the test into two halves and create half-test scores.
  2. Compute the difference between the half-test scores for each examinee.
  3. Compute the variance of the differences and the variance of the total scores.
  4. Reliability estimate: r = 1 − (variance of differences / variance of total scores)
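Rulon's estimate can be sketched in Python; the half-test scores below are hypothetical:

```python
# Rulon's split-half method:
# reliability = 1 - variance(differences) / variance(totals).
from statistics import pvariance

def rulon(half1, half2):
    """half1, half2: each examinee's score on the two halves of the test."""
    diffs = [a - b for a, b in zip(half1, half2)]
    totals = [a + b for a, b in zip(half1, half2)]
    return 1 - pvariance(diffs) / pvariance(totals)

# Hypothetical half-test scores for six examinees.
odd_half = [10, 14, 8, 16, 12, 9]
even_half = [11, 13, 9, 15, 12, 10]
r_rulon = rulon(odd_half, even_half)
```

If the two halves agree perfectly, the differences have zero variance and the estimate is exactly 1.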

Split-half testing measures consistency by:

  • Dividing the test into two halves (usually at the mid-point, by odd/even items, randomly, or by some other method).
  • Administering the halves as separate tests.
  • Comparing the results from each half.

A problem with this is that the resultant tests are shorter and can hence lose reliability. Split-half is thus better suited to tests that are rather long in the first place.

Use the Spearman-Brown formula to correct for the shortened length, giving the correlation as if each half were full length:

r = (2 × rhh) / (1 + rhh)
(where rhh is the correlation between the two halves)

II. Kuder-Richardson Reliability or Coefficient Alpha

The appropriate use of this method requires that all items in the test be psychologically homogeneous; that is, every item should measure the same factor, or the same combination of factors in the same proportion, as every other item does.

(Frank S. Freeman) The Kuder-Richardson reliability, or coefficient alpha, is relatively simple to compute, being based on a single administration of the test. It assesses the inter-item consistency of a test by looking at two sources of error:

  • Adequacy of content sampling
  • Heterogeneity of the domain being sampled

It assumes that reliable tests contain more variance and are thus more discriminating. Higher heterogeneity leads to lower inter-item consistency.

For right/wrong scores (dichotomous items):

Formula No 20

Used where items vary in difficulty level:

r = (k / (k − 1)) × [1 − Σ pi qi / σ²]

Where:
k = total number of items
pi = proportion of persons answering item i correctly
qi = 1 − pi, the proportion answering item i incorrectly
σ² = variance of the total test scores
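KR-20 can be sketched in Python. The right/wrong response table is hypothetical 0/1 data:

```python
# Kuder-Richardson Formula 20 for dichotomous (0/1) items:
# r = k/(k-1) * (1 - sum(p_i * q_i) / variance(total scores))
from statistics import pvariance

def kr20(items):
    """items: list of k lists, each one item's 0/1 scores across all examinees."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]   # each examinee's total score
    pq_sum = sum((sum(it) / len(it)) * (1 - sum(it) / len(it)) for it in items)
    return k / (k - 1) * (1 - pq_sum / pvariance(totals))

# Hypothetical right/wrong responses: rows are items, columns are examinees.
responses = [
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
]
r_kr20 = kr20(responses)
```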

Formula No 21

Used for dichotomous (right/wrong) items when all items are assumed to be of roughly equal difficulty:

r = (k / (k − 1)) × [1 − M(k − M) / (k × σ²)]

(where M is the mean of the total test scores)

When items have three or more response options, the following coefficient alpha formula is used instead:

r = (k / (k − 1)) × [1 − (Σ Si²) / St²]

(where Si² is the variance of item i and St² is the variance of the total scores)

Equivalence of Results (Parallel Form)

This approach seeks reliability through equivalence between two versions of the same test, comparing results from each version (much like split-half).

It is better than test-retest as it can be done the same day (reducing variation).

Parallel versions are useful in situations such as graduate recruitment, where candidates may take the same test several times.

An adverse effect occurs where different groups score differently (potential racial or cultural bias).

This may require different versions of the same test, e.g. the MBTI for different countries.

Discussion

There are a number of procedural aspects that affect test reliability, including:

  • Test conditions
  • Inconsistent administrative practices
  • Variation in test marking
  • Application of an inappropriate norm group
  • Internal state of the test-taker (tired, etc.)
  • Experience level of the test-taker (e.g. if they have taken the test before)

There are at least 5,000 aptitude tests on the market at the moment. The types of question you can expect will depend on which aptitudes and abilities are needed in the job you are applying for.

Aptitude and ability tests are classified as maximum performance tests because they test what you can achieve when you are making maximum effort. 

There are two different styles of maximum performance test; speed tests and power tests.

III. Speed and Power Tests

Speed Test
"A pure speed test is one in which individual differences depend entirely on speed of performance." In a speed test the scope of the questions is limited and the methods you need to use to answer them are clear.

Taken individually, the questions appear relatively straightforward. Speed tests are concerned with how many questions you can answer correctly in the allotted time.

For example:

139 + 235=

A) 372 B) 374 C) 376 D) 437

Power Test
"In a power test, the time allowed is sufficient, but the difficulty level of the items is so high that no one can solve all of them." A power test, on the other hand, will present a smaller number of more complex questions.

The methods you need to use to answer these questions are not obvious, and working out how to answer the question is the difficult part. Once you have determined this, arriving at the correct answer is usually relatively straightforward.

In summary, speed tests contain more items than power tests although they have the same approximate time limit. Speed tests tend to be used in selection at the administrative and clerical level. Power tests tend to be used more at the graduate, professional or managerial level. 

Although this is not always the case, speed tests do give a reasonable indication of likely performance in power tests. In other words, if you do well in speed tests, you will tend to do well in power tests as well.

These speed and power definitions apply only to maximum performance tests like aptitude and ability tests and not to personality tests.


Factors Affecting/Influencing Reliability
  • Length of the test
  • Characteristics of the population (a test given to a more heterogeneous population will show higher reliability than one given to a homogeneous population)
  • Characteristics of the test itself
  • Method of estimation used
  • Range of age
  • Time interval
