Step 1 · Initial Discussion
The need for new forms of existing examinations or new exams is considered by the Examination Committee.
Step 2 · Review of Literature
A review of recent literature is conducted which may include journal articles, content covered by the most commonly used textbooks from the discipline, guidelines of state and other professional bodies, critical clinical activities as described by the discipline and curricula taken from programs around the country.
Step 3 · Establish Test Outline
A general test plan is developed based on the review of literature.
Step 4 · Validity of Test Plan
Content experts from the discipline review the test plan for face validity.
Step 5 · Assignment of Test Items
Test item writers with content expertise are identified. Submitted items are reviewed for cognitive level, formatting and grammar.
Step 6 · Formulation of Exam
The items are then formulated into an exam based on the test plan.
Step 7 · Alpha/Beta Testing of Items
Alpha testing is performed with students from the discipline. Changes in items may be made based on initial testing results. The exam is then beta tested as a whole. Data is collected on performance of each item and the exam as a whole. Normative findings may then be applied to the examination.
Step 8 · Annual Review of Examination Performance
Reliability
The reliability of an instrument is described by the accuracy, consistency, dependability and stability of the response set (answers) generated by the items of the instrument. Reliability is defined in terms of the errors of measurement contained within the instrument and indicates whether the instrument will generate reproducible data. In other words, reliability quantifies random error. Reliability data is reported as a coefficient. A reliability coefficient normally falls between zero (0) and one (1), with higher values indicating less random error.
There are four common tests for reliability: Cronbach’s alpha, split-half reliability, inter-rater reliability and the Kuder-Richardson 20 (KR-20). The KR-20 is most useful for dichotomous data (true/false; yes/no) in the response set.
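As an illustration of two of the measures named above, the following is a minimal sketch of Cronbach's alpha and the KR-20 computed from a small score matrix. The function names and the sample data are hypothetical and are not taken from ERI's procedures. The sketch uses population variances, under which the two coefficients coincide for dichotomous (0/1) items.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: one row per student, one column per item (numeric item scores)."""
    k = len(scores[0])
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

def kr20(scores):
    """scores: one row per student, one column per item (1 = correct, 0 = incorrect)."""
    k = len(scores[0])
    n = len(scores)
    # p is the proportion of students answering each item correctly
    p = [sum(row[i] for row in scores) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_pq / total_variance)

# Hypothetical response set: 5 students, 4 dichotomous items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(round(kr20(responses), 3))            # 0.8
print(round(cronbach_alpha(responses), 3))  # 0.8
```

Both functions assume the total scores are not all identical, since a total variance of zero would make the coefficient undefined.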
There are generally accepted parameters for the assessment of reliability. Reliability is considered satisfactory when the coefficient reaches the .70 level. Reliability coefficients above the .80 level are considered good. Coefficients above the .90 level are deemed excellent.
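The parameters above can be expressed as a simple lookup. This is a sketch only: the function name is hypothetical, and the handling of values exactly at a boundary (for example, treating exactly .70 as satisfactory) is an assumption, since the text does not specify it.

```python
def reliability_band(coefficient):
    """Map a reliability coefficient to the descriptive levels given in the text."""
    if coefficient > 0.90:
        return "excellent"
    if coefficient > 0.80:
        return "good"
    if coefficient >= 0.70:  # assumption: exactly .70 counts as satisfactory
        return "satisfactory"
    return "below satisfactory"

print(reliability_band(0.85))  # good
```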
Statistical Review Procedures
Annual item analysis procedures are performed on response sets for all of the ERI examinations. Point biserial correlations are assessed for each test item. Items with a negative point biserial are reviewed for accuracy, clarity, content and appropriateness for the clinical area of concern. Items are also reviewed for the response rate (percentage of students) for each response option, including distractors. Questions with very high rates of correct responses are removed and replaced as necessary by the review staff. Test items on examinations are reviewed for spelling, grammar and context accuracy. Changes in items are made as needed. Typically, no more than ten percent of an examination is revised in a given year.
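The two item statistics described above, the point biserial correlation and the response rate for each option, can be sketched as follows. The function names and sample data are hypothetical. The sketch also assumes that the total score includes the item being analyzed, which the text does not specify.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev

def point_biserial(item_correct, total_scores):
    """Correlation between a 0/1 item score and the total exam score."""
    n = len(item_correct)
    p = sum(item_correct) / n  # proportion answering the item correctly
    sd = pstdev(total_scores)
    if p in (0.0, 1.0) or sd == 0:
        return None  # undefined when all students score the same way
    mean_correct = mean(t for c, t in zip(item_correct, total_scores) if c == 1)
    mean_incorrect = mean(t for c, t in zip(item_correct, total_scores) if c == 0)
    return (mean_correct - mean_incorrect) / sd * sqrt(p * (1 - p))

def option_rates(chosen_options):
    """Percentage of students selecting each response option."""
    counts = Counter(chosen_options)
    n = len(chosen_options)
    return {option: 100 * count / n for option, count in sorted(counts.items())}

# Hypothetical data: 5 students, their total scores, and one item's results
totals = [4, 3, 2, 1, 0]
item_correct = [1, 1, 1, 0, 0]
chosen = ["A", "A", "A", "B", "C"]  # "A" is the keyed answer in this example

r = point_biserial(item_correct, totals)
print(round(r, 3))           # 0.866
print(option_rates(chosen))  # {'A': 60.0, 'B': 20.0, 'C': 20.0}
```

An item where stronger students tend to answer incorrectly would produce a negative value from `point_biserial`, which is the condition the procedure singles out for review.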
It is ERI policy that any examination with a Cronbach’s alpha reliability coefficient of less than .80 receives careful review and scrutiny.
Validity
The validity of an instrument or test is the degree to which it measures the concept or construct that it is supposed to measure. Forms of validity include content, construct and criterion-related. Content validity may be established by face validity. This method involves the use of experts to develop instrument items or to review items to ensure the accuracy and relevance of each item to the instrument as a whole. Content and construct validity may be supported by sampling validity. For sampling validity, an expert panel establishes how well the instrument samples the content area in question. Criterion-related validity is supported by how well the instrument scores relate to an established criterion. Criterion-related validity may also be predictive in nature.
ERI uses multiple resources for test development including commonly used nursing textbooks, current literature and, very importantly, the current NCLEX® test plans for RN (2004) and PN/VN (2005) licensure. Validity of exit assessments is established annually by comparison of exit exam results with NCLEX® results.
Content Validity
Content validity of ERI testing is established by test development and review procedures. Nursing content experts for each content area are responsible for surveying recent literature and textbooks for item development and/or review. The experts also review guidelines of state and professional licensing bodies, critical activity studies done by the National Council of State Boards of Nursing, textbooks of American schools of nursing and nursing content reflected by curricula of associate degree, diploma and baccalaureate programs (for RN testing) and practical/vocational schools of nursing (for PN/VN testing). A blueprint or general test plan is then generated based on the findings from these sources. The test plan may be reviewed and revised as indicated to reflect current educational practices. Item writers are then selected, based on content area expertise, to create new items and/or review current test items.
Annual review of point biserial correlations of each test and all items guides the review of test items for additional support of content and construct validity. Items with correlations less than the desired level (.15) are reviewed for content accuracy and relevance. Items are then corrected or removed from the examination.
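The screening rule above can be sketched as a simple filter over already computed correlations. The function name, item identifiers and values are hypothetical; only the .15 level comes from the text.

```python
DESIRED_MINIMUM = 0.15  # desired point biserial level stated in the text

def items_for_review(point_biserials, minimum=DESIRED_MINIMUM):
    """Return IDs of items whose correlation is below the desired level, lowest first."""
    flagged = [(r, item_id) for item_id, r in point_biserials.items() if r < minimum]
    return [item_id for r, item_id in sorted(flagged)]

# Hypothetical correlations for four items
correlations = {"Q1": 0.32, "Q2": 0.08, "Q3": -0.11, "Q4": 0.15}
print(items_for_review(correlations))  # ['Q3', 'Q2']
```

Because the text says "less than the desired level," an item at exactly .15 is not flagged in this sketch.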
All new test items and examinations by ERI are alpha and then beta tested on students on a national basis. All levels of nursing education are considered and used for this preliminary testing of items. These procedures allow for review and/or correction of unclear items.
Question Clarity
The questions on ERI examinations are not designed to be confusing or tricky for the test-taker. The intent is to make the questions as clear as possible. Items should discriminate between those who know the material and those who do not. The ongoing review and editing of test items focuses on removing possible sources of confusion while measuring the intended nursing content. Items are also reviewed for any potential bias that may have been introduced. Each question is designed to discriminate among students on their ability to identify nursing process and content. Many items, designed to test essential knowledge, also test critical thinking, application and clinical decision-making ability.
Construct Validity
In the annual review of exams and test items, content experts will review all ERI tests and test responses from the national database. The current NCLEX® test plans for RN (2004) and PN (2005) will guide the test plan. Content areas identified in the test plans will be addressed with the Comprehensive Achievement Profile (CAP) examinations.
Predictive Validity
ERI has an ongoing program of research dedicated to the establishment of predictive validity of the exit examinations (RN Assessment, PreRN and LPN Assessment). Annually, NCLEX® pass/fail data is collected from client schools and compared with student scores on the examinations. These exit examinations are now available in shortened form (100-120 questions). Current plans for research include the establishment of the criterion-related (predictive) validity of these shortened forms.
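The text does not state how exit exam scores are compared with NCLEX® outcomes. One possible comparison, shown here purely as an illustrative sketch, is to classify students by a cut score and tabulate agreement with actual pass/fail results. The function name, the cut score of 70 and the sample data are all hypothetical.

```python
def predictive_summary(exam_scores, nclex_passed, cut_score):
    """Compare predictions from a cut score with actual pass/fail outcomes."""
    predicted = [score >= cut_score for score in exam_scores]
    pairs = list(zip(predicted, nclex_passed))
    true_pass = sum(p and a for p, a in pairs)            # predicted pass, passed
    false_pass = sum(p and not a for p, a in pairs)       # predicted pass, failed
    true_fail = sum(not p and not a for p, a in pairs)    # predicted fail, failed
    false_fail = sum(not p and a for p, a in pairs)       # predicted fail, passed
    predicted_pass_total = true_pass + false_pass
    return {
        "true_pass": true_pass,
        "false_pass": false_pass,
        "true_fail": true_fail,
        "false_fail": false_fail,
        "agreement": (true_pass + true_fail) / len(pairs),
        # share of predicted passers who actually passed; None if no one was predicted to pass
        "pass_rate_when_predicted_pass": (
            true_pass / predicted_pass_total if predicted_pass_total else None
        ),
    }

# Hypothetical exit exam scores and NCLEX outcomes for six students
scores = [82, 75, 68, 90, 55, 71]
passed = [True, True, False, True, False, False]

summary = predictive_summary(scores, passed, cut_score=70)
print(summary["pass_rate_when_predicted_pass"])  # 0.75
```

The same tabulation could be run separately on the full-length and shortened forms to compare how well each predicts licensure outcomes.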