Diane Smith


Washington, DC (202) 408-9514

Ronald M. Hager

National AT Advocacy Project

a project of Neighborhood Legal Services, Inc. · Buffalo, New York · (716) 847-0650


I. Legal Standards for Evaluations

A. Initial Evaluation

1. Developing the IEP begins with a comprehensive, individual evaluation. East Penn School District v. Scott B., 29 IDELR 1058 (E.D. Pa. 1999).

2. The evaluation is to assist the IEP Team in determining whether the student has a disability and, if so, to determine the educational needs of the child and develop the IEP. 20 U.S.C. § 1414(a)(1)(B); 34 C.F.R. § 300.532(b)(2).

3. The evaluation is to include a review of existing data, including that provided by the parent, and current classroom-based assessments, as well as observations by teachers and related services providers. 20 U.S.C. § 1414(c)(1); 34 C.F.R. § 300.533(a).

4. It must assess the relative contribution of cognitive, behavioral, physical and developmental factors and obtain information about the student’s prospects for participating in the general curriculum. 20 U.S.C. § 1414(b)(2).

5. No single procedure may be used as the sole criterion for determining the child’s needs and evaluations must include those tailored to assess specific areas of educational need and not just those designed to provide a single general intelligence quotient. 34 C.F.R. § 300.532(d) & (f).

6. The child must be assessed in all areas of suspected disability to determine the present levels of performance and the educational needs of the child, including, if appropriate, health, vision, hearing, social and emotional status, general intelligence, academic performance, communicative status and motor abilities. 20 U.S.C. §§ 1414(b)(3)(C) and 1414(c)(1)(B)(ii); 34 C.F.R. § 300.532(g).

7. The evaluation must be sufficiently comprehensive to identify all of the child’s special education and related services needs, whether or not commonly linked to the disability category in which the child has been classified. 34 C.F.R. § 300.532(h).

8. The evaluation materials may not be racially or culturally discriminatory. They must be administered in the child’s native language or other mode of communication "unless it is clearly not feasible to do so." 20 U.S.C. § 1414(b)(3)(A).

9. Assessments of students with limited English proficiency must be administered to ensure they measure whether the student has a disability, rather than measuring the child’s English language skills. 34 C.F.R. § 300.532(a)(2).

10. Tests must be selected and administered so as best to ensure that if a test is administered to a child with impaired sensory, manual, or speaking skills, the test results accurately reflect the child's aptitude or achievement level or whatever other factors the test purports to measure, rather than reflecting the child's impaired sensory, manual, or speaking skills (unless those skills are the factors that the test purports to measure). 34 C.F.R. § 300.532(e).

11. A child may not be determined to be eligible if the determinant factor is lack of instruction in reading or math, or limited English proficiency, and the child does not otherwise meet the eligibility criteria under § 300.7. 34 C.F.R. § 300.534(b).

12. When interpreting evaluation materials to determine eligibility and educational need, the District must draw upon information from a variety of sources, including aptitude and achievement tests, parent input, teacher recommendations, physical condition, social or cultural background, and adaptive behavior. 34 C.F.R. § 300.535(a)(1).

13. For an AT evaluation, the district must assess "the student’s functional capabilities and whether they may be increased, maintained, or improved through the use of [AT] devices or services." OSEP Policy Letter to J. Fisher, 23 IDELR 565 (12/4/95).

B. Independent Evaluation at District Expense

1. If the parents disagree with the evaluation obtained by the school, they may request an independent evaluation at school expense. 34 C.F.R. § 300.502(b).

2. Requests for independent evaluations should be submitted before obtaining the evaluation, but this is not required. OSEP Policy Letter to Hon. J. Fields, 2 EHLR 213:259 (1989).

3. Independent means the examiner must be qualified and not employed by the public agency which is responsible for educating the child. 34 C.F.R. § 300.502(a)(3)(i).

4. The evaluation must meet the same criteria used by the public agency when it conducts an evaluation. 34 C.F.R. § 300.502(e)(1).

5. The school may ask the parents why they disagree with the school’s evaluation, but cannot require an explanation. 34 C.F.R. § 300.502(b)(4).

6. In either event, the school must, without unreasonable delay, either agree to pay for the independent evaluation or initiate a hearing to show its evaluations were appropriate. 34 C.F.R. § 300.502(b)(2).

7. A parent has the right to an independent AT evaluation, at school expense, if the parent disagrees with the evaluation obtained by the school, and the school fails to show that its evaluations were appropriate. OSEP Policy Letter to J. Fisher, 23 IDELR 565 (12/4/95).

C. Reevaluations

1. Reevaluations of the student must be conducted at least every three years, and more frequently if conditions warrant or if the teacher or parent requests. 20 U.S.C. § 1414(a)(2)(A).

2. "As... part of any reevaluation under Part B of the Act, a group that includes the individuals described in Section 344 [ the required individuals for an IEP team], and other qualified professionals, as appropriate shall–

(1) Review existing evaluation data on the child, including...evaluations and information provided by the parents...Current class-room based assessments and observations... Observations by teachers and related services providers... and

(2) On the basis of that review and input from the child’s parents, identify what additional data, if any, are needed to determine–

(iv) Whether any additions or modifications to the special education and related services are needed to enable the child to meet the measurable annual goals set out in the IEP of the child and to participate, as appropriate, in the general curriculum." 34 C.F.R. § 300.533(a)(1) and (2)(iv).

II. Interpreting Test Data

A. Introduction

1. Excerpted from: UNDERSTANDING TESTS AND MEASUREMENTS for the Parent and Advocate, by Peter W. D. Wright, Esq. and Pamela Darr Wright, M.A., M.S.W., Licensed Clinical Social Worker (1998), available at

2. "If something exists, it exists in some amount. If it exists in some amount, then it is capable of being measured." Rene Descartes, Principles of Philosophy, 1644

3. Critical educational decisions are often based on the subjective beliefs of parents and educators. Without objective information, both sides will take positions that are based upon emotions and tempered by hopes and fears.

B. Objective Measurement of Progress

1. The IEP must include: appropriate objective criteria and evaluation procedures and schedules for determining, on at least an annual basis, whether the short term instructional objectives are being achieved. 34 C.F.R. § 300.346.

2. For example, "Improve total reading level from the 5.4 grade level to the 5.8 grade level as measured by the Woodcock Reading Mastery Test." "Improve math skills from the 6.4 grade equivalent to the 6.8 grade equivalent as measured by the Key Math Diagnostic Test."

3. Instead of developing an IEP that will measure progress in reading on a specific objective test, the special education staff may come up with a goal such as: "Johnny will make measurable progress in reading, as measured by teacher observation and teacher made tests at 80% accuracy." The criterion of mastery becomes 80% of a subjective opinion.

C. Measurement of Change: Rulers, Yardsticks and Other Tools

1. Measuring child growth

a. You can measure your child's physical growth with a measuring tape and a bathroom scale. Using these tools, you can document his physical growth. You don't need to be a doctor to understand that increases in these measurements prove that your child is growing.

b. If you (or your child's pediatrician) have been measuring your child at regular intervals, you can create a chart or graph that documents changes in height or weight over time.

c. Your child's pediatrician has "growth charts" that you can use to compare your child's growth with the growth of the "average" child.

2. Measuring educational growth

a. Likewise, educational growth can be measured and charted.

b. The yardsticks used for measurement are different, but the principles are the same.

3. Group standardized tests

a. Most school districts test their students on standardized group educational achievement tests at regular intervals.

b. The information they provide is similar to that provided by medical screening tests. Medical screening tests can suggest that a problem exists. Additional testing is usually necessary before the problem can be accurately identified and a treatment plan developed.

c. Children's learning problems can be identified in a similar manner. In most public schools, specific individual ability and achievement tests to clarify learning problems are administered by school psychologists and educational diagnosticians.

4. The Bell Curve

a. In nature, traits and characteristics distribute themselves along theoretical curves.

b. For our purposes, the most important curve is called the normal frequency distribution or bell curve.

c. Because the percentages of areas along the bell curve are well-known and thoroughly researched, they become our frame of reference.

5. Reporting educational scores

a. Age equivalent scores (AE)

b. Grade equivalent scores (GE)

c. Standard scores (SS) and standard deviations (SD)

d. Percentile ranks (PR).

e. Example

(1) On a test of push-ups, Frank earned a raw score of 15 push-ups which converts to a percentile rank of 95 (PR=95). Frank's score looks great --- until we remember that Frank was "held back" three times. Although he is in the fifth grade, Frank is 13 years old!

(2) The average score for 8th graders (who are 13 years old) is 15. Frank scored 15. Frank had a grade equivalent score of 8th grade (GE = 8.0) and an age equivalent score of 13 years (AE = 13-0).

(3) Frank (age 13) was included in our sample of 5th graders who had an average age of 10. When compared to this group of children who were younger than him, Frank scored at the 95th percentile rank (PR) level. Question: If we compare Frank's performance to that of children who are three years younger than him, will this comparison provide us with an accurate picture of his physical fitness? Answer: No.
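Frank's example can be sketched in a few lines of Python. This is only an illustration, assuming push-up scores in each group follow a bell curve. The fifth grade mean of 10 and the standard deviation of 3 for both groups are assumptions chosen for the sketch; only the eighth grade mean of 15 and the percentile rank of 95 come from the example above.

```python
from statistics import NormalDist


def percentile_rank(raw_score, group_mean, group_sd):
    """Percentile rank of raw_score within a normally distributed norm group."""
    return round(NormalDist(group_mean, group_sd).cdf(raw_score) * 100)


FRANK_PUSH_UPS = 15

# Assumed norm groups: the 5th grade mean (10) and both SDs (3) are
# illustrative assumptions; the 8th grade mean (15) is from the example.
vs_fifth_graders = percentile_rank(FRANK_PUSH_UPS, group_mean=10, group_sd=3)
vs_eighth_graders = percentile_rank(FRANK_PUSH_UPS, group_mean=15, group_sd=3)

print("Compared with 5th graders: PR =", vs_fifth_graders)
print("Compared with 8th graders: PR =", vs_eighth_graders)
```

The same raw score of 15 is exceptional in one norm group and exactly average in the other, which is why the choice of comparison group matters.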

D. What Does a Test Measure?

1. Tests do not necessarily measure what they purport to measure.

2. To demonstrate this point, let's look at tests that measure reading ability.

a. One test that measures a child's reading ability actually measures the child's ability to correctly read aloud and pronounce isolated words out of context, i.e., a word recognition test. The test includes a list of words such as cat, tree, dog, house, and person. This kind of reading test does not measure true reading and may be adversely impacted by speech or word finding problems.

b. Another reading test measures reading by having the child read a passage of text, then answer a series of multiple choice questions about the passage. In this case, the child's score may be a measure of the child's ability to intellectually eliminate certain answers of the multiple choice format, i.e., a test of reasoning, not true reading. Some very bright children may need to recognize and interpret only a few words to discern the total context. Other children have excellent word recognition abilities but cannot link or interpret the words in a body of text or passage.

c. Another reading test has the child read a passage of text aloud (measuring oral reading) and then answer questions. The accuracy of the words read aloud and the child's understanding of the passage make up the reading score.

E. Subtest Scatter

1. If significant scatter exists, this suggests that the child has areas of strength and weakness that need to be explored.

2. How can you determine if significant subtest scatter is present?

a. Most subtests have a mean score of 10. Most children will score within 3 points of the mean of 10, i.e., between 7 and 13.

b. If the mean on a subtest is 10 (and most children score between 7 and 13), then scores between 9 and 11 will represent minimal subtest scatter.

3. Let's assume that Child A is given a test that is composed of 10 subtests.

a. The child's scores on the 10 subtests are as follows: on 4 subtests, the child scores 10, on 3 subtests, the child scores 9, and on 3 subtests, the child scores 11.

b. In this case, the overall composite score is 10 and the scatter is very minimal. This child scored in the average range in all 10 subtests.

4. In our next example, we will assume that Child B earns 4 subtest scores of 10, 3 scores of 4, and 3 scores of 16.

a. The child did extremely well on 3 subtests, very poorly on 3 subtests, and average on 4 subtests. Again, the child's composite score would be 10.

b. Subtest scatter is the difference between the highest and lowest scores. In this case, subtest scatter would be 12 (16-4 = 12).
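The arithmetic for Child A and Child B can be checked with a short Python sketch. It assumes, for illustration only, that the composite is the simple average of the ten subtest scores; actual test batteries may compute composites differently.

```python
from statistics import mean


def composite_and_scatter(subtest_scores):
    """Return (average subtest score, highest score minus lowest score)."""
    return mean(subtest_scores), max(subtest_scores) - min(subtest_scores)


# Subtest scores taken from the two examples above.
child_a = [10] * 4 + [9] * 3 + [11] * 3
child_b = [10] * 4 + [4] * 3 + [16] * 3

composite_a, scatter_a = composite_and_scatter(child_a)
composite_b, scatter_b = composite_and_scatter(child_b)

print("Child A: composite", composite_a, "scatter", scatter_a)
print("Child B: composite", composite_b, "scatter", scatter_b)
```

Both children have the same composite of 10, but Child A's scatter is 2 while Child B's is 12. The composite alone hides the difference.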

F. Composite Scores

1. On the Wechsler Intelligence Scale for Children, Third Edition (WISC-III), three scores are usually provided --- a Verbal IQ (VIQ), a Performance IQ (PIQ), and a Full Scale IQ (FSIQ).

a. Each of these IQs is a composite score. Both the Verbal and Performance IQ scores are composites of five different subtests, each of which measures a different area of ability.

b. The Full Scale IQ is a composite of the Verbal and Performance scores --- which makes it a composite of ten different subtests.

c. IQs between 90 and 110 are considered within the "average range."

2. If we rely on composite IQ scores, we may easily be misled.

a. On the Wechsler Intelligence Scale for Children-III, Katie achieved a Full Scale IQ of 101.

b. If the only number you had was her Full Scale IQ score, you would probably assume that her IQ of 101 placed her squarely in the "average range" of intellectual functioning. Is Katie an "average" child?

c. Katie's Verbal IQ is 114 and her Performance IQ is 86. There is a 28 point difference between Katie's Verbal and Performance IQ scores.

d. Katie's Verbal IQ of 114 translates into a percentile rank of 82 (PR=82). Her Performance IQ of 86 converts to a percentile rank of 18 (PR = 18). Katie has a percentile rank fluctuation of 64 points (82-18=64) between her verbal and performance abilities.
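Katie's percentile ranks can be reproduced with a minimal Python sketch, assuming IQ scores follow a bell curve with a mean of 100 and a standard deviation of 15. The function name is hypothetical, and published test manuals use their own norm tables, which may differ slightly from this approximation.

```python
from statistics import NormalDist

# Assumed IQ scale: mean 100, standard deviation 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def iq_to_percentile_rank(iq):
    """Approximate percentile rank for an IQ score on the assumed scale."""
    return round(IQ_SCALE.cdf(iq) * 100)


verbal_pr = iq_to_percentile_rank(114)
performance_pr = iq_to_percentile_rank(86)
full_scale_pr = iq_to_percentile_rank(101)

print("Verbal IQ 114 -> PR", verbal_pr)
print("Performance IQ 86 -> PR", performance_pr)
print("Full Scale IQ 101 -> PR", full_scale_pr)
print("Verbal/Performance gap in percentile points:", verbal_pr - performance_pr)
```

The Full Scale score sits near the middle of the distribution, while the two scores that make it up are 64 percentile points apart.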

3. The Woodcock-Johnson Psycho-Educational Battery-Revised (WJ-R) consists of a number of mandatory and optional subtests.

a. The results obtained by the child on these different subtests are combined into composite or cluster scores.

b. If we rely on composite or cluster scores, without examining the child's scores on the individual subtests, we can easily overlook obvious deficiencies and significant strengths.

G. When Apparent Progress Means Actual Regression

1. Measuring change, called pre- and post-testing, has great relevance to educational planning.

2. After the child's performance level is identified, we can re-test the child later to measure progress, regression, or whether the child is maintaining the same position within the group.

3. Using the scores obtained from pre- and post-testing, we can create graphs to visually demonstrate the child's progress or lack of progress in an academic area.

4. Example

a. According to earlier testing in September, Erik completed 13 push-ups, which placed him at the 84th percentile of all youngsters in his class. After a year of fitness training, all of the fifth grade children were re-tested. When Erik was re-tested, he completed 14 push-ups.

b. The average performance of the fifth grade class improved by 2 push-ups (from an average raw score of 10 to an average raw score of 12). Erik's raw score increased by 1 push-up, from 13 to 14.

c. So, we see that although Erik's age equivalent and grade equivalent scores increased slightly from the prior testing, his actual position in the group dropped from the 84th to about the 75th percentile level. While still ahead of his peers, Erik did regress.
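Erik's drop in relative position can be verified with a short Python sketch. It assumes the class scores follow a bell curve with a standard deviation of 3 push-ups at both testings; that standard deviation is an assumption chosen to match the 84th percentile figure, not a number stated in the example.

```python
from statistics import NormalDist


def percentile_rank(raw_score, class_mean, class_sd=3):
    """Percentile rank within the class; class_sd=3 is an assumed value."""
    return round(NormalDist(class_mean, class_sd).cdf(raw_score) * 100)


# Pre-test: Erik did 13 push-ups, class average 10.
pre_test_pr = percentile_rank(13, class_mean=10)
# Post-test: Erik did 14 push-ups, class average 12.
post_test_pr = percentile_rank(14, class_mean=12)

print("Pre-test PR:", pre_test_pr)
print("Post-test PR:", post_test_pr)
print("Change in position:", post_test_pr - pre_test_pr)
```

The raw score went up by one push-up, yet the percentile rank fell by about nine points because the class as a whole improved more than Erik did.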

H. Norm Referenced and Criterion Referenced Tests

1. When we evaluated our sample group of fifth graders, we compared each child's performance to the norm group of fifth graders.

a. Both Erik (raw score of 13, percentile rank of 84) and Sam (raw score of 7, percentile rank of 16) were referenced or compared to this norm group of fifth graders.

b. To evaluate benefit, we looked at the norm group and the individual child's relative position in that group at the time of the first and second tests. We computed each child's change in position, i.e., progress or regression.

2. In our example, we also referenced the criterion of the number of push-ups completed.

a. A criterion reference analysis determines whether or not a child meets certain criteria (without reference to a norm group).

b. For example, at the beginning of the year, Sam completed 7 push-ups. If the criterion for success was 8 push-ups, then Sam failed to reach that goal.

c. Let's assume that Sam received a year of physical fitness remediation; after that year, Sam completed the 8 push-ups. Does Sam now meet the criterion for success? The answer to this question depends on whether the criterion has increased now that Sam is a year older.

d. We know that Sam's peer group completed 10 push-ups at the beginning of the year and 12 at the end of the year. Definitions of success are affected by the passage of time. If we rely on criterion referenced measures, we can be misled as to whether the child is falling further behind the peer group. We need to know exactly what the criterion is and what this means when the child is compared to a norm group.
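The difference between the two kinds of analysis can be shown side by side in Python. This sketch assumes the criterion stays fixed at 8 push-ups and that the peer group's scores follow a bell curve with a standard deviation of 3; both are assumptions made for illustration.

```python
from statistics import NormalDist

CRITERION = 8   # assumed to stay at 8 push-ups for both testings
PEER_SD = 3     # assumed spread of peer scores


def meets_criterion(raw_score):
    """Criterion referenced: compare the score to a fixed standard."""
    return raw_score >= CRITERION


def percentile_rank(raw_score, peer_mean):
    """Norm referenced: compare the score to the peer group."""
    return round(NormalDist(peer_mean, PEER_SD).cdf(raw_score) * 100)


start = {"meets": meets_criterion(7), "pr": percentile_rank(7, peer_mean=10)}
end = {"meets": meets_criterion(8), "pr": percentile_rank(8, peer_mean=12)}

print("Start of year:", start)
print("End of year:  ", end)
```

Under these assumptions Sam moves from failing to meeting the criterion, while his percentile rank falls from 16 to about 9. The criterion referenced result looks like success; the norm referenced result shows him falling further behind.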

I. Standard Deviation

1. In most educational and psychological tests, the mean is 100 and the standard deviation is 15. (Mean = 100, SD = 15)

2. Average scores do not deviate far from the mean.

3. As scores fall significantly above or below the mean, they are referred to as being a certain value or distance from the mean, e.g., 1 or 2 standard deviations from the mean.

4. In all tests, the mean is at 0 (zero) standard deviations from the mean. The next markers on the bell curve are +1 and -1 standard deviations from the mean, followed by +2 and -2 standard deviations from the mean.

5. In most subtests, the mean is 10 and the standard deviation is 3. (Mean = 10, SD = 3)

a. One standard deviation above the mean is 10 plus 3, i.e., 10 + 3 = 13.

b. One standard deviation below the mean is 10 minus 3, i.e., 10 - 3 = 7.

6. One standard deviation above the mean always falls at the 84 percent level (PR = 84); and one standard deviation below the mean is always at the 16 percent level (PR = 16).

7. Two SDs above the mean are always at the 98 percent level (PR = 98); and two SDs below the mean are always at the 2 percent level (PR = 2).
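These bell curve markers can be reproduced with Python's standard library. The sketch below assumes an ideal normal distribution and rounds to whole percentile ranks; the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_rank(score, mean, sd):
    """Percentile rank of a score on a normally distributed scale."""
    return round(NormalDist(mean, sd).cdf(score) * 100)


# Percentile ranks at whole standard deviations from the mean
# (a scale with mean 0 and SD 1 counts directly in SD units).
markers = {sd_units: percentile_rank(sd_units, 0, 1) for sd_units in (-2, -1, 0, 1, 2)}
print(markers)

# One SD above the mean gives the same rank on either scale.
subtest_pr = percentile_rank(13, mean=10, sd=3)
standard_pr = percentile_rank(115, mean=100, sd=15)
print(subtest_pr, standard_pr)
```

The percentile rank depends only on how many standard deviations a score lies from the mean, not on the particular numbers a test uses for its mean and standard deviation.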

J. Standard Scores

1. In standard scores, the average score or mean is 100, with a standard deviation of 15.

2. The average child will earn a standard score of 100.

3. If a child scores 1 standard deviation above the mean, the standard score is 100 plus 15, i.e., 100 + 15 = 115.

4. If the child scores 1 standard deviation below the mean, this is 100 minus 15, i.e., 100 - 15 = 85.

K. Conversion of Standard Scores and Subtest Scores Into Percentile Ranks
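A conversion of this kind can be approximated from the bell curve itself. The Python sketch below is not the published table. It assumes an ideal normal distribution, standard scores with a mean of 100 and SD of 15, subtest scores with a mean of 10 and SD of 3, and whole-number rounding, with ranks that round to 100 or 0 shown as "> 99" and "< 1". The function names are hypothetical, and a test's own manual remains the authoritative source.

```python
from statistics import NormalDist


def percent_rank_label(z):
    """Rounded percentile rank for a z score, capped at '> 99' and '< 1'."""
    pct = round(NormalDist().cdf(z) * 100)
    if pct >= 100:
        return "> 99"
    if pct <= 0:
        return "< 1"
    return str(pct)


def conversion_table():
    """Rows of (standard score, subtest score, percentile rank label)."""
    rows = []
    for subtest in range(19, 0, -1):          # subtest scores 19 down to 1
        standard = 100 + 5 * (subtest - 10)   # 1 subtest point = 5 standard points
        z = (subtest - 10) / 3                # distance from the mean in SD units
        rows.append((standard, subtest, percent_rank_label(z)))
    return rows


print(f"{'Standard Score':>14} {'Subtest Score':>13} {'% Rank':>6}")
for standard, subtest, rank in conversion_table():
    print(f"{standard:>14} {subtest:>13} {rank:>6}")
```

Each one-point step in a subtest score corresponds to a five-point step in a standard score, because 15 divided by 3 is 5.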


[Table: Standard Score / Subtest Score / % Rank; the two highest rows correspond to a percentile rank of > 99]

L. Other Tests: Means and Standard Deviations

1. Z scores simply express a score in standard deviation units: the mean is zero and the standard deviation is one (Mean = 0, SD = 1, instead of a mean of 100 and SD of 15 as we found with standard scores).

a. If you know that a particular child earned a Z score of -1, then you also know that the child's score was one standard deviation below the mean, which is a percentile rank of 16.

b. If you convert this score, using the standard score format with a mean of 100 and a standard deviation of 15, you will see that a Z score of -1 is the same as a standard score of 85.

2. With T scores, the mean is 50 and each unit of standard deviation is equal to 10.

a. A T score of 60 is the same as a Z score of +1. A T score of 60 and a Z score of +1 are equal to a percentile rank of 84.

b. A T score of 70 is equal to a Z score of +2, a standard score of 130, and a percentile rank of 98.

3. With stanine scores, the mean is five and the standard deviation is 2.
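The conversions among these score formats can be sketched in Python using the means and standard deviations given above. The function names are hypothetical. The stanine conversion is a simplified approximation that rounds to the nearest whole stanine and limits the result to the range 1 through 9.

```python
from statistics import NormalDist


def z_to_standard(z):
    """Standard score: mean 100, SD 15."""
    return 100 + 15 * z


def z_to_t(z):
    """T score: mean 50, SD 10."""
    return 50 + 10 * z


def t_to_z(t):
    """Z score (mean 0, SD 1) from a T score."""
    return (t - 50) / 10


def z_to_stanine(z):
    """Approximate stanine: mean 5, SD 2, limited to 1 through 9."""
    return min(9, max(1, round(5 + 2 * z)))


def z_to_percentile_rank(z):
    """Rounded percentile rank, assuming a bell curve."""
    return round(NormalDist().cdf(z) * 100)


# Z score of -1: standard score 85, percentile rank 16.
print(z_to_standard(-1), z_to_percentile_rank(-1))
# T score of 60: Z score +1, percentile rank 84.
print(t_to_z(60), z_to_percentile_rank(t_to_z(60)))
# T score of 70: Z score +2, standard score 130, percentile rank 98.
print(t_to_z(70), z_to_standard(t_to_z(70)), z_to_percentile_rank(t_to_z(70)))
```

Converting every score to a Z score first makes it possible to compare results reported in different formats.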

M. Specific Tests

1. Test publishers are constantly updating and revising their tests.

a. The Wechsler Intelligence test for children was originally known as the WISC. Later, it was revised and became known as the WISC-R. Several years ago, the next version was published as the WISC-III.

b. The first Test of Written Language (TOWL) was replaced by the TOWL-II and was recently revised again.

c. The Woodcock Johnson battery of tests was known as the Woodcock Johnson Psycho-Educational Battery. The WJPEB included educational achievement testing and cognitive ability testing. Dr. Woodcock also produced the Woodcock Reading Mastery Test. Today, the current test series is called the Woodcock-Johnson Psycho-Educational Battery-Revised (WJ-R), which is an educational achievement test that includes the Test of Cognitive Abilities.

2. The current version of any popular test is probably in a revision status. A competing test publisher is probably trying to develop a new and better version of its competitor's product.

3. You may ask the examiner to photocopy relevant portions of the test manual for you. Examiners cannot copy actual test questions, but may be able to copy the instructions and explanations. This is your best source of current test information.

a. Subtest scores on the Wechsler Intelligence Scale range from a low of 1 to a maximum of 19.

b. Subtest measures

(1) Subtests in square brackets are optional.

(2) Verbal subtests are those with semantic items, performance subtests are those with pictorial items.

(3) All verbal subtests require that the child interpret meaning from the English language in some way. Performance subtests could be given and responded to without using language at all, for example, merely by pointing at examples and available materials.

Verbal Subtests

Information - (1) Fund of general knowledge; (2) Factual knowledge, long-term memory, recall; (3) This measures how much general information the child has learned from school and at home.

Similarities - (1) Verbal abstract reasoning; (2) Abstract reasoning, verbal categories and concepts; (3) This measures the child's ability to think abstractly. The child decides how things are different or alike.

Arithmetic - (1) Numerical reasoning, attention and short-term memory for meaningful information; (2) Attention and concentration, numerical reasoning; (3) This is not pencil-and-paper arithmetic. Rather it measures verbal mathematical reasoning skills by giving the child oral problems to solve.

Vocabulary - (1) Knowledge of word meanings; (2) Language development, word knowledge, verbal fluency; (3) The child explains what a word means by defining or describing what it does. The dictionary definition is not the only acceptable answer.

Comprehension - (1) Social comprehension and judgment; (2) Social and practical judgment, common sense; (3) This measures how well your child can think abstractly and understand concepts.

[Digit Span] - (1) Short-term auditory memory for non-meaningful information; (2) Short-term auditory memory, concentration; (3) This measures a child's ability to remember a sequence of numbers (both backwards and forwards). This subtest is optional and does not have to be included in your child's assessment.

Performance Subtests

Picture Completion - (1) Attention to visual detail; (2) Alertness to detail, visual discrimination; (3) The child looks at pictures and tells the examiner what part is missing.

Coding - (1) Visual-motor skills, processing speed; (2) Visual-motor coordination, speed, concentration; (3) This section measures a child's ability to decipher a code and copy the correct symbols in a controlled period of time.

Picture Arrangement - (1) Attention to visual detail, sequential reasoning; (2) Planning, logical thinking, social knowledge; (3) This requires a child to put pictures in order so that the story they tell makes sense. It measures the child's ability to create the whole from its parts.

Block Design - (1) Visual abstract ability; (2) Spatial analysis, abstract visual problem-solving; (3) Unlike picture arrangement, where the child is given the parts and makes up the whole, this test measures the child's ability to look at the whole first, then break it into parts, and finally to reconstruct the whole. It provides blocks and pictures, and the child must put the blocks together to re-create what's in the picture of the blocks.

Object Assembly - (1) Part-whole reasoning; (2) Visual analysis and construction of objects; (3) The child is given puzzle parts and must complete the puzzle. It measures a child's ability to make a whole out of its parts.

[Symbol Search] - (2) Visual-motor quickness, concentration, persistence (note: new with the WISC-III).

[Mazes] - (1) Graphomotor planning, visual-motor coordination and speed; (2) Fine motor coordination, planning, following directions; (3) The child has to find the way out of a maze by using a pencil. Performance is also based on time.

N. Documenting Results

1. Make a list of all the times when your child has been tested. Arrange your list in chronological order. Include the names, dates, and scores of each test that has been administered to your child more than once.

2. Begin your list with the test or tests that have been administered most frequently.

3. Write down all of the scores from the first administration of a test battery. Convert these scores to percentile ranks. Complete the same process with the most recent testing of the same battery. Compare the results. You should be able to determine whether your child is being remediated (catching up), staying in the same position, or falling further behind the peer group.

4. Take the most glaring deficiencies where your child has shown minimal progress or even regression and chart out the test results.
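The comparison described in these steps can be sketched in Python. Everything in this example is hypothetical: the test areas, the scores, and the tolerance of 3 percentile points used to decide that a child's position has not changed. It also assumes the scores are standard scores with a mean of 100 and SD of 15, converted with an ideal bell curve rather than a test's own norm tables.

```python
from statistics import NormalDist

STANDARD_SCALE = NormalDist(mu=100, sigma=15)


def to_percentile_rank(standard_score):
    """Approximate percentile rank for a standard score (mean 100, SD 15)."""
    return round(STANDARD_SCALE.cdf(standard_score) * 100)


def describe_trend(first_pr, latest_pr, tolerance=3):
    """Classify the change in percentile rank; tolerance is an assumed cutoff."""
    change = latest_pr - first_pr
    if change > tolerance:
        return "catching up"
    if change < -tolerance:
        return "falling further behind"
    return "staying in the same position"


# Hypothetical history: area -> (first standard score, most recent standard score)
history = {
    "Reading": (85, 92),
    "Math": (100, 100),
    "Written language": (90, 80),
}

summary = {}
for area, (first_score, latest_score) in history.items():
    first_pr = to_percentile_rank(first_score)
    latest_pr = to_percentile_rank(latest_score)
    summary[area] = (first_pr, latest_pr, describe_trend(first_pr, latest_pr))
    print(f"{area}: PR {first_pr} -> PR {latest_pr} ({summary[area][2]})")
```

The areas flagged as "falling further behind" are the ones to chart first.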