GAO-03-1091, Justice Outcome Evaluations: Design and Implementation of Studies Require More NIJ Attention


This is the accessible text file for GAO report number GAO-03-1091 
entitled 'Justice Outcome Evaluations: Design and Implementation of 
Studies Require More NIJ Attention' which was released on October 08, 
2003.

This text file was formatted by the U.S. General Accounting Office 
(GAO) to be accessible to users with visual impairments, as part of a 
longer term project to improve GAO products' accessibility. Every 
attempt has been made to maintain the structural and data integrity of 
the original printed product. Accessibility features, such as text 
descriptions of tables, consecutively numbered footnotes placed at the 
end of the file, and the text of agency comment letters, are provided 
but may not exactly duplicate the presentation or format of the printed 
version. The portable document format (PDF) file is an exact electronic 
replica of the printed version. We welcome your feedback. Please E-mail 
your comments regarding the contents or accessibility features of this 
document to Webmaster@gao.gov.

This is a work of the U.S. government and is not subject to copyright 
protection in the United States. It may be reproduced and distributed 
in its entirety without further permission from GAO. Because this work 
may contain copyrighted images or other material, permission from the 
copyright holder may be necessary if you wish to reproduce this 
material separately.

Report to the Honorable Lamar Smith House of Representatives:

United States General Accounting Office:

GAO:

September 2003:

Justice Outcome Evaluations:

Design and Implementation of Studies Require More NIJ Attention:

Justice Outcome Evaluations:

GAO-03-1091:

GAO Highlights:

Highlights of GAO-03-1091, a report to The Honorable Lamar Smith, 
House of Representatives 

Why GAO Did This Study:

Policy makers need valid, reliable, and timely information on the 
outcomes of criminal justice programs to help them decide how to set 
criminal justice funding priorities. In view of previously reported 
problems with selected outcome evaluations managed by the National 
Institute of Justice (NIJ), GAO assessed the methodological quality of 
a sample of completed and ongoing NIJ outcome evaluation grants.

What GAO Found:

From 1992 through 2002, NIJ managed 96 evaluation studies that sought 
to measure the outcomes of criminal justice programs. Spending on 
these evaluations totaled about $37 million. Our methodological review 
of 15 of the 96 studies, totaling about $15 million and covering a 
broad range of criminal justice issues, showed that sufficiently sound 
information about program effects could not be obtained from 10 of the 
15. Five studies, totaling about $7.5 million (or 48 percent of the 
funds spent on the studies we reviewed), appeared to be 
methodologically rigorous in both design and implementation, enabling 
meaningful conclusions to be drawn about program effects. Six studies, 
totaling about $3.3 million (or 21 percent of the funds spent on the 
studies we reviewed), began with sound designs but encountered 
implementation problems that would render their results inconclusive. 
An additional 4 studies, totaling about $4.7 million (or 30 percent of 
the funds spent on the studies we reviewed), had serious 
methodological limitations that from the start limited their ability 
to produce reliable and valid results. Although results from 5 
completed studies were inconclusive, DOJ program administrators said 
that they found some of the process and implementation findings from 
them to be useful.

We recognize that optimal conditions for the scientific study of 
complex social programs almost never exist, making it difficult to 
design and execute outcome evaluations that produce definitive 
results. However, the methodological adequacy of NIJ studies can be 
improved, and NIJ has taken several steps—including the formation of 
an evaluation division and funding feasibility studies--in this 
direction. It is too soon to tell whether these changes will lead to 
evaluations that will better inform policy makers about the 
effectiveness of criminal justice programs.

What GAO Recommends:

* review its ongoing outcome evaluation grants and develop appropriate 
strategies and corrective measures to ensure that methodological 
design and implementation problems are overcome so the evaluations can 
produce more conclusive results;

* continue efforts to respond to GAO’s 2002 recommendation that NIJ 
assess its evaluation process with the purpose of developing 
approaches to ensure that future outcome evaluations are funded only 
when they are effectively designed and implemented.

In commenting on a draft of this report, DOJ agreed with GAO’s 
recommendations, and cited several current and planned activities 
intended to improve NIJ’s evaluation program. DOJ also made two 
substantive comments related to the presentation of information that 
GAO responded to in the report.

www.gao.gov/cgi-bin/getrpt?GAO-03-1091.

To view the full product, including the scope and methodology, click 
on the link above. For more information, contact Laurie E. Ekstrand 
(202) 512-8777 or ekstrandl@gao.gov.

[End of section]

Contents:

Letter:

Results in Brief:

Background:

Overview of the Evaluations We Reviewed:

Most of the Reviewed NIJ Outcome Evaluations Could Not Produce 
Sufficiently Sound Information on Program Outcomes:

Completed Outcome Evaluations Produced Useful Information on Processes 
but Not on Outcomes for DOJ Program Administrators:

NIJ's Current and Planned Activities to Improve Its Evaluation Program:

Conclusions:

Recommendations for Executive Action:

Agency Comments and our Evaluation:

Appendix I: Objectives, Scope, and Methodology:

Appendix II: Summaries of the NIJ Outcome Evaluations Reviewed:

Appendix III: Comments from the Department of Justice:

Appendix IV: GAO Contacts and Staff Acknowledgments:

GAO Contacts:

Staff Acknowledgments:

Tables:

Table 1: NIJ Outcome Evaluations Reviewed by GAO:

Table 2: Characteristics of 5 NIJ Outcome Evaluations with Sufficiently 
Sound Designs and Implementation Plans:

Table 3: Problems Encountered during Implementation of 6 Well-Designed 
NIJ Outcome Evaluation Studies:

Table 4: Design Limitations in 4 NIJ Outcome Evaluation Studies:

Table 5: Number and Size of Outcome Evaluation Awards Made by NIJ from 
1992 through 2002, and Reviewed by GAO:

Table 6: Size and Completion Status of the 15 Evaluations Selected for 
Methodological Review:

Table 7: Programs Evaluated and Funding Sources for Completed NIJ 
Outcome Evaluations:

Abbreviations:

BTC: Breaking the Cycle 

COPS: Community Oriented Policing Services 

DOJ: Department of Justice 

GREAT: Gang Resistance Education and Training 

NIJ: National Institute of Justice 

OJP: Office of Justice Programs 

OVW: Office on Violence Against Women:

United States General Accounting Office:

Washington, DC 20548:

September 24, 2003:

The Honorable Lamar Smith House of Representatives:

Dear Mr. Smith:

The U.S. Department of Justice (DOJ) spent almost $4 billion in fiscal 
year 2002 on assistance to states and local communities to combat 
crime. These funds were used to reduce drug abuse and trafficking, 
address the problems of gang violence and juvenile delinquency, expand 
community policing, and meet the needs of crime victims, among other 
things. In addition, state and local governments spend billions of 
their dollars annually on law enforcement and criminal justice 
programs. Given these expenditures, it is important to know which 
programs are effective in controlling and preventing crime so that 
limited federal, state, and local funds not be wasted on programs that 
are ineffective. As the principal research, development, and evaluation 
agency of DOJ, the National Institute of Justice (NIJ) is responsible 
for evaluating existing programs and policies that respond to crime. It 
spends millions of dollars annually to support studies intended to 
evaluate various DOJ funded programs as well as selected local 
programs. To the extent that NIJ evaluations produce credible, valid, 
reliable, and timely information on the efficacy of these programs in 
combating crime, they can serve an important role in helping 
policymakers make decisions about how to set criminal justice funding 
priorities.

Pursuant to our previous reports in which we reported problems with 
selected NIJ-managed outcome evaluations,[Footnote 1] in your former 
position as Chairman of the Subcommittee on Crime, House Judiciary 
Committee, you asked us to undertake a more extensive review of the 
outcome evaluation work performed under the direction of NIJ during the 
last 10 years. Outcome evaluations are defined as those efforts 
designed to determine whether a program, project, or intervention 
produced its intended effects. As agreed with your office, we are 
reporting on the methodological quality of a sample of completed and 
ongoing NIJ outcome evaluation grants, and the usefulness of the 
evaluations in producing information on outcomes. Because we learned of 
changes NIJ has underway to improve its administration of outcome 
evaluation studies, we are also providing information in this report 
about these changes.

To meet our objectives, we reviewed outcome evaluation grants managed 
by NIJ from 1992 through 2002. During this time period NIJ managed 96 
outcome evaluation grants. Of these 96 grants, we judgmentally selected 
and reviewed 15 outcome evaluations chosen so that they varied in grant 
size, completion status, and program focus. The selected studies 
accounted for about $15.4 million, or about 42 percent, of the 
approximately $36.6 million spent on outcome evaluation studies during 
the 10-year period. Although our sample is not representative of all 
NIJ outcome evaluations conducted during the last 10 years, it includes 
those that have received a large proportion of total funding for this 
type of research, and tends to be drawn from the most recent work. Our 
review assessed the methodological quality of these evaluations using 
generally accepted social science standards,[Footnote 2] including such 
elements as whether evaluation data were collected before and after 
program implementation; how program effects were isolated (i.e., the 
use of nonprogram participant comparison groups or statistical 
controls); and the appropriateness of sampling, outcome measures, 
statistical analyses, and any reported results. We grouped the studies 
into 3 categories based on our judgment of their methodological 
soundness. Although we recognize that the stronger studies may have had 
some weaknesses, and that the weaker studies may have had some 
strengths, our categorization of the studies was a summary judgment 
based on the totality of the information provided to us by NIJ. We also 
interviewed NIJ officials regarding the selection and oversight of 
these evaluation studies. To assess the usefulness of NIJ's outcome 
evaluations in producing information about program outcomes, we 
reviewed the findings from all 5 of the completed NIJ outcome 
evaluations in our sample that were funded in part by DOJ program 
offices, and interviewed program officials at NIJ and program 
administrators at DOJ's Office on Violence Against Women and Office of 
Community Oriented Policing Services. Further details on our 
methodology are provided in appendix I.

Results in Brief:

Our methodological review of 15 selected NIJ outcome evaluation studies 
undertaken since 1992 showed that although most studies began with 
sufficiently sound designs, most could not produce sufficiently sound 
information on program outcomes. Specifically, the studies could be 
characterized in the following ways:

* Studies that began with sufficiently sound evaluation designs: Eleven 
of the 15 studies began with sufficiently sound designs. Some of these 
well-designed studies were also implemented well, while others were 
not. Specifically,

* Five of the 11 studies were sufficiently well designed and 
implemented--including having appropriate comparison groups or random 
assignment to treatment and control groups, baseline measures, and 
follow-up data--so that meaningful conclusions could be drawn about 
program effects. Funding for these methodologically sound studies 
totaled about $7.5 million, or nearly 50 percent of the approximately 
$15.4 million spent on the studies we reviewed.

* Six of the 11 studies began with sufficiently sound designs, but 
encountered implementation problems that limited the extent to which 
the study objectives could be achieved. For example, some evaluators 
were unable to carry out a proposed evaluation plan because the program 
to be evaluated was not implemented as planned, or they could not 
obtain complete or reliable data on outcomes. In some cases, 
implementation problems were beyond the evaluators' control, and 
resulted from decisions made by agencies providing program services 
after the study was underway. These studies were limited in their 
ability to conclude that it was the program or intervention that caused 
the intended outcome results. Funding for these studies with 
implementation problems totaled about $3.3 million, or about 21 percent 
of the approximately $15.4 million spent on the studies we reviewed.

* Studies that did not begin with sufficiently sound designs. Four of 
the 15 studies had serious methodological problems from the beginning 
that limited their ability to produce results that could be 
attributable to the programs that were being evaluated. Methodological 
shortcomings in these studies included the absence of comparison groups 
or appropriate statistical controls, outcome measures with doubtful 
reliability and validity, and lack of baseline data. Funding for these 
studies that began with serious methodological problems totaled about 
$4.7 million, or about 30 percent of the approximately $15.4 million 
spent on the studies we reviewed.

Outcome evaluations are difficult to design and execute because optimal 
conditions for the scientific study of complex social programs almost 
never exist. Attributing results to a particular intervention can be 
difficult when such programs are evaluated in real world settings that 
pose numerous methodological challenges. All 5 of the completed NIJ 
outcome evaluations that focused on issues of interest to DOJ program 
offices had encountered some design and implementation problems. 
Nonetheless, DOJ program administrators told us that these evaluations 
produced information that prompted them to make a number of changes to 
DOJ-funded programs. The majority of the changes enumerated by DOJ 
program administrators occurred as a result of findings from the 
process or implementation components[Footnote 3] of the completed 
outcome evaluations, and not from findings regarding program results. 
For example, as a result of NIJ's evaluation of a DOJ program for 
domestic and child abuse victims in rural areas, DOJ developed a 
training program to assist grantees in creating collaborative programs 
based on the finding from the process evaluation that such information 
was not readily available.

Although outcome evaluations are difficult to design and execute, steps 
can be taken to improve their methodological adequacy and, in turn, the 
likelihood that they will produce meaningful information on program 
effects. NIJ officials told us that they have begun to take several 
steps to try to increase the likelihood that outcome evaluations will 
produce more definitive results, including the establishment of an 
Evaluation Division responsible for ensuring the quality and utility of 
NIJ evaluations, the funding of selected feasibility studies prior to 
soliciting outcome evaluations, and greater emphasis on applicants' 
prior performance in awarding evaluation grants.

We are making recommendations to the Attorney General to improve the 
quality of NIJ's outcome evaluations. We recommend that NIJ review the 
methodological adequacy of its ongoing grants and take action to 
improve, refocus, or limit them, as appropriate; and that NIJ develop 
approaches to ensure that future outcome evaluations are effectively 
designed and implemented. In commenting on a draft of this report, the 
DOJ's Office of Justice Programs' (OJP) Assistant Attorney General 
agreed with our recommendations. She also provided technical comments, 
which we evaluated and incorporated, as appropriate. The Assistant 
Attorney General made two substantive comments on our draft report--one 
relating to the fact that even rigorous study design and careful 
monitoring of program implementation do not ensure that evaluation 
results will be conclusive; the other relating to our purported focus 
on experimental and quasi-experimental methods to the exclusion of 
other high quality evaluation methods. We respond to these points in 
the Agency Comments and Evaluation section of the report.

Background:

NIJ is the principal research development, and evaluation agency within 
OJP. It was created under the 1968 Omnibus Crime Control and Safe 
Streets Act,[Footnote 4] and is authorized to enter into grants, 
cooperative agreements, or contracts with public or private agencies to 
carry out evaluations of the effectiveness of criminal justice programs 
and identify promising new programs. NIJ's Office of Research and 
Evaluation oversees evaluations by outside researchers of a wide range 
of criminal justice programs, including ones addressing violence 
against women, drugs and crime, policing and law enforcement, 
sentencing, and corrections.

According to NIJ officials, the agency initiates a specific criminal 
justice program evaluation in one of three ways. First, congressional 
legislation may mandate evaluation of specific programs. For example, 
the Departments of Commerce, Justice, and State, the Judiciary, and 
Related Agencies Appropriations Act, 2002,[Footnote 5] requires DOJ to 
conduct independent evaluations of selected programs funded by OJP's 
Bureau of Justice Assistance and selected projects funded by OJP's 
Office of Juvenile Justice and Delinquency Prevention. DOJ determined 
that NIJ would be responsible for overseeing these evaluations. Second, 
NIJ may enter into an evaluation partnership with another OJP or DOJ 
office, or another federal agency, to evaluate specific programs or 
issues of interest to both organizations. In these cases, NIJ, in 
partnership with the program offices, develops a solicitation for 
proposals and oversees the resulting evaluation. Third, NIJ 
periodically solicits proposals for evaluation of criminal justice 
programs directly from the research community, through an open 
competition for grants. These solicitations ask evaluators to propose 
research of many kinds in any area of criminal justice, or in broad 
conceptual areas such as violence against women, policing research and 
evaluation, research and evaluation on corrections and sentencing, or 
building safer public housing communities through research 
partnerships.

According to NIJ officials, once the decision has been made to evaluate 
a particular program, or to conduct other research in a specific area 
of criminal justice, the process of awarding an evaluation grant 
involves the following steps. First, NIJ issues a solicitation and 
receives proposals from potential evaluators. Next, proposals are 
reviewed by an external peer review panel, as well as by NIJ 
professional staff. The external review panels are comprised of members 
of the research and practitioner communities,[Footnote 6] and reviewers 
are asked to identify, among other things, the strengths and weaknesses 
of the competing proposals. External peer review panels are to consider 
the quality and technical merit of the proposal; the likelihood that 
grant objectives will be met; the capabilities, demonstrated 
productivity, and experience of the evaluators; and budget constraints. 
Reviews are to include constructive comments about the proposal, useful 
recommendations for change and improvement, and recommendations as to 
whether the proposal merits further consideration by NIJ. NIJ 
professional staff are to review all proposals and all written external 
peer reviews, considering the same factors as the peer review panels. 
NIJ professional staff are also to consider the performance of 
potential grantees on any other previous research grants with NIJ. 
Next, the results of the peer and NIJ staff reviews are discussed in a 
meeting of NIJ managers, led by NIJ's Director of the Office of 
Research and Evaluation. Then, NIJ's Office of Research and Evaluation 
staff meet with the NIJ Director to present their recommendations. 
Finally, the NIJ Director makes the funding decision based on peer 
reviews, staff recommendations, other internal NIJ discussions that may 
have taken place, and consideration of what proposals may have the 
greatest impact and contribute the most knowledge.

NIJ generally funds outcome evaluations through grants, rather than 
with contracts. NIJ officials told us that there are several reasons 
for awarding grants as opposed to contracts. Contracts can give NIJ 
greater control over the work of funded researchers, and hold them more 
accountable for results. However, NIJ officials said that NIJ most 
often uses grants for research and evaluation because they believe that 
grants better ensure the independence of the evaluators and the 
integrity of the study results. Under a grant, NIJ allows the principal 
investigator a great deal of freedom to propose the most appropriate 
methodology and carry out the data collection and analysis, without 
undue influence from NIJ or the agency funding the program. Grants also 
require fewer bureaucratic steps than do contracts, resulting in a 
process whereby a researcher can be selected in a shorter amount of 
time.

NIJ officials told us that NIJ tends to make use of contracts for 
smaller and more time-limited tasks--such as literature reviews or 
assessments of whether specific programs have sufficient data to allow 
for more extensive process or outcome evaluations--rather than for 
conducting outcome evaluations. NIJ also occasionally makes use of 
cooperative agreements, which entail a greater level of interaction 
between NIJ and the evaluators during the course of the evaluation. 
According to NIJ officials, cooperative agreements between NIJ and its 
evaluators tend to be slight variations of grants, with the addition of 
a few more specific requirements for grantees. NIJ officials told us 
that they might use a cooperative agreement when NIJ wants to play a 
significant role in the selection of an advisory panel, in setting 
specific milestones, or aiding in the design of specific data 
collection instruments.

NIJ is to monitor outcome evaluation grantees in accordance with 
policies and procedures outlined in the OJP Grant Management Policies 
and Procedures Manual. In general, this includes monitoring grantee 
progress through regular contact with grantees (site visits, cluster 
conferences, other meetings); required interim reports (semiannual 
progress and quarterly financial reports); and a review of final 
substantive evaluation reports. In some cases, NIJ will require 
specific milestone reports, especially on larger studies. Grant 
monitoring for all types of studies is carried out by approximately 20 
full-time NIJ grant managers, each responsible for approximately 17 
ongoing grants at any one time.

Overview of the Evaluations We Reviewed:

From 1992 through 2002, NIJ awarded about $36.6 million for 96 
evaluations that NIJ identified as focusing on measuring the outcomes 
of programs, policies, and interventions, among other things.[Footnote 
7] The 15 outcome evaluations that we selected for review varied in 
terms of completion status (8 were completed, 7 were ongoing) and the 
size of the award (ranging between about $150,000 and about $2.8 
million), and covered a wide range of criminal justice programs and 
issues (see table 1). All evaluations were funded by NIJ through grants 
or cooperative agreements.[Footnote 8] Seven of the 15 evaluations 
focused on programs designed to reduce domestic violence and child 
maltreatment, 4 focused on programs addressing the behavior of law 
enforcement officers (including community policing), 2 focused on 
programs addressing drug abuse, and 2 focused on programs to deal with 
juvenile justice issues.

Table 1: NIJ Outcome Evaluations Reviewed by GAO:

Domestic violence and child maltreatment: 

Grant: Domestic violence and child maltreatment: 

Grant: National Evaluation of the Rural Domestic Violence and Child 
Victimization Enforcement Grant Program; Award: $719,949; No; 
Status: Completed: Yes; Status: Ongoing: No.

Grant: National Evaluation of the Domestic Violence Victims' Civil 
Legal Assistance Program; Award: $800,154; No; Status: Completed: 
No; Status: Ongoing: Yes.

Grant: Multi-Site Demonstration of Collaborations to Address Domestic 
Violence and Child Maltreatment; Award: $2,498,638; No; Status: 
Completed: No; Status: Ongoing: Yes.

Grant: Evaluation of a Multi-Site Demonstration for Enhanced Judicial 
Oversight of Domestic Violence Cases; Award: $2,839,954; No; 
Status: Completed: No; Status: Ongoing: Yes.

Grant: An Evaluation of Victim Advocacy with a Team Approach; Award: 
$153,491; No; Status: Completed: Yes; Status: Ongoing: No.

Grant: Culturally Focused Batterer Counseling for African-American Men; 
Award: $356,321; No; Status: Completed: No; Status: Ongoing: Yes.

Grant: Testing the Impact of Court Monitoring and Batterer Intervention 
Programs at the Bronx Misdemeanor Domestic Violence Court; Award: 
$294,129; No; Status: Completed: No; Status: Ongoing: Yes.

Grant: Law enforcement: 

Grant: An Evaluation of Chicago's Citywide Community Policing Program; 
Award: $2,157,859; No; Status: Completed: Yes; Status: Ongoing: No.

Grant: Corrections and Law Enforcement Family Support: Law Enforcement 
Field Test; Award: $649,990; No; Status: Completed: No; 
Status: Ongoing: Yes.

Grant: Reducing Non-Emergency Calls to 911: An Assessment of Four 
Approaches to Handling Citizen Calls for Service; Award: $399,919; 
No; Status: Completed: Yes; Status: Ongoing: No.

Grant: Responding to the Problem Police Officer: An Evaluation of Early 
Warning Systems; Award: $174,643; No; Status: Completed: Yes; 
Status: Ongoing: No.

Grant: Drug abuse: 

Grant: Evaluation of Breaking the Cycle; Award: $2,419,344; No; 
Status: Completed: Yes; Status: Ongoing: No.

Grant: Evaluation of a Comprehensive Service-Based Intervention 
Strategy in Public Housing; Award: $187,412; No; Status: 
Completed: Yes; Status: Ongoing: No.

Grant: Juvenile justice issues: 

Grant: National Evaluation of the Gang Resistance Education and 
Training Program; Award: $1,568,323; No; Status: Completed: Yes; 
Status: Ongoing: No.

Grant: Evaluation of a Juvenile Justice Mental Health Initiative with 
Randomized Design; Award: $200,000; No; Status: Completed: 
No; Status: Ongoing: Yes.

Source: GAO analysis of NIJ data.

[End of table]

Most of the Reviewed NIJ Outcome Evaluations Could Not Produce 
Sufficiently Sound Information on Program Outcomes:

Overall, we found that 10 of the 15 evaluations that we reviewed could 
not produce sufficiently sound information about program outcomes. Six 
evaluations began with sufficiently sound designs, but encountered 
implementation problems that would render their results inconclusive. 
An additional 4 studies had serious methodological problems that from 
the start limited their ability to produce reliable and valid results. 
Five studies appeared to be methodologically rigorous in both their 
design and implementation. (Appendix II provides additional information 
on the funding, objectives, and methodology of the 15 outcome 
evaluation studies.):

Most of the Reviewed Studies Were Well Designed, but Many Later 
Encountered Implementation Problems:

Our review found that 5 evaluations had both sufficiently sound designs 
and implementation plans or procedures, thereby maximizing the 
likelihood that the study could meaningfully measure program effects. 
Funding for these methodologically sound studies totaled about $7.5 
million, or nearly 50 percent of the approximately $15.4 million spent 
on the studies we reviewed. Six evaluations were well designed, but 
they encountered problems implementing the design as planned during the 
data collection phase of the study. Funding for these studies with 
implementation problems totaled about $3.3 million, or about 21 percent 
of the approximately $15.4 million spent on the studies we reviewed.

Five Evaluations Were Sufficiently Well Designed and Implemented:

Five of the evaluations we reviewed were well designed and their 
implementation was sufficiently sound at the time of our review. Two of 
these evaluations had been completed and 3 were ongoing. All 5 
evaluations met generally accepted social science standards for sound 
design, including measurement of key outcomes after a follow-up period 
to measure change over time, use of comparison groups or appropriate 
statistical controls to account for the influence of external factors 
on the results,[Footnote 9] random sampling of participants and/or 
sites or other purposeful sampling methods to ensure generalizable 
samples and procedures to ensure sufficient sample sizes, and 
appropriate data collection and analytic procedures to ensure the 
reliability and validity of measures (see table 2).

Studies Measured Change in Outcomes Over Time:

All 5 evaluations measured, or included plans to measure, specified 
outcomes after a sufficient follow-up period. Some designs provided for 
collecting baseline data at or before program entry, and outcome data 
several months or years following completion of the program. Such 
designs allowed evaluators to compare outcome data against a baseline 
measurement to facilitate drawing conclusions about the program's 
effects, and to gauge whether the effects persisted or were transitory. 
For example, the National Evaluation of the Gang Resistance Education 
and Training Program examined the effectiveness of a 9-week, school-
based education program that sought to prevent youth crime and violence 
by reducing student involvement in gangs. Students were surveyed 
regarding attitudes toward gangs, crime, and police, self-reported gang 
activity, and risk-seeking behaviors 2 weeks before the program began, 
and then again at yearly intervals for 4 years following the program's 
completion.

Table 2: Characteristics of 5 NIJ Outcome Evaluations with Sufficiently 
Sound Designs and Implementation Plans:

Evaluation study: National Evaluation of the Gang Resistance Education 
and Training Program; Sufficient follow-up: Yes; Use of comparison groups 
to control for external factors: Yes; Appropriate sampling procedures and 
reasonable sample sizes: Yes; Appropriate data collection and analysis 
procedures: Yes.

Evaluation study: Evaluation of Breaking the Cycle; Sufficient follow-
up: Yes; Use of comparison groups to control for external factors: Yes; 
Appropriate sampling procedures and reasonable sample sizes: Yes; 
Appropriate data collection and analysis procedures: Yes.

Evaluation study: Evaluation of a Multi-Site Demonstration for Enhanced 
Judicial Oversight of Domestic Violence Cases; Sufficient follow-up: 
Planned; Use of comparison groups to control for external factors: Yes; 
Appropriate sampling procedures and reasonable sample sizes: Planned; 
Appropriate data collection and analysis procedures: Planned.

Evaluation study: Culturally Focused Batterer Counseling for African-
American Men; Sufficient follow-up: Planned; Use of comparison groups 
to control for external factors: Yes; Appropriate sampling procedures and 
reasonable sample sizes: Planned; Appropriate data collection and 
analysis procedures: Planned.

Evaluation study: Testing the Impact of Court Monitoring and Batterer 
Intervention Programs at the Bronx Misdemeanor Domestic Violence 
Court[A]; Sufficient follow-up: Planned; Use of comparison groups to 
control for external factors: Yes; Appropriate sampling procedures and 
reasonable sample sizes: Planned; Appropriate data collection and 
analysis procedures: Planned.

Source: GAO analysis of NIJ data.

[A] Although we have categorized this evaluation as having a 
sufficiently sound design and implementation plan, the grantee's 
proposal did not discuss how differential attrition from the four 
treatment groups would be handled if it occurred. Therefore, we do not 
know if the grantee has made sufficient plans to address this potential 
circumstance.

[End of table]

Measuring change in specific outcome variables at both baseline and 
after a follow-up period may not always be feasible. When the outcome 
of interest is "recidivism," such as whether drug-involved criminal 
defendants continue to commit criminal offenses after participating in 
a drug treatment program, the outcome can only be measured after the 
program is delivered. In this case, it is important that the follow-up 
period be long enough to enable the program's effects to be discerned. 
For example, the ongoing evaluation of the Culturally Focused Batterer 
Counseling for African-American Men seeks to test the relative 
effectiveness of counseling that recognizes and responds to cultural 
issues versus conventional batterer counseling in reducing batterer 
recidivism. All participants in the study had been referred by the 
court system to counseling after committing domestic violence 
violations. The evaluators planned to measure re-arrests and re-
assaults 1 year after program intake, approximately 8 months after the 
end of counseling. The study cited prior research literature noting 
that two-thirds of first-time re-assaults were found to occur within 6 
months of program intake, and over 80 percent of first-time re-assaults 
over a 2-1/2 year period occur within 12 months of program intake.

Comparison Groups Were Used to Isolate Program Effects:

All 5 evaluations used or planned to use comparison groups to isolate 
and minimize external factors that could influence the results of the 
study. Use of comparison groups is a practice employed by evaluators to 
help determine whether differences between baseline and follow-up 
results are due to the program under consideration or to other programs 
or external factors. In 3 of the 5 studies, research participants were 
randomly assigned to a group that received services from the program or 
to a comparison group that did not receive services. In constructing 
comparison groups, random assignment is an effective technique for 
minimizing differences between participants who receive the program and 
those who do not on variables that might affect the outcomes of the 
study. For example, in the previously mentioned ongoing evaluation of 
Culturally Focused Batterer Counseling for African-American Men 
participants who were referred to counseling by a domestic violence 
court are randomly assigned to one of three groups: (1) a culturally 
focused group composed of only African-Americans, (2) a conventional 
counseling group composed of only African-Americans, or (3) a mixed 
race conventional counseling group. The randomized design allows the 
investigators to determine the effect of the culturally focused 
counseling over and above the effect of participating in a same race 
group situation.

In the remaining two evaluation studies, a randomized design was not 
used and the comparison group was chosen to match the program group as 
closely as possible on a number of characteristics, in an attempt to 
ensure that the comparison and program groups would be similar in 
virtually all respects aside from the intervention. For example, the 
ongoing Evaluation of a Multi-Site Demonstration for Enhanced Judicial 
Oversight of Domestic Violence Cases seeks to examine the effects of a 
coordinated community response to domestic violence (including 
advocacy, provision of victim services, and enhanced judicial 
oversight) on victim safety and offender accountability. To ensure that 
the comparison and program groups were similar, comparison sites were 
selected based on having court caseload and population demographic 
characteristics similar to the demonstration sites. Only the program 
group is to receive the intervention; and neither comparison site has a 
specialized court docket; enhanced judicial oversight; or a county-
wide, coordinated system for handling domestic violence cases.

Sufficiently Sound Sampling Procedures and Adequate Response Rates 
Helped Ensure Representativeness:

All 5 evaluations employed or planned to employ sufficiently sound 
sampling procedures for selecting program and comparison participants. 
This was intended to ensure that study participants were representative 
of the population being examined so that conclusions about program 
effects could be generalized to that population. For example, in the 
previously mentioned Judicial Oversight Demonstration evaluation, 
offenders in program and comparison sites are being chosen from court 
records. In each site, equal numbers of eligible participants are being 
chosen consecutively over a 12-month period until a monthly quota is 
reached. Although this technique falls short of random sampling, the 
optimal method for ensuring comparability across groups, use of the 12-
month sampling period takes into consideration and controls for 
possible seasonal variation in domestic violence cases.

The 5 evaluations also had adequate plans to achieve, or succeeded in 
achieving, reasonable response rates from participants in their 
samples. Failure to achieve adequate response rates threatens the 
validity of conclusions about program effects, as it is possible that 
selected individuals who do not respond or participate are 
substantially different on the outcome variable of interest from those 
who do respond or participate. The previously mentioned National 
Evaluation of the Gang Resistance Education and Training Program sought 
to survey students annually for up to 4 years after program 
participation ended. The grantee made considerable efforts in years 2, 
3, and 4 to follow up with students who had moved from middle school to 
high school and were later enrolled in a large number of different 
schools; in some cases, in different school districts. The grantee 
achieved a completion rate on the student surveys of 76 percent after 2 
years,[Footnote 10] 69 percent after 3 years, and 67 percent after 4 
years. The grantee also presented analyses that statistically 
controlled for differential attrition among the treatment and 
comparison groups, and across sites, and showed that the program 
effects that were found persisted in these specialized analyses.

Careful Data Collection and Analysis Procedures Were Used or Planned:

All 5 well-designed evaluations employed or had adequate plans to 
employ careful data collection and analysis procedures. These included 
procedures to ensure that the comparison group does not receive 
services or treatment received by the program group, response rates are 
documented, and statistical analyses are used to adjust for the effects 
of selection bias or differential attrition on the measured 
results.[Footnote 11] For example, the Breaking the Cycle evaluation 
examined the effectiveness of a comprehensive effort to reduce 
substance abuse and criminal activity among arrestees with a history of 
drug involvement. The program group consisted of felons who tested 
positive for drug use, reported drug use in the past, or were charged 
specifically with drug-related felonies. The comparison group consisted 
of persons arrested a year before the implementation of the Breaking 
the Cycle intervention who tested positive for at least one drug. Both 
groups agreed to participate in the study. Although groups selected at 
different times and using different criteria may differ in systematic 
ways, the evaluators made efforts to control for differences in the 
samples at baseline. Where selection bias was found, a correction 
factor was used in the analyses, and corrected results were presented 
in the report.

Six Studies Were Well-Designed but Encountered Problems During 
Implementation:

Six of the 11 studies that were well-designed encountered problems in 
implementation during the data collection phase, and thus were unable 
to or are unlikely to produce definitive results about the outcomes of 
the programs being evaluated. Such problems included the use of program 
and comparison groups that differed on outcome-related characteristics 
at the beginning of the program or became different due to differential 
attrition, failure of the program sponsors to implement the program as 
originally planned, and low response rates among program participants 
(see table 3). Five of the studies had been completed and 1 was 
ongoing.

Table 3: Problems Encountered during Implementation of 6 Well-Designed 
NIJ Outcome Evaluation Studies:

Evaluation study: An Evaluation of Chicago's Citywide Community 
Policing Program; Program and comparison groups differed: Yes; Program 
not implemented as planned: Yes; Response rates were low: No.

Evaluation study: Evaluation of a Comprehensive Service-Based 
Intervention Strategy in Public Housing; Program and comparison groups 
differed: Yes; Program not implemented as planned: Yes; Response rates 
were low: No.

Evaluation study: An Evaluation of Victim Advocacy with a Team 
Approach; Program and comparison groups differed: No; Program not 
implemented as planned: Yes; Response rates were low: Yes.

Evaluation study: Reducing Non-Emergency Calls to 911: An Assessment of 
Four Approaches to Handling Citizen Calls for Service; Program and 
comparison groups differed: No; Program not implemented as 
planned: Yes; Response rates were low: Yes.

Evaluation study: Responding to the Problem Police Officer: An 
Evaluation of Early Warning Systems; Program and comparison groups 
differed: Yes; Program not implemented as planned: No; Response 
rates were low: No.

Evaluation study: Evaluation of the Juvenile Justice Mental Health 
Initiative with Randomized Design; Program and comparison groups 
differed: No; Program not implemented as planned: Yes; Response 
rates were low: No.

Source: GAO analysis of NIJ data.

[End of table]

Differences between Program and Comparison Group Characteristics Make 
it Difficult to Attribute Outcomes to the Program:

Three of the 6 studies used a comparison group that differed from the 
program group in terms of characteristics likely to be related to 
program outcomes--either due to preexisting differences or to 
differential attrition--even though the investigators may have made 
efforts to minimize the occurrence of these problems.[Footnote 12] As a 
result, a finding that program and comparison group participants 
differed in outcomes could not be attributed solely to the program. For 
example, the Comprehensive Service-Based Intervention Strategy in 
Public Housing evaluation sought to reduce drug activity and promote 
family self-sufficiency among tenants of a public housing complex in 
one city through on-site comprehensive services and high profile police 
involvement. The intervention site was a housing project in one section 
of the city; the comparison site was another public housing complex on 
the opposite side of town, chosen for its similarities to the 
intervention site in terms of race, family composition, crime 
statistics, and the number of women who were welfare recipients. 
However, when baseline data from the two sites were examined, important 
preexisting differences between the two sites became apparent. These 
differences included a higher proportion of residents at the comparison 
site who were employed, which could have differentially affected 
intervention and comparison residents' propensity to utilize and 
benefit from available services. Additionally, since there was 
considerable attrition at both the intervention and comparison sites, 
it is possible that the intervention and comparison group respondents 
who remained differed on some factors related to the program outcomes. 
Although it may have been possible to statistically control for these 
differences when analyzing program outcomes, the evaluator did not do 
so in the analyses presented in the final report.

Program Results Not Measurable Because Program Not Implemented as 
Planned:

In 5 of the 6 studies, evaluators ran into methodological problems 
because the program under evaluation was not implemented as planned, 
and the investigators could not test the hypotheses that they had 
outlined in their grant proposals. For the most part, this particular 
implementation problem was beyond the evaluators' control. It resulted 
from decisions made by agencies providing program services that had 
agreed to cooperate with the evaluators but, for a number of reasons, 
made changes in the programs or did not cooperate as fully as expected 
after the studies were underway. This occurred in the evaluation of the 
Juvenile Justice Mental Health Initiative with Randomized Design, a 
study that is ongoing and expected to be completed in September 2003. 
The investigators had proposed to test whether two interventions 
provided within an interagency collaborative setting were effective in 
treating youths with serious emotional disturbances referred to the 
juvenile justice system for delinquency. Juveniles were to be randomly 
assigned to one of two treatment programs, depending on age and offense 
history (one for youth under the age of 14 without serious, violent, or 
chronic offense history, and one for youth ages 14 and older with 
serious, violent, or chronic delinquencies) or to a comparison group 
that received preexisting court affiliated service programs. The 
evaluators themselves had no power to develop or modify programs. The 
funding agencies[Footnote 13] contracted with a local parent support 
agency and with a nonprofit community-based agency to implement the 
programs, but the program for youth under the age of 14 was never 
implemented.[Footnote 14] In addition, partway through the study, the 
funding agencies decided to terminate random assignment of juveniles, 
and shortly thereafter ended the program. As a result, the evaluators 
had complete data on 45 juveniles who had been in the treatment 
program, rather than on the 100 juveniles they had proposed to study. 
Although the study continued to collect data on juveniles eligible for 
the study (who were then assigned to the comparison group, since a 
treatment option was no longer available), the evaluators proposed to 
analyze the data from the random experiment separately, examining only 
those treatment and comparison youths assigned when program slots were 
available. Because of the smaller number of participants than 
anticipated, detailed analyses of certain variables (such as the type, 
or amount of service received, or the effects of race and gender) are 
likely to be unreliable.

Low Response Rates May Reduce the Reliability and Validity of Findings:

Low response rates were a problem in 2 of the 6 studies, potentially 
reducing the reliability and validity of the findings. In a third 
study, response rates were not reported, making it impossible for us to 
determine whether this was a problem or not.[Footnote 15] In one study 
where the response rate was a problem, the evaluators attempted to 
survey victims of domestic abuse, a population that NIJ officials 
acknowledged was difficult to reach. In An Evaluation of Victim 
Advocacy With a Team Approach, the evaluators attempted to contact by 
telephone women who were victims of domestic violence, to inquire about 
victims' experiences with subsequent violence and their perceptions of 
safety. Response rates were only about 23 percent, and the victims who 
were interviewed differed from those who were not interviewed in terms 
of the nature and seriousness of the abuse to which they had been 
subjected. NIJ's program manager told us that when she became aware of 
low response rates on the telephone survey, she and the principal 
investigator discussed a variety of strategies to increase response 
rates. She said the grantee expended additional time and effort to 
increase the response rate, but had limited success. In the other study 
with low response rates--Reducing Non-Emergency Calls to 911: An 
Assessment of Four Approaches to Handling Citizen Calls for Service--
investigators attempted to survey police officers in one city regarding 
their attitudes about the city's new non-emergency phone system. Only 
20 percent of the police officers completed the survey.

Some Evaluation Studies Had Serious Design Limitations from the 
Beginning:

Four of the evaluation studies began with serious design problems that 
diminished their ability to produce reliable or valid findings about 
program outcomes. One of the studies was completed, and 3 were ongoing. 
The studies' design problems included the lack of comparison groups, 
failure to measure the intended outcomes of the program, and failure to 
collect preprogram data as a baseline for the outcomes of interest (see 
table 4). Funding for these studies that began with serious 
methodological problems totaled about $4.7 million, or about 30 percent 
of the approximately $15.4 million spent on the studies we reviewed.

Table 4: Design Limitations in 4 NIJ Outcome Evaluation Studies:

Evaluation study: National Evaluation of the Rural Domestic Violence 
and Child Victimization Enforcement Grant Program; No comparison group: 
Yes; Intended outcomes not measured: Yes; Limited pre-program data: Yes.

Evaluation study: National Evaluation of the Domestic Violence Victims' 
Civil Legal Assistance Program; No comparison group: Yes; Intended 
outcomes not measured: No; Limited pre-program data: Yes.

Evaluation study: Multi-Site Demonstration of Collaborations to Address 
Domestic Violence and Child Maltreatment; No comparison group: Yes; 
Intended outcomes not measured: Yes; Limited pre-program data: No.

Evaluation study: Corrections and Law Enforcement Family Support: Law 
Enforcement Field Test; No comparison group: Yes; Intended outcomes not 
measured: No; Limited pre-program data: No.

Source: GAO analysis of NIJ data.

[End of table]

Lack of Comparison Groups:

None of the 4 outcome evaluation studies had a comparison group built 
into the design--a factor that hindered the evaluator's ability to 
isolate and minimize external factors that could influence the results 
of the study. The completed National Evaluation of the Rural Domestic 
Violence and Child Victimization Enforcement Grant Program did not make 
use of comparison groups to study the effectiveness of the federal 
grant program that supports projects designed to prevent and respond to 
domestic violence, dating violence, and child victimization in rural 
communities. Instead, evaluators collected case study data from 
multiday site visits to 9 selected sites.

The other three funded grant proposals submitted to NIJ indicated that 
they anticipated difficulty in locating and forming appropriate 
comparison groups. However, they proposed to explore the feasibility of 
using comparison groups in the design phase following funding of the 
grant. At the time of our review, when each of these studies was well 
into implementation, none was found to be using a comparison group. For 
example, the Evaluation of a Multi-Site Demonstration of Collaborations 
to Address Domestic Violence and Child Maltreatment proposed to examine 
whether steps taken to improve collaboration between dependency courts, 
child protective services, and domestic violence service providers in 
addressing the problems faced by families with co-occurring instances 
of domestic violence and child maltreatment resulted in improvements in 
how service providers dealt with domestic violence and child 
maltreatment cases. Although NIJ stated that the evaluators planned to 
collect individual case record data from similar communities, at the 
time of our review these sites had not yet been identified, nor had a 
methodology for identifying the sites been proposed. Our review was 
conducted during the evaluation's third year of funding.

Intended Outcomes of Program Were Not Measured:

Although they were funded as outcome evaluations, 2 of the 4 studies 
were not designed to provide information on intended outcomes for 
individuals served by the programs. Both the Rural Domestic Violence 
and the Multi-Site Demonstration of Collaborations programs had as 
their objectives the enhanced safety of victims, among other goals. 
However, neither of the evaluations of these programs collected data on 
individual women victims and their families in order to examine whether 
the programs achieved this objective. Most of the data collected in the 
Rural Domestic Violence evaluation were indicators of intermediary 
results, such as increases in the knowledge and training of various 
rural service providers. While such intermediary results may be 
necessary precursors to achieving the program's objectives of victim 
safety, they are not themselves indicators of victim safety. The Multi-
Site Demonstration of Collaborations evaluation originally proposed to 
collect data on the safety of women and children as well as perpetrator 
recidivism, but in the second year of the evaluation project, the 
evaluators filed a request to change the scope of the study. 
Specifically, they noted that the original outcome indicators proposed 
for victim safety were not appropriate given the time frame of the 
evaluation compared to the progress of the demonstration project 
itself. The modified scope, which was approved by NIJ, focused on 
system rather than individual level outcomes. The new 'effectiveness' 
indicators included such things as changes in policies and procedures 
of agencies participating in the collaboration, and how agency 
personnel identify, process, and manage families with co-occurring 
domestic violence and child maltreatment. Such a design precludes 
conclusions about whether the programs improved the lives of victims of 
domestic violence or their children.

Lack of Pre-Program Data Hinders Ability to Show That Program Produced 
Change:

As discussed in our March 2002 report, the Rural Domestic Violence 
evaluation team did not collect baseline data prior to the start of the 
program, making it difficult to identify change resulting from the 
program. In addition, at the time of our review, in the third year of 
the multi-year National Evaluation of the Domestic Violence Victims' 
Civil Legal Assistance Program evaluation, the evaluator did not know 
whether baseline data would be available to examine changes resulting 
from the program. This evaluation, of the federal Civil Legal 
Assistance program,[Footnote 16] proposed to measure whether there had 
been a decrease in pro se representation (or self-representation) in 
domestic violence protective order cases. A decrease in pro se 
representation would indicate successful assistance to clients by Civil 
Legal Assistance grantees. In May 2003, NIJ reported that the evaluator 
was still in the process of contacting the court systems at the study 
sites to see which ones had available data on pro se cases. The 
evaluator also proposed to ask a sample of domestic violence victims 
whether they had access to civil legal assistance services prior to the 
program, the outcomes of their cases, and satisfaction with services. 
Respondents were to be selected from a list of domestic violence 
clients served by Civil Legal Assistance grantees within a specified 
time period, possibly 3 to 9 months prior to the start of the outcome 
portion of the study. Such retrospective data on experiences that may 
have occurred more than 9 months ago must be interpreted with caution, 
given the possibility of recall errors or respondents' lack of 
knowledge about services that were available in the past.

NIJ Has Funded Outcome Evaluations Despite Major Gaps in Knowledge 
about the Availability of Data and Comparison Groups:

Outcome evaluations are inherently difficult to conduct because in 
real-world settings program results can be affected by factors other 
than the intervention being studied. In addition, grantees' ability to 
conduct such evaluations can depend on the extent to which information 
is available up front about what data are available to answer the 
research questions, where such data can be obtained, and how the data 
can be collected for both the intervention and comparison groups. We 
found that in 3 of the 15 NIJ evaluations we reviewed, NIJ lacked 
sufficient information about these issues to assure itself that the 
proposals it funded were feasible to carry out. These 3 studies totaled 
about $3.7 million.

For the Evaluation of Non-Emergency Calls to 911, NIJ and DOJ's Office 
of Community Oriented Policing Services jointly solicited grant 
proposals to evaluate strategies taken by 4 cities to decrease non-
emergency calls to the emergency 911 system. NIJ officials told us that 
they had conducted 3-day site visits of the 4 sites, and that 
discussions with local officials included questions about availability 
of data in each jurisdiction. The NIJ solicitation for proposals 
contained descriptions of how non-emergency calls were processed at all 
4 sites, but no information on the availability of outcome data to 
assess changes in the volume, type, and nature of emergency and non-
emergency calls before and after the advent of the non-emergency 
systems. Evaluators were asked to conduct both a process analysis and 
an assessment analysis. The assessment analysis was to include 
"compiling and/or developing data" on a number of outcome questions. 
Once the study was funded, however, the grantee learned that only 1 of 
the 4 cities had both a system designed specifically to reduce non-
emergency calls to 911, as well as reliable data for evaluation 
purposes.

In the case of the Multi-Site Demonstration of Collaborations to 
Address Domestic Violence and Child Maltreatment, NIJ funded the 
proposal without knowing whether the grantee would be able to form 
comparison groups. NIJ officials stated that one of the reasons for 
uncertainty about the study design was that at the time the evaluator 
was selected, the 6 demonstration sites had not yet been selected. The 
proposal stated that the grantee would explore the "potential for 
incorporating comparison communities or comparison groups at the site 
level, and assess the feasibility, costs, and contributions and 
limitations of a design that incorporates comparison groups or 
communities." NIJ continued to fund the grantee for 3 additional years, 
although the second year proposal for supplemental funding made no 
mention of comparison groups and the third year proposal stated that 
the grantee would search for comparison sites, but did not describe how 
such sites would be located. In response to our questions about whether 
comparison groups would be used in the study, NIJ officials said that 
the plan was for the grantee to compare a random sample of case records 
from before program implementation to those after implementation at 
each of the demonstration sites. Designs utilizing pre-post treatment 
comparisons within the same group are not considered to be as rigorous 
as pre-post-treatment comparison group designs because they do not 
allow evaluators to determine whether the results are due to the 
program under consideration or to some other programs or external 
factors.

NIJ also approved the Multi-Site Demonstration of Collaborations 
proposal without knowing whether data on individual victims of domestic 
violence and child maltreatment would be available during the time 
frame of the evaluation. The first year proposal stated that the 
grantee would examine outcomes for individuals and families, although 
it also noted that there are challenges to assessing such outcomes and 
that system outcomes should be examined first. Our review found that in 
the third year of the evaluation, data collection was focused solely on 
"system" outcomes, such as changes in policies and procedures and how 
agency personnel identify, process, and manage families with co-
occurring domestic violence and child maltreatment. Thus, although the 
original design called for answering questions about the outcomes of 
the program for individuals and families, NIJ could not expect answers 
to such questions.[Footnote 17]

In the case of the Civil Legal Assistance study, NIJ officials told us 
that they have held discussions with the grantee about the feasibility 
of adding comparison groups to the design. According to these 
officials, the grantee said that a comparison group design would force 
it to reduce the process sites to be studied from 20 to somewhere 
between 6 and 8. NIJ advised the grantee that so large a reduction in 
sites would be too high a price to pay to obtain comparison groups, and 
advised the grantee to stay with the design as originally proposed. 
Consequently, NIJ cannot expect a rigorous assessment of outcomes from 
this evaluation.

Completed Outcome Evaluations Produced Useful Information on Processes 
but Not on Outcomes for DOJ Program Administrators:

Of the 5 completed NIJ studies that focused on issues of interest to 
DOJ program offices, findings related to program effectiveness were not 
sufficiently reliable or conclusive. However, DOJ program 
administrators told us that they found some of the process and 
implementation findings from the completed studies to be 
useful.[Footnote 18]

Program administrators from DOJ's Office on Violence Against Women said 
that although they did not obtain useful outcome results from the Rural 
Domestic Violence evaluation, they identified two "lessons learned" 
from the process and implementation components of the study. First, the 
evaluation found that very little information was available to grantees 
regarding how to create collaborative programs. Thus, DOJ engaged a 
technical assistance organization to develop a training program on how 
to create collaborative projects based on the experiences of some of 
the grantees examined by the Rural evaluation. Second, program 
administrators told us that the evaluation found that because Rural 
grants were funded on an 18-month schedule, programs did not have 
adequate time to structure program services and also collect useful 
program information. As a result, Rural programs are now funded for at 
least 24 months.[Footnote 19]

While shortcomings in NIJ's outcome evaluations of law enforcement 
programs leave questions about whether the programs are effective and 
whether they should continue to be funded, program administrators in 
DOJ's Office of Community Oriented Policing Services said that the 
studies helped identify implementation problems that assisted them in 
developing and disseminating information in ways useful to the law 
enforcement community. These included curriculum development, 
leadership conferences, and fact sheets and other research 
publications. For example, as a result of the NIJ-managed study, 
Responding to the Problem Police Officer: An Evaluation of Early 
Warning Systems,[Footnote 20] DOJ officials developed a draft command 
level guidebook that focuses on the factors to be considered in 
developing an early warning system, developed an early warning 
intervention training curriculum that is being taught by the 31 
Regional Community Policing Institutes[Footnote 21] located across the 
country, and convened a "state-of-art" conference for five top law 
enforcement agencies that were developing early warning systems. DOJ 
officials also said the studies showed that the various systems 
evaluated had been well received by citizens and law enforcement 
officials. For example, they said that citizens like the 311 non-
emergency number that was established in several cities to serve as an 
alternative to calling the 911 emergency number. The system allows law 
enforcement officers to identify hot spots or trouble areas in the city 
by looking at various patterns in the citizen call data. Officials may 
also be able to monitor the overall state of affairs in the city, such 
as the presence of potholes, for example. Similarly, Chicago's City-
Wide Community Policing program resulted in the development of a crime 
mapping system, enabling officers to track crime in particular areas of 
the city. Like the non-emergency telephone systems, DOJ officials 
believe that crime mapping helps inform citizens, police, and policy 
makers about potential problem areas.

NIJ's Current and Planned Activities to Improve Its Evaluation Program:

NIJ officials told us that they have begun to take several steps to try 
to increase the likelihood that outcome evaluations will produce more 
definitive results. We recommended in our March 2002 report on selected 
NIJ-managed outcome evaluations[Footnote 22] that NIJ assess its 
evaluation process to help ensure that future outcome evaluations 
produce definitive results. In November 2002, Congress amended the 
relevant statute to include cost-effectiveness evaluation where 
practical as part of NIJ's charge to conduct evaluations.[Footnote 23] 
Since that time NIJ has established an Evaluation Division within NIJ's 
Office of Research and Evaluation. NIJ officials told us that they have 
also placed greater emphasis on funding cost-benefit studies, funded 
feasibility studies prior to soliciting outcome evaluations, and placed 
greater emphasis on applicants' prior performance in awarding grants.

In January 2003, NIJ established an Evaluation Division within NIJ's 
Office of Research and Evaluation, as part of a broader reorganization 
of NIJ programs. According to NIJ, the Division will "oversee NIJ's 
evaluations of other agency's [sic] programs and…develop policies and 
procedures that establish standards for assuring quality and utility of 
evaluations."[Footnote 24] NIJ officials told us that among other 
things, the Division will be responsible for recommending to the NIJ 
Director which evaluations should be undertaken, assigning NIJ staff to 
evaluation grants and overseeing their work, and maintaining oversight 
responsibility for ongoing evaluation grants. In addition, NIJ 
officials told us that one of the NIJ Director's priorities is to put 
greater emphasis on evaluations that examine the costs and benefits of 
programs or interventions. To support this priority, NIJ officials told 
us that the Evaluation Division had recently developed training for NIJ 
staff on cost-benefit and cost-effectiveness analysis.[Footnote 25]

NIJ recently undertook 37 "evaluability assessments" to assess the 
feasibility of conducting outcome evaluations of congressionally 
earmarked programs prior to soliciting proposals for 
evaluation.[Footnote 26] In 2002 and 2003, these assessments were 
conducted to examine each project's scope, activities, and potential 
for rigorous evaluation.[Footnote 27] The effort included telephone 
interviews and site visits to gather information regarding such things 
as what outcomes could be measured, what kinds of data were being 
collected by program staff, and the probability of using a comparison 
group or random assignment in the evaluation. Based on the review, NIJ 
solicited proposals from the research community to evaluate a subset of 
the earmarked programs that NIJ believed were ready for outcome 
evaluation.[Footnote 28]

NIJ officials also stated that in an effort to improve the performance 
of its grantees, it has begun to pay greater attention to the quality 
and timeliness of their performance on previous NIJ grants when 
reviewing funding proposals. As part of NIJ's internal review of grant 
applications, NIJ staff check that applicants' reports are complete and 
accurate and evaluate past work conducted by the applicant using 
performance related measures. Although this is not a new activity, NIJ 
officials told us that NIJ was now placing more emphasis on reviewing 
applicants' prior performance than it had in the past.[Footnote 29] NIJ 
officials told us that NIJ staff may also contact staff in other OJP 
offices, where the applicant may have received grant funding, to assess 
applicant performance on those grants.

Conclusions:

Our in-depth review of 15 outcome evaluations managed by NIJ during the 
past 10 years indicated that the majority was beset with methodological 
and/or implementation problems that limited the ability to draw 
meaningful conclusions about the programs' effectiveness. Although our 
sample is not representative of all NIJ outcome evaluations conducted 
during the last 10 years, it includes those that have received a large 
proportion of the total funding for this type of research, and tends to 
be drawn from the most recent work. The findings from this review, 
coupled with similar findings we reported in other reviews of NIJ 
outcome evaluations, raise concerns about the level of attention NIJ is 
focusing on ensuring that funded outcome evaluations produce credible 
results.

We recognize that it is very difficult to design and execute outcome 
evaluations that produce meaningful and definitive results. Real world 
evaluations of complex social programs inevitably pose methodological 
challenges that can be difficult to control and overcome. Nonetheless, 
we believe it is possible to conduct outcome evaluations in real world 
settings that produce meaningful results. Indeed, 5 of NIJ's outcome 
evaluations can be characterized in this way, and these 5 accounted for 
about 48 percent of the $15.4 million spent on the studies we reviewed. 
We also believe that NIJ could do more to help ensure that the millions 
of dollars it spends annually to evaluate criminal justice programs is 
money well spent. Indeed, poor evaluations can have substantial costs 
if they result in continued funding for ineffective programs or the 
curtailing of funding for effective programs.

NIJ officials told us that they recognize the need to improve their 
evaluation efforts and have begun to take several steps in an effort to 
increase the likelihood that outcome evaluations will produce more 
conclusive results. These steps include determining whether a program 
is ready for evaluation and monitoring evaluators' work more closely. 
We support NIJ's efforts to improve the rigor of its evaluations. 
However, it is too soon to tell whether and to what extent these 
efforts will lead to NIJ funding more rigorous effectiveness 
evaluations, and result in NIJ obtaining evaluative information that 
can better assist policy makers in making decisions about criminal 
justice funding priorities. In addition to the steps that NIJ is 
taking, we believe that NIJ can benefit from reviewing problematic 
studies it has already funded in order to determine the underlying 
causes for the problems and determine ways to avoid them in the future.

Recommendations for Executive Action:

We recommend that the Attorney General instruct the Director of NIJ to:

* Conduct a review of its ongoing outcome evaluation grants--including 
those discussed in this report--and develop appropriate strategies and 
corrective measures to ensure that methodological design and 
implementation problems are overcome so the evaluations can produce 
more conclusive results. Such a review should consider the design and 
implementation issues we identified in our assessment in order to 
decide whether and what type of intervention may be appropriate. If, 
based on NIJ's review, it appears that the methodological problems 
cannot be overcome, NIJ should consider refocusing the studies' 
objectives and/or limiting funding.

* Continue efforts to respond to our March 2002 recommendation that NIJ 
assess its evaluation process with the purpose of developing approaches 
to ensure that future outcome evaluation studies are funded only when 
they are effectively designed and implemented. The assessment could 
consider the feasibility of such steps as:

* obtain more information about the availability of outcome data prior 
to developing a solicitation for research;

* require that outcome evaluation proposals contain more detailed 
design specifications before funding decisions are made regarding these 
proposals; and:

* more carefully calibrate NIJ monitoring procedures to the cost of the 
grant, the risks inherent in the proposed methodology, and the extent 
of knowledge in the area under investigation.

Agency Comments and our Evaluation:

We provided a copy of a draft of this report to the Attorney General 
for review and comment. In a September 4, 2003, letter, DOJ's Assistant 
Attorney General for the Office of Justice Programs commented on the 
draft. Her comments are summarized below and presented in their 
entirety in appendix III.

The Assistant Attorney General stated that NIJ agreed with our 
recommendations. She also highlighted NIJ's current and planned 
activities to improve its evaluation program. For example, as we note 
in the report, NIJ has established an Evaluation Division and initiated 
a new strategy of evaluability assessments. Evaluability assessments 
are intended to be quick, low cost initial assessments of criminal or 
juvenile justice programs to help NIJ determine if the necessary 
conditions exist to warrant sponsoring a full-scale outcome evaluation. 
To improve its grantmaking process, the Assistant Attorney General 
stated that NIJ is developing a new grant "special conditions" that 
will require grantees to document all changes in the scope and 
components of evaluation designs. In response to our concerns, NIJ also 
plans, in fiscal year 2004, to review its grant monitoring procedures 
for evaluation grants in order to more intensively monitor the larger 
or more complex grants. NIJ also plans to conduct periodic reviews of 
its evaluation research portfolio to assess the progress of ongoing 
grants. This procedure is to include documenting any changes in 
evaluation design that may have occurred and reassessing the expected 
benefits of ongoing projects.

In her letter, the Assistant Attorney General made two substantive 
comments--both concerning our underlying assumptions in conducting the 
review--with which we disagree. In her first comment, the Assistant 
Attorney General noted that our report implies that conclusive 
evaluation results can always be achieved if studies are rigorously 
designed and carefully monitored. We disagree with this 
characterization of the implication of our report. While sound research 
design and careful monitoring of program implementation are factors 
that can significantly affect the extent to which outcome evaluation 
results are conclusive, they are not the only factors. We believe that 
difficulties associated with conducting outcome evaluations in real 
world settings can give rise to situations in which programs are not 
implemented as planned or requisite data turn out not to be available. 
In such instances, even a well-designed and carefully monitored 
evaluation will not produce conclusive findings about program 
effectiveness. Our view is that when such problems occur, NIJ should 
respond and take appropriate action. NIJ could (1) take steps to 
improve the methodological adequacy of the studies if it is feasible to 
do so, (2) reconsider the purpose and scope of evaluation if there is 
interest in aspects of the program other than its effectiveness, or (3) 
decide to end the evaluation project if it is not likely to produce 
useful information on program outcomes.

In her second comment, the Assistant Attorney General expressed the 
view that our work excluded consideration of valid, high quality 
evaluation methods other than experimental and quasi-experimental 
design. We believe that our assessment of NIJ's outcome evaluations was 
both appropriate and comprehensive. We examined a variety of 
methodological attributes of NIJ's studies in trying to assess whether 
they would produce sufficiently sound information on program outcomes. 
Among other things, we systematically examined such factors as the type 
of evaluation design used; how program effects were isolated (that is, 
whether comparison groups or statistical controls were utilized); the 
size of study samples and appropriateness of sampling procedures; the 
reliability, validity, and appropriateness of outcome measures; the 
length of follow-up periods on program participants; the extent to 
which program attrition or program participant nonresponse may have 
been an issue; the appropriateness of analytic techniques that were 
employed; and the reported results. Therefore, we made determinations 
about the cause and effect linkages between programs and outcomes using 
a myriad of methodological information. In discussing the 
methodological strengths of experimental and quasi-experimental 
designs, we did not intend to be dismissive of other potential 
approaches to isolating the effects of program interventions. For 
example, if statistical controls can be employed to adequately 
compensate for a methodological weakness such as the existence of a 
comparison group that is not comparable on characteristics that could 
affect the study's outcome, then we endorse the use of such a 
technique. However, in those instances where our review found that 
NIJ's studies could not produce sufficiently sound information about 
program outcomes, we saw no evidence that program effects had been 
isolated using alternative, compensatory, or supplemental methods.

In addition to these comments, the Assistant Attorney General also 
provided us with a number of technical comments, which we incorporated 
in the report as appropriate.

As arranged with your office, unless you publicly announce its contents 
earlier, we plan no further distribution of this report until 14 days 
from the date of this report. At that time, we will send copies to the 
Attorney General, appropriate congressional committees and other 
interested parties. In addition, the report will be available at no 
charge on GAO's Web site at http://www.gao.gov.

Sincerely yours,

Laurie E. Ekstrand 

Director, Homeland Security and Justice Issues:

Signed by Laurie E. Ekstrand: 

[End of section]

Appendix I: Objectives, Scope, and Methodology:

In response to your request, we undertook a review of the outcome 
evaluation work performed under the direction of the National Institute 
of Justice (NIJ) during the last 10 years. We are reporting on (1) the 
methodological quality of a sample of completed and ongoing NIJ outcome 
evaluation grants and (2) the usefulness of the evaluations in 
producing information on program outcomes.

Our review covered outcome evaluation grants managed by NIJ from 1992 
through 2002. Outcome evaluations are defined as those efforts designed 
to determine whether a program, project, or intervention produced its 
intended effects. These kinds of studies can be distinguished from 
process evaluations, which are designed to assess the extent to which a 
program is operating as intended.

To determine the methodological quality of a sample of NIJ-managed 
outcome evaluations, we asked NIJ, in June 2002, to identify and give 
us a list of all outcome evaluations managed by NIJ that were initiated 
during the last 10 years, or initiated at an earlier date but completed 
during the last 5 years. NIJ identified 96 evaluation studies that 
contained outcome evaluation components that had been awarded during 
this period. A number of these studies included both process and 
outcome components. We did not independently verify the accuracy or 
completeness of the data NIJ provided.

These 96 evaluations were funded for a total of about $36.6 million. 
Individual grant awards ranged in size from $22,374 to about $2.8 
million. Twenty grants were awarded for $500,000 or more, for a total 
of about $22.8 million (accounting for about 62 percent of all funding 
for NIJ outcome evaluations during the 10-year review period); 51 
grants for less than $500,000, but more than $100,000, for a total of 
about $11.7 million (accounting for about 32 percent of all NIJ outcome 
evaluation funding); and 25 grants for $100,000 or less, for a total of 
about $2.1 million (accounting for about 6 percent of all NIJ outcome 
evaluation funding). Fifty-one of the 96 evaluations had been completed 
at the time of our review; 45 were ongoing.

From the list of 96 outcome evaluation grants, we selected a judgmental 
sample of 16 grants for an in-depth methodological review. Our sample 
selection criteria were constructed so as to sample both large and 
medium-sized grants (in terms of award size), and both completed and 
ongoing studies. We selected 8 large evaluations--funded at $500,000 or 
above--and 8 medium-sized evaluations--funded at between $101,000 and 
$499,000. Within each group of 8 we selected the 4 most recently 
completed evaluations, and the 4 most recently initiated evaluations 
that were still ongoing, in an effort to ensure that the majority of 
the grants reviewed were subject to the most recent NIJ grant 
management policies and procedures. One of the medium-sized ongoing 
evaluations was dropped from our review when we determined that the 
evaluation was in the formative stage of development; that is, the 
application had been awarded but the methodological design had not yet 
been fully developed. As a result, our in-depth methodological review 
covered 15 NIJ-managed outcome evaluations accounting for about 42 
percent of the total spent on outcome evaluation grants between 1992 
and 2002 (see tables 5 and 6). These studies are not necessarily 
representative of all outcome evaluations managed by NIJ during this 
period.

Table 5: Number and Size of Outcome Evaluation Awards Made by NIJ from 
1992 through 2002, and Reviewed by GAO:

Large ($500,000 or more): No.

Size of grant: Large ($500,000 or more); All NIJ outcome evaluations: 
Number of grants: 20; All NIJ outcome evaluations: Total funding: 
$22,801,186; NIJ outcome evaluations reviewed by GAO: Number 
of grants (percent reviewed in category): 8 (40%); NIJ outcome 
evaluations reviewed by GAO: Total funding (percent reviewed in 
category): $13,654,211 (60%).

Size of grant: Medium ($101,000-$499,000); All NIJ outcome evaluations: 
Number of grants: 51; All NIJ outcome evaluations: Total funding: 
11,687,679; NIJ outcome evaluations reviewed by GAO: Number of 
grants (percent reviewed in category): 7 (14%); NIJ outcome evaluations 
reviewed by GAO: Total funding (percent reviewed in category): 
1,765,915 (15%).

Size of grant: Small ($100,000 or less); All NIJ outcome evaluations: 
Number of grants: 25; All NIJ outcome evaluations: Total funding: 
2,110,737; NIJ outcome evaluations reviewed by GAO: Number of 
grants (percent reviewed in category): N/A; NIJ outcome evaluations 
reviewed by GAO: Total funding (percent reviewed in category): N/A.

Size of grant: Total; All NIJ outcome evaluations: Number of grants: 
96; All NIJ outcome evaluations: Total funding: $36,599,602; 
NIJ outcome evaluations reviewed by GAO: Number of grants (percent 
reviewed in category): 15 (16%); NIJ outcome evaluations reviewed by 
GAO: Total funding (percent reviewed in category): $15,420,126 (42%).

Source: GAO analysis of NIJ data.

[End of table]

Table 6: Size and Completion Status of the 15 Evaluations Selected for 
Methodological Review:

Grant title: 

National Evaluation of Gang Resistance Education and Training Program; 
Award: $1,568,323; Size of award: Large: Yes; Size of award: Medium: 
No; Status: Completed: Yes; Status: Ongoing: No.

Evaluation of Chicago's Citywide Community Policing Program; Award: 
$2,157,859; Size of award: Large: Yes; Size of award: Medium: No; 
No; Status: Completed: Yes; Status: Ongoing: No.

National Evaluation of the Rural Domestic Violence and Child 
Victimization Enforcement Grant Program; Award: $719,949; Size of 
award: Large: Yes; Size of award: Medium: No; No; Status: 
Completed: Yes; Status: Ongoing: No.

Evaluation of Breaking the Cycle; Award: $2,419,344; Size of award: 
Large: Yes; Size of award: Medium: No; No; Status: Completed: 
Yes; Status: Ongoing: No.

National Evaluation of the Domestic Violence Victims' Civil Legal 
Assistance Program; Award: $800,154; Size of award: Large: Yes; Size of 
award: Medium: No; No; Status: Completed: No; Status: 
Ongoing: Yes.

Evaluation of a Multi-Site Demonstration of Collaborations to Address 
Domestic Violence and Child Maltreatment; Award: $2,498,638; Size of 
award: Large: Yes; Size of award: Medium: No; No; Status: 
Completed: No; Status: Ongoing: Yes.

Corrections and Law Enforcement Family Support: Law Enforcement Field 
Test; Award: $649,990; Size of award: Large: Yes; Size of award: Medium: 
No; No; Status: Completed: No; Status: Ongoing: Yes.

Evaluation of a Multi-Site Demonstration for Enhanced Judicial 
Oversight of Domestic Violence Cases; Award: $2,839,954; Size of award: 
Large: Yes; Size of award: Medium: No; No; Status: Completed: 
No; Status: Ongoing: Yes.

Evaluation of a Comprehensive Service-Based Intervention Strategy in 
Public Housing; Award: $187,412; Size of award: Large: No; Size of 
award: Medium: Yes; No; Status: Completed: Yes; Status: Ongoing: 
No. 

An Evaluation of Victim Advocacy with a Team Approach; Award: $153,491; 
Size of award: Large: No; Size of award: Medium: Yes; No; 
Status: Completed: Yes; Status: Ongoing: No.

Reducing Non-Emergency Calls to 911: An Assessment of Four Approaches 
to Handling Citizen Calls for Service; Award: $399,919; Size of award: 
Large: No; Size of award: Medium: Yes; No; Status: Completed: 
Yes; Status: Ongoing: No.

Responding to the Problem Police Officer: An Evaluation of Early 
Warning Systems; Award: $174,643; Size of award: Large: No; Size 
of award: Medium: Yes; No; Status: Completed: Yes; Status: Ongoing: 
No. 

Evaluation of a Juvenile Justice Mental Health Initiative with 
Randomized Design; Award: $200,000; Size of award: Large: No; Size 
of award: Medium: Yes; No; Status: Completed: No; Status: 
Ongoing: Yes.

Culturally Focused Batterer Counseling for African-American Men; Award: 
$356,321; Size of award: Large: No; Size of award: Medium: Yes; 
No; Status: Completed: No; Status: Ongoing: Yes.

Testing the Impact of Court Monitoring and Batterer Intervention 
Programs at the Bronx Misdemeanor Domestic Violence Court; Award: 
$294,129; Size of award: Large: No; Size of award: Medium: Yes; 
No; Status: Completed: No; Status: Ongoing: Yes.

Source: GAO analysis of NIJ data.

[End of table]

The evaluations we selected comprised a broad representation of issues 
in the criminal justice field and of program delivery methods. In terms 
of criminal justice issues, 7 of the 15 evaluations focused on programs 
designed to reduce domestic violence, 4 focused on programs addressing 
the behavior of law enforcement officers, 2 focused on programs 
addressing drug abuse, and 2 focused on programs to deal with juvenile 
justice issues. In terms of program delivery methods, 3 evaluations 
examined national discretionary grant programs or nationwide 
cooperative agreements, 4 examined multisite demonstration programs, 
and 8 examined local programs or innovations.

For the 15 outcome evaluations we reviewed, we asked NIJ to provide any 
documentation relevant to the design and implementation of the outcome 
evaluation methodologies, such as the application solicitation, the 
grantee's initial and supplemental applications, progress notes, 
interim reports, requested methodological changes, and any final 
reports that may have become available. We used a data collection 
instrument to obtain information systematically about each program 
being evaluated and about the features of the evaluation methodology. 
We based our data collection and assessments on generally accepted 
social science standards.[Footnote 30] We examined such factors as 
whether evaluation data were collected before and after program 
implementation; how program effects were isolated (i.e., the use of 
nonprogram participant comparison groups or statistical controls); and 
the appropriateness of sampling, outcome measures, statistical 
analyses, and any reported results.[Footnote 31] A senior social 
scientist with training and experience in evaluation research and 
methodology read and coded the documentation for each evaluation. A 
second senior social scientist reviewed each completed data collection 
instrument and the relevant documentation for the outcome evaluation to 
verify the accuracy of every coded item. We relied on documents NIJ 
provided to us between October 2002 and May 2003 in assessing the 
evaluation methodologies and reporting on each evaluation's status. We 
grouped the studies into 3 categories based on our judgment of their 
methodological soundness. Although we recognize that the stronger 
studies may have had some weaknesses, and that the weaker studies may 
have had some strengths, our categorization of the studies was a 
summary judgment based on the totality of the information provided to 
us by NIJ. Following our review, we interviewed NIJ officials regarding 
NIJ's role in soliciting, selecting, and monitoring these grants, and 
spoke to NIJ grant managers regarding issues raised about each of the 
grants during the course of our methodological review.

In the course of our discussions with NIJ officials, we learned of 
changes NIJ has underway to improve its administration of outcome 
evaluation studies. To document these changes, we interviewed 
responsible NIJ officials, and requested and reviewed relevant 
documents. We are providing information in this report about these 
changes.

To identify the usefulness of the evaluations in producing information 
on program outcomes, we reviewed reported findings from completed NIJ-
managed outcome evaluations that either evaluated programs administered 
or funded by the Department of Justice (DOJ), or had been conducted 
with funding contributed by DOJ program offices (see table 7). Of the 8 
completed evaluations that we reviewed for methodological adequacy, 5 
had been conducted with funding contributed in part by DOJ program 
offices, including 2 evaluations funded in part by DOJ's Office on 
Violence Against Women (OVW) and 3 evaluations funded in part by DOJ's 
Office of Community Oriented Policing Services (COPS). Of the 2 
evaluations funded by OVW, 1 was a review of a national program 
administered by DOJ, and the other was a review of a locally 
administered program funded partially by an OVW grant. Of the 3 
evaluations funded by COPS, 2 were evaluations of programs funded at 
least in part with COPS funding, and the other was an evaluation of a 
program operating at several local law enforcement agencies, supported 
with local funding. Because of our interest in the effectiveness of 
criminal justice programs, we limited our review of the usefulness of 
NIJ outcome evaluations to evaluations of DOJ programs, or evaluations 
funded by DOJ program offices, and did not examine the 3 other 
completed NIJ outcome evaluations that focused on programs funded by 
agencies other than DOJ.

Table 7: Programs Evaluated and Funding Sources for Completed NIJ 
Outcome Evaluations:

Completed NIJ evaluations: OVW Evaluations: 

Completed NIJ evaluations: National Evaluation of the Rural Domestic 
Violence and Child Victimization Enforcement Grant Program; DOJ-funded 
program: Yes; Evaluation funded by DOJ program offices: Yes.

Completed NIJ evaluations: An Evaluation of Victim Advocacy with a Team 
Approach; DOJ-funded program: Yes; Evaluation funded by DOJ program 
offices: Yes.

Completed NIJ evaluations: COPS Evaluations: 

Completed NIJ evaluations: Evaluation of Chicago's Citywide Community 
Policing Program; DOJ-funded program: Yes; Evaluation funded by DOJ 
program offices: Yes.

Completed NIJ evaluations: Reducing Non-Emergency Calls to 911: An 
Assessment of Four Approaches to Handling Citizen Calls for Service; 
DOJ-funded program: Yes; Evaluation funded by DOJ program offices: Yes.

Completed NIJ evaluations: Responding to the Problem Police Officer: An 
Evaluation of Early Warning Systems; DOJ-funded program: No; Evaluation 
funded by DOJ program offices: Yes.

Completed NIJ evaluations: Other evaluations: 

Completed NIJ evaluations: National Evaluation of Gang Resistance 
Education and Training Program; DOJ-funded program: No; Evaluation 
funded by DOJ program offices: No.

Completed NIJ evaluations: Evaluation of Breaking the Cycle; DOJ-funded 
program: No; Evaluation funded by DOJ program offices: No.

Completed NIJ evaluations: Evaluation of a Comprehensive Service-Based 
Intervention Strategy in Public Housing; DOJ-funded program: No; 
Evaluation funded by DOJ program offices: No.

Source: GAO analysis of NIJ data.

[End of table]

We interviewed NIJ officials and relevant DOJ program administrators 
regarding whether these findings were used to implement improvements in 
the evaluated programs. At OVW and COPS, we asked officials the extent 
to which they (1) were involved in soliciting and developing the 
evaluation grant, and monitoring the evaluation; (2) were aware of the 
evaluation results; and (3) had made any changes to the programs they 
administered based on evaluation findings about the effectiveness of 
the evaluated programs.

We conducted our work at NIJ headquarters in Washington, D.C., between 
May 2002 and August 2003 in accordance with generally accepted 
government auditing standards.

[End of section]

Appendix II: Summaries of the NIJ Outcome Evaluations Reviewed:

Table 8: Evaluations with Sound Designs and Sound Implementation Plans:

Evaluation: Principal investigator; The National Evaluation of the Gang 
Resistance Education and Training (GREAT) Program: University of 
Nebraska at Omaha.

Evaluation: Program evaluated; The National Evaluation of the Gang 
Resistance Education and Training (GREAT) Program: The GREAT program 
began in 1991 with the goal of using federal, state, and local law 
enforcement agents to educate elementary school students in areas prone 
to gang activity about the destructive consequences of gang membership. 
The program seeks to prevent youth crime and violence by reducing 
involvement in gangs. According to the evaluator's proposal, as of 
April 1994, 507 officers in 37 states (150 sites) had completed GREAT 
training. GREAT targets middle school students (with an optional 
curriculum for third and fourth graders) and consists of 8 lessons 
taught over a 9-week period.

Evaluation: Evaluation components; The National Evaluation of the Gang 
Resistance Education and Training (GREAT) Program: Process and outcome 
evaluations began in 1994 and were completed in 2001. Total evaluation 
funding was $1,568,323. The outcome evaluation involved a cross-
sectional and longitudinal design. For the cross-sectional component, 
5,935 eighth grade students in 11 different cities were surveyed to 
assess the effectiveness of GREAT. Schools that had offered GREAT 
within the last 2 years were selected, and questionnaires were 
administered to all eighth graders in attendance on a single day. This 
sample constituted a 1-year follow-up of 2 ex-post facto groups: 
students who had been through GREAT and those who had not. A 5-year 
longitudinal, quasi-experimental component was conducted in 6 different 
cities. Schools in the 6 cities were selected purposively, to allow for 
random assignment where possible. Classrooms in 15 of 22 schools were 
randomly assigned to receive GREAT or not, whereas assignment in the 
remaining schools was purposive. A total of more than 3,500 students 
initially participated, and active consent was obtained for 2,045 
participants. Students were surveyed 2 weeks before the program, 2 
weeks after completion, and at 1-, 2-, 3-, and 4-year intervals after 
completion. Significant follow-up efforts were employed to maintain 
reasonable response rates. Concepts measured included attitudinal 
measures regarding crime, gangs and police; delinquency; drug sales and 
use; and involvement in gangs, gang activities, and risk-seeking 
behaviors. In addition, surveys were conducted with parents of the 
students participating in the longitudinal component, administrative 
and teaching staff at the schools in the longitudinal design, and 
officers who had completed GREAT training prior to July 1999.

Evaluation: Assessment of evaluation; The National Evaluation of the 
Gang Resistance Education and Training (GREAT) Program: Although 
conclusions from the cross-sectional component may be limited because 
of possible pre-existing differences between students who had been 
exposed to GREAT and students who had not and lack of detail about 
statistical controls employed, the design and analyses for the 
longitudinal component are generally sound, including random assignment 
of classrooms to the intervention in 15 of the 22 schools, collection 
of baseline and extensive follow-up data; and statistical controls for 
differential attrition rates of participant and comparison groups.

Evaluation: Principal investigator; The National Evaluation of the Gang 
Resistance Education and Training (GREAT) Program: Urban Institute.

Evaluation: Program evaluated; The National Evaluation of the Gang 
Resistance Education and Training (GREAT) Program: A consortium of 
federal agencies, led by the Office of National Drug Control Policy and 
NIJ, developed the Breaking the Cycle (BTC) demonstration program in 3 
sites to test the effectiveness of a comprehensive, coordinated 
endeavor to reduce substance abuse and criminal activity, and improve 
the health and social functioning of drug-involved offenders. The first 
site, Birmingham, Ala., received funding in 1997, and the next 2 sites, 
Tacoma, Wash., and Jacksonville, Fla. received funding in 1998. 
Participants were adult arrestees (for any type of crime) who tested 
positive for drug use and had a history of drug involvement. The 
program was based on the recognition that there was a link between drug 
use and crime, and it had the support of many criminal justice system 
officials who were willing to use the authority of the criminal justice 
system to reduce drug use among offenders. BTC intended to expand the 
scope of earlier programs such as drug courts and Treatment 
Alternatives to Street Crime by incorporating drug reduction activities 
as part of handling felony cases. BTC included early intervention; a 
continuum of treatment options tailored to participants' needs, 
including treatment readiness programs in jails; regular judicial 
monitoring and graduated sanctions; and collaboration among justice and 
treatment agencies.

Evaluation: Evaluation components; The National Evaluation of the Gang 
Resistance Education and Training (GREAT) Program: Begun in 1997, and 
the final report completed in 2003, the evaluation was funded for 
$2,419,344, and included both outcome and process components. 
Comparison groups were selected in each of the 3 sites, and were 
composed of defendants similar to the BTC participants who were 
arrested in the year before BTC was implemented. The evaluation 
examined program success in (1) reducing drug use and criminal 
activity, as measured by self-reported drug use in the 6 months prior 
to follow-up interviews and officially recorded arrests in the 12 
months after baseline; (2) improving the physical and mental health and 
family/social well-being of participants, as measured by self-reported 
interview data on problems experienced in these 3 areas during the 30 
days before follow-up; and (3) improving labor market outcomes for 
participating offenders, as measured by self-reported interview data on 
employment and social difficulties in the 30 days before follow-up. 
Survey data were collected at baseline and again at two intervals 
between 9 and 15 months after baseline. At baseline the sample sizes 
for the treatment and comparison groups were, respectively, 374 and 192 
in Birmingham, 335 and 444 in Jacksonville, and 382 and 351 in Tacoma. 
Response rates for the follow-up interviews varied across the 3 sites 
from 65 to 75 percent for the treatment groups, and from 71 to 73 
percent for the comparison groups. Method of assessment varied across 
sites and across samples, with some participants in both the comparison 
and treatment groups interviewed in person while others were 
interviewed by telephone. Multiple statistical analyses, including 
logistic regression, with controls for differences in demographics, 
offense history, substance abuse history, and work history between 
treatment and comparison groups were used. BTC's effect on the larger 
judicial environment was also assessed, using official records on the 
number of hearings, case closure rates, and other factors.; Cost-
benefit analyses of the BTC interventions were conducted at the three 
locations. The costs attributable to the BTC program were derived from 
budgetary information provided by program staff. The BTC program 
benefits were conceptualized as "costs avoided" arising from the social 
and economic costs associated with crime. The estimates of cost avoided 
in the study were based on (1) the costs (to society) associated with 
the commission of particular crimes and (2) the costs (to the criminal 
justice system) associated with arrests. Estimates of these components 
from the economic and criminal justice literature were applied to self-
reported arrest data from the program and comparison group subjects. 
The derived estimates of benefits were compared to program costs to 
form cost-benefit ratios for the interventions. An earlier effort to 
incorporate estimates of savings in service utilization from BTC (as a 
program benefit) was not included in the final report analysis due to 
inconclusive results.

Evaluation: Assessment of evaluation; The National Evaluation of the 
Gang Resistance Education and Training (GREAT) Program: The evaluation 
was well designed and implemented. The study used comparison groups to 
isolate and minimize external factors that could have influenced the 
results. While the comparison groups were selected and baseline data 
collected 1 year before the treatment groups were selected, the study 
corrected for selection bias and attrition, using multivariate models 
that incorporated control variables to measure observed sample 
differences. The study appears to have handled successfully other 
potential threats to the reliability and validity of results, by using 
appropriate statistical analyses to make adjustments. For example, the 
study relied on both self-reported measures of drug use and arrest 
histories as well as official records of arrests, to assess the effects 
of the program. Self-report measures are subject to errors in memory or 
self-presentational biases, while official records can be inaccurate 
and/or incomplete. The evaluators made use of both the self-report and 
official measures to attempt to control for these biases.; The 
methodological approach used in the cost benefit analysis was generally 
sound. The report specified the assumptions underlying the cost and 
benefit estimates, and appropriately discussed the limitations of the 
analysis for policymaking.

Evaluation: Principal investigator; The National Evaluation of the Gang 
Resistance Education and Training (GREAT) Program: The Urban Institute.

Evaluation: Program evaluated; The National Evaluation of the Gang 
Resistance Education and Training (GREAT) Program: The Judicial 
Oversight Demonstration (JOD) initiative is a multiyear program being 
implemented at 3 sites (City of Boston/Dorchester District Court, 
Mass.; Washtenaw County, Ann Arbor, Mich.; and Milwaukee County, Wis.) 
to address the problem of domestic violence. JOD tests the idea that a 
coordinated community, focused judicial, and systemic criminal justice 
response can improve victim safety and service provision, as well as 
offender accountability. JOD emphasizes uniform and consistent 
responses to domestic violence offenses, including coordinated victim 
advocacy and services; strong offender accountability and oversight; 
rigorous research and evaluation components; and centralized technical 
assistance. Demonstration sites have developed partnerships with a 
variety of public and private entities, including victim advocacy 
organizations, local law enforcement agencies, courts, and other social 
service providers. The program began in fiscal year 2000, and 
demonstration sites are expected to receive funding for 5 years.

Evaluation: Evaluation components; The National Evaluation of the Gang 
Resistance Education and Training (GREAT) Program: A process evaluation 
began in January 2000. The outcome component of the evaluation began in 
October 2002 and is to be completed by October 2005. At the time of our 
review, the evaluation grant amount was $2,839,954. Plans call for a 
full outcome assessment to be conducted in 2 sites and, because no 
appropriate comparison site could be identified, a partial assessment 
in the third site. The 2 sites with a full assessment were matched with 
comparison sites having similar court caseloads and population 
demographics; neither comparison site had a specialized court docket, 
enhanced judicial oversight, or a countywide coordinated system for 
handling domestic violence cases. Over 12 months, all domestic violence 
cases in each site, up to monthly size quotas, will be selected into 
the following groups: cases where the offender was found guilty and 
sentenced to jail for 6 months or less and probation or probation only, 
cases that were dismissed or diverted from prosecution, and cases where 
the offender received more than 6 months incarceration. Victims and 
offenders in the first group will be interviewed, and in the second 
group, victims only will be interviewed. Offender recidivism in both 
groups will be tracked for 1 year following the intervention using 
police and court records. For the third group, only offender recidivism 
will be tracked. In the partial assessment site, subject to data 
availability, the plan is to compare a sample of domestic violence 
cases in which the offender was placed on probation in the period 
before JOD implementation with a sample of cases in which the offender 
was placed on probation and scheduled for judicial review in the period 
after JOD implementation. Data about incidents, victims, and offenders 
are to be obtained from official records, and offender recidivism will 
be tracked using police and court records. Overall, short-term outcomes 
for the study are planned to include various measures of offender 
compliance and victim and offender perceptions of JOD, and long-term 
outcomes are planned to include various measures of offender 
recidivism, victim well-being, and case processing changes. In 
addition, to discern any system level changes due to JOD, aggregate, 
annual data on all domestic violence cases for the 2 years prior to and 
3 years after JOD implementation in all sites will be collected and 
analyzed.

Evaluation: Assessment of evaluation; The National Evaluation of the 
Gang Resistance Education and Training (GREAT) Program: The evaluation 
plan appears to be ambitious and well designed. A quasi-experimental 
design is planned, and data will be collected from multiple sources, 
including victims, offenders, and agencies. While lack of sustained 
cooperation, uneven response rates, and missing data could become 
problems, detailed plans seem to have been made to minimize these 
occurrences. The planned approach of selecting cases (choosing equal 
numbers of cases consecutively until a monthly quota is reached, over a 
12-month period) may be nearly as good as random sampling and takes 
into consideration seasonal variation. However, it could introduce 
biases, should there be variation as to the time each month when case 
selection begins.

Evaluation: Principal investigator; The National Evaluation of the Gang 
Resistance Education and Training (GREAT) Program: Indiana University 
of Pennsylvania.

Evaluation: Program evaluated; The National Evaluation of the Gang 
Resistance Education and Training (GREAT) Program: The purpose of this 
study is to test the relative effectiveness of culturally focused 
versus conventional batterer counseling for African-American men. It is 
based on research indicating that conventional counseling dropout and 
partner re-assault rates are higher for African-American men than they 
are for white men, and clinical literature in related fields that 
recommends culturally focused counseling to improve the effectiveness 
of counseling with African-American men. Culturally focused counseling 
refers to the counselor recognizing and responding to cultural issues 
that emerge in group sessions (including such topics as African-
American men's perceptions of the police, relationships with women, 
sense of African-American manhood, past and recent experiences of 
violence, and reactions to discrimination and prejudice), and a 
curriculum that includes the major cultural issues facing a particular 
group of participants. The setting for the evaluation is a counseling 
center in Pittsburgh, Pennsylvania.

Evaluation: Evaluation components; The National Evaluation of the Gang 
Resistance Education and Training (GREAT) Program: The evaluation began 
in September 2001, and the expected completion date is February 2005. 
At the time of our review, the grant amount was $356,321. A clinical 
trial will be conducted to test the effect of culturally focused 
counseling on the extent to which African-American men drop out of 
counseling, are accused of re-assaults, and are re-arrested for 
domestic violence. Plans are for 600 African-American men referred by 
the Pittsburgh Domestic Violence Court over a 12-month period to 
batterer counseling at the counseling center to be randomly assigned to 
either (1) a culturally focused counseling group of only African-
Americans, (2) conventional batterer counseling in an African-American 
only group, and (3) conventional counseling in a racially mixed group. 
Before assignment, however, the counseling center must recommend the 
men for participation in the study. Men included in the study will be 
administered a background questionnaire and two tests of culturally 
specific attitudes (i.e., racial acculturation and identity) at program 
intake. The men's female partners will be interviewed by phone 3 
months, 6 months, and 12 months after program intake. These structured 
interviews will collect information on the woman's relationship with 
the man, the man's behavior, and the woman's help-seeking. Clinical 
records of program attendance and police records of re-arrests will be 
obtained for each man. Planned analyses are to include (1) verification 
of equivalent culturally focused and conventional counseling sub-
samples at intake and during the follow-up; (2) comparison of the 
program dropouts, re-assaults, and re-arrests for the three counseling 
options at each follow-up interval and cumulatively; and (3) a 
predictive model of the re-assault outcome based on characteristics, 
cultural attitudes, and situational factors. Additionally, interviews 
with a sub-sample of 100 men about their counseling experience are to 
be conducted.

Evaluation: Assessment of evaluation; The National Evaluation of the 
Gang Resistance Education and Training (GREAT) Program: This is a well-
designed experiment to test the effect of a new approach to provide 
counseling to perpetrators of domestic violence. The researchers have 
plans to (1) adjust for any selection bias in group assignment and 
participant attrition through statistical analysis; (2) prevent 
"contamination" from counselors introducing intervention 
characteristics to control groups, or the reverse; and (3) monitor the 
response rates on the interviews with female partners. The evaluation 
is on-going. The most recent progress report we reviewed indicated that 
the evaluation is proceeding as planned, with the recruitment of 
batterers behind schedule by 1 month, the series of female partner 
interviews on schedule and very close to expected response rates, and 
the interviews with the sub-sample of batterers about three-quarters 
complete. One potential concern we have is that because all men 
referred by the domestic violence court to the counseling center may 
not be recommended to participate in the study, any bias in 
recommending study participants will determine the population to which 
the study's results can be generalized.

Evaluation: Principal investigator; The National Evaluation of the Gang 
Resistance Education and Training (GREAT) Program: Fund for the City of 
New York.

Evaluation: Program evaluated; The National Evaluation of the Gang 
Resistance Education and Training (GREAT) Program: Operating since 
1998, the Bronx Misdemeanor Domestic Violence Court handles spousal 
abuse misdemeanor cases. The court has the power to prescribe various 
conditions of discharge for batterers, including participation in group 
counseling and/or court monitoring. Given concerns about the 
effectiveness of these options, it was decided to test the efficacy of 
batterer counseling programs and court monitoring, alone and in 
combination with each other. Furthermore, court monitoring was tested 
based on the frequency of its administration--either monthly or on a 
graduated basis (less monitoring for fewer incidences of abuse). This 
was to ascertain whether graduated monitoring might give batterers more 
incentive to change.

Evaluation: Evaluation components; The National Evaluation of the Gang 
Resistance Education and Training (GREAT) Program: The evaluation began 
in September 2001 and is expected to be completed in August 2003. At 
the time of our review, this evaluation was funded for $294,129. The 
proposed study is an outcome evaluation of 4 different treatment 
alternatives for conditional discharge defendants in domestic violence 
cases. The treatment options are (1) counseling program and monthly 
court monitoring, (2) counseling program and graduated court 
monitoring, (3) monthly court monitoring program only, and (4) 
graduated court monitoring only. Participants in the evaluation (800 
total) are to be assigned randomly to 1 of the 4 treatments at the time 
of sentencing, and incidents of new crimes are to be measured 6 and 12 
months after sentencing. Official crime records at both intervals, and 
interviews with victims at the 12-month interval are the sources of 
data. The planned analysis involves looking at the groups as a whole, 
and subgroups related to age, criminal history, and current charge. 
Outcome measures are (1) completion of the conditional discharge or 
imposition of the jail alternative, (2) new arrests for domestic 
violence, and (3) new reports from victims of domestic violence 
incidents.

Evaluation: Assessment of evaluation; The National Evaluation of the 
Gang Resistance Education and Training (GREAT) Program: This is a well-
designed approach to measure the comparative efficacy of combinations 
of program counseling and variations in monitoring. However, at the 
time of our review, we had some concerns about how well implementation 
will proceed. One concern is that if one or more of the treatments is 
less effective, it could result in participants spending time in jail, 
reducing the possibility of further incidents. This difficulty can be 
addressed in the analysis, but neither the proposal nor subsequent 
progress reports discuss this or other differential attrition issues. 
Also, although the evaluators have a plan to try to ensure good 
response rates for the victims' survey, it is uncertain how effective 
they will be. Other surveys of similar populations have been 
problematic.

[End of table]

Table 9: Well-designed Evaluations That Encountered Implementation 
Problems:

Evaluation: Principal investigator; An Evaluation of Chicago's Citywide 
Community Policing Program: Northwestern University.

Evaluation: Program evaluated; An Evaluation of Chicago's Citywide 
Community Policing Program: Chicago's community policing program, known 
as Chicago's Alternative Policing Strategy (CAPS), began in April 1993. 
The program reorganizes policing around small geographical areas where 
officers assigned to beat teams meet with community residents to 
identify and address a broad range of neighborhood problems.

Evaluation: Evaluation components; An Evaluation of Chicago's Citywide 
Community Policing Program: There were 2 evaluation efforts in this 
study, 1 examining the prototype project and the second examining 
citywide program implementation. The combined evaluations were 
completed in August 2001, at a total cost of $2,157,859.; The prototype 
evaluation, conducted between April 1993 and September 1994, compared 
five areas that implemented CAPS with four areas that did not. Data 
from the 1990 Census were used to select four sections of the city that 
closely matched the demographics of the five prototype areas. Residents 
of all areas were first surveyed in the spring of 1993 regarding the 
quality of police service and its impact on neighborhood problems. 
Follow-up interviews occurred in either June or September of 1994 (14 
to 17 month time lags). Interviews were conducted by telephone in 
English and Spanish. The re-interview rate was about 60 percent. A 
total of 1,506 people were interviewed both times, an average of 180 in 
each prototype area and 150 in each comparison area.; The CAPS citywide 
evaluation began after the conclusion of the prototype evaluation in 
July 1994. The purpose of this evaluation was to assess how changing 
from a traditional policing approach to a community-centered approach 
would affect citizens' perceptions of the police, neighborhood problems 
and crime rates. The researchers administered annual citywide public 
opinion surveys between 1993 and 2001 (excluding 2000). The surveys 
covered topics such as police demeanor, responsiveness, and task 
performance. Surveys were also administered to officers at CAPS 
orientation sessions to obtain, among other things, aggregate 
indicators of changes in officers' attitudes toward CAPS. Changes in 
levels of recorded crimes were analyzed. Direct observations of police 
meetings, surveys of residents, and interviews with community activists 
were used to measure community involvement in problem solving and the 
capacity of neighborhoods to help themselves.

Evaluation: Assessment of evaluation; An Evaluation of Chicago's 
Citywide Community Policing Program: The 1992 crime rates were reported 
to be similar between prototype districts and their matched comparison 
areas and the baseline demographic measures used to match the two 
groups were basically similar. The initial and follow-up response rates 
of about 60 percent seem reasonable considering the likelihood of 
community mobility in these areas; however, attrition rates differed 
for various demographic characteristics, such as home ownership, race, 
age, and education, raising some concerns about whether the results are 
generalizable to the intended population. The follow-up time (14-17 
months) was the maximum period allowed by the planned citywide 
implementation of CAPS. A single follow-up survey and the citywide 
implementation precluded drawing firm conclusions about longer-term 
impacts of the prototype program.; Because CAPS was implemented 
throughout the city of Chicago in 1995, the CAPS citywide evaluation 
was not able to include appropriate comparison groups and could not 
obtain a measure of what would have happened without the benefits of 
the program. The authors used a variety of methods to examine the 
implementation and outcomes of the CAPS program, and stated that there 
was no elaborate research design involved because their focus was on 
organizational change. However, because the trends over time from 
resident surveys and crime data were presented without controls or 
comparison groups and some declines in crime began before the program 
was implemented, changes cannot be attributed solely to the program.

Evaluation: Principal investigator; An Evaluation of Chicago's Citywide 
Community Policing Program: Yale University School of Medicine.

Evaluation: Program evaluated; An Evaluation of Chicago's Citywide 
Community Policing Program: The program was an intervention strategy 
designed to reduce drug activity and foster family self-sufficiency in 
families living in a public housing complex in the city of New Haven, 
Conn. The key elements of the intervention were (1) an on-site 
comprehensive services model that included both clinical (substance 
abuse treatment and family support services) and nonclinical components 
(e.g., extensive outreach and community organizing as well as job 
training and placement and GED high school equivalency certification) 
and (2) high profile police involvement. The goals of the program were 
(1) increases in the proportion of residents entering and completing 
intervention services and (2) a reduction in substance-related 
activities and crime.

Evaluation: Evaluation components; An Evaluation of Chicago's Citywide 
Community Policing Program: The evaluation began in 1998 and was 
completed in 2000. The total evaluation funding was $187,412. The 
intervention site was a public housing complex composed primarily of 
female heads of household tenants and additional family members; the 
control site was another public housing complex on the opposite side of 
town, chosen for its similarities to the intervention site. The 
evaluation design was both process and outcome oriented and involved 
the collection of both qualitative and quantitative data. At baseline, 
a needs assessment survey was completed (n=175 at the intervention site 
and n=80 at the control site), and follow-up surveys with residents 
took place at 12 and 18 months post-intervention (no response rates 
reported). All heads of household at the sites were the target 
population for the surveys. The follow-up surveys, while administered 
in the same two sites, did not track the same respondents that were 
surveyed at baseline. Survey measures included access to social 
services; knowledge and reported use of social services; and residents' 
perceptions of the extent of drug and alcohol abuse, drug selling, 
violence, safety, and unsupervised youth in the community. The study 
also examined crime statistics obtained from the New Haven police 
department, at baseline and during the intervention.

Evaluation: Assessment of evaluation; An Evaluation of Chicago's 
Citywide Community Policing Program: The study had several limitations, 
the first of which is potential selection bias due to pre-existing 
differences between the sites, as well as considerable (and possibly 
differential) attrition in both groups, with no statistical control for 
such differences. Second, respondents may not have been representative 
of the populations at the housing sites. No statistical comparisons of 
respondents to nonrespondents on selected variables were presented. In 
addition, on the baseline survey, the response rates of the 
intervention and control sites differed substantially (70 vs. 44 
percent, respectively). Overall response rates were not reported for 
the follow-up surveys. Furthermore, implementation did not work 
smoothly (e.g., the control site received additional unanticipated 
attention from the police). Finally, the grantee proposed to track data 
on individuals over time (e.g., completion of services), but this goal 
was not achieved, in part because of the limited capability of project 
staff in the areas of case monitoring, tracking, and data management. 
Thus, although the intervention may have produced changes in the 
intervention site "environment" over time (aggregate level changes), it 
is not clear that the intervention successfully impacted the lives of 
individuals and families at the site.

Evaluation: Principal investigator; An Evaluation of Chicago's Citywide 
Community Policing Program: Wayne State University.

Evaluation: Program evaluated; An Evaluation of Chicago's Citywide 
Community Policing Program: The program provides assistance to domestic 
violence victims in some police precincts in the city of Detroit. The 
domestic violence teams studied included specially trained police 
officers, police department advocates, legal advocates, and in one 
police precinct, an on-site prosecutor. The advocates assisted victims 
by offering information about the legal system, referrals, and safety 
planning.

Evaluation: Evaluation components; An Evaluation of Chicago's Citywide 
Community Policing Program: The outcome evaluation began in January of 
1998 and the final report was completed in January of 2001. The grant 
amount was $153,491. The objectives of the study were to address the 
relationships between advocacy and victim safety and between advocacy 
and victims' responses to the criminal justice system, using a quasi-
experimental design to compare domestic violence cases originating in 
police precincts with and without special police domestic violence 
teams that included advocates. The study focused on assistance provided 
in 3 police precincts. Precincts not served by in-precinct domestic 
violence teams, but resembling the precincts with such teams in terms 
of ethnic representation and median income, were selected as 
comparisons. Data were collected using police records, county 
prosecutor's office records, advocate contact forms, and telephone 
interviews with victims. Cases that met Michigan's legal definition of 
domestic violence, had adult female victims, and were received in the 
selected precincts over a 4-month period in 1998 were eligible for the 
study. The cases were first identified by the police department through 
police reports and then reviewed for qualification by a member of the 
research team. A weekly quota of cases was selected from each precinct. 
If the number of qualified cases for a precinct exceeded the quota, 
then cases were selected randomly using a random numbers table. 
Outcomes included rates of completed prosecution of batterers, rate of 
guilty findings against batterers, subsequent violence against victims, 
victims' perceptions of safety, and victims' views of advocacy and the 
criminal justice process.

Evaluation: Assessment of evaluation; An Evaluation of Chicago's 
Citywide Community Policing Program: The study was severely affected by 
numerous problems, many of which the researchers acknowledged. First, 
the sample selection was based on incomplete or unreliable data, since 
police officers in writing reports often did not fully describe 
incidents, and precinct staff inconsistently provided complete case 
information about incidents to the researchers. Second, evaluators were 
not able to secure cooperation from domestic violence advocates and 
their supervisors at all service levels in providing reliable reports 
on service recipients and the type, number, and length of services. 
Additionally, most domestic violence team members were moved out of the 
precincts and into a centralized location during the period victims in 
the study were receiving services, thereby potentially affecting the 
service(s) provided to them. Further, the researchers were uncertain as 
to whether women from the comparison precincts received any advocacy 
services, thereby potentially contaminating the research results 
between the precincts with the domestic violence teams and the 
comparison precincts. Finally, low response rates and response bias for 
data collected from victims were problems. The overall response rate 
for the initial round of telephone interviews was only about 23 percent 
and the response rates for follow-up interviews were lower. Response 
rates were not provided separately for victims from the precincts with 
the domestic violence teams and the comparison precincts. As a result 
of the low response rates, the interviewed victims were identified as 
being less likely to have experienced severe physical abuse, less 
likely to be living with the abuser, and more likely to have a child in 
common with the abuser, compared to the victims in the sample who were 
not interviewed.

Evaluation: Principal investigator; An Evaluation of Chicago's Citywide 
Community Policing Program: University of Cincinnati.

Evaluation: Program evaluated; An Evaluation of Chicago's Citywide 
Community Policing Program: DOJ's COPS office has worked with police 
agencies, the Federal Communications Commission, and the 
telecommunications industry to find ways to relieve the substantial 
demand on the current 911 emergency number. Many police chiefs and 
sheriffs have expressed concern that non-emergency calls represent a 
large portion of the 911 overload problem. Four cities have implemented 
strategies to decrease non-emergency 911 calls and have agreed to 
participate in the research. Those cities, each implementing a 
different type of approach, were Baltimore, Md.; Dallas, Tex.; Buffalo, 
N.Y.; and Phoenix, Ariz.

Evaluation: Evaluation components; An Evaluation of Chicago's Citywide 
Community Policing Program: A process and outcome evaluation was 
conducted between July of 1998 and June of 2000. The grant amount was 
$399,919. For the outcome component, the grantee examined whether (1) 
the volume of 911 calls declined following the introduction of the non 
emergency call system; (2) there was a corresponding decline in radio 
dispatches, thus enhancing officer time; and (3) this additional time 
was directed to community-oriented policing strategies. The bulk of the 
design and analysis focused on Baltimore, with a limited amount of 
analysis of outcomes in Dallas and no examination of outcomes in the 
other two sites. The study compared rates of 911 calls before 
implementation of the new 311 system to rates of 911 and 311 calls 
after the system in both cities. In Baltimore, time series analysis was 
used to analyze the call data; police officers and sergeants were 
surveyed; the flow of 311 and 911 calls to Neighborhood Service Centers 
was examined; researchers accompanied police officers during randomly 
selected shifts in 3 sectors of Baltimore for 2 weeks; and citizens who 
made 311 calls during a certain 1-month time frame were surveyed.

Evaluation: Assessment of evaluation; An Evaluation of Chicago's 
Citywide Community Policing Program: The crux of the outcome analysis 
relies on the study of pre-and post-311 system comparisons, and the 
time series analysis done in Baltimore is sound. The rigor of several 
other parts of this study is questionable (e.g., poor response rates to 
surveys and short time frames for data from accompanying police 
officers on randomly selected shifts). In addition, the choice of sites 
that NIJ required the grantee to examine, other than Baltimore, did not 
allow for a test of the study's objectives. Although NIJ conducted pre-
solicitation site visits to all 4 sites, at the time of the 
solicitation it still did not clearly know whether outcome data would 
be available at all the sites. As it turned out, outcome data were not 
available in Phoenix and Buffalo. Further, since the 311 system in 
Dallas was not implemented with the goal of reducing or changing call 
volume, it does not appear to be a good case with which to test the 
study's objectives.

Evaluation: Principal investigator; An Evaluation of Chicago's Citywide 
Community Policing Program: University of Nebraska - Omaha.

Evaluation: Program evaluated; An Evaluation of Chicago's Citywide 
Community Policing Program: An Early Warning (EW) system is a data 
based police management tool designed to identify officers whose 
behavior is problematic, as indicated by high rates of citizen 
complaints, use of force incidents, or other evidence of behavior 
problems, and to provide some form of intervention, such as counseling 
or training to correct that performance. According to the current 
study's national survey of local law enforcement agencies (LEA) serving 
populations of 50,000 or more, about one-quarter of LEAs surveyed had 
an EW system, with another 12 percent indicating that one was planned. 
One-half of existing EW systems have been created since 1994.

Evaluation: Evaluation components; An Evaluation of Chicago's Citywide 
Community Policing Program: Begun in 1998, the study was completed in 
1999 and included process and outcome components, as well as a national 
survey. The total evaluation funding was $174,643. The outcome portion 
of the study was composed of case studies of EW systems in 3 large 
urban police departments (Miami-Dade, Fla.; Minneapolis, Minn.; and New 
Orleans, La.). Sites were selected judgmentally; each had functioning 
EW systems in place for a period of 4 or more years and had agreed to 
participate in the study.; Both Miami-Dade and Minneapolis case studies 
examined official performance records (including citizen complaints in 
both sites and use of force reports in Miami-Dade) for officers 
identified by the department's EW system, for 2 years prior to and 
after departmental intervention, compared to records for officers not 
identified. The participant groups included officers hired between 1990 
and 1992 and later identified by the EW system (n=28 in Miami-Dade; 
n=29 in Minneapolis); the comparison groups included officers hired 
during the same period and not identified (n=267 in Miami-Dade; n=78 in 
Minneapolis). In New Orleans, official records were not organized in a 
way that permitted analysis of performance of officers subject to EW 
and a comparison group. The New Orleans case study, therefore, examined 
citizen complaint data for a group of officers identified by the EW 
system 2 years or more prior to the study, and for whom full 
performance data were available for 2 years prior to and 2 years 
following intervention (n=27).

Evaluation: Assessment of evaluation; An Evaluation of Chicago's 
Citywide Community Policing Program: The study had a number of 
limitations, many of them acknowledged by the grantee. First, it is not 
possible to disentangle the effect of EW systems per se from the 
general climate of rising standards of accountability in all 3 sites. 
Second, use of nonequivalent comparison groups (officers identified for 
intervention are likely to differ from those not identified), without 
statistical adjustments for differences between groups creates 
difficulties in presenting outcome results. Only in Minneapolis did the 
evaluators explicitly compare changes in performance of the EW group 
with changes in performance of the comparison group, again without 
presenting tests of statistical significance. Furthermore, the content 
of the intervention was not specifically measured, raising questions 
about the nature of the intervention that was actually delivered, and 
whether it was consistent over time in the 3 sites, or across officers 
subject to the intervention. Moreover, it was not possible to determine 
which aspects of the intervention were most effective overall (e.g., 
differences in EW selection criteria, intervention services for 
officers, and post-intervention monitoring), since the intervention was 
reportedly effective in all 3 departments despite differences in the 
nature of their EW systems. Also, no data were available to examine 
whether the EW systems had a deterrent effect on desirable officer 
behavior (e.g., arrests or other officer-initiated activity). Finally, 
generalizability of the findings in Miami-Dade and Minneapolis may also 
be limited, since those case studies examined cohorts of officers 
recruited in the early 1990s, and it is not clear whether officers with 
greater or fewer years of police experience in these departments would 
respond similarly to EW intervention.

Evaluation: Principal investigator; An Evaluation of Chicago's Citywide 
Community Policing Program: University of Missouri - St. Louis.

Evaluation: Program evaluated; An Evaluation of Chicago's Citywide 
Community Policing Program: The Juvenile Justice Mental Health 
Initiative (JJMI) is a collaborative multi-agency demonstration project 
funded under an Office of Juvenile Justice and Delinquency Prevention 
grant, and administered by the St. Louis Mental Health Board, the St. 
Louis Family Court, and the Missouri Department of Health. The 
initiative provides mental health services to families of youths 
referred to the juvenile justice system for delinquency who have 
serious emotional disturbances (SED). The initiative involves parents 
and families in juvenile justice interventions, providing coordinated 
services and sanctions for youths who otherwise might shuttle between 
criminal justice and mental health agencies. Two new mental health 
programs were established under JJMI. The first, the Child Conduct and 
Support Program, was designed for families in which youths under the 
age of 14 do not have a history of serious, violent, or chronic 
offending. The second, Multi-systemic Therapy (MST), was designed for 
families in which youths aged 14 and above have prior serious, violent, 
or chronic delinquency referrals.

Evaluation: Evaluation components; An Evaluation of Chicago's Citywide 
Community Policing Program: The evaluation began in October 2001 and is 
expected to be completed in September 2003. At the time of our review, 
the evaluation was funded for $200,000. The study proposed to evaluate 
the two mental health programs using a random experimental design. 
Youths referred to the Juvenile Court are first screened for SED. Those 
who test positive or have prior diagnoses of SED (anxiety, depressed 
mood, somatic complaints, suicidal ideation, thought disturbance, or 
traumatic experience) are eligible for the JJMI programs. Eligible 
youth are randomly assigned to either one of the two treatment programs 
(depending on age) or to a control group. The evaluation includes a 
comparison of police contact data, court data, self-reported 
delinquency, and standardized measures of psychological and parental 
functioning. Potentially important demographic and social context 
variables, including measures of school involvement and performance, 
will be obtained from court records.

Evaluation: Assessment of evaluation; An Evaluation of Chicago's 
Citywide Community Policing Program: This is an ongoing, well designed 
study. However, as implementation has proceeded, several problems that 
may affect the utility of the results have emerged. First, the 
researchers proposed to sample a total of 200 youths, with random 
assignment expected to result in approximately 100 juveniles in the 
treatment and comparison groups. The treatment group turned out to be 
much smaller than anticipated, however, because the randomization 
protocol and, subsequently, the MST program itself, were discontinued 
by the St. Louis Mental Health Board. At the time of termination, only 
45 youths had been randomly assigned to the treatment group. The small 
number of subjects limits the extent of the analyses that can be 
conducted on this population.; The Child Conduct and Support Program 
designed to address the mental health needs of youth under the age of 
14 without a history of serious offending was never implemented by the 
providers contracted to develop the program. Eligible youth, of all 
ages, were instead assigned to the MST program. Thus, the evaluation 
will not be able to compare the relative effectiveness of programs 
specifically designed for younger and older juvenile offenders with 
SED.

[End of table]

Table 10: Evaluations with Design Limitations:

Evaluation: Principal investigator; National Evaluation of the Rural 
Domestic Violence and Child Victimization Enforcement Grant Program: 
COSMOS Corporation.

Evaluation: Program evaluated; National Evaluation of the Rural 
Domestic Violence and Child Victimization Enforcement Grant Program: 
The National Rural Domestic Violence and Child Victimization 
Enforcement Grant program, begun in fiscal year 1996, has funded 92 
grants through September 2001 to promote the early identification, 
intervention, and prevention of woman battering and child 
victimization; increase victim's safety and access to services; enhance 
the investigation and prosecution of crimes of domestic violence and 
child abuse; and develop innovative, comprehensive strategies for 
fostering community awareness and prevention of domestic abuse. The 
program seeks to maximize rural resources and capacity by encouraging 
greater collaboration between Indian tribal governments, rural local 
governments, and public and private rural service organizations.

Evaluation: Evaluation components; National Evaluation of the Rural 
Domestic Violence and Child Victimization Enforcement Grant Program: 
The evaluation began in October 1998 and was completed in July 2002. 
This evaluation was funded at $719,949, and included both process and 
outcome components. Initially 10 grantees (comprising 11 percent of the 
total number of program grantees) were selected to participate in the 
outcome evaluation; 1 was unable to obtain continuation funding and was 
dropped from the outcome portion of the study. Two criteria were used 
in the selection of grant participants: the "feasibility" of grantees 
visited in the process phase of the evaluation (n=16) to conduct an 
outcome evaluation; and recommendations from OVW, which were based on 
knowledge of grantee program activities and an interest in representing 
the range of organizational structures, activities, and targeted groups 
served by the grantees. Logic models were developed, as part of the 
case study approach, to show the logical or plausible links between a 
grantee's activities and desired outcomes. The specified outcome data 
were collected from multiple sources, using a variety of methodologies, 
during 2-3 day site visits (e.g., multi-year criminal justice, medical, 
and shelter statistics were collected from archival records where 
available; community stakeholders were interviewed; and grantee and 
victim service agency staff participated in focus groups).

Evaluation: Assessment of evaluation; National Evaluation of the Rural 
Domestic Violence and Child Victimization Enforcement Grant Program: 
This evaluation has several limitations. First, the choice of the 10 
outcome sites was skewed toward the technically developed evaluation 
sites and was not representative of all Rural Domestic Violence program 
grantees, particular project types, or delivery styles. Second, the 
lack of comparison groups makes it difficult to exclude the effect of 
external factors, such as victim safety and improved access to 
services, on perceived change. Furthermore, several so-called short-
term outcome variables were in fact process variables (e.g., number of 
clients served, number of services provided, number of workshops 
conducted, and service capacity of community agencies). Moreover, it is 
not clear how interview and focus group participants were selected. 
Finally, pre-and post-survey data were not collected at multiple points 
in time to assess change, except at 1 site, where pre-and post-tests 
were used to assess increased knowledge of domestic violence among site 
staff as a result of receiving training.

Evaluation: Principal investigator; National Evaluation of the Rural 
Domestic Violence and Child Victimization Enforcement Grant Program: 
Institute for Law and Justice.

Evaluation: Program evaluated; National Evaluation of the Rural 
Domestic Violence and Child Victimization Enforcement Grant Program: 
The Civil Legal Assistance (CLA) program is one of seven OJP grants 
(through OVW) dedicated to enhancing victim safety and ensuring 
offender accountability. The CLA program awards grants to nonprofit, 
nongovernmental organizations that provide legal services to victims of 
domestic violence or that work with victims of domestic violence who 
have civil legal needs. The CLA grant program was created by Congress 
in 1998. In fiscal year 1998, 54 programs were funded, with an 
additional 94 new grantees in fiscal year 1999. Approximately 85-100 
new and continuation grants were anticipated in fiscal year 2000.

Evaluation: Evaluation components; National Evaluation of the Rural 
Domestic Violence and Child Victimization Enforcement Grant Program: 
The study began in November 2000 and was expected to be completed in 
October 2003. The proposed evaluation consisted of process and outcome 
components and the total evaluation funding at the time of our review 
was $800,154. The objective of the outcome evaluation was to determine 
the effectiveness of the programs in meeting the needs of the women 
served. The researchers proposed to study 8 sites with CLA programs. At 
each site at least 75 cases will be tracked to see if there is an 
increase in pro se (self) representation in domestic violence 
protective order cases, and a total of 240 victims receiving services 
will be surveyed (about 30 at each site). Focus groups of service 
providers will be used to identify potential program impacts on the 
justice system and wider community. Outcomes to be assessed include 
change in pro se representation in domestic violence protective order 
cases, satisfaction with services, and legal outcomes resulting from 
civil assistance.

Evaluation: Assessment of evaluation; National Evaluation of the Rural 
Domestic Violence and Child Victimization Enforcement Grant Program: 
The evaluation has several limitations. First, NIJ and the grantee 
agreed in 2002 not to utilize a comparison group approach whereby data 
would be collected from a set of comparison sites, due to concerns that 
investment in that approach would limit the amount of information that 
could be derived from the process component of the evaluation and from 
within-site and cross-site analyses of the selected outcome sites. 
Thus, the study will be limited in its ability to isolate and minimize 
the potential effects of external factors that could influence the 
results of the study, in part because it did not include comparison 
groups in the study design. At the time of our review, it was not yet 
clear whether sufficient data will be available from the court systems 
at each outcome site in order to examine changes in pro se 
representation. In addition, since victims would be selected for the 
surveys partially on the basis of willingness to be interviewed, it is 
not clear how representative the survey respondents at each site will 
be and how the researchers will handle response bias. It also appears 
that the victim interviews will rely to a great extent on measures that 
will primarily consist of subjective, retrospective reports.

Evaluation: Principal investigator; National Evaluation of the Rural 
Domestic Violence and Child Victimization Enforcement Grant Program: 
Caliber Associates.

Evaluation: Program evaluated; National Evaluation of the Rural 
Domestic Violence and Child Victimization Enforcement Grant Program: 
The Department of Health and Human Services and DOJ's Office of Justice 
Programs are jointly funding 6 demonstration sites for up to 3 years to 
improve how 3 systems (dependency courts, child protective services, 
and domestic violence service providers) work with their broader 
communities to address families with co-occurring domestic violence 
(DV) and child maltreatment (CM). Funded sites must agree to implement 
key recommendations of the National Council of Juvenile and Family 
Courts Judges' publication, "Effective Interventions in Domestic 
Violence and Child Maltreatment: Guidelines for Policy and Practice" 
(aka, the "Greenbook"). At a minimum, the sites need to implement 
changes in policies and procedures regarding screening and assessment; 
confidentiality and information sharing; safety; service provision; 
advocacy; cross-training; and case collaboration. The goals of the 
demonstration are to generate more coordinated, comprehensive, and 
consistent responses to families faced with DV and CM, resulting in 
increased safety and well-being for women and their children.

Evaluation: Evaluation components; National Evaluation of the Rural 
Domestic Violence and Child Victimization Enforcement Grant Program: 
The evaluation began in September 2000, and is expected to be completed 
around September 2004. At the time of our review, this evaluation was 
funded at $2,498,638, for both process and outcome components. The 
original evaluation proposal focused on various process elements as 
well as the effects of the intervention on perpetrator recidivism and 
the safety of women and children. In the second year, the evaluator 
realized that no site considered itself to be in the implementation 
phase and many of the original outcome indicators for children and 
families were not appropriate given the initiative time frame. The 
revised design in the funded third year proposal is therefore a 
systems-level evaluation. The analytic focus is now on how the 3 
systems identify, process, and manage families with co-occurrence of DV 
and CM.; A random sample of case records from before and after the 
introduction of the intervention will be used to document trends in 
identification of co-occurring cases of DV and CM over the course of 
the intervention. Stakeholder interviews conducted during site visits 
in fall 2001 and later during implementation, and analysis of agency 
documents, will be used to measure changes in policies and procedures. 
"Network analysis" of responses on the stakeholder interviews will be 
performed to measure changes in how key stakeholders work with others 
within and across systems. Supervisors and workers will also be asked, 
early in the implementation period and at the end of the initiative, to 
respond to vignettes describing hypothetical situations involving co-
occurrence of DV and CM to see how they might respond to clients.

Evaluation: Assessment of evaluation; National Evaluation of the Rural 
Domestic Violence and Child Victimization Enforcement Grant Program: 
This evaluation has several limitations. First, the study objectives 
changed substantially from year 1 to year 3. The study is no longer 
examining outcomes for individuals, precluding conclusions about 
whether the implementation improved the lives of victims of domestic 
violence or their children. Second, it is not clear whether the 
evaluator will locate appropriate comparison data at this late stage, 
and without a comparison group, the study will not be able to determine 
(a) whether collaboration between systems improved (or weakened) 
because of the intervention or some extraneous factors and (b) whether 
collaboration resulted in increased capacity in the 3 systems to 
identify the co-occurrence of DV and CM, or whether these kinds of 
cases increased for reasons other than collaboration (e.g., perhaps 
identification of these cases is improving all over the country). 
Questions remain about the extent of data available for examining co-
occurrence of DV and CM at the 6 sites.

Evaluation: Program evaluated; National Evaluation of the Rural 
Domestic Violence and Child Victimization Enforcement Grant Program: 
Since 1996 NIJ has funded, as part of the CLEFS program, 32 grants 
totaling over $2.8 million to law enforcement agencies, correctional 
agencies, and organizations representing officers (unions and 
membership associations) to support the development of research, 
demonstration, and evaluation projects on stress intervention methods. 
The stress intervention methods developed and studied have included 
stress debriefing and management techniques, peer support services, 
referral networks, police chaplaincy services, stress management 
training methods, spouse academies, and stress education programs. 
While NIJ purports to have developed state-of-practice stress reduction 
methods through these efforts, it acknowledges that very little outcome 
data have been generated.

Evaluation: Evaluation components; National Evaluation of the Rural 
Domestic Violence and Child Victimization Enforcement Grant Program: 
The evaluation began in June 2000 and is expected to be completed in 
June 2004. At the time of our review, the grant amount was $649,990. 
The study proposes to develop and field test a model to allow for the 
systematic evaluation of selected program components. The grantee 
worked with NIJ to identify the test sites and services to be 
evaluated, based on grant application reviews, telephone interviews, 
and site visits. Three police departments in Duluth, Minn.; North Miami 
Beach, Fla.; and Knoxville, Tenn. were selected. Baseline stress 
correlate data were collected during visits to the 3 sites between 
January 2002 and March 2002, and baseline officer and spouse/partner 
surveys were conducted during the same visits. Outcome data were to be 
collected at baseline (prior to actual program implementation), midway 
through the implementation, and toward the end of the evaluation. While 
the original proposal did not specify exactly what stress correlate or 
outcome data were to be collected, the grantee was considering looking 
at rates of absenteeism and tardiness, citizen complaints, rule and 
regulation violations, disciplinary actions, and premature retirements 
and disability pensions, as stress correlates. These were to be 
obtained from official agency records. Surveys included questions about 
program impacts on physical health, emotional health, job performance, 
job satisfaction, job-related stress, and family related stress. The 
evaluation also included baseline health screenings. It appears the 
evaluation plan has been modified to add supervisor surveys (there were 
none at baseline), and to incorporate group data collection efforts 
with officers, spouses, supervisors, and administrators.

Evaluation: Assessment of evaluation; National Evaluation of the Rural 
Domestic Violence and Child Victimization Enforcement Grant Program: 
The study has several limitations. First, the 3 study sites were chosen 
on the basis of merits in their proposal to implement a stress 
reduction or wellness program for officers, from 4 sites that submitted 
applications. There was no attempt to make the chosen sites 
representative of other sites with stress reduction programs and police 
departments more generally. Second, the study will not make use of 
comparison groups consisting of similar agencies that did not implement 
stress reduction programs. It is unclear how effects of the 
interventions in these 3 sites over time will be disentangled from the 
effects of other factors that might occur concurrently. Third, the 
grantee will not collect individually identified data, and thus will 
only be able to analyze and compare aggregated data across time, 
limiting the extent of analysis of program effects that can be 
accomplished. Fourth, response rates to the first wave of officer 
surveys were quite low in 2 of the 3 sites (16 percent and 27 
percent).

[End of table]

[End of section]

Appendix III: Comments from the Department of Justice:

U.S. Department of Justice 
Office of Justice Programs:

Office of the Assistant Attorney General	
Washington, D. C. 20531:

Laurie E. Ekstrand:

Director, Homeland Security and Justice Issues General Accounting 
Office:

441 G Street, N.W. Mail Stop 2440A Washington, DC 20548:

SEP 04 2003:

Dear Ms. Ekstrand:

This letter responds to the General Accounting Office (GAO) draft 
report entitled, "JUSTICE OUTCOME EVALUATIONS: Design and 
Implementation of Studies Require More NIJ Attention" (GAO-03-1091).

The Office of Justice Programs (OJP) and its component, the National 
Institute of Justice (NIJ), share GAO's support for high-quality 
evaluations that can inform criminal justice policy and practice. The 
GAO's review of 15 NIJ outcome evaluations reveals much about the 
complexity of real-world evaluation, the challenges to a successful 
outcome evaluation, and the keys to ensuring a highly successful 
evaluation.

In the draft report, GAO recommended that NIJ review its ongoing 
outcome evaluation grants and develop appropriate strategies and 
corrective measures to ensure that methodological design and 
implementation problems are overcome. The GAO also recommended that NIJ 
continue to assess its evaluation process, including considering the 
feasibility of obtaining more information about the availability of 
outcome data prior to developing a solicitation for research; requiring 
proposals to contain more detailed design specifications; and more 
carefully calibrating NIJ monitoring procedures based on the 
characteristics of the grant and the knowledge area under 
investigation.

The NIJ agrees with GAO's recommendations. As GAO noted, in January 
2003 NIJ established an Evaluation Division to, in part, improve the 
quality and utility of NIJ's evaluations. The NIJ has also started, and 
will continue with, a new strategy of "evaluability assessments." These 
are quick, low-cost initial assessments of criminal or juvenile justice 
programs to see if the necessary conditions exist to warrant sponsoring 
a full-scale outcome evaluation. Also, as part of the grantmaking 
process, NIJ is developing new grant "special conditions" that will 
require grantees to document all changes in the scope and components of 
evaluation designs. For ongoing grants, NIJ is making several changes 
in response to GAO's concerns. During Fiscal Year 2004, NIJ plans to 
review its grant monitoring procedures for evaluation grants in order 
to intensively monitor the larger or more complex grants. The NIJ will 
also conduct periodic reviews of its evaluation research portfolio to 
assess the progress of 
ongoing grants, document any changes in evaluation design that may have 
occurred, and reassess the expected benefits of ongoing projects.

A few factual inaccuracies in the draft report have been highlighted in 
the attachment to this letter. In addition, there are two points that 
GAO makes in the draft report that we believe require highlighting 
here.

We strongly agree with GAO that "optimal conditions for the scientific 
study of complex social problems almost never exist." (draft report p. 
4) The "real-world" conditions with which evaluators must contend often 
pose substantial challenges to successfully completing even the best 
designed and carefully planned evaluations. As the draft report notes 
regarding the six of eleven evaluations that encountered problems, 
"some evaluators were unable to carry out a proposed evaluation plan 
because the program to be evaluated was not implemented as planned, or 
they could not obtain complete or reliable data on outcomes. In some 
cases, implementation problems were beyond the evaluators' control, and 
resulted from decisions made by agencies providing program services 
after the study was underway." (draft report p. 3; emphasis added) 
However, GAO did not take this key point into consideration 
sufficiently when reaching its conclusions about the feasibility of 
attaining "conclusive results." The implication is that through 
rigorous design and careful monitoring, conclusive evaluation results 
can always be achieved. While we wish it were so, as a practical matter 
we cannot share that level of optimism.

Second, GAO's draft report reflects a strong commitment to implementing 
rigorously designed evaluations. Randomized control trials can provide 
strong evidence of program effects, and effectively control for 
spurious factors which, if unchecked, can confound interpretation of 
the evaluation results. However, randomized trials are not always 
feasible, and sometimes even non-random comparison groups are 
unavailable (as GAO notes on p. 40). In these cases, evaluators must 
choose from among other designs that have sufficient scientific rigor 
while also taking into account numerous factors such as data 
availability, cost opportunities for randomization, risk to subjects, 
likely effect size of the intervention, and the availability of 
appropriate comparison groups. We do not believe that the GAO 
sufficiently took this fact into account in its report or recognizes 
that these other methods of evaluation are valid means of scientific 
endeavor. In the last two decades, the evaluation field and its 
theorists have broadened their thinking about what constitutes 
"quality" evaluation. Increasingly, prominent leaders in the evaluation 
field are urging researchers to choose methods that are appropriate to 
the particular evaluation being conducted, and to take into 
consideration the context of each evaluation rather than utilizing the 
same set of methods and designs for all evaluations - a direct attack 
on the experimental design. The GAO report, with its strong emphasis on 
experimental and quasi-experimental designs, reflects an important 
view, but one that does not reflect current evaluation theory and 
practice.

As GAO notes in the draft report, many partner agencies have found NIJ 
evaluations a rich source of information to inform and guide criminal 
justice programs and policies. The NIJ will strive to demonstrate even 
greater value through future outcome evaluations.

The OR appreciates the opportunity to comment on the draft report. Our 
additional specific comments are enclosed for GAO's consideration.

Sincerely,

Deborah J. Daniels 

Assistant Attorney General:

Signed by Deborah J. Daniels: 

Enclosure:

cc:	Sarah V. Hart, Director National Institute of Justice:

Cynthia J. Schwimer Comptroller, OR:

LeToya A. Johnson Audit Liaison, OR:

Vickie L. Sloan Audit Liasion, DOJ:

OAAG Executive Secretariat Control Number 20031746:


[End of section]

Appendix IV: GAO Contacts and Staff Acknowledgments:

GAO Contacts:

Laurie E. Ekstrand (202) 512-8777:

Evi L. Rezmovic (202) 512-2580:

Staff Acknowledgments:

In addition to the above, Tom Jessor, Anthony Hill, Stacy Reinstein, 
David Alexander, Michele Fejfar, Douglas Sloane, Shana Wallace, Judy 
Pagano, Kenneth Bombara, Scott Farrow, Ann H. Finley, Katherine Davis, 
and Leo Barbour made key contributions to this report.


FOOTNOTES

[1] U.S. General Accounting Office, Justice Impact Evaluations: One 
Byrne Evaluation Was Rigorous; All Reviewed Violence Against Women 
Office Evaluations Were Problematic, GAO-02-309 (Washington, D.C.: Mar. 
2002); and Drug Courts: Better DOJ Data Collection and Evaluation 
Efforts Needed to Measure Impact of Drug Court Program, GAO-02-434 
(Washington, D.C.: Apr. 2002).

[2] Social science research standards are outlined in Donald T. 
Campbell and Julian Stanley, Experimental and Quasi-Experimental 
Designs for Research (Chicago: Rand McNally, 1963); Thomas D Cook and 
Donald T. Campbell, Quasi-experimentation: Design and Analysis Issues 
for Field Settings (Boston: Houghton Mifflin, 1990); Carol H. Weiss, 
Evaluation Research: Methods for Assessing Program Effectiveness 
(Englewood Cliffs: Prentice-Hall, Inc., 1972); Edward Suchman, 
Evaluation Research: Principles and Practice in Public Service and 
Social Action Programs (New York: Russell Sage Foundation, 1967); and 
U.S. General Accounting Office, Designing Evaluations, GAO/PEMD-10.1.4 
(Washington, D.C.: May 1991).

[3] Outcome evaluations can be distinguished from process or 
implementation evaluations, which are designed to assess the extent to 
which a program is operating as intended. 

[4] 42 U.S.C. 3721-3723. NIJ was formerly called the National Institute 
of Law Enforcement and Criminal Justice. 

[5] P.L. 107-77. See H.R. Conf. Rep. No. 107-278, at 88, 108, and 112 
(2001).

[6] In 2002, the NIJ Director specified that there be an equal number 
of researchers and practitioners on the review panels. 

[7] A number of these grants included both process and outcome 
components. 

[8] Three of the 15 evaluations were funded as cooperative agreements.

[9] Statistically controlling for external factors that may be related 
to program outcomes and on which the treatment and comparison groups 
differ is usually not necessary when there is random assignment of 
participants to treatment and comparison conditions.

[10] The grantee notes that a 1990 analysis of 85 longitudinal studies 
reported an average questionnaire completion rate of 72 percent for 19 
studies that had a 24-month follow-up period. This is slightly lower 
than the 76 percent response rate achieved after 2 years in the Gang 
Resistance Education and Training evaluation. 

[11] Selection bias refers to biases introduced by selecting different 
types of people into the program and comparison groups; differences in 
measured outcomes for each group may be a function of preexisting 
differences between the groups, rather than the intervention. 
Differential attrition refers to unequal loss of participants from the 
program and comparison groups during the course of a study, resulting 
in groups that are no longer comparable. Both may be a threat to the 
validity of conclusions. 

[12] Preexisting differences between the program and comparison groups 
can be viewed as a design problem. We treat this as an implementation 
problem in this section because the proposed design for these 
particular studies appeared to us to be reasonable at the time the 
funding decision was made. Problems with the comparability of the 
groups became apparent only after the studies were well underway, and 
often it was too late to control for the effects of such differences on 
program outcomes with statistical adjustments. 

[13] The treatment programs were to be developed under the funding and 
oversight of the St. Louis Mental Health Board and the Missouri 
Department of Mental Health. 

[14] As a result, juveniles under 14 were randomly assigned to either 
the program for juveniles 14 and over, or to the comparison group. 

[15] The Evaluation of a Comprehensive Service-Based Intervention 
Strategy in Public Housing reported response rates for both the 
intervention and comparison sites on a survey at baseline, but did not 
report response rates for follow-up surveys conducted 12 and 18 months 
after the intervention began.

[16] Civil Legal Assistance provides grants to nonprofit, 
nongovernmental organizations that provide legal services to victims of 
domestic violence or that work with victims of domestic violence who 
have civil legal needs.

[17] NIJ officials told us in August 2003 that the evaluation had been 
funded for a fourth year, and that the federal agencies funding this 
evaluation (DOJ and the Department of Health and Human Services) were 
also considering a fifth year of funding. Four years of funding allows 
the evaluation to collect data covering about the first 3 years of 
implementation in the sites. However, data collected from stakeholders 
at the sites early in the evaluation showed that the sites expected 
that it would take 3.5 to 4 years to achieve change in key individual 
level outcomes. At the time of our review, there was no information on 
whether individual level outcome data would be collected.

[18] Because of our interest in the effectiveness of criminal justice 
programs, we limited our review of the usefulness of NIJ outcome 
evaluations to evaluations of DOJ programs, or evaluations funded by 
DOJ--a total of 5 evaluations. We did not examine 3 other completed NIJ 
outcome evaluations focusing on programs funded by agencies other than 
DOJ. 

[19] Officials with DOJ's Office on Violence Against Women were not 
familiar with the findings from the other completed NIJ study focusing 
on violence against women, the Victim Advocacy with a Team Approach 
evaluation. This evaluation was funded by a transfer of funds to NIJ 
for NIJ research and evaluations in the area of violence against women. 
NIJ officials stated that Office on Violence Against Women officials 
were consulted in the development of the solicitation. 

[20] An early warning system is a data based police management tool 
designed to identify officers whose behavior is problematic, as 
indicated by high rates of citizen complaints, use of force incidents, 
or other evidence of behavior problems, and to provide some form of 
intervention, such as counseling or training to correct that 
performance. The NIJ-managed study consisted of a process and outcome 
evaluation of early warning systems in 3 large urban police 
departments, as well as a national survey. 

[21] Through the Regional Community Policing Institute network, DOJ's 
Office of Community Oriented Policing Services assists local law 
enforcement agencies with meeting their community policing training 
needs.

[22] GAO-02-309.

[23] Homeland Security Act of 2002, P.L. 107-296 sec. 237.

[24] NIJ Web site (http://www.ojp.usdoj.gov/nij/about.htm).

[25] These analyses compare a program's outputs or outcomes with the 
costs (resources expended) to produce them. Cost-effectiveness analysis 
assesses the costs of meeting a single goal or objective, and can be 
used to identify the least costly alternative to meet that goal. Cost-
benefit analysis aidms to identify all the relevant costs and benefits, 
usually expressed in dollar terms.

[26] Earmarked refers to dedicating an appropriation for a particular 
purpose. Legislative language may designate any portion of a lump-sum 
amount for particular purposes. In fiscal year 2002, congressional 
guidance for the use of these funds was provided in conference report 
H.R. 107-278. The report specified that up to 10 percent of the funds 
for the Bureau of Justice Assistance's Edward Byrne Discretionary Grant 
Program be made available for an independent evaluation of the program 
(at 88); and up to 10 percent of the funds for the Office of Juvenile 
Justice and Delinquency Prevention's Discretionary Grants for National 
Programs and Special Emphasis Programs (at 108) and Safe Schools 
Initiative be made available for an independent evaluation of the 
program (at 112).

[27] Prior to conducting the evaluability assessments, NIJ conducted an 
initial review of the earmarked programs, and eliminated from 
consideration those programs that were appearing in legislation for the 
first time, in order to focus on those programs that were receiving 
continuation funding. 

[28] The solicitation deadlines were April 11, 2003, for the Bureau of 
Justice Assistance programs and July 15, 2003, for the Office of 
Juvenile Justice and Delinquency Prevention programs.

[29] A new requirement of the solicitation for proposals is that 
applicants report what prior funding they have received from NIJ. 

[30] These standards are well defined in scientific literature. See, 
for example, Donald T. Campbell and Julian C. Stanley, Experimental and 
Quasi-Experimental Designs for Research (Chicago: Rand McNally & 
Company, 1963); Carol H. Weiss, Evaluation Research: Methods for 
Assessing Program Effectiveness (Englewood Cliffs: Prentice-Hall, 
Inc., 1972); Edward A. Suchman, Evaluative Research: Principles and 
Practice in Public Service & Social Action Programs (New York: Russell 
Sage Foundation, 1967); and GAO/PEMD-10.14.

[31] The evaluations varied in the methodologies that were used to 
examine program effects. Of the 15 evaluations, 14 did not explicitly 
discuss cost/benefit considerations. The evaluation of Breaking the 
Cycle estimated cost/benefit ratios at each of the 3 demonstration 
sites examined. 

GAO's Mission:

The General Accounting Office, the investigative arm of Congress, 
exists to support Congress in meeting its constitutional 
responsibilities and to help improve the performance and accountability 
of the federal government for the American people. GAO examines the use 
of public funds; evaluates federal programs and policies; and provides 
analyses, recommendations, and other assistance to help Congress make 
informed oversight, policy, and funding decisions. GAO's commitment to 
good government is reflected in its core values of accountability, 
integrity, and reliability.

Obtaining Copies of GAO Reports and Testimony:

The fastest and easiest way to obtain copies of GAO documents at no 
cost is through the Internet. GAO's Web site ( www.gao.gov ) contains 
abstracts and full-text files of current reports and testimony and an 
expanding archive of older products. The Web site features a search 
engine to help you locate documents using key words and phrases. You 
can print these documents in their entirety, including charts and other 
graphics.

Each day, GAO issues a list of newly released reports, testimony, and 
correspondence. GAO posts this list, known as "Today's Reports," on its 
Web site daily. The list contains links to the full-text document 
files. To have GAO e-mail this list to you every afternoon, go to 
www.gao.gov and select "Subscribe to e-mail alerts" under the "Order 
GAO Products" heading.

Order by Mail or Phone:

The first copy of each printed report is free. Additional copies are $2 
each. A check or money order should be made out to the Superintendent 
of Documents. GAO also accepts VISA and Mastercard. Orders for 100 or 
more copies mailed to a single address are discounted 25 percent. 
Orders should be sent to:

U.S. General Accounting Office

441 G Street NW,

Room LM Washington,

D.C. 20548:

To order by Phone: 	

	Voice: (202) 512-6000:

	TDD: (202) 512-2537:

	Fax: (202) 512-6061:

To Report Fraud, Waste, and Abuse in Federal Programs:

Contact:

Web site: www.gao.gov/fraudnet/fraudnet.htm E-mail: fraudnet@gao.gov

Automated answering system: (800) 424-5454 or (202) 512-7470:

Public Affairs:

Jeff Nelligan, managing director, NelliganJ@gao.gov (202) 512-4800 U.S.

General Accounting Office, 441 G Street NW, Room 7149 Washington, D.C.

20548: