Service Delivery
Evaluating the Process and Monitoring Outcomes
What are some important issues for conducting evaluations?
Long-Term or Short-Term Goals?
When conducting an outcome evaluation, the primary goal is to answer the following question: Does the program "work"? In answering this question, a vital second question is sometimes overlooked: How long will it take to find out if it works? The answer to this second question can vary widely depending on the type of outcome of interest. For example, if a program works with low-income high school seniors to increase their chances of on-time graduation, one would expect to see results within a relatively short time frame. Thus, one could conduct an evaluation that tracked seniors over the course of just one school year.
The situation becomes more complicated if a program's intended impact is more long-term, for example, services for low-income first-graders intended to increase the likelihood of high school graduation. Examining progress toward this goal would require many years of data collection, considerably greater expense, and many years of waiting for evaluation results. The situation is further complicated by the way that policymakers and other funders have historically distributed funds. Often, organizations are required to demonstrate that their programs will have impact within a relatively short period of time. This requirement can discourage organizations from focusing on long-term goals or disadvantage those that do.
Clearly, the timing of results can present a considerable challenge, particularly because many very meaningful goals (such as reducing child poverty, involvement in crime, and a community's illiteracy rate) cannot be achieved in the short run. Thus, it is essential that targeted implementation efforts continue to try to reach those goals even if the impact of the efforts cannot be observed immediately.
The quandary surrounding the measurement of long-term outcome goals is often discussed in the context of results-based decision-making. Experts in this area have recommended that, instead of abandoning important long-term goals, programs should identify shorter-term interim goals that can signal progress (or a lack thereof) toward long-term goals [1]. Available research on the subject of a program's long-term goals provides the best source of information about short-term outcomes that might be good candidates for signaling progress. This strategy may assist organizations in measuring their own progress and demonstrate to funders and policymakers that there is some program impact in the short run.
Does the Program Work?
Another considerable challenge in conducting evaluations is determining whether an observed outcome is due to a program's activities or due to some other factor. For example, suppose a new after-school program was launched for high-school-aged students that was intended to reduce the level of delinquency in a city during the high-risk hours between 2 p.m. and 6 p.m. To measure the impact of the program, one may choose to compare police reports of delinquency in the city from the school year before starting the program (baseline) with data from the year of the program. If one finds that the city's level of delinquency has declined, can one conclude that the program was the cause of the decline? The answer is no. There are many other factors besides the after-school program that may have influenced the drop in delinquency. For example, it may be that more students are employed or participating in extracurricular activities after school than in the previous year. It may also be that fewer parents are working outside the home (perhaps because of an economic downturn in the area) resulting in a greater level of supervision for students after school than in the previous year. In short, to provide sound feedback about the impact of a program, an evaluation must rule out such "rival explanations" to the greatest extent possible given available resources.
The best way to rule out these rival explanations is to use a control or comparison group that is similar to the program group in every way except for program participation. This can be done by randomly assigning individuals (or other units, such as families, schools, neighborhoods, or even cities) to participate in the program or control group. For example, one could make an alphabetical list of all high school students interested in signing up for an after-school program. One could then admit every other student on the list to the program. The remaining students would be assigned to a control group that would not participate in the program. One would then track both the program group and control group to see if the two differed in their delinquent behavior.
Assigning students to the program and control group in this "random" way is called an experimental design. Random assignment makes use of mathematical principles that increase the chance that the groups will be similar in every way except for program participation. Thus, random assignment would tend to equalize the groups on other factors (besides program participation), such as gender, race, household income, and school performance, that might influence whether the students are involved in delinquency. This type of study would provide the highest-quality evidence on the impact of a program. That is, one could put the most trust in the results of an evaluation that used an experimental design.
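As an illustration only, the assignment step could be sketched in a few lines of Python. Taking every other name from an alphabetical list is a systematic rule rather than a strictly random one, so this sketch shuffles the sign-up list before splitting it. The function name, the student names, and the seed value are all hypothetical and are not part of the original guidance.

```python
import random

def assign_groups(student_names, seed=None):
    """Randomly split a sign-up list into a program group and a control group."""
    rng = random.Random(seed)        # a fixed seed lets others reproduce the assignment
    shuffled = list(student_names)   # copy so the original list is left untouched
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2  # program group gets the extra student if the count is odd
    return shuffled[:half], shuffled[half:]

# Hypothetical sign-up list for the after-school program
applicants = ["Avery", "Blake", "Casey", "Devon", "Emery", "Finley", "Gray"]
program_group, control_group = assign_groups(applicants, seed=2024)
print("Program group:", program_group)
print("Control group:", control_group)
```

Recording the seed alongside the sign-up list is one way to let funders or outside reviewers confirm that no one hand-picked the participants.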
While this is the "gold standard" for program evaluation research, it is often not possible or feasible to use an experimental design [2]. Under such circumstances, a second-best option would be what is called a quasi-experimental design. Using this technique, one would construct a comparison group similar to the program group and employ statistical methods to remove the influence of factors that might make the two groups different.
In the case of the after-school program discussed earlier, it might be that the program is already operating and administrators cannot be persuaded to allow random assignment of students to program and comparison groups. In that case, if the program is at capacity and a waiting list for the program is available, one could treat the wait-listed students as the comparison group. The wait-listed students are not participating in the program, so their behavior could be compared to that of the program participants.
If no waiting list is available, one could obtain a list of all students at the school not participating in the program and randomly draw students from that list to make up a comparison group. This method is less preferable than the waiting-list method because wait-listed students have at least expressed an interest in participating in the program. Thus, those students are likely to be more similar to those in the program than the rest of the student body who had expressed no interest in the program.
Either method, however, is better than using no comparison group at all. An evaluation lacking any kind of comparison group and consideration of rival explanations provides very weak evidence and should not be used to draw conclusions about the effectiveness of a program.
For an evaluation of the after-school program, for example, after constructing a comparison group, one would gather as much information as possible about the students in order to take any potential differences among them into account. Statistics could be used to reduce the influence of those pre-existing differences that might otherwise cloud the findings. For example, boys are more likely to be delinquent than are girls. If the program group contains more boys than the comparison group, it is likely that one would see more delinquency in the program group, regardless of the impact of the program itself. With appropriate statistical methods, one could control for the influence of gender on delinquency to more clearly determine the impact of the program.
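One simple way to see what "controlling for" gender means is to compare the two groups separately for boys and for girls and then average those within-gender differences. The sketch below does this with made-up counts; it is an illustration of the idea, not the method any particular evaluation used, and a real analysis would more likely rely on regression or a similar technique with many more factors.

```python
# Hypothetical counts: (number of students, number with a delinquency report)
counts = {
    "program":    {"boys": (60, 12), "girls": (40, 2)},
    "comparison": {"boys": (40, 12), "girls": (60, 6)},
}

def raw_rate(group):
    """Delinquency rate for a whole group, ignoring gender."""
    students = sum(n for n, _ in counts[group].values())
    reports = sum(d for _, d in counts[group].values())
    return reports / students

def adjusted_difference():
    """Average the within-gender differences, weighting each gender by its total size."""
    weighted_sum = 0.0
    total_weight = 0
    for gender in counts["program"]:
        n_prog, d_prog = counts["program"][gender]
        n_comp, d_comp = counts["comparison"][gender]
        difference = d_prog / n_prog - d_comp / n_comp  # program rate minus comparison rate
        weight = n_prog + n_comp
        weighted_sum += difference * weight
        total_weight += weight
    return weighted_sum / total_weight

raw_difference = raw_rate("program") - raw_rate("comparison")
print(f"Raw difference:      {raw_difference:+.3f}")
print(f"Adjusted difference: {adjusted_difference():+.3f}")
```

With these invented numbers, the raw comparison shows the program group 4 percentage points lower, while the gender-adjusted comparison shows it 7.5 points lower. The extra boys in the program group were hiding part of the apparent difference.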
How similar a comparison group is to a program group and the types of factors that can be taken into account using statistics are often highly dependent on available resources and existing data. Circumstances rarely allow for conducting the ideal evaluation, but steps should be taken to provide the best quality evidence possible, given existing constraints.
How Can Costs and Benefits Be Assessed?
In addition to raising questions about whether a program "works," broader questions may be raised about whether a program is worth the cost or the budgetary tradeoffs required to pay for it. Particularly for those programs that require considerable resources for their establishment and operation, managers and policymakers are often interested in whether the investment can be justified.
Measuring a program's costs and benefits is one type of evaluation or method of assessing the effectiveness of a program. Determining how to measure actual program costs can be difficult and weighing those costs against their potential benefits can be complicated. One important first step is to consider how the evaluation will be used and how much time and money are available to conduct the analysis.
Researchers at RAND have outlined four basic forms of analyzing cost: [3]
- Cost analysis measures only the actual cost of the program itself. This type of analysis would provide information that is useful for budget projections and for determining how much it would cost to replicate the program in another setting. In general, this is the least complicated and expensive type of analysis.
- Cost-effectiveness analysis measures how much it would cost to produce a desired outcome. In other words, this method assesses how much of the desired outcome can be purchased based on a specified funding level for the program.
- Cost-savings analysis measures whether the program "pays for itself" by comparing the costs of the program with the actual savings produced by the program. Only savings to the organization or agency funding the program, and not savings to others, are considered in this method.
- Cost-benefit analysis measures whether the benefits of the program to a particular stakeholder outweigh their costs. Commonly, the stakeholder in this type of analysis is society as a whole rather than individual organizations or agencies. This is generally the most expensive and complicated type of analysis.
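The arithmetic behind the last three forms of analysis can be sketched as follows. All dollar figures, field names, and function names here are hypothetical and chosen only to show how the same program can look different under each lens; the hard part in practice is estimating the inputs, not the division.

```python
from dataclasses import dataclass

@dataclass
class ProgramYear:
    program_cost: float       # cost analysis: what the program itself costs
    outcomes_achieved: float  # e.g., number of delinquent incidents prevented
    funder_savings: float     # savings to the funding agency only
    societal_benefits: float  # dollar value of benefits to all stakeholders

def cost_per_outcome(p):
    """Cost-effectiveness: dollars spent per unit of the desired outcome."""
    return p.program_cost / p.outcomes_achieved

def net_savings(p):
    """Cost-savings: positive when the program 'pays for itself' for the funder."""
    return p.funder_savings - p.program_cost

def benefit_cost_ratio(p):
    """Cost-benefit: above 1.0 when benefits to society outweigh costs."""
    return p.societal_benefits / p.program_cost

# Hypothetical program year
example = ProgramYear(
    program_cost=200_000,
    outcomes_achieved=50,
    funder_savings=150_000,
    societal_benefits=320_000,
)
print("Cost per outcome:  ", cost_per_outcome(example))
print("Net funder savings:", net_savings(example))
print("Benefit-cost ratio:", benefit_cost_ratio(example))
```

In this invented case the program does not pay for itself from the funder's point of view (net savings are negative), yet its benefits to society as a whole exceed its cost. That gap is why the choice of stakeholder matters.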
One considerable challenge in using the cost-benefit analysis method of assessing programs is translating costs and benefits into a format that allows them to be compared: dollar amounts. Assigning dollar values to some potential program benefits can be very difficult. For example, measuring the benefits of a program to reduce school violence requires the development of some method for translating the impact of school violence into dollar amounts. Tangible costs, such as building repairs, lost work time, and medical expenses, that might be avoided with less school violence are easier to estimate as dollar amounts than intangible costs to victims and communities, such as pain and suffering, that would be avoided in the absence of school violence.
In whatever form costs are analyzed, RAND researchers point out several issues that should be considered in the research: [3]
- When describing costs and benefits, they should be allocated to the stakeholders that accrue them. That is, costs and benefits may not be equally distributed among different groups and analyses should be careful to show the nature of this distribution. For example, a juvenile crime prevention program may increase costs to a city government but much of the savings may accrue to county- or state-level juvenile justice agencies.
- The analysis should specify the timeframe of the costs and benefits. Managers and policymakers may not be willing to wait for a data collection effort that will take 20 years before potential long-term program benefits can be observed. It is important to assess existing political realities and funding availability to determine a reasonable time frame for data collection.
- The time frame has further importance because costs and benefits that accrue in the future should be "discounted" relative to costs and benefits that are realized in the present. Thus, shorter-term costs and benefits should be given more weight than costs and benefits coming over the longer term.
- Because costs for goods and services often vary over time and place, data on costs should be collected according to the quantities of required resources rather than solely on how much the resources cost. For example, it would be better to collect data on both the hourly cost of staff and the number of staff hours invested in the program than on total staff spending alone.
- When making future projections about costs and benefits, it is best to specify a range of values when some factors have the potential to change in the future. For example, impending budget cuts may force a reduction of staff and services, which would impact predictions about future costs and benefits.
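Several of the points above (discounting, recording quantities separately from prices, and reporting a range of values) come down to simple arithmetic. The sketch below is one hedged illustration: the discount rates, dollar amounts, staff hours, and function names are all assumptions made for the example and do not come from the RAND report.

```python
def present_value(amount, years_from_now, discount_rate):
    """Value today of an amount realized `years_from_now` years in the future."""
    return amount / (1 + discount_rate) ** years_from_now

def staff_cost(hours, hourly_rate):
    """Keep quantities (hours) separate from prices so costs can be re-priced for another place or year."""
    return hours * hourly_rate

def discounted_total(annual_amount, years, discount_rate):
    """Sum of present values for an amount received at the end of each year."""
    return sum(present_value(annual_amount, year, discount_rate)
               for year in range(1, years + 1))

# Hypothetical inputs
annual_benefit = 10_000  # dollars of benefit expected each year
horizon_years = 20

# Report a range rather than a single figure, since the right rate is uncertain
for rate in (0.03, 0.05, 0.07):
    total = discounted_total(annual_benefit, horizon_years, rate)
    print(f"Discount rate {rate:.0%}: present value of benefits = ${total:,.0f}")

# The same 1,200 staff hours priced at two hypothetical local wage rates
print("Staff cost at $25/hour:", staff_cost(1200, 25))
print("Staff cost at $32/hour:", staff_cost(1200, 32))
```

The higher the discount rate, the less weight benefits in later years receive, so a program whose payoff arrives far in the future is especially sensitive to this choice.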
Who Should Conduct the Evaluation?
Another major area of concern is who should conduct the evaluation. There are typically three choices: existing in-house staff conduct the evaluation, an external consultant is hired to advise existing staff, or an external evaluator (or team) conducts the evaluation with some assistance from internal staff. There are advantages and disadvantages to each approach, as described in a report for program administrators sponsored by the Department of Housing and Urban Development: [4]
- In-house Evaluation Team
  - Advantages: May be the least expensive option; promotes maximum involvement and participation of staff and can contribute to building staff expertise for future evaluation efforts.
  - Disadvantages: Staff members may not be sufficiently knowledgeable or experienced to effectively design and implement the evaluation; potential funders may not perceive evaluation results as objective.
- In-house Evaluation Team Supported by Outside Consultant
  - Advantages: May be less expensive than hiring an outside evaluator; using the agency staff as team members could increase the likelihood that the evaluation will be consistent with program objectives.
  - Disadvantages: Greater time commitment required of staff may outweigh the cost reduction from using the outside professional as a consultant instead of as a team leader; may produce a less influential or objective report.
- Outside Evaluator
  - Advantages: Results may be perceived by current or potential funders as more objective because evaluator does not have a stake in the evaluation findings; evaluator may have greater expertise and knowledge than agency staff about the technical aspects involved in conducting an evaluation.
  - Disadvantages: Can be expensive to hire; may not have an adequate understanding of issues relevant to the organization or its type of service.
Regardless of who conducts the evaluation, it is important to remember that evaluations can be expensive. One component that can drive up the cost is the lack of an adequate data system. When necessary data are available in computerized form, evaluations can be conducted with far fewer resources than when data must be tabulated from paper files or collected from other sources. This is an important consideration when preparing proposals to funders or setting aside funds for an external evaluator. To reduce this potential cost, it is best to develop a new database or integrate new fields into an existing database at the same time a new program is being developed. This task can be complicated; therefore, if sufficient expertise is not available internally, it may be beneficial to hire a database consultant to assist with constructing a new database, or enhancing an existing one. The initial cost is likely to be more than offset by the savings in evaluation costs.
More Information
Here are some additional sources of information on how to conduct process and outcome evaluations of programs or partnerships, including how to select an external evaluator:
- The RAND Corporation hands-on manual Getting To Outcomes 2004: Promoting Accountability Through Methods and Tools for Planning, Implementation, and Evaluation highlights ten questions related to accountability that should be answered in the course of planning, implementing, and evaluating prevention programs. Although the manual specifically addresses prevention of substance use and abuse among youth, the approach can be applied to all types of prevention programs. The manual features examples from a community-based effort to develop, implement, and evaluate a program to eliminate alcohol, drug, and tobacco use among school-age youth. Worksheets contained in the manual provide a way for readers to answer the ten accountability questions, and a web-based companion tool called iGTO is currently being developed to facilitate this process. The full manual is available at www.rand.org/pubs/technical_reports/TR101/index.html.
- CYFERnet contains a list of resources on various topics related to evaluation. In particular, see the articles in "Applying Evaluation Tools to Your Program" for materials on topics such as data collection methods, collaboration and evaluation, and using existing records in evaluation. www.extension.iastate.edu/cyfar/port/eval_tools_port.html
- "Integrating Process and Outcome Evaluations" is a toolkit sponsored by the Center for Mental Health Services that highlights the ways in which data from both process and outcome evaluations can be used to assess how a program is functioning. The toolkit provides a description of different types of databases and a description of how and why each type might be of use in program administration and evaluation. www.mentalhealth.org.
- The Department of Education sponsored the development of an evaluation guide for organizations participating in comprehensive strategies to address the needs of children, families, and communities. This how-to guide covers a wide range of evaluation topics, including setting goals, identifying appropriate indicators for success, and collecting data. Sample forms are also provided as examples of tools used in the research process. www.ed.gov/inits/americareads/resourcekit/MakingInfo/
- The Edna McConnell Clark Foundation has developed a guide to help programs and communities assess the quality of evidence about program effectiveness. The continuum provides the foundation’s guideline for determining the degree of confidence that can be placed in results, depending upon the methods used in a program evaluation. http://www.emcf.org/evaluation/process/programquality.htm
- Educational reforms such as the No Child Left Behind Act of 2001 have adopted results-based management techniques from the private sector, including techniques for setting goals, measuring outcomes, and rewarding performance or penalizing failure. The overall goal of these techniques is to increase accountability. The RAND Corporation report Organizational Improvement and Accountability: Lessons for Education from Other Sectors examines how various accountability systems have fared in several private-sector industries (manufacturing, health, law, and social service delivery) and assesses the extent to which they can provide lessons for educators. A strategy for improving accountability systems in education is provided in the report. View the report summary and full report at www.rand.org/pubs/monographs/MG136/index.html.
- RAND researchers prepared a report titled Assessing Costs and Benefits of Early Childhood Intervention Programs. This report describes methods for designing such evaluations and for assessing the feasibility of various designs given available resources, and provides examples of research studies employing these methods. The full 138-page report is available at http://www.rand.org/pubs/monograph_reports/MR1336/index.html. An Executive Summary of the report is available at http://www.rand.org/pubs/monograph_reports/MR1336.1/ and a two-page Research Brief is available at http://www.rand.org/pubs/research_briefs/RB5051/index1.html.
- The Urban Institute has prepared an Evaluation Guidebook to assist grantees receiving funds through the Violence Against Women Act. Much of this report discusses general concepts that are applicable to any type of program, including how to assess readiness for evaluation, how to select an evaluator and evaluation design, how to use evaluation information in improving and promoting programs, and how to measure short-term and long-term changes. http://www.urban.org/publications/407365.html
- The Urban Institute, with the Independent Sector, has produced a report entitled Outcome Measurement In Nonprofit Organizations: Current Practice and Recommendations. The executive summary is available free online at www.independentsector.org and the full report can be ordered from www.independentsector.org.
- What Works for Children? Evidence Guide provides a comprehensive overview of evidence-based practice and why it is important. The report describes the skills and knowledge that are needed to implement evidence-based practices. It also describes the necessary steps in searching for evidence, judging the quality of the evidence that is found, and applying the evidence in practice. Detailed information is provided on how to conduct searches for relevant research. The full report is available at www.whatworksforchildren.org.uk/docs/tools/evguide%20guide%20WEB.pdf.
- The National Center for Injury Prevention and Control has produced a book intended to guide program administrators through the key issues of program evaluation, including cost considerations, characteristics of quality external evaluators, and example data collection instruments and forms. www.cdc.gov/ncipc
- The U.S. Department of Housing and Urban Development has sponsored a guide for conducting evaluations and victimization surveys. Issues are discussed in the context of public housing and crime prevention, but the general concepts are applicable to a wide range of topics. The guide is available free online and a hard copy of the guide and accompanying workbook can also be ordered through the following site: www.huduser.org.
- The Office of Juvenile Justice and Delinquency Prevention (OJJDP) provides a helpful illustration of measurement and evaluation issues in a variety of program areas related to delinquency prevention and intervention. Each program area contains a description of the program, example performance measures, and important considerations for both process and outcome evaluations. The numerous examples provide illustrations of the general concepts of selecting performance measures and conducting program evaluations (www.jrsa.org/jjec). In addition, OJJDP has prepared a practitioner brief on the following topics: why program evaluation is important, some tips on data collection, and how to use results (www.ncjrs.org).
Footnotes
[1] Schorr, Lisabeth, The Case for Shifting to Results-Based Accountability, Washington, D.C.: Center for the Study of Social Policy, 1994. www.cssp.org
[2] Sherman, L. W., "Thinking About Crime Prevention," in L. W. Sherman, D. Gottfredson, D. MacKenzie, J. Eck, P. Reuter, and S. Bushway, eds., Preventing Crime: What Works, What Doesn't, What's Promising? Washington, D.C.: U.S. Department of Justice, National Institute of Justice, 1997, pp. 2-1 through 2-26. Available at: www.ncjrs.gov/works/
[3] Karoly, Lynn A., M. R. Kilburn, J. H. Bigelow, J. P. Caulkins, J. S. Cannon, and J. R. Chiesa, "Assessing Costs and Benefits of Early Childhood Intervention Programs, Executive Summary," Santa Monica, CA: RAND, 2001, pp. 2 through 4. The Executive Summary of the report is available at http://www.rand.org/pubs/monograph_reports/MR1336.1/ and the full report is available at http://www.rand.org/pubs/monograph_reports/MR1336/index.html.
[4] KRA Corporation, A Guide to Evaluating Crime Control Programs in Public Housing, Rockville, Md.: Office of Policy Development and Research, Department of Housing and Urban Development, 1997. (From ASCII text file, www.huduser.org/publications/txt/guide.txt; see Chapter 3.) To order hardcopy and accompanying workbook, visit: www.huduser.org/publications/pubasst/crimepre.html
