This is the accessible text file for GAO report number GAO-15-50 entitled 'Veterans' Disability Benefits: Improvements Could Further Enhance Quality Assurance Efforts' which was released on November 19, 2014. This text file was formatted by the U.S. Government Accountability Office (GAO) to be accessible to users with visual impairments, as part of a longer term project to improve GAO products' accessibility. Every attempt has been made to maintain the structural and data integrity of the original printed product. Accessibility features, such as text descriptions of tables, consecutively numbered footnotes placed at the end of the file, and the text of agency comment letters, are provided but may not exactly duplicate the presentation or format of the printed version. The portable document format (PDF) file is an exact electronic replica of the printed version. We welcome your feedback. Please E-mail your comments regarding the contents or accessibility features of this document to Webmaster@gao.gov. This is a work of the U.S. government and is not subject to copyright protection in the United States. It may be reproduced and distributed in its entirety without further permission from GAO. Because this work may contain copyrighted images or other material, permission from the copyright holder may be necessary if you wish to reproduce this material separately. United States Government Accountability Office: GAO: Report to the Chairman, Committee on Veterans' Affairs, House of Representatives: November 2014: Veterans' Disability Benefits: Improvements Could Further Enhance Quality Assurance Efforts: GAO-15-50: GAO Highlights: Highlights of GAO-15-50, a report to the Chairman, Committee on Veterans' Affairs, House of Representatives. Why GAO Did This Study: With a backlog of disability compensation claims, VBA faces difficulties in improving the accuracy and consistency of the claim decisions made by staff in its 57 regional offices. To help achieve its goal of 98 percent accuracy by fiscal year 2015, VBA recently implemented a new way of measuring accuracy and changed several quality assurance activities to assess the accuracy and consistency of decisions and to provide feedback and training to claims processors. GAO was asked to examine VBA's quality assurance activities. This report evaluates (1) the extent to which VBA effectively measures and reports the accuracy of its disability compensation claim decisions and (2) whether VBA's other quality assurance activities are coordinated and effective. GAO analyzed VBA claims and STAR accuracy data from fiscal year 2013 (the most recent fiscal year for which complete data are available); reviewed relevant federal laws, VBA guidance, and other documents related to quality assurance activities; and interviewed VBA staff from headquarters and four VBA regional offices (selected to achieve variety in geography, workload, and accuracy rates), as well as veteran service organization officials. What GAO Found: The Veterans Benefits Administration (VBA)--within the Department of Veterans Affairs--measures and reports the accuracy of its disability compensation claim decisions in two ways: (1) by claim and (2) by disabling condition, though its approach has limitations. When calculating accuracy rates for either measure through its Systematic Technical Accuracy Review (STAR), VBA does not always follow generally accepted statistical practices, resulting in imprecise performance information.
For example, VBA does not adjust its accuracy estimates to reflect that it samples the same number of claims for review from each regional office--despite their varying workloads--and thus produces imprecise estimates of national and regional accuracy. Further, VBA reviews about 39 percent (over 5,000) more claims nationwide than is necessary to achieve its desired precision in reported accuracy rates, thereby diverting limited resources from other important quality assurance activities, such as targeted reviews of error-prone cases. In addition to issues with its statistical practices, VBA's process for selecting claims for STAR review creates an underrepresentation of claims that are moved between regional offices, which may inflate accuracy estimates because these claims have historically had lower accuracy rates. Finally, VBA has not clearly explained in public reports the differences in how its two accuracy measures are calculated or their associated limitations, as suggested by best practices for federal performance reporting. VBA has taken steps to enhance and coordinate its other quality assurance activities, but GAO found shortcomings in how VBA is implementing and evaluating these activities. To improve local accuracy, VBA created regional office quality review teams (QRTs) with staff dedicated primarily to performing local accuracy reviews. QRTs assess individual claims processor performance and conduct special reviews to forestall certain types of errors. In addition, VBA began using questionnaires for assessing decision-making consistency, which are more efficient to administer than VBA's prior approach to conducting consistency studies. VBA also coordinates quality assurance efforts by disseminating national accuracy and consistency results, trends, and related guidance to regional offices for use in training claims processors. Further, VBA uses STAR results to inform other quality assurance activities, such as focusing certain QRT reviews on commonly made errors. However, GAO identified implementation shortcomings that may detract from the effectiveness of VBA's quality assurance activities. For example, contrary to accepted practices for ensuring the clarity and validity of questionnaires, VBA did not pre-test its consistency questionnaires to ensure the clarity of questions or validity of the expected results, although VBA officials indicated that they plan to do so for future questionnaires. In contrast with federal internal control standards that call for capturing and distributing information in a form that allows people to efficiently perform their duties, staff in the four regional offices that we visited had trouble finding the guidance they needed to do their work, which could affect the accuracy as well as the speed with which staff decide claims. Federal standards also call for knowing the value of efforts such as quality assurance activities and monitoring their performance over time; however, VBA has not evaluated the effect of its special QRT reviews or certain consistency studies on improving targeted accuracy rates, and lacks clear plans to do so. What GAO Recommends: GAO is making eight recommendations to VA to improve its measurement and reporting of accuracy, review the multiple sources of policy guidance available to claims processors, enhance local data systems, and evaluate the effectiveness of quality assurance activities. VA concurred with all of GAO's recommendations. View [hyperlink, http://www.gao.gov/products/GAO-15-50].
For more information, contact Daniel Bertoni at (202) 512-7215 or bertonid@gao.gov. [End of section] Contents: Letter: Background: VBA's Approach to Measuring and Reporting Accuracy of Claim Decisions Has Limitations: VBA Has Enhanced and Coordinated Its Quality Assurance Activities, Though Gaps in Implementation May Limit Their Effectiveness: Conclusions: Recommendations for Executive Action: Agency Comments and Our Evaluation: Appendix I: Objectives, Scope and Methodology: Appendix II: Statistical Sampling Methodology: Appendix III: Comments from the Department of Veterans Affairs: Appendix IV: GAO Contact and Staff Acknowledgments: Tables: Table 1: Regional Offices Selection Criteria: Table 2: STAR Monthly Sample Results for VBA's Boston Regional Office, Fiscal Year 2013: Figures: Figure 1: Effect of Weighting on Regional Office Claim-Based Accuracy Rankings, Fiscal Year 2013: Figure 2: Ranking of Weighted Estimates of Claim-Based Accuracy Rates with 95 Percent Confidence Intervals, by Regional Office, Fiscal Year 2013: Figure 3: Claim-Based and Issue-Based Accuracy Rates with 95 Percent Confidence Intervals by Number of Issues Claimed, Fiscal Year 2013: Abbreviations: IRR: inter-rater reliability: OIG: Office of Inspector General: QRT: quality review team: RVSR: Rating Veterans Service Representative: STAR: Systematic Technical Accuracy Review: VA: Department of Veterans Affairs: VBA: Veterans Benefits Administration: VSR: Veterans Service Representative: [End of section] United States Government Accountability Office: GAO: 441 G St. N.W. Washington, DC 20548: November 19, 2014: The Honorable Jeff Miller: Chairman: Committee on Veterans' Affairs: House of Representatives: Dear Mr. Chairman: The Department of Veterans Affairs' (VA) disability compensation program provides cash benefits to veterans for disabling conditions incurred or aggravated while in military service. In fiscal year 2013, VA paid $53.6 billion in disability compensation to 3.6 million veterans. Within VA, the Veterans Benefits Administration (VBA), which is charged with processing disability compensation claims, faces a backlog of claims, due in part to the recent wars in Iraq and Afghanistan and the increasing number of servicemembers leaving the military. At the same time, VBA set a goal of achieving 98 percent accuracy in fiscal year 2015 for compensation claim decisions, which are made by staff in 57 VBA regional offices. Accurate claim decisions can help ensure that VBA is paying disability benefits only to those entitled to such benefits, in the correct amounts. Meanwhile, consistent decisions help ensure that veterans' claims receive comparable treatment, regardless of which VBA adjudicator or regional office processes the claim. Questions have been raised about recent changes in the calculation of VBA's national accuracy rate, which is based on its national Systematic Technical Accuracy Review (STAR), and whether such changes reflect reliable measures of accuracy and VBA's commitment to serving veterans. GAO and VA's Office of Inspector General (OIG) have also previously reported on shortcomings in VBA's quality assurance activities.[Footnote 1] This report examines (1) the extent to which VBA effectively measures and reports the accuracy of compensation claim decision-making, and (2) whether VBA's other quality assurance activities are coordinated and effective. 
To determine the extent to which VBA effectively measures the accuracy of compensation claim decisions, we reviewed STAR guidance, reports, and data and interviewed cognizant staff. We assessed VBA's sampling methodology and analyzed STAR and other VBA data on claims processed and reviewed from October 2012 through September 2013. We focused on the STAR process for reviewing disability compensation claims that were evaluated by VBA.[Footnote 2] We did not review quality assurance efforts involving pension claims or appealed cases.[Footnote 3] To assess how VBA reports accuracy, we reviewed relevant VBA performance reports and compared VBA practices with legal requirements for agency performance reporting and related GAO work.[Footnote 4] To determine whether VBA's quality assurance activities are coordinated and effective, we reviewed VBA quality assurance policies, reports, and guidance to identify key quality assurance activities, and then examined each activity's function and process by reviewing relevant guidance and policy documents and interviewing central office officials. We also interviewed VBA officials from four regional offices to gain their perspectives on how quality assurance activities are implemented at the regional office level, as well as how information is shared among quality assurance activities.[Footnote 5] We compared VBA's quality assurance activities against its internal guidance and standards for internal control in the federal government.[Footnote 6] We also reviewed VBA's methods for designing and implementing its consistency studies against generally accepted practices in survey and questionnaire development. We assessed the reliability of VBA data used for all our analyses and determined that they were sufficiently reliable for the purposes of providing information on trends in claims decisions. For additional details on our objectives, scope, and methodology, see appendix I. We conducted this performance audit from September 2013 to November 2014 in accordance with generally accepted government auditing standards. Those standards require that we plan and perform the audit to obtain sufficient, appropriate evidence to provide a reasonable basis for our findings and conclusions based on our audit objectives. We believe that the evidence obtained provides a reasonable basis for our findings and conclusions based on our audit objectives. Background: VA pays monthly disability compensation to veterans with service-connected disabilities (i.e., injuries or diseases incurred or aggravated while on active military duty) according to the severity of the disability.[Footnote 7] VBA staff in 57 regional offices process disability compensation claims.[Footnote 8] These claims processors include Veterans Service Representatives (VSR) who gather evidence needed to determine entitlement, and Rating Veterans Service Representatives (RVSR) who decide entitlement and the rating percentage. Veterans may claim more than one medical condition, and a rating percentage is assigned for each claimed medical condition, as well as for the claim overall.[Footnote 9] In fiscal year 2013, VBA decided more than 1 million compensation claims. Since fiscal year 1999, VBA has used STAR to measure the decisional accuracy of disability compensation claims.
Through the STAR process, VBA reviews a stratified random sample of completed claims, and certified reviewers use a checklist to assess specific aspects of each claim.[Footnote 10] Specifically, for each of the 57 regional offices, completed claims are randomly sampled each month and the data are used to produce estimates of the accuracy of all completed claims. VA reports national estimates of accuracy from its STAR reviews to Congress and the public through its annual performance and accountability report and annual budget submission. VBA also produces regional office accuracy estimates, which it uses to manage the program. Regional office and national accuracy rates are reported in a publicly available performance database, the Aspire dashboard.[Footnote 11] Prior to October 2012, VBA's estimates of accuracy were claim-based; that is, claims free of errors that affect veterans' benefits were considered accurate and, conversely, claims with one or more errors that affect benefits were considered inaccurate.[Footnote 12] Beginning in October 2012, VBA also began using STAR data to produce issue-based estimates of accuracy that measure the accuracy of decisions on the individual medical conditions within each claim. For example, a veteran could submit one claim seeking disability compensation for five disabling medical conditions. If VBA made an incorrect decision on one of those conditions, the claim would be counted as 80 percent accurate under the new issue-based measure. By comparison, under the existing claim-based measure, the claim would be counted as 0 percent accurate unless the error did not affect benefits when considered in the context of the whole claim. In March 2014, VBA reported a national estimate of issue-based accuracy in its fiscal year 2015 annual budget submission and plans to update this estimate in VA's next performance and accountability report. VBA also produces issue-based estimates by regional office, and reports them in the Aspire dashboard. For fiscal year 2013, the regional office claim-based accuracy rates ranged from an estimated 78.4 to 96.8 percent, and the issue-based accuracy rates ranged from an estimated 87.0 to 98.7 percent. Beyond STAR, VBA has programs for conducting regional office quality reviews and for measuring the consistency of decisions. In March 2012, VBA established quality review teams (QRT) with one at each regional office. A QRT conducts individual quality reviews of claims processors' work for performance assessment purposes. The QRT also conducts in-process reviews before claims are finalized to help prevent inaccurate decisions by identifying specific types of common errors. Such reviews also serve as learning experiences for staff members. Since fiscal year 2008, VBA has also conducted studies to assess the consistency of disability claims decisions across regional offices. Initially, this initiative used inter-rater reliability (IRR) studies to assess the extent to which a cross-section of claims processors from all regional offices agree on an eligibility determination when reviewing the entire body of evidence from the same claim. In 2013, VBA revised its approach and began using questionnaires as its primary means for assessing consistency. A questionnaire includes a brief scenario on a specific medical condition for which claims processors must correctly answer several multiple-choice questions.
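To make the arithmetic behind the two accuracy measures concrete, the following minimal sketch (written in Python, using entirely hypothetical review results rather than VBA data or code) shows how the same set of reviewed claims scores under each measure; for simplicity, it treats every error as one that affects benefits, a distinction the actual claim-based measure takes into account.

# Hypothetical review results: each claim is a list of per-issue
# outcomes, where True means the issue was decided correctly.
claims = [
    [True, True, True, True, False],  # 5 issues, 1 error: 80% issue-based
    [True, True],                     # 2 issues, no errors
    [True, False, False],             # 3 issues, 2 errors
]

# Claim-based: a claim counts as accurate only if every issue is correct.
claim_based = sum(all(issues) for issues in claims) / len(claims)

# Issue-based: the share of all individual issues decided correctly.
issue_based = sum(sum(issues) for issues in claims) / sum(len(issues) for issues in claims)

print(f"Claim-based accuracy: {claim_based:.1%}")  # 33.3%
print(f"Issue-based accuracy: {issue_based:.1%}")  # 70.0%

As the sketch illustrates, the issue-based figure is generally the higher of the two because partially correct claims still contribute correctly decided issues.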
VBA's Approach to Measuring and Reporting Accuracy of Claim Decisions Has Limitations: VBA Does Not Follow Accepted Statistical Practices and Thus Generates Imprecise Accuracy Data: When calculating accuracy rates, VBA does not always follow generally accepted statistical practices. For example, VBA does not weight the results of its STAR reviews to reflect its approach to selecting claims by regional office, which can affect the accuracy of estimates.[Footnote 13] According to our analysis of VBA data, weighting would have resulted in a small change to VBA's nationwide claim-based accuracy rate for fiscal year 2013--from 89.5 to 89.1 percent.[Footnote 14] At the regional level, 29 of the 57 offices would have experienced a somewhat greater increase or decrease in their accuracy rates.[Footnote 15] Without weighting, reported regional office accuracy rates may be misleading and VBA management may focus corrective action or positive recognition on the wrong offices. For example, by taking weighting into account for the 57 regional offices in fiscal year 2013, the Reno regional office would have improved in relative accuracy by 12 places (from 34th to 22nd place), whereas the Los Angeles office would have declined in relative accuracy by 10 places (from 46th to 56th place) (see figure 1). Figure 1: Effect of Weighting on Regional Office Claim-Based Accuracy Rankings, Fiscal Year 2013: [Refer to PDF for image: horizontal bar graph] Regional Offices: Fort Harrison, MT; Rank: 1; Change in rank: None. Milwaukee, WI; Rank: 2; Change in rank: Moved up in rank 2. Togus, ME; Rank: 3; Change in rank: None. Lincoln, NE; Rank: 4; Change in rank: Moved down in rank 2. Sioux Falls, SD; Rank: 5; Change in rank: Moved up in rank 1. Boise, ID; Rank: 6; Change in rank: Moved up in rank 1. Columbia, SC; Rank: 7; Change in rank: Moved down in rank 2. Nashville, TN; Rank: 8; Change in rank: None. Saint Paul, MN; Rank: 9; Change in rank: Moved up in rank 1. Des Moines, IA; Rank: 10; Change in rank: Moved down in rank 1. Cheyenne, WY; Rank: 11; Change in rank: Moved up in rank 1. Portland, OR; Rank: 12; Change in rank: Moved down in rank 1. Fargo, ND; Rank: 13; Change in rank: Moved up in rank 2. St. Petersburg, FL; Rank: 14; Change in rank: Moved up in rank 2. Manila, Philippines; Rank: 15; Change in rank: Moved down in rank 1. Muskogee, OK; Rank: 16; Change in rank: Moved down in rank 3. Roanoke, VA; Rank: 17; Change in rank: Moved up in rank 4. Albuquerque, NM; Rank: 18; Change in rank: Moved down in rank 1. Wichita, KS; Rank: 19; Change in rank: Moved down in rank 1. New Orleans, LA; Rank: 20; Change in rank: Moved down in rank 1. Little Rock, AR; Rank: 21; Change in rank: Moved up in rank 1. Reno, NV; Rank: 22; Change in rank: Moved up in rank 12. Oakland, CA; Rank: 23; Change in rank: Moved down in rank 3. Hartford, CT; Rank: 24; Change in rank: Moved up in rank 11. Louisville, KY; Rank: 25; Change in rank: Moved down in rank 2. Cleveland, OH; Rank: 26; Change in rank: Moved down in rank 2. Manchester, NH; Rank: 27; Change in rank: Moved up in rank 4. Denver, CO; Rank: 28; Change in rank: Moved down in rank 1. Philadelphia, PA; Rank: 29; Change in rank: Moved down in rank 1. Chicago, IL; Rank: 30; Change in rank: Moved down in rank 5. Indianapolis, IN; Rank: 31; Change in rank: Moved down in rank 5. Buffalo, NY; Rank: 32; Change in rank: Moved down in rank 3. Salt Lake City, UT; Rank: 33; Change in rank: Moved up in rank 4. Phoenix, AZ; Rank: 34; Change in rank: Moved down in rank 1.
Pittsburgh, PA; Rank: 35; Change in rank: Moved up in rank 1. New York, NY; Rank: 36; Change in rank: Moved up in rank 2. Providence, RI; Rank: 37; Change in rank: Moved down in rank 5. Waco, TX; Rank: 38; Change in rank: Moved up in rank 1. Boston, MA; Rank: 39; Change in rank: Moved down in rank 9. Detroit, MI; Rank: 40; Change in rank: None. San Diego, CA; Rank: 41; Change in rank: Moved up in rank 3. Seattle, WA; Rank: 42; Change in rank: Moved up in rank 7. Montgomery, AL; Rank: 43; Change in rank: Moved down in rank 1. Houston, TX; Rank: 44; Change in rank: Moved down in rank 1. Honolulu, HI; Rank: 45; Change in rank: None. San Juan, PR; Rank: 46; Change in rank: Moved down in rank 5. Atlanta, GA; Rank: 47; Change in rank: Moved up in rank 1. White River Junction, VT; Rank: 48; Change in rank: Moved up in rank 3. Saint Louis, MO; Rank: 49; Change in rank: Moved up in rank 1. Anchorage, AK; Rank: 50; Change in rank: Moved up in rank 4. Winston-Salem, NC; Rank: 51; Change in rank: Moved down in rank 4. Wilmington, DE; Rank: 52; Change in rank: None. Newark, NJ; Rank: 53; Change in rank: Moved up in rank 3. Huntington, WV; Rank: 54; Change in rank: Moved up in rank 1. Jackson, MS; Rank: 55; Change in rank: Moved down in rank 2. Los Angeles, CA; Rank: 56; Change in rank: Moved down in rank 10. Baltimore, MD; Rank: 57; Change in rank: None. Source: GAO analysis of Systematic Technical Accuracy Review (STAR) data of the Veterans Benefits Administration. GAO-15-50. [End of figure] VBA also does not calculate the confidence intervals associated with the accuracy estimates that it generates, which prevents a complete understanding of trends over time and comparisons among offices.[Footnote 16] Accuracy estimates for different regional offices, or for the same office over time, are considered statistically different from each other when their confidence intervals do not overlap. As such, on the basis of our analysis, meaningful comparisons could be made between, for example, Fort Harrison's estimated claim-based accuracy rate (ranked #1) and New York's estimated claim-based accuracy rate (ranked #36) because their confidence intervals did not overlap in fiscal year 2013 (see figure 2). Conversely, comparisons between Fort Harrison's and Milwaukee's or Pittsburgh's estimated claim-based accuracy rates (ranked #2 and #35 respectively)--which had overlapping confidence intervals in fiscal year 2013--require a statistical test to determine if their differences are statistically meaningful.[Footnote 17] In effect, the claim-based accuracy rate of Fort Harrison and those of the regional offices with the next 34 highest reported accuracy rates may not be meaningfully different despite being ranked 1 through 35 of 57. Similarly, according to agency officials, VBA also does not calculate the confidence intervals associated with its newer issue-based accuracy estimates, which prevents meaningful comparisons between those estimates as well. Because VBA produces issue-based estimates using the same sample drawn to produce claim-based estimates, it would have to take extra steps to calculate the associated confidence intervals.[Footnote 18] As with the claim-based accuracy estimates, not computing the confidence intervals associated with issue-based estimates limits VBA's ability to monitor its regional offices' relative performance and its overall performance over time.
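As an illustration of the weighting and confidence interval calculations discussed above, the sketch below is a minimal example rather than VBA's methodology: the offices, workloads, and sample results are hypothetical, and refinements such as a finite population correction are omitted. It weights each office's sampled accuracy rate by the office's share of the national workload and computes a 95 percent confidence interval for the resulting national estimate.

import math

# (claims completed in the fiscal year, claims sampled, claims accurate)
offices = {
    "Office A": (40_000, 252, 225),
    "Office B": (5_000, 252, 240),
    "Office C": (12_000, 252, 230),
}

total_workload = sum(completed for completed, _, _ in offices.values())

weighted_rate = 0.0
variance = 0.0
for completed, sampled, accurate in offices.values():
    weight = completed / total_workload  # office's share of the workload
    p = accurate / sampled               # office's sampled accuracy rate
    weighted_rate += weight * p
    # Stratified-sample variance: sum of weighted within-office variances.
    variance += weight**2 * p * (1 - p) / sampled

half_width = 1.96 * math.sqrt(variance)  # 95 percent confidence level
print(f"Weighted national accuracy: {weighted_rate:.1%} +/- {half_width:.1%}")

Estimates computed this way can then be compared across offices or over time by checking whether their intervals overlap and, where they do, applying a formal significance test.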
Figure 2: Ranking of Weighted Estimates of Claim-Based Accuracy Rates with 95 Percent Confidence Intervals, by Regional Office, Fiscal Year 2013: [Refer to PDF for image: horizontal bar graph] National average FY 13 rate: 80%. Regional Offices: Fort Harrison, MT; Rank: 1; Accuracy rate: FY 13 rate: 93.4%; Lower bound: 2.4%; Upper bound: 2.0%. Estimates for these locations require statistical significance tests to determine whether they differ from Fort Harrison's: Milwaukee, WI; Rank: 2; Accuracy rate: FY 13 rate: 93.2%; Lower bound: 3.2%; Upper bound: 2.0%. Togus, ME; Rank: 3; Accuracy rate: FY 13 rate: 92.4%; Lower bound: 3.6%; Upper bound: 2.2%. Lincoln, NE; Rank: 4; Accuracy rate: FY 13 rate: 91.7%; Lower bound: 3.5%; Upper bound: 2.3%. Sioux Falls, SD; Rank: 5; Accuracy rate: FY 13 rate: 91.0%; Lower bound: 3.8%; Upper bound: 2.5%. Boise, ID; Rank: 6; Accuracy rate: FY 13 rate: 90.5%; Lower bound: 3.8%; Upper bound: 2.6%. Columbia, SC; Rank: 7; Accuracy rate: FY 13 rate: 90.4%; Lower bound: 3.8%; Upper bound: 2.6%. Nashville, TN; Rank: 8; Accuracy rate: FY 13 rate: 89.0%; Lower bound: 4.2%; Upper bound: 3.0%. Saint Paul, MN; Rank: 9; Accuracy rate: FY 13 rate: 88.2%; Lower bound: 4.5%; Upper bound: 3.2%. Des Moines, IA; Rank: 10; Accuracy rate: FY 13 rate: 88.5%; Lower bound: 4.1%; Upper bound: 3.0%. Cheyenne, WY; Rank: 11; Accuracy rate: FY 13 rate: 88.3%; Lower bound: 4.2%; Upper bound: 3.0%. Portland, OR; Rank: 12; Accuracy rate: FY 13 rate: 88.3%; Lower bound: 4.0%; Upper bound: 3.0%. Fargo, ND; Rank: 13; Accuracy rate: FY 13 rate: 87.7%; Lower bound: 4.3%; Upper bound: 3.2%. St. Petersburg, FL; Rank: 14; Accuracy rate: FY 13 rate: 86.7%; Lower bound: 5.0%; Upper bound: 3.6%. Manila, Philippines; Rank: 15; Accuracy rate: FY 13 rate: 87.4%; Lower bound: 4.3%; Upper bound: 3.2%. Muskogee, OK; Rank: 16; Accuracy rate: FY 13 rate: 87.4%; Lower bound: 4.2%; Upper bound: 3.1%. Roanoke, VA; Rank: 17; Accuracy rate: FY 13 rate: 87.2%; Lower bound: 4.3%; Upper bound: 4.3%. Albuquerque, NM; Rank: 18; Accuracy rate: FY 13 rate: 86.6%; Lower bound: 4.3%; Upper bound: 3.3%. Wichita, KS; Rank: 19; Accuracy rate: FY 13 rate: 86.0%; Lower bound: 4.9%; Upper bound: 3.6%. New Orleans, LA; Rank: 20; Accuracy rate: FY 13 rate: 85.1%; Lower bound: 5.9%; Upper bound: 4.1%. Little Rock, AR; Rank: 21; Accuracy rate: FY 13 rate: 86.4%; Lower bound: 4.3%; Upper bound: 3.3%. Reno, NV; Rank: 22; Accuracy rate: FY 13 rate: 86.1%; Lower bound: 4.6%; Upper bound: 3.5%. Oakland, CA; Rank: 23; Accuracy rate: FY 13 rate: 85.5%; Lower bound: 4.8%; Upper bound: 3.6%. Hartford, CT; Rank: 24; Accuracy rate: FY 13 rate: 86.0%; Lower bound: 4.3%; Upper bound: 3.3%. Louisville, KY; Rank: 25; Accuracy rate: FY 13 rate: 85.7%; Lower bound: 4.6%; Upper bound: 3.5%. Cleveland, OH; Rank: 26; Accuracy rate: FY 13 rate: 85.4%; Lower bound: 4.7%; Upper bound: 3.6%. Manchester, NH; Rank: 27; Accuracy rate: FY 13 rate: 86.0%; Lower bound: 4.3%; Upper bound: 3.3%. Denver, CO; Rank: 28; Accuracy rate: FY 13 rate: 85.7%; Lower bound: 4.6%; Upper bound: 3.5%. Philadelphia, PA; Rank: 29; Accuracy rate: FY 13 rate: 85.4%; Lower bound: 4.7%; Upper bound: 3.6%. Chicago, IL; Rank: 30; Accuracy rate: FY 13 rate: 84.8%; Lower bound: 5.3%; Upper bound: 3.9%. Indianapolis, IN; Rank: 31; Accuracy rate: FY 13 rate: 85.0%; Lower bound: 5.1%; Upper bound: 3.8%. Buffalo, NY; Rank: 32; Accuracy rate: FY 13 rate: 85.5%; Lower bound: 4.5%; Upper bound: 3.5%. Salt Lake City, UT; Rank: 33; Accuracy rate: FY 13 rate: 84.0%; Lower bound: 5.7%; Upper bound: 4.2%.
Phoenix, AZ; Rank: 34; Accuracy rate: FY 13 rate: 84.4%; Lower bound: 5.1%; Upper bound: 3.9%. Pittsburgh, PA; Rank: 35; Accuracy rate: FY 13 rate: 84.7%; Lower bound: 4.8%; Upper bound: 3.7%. Estimates for these locations are statistically different from Fort Harrison's: New York, NY; Rank: 36; Accuracy rate: FY 13 rate: 85.4%; Lower bound: 3.9%; Upper bound: 3.2%. Providence, RI; Rank: 37; Accuracy rate: FY 13 rate: 84.4%; Lower bound: 4.7%; Upper bound: 3.7%. Waco, TX; Rank: 38; Accuracy rate: FY 13 rate: 82.5%; Lower bound: 5.8%; Upper bound: 4.4%. Boston, MA; Rank: 39; Accuracy rate: FY 13 rate: 82.4%; Lower bound: 4.8%; Upper bound: 3.9%. Detroit, MI; Rank: 40; Accuracy rate: FY 13 rate: 82.6%; Lower bound: 4.7%; Upper bound: 3.8%. San Diego, CA; Rank: 41; Accuracy rate: FY 13 rate: 81.8%; Lower bound: 4.7%; Upper bound: 3.9%. Seattle, WA; Rank: 42; Accuracy rate: FY 13 rate: 81.3%; Lower bound: 4.9%; Upper bound: 4.0%. Montgomery, AL; Rank: 43; Accuracy rate: FY 13 rate: 81.3%; Lower bound: 4.9%; Upper bound: 4.0%. Houston, TX; Rank: 44; Accuracy rate: FY 13 rate: 79.8%; Lower bound: 5.9%; Upper bound: 4.7%. Honolulu, HI; Rank: 45; Accuracy rate: FY 13 rate: 80.5%; Lower bound: 5.2%; Upper bound: 4.2%. San Juan, PR; Rank: 46; Accuracy rate: FY 13 rate: 80.6%; Lower bound: 5.1%; Upper bound: 4.2%. Atlanta, GA; Rank: 47; Accuracy rate: FY 13 rate: 80.3%; Lower bound: 5.0%; Upper bound: 4.2%. White River Junction, VT; Rank: 48; Accuracy rate: FY 13 rate: 80.1%; Lower bound: 5.2%; Upper bound: 4.3%. Saint Louis, MO; Rank: 49; Accuracy rate: FY 13 rate: 79.6%; Lower bound: 5.6%; Upper bound: 4.6%. Anchorage, AK; Rank: 50; Accuracy rate: FY 13 rate: 79.9%; Lower bound: 5.3%; Upper bound: 4.4%. Winston-Salem, NC; Rank: 51; Accuracy rate: FY 13 rate: 78.3%; Lower bound: 5.3%; Upper bound: 4.4%. Wilmington, DE; Rank: 52; Accuracy rate: FY 13 rate: 78.2%; Lower bound: 5.3%; Upper bound: 4.5%. Newark, NJ; Rank: 53; Accuracy rate: FY 13 rate: 77.2%; Lower bound: 5.6%; Upper bound: 4.8%. Huntington, WV; Rank: 54; Accuracy rate: FY 13 rate: 77.2%; Lower bound: 5.4%; Upper bound: 4.6%. Jackson, MS; Rank: 55; Accuracy rate: FY 13 rate: 75.6%; Lower bound: 6.6%; Upper bound: 5.4%. Los Angeles, CA; Rank: 56; Accuracy rate: FY 13 rate: 70.0%; Lower bound: 8.5%; Upper bound: 7.0%. Baltimore, MD; Rank: 57; Accuracy rate: FY 13 rate: 88.4%; Lower bound: 0.7%; Upper bound: 0.7%. Source: GAO analysis of Systematic Technical Accuracy Review (STAR) data of the Veterans Benefits Administration. GAO-15-50. Note: STAR accuracy estimates are derived from sample data and have sampling error associated with them. The confidence interval is a range of values around the estimate, which is likely to include the actual population value. [End of figure] VBA's approach to measuring accuracy is also inefficient because it reviews more claims than are statistically required to estimate accuracy. VBA randomly selects about 21 claims per month from each of its regional offices for STAR review, regardless of the offices' varying workloads and historical accuracy rates. According to VBA, this uniform approach allows the agency to achieve a desired level of precision in its accuracy estimates for each regional office.[Footnote 19] However, accepted statistical practices would allow for fewer cases to be reviewed at regional offices where the number of claims processed has been relatively small or accuracy has been high.
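The standard calculation behind this point is sketched below; it is a minimal illustration with assumed inputs, and the 5 percentage point margin of error and the example offices are hypothetical rather than VBA's actual precision target.

import math

def required_sample(workload, expected_accuracy, margin=0.05, z=1.96):
    """Sample size for estimating a proportion at 95 percent confidence."""
    p = expected_accuracy
    n0 = (z**2) * p * (1 - p) / margin**2  # infinite-population sample size
    # Finite population correction: offices with small workloads need
    # fewer reviews to reach the same precision.
    return math.ceil(n0 / (1 + (n0 - 1) / workload))

# Both examples need fewer than the roughly 252 claims per office that a
# uniform 21-claims-per-month sample yields over a year:
print(required_sample(workload=40_000, expected_accuracy=0.90))  # 138
print(required_sample(workload=2_000, expected_accuracy=0.95))   # 71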
According to our analysis of fiscal year 2013 regional office workload and accuracy results, VBA could reduce the overall number of claims it reviews annually by about 39 percent (over 5,000 claims) and still achieve its desired precision for its regional office accuracy estimates.[Footnote 20] More efficient sampling could allow VBA to select fewer cases for review and free up limited resources for other important quality assurance activities, such as additional targeted accuracy reviews on specific types of error-prone or complex claims. Specifically, reviewing about 5,000 fewer claims could free up about 1,000 staff days because, according to VBA officials, STAR staff review at least 5 claims per day. Calculating weighted estimates and confidence intervals, and adjusting sampling according to shifting workloads and accuracy rates, requires the use of statistical methods. According to VBA officials we interviewed, although STAR management used a statistician to help develop the way in which they measure accuracy, it currently does not use a statistician to, for example, weight STAR results and calculate confidence intervals for accuracy estimates. Further, VBA officials said they did not consult a statistician when developing the new issue-based accuracy measure, but rather relied on the same sampling methodology and approach for estimating accuracy as for the claim-based measure. We have previously reported that to be useful, performance information must meet users' needs for completeness, accuracy, consistency, and validity, among other factors.[Footnote 21] In response to our draft July 2014 testimony based on preliminary work, VBA officials stated they are exploring alternatives to their current methodology for estimating accuracy.[Footnote 22] Beyond not following generally accepted statistical practices, VBA's STAR review systematically excludes certain claims, which may inflate accuracy rate estimates. Specifically, according to VBA officials, when a claim moves from one regional office to another, because a veteran has moved or workloads are redistributed, the database VBA uses to select claims for STAR review does not always reflect the office responsible for making the final determination for the claim.[Footnote 23] As a result, STAR staff often select for review, then subsequently de-select, claims that have changed regional office jurisdiction.[Footnote 24] Of the 14,286 rating claims initially selected at random by VBA for review in fiscal year 2013, about 10 percent were de-selected because of a change in jurisdiction and replaced with other randomly selected claims. Those de-selected claims are not eligible for STAR review for the regional office that was ultimately responsible for the claim, thereby causing an underrepresentation of these claims in the STAR sample. Such underrepresentation may inflate VBA's reported accuracy rate because redistributed claims have historically had lower accuracy rates than non-redistributed claims.[Footnote 25] In responding to our draft report, VBA indicated it is revising its procedures to ensure that claims selected for STAR review are included in the accuracy rate of the responsible regional office regardless of whether a change of jurisdiction occurred. VBA Does Not Report Key Information to Help Users Understand Accuracy Metrics: Federal agencies should report clear performance information to the Congress and the public to ensure that the information is useful for decision making.
In prior work, we identified clarity as a key attribute of a successful performance measure, meaning that the measure is clearly stated and its associated methodology is identified.[Footnote 26] Measures that lack clarity may confuse or mislead users, and fail to provide a good picture of how well the agency is performing. We have also reported on best practices in implementing related federal performance reporting requirements, such as those in the GPRA Modernization Act of 2010.[Footnote 27] Specifically, agencies must disclose information about the accuracy and validity of their performance information in their performance plans, including the sources for their data and actions to address any limitations.[Footnote 28] VBA's accuracy reporting lacks methodological details that would help users understand the distinction between its two accuracy measures and their associated limitations. While VBA's new issue-based measure provides some additional perspective on the quality of claim decisions to date, VBA has not fully explained in its public reports how the issue-based and claim-based measures differ. For example, the issue-based measure tends to be higher than the claim-based measure because the former allows for claims to be considered partially correct, whereas the claim-based measure does not. According to VBA officials, the issue-based estimate provides a better measure of quality because veterans' claims have increasingly included multiple medical issues.[Footnote 29] Our analysis of STAR data confirms that as the number of issues per claim increases, the chance of at least one issue being decided incorrectly within a single claim increases because there are more opportunities for error (see figure 3). However, VA did not report in its fiscal year 2015 budget request how these measures are calculated and why the issue-based measure might be higher than the claim-based measure. VA has also not reported these distinctions in its Aspire dashboard.[Footnote 30] Figure 3: Claim-Based and Issue-Based Accuracy Rates with 95 Percent Confidence Intervals by Number of Issues Claimed, Fiscal Year 2013: [Refer to PDF for image: horizontal bar graph] Issues per claim: 1; Claim-based; Accuracy rate: 96.6%; Estimated lower bound: 1%; Estimated upper bound: 0.7%. Issues per claim: 1; Issue-based; Accuracy rate: 96.5%; Estimated lower bound: 0.8%; Estimated upper bound: 0.8%. Issues per claim: 2 to 3; Claim-based; Accuracy rate: 92.9%; Estimated lower bound: 1%; Estimated upper bound: 1%. Issues per claim: 2 to 3; Issue-based; Accuracy rate: 95.9%; Estimated lower bound: 0.5%; Estimated upper bound: 0.5%. Issues per claim: 4 to 6; Claim-based; Accuracy rate: 86.6%; Estimated lower bound: 1.5%; Estimated upper bound: 1.5%. Issues per claim: 4 to 6; Issue-based; Accuracy rate: 95.5%; Estimated lower bound: 0.5%; Estimated upper bound: 0.5%. Issues per claim: 7 to 10; Claim-based; Accuracy rate: 80.7%; Estimated lower bound: 2.2%; Estimated upper bound: 2.2%. Issues per claim: 7 to 10; Issue-based; Accuracy rate: 95.5%; Estimated lower bound: 0.6%; Estimated upper bound: 0.6%. Issues per claim: 11 or more; Claim-based; Accuracy rate: 73.2%; Estimated lower bound: 2.7%; Estimated upper bound: 2.7%. Issues per claim: 11 or more; Issue-based; Accuracy rate: 95.2%; Estimated lower bound: 0.7%; Estimated upper bound: 0.7%. Source: GAO analysis of Systematic Technical Accuracy Review (STAR) data of the Veterans Benefits Administration. GAO-15-50.
[End of figure] VBA also counts claims processing errors differently under its claim-based measure than it does under its issue-based measure but does not report these distinctions, which raises questions about the transparency and consistency of VBA's accuracy measures. For both measures, VBA differentiates between benefit entitlement errors that may financially affect the veteran and other errors, such as documentation and administrative errors that do not financially affect the veteran. For claim-based accuracy, VBA counts errors that financially affect the veteran now, but does not count errors that may financially affect the veteran in the future, although it works to correct both types of errors. For example, if one of several claimed medical conditions was rated incorrectly (e.g., 10 percent instead of 20 percent), but this error did not immediately affect the overall rating of the claim, VBA would not consider the claim in error because it did not affect the benefits that the veteran would receive.[Footnote 31] For the issue-based accuracy measure, however, VBA would count this as an error even if the error did not immediately affect the veteran's benefits. Unlike claim-based accuracy, issue-based accuracy may also include errors that would never affect future payments. For example, an incorrect effective date that is within the same month as the correct effective date does not affect benefits, but is counted as an error in VBA's issue-based accuracy measure. Conversely, according to VBA officials, this is not counted as an error in its claim-based measure. According to our analysis of STAR data, up to 6.9 percent of reviewed claims in fiscal year 2013 had these types of errors (i.e., benefit entitlement errors that do not immediately and may never affect benefits), and if they were all counted as errors, VBA's unweighted claim-based accuracy rate would have decreased by about 2 percent.[Footnote 32] Further, VA has not explained in public reports that its accuracy measures are estimates that have distinct confidence intervals and limitations. Users should be aware of these confidence intervals to make meaningful comparisons, for example, between the two measures or over time for the same measure. In terms of each accuracy measure's limitations, the claim-based measure does not provide a sense of the proportion of issues that the agency decides correctly because the measure counts an entire claim as incorrect if any error is found. On the other hand, the issue-based measure does not provide a sense of the proportion of claims that the agency decides with no errors. VBA Has Enhanced and Coordinated Its Quality Assurance Activities, Though Gaps in Implementation May Limit Their Effectiveness: VBA Has Taken Steps to Enhance and Coordinate Key Quality Assurance Activities: In addition to its STAR reviews, VBA's quality assurance framework includes other complementary activities, which have been enhanced to help meet its goal of 98 percent accuracy in fiscal year 2015. Specifically, VBA (1) established quality review teams (QRT) in March 2012 in regional offices as a means of strengthening its focus on quality where claims are processed, and (2) enhanced efforts to assess the consistency of decisions.
Although regional offices were previously responsible for assessing individual performance, QRTs represent a departure from the past because QRT personnel are dedicated primarily to performing these and other local quality reviews.[Footnote 33] In addition, VBA requires QRT staff to pass a skills certification test annually--similar to VBA requirements for STAR staff and in contrast to requirements for claims processors who must pass a test every 2 years. In July 2013, VBA issued national guidance to ensure consistent QRT roles and practices across regional offices. For example, it included guidance on selecting individual quality review claim samples and conducting additional reviews for claims processors who do not meet their accuracy goals.[Footnote 34] In addition to conducting individual quality reviews, QRT personnel are charged with conducting in-process reviews of claims that are not yet finalized, looking for specific types of common errors. Quality reviewers are also responsible for providing feedback to claims processors on the results of their quality reviews, typically as reviews are completed, including formal feedback from the results of individual quality reviews and more informal feedback from the results of in-process reviews. In addition, at the four offices we contacted, quality reviewers are available to answer questions and provide guidance to claims processors as needed. VBA's efforts to assess consistency of claims decisions have also expanded in recent years. Up until 2013, VBA largely relied on inter-rater reliability (IRR) studies to assess consistency, which have been time-consuming and resource-intensive. Claims processors typically required about 4 hours to review an entire claim. The process was administered by proctors in the regional offices and the results were hand-graded by national VBA staff. Given the resources involved, IRR studies have typically been limited to 300-500 claims processors (about 25-30 percent), randomly selected from the regional offices. In 2009, VBA expanded its consistency program to include questionnaires, which it now relies on more heavily to assess consistency. The more streamlined consistency questionnaires require less staff time to complete because, in addition to a brief scenario on a specific condition, participants have 10 or fewer multiple-choice questions to answer. The questionnaires are administered electronically through the VA Talent Management System, removing the need to proctor or hand-grade the tests, which has allowed VBA to significantly increase employee participation. A recent consistency questionnaire was taken by about 3,000 claims processing employees--representing all employees responsible for rating claims. Further, VBA now administers these studies more frequently--from about 3 per year to 24 per year. According to VBA officials, they plan to further expand the use of consistency studies from two questionnaires per month to six to eight per month, pending approval of additional quality assurance staff.[Footnote 35] VBA also has taken steps to coordinate its quality assurance efforts in several ways, such as systematically disseminating information on national accuracy and consistency results and trends to regional office management and QRTs, which in turn share this information with claims processing staff.
With respect to STAR, in addition to receiving monthly updates on overall accuracy performance, regional offices receive quarterly reports with analyses of accuracy performance, including information by error type. QRT reviewers also participate in monthly conference calls with STAR staff during which they discuss error trend information. While claims processing staff learn about errors they made on claims directly from STAR, managers or QRT members at each of the regional offices we contacted noted that they also share STAR trend data with claims processors during periodic training focused on STAR error trends. With respect to consistency studies, regional offices receive national results; regional office-specific results; and, since February 2014, individual staff results. Officials at each of the four regional offices we visited told us QRT staff share the results of consistency studies with staff and inform claims processors of the correct answers to the questions. Coordination also occurs when QRT personnel disseminate guidance and support regional office training based on error trends identified through STAR and other quality assurance activities. Two of the four offices we contacted cited instances where they have used consistency study results for training purposes. At one office, the results from a consistency study were used to provide training on when to request an exam for certain conditions, such as tinnitus. In general, at each of the four offices, officials told us that QRT reviewers conduct, or work with regional office training coordinators to conduct, periodic training forums for claims processors. Regional offices we contacted also supplement training with other communications informed by quality review results. For example, QRTs at three of the four regional offices we contacted produce periodic newsletters for regional office claims processors, which include guidance based on errors found in all types of reviews. Specifically, at one office, a newsletter was used to disseminate guidance on ensuring that a rating decision addresses all issues in a claim. The need for this guidance was identified on the basis of STAR and local quality review results. Lastly, VBA coordinates its quality assurance activities by using STAR results to guide other quality assurance efforts. According to VBA officials, the agency has used STAR data to identify error trends associated with specific medical issues, which in turn were used to target efforts to assess consistency of decision-making related to those issues. Recent examples are (1) the August 2013 IRR study, which examined rating percentages and effective dates assigned for diabetes mellitus (including peripheral neuropathy); and (2) a February 2014 study on obtaining correct disability evaluations on certain musculoskeletal and respiratory conditions. In addition, according to VBA, the focus of in-process reviews performed by QRTs has been guided by STAR error trend data. VBA established in-process reviews in March 2012 to help the QRTs identify and prevent claim development errors related to medical examinations and opinions, which it described as the most common error type. More recently, VBA has added two more common error types--incorrect rating percentages and incorrect effective benefit dates--to its in-process review efforts. VBA officials stated that they may add other common error types based on future STAR error analyses.
Some Gaps in Implementation Persist and the Effectiveness of Quality Assurance Activities Is Unclear: While QRTs reflect VBA's increased focus on quality, during our site visits we identified shortcomings in QRT practices and implementation that could reduce their effectiveness. Specifically, we identified the following shortcomings: (1) the exclusion of claims processed during overtime from reviews used to assess individual performance; (2) the inability, in certain situations, to correct identified errors before a claim is finalized; and (3) a lack of pre-testing of consistency questionnaires. Regarding the first shortcoming, we learned that three of the four offices we contacted had agreements with their local unions that prevented QRT personnel from reviewing claims processed during overtime to assess individual performance.[Footnote 36] As a result, those regional offices were limited in their ability to address issues with the quality of work performed during overtime. Centrally, VBA officials did not know which or how many regional offices excluded claims processed during overtime, or the extent to which such exclusions occurred nationally. According to VBA data, claims processed on overtime represented about 10 percent of rating-related claims completed nationally in fiscal year 2013. After we reported this finding,[Footnote 37] VBA issued guidance in August 2014 to regional offices stipulating that claims processed on overtime be included in reviews and that regional offices work with their local unions to rescind any agreements that exclude such claims from review. Second, officials at the four regional offices we contacted told us that they face a challenge in conducting individual quality and in-process reviews as expected[Footnote 38] because VBA's Veterans Benefits Management System lacks the capability to briefly pause the process and prevent claims from being completed while a review is still underway.[Footnote 39] Based on anecdotal information from regional offices, VBA officials acknowledged that this was a problem for regional offices in completing reviews, but they did not have information on the extent to which it occurred. VBA officials noted that reviews could be performed after a claim is completed; however, if an error is found, the regional office might need to rework the claim and provide the veteran with a revised decision. The officials also noted that VBA is working toward modifying its Veterans Benefits Management System to address this issue, but is at the initial planning stage of gathering requirements and could not provide a time frame for completion. Third, although VBA has developed a more streamlined approach to measuring consistency, VBA officials told us that consistency questionnaires were developed and implemented without any pre-testing, which would have helped the agency determine whether the test questions were appropriate for field staff and were accurately measuring consistency. Pre-testing is a generally accepted practice in sound questionnaire development for examining the clarity of questions or the validity of the questionnaire results. In the course of our review, VBA quality assurance officials noted that they plan to begin pre-testing consistency questionnaires as a part of a new development process. Specifically, after each questionnaire has been developed, two to three quality assurance staff who have claims processing experience, but were not involved in the questionnaire's development, would be asked to pre-test it.
Quality assurance staff responsible for the consistency studies would then adjust the questionnaire if necessary before it is administered widely. While pre-testing was initially slated to occur in July 2014, VBA quality assurance staff now anticipate that it will begin in September 2014. Beyond these implementation shortcomings, staff in each of the four offices we contacted said that several key supports were not sufficiently updated to help quality review staff and claims processors do their jobs efficiently and effectively. Staff at these offices consistently described persistent problems with central guidance, training, and data systems. * Guidance: Federal internal control standards highlight the need for pertinent information to be captured and distributed in a form that allows people to perform their duties efficiently.[Footnote 40] However, regional office quality review staff said they face challenges locating the most current guidance among all of the information they are provided. Managers or staff at each of the regional offices we contacted said that VBA's policy manuals are outdated. As a result, staff must search numerous sources of guidance to locate current policy, which is time-consuming and difficult. This, in turn, could affect the accuracy with which they decide claims. One office established a spreadsheet to consolidate guidance because the sources were not readily available to claims processors. VBA officials acknowledged that there are several ways the agency provides guidance to regional offices. In addition to relevant regulations and its policy and procedures manual, VBA provides guidance to claims processors through policy and procedures letters, monthly quality calls and notes from these calls, various bulletins, and training letters and other materials maintained on VBA's intranet site. While agreeing that having multiple sources of guidance could be confusing to staff, VBA officials noted they face challenges in updating the policy manual and other available guidance materials to ensure that they are as current as possible. After we reported on this issue,[Footnote 41] VBA officials noted that they are considering streamlining the types of guidance provided. They also plan to develop a system of consolidated links to guidance documents by alphabetized topic to help claims processors access the information more efficiently; however, VBA officials acknowledge that developing a single repository will be a challenging project and have not yet dedicated adequate resources to this effort. * Training: Staff in the offices we contacted also said that in some cases national training has not been updated to reflect the most current guidance, which in turn makes it difficult to provide claims processors with the information they need to avoid future errors. For example, staff from one regional office noted that training modules on an error-prone issue--Individual Unemployability and related effective dates of benefits--had not been updated to reflect all new guidance, the sources of which included conference calls, guidance letters, and frequently asked questions compiled by VBA's central office.[Footnote 42] Further, officials at regional offices we contacted expressed concern that VBA limits their flexibility to update out-of-date course materials. In response to these concerns, VBA training officials explained that they are continually updating national training to reflect new guidance, but how long it takes is a function of the extent of the policy change.
These officials noted that updating the Individual Unemployability training was particularly delayed because of numerous, unanticipated changes in policy and related guidance that resulted in their setting aside previously updated course materials and starting over. VBA training officials also explained that while VBA does not allow changes to the contents of courses in its catalog, regional offices can propose courses for the catalog, based on their needs identified through quality reviews. * Data systems: Regional office quality review staff also told us that they are required to log errors into three systems or databases that do not "speak to one another," two of which lack the capability to fully track error trends, thereby limiting staff's ability to take corrective actions. At the regional office level, quality assurance information is entered into three different databases or systems.[Footnote 43] Staff at each of the four offices we contacted said that the Automated Standardized Performance Elements Nationwide system used for tracking individual accuracy for performance management purposes lacks functionality to create reports on error trends by claimed medical issue or reasons for specific types of errors. As a result, three offices maintain separate spreadsheets to identify error trends related to individual accuracy. Regional office staff also noted that one of the two systems used to track in-process reviews does not help track error trends, for example, by employee, resulting in two offices maintaining additional spreadsheets to track this information. At the national level, VBA central office has made some improvements in reporting and now has the ability to analyze regional office information on errors by medical issue. According to VBA officials, they share this information with regional office managers and quality staff during training calls. VBA officials stated that a planned replacement for its Automated Standardized Performance Elements Nationwide system would have addressed reporting limitations at the local level, but the effort was halted. As of September 2014, VBA did not have a time frame for restarting the process for acquiring a new system. Finally, VBA's efforts to evaluate the effectiveness of its quality assurance activities have been limited. Specifically, VBA officials told us that although they have not seen an increase in the national accuracy rate in fiscal year 2014, the number of errors related to claim development has declined, demonstrating the success of QRT reviews and training in targeting these errors.[Footnote 44] Also, VBA identified 13 regional offices whose issue-based accuracy rates improved between the first and third quarters of fiscal year 2014, attributing these improvements to actions taken by quality assurance staff in fiscal year 2014.[Footnote 45] However, it was not clear from the documentation VBA provided whether and how it monitored the effectiveness of these actions for all regional offices. With respect to consistency studies, VBA also has not evaluated--and lacks plans to evaluate--the efficacy of using consistency questionnaires relative to the more resource-intensive IRR studies. According to a VBA official, the percentage of incorrect answers on consistency questionnaires has helped identify regional offices and individuals in need of further training, as well as the need for national training.
However, officials could not provide data or evaluations indicating that consistency questionnaires have improved accuracy rates in the areas studied. VBA officials noted that they are considering a new data system that would combine all local and national quality assurance data--including STAR, in-process reviews, and individual quality reviews--and allow for more robust analyses of the root causes of errors. Specifically, they expect that the system will show relationships across the results of various quality assurance reviews to determine employee competence in specific aspects of claims processing. According to VBA officials, this system would also enable them to more easily evaluate the effectiveness of specific quality assurance efforts. Evaluation can help to determine the "value added" of the expenditure of federal resources or to learn how to improve performance--or both. It can also play a key role in strategic planning and in program management, informing both program design and execution.[Footnote 46] Continuous monitoring also helps to ensure that progress is sustained over time.[Footnote 47] However, VBA officials indicated that this proposal is still in the conceptual phase and requires final approval for funding and resources. Conclusions: VBA's dual approach for measuring accuracy is designed to provide additional information to better target quality improvement efforts, but its methods and practices lack rigor and transparency, thereby undermining the usefulness and credibility of its measures. By not leveraging a statistician's expertise or otherwise following generally accepted statistical practices in developing accuracy estimates, VBA is producing and relying on inaccurate estimates to make important internal management decisions. Similarly, by using a one-size-fits-all sampling methodology, VBA is unnecessarily expending limited resources that could be used elsewhere. The systematic exclusion of redistributed claims and those moved between offices further calls into question the rigor of its accuracy estimates. Lastly, VBA's reporting of its two accuracy metrics lacks sufficient transparency to help members of Congress and other stakeholders fully understand the differences and limitations of each, and thus may undermine their trust in VBA's reported performance. VBA has enhanced and coordinated other aspects of its quality assurance framework, but shortcomings in implementation and evaluation detract from their overall effectiveness. For example, although VBA is disseminating the results of national STAR reviews and consistency studies, and local QRTs are using those results to focus related training or guidance to claims processing staff, until centralized guidance is consolidated and streamlined, staff lack ready access to information that will help them prevent errors. Moreover, absent adequate system capabilities to support local quality reviews, QRTs are unable to stop incorrect decisions from being finalized, and may not be aware of error trends that could be mitigated through training or other corrective action. Finally, although some of its quality assurance activities are relatively new, VBA lacks specific plans to evaluate their effectiveness and may miss opportunities to further improve or target these activities to more error-prone areas.
In general, unless VBA takes steps to improve the rigor of all its quality assurance methods and practices, VBA may find progress toward achieving its goal of 98 percent accuracy in fiscal year 2015 elusive--especially in the face of challenging workloads, limited resources, and expectations of timely claim decisions. Recommendations for Executive Action: To help improve the quality of VBA's disability compensation claim decisions, we recommend that the Secretary of Veterans Affairs direct the Under Secretary for Benefits to: * Leverage appropriate expertise to help VBA do each of the following: - weight its accuracy estimates to reflect the sample design for reviewed claims; - determine and report the confidence intervals associated with its reported accuracy estimates; and - re-examine its approach to calculating the regional office sample size for STAR. * Take steps to ensure that redistributed claims and those moved between regional offices are not underrepresented in the STAR sample. * Increase transparency in explaining how the claim-based and issue-based accuracy rates are calculated as well as their key limitations when publicly reporting these metrics. * Review the multiple sources of policy guidance VBA provides to determine ways to consolidate them or otherwise improve their availability and accessibility for use by staff in regional offices. * Take steps to ensure that any future upgrades to local data systems allow QRTs to pause the claims process when errors are detected and enable QRTs to better track error trends. * Take additional steps to evaluate the effectiveness of quality assurance activities to identify opportunities to improve or better target these activities. Agency Comments and Our Evaluation: We provided a draft of this report to VA for review and comment, and its written comments are reproduced as appendix III in this report. VA generally agreed with our conclusions and concurred with all of our recommendations. The agency outlined how it plans to address our recommendations as follows: * Regarding our recommendations to leverage appropriate expertise to improve its measurement and reporting of accuracy, VA stated that a VBA statistician has begun developing a revised sampling methodology that takes into consideration output and claims processing accuracy at each regional office to determine sample sizes. VBA also plans to appropriately weight accuracy estimates and calculate the margins of error based on the revised sampling methodology. VBA intends to report results based on this new methodology beginning in March 2015. * Regarding our recommendation to take steps to ensure that redistributed claims and those moved between regional offices are not underrepresented in the STAR sample, VA stated that VBA's revised sampling methodology will be based on the office completing the claim, and that no claims will be excluded from samples due to changes in jurisdiction. VBA intends to implement this revised sampling methodology by the end of March 2015. * Regarding our recommendation to increase transparency in explaining how the claim-based and issue-based accuracy rates are calculated, VA stated that VBA will describe its sampling, assessment criteria, calculation, and reporting methodologies for claim and issue-level accuracy as part of future performance documents and public reports. VBA anticipates implementing this recommendation by the end of March 2015.
* Regarding our recommendation to review the multiple sources of policy guidance VBA provides to regional office staff, VA stated that in September 2014, VBA began improving the availability and accessibility of policy guidance, as well as consolidating references to this guidance. VBA anticipates completing this project by the end of April 2015. * Regarding our recommendation to take steps to ensure that any future upgrades to local data systems allow QRTs to pause the claims process when errors are detected and enable QRTs to better track error trends, VA stated that VBA is designing a new database that will incorporate all types of quality reviews (i.e., regional office reviews, STAR, and consistency studies) and provide VBA with more data analysis capabilities. Although VA did not outline specific steps VBA plans to take to upgrade local data systems so that QRTs may pause the claims process, VBA plans to implement this recommendation by the end of June 2015. * Regarding our recommendation to take additional steps to evaluate the effectiveness of quality assurance activities to identify opportunities to improve or better target these activities, VA stated that VBA's new database will enable VBA to do so by the end of June 2015. VA also provided technical comments, which we incorporated as appropriate. We are sending copies of this report to the appropriate congressional committees and the Secretary of Veterans Affairs. In addition, the report is available at no charge on the GAO website at [hyperlink, http://www.gao.gov]. If you or your staff have any questions about this report, please contact me at (202) 512-7215 or bertonid@gao.gov. Contact points for our Offices of Congressional Relations and Public Affairs may be found on the last page of this report. GAO staff who made key contributions to this report are listed in appendix IV. Sincerely yours, Signed by: Daniel Bertoni: Director, Education, Workforce, and Income Security Issues: [End of section] Appendix I: Objectives, Scope and Methodology: The objectives of this report were to examine (1) the extent to which the Veterans Benefits Administration (VBA) effectively measures and reports the accuracy of compensation claim decision-making, and (2) whether VBA's other quality assurance activities are coordinated and effective. Review of Systematic Technical Accuracy Review (STAR): To assess VBA's measurement and reporting of the accuracy of compensation claim decision-making, we focused on the STAR process for reviewing disability compensation claims that VBA identifies as rating- related--that is, requiring a decision on the claimant's eligibility for benefits and the monthly benefit amount. We did not review quality assurance over disability compensation claims that did not involve a rating, including adjustments for additional dependents. We also did not review quality assurance efforts involving appealed cases, aspects of which fall under the Board of Veterans' Appeals. Finally, we did not review pension claims, which represent a small portion of VBA's disability benefits workload, because VBA is reviewing its approach to the accuracy assessment of pension claims. [Footnote 48] To determine the extent to which STAR appropriately reflects the accuracy of claims, we reviewed VBA policy manuals, the STAR checklist, and other tools used in VBA's STAR review. 
We interviewed VBA and Office of Inspector General (OIG) officials to learn whether there are claim types that are omitted from STAR review and, if so, the reasons for these omissions. To determine how errors are identified and counted under STAR, we examined the ways in which the checklist and other STAR procedures are used to quantify errors. We visited VBA's office in Nashville, Tennessee, where the STAR reviews are conducted, to observe the review process and program methodology in action. We reviewed checklists used to assess the accuracy of claims and identified the information VBA uses from these checklists to calculate accuracy rates. To assess the extent to which VBA uses generally accepted statistical practices to generate accuracy rates, we analyzed VBA data on claims processed and reviewed from October 2012 through September 2013. In analyzing STAR data, we calculated the weighted claim-based annual accuracy rate for each regional office and nationwide. We then calculated the 95 percent confidence intervals associated with these estimated accuracy rates. We applied a statistical sample size formula suitable for use in a stratified random sample and analyzed the differences this approach produced compared with VBA's sample size estimation methodology for regional offices. We assessed the reliability of VBA's STAR data by performing electronic data testing, reviewing related documentation, and interviewing knowledgeable agency officials. We also assessed the reliability of VBA's claim processing data by interviewing knowledgeable agency officials about the data. To electronically assess the reliability of the STAR data, we tested for duplicate benefit records, tested the claim disposition date field to ensure that we analyzed only STAR claims from fiscal year 2013, checked the benefit claim end product code to ensure that we included only benefit claims with end product codes eligible for inclusion in the STAR accuracy sample, checked for missing data in key analysis variables, and examined the range of values in key variables to check for outliers. We determined that the data were sufficiently reliable for our purposes.
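To illustrate the nature of these electronic tests, a minimal sketch of how such checks might be scripted is shown below. This example is illustrative only and is not the code we used: the file name, the column names (claim_id, disposition_date, and end_product_code), and the eligible end product codes are hypothetical stand-ins rather than the actual fields and values in VBA's STAR extract.

import pandas as pd

# Illustrative extract of STAR review records; the file and column
# names are hypothetical stand-ins, not VBA's actual schema.
star = pd.read_csv("star_fy2013.csv",
                   parse_dates=["disposition_date"],
                   dtype={"end_product_code": str})

# Test for duplicate benefit records.
duplicates = star[star.duplicated(subset="claim_id", keep=False)]

# Test the claim disposition date field to confirm that only fiscal
# year 2013 claims (October 1, 2012, through September 30, 2013) are
# analyzed.
in_fy2013 = star["disposition_date"].between("2012-10-01", "2013-09-30")

# Check the benefit claim end product code against the set of codes
# eligible for the STAR accuracy sample (placeholder values).
eligible = star["end_product_code"].isin({"010", "020", "110"})

# Check for missing data in key analysis variables.
key_vars = ["claim_id", "disposition_date", "end_product_code"]
missing_counts = star[key_vars].isnull().sum()

# Examine the range of values in key variables to check for outliers.
date_range = star["disposition_date"].agg(["min", "max"])

print(len(duplicates), "duplicate records")
print((~in_fy2013).sum(), "records outside fiscal year 2013")
print((~eligible).sum(), "records with ineligible end product codes")
print(missing_counts)
print(date_range)

[End of code example]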
To assess how VBA reports accuracy, we identified and reviewed relevant VBA performance reports, such as VA's Performance and Accountability Report and Aspire dashboard data. We also interviewed VBA officials about the rationale for creating the issue-based accuracy measure, and the agency's plans for reporting its performance on accuracy and consistency. We compared VBA practices with legal requirements for agency performance reporting, such as the GPRA Modernization Act of 2010, and related GAO work (e.g., GAO, Managing For Results: GPRA Modernization Act Implementation Provides Important Opportunities to Address Government Challenges, GAO-11-617T (Washington, D.C.: May 10, 2011)). Coordination and Effectiveness of Quality Assurance Activities: To determine whether VBA's quality assurance activities are coordinated and effective, we reviewed VBA quality assurance policies, reports, and guidance to identify key quality assurance activities. Based on this review, we focused on quality review teams (QRT), which are located in each regional office and responsible for local quality assurance, as well as on VBA's consistency program, which is administered by VBA's centralized quality assurance staff. We then examined each activity's function and process by reviewing relevant guidance and policy documents and interviewing central office officials. Specifically: * We reviewed VBA policy and procedure documents for quality review teams (QRT) to learn the purposes of, and the information generated by, these efforts. In addition, we interviewed VBA central office and regional office officials to gather their perspectives on any redundancy or gaps between quality assurance efforts. We compared the functions of and information yielded by quality assurance components with the framework laid out in VBA's Quality Assurance Program Plan, as well as standards for internal control in the federal government (see GAO, Standards for Internal Control in the Federal Government, [hyperlink, http://www.gao.gov/products/GAO/AIMD-00-21.3.1] (Washington, D.C.: November 1999)). In addition, we interviewed VBA regional office officials to learn about processes QRTs follow and how these procedures may vary across regional offices. We also reviewed and compared VBA criteria for QRT staff, STAR reviewer, and claims processor certification. * We reviewed documents and interviewed VBA officials to learn more about the recent changes to the agency's approach to assessing consistency. More specifically, we explored the rationale for the change from using inter-rater reliability (IRR) studies to using consistency questionnaires. We assessed the development and implementation of the recent consistency questionnaires by, for example, examining VBA's consideration of pre-testing the instruments using generally accepted survey procedures, and how pre-testing may affect the resulting measures of consistency. Finally, to further determine how consistency questionnaires complement other quality assurance efforts, we reviewed VBA's process for determining topics for consistency questionnaires. Specifically, we asked about the methods used to select and prioritize topics, including the extent to which officials use findings from QRTs and STAR. To further determine what and how information is shared among quality assurance components and how this coordination helps to identify problem areas, we interviewed VBA regional office officials to gather their perspectives on how information is shared from STAR, QRT, consistency studies, and regional office compliance visits and how that information-sharing could be improved. We interviewed officials at the regional level to gain their perspectives on the coordination and effectiveness of all of VBA's quality assurance activities. At each office, we spoke with service center managers and quality assurance staff, as well as representatives of local veteran service organizations. The regional offices were selected to reflect a range of characteristics related to: (1) geography (at least one regional office in each of VA's four areas), (2) number of claims processed annually, (3) claim-based accuracy rates, and (4) issue-based accuracy rates. We did not identify specific quality assurance pilots or initiatives being tested in regional offices. We selected 4 of VBA's 57 regional offices for review. We visited the Oakland and Newark regional offices and conducted telephone interviews with Nashville and Waco regional office staff. Table 1 provides information about the regional offices we selected for review. Table 1: Regional Office Selection Criteria: Office: Oakland, CA; VBA Area: Western; Compensation caseload: 21st; STAR rating accuracy (claims): 20th; STAR rating accuracy (issues): 34th. Office: Newark, NJ; VBA Area: Eastern; Compensation caseload: 41st; STAR rating accuracy (claims): 54th; STAR rating accuracy (issues): 55th.
Office: Nashville, TN; VBA Area: Southern; Compensation caseload: 6th; STAR rating accuracy (claims): 7th; STAR rating accuracy (issues): 5th. Office: Waco, TX; VBA Area: Central; Compensation caseload: 2nd; STAR rating accuracy (claims): 38th; STAR rating accuracy (issues): 36th. Source: GAO analysis of VA data. [End of table] [End of section] Appendix II: Statistical Sampling Methodology: This appendix provides additional technical details on ratio estimation for producing issue-based accuracy rates, as well as the audit work we did to re-estimate the regional office Systematic Technical Accuracy Review (STAR) sample sizes using a formula for stratified random probability samples. Ratio Estimation: Because STAR is designed to sample claims and produce an estimate of the claim-based accuracy rate, and because the number of medical issues per claim varies, ratio estimation should be used to develop issue-based accuracy rates. Furthermore, during their review of sampled claims, STAR reviewers may find that one or more inferred issues were missed or, conversely, that the review process included one or more issues inappropriately. Thus, the STAR sample of claims must be used to estimate both the total number of issues as well as the number of issues that were processed correctly. With respect to STAR, ratio estimation takes the form shown below.

\hat{A} = \frac{\sum_{i} \sum_{j} w_{ij} \sum_{k=1}^{n_{ij}} c_{ijk}}{\sum_{i} \sum_{j} w_{ij} \sum_{k=1}^{n_{ij}} t_{ijk}}

In the formula, the subscript i represents the regional office, the subscript j represents the month of the fiscal year, n_{ij} represents the monthly sample size for regional office i in month j, w_{ij} represents the stratum sampling weight for regional office i in month j, c_{ijk} represents the number of issues adjudicated correctly on claim k in month j and regional office i, and t_{ijk} represents the total number of issues on claim k in month j and regional office i. The ability to calculate a ratio estimate and its associated confidence interval is available in most statistical software applications.
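For illustration, the following minimal sketch computes a ratio estimate of issue-based accuracy from a handful of fabricated claim records, using variable names that mirror the notation defined above (w for w_{ij}, c for c_{ijk}, and t for t_{ijk}). It is not VBA's or GAO's production code, and the weights and issue counts are invented for demonstration.

import pandas as pd

# Fabricated sample: each row is one sampled claim k from regional
# office i in month j. w is the stratum sampling weight, c the number
# of issues adjudicated correctly, and t the total number of issues.
sample = pd.DataFrame({
    "office": ["A", "A", "B", "B"],
    "month":  [1, 2, 1, 2],
    "w":      [25.0, 30.0, 60.0, 55.0],
    "c":      [3, 4, 2, 5],
    "t":      [4, 4, 3, 5],
})

# Ratio estimate: the weighted estimated total of correctly adjudicated
# issues divided by the weighted estimated total of all issues.
numerator = (sample["w"] * sample["c"]).sum()
denominator = (sample["w"] * sample["t"]).sum()
print(f"Estimated issue-based accuracy rate: {numerator / denominator:.3f}")

[End of code example]

Survey analysis software can additionally produce the confidence interval associated with such a ratio estimate, typically through linearization or replication methods.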
Sample Size Re-Estimation: Each month the Veterans Benefits Administration (VBA) selects a random sample of benefit claims within each VA regional office to review under the STAR program. The measure of interest is the estimated percentage of claims that were processed correctly by VBA regional office staff. The sample size formula used by VBA to derive the number of claims to select in each VBA regional office is shown below.

n = \frac{Z^2 P Q}{E^2}

In the formula, Z is the quantile from the normal distribution for the desired level of confidence. The desired margin of sampling error is denoted by E. The assumed percent of accuracy in the population is denoted by P, and Q is defined as Q = (1 - P). For its calculations, VBA uses the following values:

Z = 1.96 (for 95 percent confidence), E = 0.05, P = 0.80, and Q = 0.20.

When these values are plugged into the equation, n = (1.96)^2(0.80)(0.20)/(0.05)^2, which is approximately 245.9 and rounds up to 246. This is VBA's target annual sample size for each VA regional office. With 57 regional offices, this translates into 14,022 claims selected nationally per fiscal year in the STAR sample. On a monthly basis, when divided by 12, 246/12 = 20.5, which rounds up to 21. Thus, VBA's monthly sample size for each regional office is 21 claims. By definition, the sample frame for each month is the set of veteran benefit claims completed by the regional office within the previous month. The standard statistical formula for the sample size calculation with a stratified random sample is shown below. We applied this formula to determine an annual total sample size for a regional office in the coming fiscal year using observed monthly accuracy rates and monthly numbers of claims completed from the previous fiscal year.

n_0 = \frac{Z^2 \sum_{j=1}^{12} W_j P_j Q_j}{E^2}

In turn, this initial sample size is adjusted with the finite population correction factor. The formula for the adjusted sample size is shown below.

n = \frac{n_0}{1 + n_0 / N}

In these formulas, the terms Z and E are defined as before in the sample size formula currently used by VBA. The term P_j is the observed historical accuracy rate for the regional office in month j of the prior fiscal year. The term Q_j is defined as Q_j = (1 - P_j). The term W_j represents the monthly fraction of the annual total number of claims processed by the regional office in the prior fiscal year. The term N is the total number of claims processed by the regional office in the prior fiscal year. Because STAR is intended for monitoring benefit claim processing, we re-set the value of P_j to a value of 0.90 for any month where P_j = 1.0 in order to ensure a minimum monthly sample allocation (a month with a perfect observed accuracy rate would otherwise receive no sample, since P_j Q_j = 0). In order to demonstrate how this formula works in practice, data for the Boston regional office are shown in table 2 as an example. Table 2: STAR Monthly Sample Results for VBA's Boston Regional Office, Fiscal Year 2013: Month: October; Claims Processed: 581; Stratum: 1; Monthly Accuracy Rate: 0.905. Month: November; Claims Processed: 453; Stratum: 2; Monthly Accuracy Rate: 0.941. Month: December; Claims Processed: 537; Stratum: 3; Monthly Accuracy Rate: 0.905. Month: January; Claims Processed: 671; Stratum: 4; Monthly Accuracy Rate: 1.000. Month: February; Claims Processed: 562; Stratum: 5; Monthly Accuracy Rate: 0.950. Month: March; Claims Processed: 971; Stratum: 6; Monthly Accuracy Rate: 0.905. Month: April; Claims Processed: 930; Stratum: 7; Monthly Accuracy Rate: 0.850. Month: May; Claims Processed: 1,207; Stratum: 8; Monthly Accuracy Rate: 0.842. Month: June; Claims Processed: 1,408; Stratum: 9; Monthly Accuracy Rate: 0.900. Month: July; Claims Processed: 1,668; Stratum: 10; Monthly Accuracy Rate: 0.955. Month: August; Claims Processed: 2,023; Stratum: 11; Monthly Accuracy Rate: 0.818. Month: September; Claims Processed: 1,575; Stratum: 12; Monthly Accuracy Rate: 0.818. Month: Total; Claims Processed: 12,586; Monthly Accuracy Rate: 0.899. Source: GAO analysis of Systematic Technical Accuracy Review (STAR) data of the Veterans Benefits Administration (VBA). [End of table] Here are the calculations for the Boston regional office using the data in table 2. With January's observed accuracy rate of 1.000 re-set to 0.90, the calculation gives n = 158 after applying the correction for sampling from a finite population. The effect of applying a stratified random sample formula, which uses historical observed monthly accuracy rates from the prior fiscal year as well as accounting for the population size from the prior fiscal year, is a reduction in the needed annual STAR sample size from 246 to 158 claims for the Boston regional office.
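The Boston calculation can be replicated with a short script. The following sketch applies the two formulas above to the table 2 data; because the monthly accuracy rates in table 2 are displayed rounded to three decimal places, the computed value differs slightly from the n = 158 obtained in our calculations.

# Monthly claims processed and monthly accuracy rates for the Boston
# regional office in fiscal year 2013, from table 2.
claims = [581, 453, 537, 671, 562, 971, 930, 1207, 1408, 1668, 2023, 1575]
rates = [0.905, 0.941, 0.905, 1.000, 0.950, 0.905,
         0.850, 0.842, 0.900, 0.955, 0.818, 0.818]

Z = 1.96         # normal quantile for 95 percent confidence
E = 0.05         # desired margin of sampling error
N = sum(claims)  # 12,586 claims processed in the prior fiscal year

# VBA's current formula with an assumed accuracy rate P of 80 percent.
P, Q = 0.80, 0.20
print("Current VBA annual sample size:", round(Z**2 * P * Q / E**2))  # 246

# Stratified formula: re-set P_j to 0.90 for any month where P_j = 1.0
# so that the month receives a nonzero sample allocation.
p = [0.90 if r == 1.0 else r for r in rates]
n0 = Z**2 * sum((c / N) * pj * (1 - pj) for c, pj in zip(claims, p)) / E**2

# Finite population correction.
n = n0 / (1 + n0 / N)
print("Stratified annual sample size:", round(n))
# Prints 157 with the three-decimal rates shown in table 2; the
# calculation in this appendix yields n = 158.

[End of code example]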
[End of section] Appendix III: Comments from the Department of Veterans Affairs: Department of Veterans Affairs: Washington, DC 20420: October 28, 2014: Mr. Daniel Bertoni: Director, Education, Workforce, and Income Security Issues: U.S. Government Accountability Office: 441 G Street, NW: Washington, DC 20548: Dear Mr. Bertoni: The Department of Veterans Affairs (VA) has reviewed the Government Accountability Office's (GAO) draft report, "Veterans' Disability Benefits: Improvements Could Further Enhance Quality Assurance Efforts" (GAO-15-50). VA generally agrees with GAO's conclusions and concurs with GAO's recommendations to the Department. The enclosure specifically addresses GAO's recommendations, provides an action plan for each, and provides technical comments on the draft report. VA appreciates the opportunity to comment on your draft report. Sincerely, Signed by: Jose D. Riojas: Chief of Staff: Enclosure: Department of Veterans Affairs (VA) Response to Government Accountability Office (GAO) Draft Report "Veterans' Disability Benefits: Improvements Could Further Enhance Quality Assurance Efforts" (GAO-15-50): GAO Recommendation: To help improve the quality of VBA's disability compensation claim decisions, GAO recommends that the Secretary of Veterans Affairs direct the Under Secretary for Benefits to: Recommendation 1: Leverage appropriate expertise to help VBA do each of the following: * weight its accuracy estimates to reflect the sample design for reviewed claims; * determine and report the confidence intervals associated with its reported accuracy estimates; and * re-examine its approach to calculating the regional office sample size for STAR. VA Comment: Concur. A statistician from the Veterans Benefits Administration (VBA) Office of Performance Analysis and Integrity (PA&I) is working to revise the sampling and reporting of VBA's compensation and pension claims processing accuracy program. In August 2014, VBA initiated a review of the accuracy sampling and reporting process. As a result, VBA is developing a sample methodology to consider output and claims processing accuracy at each station to determine sample sizes. Additionally, all work completed will be sampled according to the regional office that completed the claim, eliminating cases that have historically been excluded from quality review due to brokering. VBA will calculate margins of error and appropriately weight accuracy estimates based on the revised sampling methodology. In addition, VBA has completed a thorough review of the current Systematic Technical Accuracy Review (STAR) process and will make necessary programming changes and test the new sampling methodology in December 2014. The new quality samples will be created beginning in January 2015 for cases completed in December 2014. Reporting for these cases will begin in March 2015 and will include the confidence intervals for each regional office. Target Completion Date: March 31, 2015. Recommendation 2: Take steps to ensure that redistributed claims and those moved between regional offices are not underrepresented in the STAR sample. VA Comment: Concur. VBA's revised sampling methodology will be based on the office completing the claim. No claims will be excluded from the samples due to changes in jurisdiction. VBA will capture both the office of original jurisdiction as well as the office completing the claim to ensure that this work, known as "brokered work," is not underrepresented and meets the same high-quality expectations of our compensation and pension programs. Target Completion Date: March 31, 2015. Recommendation 3: Increase transparency in explaining how the claim-based and issue-based accuracy rates are calculated as well as their key limitations when publicly reporting these metrics. VA Comment: Concur. VBA will create an abstract describing its sampling, assessment criteria, accuracy calculation, and reporting methodologies for claim and issue-level accuracy.
This abstract will accompany future performance documents and public reports to explain key differences between the claim-based and issue-based accuracy rates. Target Completion Date: March 31, 2015. Recommendation 4: Review the multiple sources of policy guidance VBA provides to determine ways to consolidate them or otherwise improve their availability and accessibility for use by the field. VA Comment: Concur. In September 2014, VBA began the process of improving the availability and accessibility of policy guidance, as well as consolidating all policy guidance references. VBA anticipates completing the project by April 2015. Target Completion Date: April 30, 2015. Recommendation 5: Take steps to ensure that any future upgrades to local data systems include improvements to allow for the pausing of processing claim decisions identified by QRTs as incorrect, and to enable QRTs to better track error trends. VA Comment: Concur. VBA is currently designing a new database that will incorporate all types of quality reviews, to include local regional office reviews, STAR, and consistency studies, and capture data at various stages of the claims process. The database will provide VBA with increased data analysis capabilities for accuracy review and improved tracking of error trends. Target Completion Date: June 30, 2015. Recommendation 6: Take additional steps to evaluate the effectiveness of quality assurance activities to identify opportunities to improve or better target these activities. VA Comment: Concur. As stated in the response to Recommendation 5, VBA is currently designing a new database that will incorporate all types of quality reviews and capture data at various stages of the claims process. The database will provide VBA with increased data analysis capabilities, to include improved tracking of error trend analysis. This will allow VBA to evaluate the effectiveness of the quality assurance activities and identify opportunities for improvement. Target Completion Date: June 30, 2015. [End of section] Appendix IV: GAO Contact and Staff Acknowledgments: GAO Contact: Daniel Bertoni, bertonid@gao.gov, or (202) 512-7215: Staff Acknowledgments: In addition to the contact named above, Michele Grgich (Assistant Director), Dana Hopings (Analyst-in-Charge), Carl Barden, James Bennett, David Chrisinger, Alexander Galuten, Joel Green, Avani Locke, Vernette Shaw, Almeta Spencer, Walter Vance, and Greg Whitney made key contributions to this report. [End of section] Footnotes: [1] See, for example, VA Disability Claims Processing: Preliminary Observations on Accuracy Rates and Quality Assurance Activities, [hyperlink, http://www.gao.gov/products/GAO-14-731T] (Washington, D.C.: July 14, 2014), and VA Office of Inspector General, Audit of Veterans Benefits Administration Compensation Rating Accuracy and Consistency Reviews (Washington, D.C.: March 12, 2009). [2] Most of these are initial or reopened claims for benefits. A veteran may reopen a claim, for example, for increased benefits based on a new service-connected disability or a worsening of an existing disability. We did not review quality assurance efforts for other types of disability compensation actions, referred to by VBA as authorizations. These include, for example, changes to benefit payments for additional dependents. [3] We did not include pension claims because VBA is reviewing its approach to the accuracy assessment of pension claims, which represent a small proportion of VBA's disability benefits workload.
As of August 23, 2014, VBA had an inventory of about 10,000 pending pension claims among a total inventory of approximately 546,000 claims awaiting a rating. [4] See 31 U.S.C. § 1116 for legal requirements. [5] We visited the Newark, New Jersey, and Oakland, California, VBA regional offices and conducted telephone interviews with Nashville, Tennessee, and Waco, Texas, staff. We selected these offices to achieve variety in each of the following criteria: (1) number of claims processed annually; (2) geography (at least one regional office in each of VBA's four geographic divisions); (3) claims-based accuracy rates; and (4) issue-based accuracy rates. For each location, we interviewed managers, quality assurance staff, and veteran service organization representatives. [6] GAO, Standards for Internal Control in the Federal Government, [hyperlink, http://www.gao.gov/products/GAO/AIMD-00-21.3.1] (Washington, D.C.: November 1999). [7] 38 U.S.C. § 1101 et seq. VA's ratings are awarded in 10-percent increments, from 0 to 100 percent. Generally, VA does not pay disability compensation for disabilities rated at 0 percent. As of December 2013, basic monthly payments ranged from $130.94 for a veteran with a 10-percent disability rating and no dependents to $3,134 for a veteran with a 100-percent disability rating, a spouse, and one child. [8] For quality assurance purposes, VBA counts one of its sub-offices as a separate regional office, in addition to its 56 regional offices. Thus, for reporting purposes, we refer to 57 offices. [9] In this report, we use the terms "medical condition" and "medical issue" interchangeably. [10] The STAR review has two major components. The benefit entitlement review assesses whether the correct steps were followed in addressing all issues in the claim and collecting appropriate evidence, and whether the resulting decision was correct, including effective dates and payment rates. Accuracy performance measures are calculated based on the results of the benefit entitlement review. The STAR review also assesses whether claims processors appropriately documented the decision and notified claimants. [11] The Aspire dashboard is an online report of VBA's performance by program. Data are updated monthly and available by regional office and nationally. See [hyperlink, http://www.benefits.va.gov/REPORTS/Aspire_dashboard.asp]. [12] VBA does not count errors that do not affect veterans' benefits, but it notes them during STAR reviews and works to correct them. Examples of such errors include missing signatures and missing decision notifications. [13] VBA samples about the same number of claims from each regional office regardless of the offices' varying sizes. Thus, smaller regional offices are disproportionately represented, and the set of all claims reviewed nationally does not constitute a simple random sample of all claims processed by regional offices. Weighting adjusts for this fact and yields more accurate estimates. [14] The estimated accuracy rate of 89.5 percent has a 95 percent confidence interval that ranges from 89 to 90 percent. The estimated accuracy rate of 89.1 percent has a 95 percent confidence interval that ranges from 88.4 to 89.8 percent. [15] In comparing the weighted accuracy estimates that we computed to the unweighted estimates that VBA reported for regional offices in fiscal year 2013, we found that weighting would increase the accuracy rate more than 0.4 percent for 17 offices and decrease the accuracy rate more than 0.4 percent for 12 offices.
Weighting would increase the accuracy estimates for regional offices by as much as 2.1 percent and decrease the estimates by as much as 3.6 percent. [16] STAR accuracy estimates are derived from sample data and have sampling error associated with them. The confidence interval is a range of values around the estimate that is likely to include the actual population value, and it helps determine whether different estimates are significantly different from a statistical perspective. The margin of error is the maximum of the difference between the lower bound of the confidence interval and the estimate, and the difference between the upper bound of the confidence interval and the estimate. [17] The required statistical test is called a t-test, which is a statistical hypothesis test that can be used to determine if two estimates are statistically different from each other. It is calculated by dividing the difference of the two estimates by the standard error of the difference. [18] Specifically, without changing its sampling approach for issue-based accuracy reviews, VBA would need to use a statistical technique called ratio estimation because the current sampling approach is based on claims, and not issues. For more information, see appendix II. [19] VBA arrived at its sample size--246 rating claims per regional office per year--based on an assumed accuracy rate of 80 percent for each regional office, and a desired precision that reflects sampling error of plus or minus 5 percentage points at the 95 percent level of confidence in accuracy estimates for each regional office. For more information, see appendix II. [20] The precise reduction in total sample size from the current level depends on regional office workload and accuracy performance in the baseline year or years used for the calculations. We determined that an overall reduction of over 5,000 claims, or about 39 percent, in the required sample size for STAR was possible using fiscal year 2013 regional office workload and accuracy data as the baseline for our calculations. Only for one regional office did we find that VBA would need to increase the number of claims currently reviewed to achieve its desired level of sample precision. [21] GAO, Managing For Results: GPRA Modernization Act Implementation Provides Important Opportunities to Address Government Challenges, [hyperlink, http://www.gao.gov/products/GAO-11-617T] (Washington, D.C.: May 10, 2011). [22] [hyperlink, http://www.gao.gov/products/GAO-14-731T]. [23] VBA refers to redistributing workloads from backlogged regional offices to other locations as "brokering." In fiscal year 2013, VBA brokered about 10 percent of rating claims between regional offices. [24] When cases are deselected from a regional office's sample, STAR staff commensurately increase the number of claims to be selected for review for that office in the following month, according to VBA officials. [25] According to VBA, in fiscal year 2012 redistributed rating claims had an average accuracy rate of 82.6 percent, whereas non-redistributed rating claims had an average accuracy rate of 86.5 percent. Fiscal year 2012 was the last full year that redistributed claims were decided by separate processing centers, and that STAR staff reviewed redistributed claims separately from non-redistributed claims. [26] GAO, Tax Administration: IRS Needs to Further Refine Its Tax Filing Season Performance Measures, [hyperlink, http://www.gao.gov/products/GAO-03-143] (Washington, D.C.: Nov. 22, 2002).
[27] GAO, Managing For Results: GPRA Modernization Act Implementation Provides Important Opportunities to Address Government Challenges, [hyperlink, http://www.gao.gov/products/GAO-11-617T] (Washington, D.C.: May 10, 2011). [28] 31 U.S.C. § 1116. [29] According to VBA, the average number of issues per rating claim increased from 2.8 in fiscal year 2009 to 4.9 in fiscal year 2013. [30] VA's most recent performance and accountability report does not contain issue-based accuracy data, but VA plans to include issue-based data in its next performance and accountability report. [31] Such an error could affect the veteran's benefits if it were not corrected, the veteran were to claim new or worsened conditions in the future, and the subsequent re-calculation of the overall rating were affected by the error, according to agency officials. [32] The estimate of 6.9 percent has a 95 percent confidence interval that ranges from 6.5 to 7.3 percent. The estimated accuracy rate decrease is 1.7 percent and has a 95 percent confidence interval that ranges from 1.5 to 1.9 percent. [33] Prior to QRTs, the accuracy of individual claims processors was assessed against targets for each employee. However, these reviews were generally performed by the claims processors' supervisors. VBA expected that having QRT members perform the individual reviews would allow supervisors to focus more on performance management. [34] QRT reviewers review an average of 5 randomly selected claims per claims processing staff member per month. For claims processing staff members found in need of accuracy improvement, 10 reviews per claims processing staff member per month may be performed. [35] VBA would like to increase the number of consistency questionnaires to target additional claims processing positions, such as the Claims Assistant position, according to officials. [36] To help reduce its claims backlog, VBA has required claims processors to work 20 hours per month of mandatory overtime during portions of fiscal years 2013 and 2014. [37] [hyperlink, http://www.gao.gov/products/GAO-14-731T]. [38] A regional office is expected to perform in-process reviews equivalent to 10 percent of its expected claims decisions per month, according to VBA guidance. [39] VBA's Veterans Benefits Management System is intended to help streamline the claims process by allowing for paperless claims processing, including electronic claims files. [40] GAO, Standards for Internal Control in the Federal Government, [hyperlink, http://www.gao.gov/products/GAO/AIMD-00-21.3.1] (Washington, D.C.: November 1999). [41] [hyperlink, http://www.gao.gov/products/GAO-14-731T]. [42] Individual Unemployability is a part of VA's disability compensation program that allows VA to pay benefits at the 100 percent level to certain veterans whose service-connected disabilities prevent them from maintaining substantial gainful employment. [43] Specifically, individual performance reviews are entered into the Automated Standardized Performance Elements Nationwide system, whereas in-process reviews are entered into either a WebLogon database or a SharePoint database, depending on the type of error being reviewed. [44] The STAR review assesses whether adequate evidence was developed to support the rating decision. Possible development errors include failure to obtain sufficient medical records, including a medical examination or opinion.
[45] VBA cited several actions, including fielding in-process reviews for additional error types, performing consistency studies to identify claims processors needing training, holding quality calls with regional office staff, and releasing clarifying guidance to the regional offices. [46] GAO, Designing Evaluations: 2012 Revision, [hyperlink, http://www.gao.gov/products/GAO-12-208G] (Washington, D.C.: January 2012). [47] [hyperlink, http://www.gao.gov/products/GAO/AIMD-00-21.3.1]. [48] As of August 25, 2014, VBA had an inventory of about 10,000 pending pension claims among a total inventory of approximately 545,000 claims awaiting a rating. [End of section] GAO's Mission: The Government Accountability Office, the audit, evaluation, and investigative arm of Congress, exists to support Congress in meeting its constitutional responsibilities and to help improve the performance and accountability of the federal government for the American people. GAO examines the use of public funds; evaluates federal programs and policies; and provides analyses, recommendations, and other assistance to help Congress make informed oversight, policy, and funding decisions. GAO's commitment to good government is reflected in its core values of accountability, integrity, and reliability. Obtaining Copies of GAO Reports and Testimony: The fastest and easiest way to obtain copies of GAO documents at no cost is through GAO's website [hyperlink, http://www.gao.gov]. Each weekday afternoon, GAO posts on its website newly released reports, testimony, and correspondence. To have GAO e-mail you a list of newly posted products, go to [hyperlink, http://www.gao.gov] and select "E-mail Updates." Order by Phone: The price of each GAO publication reflects GAO's actual cost of production and distribution and depends on the number of pages in the publication and whether the publication is printed in color or black and white. Pricing and ordering information is posted on GAO's website, [hyperlink, http://www.gao.gov/ordering.htm]. Place orders by calling (202) 512-6000, toll free (866) 801-7077, or TDD (202) 512-2537. Orders may be paid for using American Express, Discover Card, MasterCard, Visa, check, or money order. Call for additional information. Connect with GAO: Connect with GAO on Facebook, Flickr, Twitter, and YouTube. Subscribe to our RSS Feeds or E-mail Updates. Listen to our Podcasts. Visit GAO on the web at [hyperlink, http://www.gao.gov]. To Report Fraud, Waste, and Abuse in Federal Programs: Contact: Website: [hyperlink, http://www.gao.gov/fraudnet/fraudnet.htm]; E-mail: fraudnet@gao.gov; Automated answering system: (800) 424-5454 or (202) 512-7470. Congressional Relations: Katherine Siggerud, Managing Director, siggerudk@gao.gov, (202) 512-4400: U.S. Government Accountability Office: 441 G Street NW, Room 7125: Washington, DC 20548. Public Affairs: Chuck Young, Managing Director, youngc1@gao.gov, (202) 512-4800: U.S. Government Accountability Office: 441 G Street NW, Room 7149: Washington, DC 20548. [End of document]