This is the accessible text file for GAO report number GAO-05-866 
entitled 'Data Mining: Agencies Have Taken Key Steps to Protect Privacy 
in Selected Efforts, but Significant Compliance Issues Remain' which 
was released on August 29, 2005.

This text file was formatted by the U.S. Government Accountability 
Office (GAO) to be accessible to users with visual impairments, as part 
of a longer term project to improve GAO products' accessibility. Every 
attempt has been made to maintain the structural and data integrity of 
the original printed product. Accessibility features, such as text 
descriptions of tables, consecutively numbered footnotes placed at the 
end of the file, and the text of agency comment letters, are provided 
but may not exactly duplicate the presentation or format of the printed 
version. The portable document format (PDF) file is an exact electronic 
replica of the printed version. We welcome your feedback. Please E-mail 
your comments regarding the contents or accessibility features of this 
document to Webmaster@gao.gov. 

This is a work of the U.S. government and is not subject to copyright 
protection in the United States. It may be reproduced and distributed 
in its entirety without further permission from GAO. Because this work 
may contain copyrighted images or other material, permission from the 
copyright holder may be necessary if you wish to reproduce this 
material separately. 

Report to the Ranking Minority Member, Subcommittee on Oversight of 
Government Management, Committee on Homeland Security and Governmental 
Affairs, U.S. Senate: 

August 2005: 

Data Mining: 

Agencies Have Taken Key Steps to Protect Privacy in Selected Efforts, 
but Significant Compliance Issues Remain: 

[Hyperlink, http://www.gao.gov/cgi-bin/getrpt?GAO-05-866]: 

GAO Highlights: 

Highlights of GAO-05-866, a report to the Ranking Minority Member, 
Subcommittee on Oversight of Government Management, Committee on 
Homeland Security and Governmental Affairs, U.S. Senate: 

Why GAO Did This Study: 

Data mining—a technique for extracting knowledge from large volumes of 
data—is being used increasingly by the government and by the private 
sector. Many federal data mining efforts involve the use of personal 
information, which can originate from government sources as well as 
private sector organizations. 

The federal government’s increased use of data mining since the 
terrorist attacks of September 11, 2001, has raised public and 
congressional concerns. As a result, GAO was asked to describe the 
characteristics of five federal data mining efforts and to determine 
whether agencies are providing adequate privacy and security protection 
for the information systems used in the efforts and for individuals 
potentially affected by these data mining efforts. 

What GAO Found: 

The five data mining efforts we reviewed are used by federal agencies 
to fulfill a variety of purposes and use various information sources, 
including both information collected on behalf of the agency and 
information originally collected by other agencies and commercial 
sources. Although the systems differed, the general process each used 
was basically the same. Each system incorporates data input, data 
analysis, and results output (see figure). 

The Data Mining Process: 

[See PDF for image] 

[End of figure] 

While the agencies responsible for these five efforts took many of the 
key steps required by federal law and executive branch guidance for the 
protection of personal information, they did not comply with all 
related laws and guidance. Specifically, most agencies notified the 
general public that they were collecting and using personal information 
and provided opportunities for individuals to review personal 
information when required by the Privacy Act. However, agencies are 
also required to provide notice to individual respondents explaining 
why the information is being collected; two agencies provided this 
notice, one did not provide it, and two claimed an allowable exemption 
from this requirement because the systems were used for law 
enforcement. In addition, agency compliance with key security 
requirements was inconsistent. Finally, three of the five agencies 
completed privacy impact assessments—important for analyzing the 
privacy implications of a system or data collection—but none of the 
assessments fully complied with Office of Management and Budget 
guidance. Until agencies fully comply with these requirements, they 
lack assurance that individual privacy rights are being appropriately 
protected. 

What GAO Recommends: 

GAO is making recommendations to the agencies responsible for the five 
data mining efforts to ensure that their efforts include adequate 
privacy and security protections. The agencies responsible for the five 
efforts we reviewed generally agreed with the majority of our 
recommendations, but disagreed with others. 

www.gao.gov/cgi-bin/getrpt?GAO-05-866. 

To view the full product, including the scope and methodology, click on 
the link above. For more information, contact Linda D. Koontz at (202) 
512-6240 or koontzl@gao.gov. 

[End of section] 

Contents: 

Letter: 

Results in Brief: 

Background: 

Data Mining Efforts Have a Variety of Characteristics: 

Agencies Addressed Many Required Privacy Provisions, but None Addressed 
All Requirements: 

Conclusions: 

Recommendations: 

Agency Comments and Our Evaluation: 

Appendixes: 

Appendix I: Scope and Methodology: 

Appendix II: Risk Management Agency's Data Mining Effort: 

Appendix III: The Citibank Custom Reporting System Used by the 
Department of State: 

Appendix IV: Internal Revenue Service's Reveal System: 

Appendix V: FBI's Foreign Terrorist Tracking Task Force Data Mining 
Effort: 

Appendix VI: Small Business Administration's Loan/Lender Monitoring 
System: 

Appendix VII: Detailed Assessments of Agency Actions to Address 
Security Requirements in Data Mining Efforts: 

Appendix VIII: Comments from the U.S. Department of Agriculture: 

Appendix IX: Comments from the Department of the Treasury: 

Appendix X: Comments from the Department of State: 

Appendix XI: Comments from the Small Business Administration: 

Appendix XII: GAO Contact and Staff Acknowledgments: 

Tables: 

Table 1: Key Steps Agencies Are Required to Take to Protect Privacy, 
with Examples of Related Detailed Procedures and Sources: 

Table 2: Examples of Privacy Act Provisions from Which Systems of 
Records Used in Law Enforcement May Be Exempt: 

Table 3: Characteristics of Information Inputs Used by the Data Mining 
Efforts We Reviewed: 

Table 4: Questions Related to Agency Actions to Notify the Public about 
New or Changed Information Collections or Efforts: 

Table 5: Questions Related to Agency Actions to Provide Individuals 
with Access to Their Personal Records: 

Table 6: Questions Related to Agency Actions to Notify Individuals at 
the Time Personal Information Was Collected: 

Table 7: Questions Related to Agency Actions Safeguarding and Ensuring 
the Quality of Records Containing Personal Information: 

Table 8: Questions Related to Agency Actions to Conduct Privacy Impact 
Assessments: 

Table 9: Scenarios Used to Identify Potential Abusers: 

Table 10: Questions Related to Agency Actions Safeguarding and Ensuring 
the Quality of Records Containing Personal Information: 

Figures: 

Figure 1: An Overview of the Data Mining Process: 

Figure 2: An Overview of the RMA System: 

Figure 3: An Overview of the Citibank Custom Reporting System: 

Figure 4: An Overview of the Reveal Data Mining System: 

Figure 5: An Overview of FBI's Foreign Terrorist Tracking Task Force 
Data Mining Effort: 

Figure 6: An Overview of the Loan/Lender Monitoring System: 

Abbreviations: 

CIO: chief information officer: 

FBI: Federal Bureau of Investigation: 

FISMA: Federal Information Security Management Act: 

GSA: General Services Administration: 

IRS: Internal Revenue Service: 

NIST: National Institute of Standards and Technology: 

OMB: Office of Management and Budget: 

RMA: Risk Management Agency: 

SBA: Small Business Administration: 

Letter: 

August 15, 2005: 

The Honorable Daniel K. Akaka: 
Ranking Minority Member: 
Subcommittee on Oversight of Government Management, 
Committee on Homeland Security and Governmental Affairs: 
United States Senate: 

Dear Senator Akaka: 

Data mining--a technique for extracting knowledge from large volumes of 
data--is being used increasingly by the government and by the private 
sector. Many federal data mining efforts involve the use of personal 
information, which can originate from government sources as well as 
private sector organizations.[Footnote 1]

This report responds to your request that we review federal data mining 
efforts that use personal information. Specifically, our objectives 
were to describe the characteristics of selected federal data mining 
efforts, including each system's data sources, outputs, and uses, and 
to determine whether agencies are providing adequate privacy and 
security protections for the information systems used in these efforts 
and for individuals potentially affected by them. 

To address these objectives, we reviewed five data mining efforts at 
the Small Business Administration (SBA), the Department of 
Agriculture's Risk Management Agency (RMA), the Department of the 
Treasury's Internal Revenue Service (IRS), the Department of State 
(State), and the Department of Justice's Federal Bureau of 
Investigation (FBI). These efforts were selected for review because 
they met several criteria, including the use of personal information 
and data obtained from another agency or a private sector source, and 
because they were used for one of several specific purposes.[Footnote 
2] To address both objectives, we reviewed agency-provided documents 
and interviewed agency officials. To evaluate the agencies' 
implementation of key privacy protections, we also reviewed related 
notices, reports, and other documents. Our scope and methodology are 
discussed in more detail in appendix I. 

We performed our work from May 2004 to June 2005 in accordance with 
generally accepted government auditing standards. 

Results in Brief: 

The data mining efforts we reviewed have a variety of purposes and uses 
and employ different data inputs and outputs. In addition to 
information collected directly from individuals, the efforts use 
information provided by other agencies (such as the National Oceanic 
and Atmospheric Administration) and private sector sources (such as 
credit card companies). These efforts include the following: 

* The RMA effort is used to detect fraud, waste, and abuse in the 
Federal Crop Insurance Program. 

* The Citibank Custom Reporting System, an offering of the General 
Services Administration's Government-wide Purchase Card program, is used 
by State to analyze government charge card spending patterns by its 
employees. 

* The data mining effort of the FBI Foreign Terrorist Tracking Task 
Force helps federal law enforcement and intelligence agencies locate 
foreign terrorists and their supporters in the United States. 

* The IRS's Reveal system is used to detect evidence of financial 
crimes, fraud, and terrorist activity. 

* The SBA Loan/Lender Monitoring System, provided under contract by Dun 
& Bradstreet, is designed to identify, measure, and manage risk in two 
SBA loan programs. 

While the agencies responsible for these five efforts took many of the 
key steps required by federal law and executive branch guidance for the 
protection of personal information, none followed all key procedures. 
Specifically, most agencies notified the general public that they were 
collecting and using personal information and provided opportunities 
for individuals to review personal information, when required by the 
Privacy Act. However, agencies are also required to provide notice to 
individual respondents explaining why information is being collected: 
two agencies provided this notice, one did not provide it, and two 
claimed an allowable exemption from this requirement because the 
systems were used for law enforcement. Agencies' compliance with key 
security requirements that are intended to protect the confidentiality 
and integrity of personal information was inconsistent. Finally, three 
of the five agencies had prepared a privacy impact assessment--an 
important tool for analyzing the privacy implications of a system or 
data collection--of their data mining efforts, but none of the 
assessments fully complied with Office of Management and Budget (OMB) 
guidance. Until agencies fully comply with these requirements, they 
lack assurance that individual privacy rights are appropriately 
protected. 

We are making recommendations to the agencies responsible for the five 
data mining efforts to ensure that their efforts include adequate 
privacy and security protections. 

In providing comments on a draft of this report, the agencies generally 
agreed with the majority of our recommendations, but disagreed with 
others. USDA agreed with the majority of our recommendations, and 
stated that it plans to take the necessary steps to address them. The 
General Services Administration's (GSA) Assistant Commissioner for 
Acquisition (who provided comments via e-mail) generally disagreed with 
our recommendations, stating that the Privacy Act does not apply to its 
system and that it had taken appropriate security measures. However, in 
our view, GSA's system is subject to the Privacy Act. Additionally, 
while we acknowledge GSA's efforts to secure its system, it is 
nonetheless required to comply with the specific requirements of the 
Federal Information Security Management Act of 2002 and with related 
guidance. State and SBA generally agreed with our recommendations and 
provided information on their planned actions. Treasury generally 
agreed with the recommendation to conduct a new privacy impact 
assessment, but in response to our recommendation on security, Treasury 
stated that it believes it already has adequate security measures in 
place. We acknowledge that Treasury has applied several security 
measures; however, required regular testing and evaluation was not yet in 
place, and we have clarified our recommendation to reflect this. Justice 
stated that it had no comments on our draft. 

Background: 

In our May 2004 report on federal data mining efforts,[Footnote 3] we 
defined data mining as the application of database technology and 
techniques--such as statistical analysis and modeling--to uncover 
hidden patterns and subtle relationships in data and to infer rules 
that allow for the prediction of future results. We based this 
definition on the most commonly used terms found in a survey of the 
technical literature. For the purposes of this report, we are using the 
same definition. 

Data mining has been used successfully for a number of years in the 
private and public sectors in a broad range of applications. In the 
private sector, these applications include customer relationship 
management, market research, retail and supply chain analysis, medical 
analysis and diagnostics, financial analysis, and fraud detection. In 
the government, data mining was initially used to detect financial 
fraud and abuse. For example, we used data mining techniques in our 
prior reviews of federal government purchase and credit card 
programs.[Footnote 4]

Following the terrorist attacks of September 11, 2001, data mining has 
been used increasingly as a tool to help detect terrorist threats 
through the collection and analysis of public and private sector data. 
Its use has also expanded to other purposes. In our May 2004 
report,[Footnote 5] we identified several uses of federal data mining 
efforts. The most common were: 

* improving service or performance;

* detecting fraud, waste, and abuse;

* analyzing scientific and research information;

* managing human resources;

* detecting criminal activities or patterns; and 

* analyzing intelligence and detecting terrorist activities. 

While the characteristics of each data mining effort can vary greatly, 
data mining generally incorporates three processes: data input, data 
analysis, and results output. In data input, data are collected in a 
central data warehouse, validated, and formatted for use in data 
mining. In the data analysis phase, data are typically searched through 
a query. The two most common types of queries are pattern-based queries 
and subject-based queries. 

* Pattern-based queries search for data elements that match or depart 
from a predetermined pattern (e.g., unusual claim patterns in an 
insurance program). 

* Subject-based queries search for any available information on a 
predetermined subject using a specific identifier. This could be 
personal information such as an individual identifier (e.g., a Social 
Security number or the name of a person) or the identifier of a 
specific thing. For example, the Navy uses subject-based data mining to 
identify trends in the failure rate of parts used in its ships. 

The data analysis phase can be iterative, with the results of one query 
being used to define criteria for a subsequent query. The output phase 
can produce results in printed or electronic format. The resulting reports 
can be accessed by agency personnel and can also be shared with personnel 
from other agencies. Figure 1 depicts a generic data mining 
process. 
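The three phases and the two query types described above can be 
illustrated with a short sketch. The Python below is purely illustrative 
and is not drawn from any of the systems reviewed in this report: the 
record fields, the sample data, the threshold, and the function names 
(load_records, pattern_query, subject_query, run) are all hypothetical. 

```python
from statistics import mean

# Hypothetical claim records; fields and values are invented for illustration.
RAW_RECORDS = [
    {"claimant_id": "A-001", "county": "Adams", "claim_amount": "1200"},
    {"claimant_id": "A-002", "county": "Adams", "claim_amount": "1100"},
    {"claimant_id": "A-003", "county": "Adams", "claim_amount": "9800"},
    {"claimant_id": "A-001", "county": "Baker", "claim_amount": "1300"},
    {"claimant_id": "", "county": "Baker", "claim_amount": "500"},  # no identifier
]


def load_records(raw):
    """Data input: validate records and format them for analysis."""
    cleaned = []
    for row in raw:
        if not row["claimant_id"]:
            continue  # drop records that lack an identifier
        cleaned.append({**row, "claim_amount": float(row["claim_amount"])})
    return cleaned


def pattern_query(records, multiple=2.0):
    """Pattern-based query: flag claims that depart from a predetermined
    pattern (here, an amount above `multiple` times the average claim)."""
    average = mean(r["claim_amount"] for r in records)
    return [r for r in records if r["claim_amount"] > multiple * average]


def subject_query(records, claimant_id):
    """Subject-based query: retrieve all records for one identifier."""
    return [r for r in records if r["claimant_id"] == claimant_id]


def run():
    records = load_records(RAW_RECORDS)   # data input
    flagged = pattern_query(records)      # data analysis, first pass
    # Iterative analysis: the first query's results define the next query.
    follow_up = {
        r["claimant_id"]: subject_query(records, r["claimant_id"])
        for r in flagged
    }
    return flagged, follow_up             # results output


if __name__ == "__main__":
    flagged, follow_up = run()
    for claimant, rows in follow_up.items():
        print(claimant, len(rows))
```

In this sketch the pattern-based query needs no identifier in advance, 
while the subject-based query starts from one; chaining them shows how 
the analysis phase can be iterative. 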

Figure 1: An Overview of the Data Mining Process: 

[See PDF for image] 

Note: From Vipin Kumar and Mohammed J. Zaki, High Performance Data 
Mining, University of Minnesota, undated; [Hyperlink, 
http://www.cs.rip.edu/~zaki?PSKDDTUT00.PDF]

[End of figure] 

Data Mining Poses Privacy Challenge: 

The impact of computer systems on the ability of organizations to 
protect personal information was recognized as early as 1973, when a 
federal advisory committee on automated personal data systems observed 
that "The computer enables organizations to enlarge their data 
processing capacity substantially, while greatly facilitating access to 
recorded data, both within organizations and across boundaries that 
separate them." In addition, the committee concluded that "The net 
effect of computerization is that it is becoming much easier for 
record-keeping systems to affect people than for people to affect 
record-keeping systems."[Footnote 6] 

More recently, the federal government's increased use of data mining 
has raised public and congressional concerns. A December 2003 report by 
a task force on information sharing and analysis in homeland security 
noted that agencies at all levels of government are now interested in 
collecting and mining large amounts of data from commercial 
sources.[Footnote 7] The report noted that agencies may use such data 
not only for investigations of specific individuals, but also to 
perform large-scale data analysis and pattern discovery in order to 
discern potential terrorist activity by unknown individuals. 

As we noted in our May 2004 report, mining government and private 
databases containing personal information creates a range of privacy 
concerns. Through data mining, agencies can quickly and efficiently 
obtain information on individuals or groups by exploiting large 
databases containing personal information aggregated from public and 
private records. Information can be developed about a specific 
individual or a group of individuals whose behavior or characteristics 
fit a specific pattern. The ease with which organizations can use 
automated systems to gather and analyze large amounts of previously 
isolated information raises concerns about the impact on personal 
privacy. Before data aggregation and data mining came into use, 
personal information contained in paper records stored at widely 
dispersed locations, such as courthouses or other government offices, 
was relatively difficult to gather and analyze. 

Federal Laws and Guidance Define Steps to Protect Privacy of Personal 
Information: 

The 1973 federal advisory committee recommended that the federal 
government adopt a set of fair information practices to address what it 
termed a poor level of protection afforded to privacy under 
contemporary law. These practices formed the basis of the main federal 
privacy law, the Privacy Act of 1974. 

The Privacy Act places limitations on agencies' collection, disclosure, 
and use of personal information maintained in systems of records. The 
act describes "records" as any item, collection, or grouping of 
information about an individual that is maintained by an agency and 
contains his name or another personal identifier. It also describes 
systems of records as a group of records under the control of any 
agency from which information is retrieved by the name of the 
individual or by an individual identifier.[Footnote 8] The Privacy Act 
requires that when agencies establish or make changes to a system of 
records, they must notify the public by a notice in the Federal 
Register identifying the type of data collected, the types of 
individuals that information is collected about, the intended routine 
uses of the data, and procedures that individuals can use to review 
personal information. 

The Federal Information Security Management Act of 2002 (FISMA) also 
addresses the protection of personal information. FISMA defines federal 
requirements for securing information and information systems that 
support federal agency operations and assets; it requires agencies to 
develop agencywide information security programs that extend to 
contractors and other providers of federal data and systems.[Footnote 
9] Under FISMA, information security includes protecting information 
and information systems from unauthorized access, use, disclosure, 
disruption, modification, or destruction, including controls for 
confidentiality--that is, those controls necessary to preserve 
authorized restrictions on access and disclosure to protect personal 
privacy. 

A third federal law with provisions related to privacy, the 
E-Government Act of 2002, provides additional protection for personal 
information in government information systems or information 
collections by requiring that agencies conduct privacy impact 
assessments.[Footnote 10] A privacy impact assessment is: 

"an analysis of how information is handled: (i) to ensure handling 
conforms to applicable legal, regulatory, and policy requirements 
regarding privacy; (ii) to determine the risks and effects of 
collecting, maintaining, and disseminating information in identifiable 
form in an electronic information system; and (iii) to examine and 
evaluate protections and alternative processes for handling information 
to mitigate potential privacy risks."[Footnote 11]

Agencies must conduct a privacy impact assessment (1) before developing 
or procuring information technology that collects, maintains, or 
disseminates information that is in a personally identifiable form or 
(2) before initiating any new electronic data collections containing 
personal information on 10 or more individuals. Among other actions 
that should require a privacy impact assessment, according to guidance 
from OMB, is significant merging of information in databases, for example, 
in a linking that "may aggregate data in ways that create privacy 
concerns not previously at issue" or "when agencies systematically 
incorporate into existing information systems databases of information 
in identifiable form purchased or obtained from commercial or public 
sources."

These laws, along with OMB guidance that outlines how agencies are to 
comply with the laws, lay out a series of steps that agencies should 
take to protect the privacy of personal information. Each of the steps 
includes detailed procedures agencies are to follow to fully implement 
the requirements. Table 1 lists the key steps, with examples of the 
procedures agencies are to use to address the step, and the primary 
statutory source for the protections. 

Table 1: Key Steps Agencies Are Required to Take to Protect Privacy, 
with Examples of Related Detailed Procedures and Sources: 

Key steps to protect privacy of personal information: Publish notice in 
the Federal Register when creating or modifying system of records; 
Examples of procedures: 
* Specify the routine uses for the system; 
* Identify the individual responsible for the system; 
* Outline procedures individuals can use to gain access to their 
records; 
Primary statutory source: 
* Privacy Act. 

Key steps to protect privacy of personal information: Provide 
individuals with access to their records; 
Examples of procedures: 
* Permit individuals to review records about themselves; 
* Permit individuals to request corrections to their records; 
Primary statutory source: 
* Privacy Act. 

Key steps to protect privacy of personal information: Notify 
individuals of the purpose and authority for the requested information 
when it is collected; 
Examples of procedures: 
* Notify individuals of the authority that authorized the agency to 
collect the information; 
* Notify individuals of the principal purposes for which the 
information is to be used; 
Primary statutory source: 
* Privacy Act. 

Key steps to protect privacy of personal information: Implement 
guidance on system security and data quality; 
Examples of procedures: 
* Perform a risk assessment to determine the information system 
vulnerabilities, identify threats, and develop countermeasures to those 
threats; 
* Have the system certified and accredited by management; 
* Ensure the accuracy, relevance, timeliness, and completeness of 
information; 
Primary statutory source: 
* FISMA; 
* Privacy Act. 

Key steps to protect privacy of personal information: Conduct a privacy 
impact assessment; 
Examples of procedures: 
* Describe and analyze how information is secured; 
* Describe and analyze intended use of information; 
* Have assessment reviewed by chief information officer or equivalent; 
* Make assessment publicly available, if practicable; 
Primary statutory source: 
* E-Government Act. 

Source: GAO analysis of the Privacy Act, E-Government Act, FISMA, and 
related guidance. 

[End of table]

Agencies Are Allowed to Claim Exemptions from Some Privacy Provisions: 

While the federal laws and guidance previously outlined provide a wide 
range of privacy protections, agencies are allowed to claim exemptions 
from some of these provisions if the records are used for certain 
purposes. For example, records compiled for criminal law enforcement 
purposes can be exempt from a number of provisions of the Privacy Act, 
including the requirement to notify individuals of the purposes and 
uses of the information at the time of collection and the requirement 
to ensure the accuracy, relevance, timeliness, and completeness of 
records. A broader category of investigative records compiled for 
criminal or civil law enforcement purposes can also be exempted from a 
somewhat smaller number of Privacy Act provisions, including the 
requirement to provide individuals with access to their records and to 
inform the public of the categories of sources of records. In general, 
the exemptions for law enforcement purposes are intended to prevent the 
disclosure of information collected as part of an ongoing investigation 
that could impair the investigation or allow those under investigation 
to change their behavior or take other actions to escape prosecution. 

The Privacy Act allows, but does not require, agencies to claim an 
exemption for certain designated purposes. If an agency decides to 
claim an exemption, the act requires the agency to do so through a 
rule that provides the reason behind its decision. Table 2 shows 
provisions of the Privacy Act from which systems of records used for 
law enforcement may be exempt. 

Table 2: Examples of Privacy Act Provisions from Which Systems of 
Records Used in Law Enforcement May Be Exempt: 

Provision: Providing individuals with access to their information and 
the ability to request corrections; 
Law enforcement exemptions in the Privacy Act: Information used for 
criminal law enforcement: Can be exempt; 
Law enforcement exemptions in the Privacy Act: Information used in law 
enforcement investigations: Can be exempt. 

Provision: Notifying individuals of the purposes and uses of the 
information at the time of collection; 
Law enforcement exemptions in the Privacy Act: Information used for 
criminal law enforcement: Can be exempt; 
Law enforcement exemptions in the Privacy Act: Information used in law 
enforcement investigations: Not exempt. 

Provision: Maintaining records with the necessary accuracy, relevance, 
timeliness, and completeness; 
Law enforcement exemptions in the Privacy Act: Information used for 
criminal law enforcement: Can be exempt; 
Law enforcement exemptions in the Privacy Act: Information used in law 
enforcement investigations: Not exempt. 

Source: GAO analysis of federal laws and guidance. 

[End of table]

Similarly, the requirement to conduct a privacy impact assessment does 
not apply to all systems. For example, no assessment is required when 
the information collected relates to internal government operations, 
the information has been previously assessed under an evaluation 
similar to a privacy impact assessment, or when privacy issues are 
unchanged. Nonetheless, OMB encourages agencies to conduct privacy 
impact assessments on systems that contain personal information in 
identifiable form about government personnel, when appropriate. In 
addition, individual agencies have adopted policies that require 
assessments for all systems, including those used for government 
operations. 

In June 2003, we reported on our assessment of agencies' compliance 
with the Privacy Act and related OMB guidance.[Footnote 12] At that 
time, we determined that the agencies' compliance was high in many 
areas, but uneven across the federal government. Agency officials 
attributed the areas of noncompliance in part to a need for more 
leadership and guidance from OMB. In our report, we recommended that 
the Director, OMB, take a number of steps aimed at improving agencies' 
compliance with the Privacy Act, including overseeing and monitoring 
agencies' actions, assessing the need for additional guidance to 
agencies, and raising agency awareness of the importance of the act. In 
response, OMB established an Interagency Privacy Committee to discuss 
privacy issues and issued updated guidance. However, it has not 
addressed our other recommendations: to work with agencies to ensure 
that they address the areas of noncompliance we identified; institute a 
governmentwide effort to determine the level of resources needed to 
fully implement the Privacy Act; and develop a plan to address 
identified gaps in resources devoted to protecting privacy. 

Data Mining Efforts Have a Variety of Characteristics: 

The data mining efforts that we reviewed have a variety of purposes, 
uses, and outputs. For example, the efforts are used for program 
management, law enforcement, and analyzing intelligence. The efforts 
fulfill these purposes through a mix of subject-based and pattern-based 
queries, as previously defined, and result in reports that are used by 
program officials or shared with others. A detailed summary of each of 
the efforts we reviewed is included in appendixes II through VI. A 
short summary of the purpose and characteristics of each of the efforts 
is included here. 

* The purpose of RMA's data mining effort is to detect fraud, waste, 
and abuse in the federal crop insurance program. It is used to identify 
potential abusers, improve program policies and guidance, and improve 
program performance and data quality. RMA uses information collected 
from insurance applicants as well as from insurance agents and claims 
adjusters. It produces several types of outputs, including lists of 
names of individuals whose behavior matches patterns of anomalous 
behavior, which are provided to program investigators and sometimes 
insurance agencies. It also produces programmatic information, such as 
how a procedural change in the federal crop insurance program's policy 
manual would impact the overall effectiveness of the program, and 
information on data quality and program performance, both of which are 
used by program managers. 

* The purpose of the Citibank Custom Reporting System used by State is 
to detect fraud, waste, and abuse by its employees who use the 
government purchase card program. The purchase card program is a 
governmentwide program run by the General Services Administration 
(GSA). Agencies like State use GSA's master contract to provide their 
employees with charge cards from an approved vendor. Citibank, the 
vendor chosen by State, provides its customers with a custom reporting 
system, which includes several tools that can be used for managing card 
accounts. State uses the system to analyze government charge card 
spending patterns by its employees. System outputs include summaries of 
card account holder information and purchases and can include personal 
information. Summaries are used by program managers and are on occasion 
provided to interested parties such as State's inspector 
general, GAO, and OMB for oversight. 

* The purpose of IRS's Reveal system is to detect criminal activities 
or patterns, analyze intelligence, and detect terrorist activities. IRS 
uses the system to identify financial crime, including individual and 
corporate tax fraud, and terrorist activity. Its outputs include 
reports containing names, Social Security numbers, addresses, and other 
personal information of individuals suspected of financial crime, 
including individual and corporate tax fraud and terrorist activity. 
Reports are shared with IRS field office personnel, who conduct 
investigations based on the report's results. 

* The purpose of the data mining effort used by the FBI's Foreign 
Terrorist Tracking Task Force is to detect criminal or terrorist 
activities or patterns and to analyze intelligence. The effort uses two 
information systems--one classified and one unclassified--to support 
ongoing investigations by law enforcement agencies and the intelligence 
community, including locating foreign terrorists and their supporters 
who are in or have visited the United States. Its outputs include 
reports based on a request received from field investigators. Reports 
range from lists of individuals who might meet a certain profile to 
detailed information on a certain suspect and typically contain 
personal information. Reports are shared with field investigators, 
field offices, and other federal investigators. 

* The purpose of SBA's Lender/Loan Monitoring System is to improve 
service or performance. The system was developed by Dun & Bradstreet 
under contract to SBA. SBA uses the system to identify, measure, and 
manage risk in two of its business loan programs. Its outputs include 
reports that identify the total amount of loans outstanding for a 
particular lender and estimate the likelihood of loans becoming 
delinquent in the future based on predefined patterns. 

These systems use information that the agency collects directly, as 
well as information provided by other agencies, such as the Social 
Security Administration, and private sector sources, such as credit 
card companies. Table 3 details the inputs of each effort we reviewed 
and summarizes each effort by the types of information sources used. 

Table 3: Characteristics of Information Inputs Used by the Data Mining 
Efforts We Reviewed: 

Data mining effort: RMA's data mining effort; 
Types of inputs: Government: Systems of records: 4 sources, including 
insurance records on policyholders, agents, and loss adjusters; 
Types of inputs: Government: Not identified as systems of records: 3 
sources: soils data, weather data, and land survey data; 
Types of inputs: Commercial sources: None; 
Types of inputs: Public records: Various sources, including publicly 
available information; 
Types of inputs: International records: None. 

Data mining effort: Citibank Custom Reporting System (State); 
Types of inputs: Government: Systems of records: None; 
Types of inputs: Government: Not identified as systems of records: 
Account information from State employees provided to Citibank; 
Types of inputs: Commercial sources: Commercial data provided by 
Citibank consisting of information on purchases made by State 
employees; 
Types of inputs: Public records: None; 
Types of inputs: International records: None. 

Data mining effort: Reveal (IRS); 
Types of inputs: Government: Systems of records: 4 sources, including 
suspicious activity reports and extracts of corporate and taxpayer 
information; 
Types of inputs: Government: Not identified as systems of records: 
None; 
Types of inputs: Commercial sources: None; 
Types of inputs: Public records: None; 
Types of inputs: International records: None. 

Data mining effort: Foreign Terrorist Tracking Task Force (FBI); 
Types of inputs: Government: Systems of records: 29 sources, including 
information from FBI's criminal database, immigration and visa data, 
and customs data; 
Types of inputs: Government: Not identified as systems of records: 1 
source; 
Types of inputs: Commercial sources: 11 sources, consisting of data 
from commercial sources; 
Types of inputs: Public records: None; 
Types of inputs: International records: 4 sources, including lost 
property reported to Interpol and intelligence data. 

Data mining effort: Loan/Lender Monitoring System (SBA); 
Types of inputs: Government: Systems of records: 1 source, including 
loan and lender information for SBA's loan programs; 
Types of inputs: Government: Not identified as systems of records: 
None; 
Types of inputs: Commercial sources: 3 sources, including corporate- and 
consumer-level data from private companies; 
Types of inputs: Public records: None; 
Types of inputs: International records: None. 

Source: GAO analysis of agency information. 

[End of table]

Agencies Addressed Many Required Privacy Provisions, but None Addressed 
All Requirements: 

While the agencies responsible for the five data mining efforts took 
many of the key steps needed to protect the privacy and security of 
personal information used in the efforts, none followed all the key 
procedures. Most of the agencies provided a general public notice about 
the collection and use of the personal information used in their data 
mining efforts. However, fewer followed other required steps, such as 
notifying individuals about the intended uses of their personal 
information when it was collected or ensuring the security and accuracy 
of the information used in their data mining efforts. In addition, 
three of the five agencies completed a privacy impact assessment of 
their data mining efforts, but none of the assessments fully complied 
with OMB guidance. Complete assessments are a tool agencies can use to 
identify areas of noncompliance with federal privacy laws, evaluate 
risks arising from electronic collection and maintenance of information 
about individuals, and evaluate protections or alternative processes 
needed to mitigate the risks identified. Agencies that do not take all 
the steps required to protect the privacy of personal information limit 
the ability of individuals to participate in decisions that affect 
them, as required by law, and risk the improper exposure or alteration 
of their personal information. 

Agencies Generally Provided Public Notice as Required: 

The Privacy Act requires agencies to notify the public, through notices 
published in the Federal Register, when they create or modify a system 
of records. The act's provisions include requirements for agencies to 
provide general notice about the operation and uses of a system of 
records. According to OMB's guidance on implementing the act, this 
public notice provision is central to one of the act's basic 
objectives: fostering agency accountability through a system of public 
scrutiny. This echoes the 1973 federal advisory committee's statement 
that public involvement is essential for an effective consideration of 
the pros and cons of establishing a personal data system. 

Of the five efforts we reviewed, the personal information used in four 
(IRS, RMA, FBI, and SBA) was the subject of published system of 
records notices in the Federal Register. The public was not notified in 
the case of the fifth system--State. Table 4 details the steps agencies 
took to notify the public about the five efforts we reviewed. 

Table 4: Questions Related to Agency Actions to Notify the Public about 
New or Changed Information Collections or Efforts: 

Question: Was a timely system of records notice published in the 
Federal Register? 
Yes: CDE; 
Partial: A[A]; 
No: B. 

Question: Did the notice indicate the name and location of the system 
of records? 
Yes: ACDE; 
No: B. 

Question: Did the notice specify the category of individuals in the 
system of records? 
Yes: ACDE; 
No: B. 

Question: Did the notice specify the category of records in the system 
of records? 
Yes: ACDE; 
No: B. 

Question: Did the notice specify the routine uses of the system of 
records? 
Yes: ACDE; 
No: B. 

Question: Did the notice specify how the agency stores, maintains, and 
accesses the records? 
Yes: ACDE; 
No: B. 

Question: Did the notice identify the individual responsible for 
maintaining the information in the system of records and give 
instructions on how to contact that person? 
Yes: ACD; 
Partial: E; 
No: B. 

Question: Did the notice specify the process by which an individual can 
request notification if the system contains records pertaining to him 
or her? 
Partial: E; 
No: B; 
Exempt: ACD. 

Question: Did the notice specify the procedures by which an individual 
can gain access to a record pertaining to him or her and challenge its 
contents? 
Partial: E; 
No: B; 
Exempt: ACD. 

Question: Did the notice specify the categories of information sources 
used by the system? 
Yes: DE; 
No: B; 
Exempt: AC. 

Legend: 

A: RMA's data mining effort: 

B: State's Citibank Custom Reporting System: 

C: IRS's Reveal effort: 

D: FBI's Foreign Terrorist Tracking Task Force effort: 

E: SBA's Lender/Loan Monitoring System: 

Source: GAO analysis of agency information. 

[A] RMA's notice was not timely because it was published after its 
effort had been implemented. 

[End of table]

The published system of records notices related to the data mining 
efforts at IRS, FBI, and RMA generally included the information 
required by the Privacy Act. However, the notice published by SBA was 
only partially compliant with the act because it did not clearly 
describe the process individuals could use to review their information. 
For example, SBA's notice listed several dozen contacts and indicated 
that individuals should identify the appropriate contact from the list 
when making requests related to their information. However, the notice 
did not describe how to identify which contact would be appropriate. 

No notice was published for the Citibank purchase card management tool 
used by State. As the agency responsible for the governmentwide 
purchase card program, GSA is responsible for ensuring that the program 
follows statutory requirements, including those in the Privacy Act. 
However, it has not published a system of records notice that would 
cover the activities of State or other agencies participating in the 
program. According to GSA officials, the agency did not consider 
purchase card records to be a system of records because it believed the 
names and addresses it collects pertain to government employees and 
thus are exempt from the Privacy Act. The GSA officials added that a 
programwide system of records notice has been partially drafted but has 
not been finalized because GSA is waiting for guidance from OMB on a 
recent change to the program that could require the collection of 
additional personal information. Without adequate notice of this 
information collection effort, the ability of State employees and the 
public to participate in decisions about the collection and use of 
personal information, as envisioned under the Privacy Act, is limited. 

IRS, RMA, and FBI did not include in their notices a description of how 
individuals can review their personal information because they claimed 
the exemption available for records used in law enforcement.[Footnote 
13]

Two Agencies Allowed Individuals to Access Their Information; Others 
Were Exempt: 

The Privacy Act requires agencies to, among other things, allow 
individuals to (1) review their records (meaning any information 
pertaining to them that is contained in the system of records), (2) 
request a copy of their record or information from the system of 
records, and (3) request corrections in their information. Such 
provisions can provide a strong incentive for agencies to correct any 
identified errors. 

State and SBA provided mechanisms by which individuals could review the 
information the agencies collected and used in their data mining 
efforts; the three other agencies claimed allowable exemptions from 
this requirement. Table 5 details the steps the agencies took to 
provide individuals with access to their personal information used in 
the data mining efforts. 

Table 5: Questions Related to Agency Actions to Provide Individuals 
with Access to Their Personal Records: 

Question: Does the agency permit individuals to review the records 
about themselves and have a copy? 
Yes: BE; 
Exempt: ACD. 

Question: Does the agency permit individuals to request amendments of 
records pertaining to them? 
Yes: BE; 
Exempt: ACD. 

Question: Does the agency permit individuals to request corrections to 
any portion of records pertaining to them? 
Yes: BE; 
Exempt: ACD. 

Legend: 

A: RMA's data mining effort: 

B: State's Citibank Custom Reporting System: 

C: IRS's Reveal effort: 

D: FBI's Foreign Terrorist Tracking Task Force effort: 

E: SBA's Lender/Loan Monitoring System: 

Source: GAO analysis of agency information. 

[End of table]

Citibank provides State cardholders with monthly statements detailing 
their purchase card activity and account information--the personal 
information used in the data mining effort--that cardholders are 
required to review. State also has a process with Citibank to dispute 
and resolve any inaccuracies in this information. 

SBA's system of records notice described a general procedure that 
individuals could use to review personal information SBA collects 
(which is one of the information sources used in the data mining 
effort).[Footnote 14] In addition, the agency has procedures that 
detail how individuals are permitted to review records relating to them 
and request amendment. 

FBI, IRS, and RMA claimed an allowable exemption for their efforts 
because their records are used in law or tax enforcement. FBI and IRS 
have adopted procedures under which they could waive the exemption and 
allow individuals to access their information in cases where disclosure 
would not endanger ongoing investigations or reveal investigative 
methods. 

Three Agencies Fulfilled or Partially Fulfilled Requirements Regarding 
the Notification of Individuals When Personal Information Is Collected: 

The Privacy Act requires that, when collecting personal information 
from individuals, agencies should provide those individuals with notice 
that includes the purpose for which the information was collected and 
the potential effect of not providing the information. Among other 
requirements, the act requires that the notification be located on the 
form the agency uses to collect information from the individual or on 
an accompanying form that the individual can keep, and that the notice 
cite the legal authority for the information request. According to OMB, 
this requirement is based on the assumption that individuals should be 
provided with sufficient information about the request to make a 
decision about whether to respond. The 1973 federal advisory committee 
report noted that the requirement was intended to discourage 
organizations from probing unnecessarily for details of people's lives 
under circumstances in which people may be reluctant to refuse to 
provide the requested data. 

The agencies responsible for two of the five efforts we reviewed 
generally fulfilled the Privacy Act requirements regarding providing 
notice at the time of collection, one partially fulfilled these 
requirements, and two agencies claimed exemptions from these 
requirements. Table 6 details the steps agencies took to notify 
individuals when collecting personal information. 

Table 6: Questions Related to Agency Actions to Notify Individuals at 
the Time Personal Information Was Collected: 

Question: Were individuals notified of the legal authority that 
authorized the agency to collect the information? 
Yes: E; 
Partial: A; 
No: B; 
Exempt: CD. 

Question: Were individuals notified of whether or not submitting 
information was mandatory or voluntary? 
Yes: BE; 
Partial: A; 
Exempt: CD. 

Question: Were individuals notified of the principal purposes for which 
the information was to be used? 
Yes: BE; 
Partial: A; 
Exempt: CD. 

Question: Were individuals notified of the routine uses for the 
information? 
Yes: BE; 
Partial: A; 
Exempt: CD. 

Question: Were individuals notified of the effects, if any, of not 
supplying the information? 
Yes: BE; 
Partial: A; 
Exempt: CD. 

Legend: 

A: RMA's data mining effort: 

B: State's Citibank Custom Reporting System: 

C: IRS's Reveal effort: 

D: FBI's Foreign Terrorist Tracking Task Force effort: 

E: SBA's Lender/Loan Monitoring System: 

Source: GAO analysis of agency information. 

[End of table]

State and SBA generally provided the required notice when they 
collected personal information. Since May 2005, SBA has included a 
notice on applications for its loan programs that addresses the Privacy 
Act requirements. State provided notification using both a written 
notice on the purchase card application and a mandatory training 
program that all potential purchase cardholders must take before 
applying to the program. However, neither of the methods State used to 
notify employees identified the legal basis for the information 
request, as required by the Privacy Act. State officials told us that 
they were unaware that such a notice was required, but that they intend 
to notify employees of the legal basis in the future. 

RMA also provided a notice on application forms, but these notices were 
not provided to everyone who supplied personal information. In the crop 
insurance program, participants apply for coverage from an insurance 
company that collects information from applicants and provides it to 
RMA. Because the information is collected on its behalf, RMA is 
responsible for ensuring that individuals receive the required 
notifications. However, RMA could not demonstrate that all individuals 
who provided it with data were properly notified. RMA provided 
documents showing that 16 of the 17 insurance providers included the 
disclosures required by the Privacy Act on the application forms they 
provided to applicants. However, none of the providers demonstrated that 
they provided adequate notice to insurance agents or adjusters, who 
also provided personal information used by RMA. According to RMA 
officials, they were unaware that this Privacy Act requirement applies 
to all the individuals about whom they collected information. When 
agencies do not fully notify individuals about the purpose and uses of 
the information they collect, the individuals have limited ability to 
make a reasonable decision about whether or not to supply the requested 
information. 

FBI and IRS claimed allowable exemptions under the Privacy Act from the 
requirement to provide direct notice to individuals when collecting 
information, because they use the collected information for law 
enforcement purposes. 

Agencies' Actions to Ensure Security of Data Mining Efforts and Quality 
of Information They Used Were Inconsistent: 

The Privacy Act requires agencies to establish appropriate 
administrative, technical, and physical safeguards to ensure the 
security of records and to protect against any anticipated threats or 
hazards to their security that could result in substantial harm, 
embarrassment, inconvenience, or unfairness to any individual about 
whom information is maintained. While the act does not specify the 
types of procedures that agencies should take to ensure information 
security, FISMA and related OMB guidance define specific procedures for 
ensuring the security (which encompasses protections for availability, 
confidentiality, and integrity) of information. These procedures 
include performing risk assessments and developing security plans. 
Guidance from OMB and the National Institute of Standards and 
Technology (NIST) provides further detail on how agencies are to address 
security. 

The Privacy Act also requires agencies to maintain all records used to 
make determinations about an individual with such accuracy, 
relevance, timeliness, and completeness as is reasonably necessary to 
assure fairness. For the purposes of this report, we refer to these 
requirements as data quality requirements. According to OMB, this 
provision is intended to minimize the risk that an agency will make an 
adverse determination about an individual based on inaccurate, 
incomplete, or out-of-date records. 

In the five efforts we reviewed, agency compliance with the security 
and data quality requirements was inconsistent. Table 7 summarizes the 
steps agencies took to ensure the security and accuracy of the 
information in the data mining efforts. Appendix VII provides 
additional detail on the specific actions that make up the key 
requirements and agencies' compliance with them. 

Table 7: Questions Related to Agency Actions Safeguarding and Ensuring 
the Quality of Records Containing Personal Information: 

Question: Has the agency performed a risk assessment to determine the 
information system vulnerabilities, identify threats, and develop 
countermeasures to those threats? 
Yes: ACDE; 
No: B. 

Question: Has the agency developed a security plan for each system? 
Yes: CD; 
Partial: AE; 
No: B. 

Question: Has the agency had the system(s) certified and accredited by 
management? 
Yes: ADE; 
No: B; 
Exempt: C[A]. 

Question: Does the agency have a tested contingency plan for the 
system? 
Yes: CE; 
Partial: AD; 
No: B. 

Question: Has the agency performed testing and evaluation of the data 
mining system(s)? 
Yes: DE; 
Partial: AC[A]; 
No: B. 

Question: Did the agency take steps to ensure the accuracy, relevance, 
timeliness, and completeness of the data used to make determinations 
about individuals? 
Yes: B; 
Partial: A; 
Exempt: CDE[B]. 

Legend: 

A: RMA's data mining effort: 

B: State's Citibank Custom Reporting System: 

C: IRS's Reveal effort: 

D: FBI's Foreign Terrorist Tracking Task Force effort: 

E: SBA's Lender/Loan Monitoring System: 

Source: GAO analysis of agency information. 

[A] The IRS Reveal effort became operational in February 2005 and has 
interim authority to operate--not full certification and accreditation. 
IRS is currently testing the system. 

[B] SBA's data mining effort is not used to make decisions about 
individuals. 

[End of table]

Security. While the agencies responsible for the data mining efforts we 
reviewed followed a number of key security procedures, none had fully 
implemented all the procedures we evaluated. Although SBA, FBI, and RMA 
applied many of the key procedures required for the information systems 
used in their data mining efforts, their documentation did not include 
all the information called for in federal guidance. Specifically, SBA 
and RMA did not fully document their incident response capabilities, and 
neither FBI nor RMA demonstrated that their systems had tested 
contingency plans--a key requirement for adequate security planning. 
IRS produced several of the required security-related documents, but 
its documentation did not demonstrate that all of the underlying 
requirements had been met. IRS's system became operational in February 
2005 and is currently undergoing testing. 

Neither of the two agencies responsible for State's data mining effort 
took the steps required to ensure that the information systems used in 
the effort had adequate security. As the contracting agency for the 
governmentwide purchase card program, GSA is responsible for ensuring 
that information and information systems used in the program--including 
those provided by contractors--follow FISMA guidance. However, 
according to agency officials, GSA has not evaluated vendors' systems 
for compliance with the specific provisions of FISMA; instead, GSA 
currently relies on the banks to provide security and on the Office of 
the Comptroller of the Currency[Footnote 15] for oversight of the 
banks. 

Because State uses an information system operated by Citibank, through 
its task order under the purchase card program contract, FISMA requires 
that State ensure that Citibank's system complies with FISMA 
provisions. While State performed a general review of Citibank's 
security processes before starting to use its systems, State did not 
specifically evaluate Citibank's compliance with federal security 
requirements. Agencies that do not take adequate steps to ensure 
information security risk having information improperly exposed, 
altered, or destroyed. For example, another bank participating in a 
related program lost backup tapes containing personal information on 
government employees.[Footnote 16] GSA program officials noted that 
they were satisfied that the situation was an accident and not a 
reflection of a significant security failing on the bank's part. 

Data quality. State took steps to ensure that the information used in 
its data mining efforts is accurate, relevant, timely, and complete. 
State used a monthly review process whereby cardholders review the 
account statements provided by Citibank for accuracy. The same 
information is also reviewed by the cardholders' supervisors. In 
addition, area program coordinators must review the purchase card 
programs in their area annually. 

RMA took steps that partially ensure the quality of the data in its 
data mining effort; for example, it has an editing and data validation 
process in place. However, while this process addresses the accuracy of 
the system's data, it does not address the relevance, timeliness, or 
completeness of the personal information in the data mining system 
because program officials were unaware of the requirement to do so. 
Those agencies that do not take adequate steps to ensure the quality of 
the information they use and collect risk making unwarranted decisions 
based on inaccurate information. 

The provision regarding data quality did not apply to three efforts. 
SBA does not use the information in its data mining effort to make 
determinations about individuals; rather, it uses it to manage groups 
of loans. FBI and IRS claimed an allowable exemption because their 
records are used for criminal law enforcement. According to the rule 
justifying FBI's exemption, it is impossible to determine whether 
information meets these data quality standards, in part because 
information that may initially appear to be untimely or irrelevant can 
acquire new significance as an investigation proceeds. 

Five Agencies Lacked Comprehensive Privacy Impact Assessments for Their 
Data Mining Efforts: 

The E-Government Act of 2002 requires that federal government agencies 
conduct privacy impact assessments before developing or procuring 
information technology or initiating any new electronic data 
collections containing personal information on 10 or more individuals. 
According to OMB, such assessments help agencies to: 

* determine whether the agency's information handling practices conform 
to the established legal, regulatory, and policy requirements regarding 
privacy;

* evaluate risks arising from electronic collection and maintenance of 
information about individuals; and 

* evaluate protections or alternative processes needed to mitigate the 
risks identified. 

Thus, a timely and comprehensive privacy impact assessment can be used 
by agencies not only as a tool to ensure strict compliance with the 
various laws related to privacy, but also as a means to consider 
broader privacy principles, such as the fair information practices that 
formed the basis for those laws. 

The E-Government Act lays out a series of requirements for assessments, 
such as (1) they must describe and analyze how the information is 
secured, (2) they must describe and analyze the intended uses of 
information, (3) the agency's chief information officer (or designee) 
must review the assessment, and (4) the assessment must be publicly 
available unless making it so would raise security concerns or reveal 
sensitive or classified information. OMB guidance does not require 
privacy impact assessments for systems used for internal government 
operations or for national security systems; however, individual 
agencies may have more stringent privacy impact assessment 
requirements. 

While four of the five agencies were required to conduct assessments by 
statute or by agency rule, only three (RMA, SBA, and IRS) did so. However, 
none of these assessments adequately addressed all the statutory 
requirements. Table 8 summarizes agency actions to assess the privacy 
impacts of their data mining efforts. 

Table 8: Questions Related to Agency Actions to Conduct Privacy Impact 
Assessments: 

Question: Was a privacy impact assessment prepared? 
Yes: ACE; 
No: D; 
Exempt: B[B]. 

Question: Did the privacy impact assessment describe and analyze what 
information was to be collected? 
Partial: ACE; 
No: D; 
Exempt: B[B]. 

Question: Did the privacy impact assessment describe and analyze why 
the information was to be collected? 
Partial: AC; 
No: DE; 
Exempt: B[B]. 

Question: Did the privacy impact assessment describe and analyze the 
intended use of the information? 
Partial: AC; 
No: DE; 
Exempt: B[B]. 

Question: Did the privacy impact assessment describe and analyze with 
whom the collected information was to be shared? 
Partial: ACE; 
No: D; 
Exempt: B[B]. 

Question: Did the privacy impact assessment describe and analyze the 
notice or opportunity for consent for individuals impacted by the 
system? 
No: ADE; 
Exempt: C[A], B[B]. 

Question: Did the privacy impact assessment describe and analyze how 
the information was to be secured? 
Partial: ACE; 
No: D; 
Exempt: B[B]. 

Question: Did the privacy impact assessment describe and analyze 
whether a Privacy Act system of records is being created? 
Partial: ACE; 
No: D; 
Exempt: B[B]. 

Question: Did the privacy impact assessment identify the choices the 
agency made as a result of performing the assessment? 
Partial: C; 
No: ADE; 
Exempt: B[B]. 

Question: Was the privacy impact assessment reviewed by the agency's 
chief information officer or his/her equivalent? 
Yes: C; 
No: ADE; 
Exempt: B[B]. 

Question: Was the privacy impact assessment made publicly available? 
Yes: E; 
Partial: C; 
No: AD; 
Exempt: B[B]. 

Legend: 

A: RMA's data mining effort: 

B: State's Citibank Custom Reporting System: 

C: IRS's Reveal effort: 

D: FBI's Foreign Terrorist Tracking Task Force effort: 

E: SBA's Lender/Loan Monitoring System: 

Source: GAO analysis of agency information. 

[A] The IRS Reveal system is exempt from giving notice at the time of 
collection based on a law enforcement exemption to the Privacy Act. 

[B] OMB guidance does not require privacy impact assessments for 
internal government systems. 

[End of table]

Three agencies conducted assessments that partially addressed the 
requirements. For example, while RMA's plan addressed the information 
to be collected and how it was to be used, it did not receive the 
required review by the agency chief information officer or designee. In 
addition, RMA's assessment was not made publicly available, even though 
the document did not include any sensitive information.[Footnote 17] 
IRS's notice stated that it would use the information for queries, but 
did not analyze the purpose for collecting the information or its 
intended uses, as required. For instance, IRS's privacy impact 
assessment states that the system "is used to identify potential 
criminal investigations of individuals or groups" in "support of the 
overall IRS mission." While this describes the purpose for collecting 
the information and its intended uses, it does not analyze how the 
agency reached these decisions. RMA and IRS did not fully address these 
steps because they used a prior version of guidance that did not 
address all the current requirements when conducting their assessments. 
SBA conducted an assessment of a previous loan monitoring effort that 
addressed several aspects of its current data mining effort. This 
assessment included general descriptions of what information was to be 
collected, why the information was to be collected, the intended use of 
the information, and how the information was to be secured. However, 
the assessment did not analyze these decisions, as required by OMB's 
guidance. According to SBA officials, the privacy assessment was not 
more specific because at the time it was completed, the possible uses 
of the system and the format it would take were not certain. SBA 
officials added that a more specific privacy assessment of the data 
mining effort has been drafted and is expected to be published later in 
the current fiscal year. 

FBI has not conducted a privacy impact assessment for its data mining 
effort. FBI is not required by statute to conduct assessments on these 
systems because they are classified as national security systems. 
However, under FBI regulations, assessments are required for these 
systems. According to agency officials, FBI is in the process of 
preparing privacy assessments for the two systems that make up its data 
mining effort, but these assessments were delayed due to competing 
priorities for its operational support team. The officials said that 
the agency does not have a target date for completing the assessments. 

The lack of comprehensive assessments is a missed opportunity for 
agencies to ensure that the data mining efforts we reviewed are subject 
to the most appropriate privacy protections. Because the assessments 
did not address all the required subjects, including those related to 
several Privacy Act provisions, agencies were sometimes unaware that 
they were not following all the requirements of the act. Further, 
without analyses regarding their approaches to privacy protection, 
agencies have little assurance that their approaches reflect the 
appropriate balance between individual privacy rights and the 
operational needs of the government. 

GSA, the contracting agency for the governmentwide purchase card 
program, did not conduct a privacy assessment because OMB guidance does 
not require them for internal government programs. However, OMB 
guidance encourages agencies to conduct privacy impact assessments on 
systems that collect information in identifiable form about government 
personnel. Further, according to agency officials, GSA is developing 
guidance requiring assessments for all new agency systems, which will 
apply to the purchase card program. 

Conclusions: 

The five data mining efforts illustrate ways in which federal agencies 
collect and use personal information for purposes such as program 
oversight and law enforcement. The agencies responsible for these data 
mining efforts took many of the key steps required to protect the 
privacy and security of the personal information they used. However, 
none of the agencies followed all the key privacy and security 
provisions we reviewed. Those that did not apply key privacy 
protections limited the ability of the public--including those 
individuals whose information was used--to participate in the 
management of that personal information. Those agencies that did not 
apply the appropriate security protections increased the risk that 
personal information could be improperly exposed or altered. Until 
agencies fully comply with the Privacy Act, they lack assurance that 
individual privacy rights are appropriately protected. 

Further, none of the agencies we reviewed conducted a complete privacy 
impact assessment. Had their assessments fully addressed the required 
Privacy Act provisions, the agencies would have had an opportunity to 
identify and remedy areas of noncompliance. In addition, none of the 
privacy impact assessments adequately addressed the choices that 
agencies made regarding privacy in their data mining efforts. As a 
result, the basis for their choices regarding tradeoffs between privacy 
protections and operational needs is unclear. Better analyses of such 
choices could help agencies strike the appropriate balance between 
operational needs and individuals' rights to privacy. 

Recommendations: 

To ensure that the data mining efforts reviewed include adequate 
privacy protections, we are making 19 recommendations to the agencies 
responsible for them. Specifically, we recommend that the Secretary of 
Agriculture direct the Administrator of the Risk Management Agency 
(RMA) to: 

* provide the required Privacy Act notices to individuals, including 
producers, insurance agents, and adjusters, when personal information 
is collected from them;

* apply the appropriate information security measures defined in OMB 
and NIST guidance to the systems used in the RMA data mining effort, 
specifically, the development of a complete system security plan, a 
tested contingency plan, and regular testing and evaluation of the 
systems used in the effort;

* develop and implement procedures that ensure the accuracy, relevance, 
timeliness, and completeness of personal information used in the RMA 
data mining effort to make determinations about individuals;

* revise the privacy impact assessment for the RMA data mining effort 
to comply with OMB guidance, including analyses of the intended use of 
the information it collects, with whom the information will be shared, 
how the information is to be secured, opportunities for impacted 
individuals to comment, and the choices made by the agency as a result 
of the assessment;

* have the completed privacy impact assessment approved by the chief 
information officer or equivalent official; and: 

* make the completed privacy impact assessment available to the public, 
as appropriate. 

We recommend that the Secretary of the Treasury direct the Commissioner 
of the Internal Revenue Service to: 

* apply the appropriate information security measures defined in OMB 
and NIST guidance to the systems used in the Reveal data mining effort, 
specifically, the performance of regular system testing and evaluation 
against NIST guidance;

* revise the privacy impact assessment for the Internal Revenue 
Service's Reveal system to comply with OMB guidance, including analyses 
of the information to be collected, the purposes of the collection, the 
intended use of the information, how the information is to be secured, 
and opportunities for impacted individuals to comment; and: 

* make the completed privacy impact assessment available to the public, 
as appropriate. 

We recommend that the Attorney General direct the Director of the 
Federal Bureau of Investigation to: 

* apply the appropriate information security measures defined in OMB 
and NIST guidance to the systems used in the Foreign Terrorist Tracking 
Task Force data mining effort, including the development of tested 
contingency plans;

* establish a date for the completion of a privacy impact assessment 
for its data mining effort that complies with OMB guidance, including 
analyses of the information to be collected, the purposes of the 
collection, the intended use of the information, with whom information 
will be shared, how the information is to be secured, opportunities for 
impacted individuals to comment, and the choices made by the agency as 
a result of the assessment;

* have the completed privacy impact assessment approved by the chief 
information officer or equivalent official; and: 

* make the completed privacy impact assessment available to the public, 
as appropriate. 

We recommend that the Secretary of State direct the Under Secretary for 
Management to notify purchase card participants of the legal basis 
under which the department collects their personal information, as 
required. 

We recommend that the Administrator of the Small Business 
Administration: 

* amend the system of records notice regarding its data mining effort 
to clearly identify the individual responsible for the effort, the 
process by which individuals can request notification that the system 
includes records about them, and the procedures individuals should use 
to review records pertaining to them;

* complete a privacy impact assessment for the data mining effort that 
complies with OMB guidance, including analyses of the information to be 
collected, the purposes of the collection, the intended use of the 
information, how the information is to be secured, opportunities for 
impacted individuals to comment, and the choices made by the agency as 
a result of the assessment; and: 

* make the completed privacy impact assessment available to the public, 
as appropriate. 

We recommend that the Administrator of the General Services 
Administration: 

* publish a system of records notice for the purchase card program that 
specifies the name of the system, the categories of individuals and 
records in the system, the categories of information sources used by 
the system, the routine uses of the system, how the agency stores and 
maintains the system, the individual responsible for the effort, the 
process by which individuals can request notification that the system 
includes records about them, and the procedures individuals should use 
to review records pertaining to them; and: 

* ensure that the appropriate information security measures defined in 
OMB and NIST guidance are applied to the systems used in the Citibank 
Custom Reporting System data mining effort, including the development 
of a risk assessment, a system security plan, a tested contingency 
plan, the performance of regular testing and evaluation, and the 
completion of certification and accreditation by agency management. 

Agency Comments and Our Evaluation: 

We provided Agriculture, Treasury, Justice, State, SBA, and GSA with a 
draft of this report for their review and comment. We received written 
comments on the report and its recommendations from SBA, Agriculture, 
State, and Treasury, and comments via e-mail from GSA's Assistant 
Commissioner for Acquisition. These agencies generally agreed with the 
majority of our recommendations, but disagreed with others. Justice's 
Senior Audit Liaison stated that the department had no comments. 
Agriculture, IRS, State, and SBA also provided technical comments, 
which we addressed as appropriate. 

The Administrator, RMA, stated that RMA agreed with the majority of our 
recommendations and that the agency had taken steps to implement many 
of them. In response to our recommendation that RMA strengthen security 
measures, the Administrator stated that RMA has a security plan for its 
data mining system and performs regular testing and evaluation. While 
our draft indicated that RMA had implemented some of the necessary 
security measures, we noted that it did not follow all related 
guidance. Specifically, the system security plan did not describe its 
incident response capability, and RMA did not document that it had 
conducted annual testing or that its tests included penetration or 
vulnerability testing. We clarified this recommendation to focus on the 
incomplete and undocumented security measures we identified. In 
response to our recommendation that RMA develop and implement 
procedures that ensure the quality of personal information used in its 
data mining system, USDA commented that it already has an editing 
and validation process in place. We clarified the discussion of this 
point in our report. However, while this process addresses the accuracy 
of the system's data, it does not address the relevance, timeliness or 
completeness of the personal information in the data mining system. 
USDA's comments are contained in appendix VIII. 

Treasury's Chief Information Officer generally agreed with our 
recommendations regarding a privacy impact assessment, and said that 
IRS will conduct a new privacy impact assessment that complies with 
current OMB guidance after Reveal becomes operational. While conducting 
a new privacy impact assessment is an appropriate step, we note that 
the E-Government Act and OMB guidance require that assessments be 
conducted before systems become operational. In responding to our 
recommendation to ensure that appropriate security measures are applied 
to IRS's Reveal data mining effort, Treasury stated that Reveal is in 
compliance with OMB, NIST, and Treasury security guidance and is 
operating under an interim authorization to operate while it undergoes 
certification and accreditation. Our report acknowledges that IRS had 
applied several security measures, but also notes that required regular 
testing and evaluation was not yet in place. We clarified this 
recommendation to focus on these requirements. Treasury's comments are 
contained in appendix IX. 

State's Assistant Secretary and Chief Financial Officer generally 
agreed with our recommendation that it notify purchase card 
participants of the legal basis under which the Department collects 
their personal information; State responded that it will take the 
necessary steps to address this recommendation. In addition, regarding 
a recommendation we made to GSA concerning the Citibank Custom 
Reporting System, State raised the issue of whether a privacy impact 
assessment is required for systems that collect information on federal 
employees, as is the case with this system. As discussed below in our 
response to GSA, we agree that OMB guidance exempts internal government 
systems from the requirement to conduct privacy impact assessments and 
have clarified our report to reflect this. State's comments are 
contained in appendix X. 

SBA's Associate Deputy Administrator for Office of Capital Access 
generally agreed with our recommendations and provided information on 
its planned actions. SBA's comments are contained in appendix XI. 

GSA's Assistant Commissioner for Acquisition generally disagreed with 
our recommendations. He stated that GSA has not published a system of 
records notice for the purchase card program because this program does 
not capture personal information. However, as described in the report, 
the system retrieves information about individuals by personal 
identifiers, and thus meets the Privacy Act's definition of a system of 
records. In commenting on our recommendation that GSA ensure that 
appropriate security measures defined in OMB and NIST guidance are 
applied to the data mining effort, GSA explained that it has 
reviewed the security standards of the five financial institutions on 
the GSA SmartPay master contract, and have concluded that the 
commercial standards and procedures provided by these institutions 
offer the Citibank Custom Reporting System sufficient security 
protection. However, GSA is required to ensure that information and 
information systems used in the program--including those provided by 
contractors--meet the requirements of FISMA, including the implementing 
guidance from OMB and NIST. Further, recent OMB guidance requires 
agencies to ensure implementation of security measures identical to 
those required under FISMA. GSA also provided a security risk 
assessment of the security in the SmartPay Master Contract. However, 
the assessment does not address any of the elements of the NIST 
guidance for implementing risk assessments, such as identifying the 
system's vulnerabilities and threats. Finally, in response to our three 
recommendations regarding the requirement to conduct a privacy impact 
assessment, the Assistant Commissioner stated that GSA is not required 
to conduct a privacy impact assessment because it is contracting for a 
financial system, not an IT system. Because it is an internal 
government system, we agree that GSA is not required by OMB guidance to 
conduct a privacy impact assessment on the Citibank system and have 
clarified our report to reflect this. 

As agreed with your office, unless you publicly release the contents of 
this report earlier, we plan no further distribution until 30 days from 
the report date. We will send copies of this report to the Chairmen and 
Ranking Minority Members of other Senate and House committees and 
subcommittees that have jurisdiction and oversight responsibility for 
SBA, Agriculture, State, Treasury, GSA, and Justice. Copies will be 
made available to others on request. In addition, this report will be 
available at no charge on the GAO Web site at [Hyperlink, 
http://www.gao.gov]. 

If you have any questions concerning this report, please contact me at 
(202) 512-6240 or by e-mail at [Hyperlink, koontzl@gao.gov]. Contact 
points for our Offices of Congressional Relations and Public Affairs 
may be found on the last page of this report. GAO staff who made major 
contributions to this report are listed in appendix XII. 

Sincerely yours,

Signed by: 

Linda D. Koontz: 
Director, Information Management Issues: 

[End of section]

Appendixes: 

Appendix I: Scope and Methodology: 

To address our objectives, we used a case study methodology. We 
selected the data mining efforts to be included in our evaluations from 
the 122 federal data mining systems reported to us in 2004.[Footnote 
18] In that report, we identified the six most common purposes for the 
data mining activities reported to us. For the purposes of this review, 
we excluded systems used for two purposes: we did not select any 
systems used for analyzing scientific and research information because 
few of those systems used personal information, and we excluded systems 
used for managing human resources because such records fall under 
different privacy rules and regulations. 

The remaining four most common purposes were: 

* improving service or performance;

* detecting fraud, waste, and abuse;

* detecting criminal activities or patterns; and: 

* analyzing intelligence and detecting terrorist activities. 

From the systems that were used for these purposes, we selected all 
those that met each of the following criteria: 

* used personal identifiers,

* were operational, and: 

* used data from another agency or private sector data. 

These criteria were chosen to ensure that the efforts we selected 
illustrated agency practices regarding personal information. In 
addition, we selected no more than one system from each department or 
agency. 
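
The selection steps above can be sketched in code. This is only an illustration and not the procedure actually used for this review: the record layout, the field names, and the rule of keeping the first qualifying system for each agency are all hypothetical assumptions, since the text does not say how a single system was chosen when an agency had more than one.

```python
from dataclasses import dataclass

# The four purposes retained for this review (from the list above).
PURPOSES = {
    "improving service or performance",
    "detecting fraud, waste, and abuse",
    "detecting criminal activities or patterns",
    "analyzing intelligence and detecting terrorist activities",
}


@dataclass
class System:
    # Hypothetical record layout, not an actual GAO data format.
    name: str
    agency: str
    purpose: str
    uses_personal_identifiers: bool
    operational: bool
    uses_external_data: bool  # data from another agency or the private sector


def select_systems(systems):
    """Keep systems that meet every criterion, at most one per agency."""
    selected = {}
    for s in systems:
        meets_all = (
            s.purpose in PURPOSES
            and s.uses_personal_identifiers
            and s.operational
            and s.uses_external_data
        )
        # Assumption: the first qualifying system per agency is kept.
        if meets_all and s.agency not in selected:
            selected[s.agency] = s
    return list(selected.values())
```

A system is dropped as soon as it fails any one criterion, which mirrors the requirement that selected systems meet each of the criteria rather than some of them.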

We analyzed the information provided in 2004 and determined that 11 
data mining efforts met all of our initial selection criteria. We 
contacted the agencies responsible for the systems to confirm the 
accuracy of the information previously provided. As a result of the 
updated information, we eliminated from consideration several systems 
that no longer met all of the selection criteria, resulting in the 
final selection of five data mining systems for our case study review. 

To describe the characteristics of the selected federal data mining 
efforts, we analyzed system documentation, public notices, and other 
relevant documents and interviewed officials at the responsible 
department or agency, and, when applicable, the supporting contractor. 
Agency officials were provided with several opportunities to review our 
descriptions of the selected systems and the graphical depictions 
included in appendixes II through VI. 

To determine whether agencies provided adequate privacy protection for 
the personal information used in the selected data mining efforts, we 
analyzed federal privacy and security laws, regulations, and other 
guidance to identify key steps and procedures for protecting the 
privacy of individual information. We then developed a data collection 
instrument consisting of a series of questions about agency actions 
that followed the key steps and procedures, as well as questions on the 
detailed characteristics of the data mining systems, and provided the 
instrument to the responsible agencies. We reviewed the agencies' 
responses and any supporting documentation they provided, and assigned 
an answer of yes (compliant with all of the guidance related to that 
question), no (not compliant with any of the guidance related to that 
question), or partial (compliant with some, but not all of the 
guidance) to each question. We also reviewed rules claiming exemptions. 
We discussed the results with agency officials and made adjustments as 
appropriate. 
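
The yes, no, and partial answers described above follow a simple rule that can be written down as a short sketch. The function name and the representation of each question as a list of true/false findings are assumptions made for illustration; the actual data collection instrument is not reproduced in this report.

```python
def rate_question(items_met):
    """Return 'yes', 'no', or 'partial' for one question.

    items_met holds one boolean per piece of guidance related to the
    question: True if the agency complied with it, False otherwise.
    """
    if not items_met:
        raise ValueError("at least one guidance item is required")
    if all(items_met):
        return "yes"      # compliant with all of the related guidance
    if not any(items_met):
        return "no"       # not compliant with any of the related guidance
    return "partial"      # compliant with some, but not all


print(rate_question([True, True]))    # yes
print(rate_question([True, False]))   # partial
print(rate_question([False, False]))  # no
```

Exemptions are not modeled here; under this sketch an exempt question would simply not be rated.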

Because we studied only five data mining efforts and because of the 
method of selection, we cannot conclude that our results represent any 
larger group of data mining efforts. Although they were not 
representative of all federal data mining efforts, we believe that the 
five efforts we reviewed illustrate some of the ways in which agencies 
satisfy federal privacy provisions and the circumstances under which 
agencies can claim exemptions to these provisions. 

We conducted our work from May 2004 to June 2005 at the Washington, 
D.C., area offices of the Departments of State and Agriculture, 
Internal Revenue Service, Federal Bureau of Investigation, Small 
Business Administration, and General Services Administration, at an 
agency facility in Philadelphia, Pennsylvania, and at the Stephenville, 
Texas, location of an agency contractor. Our work was conducted in 
accordance with generally accepted government auditing standards. 

[End of section]

Appendix II: Risk Management Agency's Data Mining Effort: 

The Risk Management Agency[Footnote 19] (RMA) uses a data mining system 
designed by Tarleton State University's Center for Agribusiness 
Excellence (CAE) to assist it in detecting fraud, waste, and abuse in 
the federal crop insurance program. The data mining system is used to 
identify producers, insurance agents, and loss adjusters who may be 
abusing the program. Its inputs include insurance records on policy 
holders, agents, and loss adjusters, as well as data on soil, weather, 
and land. It produces several types of outputs, including lists of 
names of individuals whose behavior is anomalous. 

Purpose and Uses: 

The purpose of the RMA data mining system is to detect fraud, waste, 
and abuse in the federal crop insurance program by investigating 
potential leads and confirming suspicious activity in high-profile 
cases.[Footnote 20] RMA also uses the system to improve program 
policies, guidance, and data quality. According to RMA officials, the 
system significantly augmented agency program integrity initiatives and 
accounted for over $340 million in cost avoidance savings since its 
inception. 

According to RMA officials, CAE analysts identify potential abusers of 
the federal crop insurance program primarily by developing scenarios of 
abuse of the program by producers, insurance agents, and loss 
adjusters. Analysts query the data warehouse by using data mining and 
pattern recognition techniques to identify information, patterns, 
anomalies, or relationships indicative of fraud, waste, and abuse. CAE 
analysts then generate reports for RMA regional compliance offices, 
which use the reports to determine which producers should be inspected 
for potential abuse. 

RMA uses reports produced by the data mining system for policy 
development in the Crop Insurance Handbook and improvement of the 
federal crop insurance program. RMA officials often request data 
mining reports (1) to help evaluate pilot programs before making policy 
changes, (2) to determine the best way to change program procedures 
once the policies are implemented, and (3) to determine ways to enhance 
the data through quality control reviews. 

How It Works: 

RMA's data mining effort uses a data warehouse containing crop 
insurance data and information from weather, soil, and land survey 
sources to develop and conduct pattern-based searches for identifying 
information, patterns, anomalies, or relationships indicative of fraud, 
waste, and abuse. Pattern-based searches are based on scenarios of 
fraudulent schemes for obtaining crop insurance indemnities (the dollar 
amount paid in the event of an insured loss) that are developed by 
analysts and agricultural experts. The data mining system helps 
analysts uncover these patterns through an iterative process. Each 
scenario is tested and refined by querying data in the warehouse. The 
results are then provided to a CAE product review team that approves or 
rejects the scenario. Once a scenario is approved, analysts can use it 
to search the data warehouse for individuals who match the scenario 
patterns. Analysts use multiple scenarios to query the data warehouse 
in order to identify program participants who are potentially involved 
in fraudulent activities, resulting in a "spot check list."

Table 9 lists (1) the names and attributes of the scenarios developed 
by RMA and CAE and (2) the agency-reported summary of potentially 
fraudulent claims reported by producers whose behavior was identified 
as anomalous on the 2002 spot check list. According to RMA officials, 
the eight scenarios listed in table 9 have been the most successful in 
generating program savings. 

Table 9: Scenarios Used to Identify Potential Abusers: 

Dollars in millions. 

Scenario name: Triplets; 
Scenario characteristics: Agents, adjusters, and producers linked by 
anomalous behavior that is suggestive of collusion; 
Summary of the 2002 spot check list: potentially fraudulent claims: 
$4.3. 

Scenario name: Rare big losses; 
Scenario characteristics: Producers who make claims much too often 
compared to other producers of the same crop in the same area; 
Summary of the 2002 spot check list: potentially fraudulent claims: 
$32.8. 

Scenario name: Under-reported harvest production; 
Scenario characteristics: Producers who hide part of their production 
by reporting it under someone else's name or by growing a crop on land 
hidden from inspectors. They are compared only to other producers who 
experienced the same weather conditions; 
Summary of the 2002 spot check list: potentially fraudulent claims: 
$23.5. 

Scenario name: Frequent filers; 
Scenario characteristics: Anomalous producers reporting consecutive 
multiyear losses. They make claims for seven consecutive years and 
their indemnities each year are at least as high as their insurance 
premiums; 
Summary of the 2002 spot check list: potentially fraudulent claims: 
$21.7. 

Scenario name: Yield switching; 
Scenario characteristics: Producers whose yield difference (the 
difference between their rate yield and actual reported yield) is--over 
a period of years--significantly above or significantly below other 
producers in the same area for the same crop; 
Summary of the 2002 spot check list: potentially fraudulent claims: 
$15.5. 

Scenario name: All or nothing; 
Scenario characteristics: Insurance agents whose losses on their 
policyholders' crop insurance policies are disproportionately higher 
than those of agents in the same area; 
Summary of the 2002 spot check list: potentially fraudulent claims: 
$12.2. 

Scenario name: Prevented planting; 
Scenario characteristics: Producers who grow crops outside the planting 
schedule required by the Federal Crop Insurance Handbook[A] and file a 
claim for not being able to produce the crop; 
Summary of the 2002 spot check list: potentially fraudulent claims: 
$7.0. 

Scenario name: Excessive yield; 
Scenario characteristics: Producers with crop units that have excessive 
reported yields when compared to those of agents in the same area; 
Summary of the 2002 spot check list: potentially fraudulent claims: 
$36.2. 

Source: RMA. 

[A] The Federal Crop Insurance Handbook contains underwriting standards 
for administering crop insurance policies under RMA's oversight. 

[End of table]
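
To illustrate how a scenario in table 9 might translate into a pattern-based search, the sketch below encodes the "frequent filers" rule as the table states it: claims in seven consecutive years, with each year's indemnity at least as high as that year's premium. It is a hypothetical example, not the query used by RMA or CAE; the function name, the input layout (a mapping from crop year to an indemnity and premium pair for years in which a claim was made), and the sample figures are all assumptions.

```python
def is_frequent_filer(history, years_required=7):
    """Check one producer's claim history against the scenario.

    history maps a crop year to (indemnity, premium) for each year in
    which the producer made a claim. Years without a claim are absent.
    """
    run = 0      # length of the current streak of qualifying years
    prev = None  # previous year seen in the history
    for year in sorted(history):
        indemnity, premium = history[year]
        qualifies = indemnity >= premium
        if qualifies and prev is not None and year == prev + 1:
            run += 1   # extends a streak (or starts one after a bad year)
        elif qualifies:
            run = 1    # first year, or first year after a gap
        else:
            run = 0    # indemnity below premium breaks the streak
        prev = year
        if run >= years_required:
            return True
    return False


# Hypothetical producer with claims in 1996 through 2002.
sample = {year: (12_000, 9_000) for year in range(1996, 2003)}
print(is_frequent_filer(sample))  # True
```

A producer flagged this way would only be a candidate for the spot check list; as the text notes, scenario results are reviewed before any inspection decision is made.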

RMA's six regional compliance offices use the data mining query 
results, including the spot check list, to determine which producers 
should be inspected for potential abuse. Once the regional compliance 
offices review the list, they forward it to employees of USDA's Farm 
Service Agency who send notification letters to the producers on the 
list, alerting them to pending inspections. According to RMA officials, 
the notice of a pending inspection is often enough to discourage the 
producers from filing fraudulent claims. Figure 2 depicts this process. 

Figure 2: An Overview of the RMA System: 

[See PDF for image]

[End of figure]

Inputs: 

The RMA data mining effort uses government data covered by systems of 
records notices, including crop insurance data. Data in the RMA system 
not from systems of records include public land, weather, and soils 
data. In addition to government data, RMA uses other publicly available 
information on an as-needed basis. 

Government Data from Systems of Records: 

Crop Insurance Information. Insurance companies participating in the 
program provide crop insurance information to RMA on program 
participants, including producers, insurance agents, and loss 
adjusters. The crop insurance data contains personal identifiers that 
can be linked to program participants, including names, addresses, 
phone numbers, and Social Security numbers. 

Government Data Not from Systems of Records: 

Land Survey Data. The system uses digital maps from the Public Land 
Survey System--regulated by the Bureau of Land Management[Footnote 
21]--that depict public survey information, such as township locations 
referred to in legal land descriptions. Analysts use this information 
to determine whether there is a discrepancy between a producer's claim 
and land records. 

Weather Data. RMA uses information from public weather records from the 
National Oceanic and Atmospheric Administration to assist in validating 
specific causes of loss for further investigation. 

Soils Data. RMA plans to use soils data from USDA's Natural Resources 
Conservation Service when determining whether soil on a producer's land 
is acceptable for growing an insured crop. 

Public Data: 

The agency also uses other publicly available information including 
information found on public Web sites. 

Outputs: 

RMA's data mining system produces reports for program investigators on 
producers whose behavior patterns are anomalous. The system also 
produces reports for program managers that include programmatic 
information--such as how a procedural change in the federal crop 
insurance program's policy manual would affect the overall 
effectiveness of the program--and other information on data quality and 
program performance. 

[End of section]

Appendix III: The Citibank Custom Reporting System Used by the 
Department of State: 

The U.S. Department of State (State) contracts with Citibank through 
the General Services Administration's GSA SmartPay[Footnote 22] 
contract to provide State employees with purchase cards.[Footnote 23] 
Under the contract, Citibank provides State and other contracting 
agencies access to the Citibank Custom Reporting System (CCRS)--a 
proprietary tool designed by Citibank. State uses this system to 
analyze transaction data and help prevent fraud, waste, and abuse in 
its purchase card program. The system's inputs include account 
information from State employees and commercial data from transactions 
made by State employees. System outputs include summaries of card 
account holder information and purchases. 

Purpose and Uses: 

The purpose of State's data mining effort is to prevent fraud, waste, 
and abuse in the purchase card program by using CCRS to ensure that 
credit and purchase limits are in place and to conduct spot checks of 
individual purchase card expenditures.[Footnote 24] Officials also use 
the system to improve program performance through the results of simple 
subject- and pattern-based queries.[Footnote 25]

According to State officials, the department uses reports containing 
information on agency purchase card accounts and suspended or cancelled 
accounts. State officials also regularly review a CCRS report that 
summarizes single transaction and monthly spending limits for all 
cardholders to ensure that they are accurate. According to State 
officials, one of the most important tasks accomplished through system 
reports is ensuring that the ratio of cardholders to approving 
officials--a cardholder's immediate supervisor--is low enough for 
expenditures to be effectively reviewed. 
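
The ratio check described above can be sketched in a few lines. This is 
an illustration only: the report does not state what ratio State 
considers acceptable or how CCRS lays out its reports, so the function 
name, the input format, and the threshold below are all assumptions. 

```python
from collections import Counter

def overloaded_officials(assignments, max_cardholders):
    """Return approving officials who review more cardholders than allowed.

    assignments: iterable of (cardholder_id, approving_official_id) pairs.
    max_cardholders: hypothetical limit; the report does not give a number.
    """
    counts = Counter(official for _cardholder, official in assignments)
    return sorted(o for o, n in counts.items() if n > max_cardholders)
```

A reviewer could run such a check against a downloaded account list and 
follow up on any official it returns. 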

According to State officials, the department also uses reports to 
assist with overall purchase card program management functions. These 
reports provide the ability to track overall purchase card expenditures 
by a number of data elements, including spending by region or embassy, 
or by vendors used by State employees. State also uses CCRS to collect 
and compile statistical information about the program for quarterly 
reports submitted to the Office of Management and Budget. These reports 
include information on the number of current accounts, dollars spent, 
rebate amounts earned, and single purchase and monthly expenditure 
limits for cardholders. 

How It Works: 

The CCRS electronic reporting tool is a Citibank proprietary system. 
The system interfaces with Citibank's Global Data Repository, which 
stores account and transaction data for an 18-month period. A portion 
of the data resulting from the transaction process is replicated in the 
primary system database for use in analysis and report preparation. 
Figure 3 illustrates the transaction process. Reports can be printed or 
downloaded from the system; the presentation of the data can be edited 
within the system, or the data can be downloaded to be analyzed in an 
outside program. 

When using the system, State users can access reports developed in the 
system, including reports of purchase card accounts, suspended or 
cancelled accounts, and summary reports on the vendors State employees 
purchase from. Reports not already established in the system can be 
created by Citibank at the request of agency officials. Figure 3 
illustrates this process. 

Figure 3: An Overview of the Citibank Custom Reporting System: 

[See PDF for image] 

[End of figure] 

Inputs: 

CCRS includes transaction and account data. Account data are collected 
from agency employees, with an account number issued by Citibank; 
transaction data consist of records of purchase card transactions 
conducted by State employees. 

Government Data Not from Systems of Records: 

Account Data. State collects personal information, including name, last 
four digits of the Social Security number, and the cardholder's office 
phone number and mailing and e-mail addresses as part of the purchase 
card application process. According to agency officials, State 
retrieves records by cardholder name. State supplies that information 
to Citibank. State also supplies required account parameters--such as 
single transaction and monthly spending limits--and assigns a unique 
identifying number. Other account information is assigned by 
Citibank.[Footnote 26]

Commercial Data: 

Transaction Data. The amount and level of detail available in the 
transaction data varies depending on the technical capabilities of the 
vendor from whom products are purchased. For example, vendors with the 
most basic capabilities transfer standard commercial transaction data, 
including the total purchase amount, date of purchase, vendor's name 
and location, date the charge or credit was processed, and a reference 
number for each charge or credit. Vendors with more advanced technology 
can provide additional information including, among other things, unit 
cost and quantity, vendor's category code, and sales tax amount. 

Outputs: 

CCRS provides reports on purchase card transactions and account 
information, including a list of all purchase card accounts, a report 
on suspended or cancelled accounts, and reports summarizing 
expenditures by region or by vendor. Many reports in the CCRS system 
are available in a summary form that does not contain personal 
identifiers and in a detailed form containing personal identifiers, 
including account number and name. 

According to State officials, CCRS reports are used within State's 
purchase card office to ensure adequacy and accuracy of compensating 
controls such as credit limits. Reports are also used to track 
expenditures and are supplied to other State offices, such as State's 
Inspector General, for use in analyzing purchases. 

[End of section]

Appendix IV: Internal Revenue Service's Reveal System: 

The Internal Revenue Service[Footnote 27] (IRS) uses the Reveal system 
to detect patterns of criminal activity, analyze intelligence, and 
detect terrorist activities. According to agency officials, IRS uses 
the system to identify financial crime, including individual and 
corporate tax fraud, and terrorist activity. Inputs for Reveal include 
Bank Secrecy Act data, tax information, and counterterrorism 
information. Its outputs include reports containing names, Social 
Security numbers, addresses, and other personal information of 
individuals suspected of financial crime or terrorist activity. 

Purpose and Uses: 

The purpose of the Reveal data mining system is to detect criminal 
activities and patterns in support of IRS's work in investigating 
potential criminal violations of the Internal Revenue Code and related 
financial crimes. This work is conducted by IRS's Criminal 
Investigation unit. According to agency officials, Reveal is used to 
analyze available databases to support ongoing investigations relating 
to financial crime, including individual and corporate tax fraud, and 
terrorist activity. 

The system provides the capability to query data from multiple sources 
in an effort to identify links in the data. System users develop 
reports that include query results and graphical depictions of the 
data. The reports are then provided to field offices, which conduct 
investigations based on the reports' results. 

The system allows users to establish a profile of the actions and 
persons associated with the search subject by allowing the user to 
trace numerous financial transactions between individuals and 
institutions. 

How It Works: 

Reveal uses commercial software to query multiple databases. The system 
provides Criminal Investigation users with a visual depiction of the 
results, and allows them to search on names, Social Security numbers, 
and other information to help narrow their search. Reveal consists of 
(1) a data retrieval and manipulation tool that performs queries and 
(2) a software tool that provides a visual depiction of the query 
results. The retrieval and manipulation tool queries and gathers 
information on large sets of data that reside locally on a relational 
database on the system's database server. This tool allows users to 
sort, group, and export data from multiple information repositories 
simultaneously, including combinations of databases. It also can 
perform two kinds of queries: reactive and proactive. To perform a 
reactive query, the user must provide a known value for an individual or 
entity. To perform a proactive query, the user narrows the search 
criteria to identify groups of individuals and patterns of suspicious 
activity. 
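
The difference between the two kinds of queries can be illustrated with 
a small sketch. It is not Reveal's implementation: the table layout, 
the sample rows, and the pattern used in the proactive query (repeated 
amounts within a band) are hypothetical and chosen only to show the 
contrast between starting from a known value and starting from search 
criteria. 

```python
import sqlite3

# Hypothetical rows invented for illustration; not Reveal's data model.
ROWS = [
    ("A. Smith", "000-00-0001", 9500.0),
    ("A. Smith", "000-00-0001", 9800.0),
    ("A. Smith", "000-00-0001", 9900.0),
    ("B. Jones", "000-00-0002", 1200.0),
    ("C. Lee", "000-00-0003", 15000.0),
]

def build_db():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reports (name TEXT, ssn TEXT, amount REAL)")
    conn.executemany("INSERT INTO reports VALUES (?, ?, ?)", ROWS)
    return conn

def reactive_query(conn, ssn):
    """Start from a known value (here an identifier) and return its rows."""
    cur = conn.execute(
        "SELECT name, amount FROM reports WHERE ssn = ? ORDER BY amount",
        (ssn,),
    )
    return cur.fetchall()

def proactive_query(conn, low, high, min_count):
    """No known subject: find anyone with repeated amounts in [low, high)."""
    cur = conn.execute(
        "SELECT name, COUNT(*) FROM reports "
        "WHERE amount >= ? AND amount < ? "
        "GROUP BY ssn, name HAVING COUNT(*) >= ? ORDER BY name",
        (low, high, min_count),
    )
    return cur.fetchall()
```

The reactive query returns records for one named subject, whereas the 
proactive query returns whichever subjects happen to fit the criteria. 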

When users narrow their search criteria using the query tool, they can 
use the visualization component to refine and assess the results of the 
queries. The software visualization tool shows relationships between 
data in the queries, and facilitates the discovery of relationships 
among entities, patterns, and trends in the data. It also organizes and 
presents the information in a variety of graphical formats. Figure 4 
depicts this process. 

Inputs: 

Reveal currently uses government system of records data as its only 
type of input. These inputs include (1) Bank Secrecy Act data, (2) tax 
data, and (3) counterterrorism data. These three types of data all 
contain personal information, such as address, Social Security number, 
and date of birth. Data sets are copied and stored locally. 

Figure 4: An Overview of the Reveal Data Mining System: 

[See PDF for image] 

[End of figure] 

Government Data from Systems of Records: 

Bank Secrecy Act Data. Bank Secrecy Act (BSA)[Footnote 28] data are 
accessed remotely from databases owned by the Financial Crimes 
Enforcement Network (FinCEN).[Footnote 29] They consist of Suspicious 
Activity Reports submitted for a transaction related to a possible 
violation of a law or regulation.[Footnote 30] BSA data also include 
Currency Transaction Reports, which are filed by casinos for cash 
transactions in excess of $10,000 and by financial institutions for 
payments or transfers in excess of $10,000. 

Tax Data. Tax data used by Reveal include information from IRS's 
Schedule K-1, corporate and individual tax information, and 
applications for employer and tax identification numbers. Schedule K-1 
is used to report a beneficiary's share of income, deductions, and 
credits from a trust or a decedent's estate. 

Counterterrorism Data. Reveal uses counterterrorism data from various 
sources on individuals. 

Outputs: 

Reveal's outputs include reports that contain names, Social Security 
numbers, addresses, and other personal identifiers of individuals 
suspected of financial crimes, including corporate and tax fraud, and 
of terrorist activity. Reports are shared with IRS agents who conduct 
investigations based on the report's results. 

[End of section]

Appendix V: FBI's Foreign Terrorist Tracking Task Force Data Mining 
Effort: 

The data mining effort used by the Federal Bureau of Investigation's 
(FBI) Foreign Terrorist Tracking Task Force analyzes intelligence and 
detects terrorist activities. In support of its responsibilities, the 
task force operates two information systems--one unclassified and one 
classified--that form the basis of its data mining activities. 

Purpose and Uses: 

The purpose of the task force's data mining effort is to analyze 
intelligence and detect terrorist activities.[Footnote 31] The task 
force supports ongoing investigations in law enforcement agencies and 
the intelligence community by using its data mining effort to respond 
to requests for information about foreign terrorists from FBI agents or 
officials from a partner agency.[Footnote 32] For example, task force 
program officials informed us that they occasionally receive 
information about specific threats from the intelligence community or 
law enforcement partners. When such threat information is received, 
they identify potential sources of information that may reveal persons 
capable and motivated to carry out the threat. They then connect this 
information with persons listed in other databases linked to terrorist 
information. The task force then provides the names of high-risk 
individuals whose characteristics match the threat profile to FBI field 
agencies and to Joint Terrorism Task Forces. 

According to task force officials, analysts conduct research and 
analysis based on requests and provide a report of the results to the 
requesters and to affected agencies, as appropriate. For example, 
according to agency officials, the task force received a list of 
possible suicide bombers from a foreign government. Through analysis, 
the task force determined that several of the bombers had names and 
other identifiers that were similar to those of individuals currently 
in the United States. The task force provided the information to law 
enforcement investigators to determine whether the individuals 
identified were the same as those on the list of suicide bombers 
provided by the foreign government. 

How It Works: 

Task force analysts use two systems together in their data mining 
effort: one sensitive but unclassified, and one classified. After 
receiving a request for information about a threat or person of 
interest, task force leadership routes the information to an 
appropriate analyst. Analysts initially search within the task force's 
existing data, including certain immigration records, to determine 
whether they already have information relevant to the request. 

Task force analysts use several analytical tools to help search for and 
analyze information in the systems. According to task force officials, 
the analysts' primary query tool is the Query Tracking and Initiation 
Program. FBI developed this program to allow users to search the 
systems using, among other things, multiple variants or 
transliterations of names. It also allows analysts to search within and 
between different data sets. 
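
Searching on name variants can be sketched as follows. The report does 
not describe how the Query Tracking and Initiation Program matches 
names, so this example swaps in a generic technique: normalizing each 
name and comparing the results with the standard library's 
SequenceMatcher. The function names and the 0.8 threshold are 
assumptions made for illustration. 

```python
import unicodedata
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase a name, strip accents, and drop punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Combining accent marks are not alphabetic, so this filter removes them.
    kept = "".join(c for c in decomposed if c.isalpha() or c.isspace())
    return " ".join(kept.lower().split())

def similar(a, b, threshold=0.8):
    """True when two normalized names are at least `threshold` alike."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

def search(query, records, threshold=0.8):
    """Return every record whose name resembles the query."""
    return [r for r in records if similar(query, r, threshold)]
```

A lower threshold returns more candidate variants at the cost of more 
false matches, which is why results of this kind would still need 
review by an analyst. 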

The unclassified system serves as the initial repository for 
unclassified data. Through this system, task force analysts can use the 
query tracking program to submit queries on individuals to commercial 
databases to find any relevant information. The resulting information 
is returned to the unclassified system, where analysts can conduct 
analysis using query tracking and other tools. 

The classified system contains law enforcement and intelligence data, 
including FBI case files. Information initially collated in the 
unclassified system is loaded into the classified system daily. 
However, if analysts need expedited results, they can perform an 
initial analysis using data contained in the unclassified system and 
then conduct a more detailed analysis once data are loaded into the 
classified system. The two systems are illustrated in figure 5. 

Figure 5: An Overview of FBI's Foreign Terrorist Tracking Task Force 
Data Mining Effort: 

[See PDF for image] 

[End of figure] 

Inputs: 

FBI officials reported that the task force's systems contain multiple 
sets of data from multiple government and nongovernment sources, some 
of which were acquired on a one-time basis and others that are 
regularly updated. Data from outside sources, including nonpartner 
government agencies and commercial entities, are typically acquired on 
an as-needed basis. 

Government Data from Systems of Records: 

Twenty-nine of the task force's government data sets are part of a 
system of records. Many of these data sets come from within the 
Department of Justice. Other agencies also supply the task force with 
data, including information from immigration records, from the Federal 
Aviation Administration, and from Customs and Border Protection. 
According to program officials, most data that come from sources 
outside the Department of Justice are acquired under a provision of the 
Privacy Act that allows a law enforcement agency to request certain 
data from a government entity for law enforcement purposes. According 
to agency officials, outside agencies provided their data sets to FBI 
on the basis of formal requests. 

Government Data Not from Systems of Records: 

The task force's data mining effort receives one set of government data 
that is not part of a system of records because the information does 
not contain personal identifiers. 

The task force data mining system also contains 15 data sets that 
include information on criminal aliens, intelligence data and alerts, 
and various watchlists. FBI officials responsible for the task force 
were unaware of whether these data are part of a system of records, but 
said that the data were supplied to the task force under the same 
conditions as other government data. 

Commercial Data: 

The task force data mining effort uses data from several commercial 
sources,[Footnote 33] many of which are updated frequently. According 
to FBI officials, analysts can query commercial sources during the 
course of an investigation, if needed. Program officials noted that 
analysts request information from commercial sources using personal 
identifiers. 

Data from International Entities: 

The task force received 4 data sets from Interpol (an international 
police organization) on wanted persons, stolen property, and other 
intelligence. 

Outputs: 

The task force's outputs include reports that contain personal 
identifiers and other information that is relevant to the initial 
request. Reports are shared with the requesting entity or agent and as 
needed with partner agencies. Agents conduct investigations based on 
the results of the reports. 

[End of section]

Appendix VI: Small Business Administration's Loan/Lender Monitoring 
System: 

The Small Business Administration (SBA) contracted with Dun & 
Bradstreet to provide information and analytical capabilities that 
assist SBA in managing credit risks in two major business loan 
guarantee programs. The Loan/Lender Monitoring System (L/LMS) combines 
SBA data with private sector data on businesses and consumers to 
predict future performance of outstanding business loans. 

Purpose and Uses: 

The purpose of L/LMS is to identify, measure, and manage risk in two 
SBA business loan programs--the 7(a) loan program[Footnote 34] and the 
504 program[Footnote 35]. It does this by developing predictive ratings 
that allow SBA to improve the performance of these programs using risk 
management principles. The system 
analyzes SBA loan data, Dun & Bradstreet business data, and data 
provided by subcontractors, including consumer credit bureau 
information and business credit scores. It uses a commercially 
available suite of scorecards to produce business credit scores that 
predict the likelihood of an SBA loan becoming severely delinquent over 
the next 18 to 24 months--a leading indicator of default.[Footnote 36] 
It also contains trends databases that provide historical data on 
approximately one dozen performance and credit risk fields on each 
outstanding loan. 

Finally, the system contains lender databases that provide information 
about individual lenders that can be compared to the information about 
a lender's peers. 

How It Works: 

Dun & Bradstreet and Fair Isaac use the input data in a proprietary 
scoring process to generate a predictive risk score for each 
outstanding loan. In addition, Dun & Bradstreet appends its commercial 
demographic and risk data to the electronic records of all outstanding 
SBA business loans, after removing any personal identifiers. Dun & 
Bradstreet then transfers this information to a module where it can be 
accessed by SBA. None of the data transferred from Dun & Bradstreet to 
SBA contains personal identifiers. 
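
The step of removing personal identifiers before data are returned can 
be sketched as below. The scoring process itself is proprietary and is 
not shown; the field names and record layout are hypothetical and are 
used only to illustrate stripping identifiers and then appending risk 
data to each loan record. 

```python
# Hypothetical field names; the actual L/LMS record layout is not public.
PERSONAL_FIELDS = {"principal_name", "principal_ssn", "principal_phone"}

def strip_identifiers(record):
    """Return a copy of the record without personal identifier fields."""
    return {k: v for k, v in record.items() if k not in PERSONAL_FIELDS}

def append_risk_data(record, risk_data):
    """Strip identifiers, then attach risk fields keyed by loan ID."""
    cleaned = strip_identifiers(record)
    cleaned.update(risk_data.get(record["loan_id"], {}))
    return cleaned
```

Working on a copy rather than modifying the input leaves the original 
record, with its identifiers, untouched on the sender's side. 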

SBA can use the L/LMS to view its entire business loan or lender 
portfolio and can perform analysis by various data elements, including 
dollars outstanding, lender, lender corporate family, SBA region, 
industry sector, and loan type. According to SBA officials, the agency 
uses system-produced reports to help them determine which lenders' SBA 
business loan portfolios are most at risk of default, driving the 
selection of lenders for further review. Figure 6 depicts this process. 
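
One way to picture this kind of portfolio analysis is a ranking of 
lenders by the dollar-weighted risk of their outstanding loans. The 
report does not say how SBA combines loan-level scores, so the 
weighting scheme, the field names, and the convention that a higher 
score means higher risk are assumptions. 

```python
from collections import defaultdict

def rank_lenders(loans):
    """Rank lenders by dollar-weighted average risk, riskiest first.

    Each loan is a dict with hypothetical keys "lender", "risk", and
    "outstanding" (dollars outstanding).
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # lender -> [risk dollars, dollars]
    for loan in loans:
        entry = totals[loan["lender"]]
        entry[0] += loan["risk"] * loan["outstanding"]
        entry[1] += loan["outstanding"]
    ranked = [(lender, w / d) for lender, (w, d) in totals.items() if d > 0]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Lenders at the top of such a list would be candidates for further 
review; the same grouping could be done by region, industry sector, or 
loan type. 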

Inputs: 

The L/LMS uses two kinds of input data: data from government systems of 
records and data from commercial sources. The data include information 
on businesses and individuals. 

Government Data from Systems of Records: 

SBA Loan Records. SBA electronically transfers about 10 data files 
monthly to Dun & Bradstreet. These files contain existing data on 
individual 7(a) and 504 SBA business loans and on the lending 
institutions that manage the loans and include information on small 
businesses' names, addresses, and phone numbers, as well as limited 
information about business principals, including personal identifiers. 

Figure 6: An Overview of the Loan/Lender Monitoring System: 

[See PDF for image] 

[End of figure] 

Commercial Data: 

Credit Evaluation Data. The L/LMS uses several sources of commercial 
data, including Dun & Bradstreet demographic and risk data from its 
global business database, consumer bureau data on the business 
principals (e.g., information relating to recent delinquencies), and 
predictive risk scores developed by Dun & Bradstreet and Fair 
Isaac.[Footnote 37] This information can contain personal identifiers. 

Outputs: 

The L/LMS analyzes the data to generate reports on each lender's 
portfolio. SBA also creates aggregate reports that evaluate loans by 
portfolio value, projected risk, and historical performance trends. 
According to SBA officials, system reports are currently used by 
program officials to support business loan, lender, and portfolio 
monitoring efforts. 

[End of section]

Appendix VII: Detailed Assessments of Agency Actions to Address 
Security Requirements in Data Mining Efforts: 

The Privacy Act requires agencies to establish appropriate 
administrative, technical, and physical safeguards to ensure the 
security of records and to protect against any anticipated threats or 
hazards to their security that could result in substantial harm, 
embarrassment, inconvenience, or unfairness to any individual about 
whom information is maintained. Although the act does not specify the 
procedures agencies should employ to ensure information security, 
subsequent legislation and guidance from the Office of Management and 
Budget (OMB) and the National Institute of Standards and Technology 
(NIST) provide specific procedures that agencies should take to protect 
the security of information. 

For example, the Federal Information Security Management Act (FISMA) 
requires that agencywide information security programs include detailed 
plans for providing adequate information security for networks, 
facilities, and systems or groups of information systems, as 
appropriate. OMB requires that agencies prepare IT system security 
plans consistent with NIST guidance, and that these plans contain 
specific elements, including rules of behavior for system use, required 
training in security responsibilities, personnel controls, technical 
security techniques and controls, continuity of operations, incident 
response, and system interconnection.[Footnote 38] In addition, OMB 
requires that agency management officials formally authorize their 
information systems to process information and thereby accept the risk 
associated with their operation. This management authorization 
(accreditation) is to be supported by a formal technical evaluation 
(certification) of the management, operational, and technical controls 
established in an information system's security plan. NIST guidelines 
detail the requirements for certification and accreditation, including 
the requirement that the certification documents include the system 
security plan, risk assessment, and tested contingency plan.[Footnote 39] 
In addition, NIST guidance on recommended security controls for 
federal information systems requires agencies to develop, implement, 
and test contingency plans for their systems and risk assessments. 

Table 10 lists each of the security requirements that we evaluated and 
the results of our evaluation for each of the five data mining efforts 
included in this report. 

Table 10: Questions Related to Agency Actions Safeguarding and Ensuring 
the Quality of Records Containing Personal Information: 

Question: Has the agency performed a risk assessment to determine the 
information system vulnerabilities, identify threats, and develop 
countermeasures to those threats? 
Yes: ACDE; 
No: B. 

Question: Has the agency developed a security plan for each system? 
Yes: CD; 
Partial: AE; 
No: B. 

Question: Does the plan address--rules of the system? 
Yes: ACDE; 
No: B. 

Question: Does the plan address--training? 
Yes: ACDE; 
No: B. 

Question: Does the plan address--personnel controls? 
Yes: ACDE; 
No: B. 

Question: Does the plan address--incident response capability? 
Yes: CD; 
Partial: AE; 
No: B. 

Question: Does the plan address--system interconnection? 
Yes: ACDE; 
No: B. 

Question: Has the agency had the system certified and accredited by 
management? 
Yes: ADE; 
No: B; 
Exempt: C[A]. 

Question: Did the certification documentation include an approval 
document including a statement of risk acceptance? 
Yes: ADE; 
No: B; 
Exempt: C[A]. 

Question: Has the agency performed testing and evaluation of the 
data-mining system(s)? 
Yes: DE; 
Partial: AC[A]; 
No: B. 

Question: Was the testing and evaluation--conducted no less than 
annually? 
Yes: DE; 
Partial: AC[A]; 
No: B. 

Question: Was the testing and evaluation--conducted using NIST Special 
Publication 800-26 or appropriate alternative? 
Yes: DE; 
Partial: A; 
No: BC. 

Question: Was the testing and evaluation--conducted using an element of 
internal penetration or vulnerability testing? 
Yes: CDE; 
No: AB. 

Question: Does the agency have a tested contingency plan for the 
system? 
Yes: CE; 
Partial: AD; 
No: B. 

Question: Did the agency take steps to ensure the accuracy, relevance, 
timeliness, and completeness of the data it maintains? 
Yes: B; 
Partial: E; 
No: A; 
Exempt: CD. 

Legend: 

A: RMA's data mining effort: 

B: State's Citibank Custom Reporting System: 

C: IRS's Reveal effort: 

D: FBI's Foreign Terrorist Tracking Task Force effort: 

E: SBA's Loan/Lender Monitoring System: 

Source: GAO analysis of agency information. 

[A] The IRS Reveal effort became operational in February 2005 and has 
interim authority to operate--not full certification and accreditation. 
IRS is currently testing the system. 

[End of table]

[End of section]

Appendix VIII: Comments from the U.S. Department of Agriculture: 

USDA: 

United States Department of Agriculture:
Risk Management Agency:
1400 Independence Avenue, SW:
Stop 0806: 
Washington, DC 20250-0806: 

Ms. Linda Koontz:
Director, Information Management: 
Government Accountability Office: 
441 G Street, NW Rm. 4075: 
Washington, DC 20548: 

JUL 22 2005: 

Dear Ms. Koontz: 

Attached is the Risk Management Agency's (RMA) response to your draft 
report titled, "Data Mining: Agencies Have Taken Key Steps to Protect 
Privacy in Selected Efforts, but Significant Compliance Issues Remain." 
In addition to the attached written response, RMA also provided 
technical comments to GAO via email. RMA appreciates the opportunity to 
provide comments. If you have any questions regarding our response, 
please contact Heather Manzano at 202-690-5886. 

Sincerely, 

Signed by: 

Ross J. Davidson, Jr.: 
Administrator:
Risk Management Agency: 

Attachment: 

The Risk Management Agency Administers And Oversees All Programs 
Authorized Under The Federal Crop Insurance Corporation: 

An Equal Opportunity Employer: 

U.S. Department of Agriculture Statement of Action on the U.S. 
Government Accountability Office Draft Report GAO-05-866 "DATA MINING: 
Agencies Have Taken Key Steps to Protect Privacy in Selected Efforts, 
but Significant Compliance Issues Remain"

July 22, 2005: 

Data mining is an effort that is being used increasingly by the federal 
government. This effort involves the use of personal information, which 
can originate from various sources. GAO was asked to describe the 
characteristics of five federal data mining efforts and to determine 
whether agencies are providing adequate privacy and security protection 
for the information systems and the individuals potentially affected by 
these data mining efforts. 

As a result of the study, GAO developed six recommendations for the 
United States Department of Agriculture (USDA) specific to the Risk 
Management Agency (RMA). The following addresses those recommendations. 

GAO Recommendation 1: 

Provide the required Privacy Act notices to individuals, including 
producers, insurance agents, and adjusters, when personal information 
is collected from them. 

USDA Response: 

RMA will issue an Informational Memorandum to private insurance 
companies who deliver the Federal crop insurance program to ensure they 
are aware of their responsibilities regarding notification at the point 
of personal information data collection, as required in the Privacy 
Act. 

GAO Recommendation 2: 

Apply the appropriate information security measures defined in OMB and 
NIST guidance to the systems used in the RMA data mining effort, 
including the development of a system security plan, a tested 
contingency plan, and regular testing and evaluation of the systems 
used in this effort. 

USDA Response: 

RMA has applied the appropriate information security measures defined 
in OMB and NIST guidance to the systems used in the RMA data mining 
effort. This project has a system security plan, and RMA performs 
regular testing and evaluation of the system. RMA will be testing their 
existing data mining contingency plan before the end of the year. 

GAO Recommendation 3: 

Develop and implement procedures that ensure the accuracy, relevance, 
timeliness, and completeness of personal information used in the RMA 
data mining effort to make determinations about individuals. 

USDA Response: 

RMA has procedures in place to ensure accuracy of information used in 
the data mining effort. RMA performs a series of edits and validations 
on data submitted by the insurance companies. Accepted data is then 
sent to the Data Warehouse on a monthly basis. This data is used in 
various scenario analyses. Reports are generated from these analyses 
and are provided to RMA for review and action in accordance with 
procedures. Any data observations that appear anomalous through the 
data mining effort, are reviewed by RMA prior to a report being issued. 
If appropriate, RMA may make changes to the edit/validation process 
based on these observations. The goal of this data review process is to 
clarify and, if possible, resolve any data discrepancies prior to 
generating a report. 

GAO Recommendation 4: 

Revise the privacy impact assessment for the RMA data mining effort to 
comply with OMB guidance, including analyses of the intended use of the 
information it collects, with whom the information will be shared, how 
the information is to be secured, opportunities for impacted 
individuals to comment, and the choices made by the agency as a result 
of the assessment. 

USDA Response: 

RMA is finalizing the template that will be used for all of its privacy 
impact analyses (PIA). All new PIAs and updated reviews will utilize 
this new format. 

GAO Recommendation 5: 

Have the completed privacy impact assessment approved by the Chief 
Information Officer or equivalent official. 

USDA Response: 

As in the past, the RMA Chief Information Officer and his designated 
reviewers will evaluate and review the completed PIAs. The new PIA 
format requires the signatures of the CIO, Freedom of Information Act 
(FOIA) Officer, system owner, and project manager. 

GAO Recommendation 6: 

Make the completed privacy impact assessment available to the public, 
as appropriate. 

USDA Response: 

The RMA CIO, in cooperation with the FOIA Officer, will make the PIAs 
available to the public, as appropriate. 

General Comments: 

RMA agrees with the majority of GAO's recommendations and believes that 
the agency has taken steps to put many of those recommendations into 
action. However, RMA does not agree with the numerous statements that 
indicate that the agency did not take steps to ensure the accuracy, 
relevance, timeliness, and completeness of the data it maintains and 
uses to make determinations about individuals. Program officials are 
aware of the need to ensure the quality of data, and RMA's upfront edit 
and validation does provide reasonable assurance of the accuracy and 
adequacy of the data prior to sending it to the data warehouse. 

In addition, RMA respectfully disagrees with GAO's assessment that RMA 
has only partially achieved the completion of a plan that addresses 
incident response capability. During the on-site review, GAO was 
provided a copy of RMA's policy that addresses this subject. It was 
also addressed in the data mining system security plan. 

[End of section]

Appendix IX: Comments from the Department of the Treasury: 

DEPARTMENT OF THE TREASURY: 
WASHINGTON, D.C. 20220: 

JUL 29 2005: 

Ms. Linda D. Koontz:
Director, Information Management: 
Government Accountability Office: 
Washington, DC: 

Dear Ms. Koontz: 

Thank you for the opportunity to review Government Accountability 
Office (GAO) Draft Report GAO-05-866, Data Mining: Agencies Have Taken 
Steps to Protect Privacy in Selected Efforts, but Significant 
Compliance Issues Remain. The Department's response to the specific 
recommendations made in the Report to the Secretary of the Treasury 
follows: 

Recommendation 1: Apply the appropriate information security measures 
defined in the Office of Management and Budget (OMB) and the National 
Institute of Standards and Technology (NIST) guidance to the systems 
used in the Reveal data mining effort: 

The Department of the Treasury's Internal Revenue Service (IRS) 
security procedures are in compliance with OMB, NIST, and Treasury 
guidance. Reveal is a Commercial Off-the-Shelf (COTS) software product 
and is a pilot system which resides on the Criminal Investigation 
infrastructure or System Domain General Support System (GSS). For the 
Reveal System, IRS granted an Interim Authorization to Operate (IATO), 
following the guidance outlined in the NIST 800-37, Guide for the 
Security Certification and Accreditation of Federal Information 
Systems. In addition, IRS granted an IATO for the infrastructure which 
is currently in the accreditation phase of the NIST compliant 
Certification & Accreditation (C&A) process. 

Recommendation 2: Revise the privacy impact assessment for the IRS 
Reveal system to comply with OMB guidance, including analyses of the 
information to be collected, the purposes of the collection, the 
intended use of the information, how the information is to be secured, 
and opportunities for impacted individuals to comment. 

Since the Reveal system is in pilot, a new PIA is required by IRS 
policy before launching the system into full deployment. At that time, 
the IRS will assess and document changes or modifications to the system 
as a combination of the pilot results, the PIA, and the security 
reviews and certification. Prior to conducting this new PIA, the IRS 
will be revising the current PIA to incorporate all OMB guidance, in 
particular adding the question of what choices were made as a result of 
conducting the PIA. 

However, it is also important to note that the IRS has been completing 
PIAs for its systems since 1995 and that the IRS PIA is far more 
comprehensive in its questions and assessments than the OMB guidance 
(19 questions compared with 8 on the OMB PIA). 

Finally, we refer you to the IRS Reveal PIA question 15 which describes 
and analyzes why the information was collected, the purpose of the 
collection, and the intended use of the information. Reveal is an IRS 
Criminal Investigation Division analytical application that provides 
users with an enhanced capability to access, analyze, and interpret 
large volumes of disparate data sources, through a single point of 
access, for the purpose of identifying and developing criminal cases. 
The system is used to identify potential criminal investigations of 
individuals or groups in support of the overall IRS Mission. Reveal 
supports the Criminal Investigation mission by identifying persons or 
organizations involved in potential criminal violations of the Internal 
Revenue Code and related financial crimes in a manner that fosters 
confidence in the tax system and compliance with the law. 

We also direct you to the IRS PIA questions 8 through 13 in response to 
how the information was secured. These questions established a strong 
framework of administrative and technical controls. In addition, the 
IRS PIA is an integral component of the security certification of all 
new IRS systems. 

Recommendation 3: Make the completed privacy impact assessment 
available to the public, as appropriate. 

The current Reveal PIA is available on the IRS Website. Revisions to 
the PIA will be posted to the public website as well. 

Sincerely, 

Signed by: 

Ira L. Hobbs:
Chief Information Officer: 

[End of section]

Appendix X: Comments from the Department of State: 

United States Department of State: 
Assistant Secretary and Chief Financial Officer: 
Washington, D.C. 20520: 

Ms. Jacquelyn Williams-Bridgers:
Managing Director:
International Affairs and Trade: 
Government Accountability Office: 
441 G Street, N.W.
Washington, D.C. 20548-0001: 

JUL 25 2005: 

Dear Ms. Williams-Bridgers: 

We appreciate the opportunity to review your draft report, "DATA 
MINING: Agencies Have Taken Key Steps to Protect Privacy in Selected 
Efforts, but Significant Compliance Issues Remain," GAO Job Code 
310715. 

The enclosed Department of State comments are provided for 
incorporation with this letter as an appendix to the final report. 

If you have any questions concerning this response, please contact 
Margaret Colaianni, Procurement Analyst, Bureau of Administration, at 
(202) 736-4985. 

Sincerely,

Signed by: 

Sid Kaplan (Acting): 

cc: GAO - Marcia Washington; 
A - Frank Coulter; 
State/OIG - Mark Duda: 

Department of State Comments on GAO Draft Report Data Mining: Agencies 
Have Taken Key Steps to Protect Privacy in Selected Efforts, but 
Significant Compliance Issues Remain (GAO-05-866, GAO Code 310715): 

I. Introduction: 

Thank you for allowing us to comment on your draft report entitled, 
"Data Mining: Agencies Have Taken Key Steps to Protect Privacy in 
Selected Efforts, but Significant Compliance Issues Remain". We have 
responded below to the single recommendation to State. In addition, we 
have also recommended some changes to the text of the draft report 
(highlighted in bold with italics) that we believe will enhance the 
consistency between the report and its recommendations. 

We appreciate that you have not recommended that the Department conduct 
a Privacy Impact Assessment with respect to the Citibank purchase card 
system. As your report points out, OMB's E-Gov implementing guidelines 
specify that agencies need not prepare Privacy Impact Assessments for 
systems "where information relates to internal government operations." 
(Similarly, the OMB guidelines clarify that Privacy Impact Assessments 
are required only when an agency is (a) "developing or procuring IT 
systems . . . that collect, maintain or disseminate information in 
identifiable form about members of the public" or (b) collecting 
information "for 10 or more persons excluding . . . employees of the 
federal government. . . ." (Emphasis added.)) Many of our recommended 
changes reflect our efforts to clarify that the Department is not 
required to conduct a Privacy Impact Assessment of the Citibank system. 

We also appreciate that you have not made any recommendations about the 
Department's compliance with the Federal Information Security 
Management Act of 2002 (FISMA) vis-a-vis the Citibank purchase card. It 
is not clear that FISMA necessarily applies to the Citibank system. 

II. Department of State action in response to GAO recommendation: 

Notify purchase card participants of the legal basis under which the 
Department collects their personal information, as required. In response 
to this recommendation, the Department of State will take the necessary 
steps to notify purchase card participants of the legal basis under 
which the Department collects their personal information necessary for 
the operation and management of our worldwide Purchase Card program. 

[End of section]

Appendix XI: Comments from the Small Business Administration: 

U.S. SMALL BUSINESS ADMINISTRATION: 
WASHINGTON, D.C. 20416: 

Linda D. Koontz: 
Director: 
Information Management Issues:
U.S. Government Accountability Office: 
Washington, DC 20548-0001: 

Dear Ms. Koontz: 

Thank you for the opportunity to review and comment on the Government 
Accountability Office's (GAO) draft report on Data Mining: Agencies 
Have Taken Key Steps to Protect Privacy in Selected Efforts, but 
Significant Compliance Issues Remain (GAO-05-866). We appreciate GAO's 
acknowledgement that the U.S. Small Business Administration (SBA) has 
substantially complied with existing guidance and regulatory 
requirements governing privacy and information security in operating 
our Loan and Lender Monitoring System. 

With regard to the three recommendations contained in the draft report, 
SBA provides the following response: 

1. GAO Recommendation: Amend the system of records notice regarding its 
data mining effort to clearly identify the individual responsible for 
the effort, the process by which individuals can request notification 
that the system includes records about them, and the procedures 
individuals should use to review records pertaining to them. 

SBA Response: SBA believes the Agency System of Records is 
comprehensive but will review the system of records for the Loan System 
to determine if clarifications are necessary. 

2. GAO Recommendation: Complete a privacy impact assessment for the data 
mining effort that complies with OMB guidance, including analyses of 
the information to be collected, the purposes of the collection, the 
intended use of the information, how the information is to be secured, 
opportunities for impacted individuals to comment, and the choices made 
by the agency as a result of the assessment. 

SBA Response: As noted in the draft report, SBA plans to issue a 
revised privacy impact assessment (PIA) for the Loan and Lender 
Monitoring System later this fiscal year that will address GAO's 
recommendation. 

3. GAO Recommendation: Make the completed privacy impact assessment 
available to the public, as appropriate. 

SBA Response: As with the current PIA for SBA's Loan and Lender 
Monitoring System, the revised assessment will be available to the 
public, as appropriate. 

In addition, certain factual clarifications were identified. They are 
summarized in the enclosure with this letter. 

We appreciate the opportunity to work with your staff during the 
conduct of this audit. Should you have any questions, please contact C. 
Edward Rowe, Assistant Administrator for Congressional and Legislative 
Affairs at (202) 205-6700. 

Sincerely,

Signed for: 

Michael W. Hager: 
Associate Deputy Administrator for Office of Capital Access: 

Enclosure: 

[End of section]

Appendix XII: GAO Contact and Staff Acknowledgments: 

GAO Contact: 

Linda D. Koontz (202) 512-6240: 

Acknowledgments: 

In addition to the contact named above, Barbara Collier, Neil Doherty, 
Mirko Dolak, Nancy Glover, Alison Jacobs, Kathleen S. Lovett, David 
Plocher, James R. Sweetman, Jr., and Marcia Washington made key 
contributions to this report. 

(310715): 

FOOTNOTES

[1] For purposes of this report, we define "personal information" 
consistent with the Privacy Act's definition of a "record," which 
includes all information associated with an individual and includes 
both identifying information and nonidentifying information. 
Identifying information, which can be used to locate or identify an 
individual, includes name, aliases, Social Security number, e-mail 
address, driver's license identification number, and agency-assigned 
case number. In this report, we refer to identifying personal 
information as personal identifiers. Nonidentifying personal 
information includes age, education, finances, criminal history, 
physical attributes, and gender. 

[2] We selected efforts that were intended to meet at least one of the 
following purposes: improving service or performance; detecting fraud, 
waste, and abuse; detecting criminal activities or patterns; or 
analyzing intelligence and detecting terrorist activities. 

[3] GAO, Data Mining: Federal Efforts Cover a Wide Range of Uses, 
GAO-04-548 (Washington, D.C.: May 4, 2004). 

[4] For more information on the uses of data mining in GAO audits, see 
GAO, Data Mining: Results and Challenges for Government Programs, 
Audits, and Investigations, GAO-03-591T (Washington, D.C.: Mar. 25, 
2003). 

[5] GAO-04-548. 

[6] U.S. Department of Health, Education, and Welfare, Records, 
Computers and the Rights of Citizens, Report of the Secretary's 
Advisory Committee on Automated Personal Data Systems (July 1973). 

[7] Markle Foundation, Creating a Trusted Network for Homeland Security 
(New York: December 2003). 
http://www.markletaskforce.org/Report2_Full_Report.pdf (downloaded Mar. 
28, 2005). 

[8] 5 U.S.C. § 552a (a)(5). 

[9] Federal Information Security Management Act of 2002, Title III, 
E-Government Act of 2002, Pub. L. No. 107-347 (Dec. 17, 2002). 

[10] E-Government Act of 2002, Pub. L. No. 107-347 (Dec. 17, 2002), 
sec. 208. 

[11] Office of Management and Budget, Memorandum M-03-22, Guidance for 
Implementing the Privacy Provisions of the E-Government Act of 2002 
(Washington, D.C.: Sept. 26, 2003). 

[12] GAO, Privacy Act: OMB Leadership Needed to Improve Agency 
Compliance, GAO-03-304 (Washington, D.C.: June 30, 2003). 

[13] The agency rules claiming exemptions from designated provisions of 
the Privacy Act are published in the Code of Federal Regulations at 7 
CFR §1.123 (RMA), 28 CFR §16.96 (FBI), and 31 CFR §1.36 (IRS). 

[14] As indicated in table 3, SBA's effort also uses information 
provided by commercial sources. However, the commercial information 
provided to SBA does not include personal information on individuals. 

[15] The Office of the Comptroller of the Currency, a component of the 
Department of the Treasury, is responsible for oversight of nationally 
chartered banks and state and federally chartered savings associations. 
The office is responsible for auditing federally insured institutions 
under its jurisdiction annually. The audit, in part, evaluates the 
institution's safety and soundness; determines compliance with 
applicable laws, rules, and regulations; and ensures that it maintains 
capital commensurate with its risk. 

[16] The recent incident involved Bank of America's loss of data 
regarding the government travel card program. 

[17] Under OMB guidance, an agency may decide not to make the PIA 
document or summary publicly available to the extent that publication 
would raise security concerns or reveal classified (i.e., national 
security) or sensitive information (e.g., potentially damaging to a 
national interest, law enforcement effort, or competitive business 
interest) contained in an assessment. 

[18] See GAO, Data Mining: Federal Efforts Cover a Wide Range of Uses, 
GAO-04-548 (Washington, D.C.: May 4, 2004). 

[19] The Risk Management Agency is a component of the U.S. Department 
of Agriculture. 

[20] The federal crop insurance program is designed to protect farmers 
from financial losses caused by events such as droughts, floods, 
hurricanes, and other natural disasters as well as losses resulting 
from a drop in crop prices. RMA administers and oversees the federal 
crop insurance program. 

[21] The Bureau of Land Management is a Department of the Interior 
agency that manages 264 million surface acres of public lands located 
primarily in 12 western states, including Alaska. 

[22] In 1998, GSA awarded contracts to five major banks through the GSA 
SmartPay program to provide federal agencies with purchase cards as 
well as travel cards and cards for fleet-related expenses. The 
participating banks are Bank of America, Bank One (now J.P. Morgan 
Chase), Citibank, Mellon Bank, and U.S. Bank. Individual agencies 
select one of the participating banks and issue a task order to the 
bank based on the terms of the master contract with GSA. 

[23] Purchase cards are bank charge cards used primarily for purchases 
totaling less than $2,500. 

[24] State is the lead U.S. foreign affairs agency and operates more 
than 250 posts around the world. State employees use purchase cards to 
make work-related purchases in support of State's mission. 

[25] Users can use subject-based queries to receive reports on an 
individual account's expenditures and can use pattern-based queries to 
determine, among other things, which vendors employees make purchases 
from. 

[26] Account data from the purchase card program are not covered by a 
system of records notice. See p. 17 for more information. 

[27] The Internal Revenue Service is a bureau of the Department of the 
Treasury. 

[28] The Bank Secrecy Act requires banks and other financial 
institutions to keep records and file reports that are useful in 
criminal, tax, and regulatory investigations or proceedings. 

[29] FinCEN's mission is to safeguard the financial system from 
financial crime and abuses, including terrorist financing, money 
laundering, and other illicit activity. 

[30] Suspicious Activity Reports are filed by (1) financial 
institutions, (2) money service businesses, (3) security and futures 
industries, and (4) casinos and card clubs. 

[31] The task force's mission is to assist federal law enforcement and 
intelligence agencies in locating foreign terrorists and their 
supporters who are in or have visited the United States, and to provide 
information to other law enforcement and intelligence community 
agencies that can lead to their surveillance, prosecution, or removal. 

[32] The task force's partner agencies include Immigration and Customs 
Enforcement, the Department of Defense Counterintelligence Field 
Activity office, the Office of Personnel Management, and members of the 
intelligence community. 

[33] Commercial data are maintained by private companies and can 
include personally identifiable information that either identifies an 
individual or is directly attributed to an individual, such as name, 
address, and telephone number. 

[34] Under the 7(a) loan program, SBA can provide guarantees on loans 
made by participating lenders authorized by SBA. The 7(a) program is 
intended for small business borrowers who could not otherwise obtain 
credit under suitable terms and conditions from the private sector 
without an SBA guarantee. SBA guarantees approximately $14 to $16 
billion in lender-originated 7(a) loans each year, of which the 
SBA-guaranteed portion is only approximately $9 to $10 billion. Upon default by a 
borrower, the participating lender may request that SBA purchase the 
guaranteed portion of a loan. 

[35] The 504 program provides long-term, fixed-rate financing to small 
businesses for expansion or modernization, primarily for real estate 
and major assets such as heavy equipment. The 504 financing is 
delivered through nonprofit corporations established to contribute to 
the economic development of their communities. SBA guarantees about $4 
billion in 504 loans annually. 

[36] A loan is severely delinquent when payments on the loan are past 
due by 60 or more days. 

[37] Fair Isaac is a company that provides business and consumer 
analytical services, including credit ratings. 

[38] NIST, Guide for the Security Certification and Accreditation of Federal 
Information Systems, Special Publication 800-37 (May 2004) and Office 
of Management and Budget, Management of Federal Information Resources, 
Circular No. A-130, Revised, Transmittal Memorandum No. 4, Appendix 
III, "Security of Federal Automated Information Resources" (Nov. 28, 
2000). 

[39] NIST, Guide for the Security Certification and Accreditation of 
Federal Information Systems, Special Publication 800-37 (May 2004). 

GAO's Mission: 

The Government Accountability Office, the investigative arm of 
Congress, exists to support Congress in meeting its constitutional 
responsibilities and to help improve the performance and accountability 
of the federal government for the American people. GAO examines the use 
of public funds; evaluates federal programs and policies; and provides 
analyses, recommendations, and other assistance to help Congress make 
informed oversight, policy, and funding decisions. GAO's commitment to 
good government is reflected in its core values of accountability, 
integrity, and reliability. 

Obtaining Copies of GAO Reports and Testimony: 

The fastest and easiest way to obtain copies of GAO documents at no 
cost is through the Internet. GAO's Web site (www.gao.gov) contains 
abstracts and full-text files of current reports and testimony and an 
expanding archive of older products. The Web site features a search 
engine to help you locate documents using key words and phrases. You 
can print these documents in their entirety, including charts and other 
graphics. 

Each day, GAO issues a list of newly released reports, testimony, and 
correspondence. GAO posts this list, known as "Today's Reports," on its 
Web site daily. The list contains links to the full-text document 
files. To have GAO e-mail this list to you every afternoon, go to 
www.gao.gov and select "Subscribe to e-mail alerts" under the "Order 
GAO Products" heading. 

Order by Mail or Phone: 

The first copy of each printed report is free. Additional copies are $2 
each. A check or money order should be made out to the Superintendent 
of Documents. GAO also accepts VISA and Mastercard. Orders for 100 or 
more copies mailed to a single address are discounted 25 percent. 
Orders should be sent to: 

U.S. Government Accountability Office

441 G Street NW, Room LM

Washington, D.C. 20548: 

To order by Phone: 

Voice: (202) 512-6000: 

TDD: (202) 512-2537: 

Fax: (202) 512-6061: 

To Report Fraud, Waste, and Abuse in Federal Programs: 

Contact: 

Web site: www.gao.gov/fraudnet/fraudnet.htm

E-mail: fraudnet@gao.gov

Automated answering system: (800) 424-5454 or (202) 512-7470: 

Public Affairs: 

Jeff Nelligan, managing director,

NelliganJ@gao.gov

(202) 512-4800

U.S. Government Accountability Office,

441 G Street NW, Room 7149

Washington, D.C. 20548: