Big Data 101: Using Large-Scale Data Mining to Find Fraud
You may have heard the term “big data” or “data mining,” but what do those terms mean? Today’s WatchBlog sheds light on how GAO analyzes large amounts of data to identify instances of potential improper payments or fraud.
What Is Data Mining and How Does GAO Use It?
Data mining allows us to quickly identify relevant patterns in large databases, typically compiled from multiple sources. We’ve used this technique multiple times to identify potential improper payments or fraud on a large scale. For example, we’ve used data mining to
- Identify people who may be receiving multiple federal payments. For example, by comparing data from different sources—such as benefit rolls for various programs and death files—we identified 59,251 individuals who received concurrent disability payments totaling $3.5 billion in fiscal year 2013, and over 2,600 people with potentially invalid identifying information who received $21 million in relief following Hurricane Sandy;
- Identify outliers or other particular patterns. For example, we found that about 83,000 Department of Defense employees and contractors who held or were determined eligible for secret, top secret, or other clearances had more than $730 million in unpaid federal tax debt as of June 30, 2012; and
- Create maps that allow us to easily determine whether there are suspect patterns. For example, the map below shows an expected pattern: more of the people who received relief funds following Hurricane Sandy lived along the coast.
(Excerpted from interactive graphic in GAO-15-15)
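For readers curious what comparing data from different sources can look like in practice, here is a minimal sketch in Python. It is an illustration only, not GAO's actual methodology: the program names, person identifiers, amounts, and function names are all hypothetical, and real matching work involves far messier records and careful follow-up before any conclusion about improper payments or fraud is drawn.

```python
from collections import defaultdict

# Hypothetical benefit rolls: program name -> list of (person_id, amount paid).
rolls = {
    "program_a": [("P1", 1000.0), ("P2", 500.0)],
    "program_b": [("P1", 800.0), ("P3", 300.0)],
}

# Hypothetical death file: identifiers of people recorded as deceased.
death_file = ["P3", "P9"]


def find_concurrent_recipients(rolls):
    """Return {person_id: total paid} for people appearing on more than one roll."""
    programs_by_person = defaultdict(set)
    total_by_person = defaultdict(float)
    for program, payments in rolls.items():
        for person_id, amount in payments:
            programs_by_person[person_id].add(program)
            total_by_person[person_id] += amount
    return {
        person_id: total_by_person[person_id]
        for person_id, programs in programs_by_person.items()
        if len(programs) > 1
    }


def find_payments_to_deceased(rolls, death_file):
    """Return a sorted list of paid person_ids that also appear in the death file."""
    deceased = set(death_file)
    return sorted(
        {
            person_id
            for payments in rolls.values()
            for person_id, _ in payments
            if person_id in deceased
        }
    )


concurrent = find_concurrent_recipients(rolls)
paid_after_death = find_payments_to_deceased(rolls, death_file)
```

A match from a comparison like this is only a lead. Someone still has to check whether the overlapping payments were actually allowed and whether the identifiers refer to the same person.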
What’s Next for Big Data?
In January 2013, we hosted a forum on using data analytics for oversight and law enforcement agencies. Participants discussed how data mining can also be used to help prevent fraud. For example, predictive analytics—a type of sophisticated data mining—can identify fraudulent claims before they are paid. This may help end the “pay-and-chase” model, where agencies and law enforcement must track down and try to recover fraudulently obtained funds after the money has already gotten into fraudsters’ hands.
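To make the idea of checking claims before payment concrete, here is a small Python sketch. It swaps in a very simple stand-in for predictive analytics: a robust outlier score based on the median and the median absolute deviation of past claim amounts. Real predictive systems would likely use trained models and many more variables. The claim IDs, amounts, threshold, and function names are assumptions made up for illustration.

```python
from statistics import median


def robust_score(amount, history):
    """Distance of an amount from the historical median, in units of MAD."""
    med = median(history)
    mad = median(abs(x - med) for x in history)
    if mad == 0:
        # All history is (nearly) identical; any different amount stands out.
        return 0.0 if amount == med else float("inf")
    return abs(amount - med) / mad


def screen_claims(claims, history, threshold=5.0):
    """Split claims into those to pay and those to hold for review first."""
    to_pay, to_review = [], []
    for claim_id, amount in claims:
        if robust_score(amount, history) > threshold:
            to_review.append(claim_id)
        else:
            to_pay.append(claim_id)
    return to_pay, to_review


# Hypothetical past claim amounts and new incoming claims.
history = [100, 120, 110, 130, 90, 105, 115]
claims = [("C1", 125), ("C2", 900), ("C3", 95)]

to_pay, to_review = screen_claims(claims, history)
```

The point of the sketch is the ordering: unusual claims are set aside for a closer look before any money goes out, instead of being chased afterward.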
Interested in learning more about big data and data mining? Check out our Government Data Sharing Community of Practice where you can find the minutes from prior meetings, register for upcoming events, and sign up to receive e-mails.
Comments on GAO’s WatchBlog? Contact blog@gao.gov.