Anatomy of a Data Leak - Capital One edition

On July 17, 2019, an unknown individual sent the following email to responsibledisclosure@capitalone.com, informing the company of what appeared to be leaked Capital One data published in a public GitHub gist.

Screenshot of email to Capital One by tipster

Capital One inspected the gist and found that it did in fact reference the IP address of a server they owned. After a thorough investigation, with involvement of the FBI, it was revealed that an intruder had made off with personal data of over 100 million Capital One customers, including names, addresses, phone numbers, email addresses, dates of birth and self-reported income, as well as some social security and bank account numbers.

In this post we’ll examine the technical exploits that facilitated the attack and how they relate to the security of data in the public cloud. What is remarkable about this incident is that it only took a few hours for the attacker to steal all this data, and despite Capital One having years of experience building on AWS, an outside tipster had to alert them to the breach.

A brief history of Capital One and AWS

In 2012, Capital One began an 8-year long infrastructure overhaul, moving from physical data centers managed by the bank to the public cloud on Amazon Web Services.

“We are truly all in on the cloud, and AWS has been instrumental in enabling us to take full advantage of the benefits of being in the cloud,” says Chris Nims, senior vice president of cloud and productivity engineering at Capital One. “Going all in on the cloud has enabled both instant provisioning of infrastructure and rapid innovation. We are able to manage data at a much larger scale and unlock the power of machine learning to deliver enhanced customer experiences.” [1]

They are distinguished as the first US bank to adopt public cloud infrastructure, and grew their technology organization to over 11,000 people, giving engineers time and space to get trained on the job for AWS certifications and engage directly with AWS subject matter experts during company events.

Outside of AWS, Capital One has a strong reputation in data engineering, has open sourced many of their projects [2], and invests heavily in machine learning and data operations. Suffice it to say, they are not a novice organization when it comes to AWS and managing data.

And yet a single individual, Paige Thompson, was able to break through “bank grade” security to steal data on a hundred million customers, without anyone at Capital One even noticing.

The “bomb vest”

Paige had a habit of scanning servers on AWS looking for weaknesses. She probed tens of millions of servers, looking for vulnerabilities in security groups (firewalls) or software. In Capital One’s account, Paige found an EC2 instance running Apache, acting as a Web Application Firewall (WAF) [3]. WAFs inspect HTTP requests for common attack signatures, like those in the OWASP Top 10 [4], before forwarding the request to a backend for processing.

“Ive basically strapped myself with a bomb vest, fucking dropping capital ones dox and admitting it” - from a Slack post by Paige to a friend

Paige was able to “trick” this server into issuing HTTP requests against AWS’ internal APIs, instead of forwarding them to one of CapOne’s backend servers. Calling this a trick is an overstatement, though: WAFs are basically reverse proxies, and unless they are configured with allow/block lists, forwarding requests is part of their normal operation.

This type of attack is categorized as “Server Side Request Forgery” (SSRF), due to the exploit vector of being able to make a server issue a request on behalf of the attacker. From a security point of view, the request appears to be coming from a server owned by an organization, instead of a potentially malicious source. Since most security is still based on a “perimeter”, a server you own has a higher level of trust, and is often granted greater privileges to access other internal services than random requests from the Internet. (FWIW “zero trust” is aiming to change this paradigm).
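To make the forwarding problem concrete, here is a minimal Python sketch of the kind of destination check a reverse proxy could apply before forwarding a request. It is an illustration under assumptions, not a description of Capital One's WAF: the function name `is_forwardable` and the `ALLOWED_BACKENDS` host are hypothetical, and a real proxy would also have to deal with DNS resolution and redirects.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allow list of backend hosts the proxy may forward to.
ALLOWED_BACKENDS = {"app.internal.example.com"}


def is_forwardable(url: str) -> bool:
    """Return True only if the URL targets an allow-listed backend."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: accept only explicitly allow-listed names.
        return host in ALLOWED_BACKENDS
    # IP literals in link-local, loopback or private ranges are refused.
    # 169.254.169.254 (the metadata service) is a link-local address.
    if addr.is_link_local or addr.is_loopback or addr.is_private:
        return False
    return host in ALLOWED_BACKENDS


for candidate in (
    "http://app.internal.example.com/login",
    "http://169.254.169.254/latest/meta-data/",
):
    print(candidate, is_forwardable(candidate))
```

Without a check along these lines, the proxy treats the metadata address like any other destination.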

What is unique about servers running on AWS is that they all have access to an internal API managed by Amazon called the “EC2 metadata service”. This service receives HTTP requests coming from an EC2 instance (server) and returns information about the instance itself: the instance size, the virtual image it is running, when it was started, and so on. Crucial to this attack is that, when a role is attached to the instance, the metadata service also returns temporary AWS credentials (an access key, a secret key and a session token) that processes on the EC2 instance can use to access other AWS services. This is useful for running applications on an EC2 instance, as they can inherit the credentials assigned to the instance itself, rather than having to package credentials as part of the application. Since credentials don’t need to be managed for applications and instances independently, teams can now treat them as a single entity, and this simplifies infrastructure security.
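As a rough illustration of what such a request looks like, the sketch below builds the metadata URL for a role's credentials and parses a response. The address, path and JSON field names follow AWS's documentation for the original (version 1) metadata service; the role name, the helper names and all values in the sample response are placeholders made up for this example.

```python
import json

METADATA_BASE = "http://169.254.169.254/latest/meta-data"


def credentials_url(role_name: str) -> str:
    """Build the metadata URL that returns temporary credentials for a role."""
    return f"{METADATA_BASE}/iam/security-credentials/{role_name}"


def parse_credentials(body: str) -> dict:
    """Extract the fields an AWS client needs from the JSON response."""
    doc = json.loads(body)
    return {
        "access_key": doc["AccessKeyId"],
        "secret_key": doc["SecretAccessKey"],
        "session_token": doc["Token"],
        "expires": doc["Expiration"],
    }


# Fake response in the shape the service returns (values are placeholders).
SAMPLE_RESPONSE = """{
  "Code": "Success",
  "Type": "AWS-HMAC",
  "AccessKeyId": "ASIAEXAMPLEKEYID",
  "SecretAccessKey": "example-secret",
  "Token": "example-session-token",
  "Expiration": "2019-03-22T23:59:59Z"
}"""

creds = parse_credentials(SAMPLE_RESPONSE)
print(credentials_url("example-role"))
print(sorted(creds))
```

On an instance, a plain unauthenticated HTTP GET to that URL is enough to obtain the response, which is why being able to make the server issue a single request is sufficient for an attacker.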

In the case of CapOne’s WAF instance, this allowed Paige to retrieve a set of AWS credentials for the WAF server instance, and these credentials carried the permissions of a role called *****-WAF-Role. These permissions happened to allow access to S3 buckets, which CapOne used to store all kinds of data, including documents corresponding to customers’ applications for a credit card. Based on my experience, it’s pretty typical to grant access to S3 very broadly, such as the ability to access all buckets in an AWS account, or all objects in a single S3 bucket.
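The difference between broad and narrow S3 permissions can be sketched with two policy documents, written here as Python dictionaries in the IAM JSON policy format. Both policies, the bucket name `example-waf-logs` and the helper `broad_s3_statements` are hypothetical; the actual policy behind *****-WAF-Role is not reproduced here.

```python
def _as_list(value):
    """IAM allows a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def broad_s3_statements(policy: dict) -> list:
    """Return Allow statements that grant S3 actions on every resource."""
    statements = policy["Statement"]
    if isinstance(statements, dict):
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        touches_s3 = any(a == "*" or a.lower().startswith("s3:") for a in actions)
        all_resources = any(r in ("*", "arn:aws:s3:::*") for r in resources)
        if touches_s3 and all_resources:
            flagged.append(stmt)
    return flagged


# Broad: any S3 action on any bucket in the account.
BROAD_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}

# Narrow: read objects in one named bucket only.
NARROW_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-waf-logs/*",
        }
    ],
}

print(len(broad_s3_statements(BROAD_POLICY)), len(broad_s3_statements(NARROW_POLICY)))
```

A check like this only catches the most obvious wildcards; it says nothing about whether a named bucket holds sensitive data.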

On March 22, 2019, Paige used these credentials to issue commands directly against AWS APIs, starting with List Buckets to find where data was being stored. There happened to be over 700 buckets accessible by this *****-WAF-Role. She then used a Sync command to download all of the data, one bucket at a time. And so, as fast as her VPN connection would allow, she exfiltrated around 30 gigabytes of data about CapOne’s customers. This probably didn’t take more than a few hours.

Paige also had a habit of posting about her exploits on Twitter and Slack groups. On June 18 she DM’ed a follower saying “Ive basically strapped myself with a bomb vest, fucking dropping capital ones dox and admitting it”. This follower reported the leak to Capital One with the email posted above, 117 days after the attack. At no time were any security alerts or alarms tripped that alerted CapOne to this leak, and it’s not clear how long this vulnerability existed prior to being exploited.
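One signal that could, in principle, raise an alarm in a situation like this is a server role's credentials being used from an address outside the organization's own network. The sketch below filters CloudTrail-style records for that pattern. The field names (`eventName`, `sourceIPAddress`, `userIdentity`) follow the CloudTrail record format, but the network range, the role ARN and the sample events are invented, and nothing here describes monitoring that Capital One did or did not have in place.

```python
import ipaddress

# Hypothetical range from which the organization's own servers call AWS.
EXPECTED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def suspicious_events(events):
    """Return role-credential API calls made from outside EXPECTED_NETWORKS."""
    flagged = []
    for ev in events:
        identity = ev.get("userIdentity", {})
        if identity.get("type") != "AssumedRole":
            continue
        try:
            src = ipaddress.ip_address(ev.get("sourceIPAddress", ""))
        except ValueError:
            # Some records carry a service hostname instead of an IP address.
            continue
        if not any(src in net for net in EXPECTED_NETWORKS):
            flagged.append(ev)
    return flagged


# Made-up sample records.
SAMPLE_EVENTS = [
    {
        "eventName": "GetObject",
        "sourceIPAddress": "10.1.2.3",
        "userIdentity": {"type": "AssumedRole", "arn": "arn:aws:sts::111111111111:assumed-role/example-role/i-0abc"},
    },
    {
        "eventName": "ListBuckets",
        "sourceIPAddress": "203.0.113.7",
        "userIdentity": {"type": "AssumedRole", "arn": "arn:aws:sts::111111111111:assumed-role/example-role/i-0abc"},
    },
]

for ev in suspicious_events(SAMPLE_EVENTS):
    print(ev["eventName"], ev["sourceIPAddress"])
```

In practice the expected address ranges are harder to pin down than this sketch suggests, since legitimate calls may leave through NAT gateways or other AWS services.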

Main steps involved in attack

A thin line of defense

It’s easy to piece together scenarios where unintentional mistakes were made that allowed this vulnerability to exist. A maintainer of the WAF, which itself is a security control against malicious HTTP requests, might not have known about the AWS metadata service and the credentials it returns in response to HTTP requests. A developer of the *****-WAF-Role policy might not have known which S3 buckets contained sensitive data. The owner of the S3 bucket which contained credit card applications might not have known which internal CapOne servers needed access to these documents.

And this wasn’t a case where this data was unintentionally left exposed in a public database or S3 bucket. However, it does highlight the difficulty of designing comprehensive data security in the public cloud, with its complicated interplay of services, servers, roles and permissions. Security controls intended to be simple and scalable to deploy may offer no practical protection against data leaks. Server-side encryption of S3 data, for example, would not have changed the outcome of this attack, because S3 decrypts objects transparently for any caller whose credentials are allowed to read them.

Because S3 buckets are schemaless, they become a dumping ground for data of all types - database backups, document storage, “big data” warehousing, etc. Because S3 APIs are publicly reachable, there is inherently a single layer of security sitting in front of most companies’ precious data - access control. Even experienced developers working in the cloud can overlook this fact, and fail to add additional safeguards to sensitive data directly.

Epilogue

Capital One ultimately agreed to pay $80 million to settle federal bank regulators’ claims, and $190 million to people whose data had been exposed in the breach, settling a class-action lawsuit. The regulator highlighted “the bank’s failure to establish effective risk assessment processes prior to migrating significant information technology operations to the public cloud environment and the bank’s failure to correct the deficiencies in a timely manner.” [5]

News reports on the data leak highlighted the fact that Paige had previously worked as a developer for AWS for a little over a year. This created an impression that she used some insider information or internal exploit to “hack” AWS. But the reality is that this attack didn’t require special knowledge, and anyone who has professional experience working with cloud has enough skill to pull this off.

In fact, Amazon upgraded the security for accessing the EC2 metadata service in the fallout from the attack, calling out protections for WAFs specifically. [6]
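The upgraded service described in [6], known as IMDSv2, requires a session token that must be obtained with a PUT request before any metadata can be read. The sketch below only constructs the two requests with Python's standard library and sends nothing, since the endpoint is reachable only from an EC2 instance. The paths and header names are taken from AWS's documentation; the helper names are hypothetical.

```python
import urllib.request

IMDS = "http://169.254.169.254"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: a PUT request that asks the service for a session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: a GET request that presents the token in a header."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


step1 = token_request()
step2 = metadata_request("iam/security-credentials/", "token-from-step-1")
print(step1.get_method(), step1.full_url)
print(step2.get_method(), step2.full_url)
```

The point of the extra step is that a proxy which can only be coaxed into forwarding a simple GET request has no way to perform the PUT or to attach the token header.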

Paige’s criminal trial was held in June 2022, and she was found guilty of wire fraud, unauthorized access to a protected computer and damaging a protected computer. Capital One continues to leverage AWS for all of their infrastructure.

Footnotes

Much of this article was sourced from the official FBI report: https://www.justice.gov/usao-wdwa/press-release/file/1188626/download

As well as a Krebs on Security article with reports from insider investigators: https://krebsonsecurity.com/2019/07/capital-one-data-theft-impacts-106m-people/

[1] https://aws.amazon.com/solutions/case-studies/capital-one-all-in-on-aws/

[2] https://github.com/capitalone

[3] https://krebsonsecurity.com/2019/08/what-we-can-learn-from-the-capital-one-hack/

[4] https://owasp.org/www-project-top-ten/

[5] https://www.occ.gov/news-issuances/news-releases/2020/nr-occ-2020-101.html

[6] https://aws.amazon.com/blogs/security/defense-in-depth-open-firewalls-reverse-proxies-ssrf-vulnerabilities-ec2-instance-metadata-service/