Personal and financial information is among the most sought-after goods for cyber criminals. Leaking such information can expose a company’s customers to identity theft and fraud. It also damages the company’s reputation and its customers’ trust. Find out how Amazon Macie can help in this context.
A DevOps engineer or manager may not know which data contains personally identifiable information (PII) and might not be aware of publicly accessible data. In AWS, the fully managed service Amazon Macie helps to identify and protect data containing personal or other sensitive information. Macie detects a large list of sensitive data types, including names, addresses, and credit card numbers. Macie automatically and continually generates an inventory of S3 buckets and flags buckets that are unencrypted, publicly accessible, or shared with AWS accounts outside those you have defined in AWS Organizations.
Ideally, you set up Macie in organization mode, where a central account acts as the Macie administrator account. All member accounts within your organization show up in the account list of the Macie administrator account, so you can add each of them to Macie with a click.
Macie is a regional service. Hence, you have to deploy and enable it in every region where you want to analyze S3 buckets.
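Because Macie has to be enabled region by region, a small loop can take care of the repetition. The following is a minimal sketch, not an official procedure: it assumes a boto3-style `macie2` client with an `enable_macie` call, and the helper name `enable_macie_in_regions` and its `client_factory` parameter are made up for illustration.

```python
from typing import Callable, Dict, Iterable


def enable_macie_in_regions(regions: Iterable[str],
                            client_factory: Callable[[str], object]) -> Dict[str, str]:
    """Try to enable Macie in every given region and report the outcome per region.

    client_factory(region) is assumed to return a boto3-style "macie2" client,
    for example: lambda region: boto3.client("macie2", region_name=region)
    """
    results = {}
    for region in regions:
        client = client_factory(region)
        try:
            client.enable_macie(status="ENABLED")
            results[region] = "enabled"
        except Exception as exc:  # e.g. Macie is already enabled or permissions are missing
            results[region] = f"failed: {exc}"
    return results
```

Passing the client factory as a parameter keeps the helper free of AWS dependencies, so it can be tried out with a stub client before it touches a real account.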
When you enable Macie, it automatically creates an inventory of all S3 buckets and categorizes them by access, encryption, and sharing. These categories result in policy findings that inform you about S3 policy issues. The data itself is not scanned automatically.
While the inventory is generated automatically, you need to create specific jobs for the actual analysis of the data contained in your S3 buckets. These jobs have various attributes you have to configure:
- Buckets: You can choose specific buckets to analyze, or you can use filter criteria like “all publicly accessible buckets”
- Schedule: run once, daily, weekly or monthly
- Scope: analyze existing and new data, or only new data
- Optional: restrict to specific file extensions
- Managed Data Identifiers: use all of them, some of them, or none of them (these are the predefined rules that detect PII such as credit card numbers or credentials)
- Optional: Custom Data Identifiers, your own rules to identify PII, which primarily use a regex for detection
To avoid false positives and speed up the analysis, you can restrict the Managed Data Identifiers to those that you actually care about.
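The job attributes listed above map fairly directly onto an API request. The sketch below builds such a request as a plain dictionary. The field names follow the boto3 `macie2` `create_classification_job` call as far as known here and should be checked against the current API reference; the function `build_job_request`, its parameters, and the fixed weekday and day of month are illustrative assumptions.

```python
import uuid

# Assumed frequency shapes for recurring jobs; weekday and day of month are arbitrary examples.
_SCHEDULES = {
    "DAILY": {"dailySchedule": {}},
    "WEEKLY": {"weeklySchedule": {"dayOfWeek": "MONDAY"}},
    "MONTHLY": {"monthlySchedule": {"dayOfMonth": 1}},
}


def build_job_request(name, account_id, buckets, schedule="ONE_TIME",
                      include_existing=True, file_extensions=None,
                      managed_identifier_ids=None, custom_identifier_ids=None):
    """Build keyword arguments for a Macie classification job (hypothetical helper)."""
    if schedule != "ONE_TIME" and schedule not in _SCHEDULES:
        raise ValueError(f"unknown schedule: {schedule}")

    # Buckets: an explicit list of buckets for one account.
    s3_definition = {
        "bucketDefinitions": [{"accountId": account_id, "buckets": list(buckets)}],
    }
    # Optional: only analyze objects with the given file extensions.
    if file_extensions:
        s3_definition["scoping"] = {"includes": {"and": [{
            "simpleScopeTerm": {
                "comparator": "EQ",
                "key": "OBJECT_EXTENSION",
                "values": list(file_extensions),
            }
        }]}}

    request = {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "name": name,
        "jobType": "ONE_TIME" if schedule == "ONE_TIME" else "SCHEDULED",
        "s3JobDefinition": s3_definition,
    }
    if schedule != "ONE_TIME":
        request["scheduleFrequency"] = _SCHEDULES[schedule]
        # True: also analyze existing data on the first run; False: new data only.
        request["initialRun"] = include_existing

    # Managed Data Identifiers: None means all, an empty list means none.
    if managed_identifier_ids is None:
        request["managedDataIdentifierSelector"] = "ALL"
    elif not managed_identifier_ids:
        request["managedDataIdentifierSelector"] = "NONE"
    else:
        request["managedDataIdentifierSelector"] = "INCLUDE"
        request["managedDataIdentifierIds"] = list(managed_identifier_ids)

    # Optional: IDs of Custom Data Identifiers that were created beforehand.
    if custom_identifier_ids:
        request["customDataIdentifierIds"] = list(custom_identifier_ids)
    return request
```

With boto3 installed and credentials in place, the result could be passed on as `boto3.client("macie2").create_classification_job(**build_job_request(...))`. Restricting `managed_identifier_ids` to, say, `["CREDIT_CARD_NUMBER"]` corresponds to limiting the job to the identifiers you care about.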
Once you have configured a job, you get a cost estimate before you finally submit it.
After you activate Macie in your organization, a new service-linked role is created that allows Macie to read all unencrypted data as well as data encrypted with Amazon S3 managed keys (SSE-S3) or AWS managed KMS keys.
Data encrypted with a customer-managed key requires you to grant Macie cross-account access to that key; see https://docs.aws.amazon.com/macie/latest/user/discovery-supported-encryption-types.html#discovery-supported-encryption-cmk-configuration.
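To give an idea of what such a key policy change can look like, the sketch below generates a statement that lets the Macie service-linked role decrypt with the key. It is an assumption-laden illustration: the role name `AWSServiceRoleForAmazonMacie` and the `kms:Decrypt` action reflect the AWS documentation as understood here, the helper name is made up, and which account ID belongs in the principal depends on your setup, so check the linked documentation before applying anything.

```python
import json


def macie_key_policy_statement(account_id: str) -> dict:
    """Return a KMS key policy statement for Macie (hypothetical helper).

    account_id is the account whose Macie service-linked role should be
    allowed to decrypt; which account that is depends on your setup.
    """
    role_arn = (
        f"arn:aws:iam::{account_id}:role/aws-service-role/"
        "macie.amazonaws.com/AWSServiceRoleForAmazonMacie"
    )
    return {
        "Sid": "AllowMacieServiceLinkedRoleToDecrypt",
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": ["kms:Decrypt"],
        "Resource": "*",  # within a key policy this refers to the key itself
    }


# 111122223333 is a placeholder account ID.
print(json.dumps(macie_key_policy_statement("111122223333"), indent=2))
```

The printed statement would be added to the existing key policy of each customer-managed key; it does not replace the statements that are already there.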
Security Hub Integration
Macie can publish its findings to Security Hub. Publishing the policy findings from the S3 inventory is somewhat redundant though, because standard controls within Security Hub already cover these issues. You can choose whether to publish policy findings, sensitive data findings, or both to Security Hub, and you can set the update frequency.
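These publication choices can also be expressed as API settings. The sketch below only builds the two request payloads; it assumes the boto3 `macie2` calls `put_findings_publication_configuration` and `update_macie_session` with the field names shown, and the helper `security_hub_settings` is hypothetical.

```python
ALLOWED_FREQUENCIES = {"FIFTEEN_MINUTES", "ONE_HOUR", "SIX_HOURS"}


def security_hub_settings(publish_policy: bool, publish_sensitive_data: bool,
                          frequency: str = "SIX_HOURS") -> dict:
    """Build the request payloads for publishing findings to Security Hub."""
    if frequency not in ALLOWED_FREQUENCIES:
        raise ValueError(f"unsupported frequency: {frequency}")
    return {
        # Intended for put_findings_publication_configuration(**settings["publication"])
        "publication": {
            "securityHubConfiguration": {
                "publishPolicyFindings": publish_policy,
                "publishClassificationFindings": publish_sensitive_data,
            }
        },
        # Intended for update_macie_session(**settings["session"])
        "session": {"findingPublishingFrequency": frequency},
    }


# Example: skip the policy findings and publish only sensitive data findings.
settings = security_hub_settings(publish_policy=False, publish_sensitive_data=True)
print(settings)
```

The example call publishes only sensitive data findings and leaves out the policy findings, which Security Hub's own controls already cover.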
Terraform Support for Amazon Macie
Terraform is a popular Infrastructure as Code (IaC) tool. Currently, its support for Macie is only basic: in essence, you can enable Macie and deploy some simple jobs.
Deploying Macie in an Organization context is easy. You only need to deploy two resources in your Organization management account, and that sets up the Macie administrator account in your target AWS account.
While all your Organization accounts will be added to Macie automatically, they won’t be enabled as active members. Terraform offers no support for this, so you have to add them manually to Macie in the UI.
You can define Macie jobs in Terraform, but you probably don’t want to. You would have to generate a list of all accounts and their buckets you want to scan and use that in the job definition in Terraform. There is no support for filters like “all buckets with public access”. There is no support for disabling or using specific Managed Data Identifiers either.
Also, there is no support for other configuration choices, such as which kinds of findings you want to publish to Security Hub, or which S3 bucket to use to keep findings longer than 30 days. So overall, deploying Macie is pretty much the only thing you can do with Terraform.
A realistic cost estimate is only possible once the number of buckets and the amount of data are known.
The automatic inventory and policy evaluation of S3 buckets costs $0.10 per bucket per month.
Job costs are calculated for each execution. If you don’t configure recurring jobs, there are no recurring costs for jobs either. The actual cost depends on the amount of data analyzed. The first 1GB per month is free; the next 50,000GB cost $1.25 per GB. The price then drops to $0.63 per GB for the next 450,000GB. Another discount kicks in after you reach 500,000GB. Prices vary slightly between regions; you can find a detailed overview with examples on the Macie Pricing page.
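The pricing figures above can be turned into a rough calculator. The sketch below uses only the numbers quoted in this article and assumes the tiers stack as "first 1GB free, next 50,000GB, next 450,000GB". Because the price beyond 500,000GB is not stated here, the function refuses to estimate larger volumes instead of guessing. All names are illustrative, and actual prices depend on the region.

```python
INVENTORY_PRICE_PER_BUCKET = 0.10  # USD per bucket, figure quoted in this article
FREE_GB = 1                        # first GB per month is free
TIERS = [                          # (tier size in GB, USD per GB), figures quoted in this article
    (50_000, 1.25),
    (450_000, 0.63),
]


def estimate_job_cost(gb_analyzed: float) -> float:
    """Estimate the monthly data analysis cost in USD for the given volume."""
    if gb_analyzed < 0:
        raise ValueError("gb_analyzed must not be negative")
    remaining = max(gb_analyzed - FREE_GB, 0)
    cost = 0.0
    for tier_size, price_per_gb in TIERS:
        in_tier = min(remaining, tier_size)
        cost += in_tier * price_per_gb
        remaining -= in_tier
    if remaining > 0:
        # The price of the last tier is not given in this article.
        raise ValueError("volume exceeds the tiers covered by this sketch")
    return round(cost, 2)


def estimate_monthly_cost(bucket_count: int, gb_analyzed: float) -> float:
    """Inventory and policy evaluation cost plus data analysis cost, in USD."""
    inventory = bucket_count * INVENTORY_PRICE_PER_BUCKET
    return round(inventory + estimate_job_cost(gb_analyzed), 2)


print(estimate_monthly_cost(bucket_count=20, gb_analyzed=101))
```

For 20 buckets and 101GB of analyzed data, this yields $2.00 for the inventory plus $125.00 for 100 billable GB, or $127.00 in total.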
A cost estimate is also shown when you set up jobs, and there is a total cost estimate in the Usage tab.
Conclusion: Usage of Amazon Macie
Macie is easy to set up, with or without an IaC tool like Terraform. Most of its configuration takes place manually in the GUI though.
There is a free trial period of 30 days. This is very helpful to get insight into estimated costs for running Macie without any investment.
You have to familiarize yourself with the Managed Data Identifiers for data analysis and select those that fit your use case. You might need to define Custom Data Identifiers if the managed ones don’t work for you.
With that in mind, you can already define meaningful jobs in the free trial period, and you’ll be able to answer these questions about cost and additional effort:
- What are my base costs for inventory and policy evaluation?
- What are my estimated costs for the jobs I intend to run, and how much data am I going to analyze?
- How many key policies do I have to change in other organizational accounts because customer-managed keys are used for S3 encryption?