Configuring Census to Use an S3 Bucket You Control

How Census Uses Amazon S3

When syncing data from Amazon Redshift, Snowflake, PostgreSQL, or Panoply, Census uses Amazon S3 as temporary storage for the data that is unloaded from your warehouse before it is sent to the destination service or application.

Our recommended apporach is to let Census manage this S3 bucket for you. This includes managing credentials with narrow permissions and short lifetimes, encrypting data in the bucket at rest and in transit, and automatically removing old data from the bucket once it is no longer in use. This approach is secure and is used by the vast majority of Census customers. If you would like Census to manage your S3 storage automatically, no action is required - this is the default setting for new Census organizations.

If you must manage your own S3 bucket for legal or regulatory reasons, this article will describe how to set that up.

This feature is not currently available at all Census subscription levels or for all data warehouses. Please contact your Census account representative for details before proceeding with these steps.

Initial Setup

This guide will take you through the steps to prepare a bucket for use with Census. We will use the aws CLI to create and configure you bucket - you should install this tool and configure it to use your credentials before starting. Similar configuration is also possible using the AWS Console UI but those instructions are not provided here.

In order to proceed, you will also need the Census AWS Account ID and your customer-owned bucket External ID - these will be provided to you by your Census representative.

Choose a bucket name

Choose a name for your bucket that is memorable and clear. For example, if your company name is "FooCorp", you might call your bucket "foocorp-census-sync-temporary-files".

Set up shell variables

In a shell, set up a few variables we'll use thorughout the rest of this guide:

export BUCKET_NAME=<your bucket name>
export CENSUS_AWS_ACCOUNT_ID=<provided by Census>
export CENSUS_CUSTOMER_BUCKET_EXTERNAL_ID=<provided by Census>

Create and configure the bucket

Create the S3 bucket. Currently us-east-1 is the only supported region, as this is where the Census processing infrastructure is located. Note that it's okay if your data warehouse is located in a different region.

aws s3api create-bucket --acl private --bucket $BUCKET_NAME --region=us-east-1

Remove all public access from the bucket:

aws s3api put-public-access-block --bucket $BUCKET_NAME \
  --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

Encrypt all data in the bucket:

aws s3api put-bucket-encryption --bucket $BUCKET_NAME \
  --server-side-encryption-configuration '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}'

Automatically remove data from the bucket that is more than 30 days old:

aws s3api put-bucket-lifecycle-configuration --bucket $BUCKET_NAME --lifecycle-configuration \
  '{"Rules":[{"Expiration":{"Days":30},"ID":"expire-census-temp-files","Prefix":"","Status":"Enabled","AbortIncompleteMultipartUpload":{"DaysAfterInitiation":30}}]}'

Create an IAM role granting Census access

Census needs to be able to list, read, write, and delete items from the S3 bucket. In order to do so, you'll create an IAM role defining the allowed permissions and give the Census AWS account the ability to assume that role.

Feel free to adjust the names of the role and the inline policy to match your organization's naming scheme if you have one.

First create an empty role with a policy that allows Census to assume it via AWS Account ID and External ID:

aws iam create-role --role-name census-data-warehouse-client --assume-role-policy-document \
  '{"Version":"2012-10-17","Statement":{"Effect":"Allow","Action":"sts:AssumeRole","Principal":{"AWS":"'$CENSUS_AWS_ACCOUNT_ID'"},"Condition":{"StringEquals":{"sts:ExternalId":"'$CENSUS_CUSTOMER_BUCKET_EXTERNAL_ID'"}}}}'

This command should print out a role document that looks something like this:

{
  "Role": {
    "Path": "/",
    "RoleName": "census-data-warehouse-client",
    "RoleId": "...",
    "Arn": "arn:aws:iam::12345678:role/census-data-warehouse-client",
    "CreateDate": "2020-09-22T20:48:21+00:00",
    "AssumeRolePolicyDocument": {
      "Version": "2012-10-17",
      "Statement": {
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Principal": {
          "AWS": "77665544"
        },
        "Condition": {
          "StringEquals": {
            "sts:ExternalId": "CustomerOwnedS3Bucket/333444555"
          }
        }
      }
    }
  }
}

Important: Take note of the Arn from the role document - you'll need to give this value to your Census representative to finish configuration.

Then add a policy to the role granting limited access to the S3 bucket:

aws iam put-role-policy --role-name census-data-warehouse-client \
  --policy-name census-data-warehouse-client --policy-document \
  '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["s3:GetBucketLocation","s3:GetObject","s3:HeadBucket","s3:DeleteObject","s3:ListBucket","s3:PutObject"],"Resource":["arn:aws:s3:::'$BUCKET_NAME'/*","arn:aws:s3:::'$BUCKET_NAME'"]}]}'

Finishing up

Provide the bucket name you chose and your Role ARN (it should look like arn:aws:iam::12345678:role/census-data-warehouse-client to your Census representative and they will complete the configuration on your behalf. Once Census is configured to use your S3 bucket, you can verify that it is working correctly in Census by going to Connections, then clicking "Test" next to your warehouse to run a test sync using your configured S3 bucket.

Ongoing Bucket Maintenance

No maintenance of your Census data storage bucket is required. Census will use the credentials you provided in the "Initial Setup" step to automatically create and remove files as needed, and the S3 retention policies you defined in the previous step will automatically remove any stale data in the unlikely event that Census fails to do so.

We strongly recommend that once set up, you do not modify the bucket or any of the keys it contains in any way. Adding, removing, modifying, or renaming data in your Census bucket is not supported and will cause Census syncs to fail. In addition, the Census bucket should not be simultaneously employed for any other purposes. If you need to rename your bucket or the IAM roles you have granted to the bucket, please contact Census Support.