Amazon S3 Integration Tutorial

This tutorial guides you through setting up and using the Corsano S3 sync solution, which allows customers to connect their AWS S3 bucket for automatic data synchronization from the Corsano cloud to their storage.

Overview

The Corsano S3 sync solution provides a seamless way to automatically export your study data to your own AWS S3 bucket. This enables you to:

  • Maintain full control over your data storage
  • Integrate Corsano data with your existing data pipelines
  • Automate data processing workflows
  • Ensure compliance with data residency requirements
  • Scale storage according to your needs

Prerequisites

Before setting up S3 integration, ensure you have:

  1. Active Corsano Study Portal Account: Access to study.corsano.com
  2. AWS Account: An active AWS account with appropriate permissions
  3. S3 Bucket: An existing S3 bucket or the ability to create one
  4. API Access: Valid API tokens for the Corsano platform

Setting Up S3 Integration

Step 1: Access Integration Settings

  1. Log into your Corsano Study Portal at study.corsano.com
  2. Navigate to the Settings → Integration tab
  3. Locate the Amazon S3 section

[Screenshot: Amazon S3 Integration Settings]

Step 2: Configure S3 Credentials

Fill in the following required fields:

  • AWS Access Key: Your AWS access key ID
  • AWS Secret Key: Your AWS secret access key
  • AWS Region: Select the region where your S3 bucket is located
  • Endpoint: The S3 endpoint (default: https://s3.amazonaws.com)
  • Bucket Name: The name of your S3 bucket

Step 3: Choose Data Export Format

Select your preferred data export format:

  • CSV Format: Universal format readable by Excel and most applications
  • Apache Avro: Binary format with schema, efficient for large datasets
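
Either format can be read back with standard tooling downstream. A minimal sketch, assuming boto3 and fastavro are installed; the bucket name and object key are placeholders (real keys follow the layout described under Data Organization below):

import csv
import io

import boto3
import fastavro

s3 = boto3.client("s3")
BUCKET = "your-bucket-name"  # placeholder

def read_csv_export(key):
    # Download the object and parse it into a list of row dicts
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    return list(csv.DictReader(io.StringIO(body.decode("utf-8"))))

def read_avro_export(key):
    # Avro files embed their schema, so no external schema file is needed
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    return list(fastavro.reader(io.BytesIO(body)))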

Step 4: Required AWS Permissions

The provided access key must have the following permissions on the specified bucket (a quick pre-check sketch follows the integration test details below):

  • List bucket contents (s3:ListBucket)
  • Get objects (s3:GetObject, s3:GetObjectVersion)
  • Put/Upload objects (s3:PutObject)
  • Delete objects (s3:DeleteObject)
  • Get object metadata (s3:GetObjectMetadata)
  • Multipart upload permissions

Integration Test Details

When you click "Test Connection", we'll attempt to create a file named corsano_test in your S3 bucket to verify permissions.
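
You can replicate this check, and the Step 4 permissions, locally before using the portal. A minimal sketch with boto3; the bucket name, region, and credentials are placeholders:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
    region_name="eu-west-1",  # the region of your bucket
)
bucket = "your-bucket-name"  # placeholder

# Exercise the same object operations the integration relies on,
# using the corsano_test key that "Test Connection" itself writes
checks = [
    ("s3:PutObject", lambda: s3.put_object(Bucket=bucket, Key="corsano_test", Body=b"ok")),
    ("s3:GetObject", lambda: s3.get_object(Bucket=bucket, Key="corsano_test")),
    ("s3:ListBucket", lambda: s3.list_objects_v2(Bucket=bucket, MaxKeys=1)),
    ("s3:DeleteObject", lambda: s3.delete_object(Bucket=bucket, Key="corsano_test")),
]

for action, call in checks:
    try:
        call()
        print(f"{action}: OK")
    except ClientError as err:
        print(f"{action}: FAILED ({err.response['Error']['Code']})")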

Step 5: Test and Save Configuration

  1. Click Test Connection to verify your credentials
  2. If successful, click Save Configuration to store your settings
  3. Use Clear Configuration if you need to reset the form

API Integration

Authentication

All API calls require authentication using your researcher API token. Include it as a query parameter:

token=YOUR_RESEARCHER_API_TOKEN

Sync History API

Monitor the synchronization status and history using the sync history endpoint:

curl --request GET \
--url 'https://api.integration.corsano.com/v1/groups/{GROUP_CODE}/sync-history?token=YOUR_RESEARCHER_API_TOKEN' \
--header 'accept: application/json' \
--header 'content-type: application/json'

Parameters

  • group_code: Your study group identifier
  • changed_from: Start date for filtering (ISO 8601 format)
  • changed_to: End date for filtering (ISO 8601 format)
  • token: Your researcher API token

Example Request

curl --request GET \
--url 'https://api.integration.corsano.com/v1/groups/EZTJD/sync-history?changed_from=2025-08-07T10%3A00%3A00&changed_to=2025-08-19T22%3A00%3A00&token=YOUR_RESEARCHER_API_TOKEN' \
--header 'accept: application/json' \
--header 'content-type: application/json'

Data Synchronization Process

Automatic Sync

Once configured, the S3 integration automatically:

  1. Monitors Data Changes: Continuously watches for new or updated data in your study
  2. Processes Data: Formats data according to your selected export format
  3. Uploads to S3: Securely transfers data to your specified S3 bucket
  4. Maintains History: Logs all synchronization activities for audit purposes

Sync Frequency

  • Near Real-time: Data is synced every hour
  • Batch Processing: Large datasets are processed in efficient batches
  • Retry Logic: Automatic retry mechanisms for failed uploads

Data Organization

Data is organized in your S3 bucket with the following structure:

your-bucket/
├── group_code_1/
│   ├── patient_uuid_1/
│   │   ├── 2024-01-01/
│   │   │   ├── 00/
│   │   │   │   ├── data_file_1.csv
│   │   │   │   └── data_file_2.csv
│   │   │   ├── 01/
│   │   │   │   └── data_file_3.csv
│   │   │   └── ... (24 hour folders)
│   │   ├── 2024-01-02/
│   │   └── ...
│   ├── patient_uuid_2/
│   └── ...
├── group_code_2/
└── ...

Structure breakdown:

  • Root level: Group folders (patient groups)
  • Group level: Patient UUID folders
  • Patient level: Date folders (YYYY-MM-DD format)
  • Date level: 24 hour folders (00-23, UTC timezone)
  • Hour level: Data files
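
Given this layout, downstream consumers can fetch one patient-day by listing a key prefix. A minimal sketch with boto3; the bucket name, group code, patient UUID, and date are placeholders:

import boto3

s3 = boto3.client("s3")
prefix = "group_code_1/patient_uuid_1/2024-01-01/"  # group/patient/date/

# Paginate in case a day holds more than one page of objects
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="your-bucket-name", Prefix=prefix):
    for obj in page.get("Contents", []):
        # Keys end in .../<hour>/<file>, e.g. .../00/data_file_1.csv
        print(obj["Key"], obj["Size"])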

Monitoring and Troubleshooting

Common Issues and Solutions

1. Authentication Errors

Problem: "Access denied" or "Invalid credentials" Solution: Verify your AWS access keys and ensure they have S3 permissions

2. Bucket Access Issues

Problem: "Bucket not found" or "Access denied to bucket" Solution:

  • Verify bucket name spelling
  • Ensure bucket exists in the specified region
  • Check bucket permissions and policies

3. Network Connectivity

Problem: "Connection timeout" or "Network error" Solution:

  • Check your network connectivity
  • Verify firewall settings
  • Ensure AWS endpoints are accessible
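
These three failure classes can be told apart programmatically. A minimal diagnostic sketch with boto3; the bucket name and region are placeholders:

import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

def diagnose(bucket, region):
    s3 = boto3.client("s3", region_name=region)
    try:
        # head_bucket is a cheap existence/permission probe
        s3.head_bucket(Bucket=bucket)
        print("Bucket reachable and accessible.")
    except EndpointConnectionError:
        print("Network problem: cannot reach the S3 endpoint.")
    except ClientError as err:
        code = err.response["Error"]["Code"]
        if code == "404":
            print("Bucket not found: check spelling and region.")
        elif code == "403":
            print("Access denied: check keys and bucket policy.")
        else:
            print(f"Unexpected error: {code}")

diagnose("your-bucket-name", "eu-west-1")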

Security Best Practices

AWS IAM Configuration

Create a dedicated IAM user with the required permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:GetObjectMetadata"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
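
If you manage IAM programmatically, the same policy can be attached to a dedicated user with boto3. A sketch; the user name, policy name, and bucket ARNs are placeholders:

import json

import boto3

iam = boto3.client("iam")

# The inline policy document shown above
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "s3:ListBucket", "s3:GetObject", "s3:GetObjectVersion",
            "s3:PutObject", "s3:DeleteObject", "s3:GetObjectMetadata",
        ],
        "Resource": [
            "arn:aws:s3:::your-bucket-name",
            "arn:aws:s3:::your-bucket-name/*",
        ],
    }],
}

iam.create_user(UserName="corsano-s3-sync")
iam.put_user_policy(
    UserName="corsano-s3-sync",
    PolicyName="corsano-s3-sync",
    PolicyDocument=json.dumps(policy),
)

# The returned key pair is what you paste into the portal form;
# store the secret securely and never commit it to source control
keys = iam.create_access_key(UserName="corsano-s3-sync")["AccessKey"]
print(keys["AccessKeyId"])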

Data Encryption

  • Enable server-side encryption on your S3 bucket
  • Use AWS KMS for additional encryption control
  • Ensure data is encrypted in transit (HTTPS)
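
Default encryption can be set once at the bucket level so every synced object is encrypted at rest. A sketch using boto3 with SSE-KMS; the bucket name and KMS key ARN are placeholders:

import boto3

s3 = boto3.client("s3")
s3.put_bucket_encryption(
    Bucket="your-bucket-name",
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                # Placeholder KMS key ARN; omit KMSMasterKeyID to use the
                # account's default aws/s3 key instead
                "KMSMasterKeyID": "arn:aws:kms:eu-west-1:111122223333:key/your-key-id",
            }
        }]
    },
)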

Performance Optimization

Bucket Configuration

  • Choose the appropriate S3 storage class for your use case
  • Enable S3 Transfer Acceleration for faster uploads
  • Use appropriate bucket policies for cost optimization
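
Transfer Acceleration, for example, is a one-time bucket setting. A sketch with boto3; the bucket name is a placeholder, and accelerated transfers also require using the accelerate endpoint on the client side:

import boto3

s3 = boto3.client("s3")
s3.put_bucket_accelerate_configuration(
    Bucket="your-bucket-name",
    AccelerateConfiguration={"Status": "Enabled"},
)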

Data Processing

  • Consider using Apache Avro format for large datasets
  • Implement data lifecycle policies for cost management
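
A lifecycle rule can move older exports to cheaper storage and eventually expire them. A sketch with boto3; the bucket name and the 90-day/365-day thresholds are illustrative, not recommendations:

import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="your-bucket-name",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "corsano-exports",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to the whole bucket
            # Move to infrequent-access storage after 90 days...
            "Transitions": [{"Days": 90, "StorageClass": "STANDARD_IA"}],
            # ...and delete after one year
            "Expiration": {"Days": 365},
        }]
    },
)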

Integration Examples

Python Integration

import requests
from datetime import datetime, timedelta

def get_sync_history(group_code, api_token, days_back=7):
    """Fetch sync history for the last `days_back` days."""
    base_url = "https://api.integration.corsano.com/v1"

    # Calculate date range
    end_date = datetime.now()
    start_date = end_date - timedelta(days=days_back)

    # Format dates for API (ISO 8601, second precision)
    start_str = start_date.strftime("%Y-%m-%dT%H:%M:%S")
    end_str = end_date.strftime("%Y-%m-%dT%H:%M:%S")

    url = f"{base_url}/groups/{group_code}/sync-history"
    params = {
        "changed_from": start_str,
        "changed_to": end_str,
        "token": api_token
    }

    headers = {
        "accept": "application/json",
        "content-type": "application/json"
    }

    response = requests.get(url, params=params, headers=headers)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()

# Usage
history = get_sync_history("EZTJD", "YOUR_RESEARCHER_API_TOKEN")
print(history)

Node.js Integration

const axios = require('axios');

async function getSyncHistory(groupCode, apiToken, daysBack = 7) {
  const baseUrl = 'https://api.integration.corsano.com/v1';

  // Calculate date range
  const endDate = new Date();
  const startDate = new Date(endDate.getTime() - (daysBack * 24 * 60 * 60 * 1000));

  // Format dates for API (ISO 8601, second precision)
  const startStr = startDate.toISOString().split('.')[0];
  const endStr = endDate.toISOString().split('.')[0];

  const url = `${baseUrl}/groups/${groupCode}/sync-history`;
  const params = {
    changed_from: startStr,
    changed_to: endStr,
    token: apiToken
  };

  const headers = {
    'accept': 'application/json',
    'content-type': 'application/json'
  };

  try {
    const response = await axios.get(url, { params, headers });
    return response.data;
  } catch (error) {
    console.error('Error fetching sync history:', error.message);
    throw error;
  }
}

// Usage
getSyncHistory('EZTJD', 'YOUR_RESEARCHER_API_TOKEN')
  .then(history => console.log(history))
  .catch(error => console.error('Failed:', error));

Support and Resources

Getting Help

  • Documentation: Refer to the main Corsano API documentation
  • Support Portal: Contact support through your study portal
  • Community: Join the Corsano developer community

Conclusion

The Corsano S3 sync solution provides a robust, secure, and efficient way to integrate your study data with your existing AWS infrastructure. By following this tutorial, you can set up automated data synchronization that scales with your needs while maintaining full control over your data storage and processing workflows.

For additional assistance or advanced configuration options, please contact the Corsano support team through your study portal.