Amazon S3 Integration Tutorial

This tutorial guides you through setting up and using the Corsano S3 sync solution, which allows customers to connect their AWS S3 bucket for automatic data synchronization from the Corsano cloud to their storage.

Overview

The Corsano S3 sync solution provides a seamless way to automatically export your study data to your own AWS S3 bucket. This enables you to:

  • Maintain full control over your data storage
  • Integrate Corsano data with your existing data pipelines
  • Automate data processing workflows
  • Ensure compliance with data residency requirements
  • Scale storage according to your needs

Prerequisites

Before setting up S3 integration, ensure you have:

  1. Active Corsano Study Portal Account: Access to study.corsano.com
  2. AWS Account: An active AWS account with appropriate permissions
  3. S3 Bucket: An existing S3 bucket or the ability to create one
  4. API Access: Valid API tokens for the Corsano platform

Setting Up S3 Integration

Step 1: Access Integration Settings

  1. Log into your Corsano Study Portal at study.corsano.com
  2. Navigate to Settings > Integration tab
  3. Locate the Amazon S3 section

[Screenshot: Amazon S3 Integration Settings]

Step 2: Configure S3 Credentials

Fill in the following required fields:

  • AWS Access Key: Your AWS access key ID
  • AWS Secret Key: Your AWS secret access key
  • AWS Region: Select the region where your S3 bucket is located
  • Endpoint: The S3 endpoint (default: https://s3.amazonaws.com)
  • Bucket Name: The name of your S3 bucket
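Before entering these values in the portal, it can help to sanity-check them locally. The sketch below is a hypothetical pre-flight check: the S3SyncConfig class and the validate function are illustrative names, not part of any Corsano or AWS library. It applies only the basic S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit) and checks that the endpoint uses HTTPS. It does not contact AWS, so it cannot confirm that the keys are valid.

```python
import re
from dataclasses import dataclass

# Basic S3 bucket naming rule: 3-63 characters, lowercase letters, digits,
# dots, and hyphens; must start and end with a letter or digit.
BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


@dataclass
class S3SyncConfig:
    """Hypothetical container mirroring the five portal fields."""
    access_key: str
    secret_key: str
    region: str
    bucket: str
    endpoint: str = "https://s3.amazonaws.com"  # default from this tutorial


def validate(config):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for name in ("access_key", "secret_key", "region"):
        if not getattr(config, name).strip():
            problems.append(f"{name} is empty")
    if not BUCKET_RE.fullmatch(config.bucket):
        problems.append("bucket name breaks basic S3 naming rules")
    if not config.endpoint.startswith("https://"):
        problems.append("endpoint should use https://")
    return problems


# Placeholder values only; substitute your own.
config = S3SyncConfig("YOUR_ACCESS_KEY", "YOUR_SECRET_KEY", "eu-west-1", "my-study-bucket")
print(validate(config) or "no problems found")
```

A check like this catches typos such as uppercase letters or underscores in the bucket name before you click Test Connection.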

Step 3: Choose Data Export Format

Select your preferred data export format:

  • CSV Format: Universal format readable by Excel and most applications
  • Apache Avro: Binary format with schema, efficient for large datasets

Step 4: Required AWS Permissions

The provided access key must have full object manipulation permissions for the specified bucket:

  • List bucket contents (s3:ListBucket)
  • Get objects (s3:GetObject, s3:GetObjectVersion)
  • Put/Upload objects (s3:PutObject)
  • Delete objects (s3:DeleteObject)
  • Get object metadata (HEAD requests, which S3 authorizes through s3:GetObject)
  • Multipart upload permissions

Integration Test Details

When you click "Test Connection", we'll attempt to create a file named corsano_test in your S3 bucket to verify permissions.

Step 5: Test and Save Configuration

  1. Click Test Connection to verify your credentials
  2. If successful, click Save Configuration to store your settings
  3. Use Clear Configuration if you need to reset the form

API Integration

Authentication

All API calls require authentication using your researcher API token. Include it as a query parameter:

token=YOUR_RESEARCHER_API_TOKEN

Sync History API

Monitor the synchronization status and history using the sync history endpoint:

curl --request GET \
--url 'https://api.integration.corsano.com/v1/groups/{GROUP_CODE}/sync-history?token=YOUR_RESEARCHER_API_TOKEN' \
--header 'accept: application/json' \
--header 'content-type: application/json'

Parameters

  • group_code (path): Your study group identifier, substituted for {GROUP_CODE} in the URL
  • changed_from (query): Start date for filtering (ISO 8601 format)
  • changed_to (query): End date for filtering (ISO 8601 format)
  • token (query): Your researcher API token

Example Request

curl --request GET \
--url 'https://api.integration.corsano.com/v1/groups/EZTJD/sync-history?changed_from=2025-08-07T10%3A00%3A00&changed_to=2025-08-19T22%3A00%3A00&token=YOUR_RESEARCHER_API_TOKEN' \
--header 'accept: application/json' \
--header 'content-type: application/json'
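The %3A sequences in the URL above are percent-encoded colons from the ISO 8601 timestamps. As a minimal sketch, the same URL can be assembled with Python's standard library so that the encoding is handled for you. The function name sync_history_url is illustrative; the base URL and parameter names are the ones shown in this tutorial.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.integration.corsano.com/v1"  # from this tutorial


def sync_history_url(group_code, changed_from, changed_to, token):
    """Build the sync-history URL with all query values percent-encoded."""
    query = urlencode({
        "changed_from": changed_from,
        "changed_to": changed_to,
        "token": token,
    })
    # quote() guards against unexpected characters in the path segment.
    return f"{BASE_URL}/groups/{quote(group_code, safe='')}/sync-history?{query}"


print(sync_history_url(
    "EZTJD",
    "2025-08-07T10:00:00",
    "2025-08-19T22:00:00",
    "YOUR_RESEARCHER_API_TOKEN",
))
```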

Manual Export Trigger API

Trigger a manual export for a specific group, allowing you to re-export data for a specific time range, user, or data types:

curl --request POST \
--url 'https://api.integration.corsano.com/v1/s3-sync/continuous-export/{GROUP_CODE}/manual-run?token=YOUR_RESEARCHER_API_TOKEN&timestamp_from=1704067200000&timestamp_to=1704153600000' \
--header 'accept: application/json' \
--header 'content-type: application/json'

Parameters

Parameter | Type | Location | Required | Description
GROUP_CODE | string | path | Yes | Your study group identifier
token | string | query | Yes | Your researcher API token
timestamp_from | number | query | No | Start timestamp in milliseconds (Unix epoch)
timestamp_to | number | query | No | End timestamp in milliseconds (Unix epoch)
user_uuid | string | query | No | Specific patient UUID to export data for
types | string | query | No | Comma-separated list of data types to export

Example Request

# Export all data for a group within a time range
curl --request POST \
--url 'https://api.integration.corsano.com/v1/s3-sync/continuous-export/EZTJD/manual-run?token=YOUR_RESEARCHER_API_TOKEN&timestamp_from=1704067200000&timestamp_to=1704153600000' \
--header 'accept: application/json' \
--header 'content-type: application/json'

# Export data for a specific patient
curl --request POST \
--url 'https://api.integration.corsano.com/v1/s3-sync/continuous-export/EZTJD/manual-run?token=YOUR_RESEARCHER_API_TOKEN&user_uuid=abc123-def456&timestamp_from=1704067200000&timestamp_to=1704153600000' \
--header 'accept: application/json' \
--header 'content-type: application/json'
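The timestamp_from and timestamp_to values in the requests above are Unix epoch milliseconds: 1704067200000 is 2024-01-01T00:00:00 UTC and 1704153600000 is exactly one day later. The sketch below shows one way to derive such values from datetimes and assemble the manual-run URL. The helper names to_epoch_ms and manual_run_url are illustrative, and rejecting naive datetimes is a design choice of this sketch rather than an API requirement.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://api.integration.corsano.com/v1"  # from this tutorial


def to_epoch_ms(moment):
    """Convert a timezone-aware datetime to Unix epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    return int(moment.timestamp() * 1000)


def manual_run_url(group_code, token, start=None, end=None, user_uuid=None, types=None):
    """Build the manual-run URL; optional parameters are omitted when not given."""
    params = {"token": token}
    if start is not None:
        params["timestamp_from"] = to_epoch_ms(start)
    if end is not None:
        params["timestamp_to"] = to_epoch_ms(end)
    if user_uuid is not None:
        params["user_uuid"] = user_uuid
    if types:
        params["types"] = ",".join(types)  # comma-separated, per the parameter table
    query = urlencode(params)
    return f"{BASE_URL}/s3-sync/continuous-export/{group_code}/manual-run?{query}"


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 2, tzinfo=timezone.utc)
print(manual_run_url("EZTJD", "YOUR_RESEARCHER_API_TOKEN", start, end))
```

The resulting URL can be sent with any HTTP client as a POST request, for example requests.post(url, headers={"accept": "application/json"}).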

Use Cases

  • Data Recovery: Re-export data that may have failed during automatic sync
  • Historical Export: Export data from a specific time period
  • Selective Export: Export only specific data types or patients
  • Testing: Verify S3 integration is working correctly

Patient Sync Logs API

Retrieve synchronization logs for a specific patient to monitor export activity and troubleshoot issues:

curl --request GET \
--url 'https://api.integration.corsano.com/v1/s3-sync/continuous-export/patients/{USER_UUID}/sync-logs?token=YOUR_RESEARCHER_API_TOKEN' \
--header 'accept: application/json' \
--header 'content-type: application/json'

Parameters

Parameter | Type | Location | Required | Description
USER_UUID | string | path | Yes | The patient's unique identifier
token | string | query | Yes | Your researcher API token
from | number | query | No | Start timestamp in milliseconds (Unix epoch)
to | number | query | No | End timestamp in milliseconds (Unix epoch)
limit | number | query | No | Maximum number of log entries to return (default: 100, max: 1000)

Example Request

# Get recent sync logs for a patient
curl --request GET \
--url 'https://api.integration.corsano.com/v1/s3-sync/continuous-export/patients/abc123-def456-ghi789/sync-logs?token=YOUR_RESEARCHER_API_TOKEN&limit=50' \
--header 'accept: application/json' \
--header 'content-type: application/json'

# Get sync logs within a specific time range
curl --request GET \
--url 'https://api.integration.corsano.com/v1/s3-sync/continuous-export/patients/abc123-def456-ghi789/sync-logs?token=YOUR_RESEARCHER_API_TOKEN&from=1704067200000&to=1704153600000&limit=100' \
--header 'accept: application/json' \
--header 'content-type: application/json'
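Because limit has a documented default of 100 and a maximum of 1000, a client can reject out-of-range values before making a request. The following is a minimal sketch under that assumption. The function name sync_logs_url is illustrative, and how the server itself treats an out-of-range limit is not stated in this tutorial.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.integration.corsano.com/v1"  # from this tutorial
DEFAULT_LIMIT, MAX_LIMIT = 100, 1000  # documented default and maximum


def sync_logs_url(user_uuid, token, from_ms=None, to_ms=None, limit=DEFAULT_LIMIT):
    """Build the patient sync-logs URL, validating limit and the time range."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if from_ms is not None and to_ms is not None and from_ms > to_ms:
        raise ValueError("from_ms must not be later than to_ms")
    params = {"token": token}
    if from_ms is not None:
        params["from"] = from_ms
    if to_ms is not None:
        params["to"] = to_ms
    params["limit"] = limit
    path = f"/s3-sync/continuous-export/patients/{quote(user_uuid, safe='')}/sync-logs"
    return f"{BASE_URL}{path}?{urlencode(params)}"


print(sync_logs_url("abc123-def456-ghi789", "YOUR_RESEARCHER_API_TOKEN", limit=50))
```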

Response

Returns an array of sync log entries for the specified patient. Each entry contains details about the synchronization event, including timestamps, status, and any error information.

Use Cases

  • Monitoring: Track sync activity for specific patients
  • Troubleshooting: Identify failed syncs or issues with data export
  • Auditing: Review data export history for compliance purposes

Data Synchronization Process

Automatic Sync

Once configured, the S3 integration automatically:

  1. Monitors Data Changes: Continuously watches for new or updated data in your study
  2. Processes Data: Formats data according to your selected export format
  3. Uploads to S3: Securely transfers data to your specified S3 bucket
  4. Maintains History: Logs all synchronization activities for audit purposes

Sync Frequency

  • Hourly: Data is synced every hour
  • Batch Processing: Large datasets are processed in efficient batches
  • Retry Logic: Automatic retry mechanisms for failed uploads

Data Organization

Data is organized in your S3 bucket with the following structure:

your-bucket/
├── group_code_1/
│ ├── patient_uuid_1/
│ │ ├── 2024-01-01/
│ │ │ ├── 00/
│ │ │ │ ├── data_file_1.csv
│ │ │ │ └── data_file_2.csv
│ │ │ ├── 01/
│ │ │ │ └── data_file_3.csv
│ │ │ └── ... (24 hour folders)
│ │ ├── 2024-01-02/
│ │ └── ...
│ ├── patient_uuid_2/
│ └── ...
├── group_code_2/
└── ...

Structure breakdown:

  • Root level: Group folders (patient groups)
  • Group level: Patient UUID folders
  • Patient level: Date folders (YYYY-MM-DD format)
  • Date level: 24 hour folders (00-23, UTC timezone)
  • Hour level: Data files
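When downstream jobs need to locate files for a given patient and time, the layout above can be turned into an S3 key prefix. The sketch below assumes keys follow the tree exactly as drawn (group, patient UUID, date, hour, file) with no additional leading prefix. The function names hour_prefix and parse_key are illustrative.

```python
from datetime import datetime, timedelta, timezone


def hour_prefix(group_code, patient_uuid, moment):
    """Return the key prefix of the hour folder that covers `moment`."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    utc = moment.astimezone(timezone.utc)  # hour folders use UTC
    return f"{group_code}/{patient_uuid}/{utc:%Y-%m-%d}/{utc:%H}/"


def parse_key(key):
    """Split an object key into (group, patient, date, hour, filename)."""
    group, patient, date, hour, filename = key.split("/", 4)
    return group, patient, date, hour, filename


# 01:30 at UTC+2 falls in the 23:00 UTC folder of the previous day.
local = datetime(2024, 1, 1, 1, 30, tzinfo=timezone(timedelta(hours=2)))
print(hour_prefix("EZTJD", "abc123-def456", local))
# prints EZTJD/abc123-def456/2023-12-31/23/
```

The prefix can be passed to any S3 listing call to enumerate the files for that hour. Note the UTC conversion: a local timestamp near midnight can land in a different date folder than expected.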

Monitoring and Troubleshooting

Common Issues and Solutions

1. Authentication Errors

Problem: "Access denied" or "Invalid credentials"

Solution: Verify your AWS access keys and ensure they have S3 permissions

2. Bucket Access Issues

Problem: "Bucket not found" or "Access denied to bucket"

Solution:

  • Verify bucket name spelling
  • Ensure bucket exists in the specified region
  • Check bucket permissions and policies

3. Network Connectivity

Problem: "Connection timeout" or "Network error"

Solution:

  • Check your network connectivity
  • Verify firewall settings
  • Ensure AWS endpoints are accessible

Security Best Practices

AWS IAM Configuration

Create a dedicated IAM user with the required permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
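If you manage several buckets, the policy above can be generated per bucket instead of edited by hand. The sketch below is one possible variant, not Corsano's official policy. It splits bucket-level and object-level actions into separate statements, adds the standard S3 multipart actions (s3:AbortMultipartUpload, s3:ListMultipartUploadParts, s3:ListBucketMultipartUploads) on the assumption that these are what the "Multipart upload permissions" requirement refers to, and leaves out s3:GetObjectMetadata because it is not a defined S3 IAM action (metadata reads are authorized by s3:GetObject). The function name build_policy is illustrative.

```python
import json


def build_policy(bucket):
    """Return an IAM policy document (as a dict) scoped to a single bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:ListBucketMultipartUploads"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:AbortMultipartUpload",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


print(json.dumps(build_policy("your-bucket-name"), indent=2))
```

The printed JSON can be pasted into the IAM policy editor. Use Test Connection afterwards to confirm the permissions are sufficient for your setup.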

Data Encryption

  • Enable server-side encryption on your S3 bucket
  • Use AWS KMS for additional encryption control
  • Ensure data is encrypted in transit (HTTPS)

Performance Optimization

Bucket Configuration

  • Choose the appropriate S3 storage class for your use case
  • Enable S3 Transfer Acceleration for faster uploads
  • Use appropriate bucket policies for cost optimization

Data Processing

  • Consider using Apache Avro format for large datasets
  • Implement data lifecycle policies for cost management

Integration Examples

Python Integration

import requests
from datetime import datetime, timedelta, timezone

def get_sync_history(group_code, api_token, days_back=7):
    """Fetch the sync history of a group for the last `days_back` days."""
    base_url = "https://api.integration.corsano.com/v1"

    # Calculate date range in UTC, matching the Node.js example
    # (toISOString() is UTC); UTC is assumed to be what the API expects
    end_date = datetime.now(timezone.utc)
    start_date = end_date - timedelta(days=days_back)

    # Format dates for API (ISO 8601, second precision, no offset)
    start_str = start_date.strftime("%Y-%m-%dT%H:%M:%S")
    end_str = end_date.strftime("%Y-%m-%dT%H:%M:%S")

    url = f"{base_url}/groups/{group_code}/sync-history"
    params = {
        "changed_from": start_str,
        "changed_to": end_str,
        "token": api_token,
    }

    headers = {
        "accept": "application/json",
        "content-type": "application/json",
    }

    # requests percent-encodes the query parameters
    response = requests.get(url, params=params, headers=headers, timeout=30)
    response.raise_for_status()  # Raise on HTTP errors instead of parsing an error body
    return response.json()

# Usage
history = get_sync_history("EZTJD", "YOUR_RESEARCHER_API_TOKEN")
print(history)

Node.js Integration

const axios = require('axios');

async function getSyncHistory(groupCode, apiToken, daysBack = 7) {
  const baseUrl = 'https://api.integration.corsano.com/v1';

  // Calculate date range
  const endDate = new Date();
  const startDate = new Date(endDate.getTime() - (daysBack * 24 * 60 * 60 * 1000));

  // Format dates for API (toISOString() is UTC; drop the milliseconds and "Z")
  const startStr = startDate.toISOString().split('.')[0];
  const endStr = endDate.toISOString().split('.')[0];

  const url = `${baseUrl}/groups/${groupCode}/sync-history`;
  const params = {
    changed_from: startStr,
    changed_to: endStr,
    token: apiToken
  };

  const headers = {
    'accept': 'application/json',
    'content-type': 'application/json'
  };

  try {
    const response = await axios.get(url, { params, headers });
    return response.data;
  } catch (error) {
    console.error('Error fetching sync history:', error.message);
    throw error;
  }
}

// Usage
getSyncHistory('EZTJD', 'YOUR_RESEARCHER_API_TOKEN')
  .then(history => console.log(history))
  .catch(error => console.error('Failed:', error));

Support and Resources

Getting Help

  • Documentation: Refer to the main Corsano API documentation
  • Support Portal: Contact support through your study portal
  • Community: Join the Corsano developer community

Conclusion

The Corsano S3 sync solution provides a robust, secure, and efficient way to integrate your study data with your existing AWS infrastructure. By following this tutorial, you can set up automated data synchronization that scales with your needs while maintaining full control over your data storage and processing workflows.

For additional assistance or advanced configuration options, please contact the Corsano support team through your study portal.