Amazon S3 Integration Tutorial

This tutorial guides you through setting up and using the Corsano S3 sync solution, which allows customers to connect their AWS S3 bucket for automatic data synchronization from the Corsano cloud to their storage.

Overview

The Corsano S3 sync solution provides a seamless way to automatically export your study data to your own AWS S3 bucket. This enables you to:

  • Maintain full control over your data storage
  • Integrate Corsano data with your existing data pipelines
  • Automate data processing workflows
  • Ensure compliance with data residency requirements
  • Scale storage according to your needs

Prerequisites

Before setting up S3 integration, ensure you have:

  1. Active Corsano Study Portal Account: Access to study.corsano.com
  2. AWS Account: An active AWS account with appropriate permissions
  3. S3 Bucket: An existing S3 bucket or the ability to create one
  4. API Access: Valid API tokens for the Corsano platform

Setting Up S3 Integration

Step 1: Access Integration Settings

  1. Log into your Corsano Study Portal at study.corsano.com
  2. Navigate to the Settings → Integration tab
  3. Locate the Amazon S3 section

[Screenshot: Amazon S3 Integration Settings]

Step 2: Configure S3 Credentials

Fill in the following required fields:

  • AWS Access Key: Your AWS access key ID
  • AWS Secret Key: Your AWS secret access key
  • AWS Region: Select the region where your S3 bucket is located
  • Endpoint: The S3 endpoint (default: https://s3.amazonaws.com)
  • Bucket Name: The name of your S3 bucket

Step 3: Choose Data Export Format

Select your preferred data export format:

  • CSV Format: Universal format readable by Excel and most applications
  • Apache Avro: Binary format with schema, efficient for large datasets
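
Either format can be read back with standard tooling downstream. A minimal sketch, assuming boto3 and fastavro are installed; the bucket name and object key are placeholders (real keys follow the layout described under Data Organization below):

import csv
import io

import boto3
import fastavro

s3 = boto3.client("s3")
BUCKET = "your-bucket-name"  # placeholder

def read_csv_export(key):
    # Download the object and parse it into a list of row dicts
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    return list(csv.DictReader(io.StringIO(body.decode("utf-8"))))

def read_avro_export(key):
    # Avro files embed their schema, so no external schema file is needed
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    return list(fastavro.reader(io.BytesIO(body)))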

Step 4: Required AWS Permissions

The provided access key must have the following permissions on the specified bucket (a quick pre-check sketch follows the integration test details below):

  • List bucket contents (s3:ListBucket)
  • Get objects (s3:GetObject, s3:GetObjectVersion)
  • Put/Upload objects (s3:PutObject)
  • Delete objects (s3:DeleteObject)
  • Get object metadata (s3:GetObjectMetadata)
  • Multipart upload permissions

Integration Test Details

When you click "Test Connection", we'll attempt to create a file named corsano_test in your S3 bucket to verify permissions.
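
You can replicate this check, and the Step 4 permissions, locally before using the portal. A minimal sketch with boto3; the bucket name, region, and credentials are placeholders:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
    region_name="eu-west-1",  # the region of your bucket
)
bucket = "your-bucket-name"  # placeholder

# Exercise the same object operations the integration relies on,
# using the corsano_test key that "Test Connection" itself writes
checks = [
    ("s3:PutObject", lambda: s3.put_object(Bucket=bucket, Key="corsano_test", Body=b"ok")),
    ("s3:GetObject", lambda: s3.get_object(Bucket=bucket, Key="corsano_test")),
    ("s3:ListBucket", lambda: s3.list_objects_v2(Bucket=bucket, MaxKeys=1)),
    ("s3:DeleteObject", lambda: s3.delete_object(Bucket=bucket, Key="corsano_test")),
]

for action, call in checks:
    try:
        call()
        print(f"{action}: OK")
    except ClientError as err:
        print(f"{action}: FAILED ({err.response['Error']['Code']})")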

Step 5: Test and Save Configuration

  1. Click Test Connection to verify your credentials
  2. If successful, click Save Configuration to store your settings
  3. Use Clear Configuration if you need to reset the form

API Integration

Authentication

All API calls require authentication using your researcher API token. Include it as a query parameter:

token=YOUR_RESEARCHER_API_TOKEN

Sync History API

Monitor the synchronization status and history using the sync history endpoint:

curl --request GET \
--url 'https://api.integration.corsano.com/v1/groups/{GROUP_CODE}/sync-history?token=YOUR_RESEARCHER_API_TOKEN' \
--header 'accept: application/json' \
--header 'content-type: application/json'

Parameters

  • group_code: Your study group identifier
  • changed_from: Start date for filtering (ISO 8601 format)
  • changed_to: End date for filtering (ISO 8601 format)
  • token: Your researcher API token

Example Request

curl --request GET \
--url 'https://api.integration.corsano.com/v1/groups/EZTJD/sync-history?changed_from=2025-08-07T10%3A00%3A00&changed_to=2025-08-19T22%3A00%3A00&token=YOUR_RESEARCHER_API_TOKEN' \
--header 'accept: application/json' \
--header 'content-type: application/json'

Data Synchronization Process

Automatic Sync

Once configured, the S3 integration automatically:

  1. Monitors Data Changes: Continuously watches for new or updated data in your study
  2. Processes Data: Formats data according to your selected export format
  3. Uploads to S3: Securely transfers data to your specified S3 bucket
  4. Maintains History: Logs all synchronization activities for audit purposes

Sync Frequency

  • Near Real-time: Data is synced every hour
  • Batch Processing: Large datasets are processed in efficient batches
  • Retry Logic: Automatic retry mechanisms for failed uploads

Data Organization

Data is organized in your S3 bucket with the following structure:

your-bucket/
├── group_code_1/
│   ├── patient_uuid_1/
│   │   ├── 2024-01-01/
│   │   │   ├── 00/
│   │   │   │   ├── data_file_1.csv
│   │   │   │   └── data_file_2.csv
│   │   │   ├── 01/
│   │   │   │   └── data_file_3.csv
│   │   │   └── ... (24 hour folders)
│   │   ├── 2024-01-02/
│   │   └── ...
│   ├── patient_uuid_2/
│   └── ...
├── group_code_2/
└── ...

Structure breakdown:

  • Root level: Group folders (patient groups)
  • Group level: Patient UUID folders
  • Patient level: Date folders (YYYY-MM-DD format)
  • Date level: 24 hour folders (00-23, UTC timezone)
  • Hour level: Data files
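
Given this layout, downstream consumers can fetch one patient-day by listing a key prefix. A minimal sketch with boto3; the bucket name, group code, patient UUID, and date are placeholders:

import boto3

s3 = boto3.client("s3")
prefix = "group_code_1/patient_uuid_1/2024-01-01/"  # group/patient/date/

# Paginate in case a day holds more than one page of objects
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="your-bucket-name", Prefix=prefix):
    for obj in page.get("Contents", []):
        # Keys end in .../<hour>/<file>, e.g. .../00/data_file_1.csv
        print(obj["Key"], obj["Size"])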

Monitoring and Troubleshooting

Common Issues and Solutions

1. Authentication Errors

Problem: "Access denied" or "Invalid credentials" Solution: Verify your AWS access keys and ensure they have S3 permissions

2. Bucket Access Issues

Problem: "Bucket not found" or "Access denied to bucket" Solution:

  • Verify bucket name spelling
  • Ensure bucket exists in the specified region
  • Check bucket permissions and policies

3. Network Connectivity

Problem: "Connection timeout" or "Network error" Solution:

  • Check your network connectivity
  • Verify firewall settings
  • Ensure AWS endpoints are accessible
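
These three failure classes can be told apart programmatically. A minimal diagnostic sketch with boto3; the bucket name and region are placeholders:

import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

def diagnose(bucket, region):
    s3 = boto3.client("s3", region_name=region)
    try:
        # head_bucket is a cheap existence/permission probe
        s3.head_bucket(Bucket=bucket)
        print("Bucket reachable and accessible.")
    except EndpointConnectionError:
        print("Network problem: cannot reach the S3 endpoint.")
    except ClientError as err:
        code = err.response["Error"]["Code"]
        if code == "404":
            print("Bucket not found: check spelling and region.")
        elif code == "403":
            print("Access denied: check keys and bucket policy.")
        else:
            print(f"Unexpected error: {code}")

diagnose("your-bucket-name", "eu-west-1")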

Security Best Practices

AWS IAM Configuration

Create a dedicated IAM user with the required permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:GetObjectMetadata"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
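
If you manage IAM programmatically, the same policy can be attached to a dedicated user with boto3. A sketch; the user name, policy name, and bucket ARNs are placeholders:

import json

import boto3

iam = boto3.client("iam")

# The inline policy document shown above
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "s3:ListBucket", "s3:GetObject", "s3:GetObjectVersion",
            "s3:PutObject", "s3:DeleteObject", "s3:GetObjectMetadata",
        ],
        "Resource": [
            "arn:aws:s3:::your-bucket-name",
            "arn:aws:s3:::your-bucket-name/*",
        ],
    }],
}

iam.create_user(UserName="corsano-s3-sync")
iam.put_user_policy(
    UserName="corsano-s3-sync",
    PolicyName="corsano-s3-sync",
    PolicyDocument=json.dumps(policy),
)

# The returned key pair is what you paste into the portal form;
# store the secret securely and never commit it to source control
keys = iam.create_access_key(UserName="corsano-s3-sync")["AccessKey"]
print(keys["AccessKeyId"])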

Data Encryption

  • Enable server-side encryption on your S3 bucket
  • Use AWS KMS for additional encryption control
  • Ensure data is encrypted in transit (HTTPS)
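
Default encryption can be set once at the bucket level so every synced object is encrypted at rest. A sketch using boto3 with SSE-KMS; the bucket name and KMS key ARN are placeholders:

import boto3

s3 = boto3.client("s3")
s3.put_bucket_encryption(
    Bucket="your-bucket-name",
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                # Placeholder KMS key ARN; omit KMSMasterKeyID to use the
                # account's default aws/s3 key instead
                "KMSMasterKeyID": "arn:aws:kms:eu-west-1:111122223333:key/your-key-id",
            }
        }]
    },
)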

Performance Optimization

Bucket Configuration

  • Choose the appropriate S3 storage class for your use case
  • Enable S3 Transfer Acceleration for faster uploads
  • Use appropriate bucket policies for cost optimization
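
Transfer Acceleration, for example, is a one-time bucket setting. A sketch with boto3; the bucket name is a placeholder, and accelerated transfers also require using the accelerate endpoint on the client side:

import boto3

s3 = boto3.client("s3")
s3.put_bucket_accelerate_configuration(
    Bucket="your-bucket-name",
    AccelerateConfiguration={"Status": "Enabled"},
)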

Data Processing

  • Consider using Apache Avro format for large datasets
  • Implement data lifecycle policies for cost management
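
A lifecycle rule can move older exports to cheaper storage and eventually expire them. A sketch with boto3; the bucket name and the 90-day/365-day thresholds are illustrative, not recommendations:

import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="your-bucket-name",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "corsano-exports",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to the whole bucket
            # Move to infrequent-access storage after 90 days...
            "Transitions": [{"Days": 90, "StorageClass": "STANDARD_IA"}],
            # ...and delete after one year
            "Expiration": {"Days": 365},
        }]
    },
)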

Integration Examples

Python Integration

import requests
from datetime import datetime, timedelta

def get_sync_history(group_code, api_token, days_back=7):
    """Fetch sync history for the last `days_back` days."""
    base_url = "https://api.integration.corsano.com/v1"

    # Calculate date range
    end_date = datetime.now()
    start_date = end_date - timedelta(days=days_back)

    # Format dates for API (ISO 8601, second precision)
    start_str = start_date.strftime("%Y-%m-%dT%H:%M:%S")
    end_str = end_date.strftime("%Y-%m-%dT%H:%M:%S")

    url = f"{base_url}/groups/{group_code}/sync-history"
    params = {
        "changed_from": start_str,
        "changed_to": end_str,
        "token": api_token
    }

    headers = {
        "accept": "application/json",
        "content-type": "application/json"
    }

    response = requests.get(url, params=params, headers=headers)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()

# Usage
history = get_sync_history("EZTJD", "YOUR_RESEARCHER_API_TOKEN")
print(history)

Node.js Integration

const axios = require('axios');

async function getSyncHistory(groupCode, apiToken, daysBack = 7) {
  const baseUrl = 'https://api.integration.corsano.com/v1';

  // Calculate date range
  const endDate = new Date();
  const startDate = new Date(endDate.getTime() - (daysBack * 24 * 60 * 60 * 1000));

  // Format dates for API (ISO 8601, second precision)
  const startStr = startDate.toISOString().split('.')[0];
  const endStr = endDate.toISOString().split('.')[0];

  const url = `${baseUrl}/groups/${groupCode}/sync-history`;
  const params = {
    changed_from: startStr,
    changed_to: endStr,
    token: apiToken
  };

  const headers = {
    'accept': 'application/json',
    'content-type': 'application/json'
  };

  try {
    const response = await axios.get(url, { params, headers });
    return response.data;
  } catch (error) {
    console.error('Error fetching sync history:', error.message);
    throw error;
  }
}

// Usage
getSyncHistory('EZTJD', 'YOUR_RESEARCHER_API_TOKEN')
  .then(history => console.log(history))
  .catch(error => console.error('Failed:', error));

Support and Resources

Getting Help

  • Documentation: Refer to the main Corsano API documentation
  • Support Portal: Contact support through your study portal
  • Community: Join the Corsano developer community

Conclusion

The Corsano S3 sync solution provides a robust, secure, and efficient way to integrate your study data with your existing AWS infrastructure. By following this tutorial, you can set up automated data synchronization that scales with your needs while maintaining full control over your data storage and processing workflows.

For additional assistance or advanced configuration options, please contact the Corsano support team through your study portal.