AWS DataSync FAQs Page Flashcards
A: AWS DataSync can transfer data between Network File System (NFS), Server Message Block (SMB) file servers, or AWS Snowcone, and Amazon Simple Storage Service (Amazon S3) buckets, Amazon Elastic File System (Amazon EFS) file systems, and Amazon FSx for Windows File Server file systems.
Q: Where can I transfer data to and from?
A: You can use AWS DataSync to accelerate and schedule the transfers from on-premises systems into or out of AWS for processing. It can help speed up critical hybrid cloud storage workflows in industries that need to move active files into AWS quickly, including video production in media and entertainment, seismic research in oil and gas, machine learning in life science, and big data analytics in finance. DataSync provides timely delivery to ensure dependent processes are not delayed. You can specify exclude filters, include filters, or both, to determine which files, folders, or objects get transferred each time your task runs.
Q: How can I use AWS DataSync for recurring transfers between on-premises and AWS for ongoing workflows?
A: With AWS DataSync, you can periodically replicate files into all Amazon S3 storage classes, or send the data to Amazon EFS or Amazon FSx for Windows File Server for a standby file system. Use the built-in task scheduling functionality to ensure that changes to your dataset are regularly copied to your destination storage. Read this AWS storage blog to learn more about data protection using AWS DataSync.
Q: How can I use AWS DataSync to replicate data to AWS for business continuity?
A: AWS DataSync is part of the Amazon WorkDocs Migration Service. DataSync makes it easier and faster to migrate home directories and department shares to WorkDocs. To learn more about using DataSync for migrations to WorkDocs, read the blog 'Migrating network file shares to Amazon WorkDocs using AWS DataSync.'
Q: How can I use AWS DataSync to migrate to Amazon WorkDocs?
A: You can transfer data using AWS DataSync with a few clicks in the AWS Management Console or through the AWS Command Line Interface (CLI). To get started, follow these 3 steps:
Q: How do I get started with AWS DataSync?
A: You can find the minimum required resources to run the agent here.
Q: What are the resource requirements for the AWS DataSync agent?
A: As AWS DataSync transfers and stores data, it performs integrity checks to ensure the data written to the destination matches the data read from the source. Additionally, an optional verification check can be performed to compare source and destination at the end of the transfer. DataSync will calculate and compare full-file checksums of the data stored in the source and in the destination. You can check either the entire dataset or just the files or objects that DataSync transferred.
Q: How does AWS DataSync ensure my data is copied correctly?
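The verification idea in the card above (comparing full-file checksums of source and destination) can be illustrated with a small local sketch. This is not DataSync's internal implementation: the choice of SHA-256, the function names `file_checksum` and `find_mismatches`, and the sample files are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


def file_checksum(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source: Path, destination: Path) -> List[str]:
    """List relative paths whose destination copy is missing or differs."""
    mismatches = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        if not dst_file.is_file() or file_checksum(src_file) != file_checksum(dst_file):
            mismatches.append(relative.as_posix())
    return mismatches


# Hypothetical sample data: one intact file, one altered file, one missing file.
with tempfile.TemporaryDirectory() as tmp:
    source, destination = Path(tmp, "source"), Path(tmp, "destination")
    source.mkdir()
    destination.mkdir()
    (source / "a.txt").write_text("same")
    (destination / "a.txt").write_text("same")
    (source / "b.txt").write_text("original")
    (destination / "b.txt").write_text("corrupted")
    (source / "c.txt").write_text("missing at destination")
    mismatches = find_mismatches(source, destination)

print(mismatches)  # ['b.txt', 'c.txt']
```

Checking only a subset of files, as the card describes for transferred files, would amount to passing a narrower list of paths instead of walking the whole source tree.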
A: You can use the AWS Management Console or CLI to monitor the status of data being transferred. Using Amazon CloudWatch Metrics, you can see the number of files and amount of data which has been copied. You can also enable logging of individual files to CloudWatch Logs, to identify what was transferred at a given time, as well as the results of the content integrity verification performed by DataSync. This simplifies monitoring, reporting, and troubleshooting, and enables you to provide timely updates to stakeholders. You can find additional information, such as transfer progress, in the AWS Management Console or CLI.
Q: How can I monitor the status of data being transferred by AWS DataSync?
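As a rough sketch of the progress reporting mentioned in the card above, the helper below formats a progress line from a task execution description. It assumes a dictionary shaped like the response of boto3's `describe_task_execution` call (fields `Status`, `BytesTransferred`, `EstimatedBytesToTransfer`); the helper name and the sample values are hypothetical.

```python
from typing import Any, Dict


def summarize_execution(execution: Dict[str, Any]) -> str:
    """Format status and byte progress from a task execution description."""
    status = execution.get("Status", "UNKNOWN")
    transferred = execution.get("BytesTransferred", 0)
    estimated = execution.get("EstimatedBytesToTransfer", 0)
    # Avoid dividing by zero before DataSync has estimated the transfer size.
    percent = 100.0 * transferred / estimated if estimated else 0.0
    return f"{status}: {transferred}/{estimated} bytes ({percent:.1f}%)"


# In real use the dictionary would come from something like:
#   boto3.client("datasync").describe_task_execution(TaskExecutionArn=...)
# The values below are made-up sample data.
sample = {
    "Status": "TRANSFERRING",
    "BytesTransferred": 250,
    "EstimatedBytesToTransfer": 1000,
}
summary = summarize_execution(sample)
print(summary)  # TRANSFERRING: 250/1000 bytes (25.0%)
```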
A: Yes. You can specify an exclude filter, an include filter, or both, to limit which files, folders, or objects get transferred each time a task runs. When creating a task, you configure the file paths or object keys that should always be excluded from being copied. Then, when you start a task, you configure the file paths or object keys that should be included for that execution of the task. If no filters are configured, each time a task runs it will transfer all changes from the source to the destination. Read this AWS storage blog to learn more about using common filters with DataSync.
Q: Can I filter the files and folders that AWS DataSync transfers?
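The filter setup described in the card above might be sketched as follows. The DataSync API takes a filter as a single rule whose `Value` is a pipe-delimited list of patterns with `FilterType` set to `SIMPLE_PATTERN`; the helper name `build_filter` and the example patterns are assumptions for illustration.

```python
from typing import Dict, Iterable, List


def build_filter(patterns: Iterable[str]) -> List[Dict[str, str]]:
    """Join patterns into one pipe-delimited DataSync filter rule."""
    cleaned = [p.strip() for p in patterns if p.strip()]
    if not cleaned:
        return []  # no filter: everything that changed is transferred
    return [{"FilterType": "SIMPLE_PATTERN", "Value": "|".join(cleaned)}]


# Hypothetical patterns. Excludes are set when the task is created,
# includes when an individual execution is started.
excludes = build_filter(["*.tmp", "/cache"])
includes = build_filter(["/projects/2024"])

# With boto3 these would be passed roughly as:
#   client.create_task(..., Excludes=excludes)
#   client.start_task_execution(TaskArn=..., Includes=includes)
print(excludes)
print(includes)
```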
A: Yes. You can schedule your tasks using the AWS DataSync Console or AWS Command Line Interface (CLI), without needing to write and run scripts to manage repeated transfers. Task scheduling automatically runs tasks on the schedule you configure, with hourly, daily, or weekly options provided directly in the Console. This enables you to ensure that changes to your dataset are automatically detected and copied to your destination storage.
Q: Can I configure AWS DataSync to transfer on a schedule?
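For scheduling through the API rather than the Console, a task takes a schedule expression in the AWS cron format. The sketch below builds expressions for the hourly, daily, and weekly options named in the card above; the function name and its defaults are hypothetical, and times are assumed to be UTC.

```python
def schedule_expression(frequency: str, hour_utc: int = 0, weekday: str = "SUN") -> str:
    """Build an AWS cron expression for an hourly, daily, or weekly schedule.

    Field order: minutes hours day-of-month month day-of-week year.
    """
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    if frequency == "hourly":
        return "cron(0 * * * ? *)"
    if frequency == "daily":
        return f"cron(0 {hour_utc} * * ? *)"
    if frequency == "weekly":
        # Day-of-month must be '?' when a day-of-week is given.
        return f"cron(0 {hour_utc} ? * {weekday} *)"
    raise ValueError(f"unsupported frequency: {frequency}")


# With boto3 this would be passed roughly as:
#   client.create_task(..., Schedule={"ScheduleExpression": expression})
expression = schedule_expression("weekly", hour_utc=2, weekday="SUN")
print(expression)  # cron(0 2 ? * SUN *)
```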
A: Yes. When transferring files, AWS DataSync creates a directory structure on the destination that is similar to the source location's structure.
Q: Does AWS DataSync preserve the directory structure when copying files?
A: If a task is interrupted, for instance, if the network connection goes down or the AWS DataSync agent is restarted, the next run of the task will transfer missing files, and the data will be complete and consistent at the end of this run. Each time a task is started it performs an incremental copy, transferring only the changes from the source to the destination.
Q: What happens if an AWS DataSync task is interrupted?
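The incremental behavior in the card above (each run copies only what changed, so a rerun after an interruption fills in what is missing) can be illustrated with a simplified local sketch. It is not how DataSync detects changes internally: comparing file size and whole-second modification time is an assumption made here for brevity, and the function names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def needs_copy(src: Path, dst: Path) -> bool:
    """Decide whether src must be copied: dst missing, resized, or older."""
    if not dst.exists():
        return True
    src_stat, dst_stat = src.stat(), dst.stat()
    # Whole-second comparison: a same-size edit within one second is missed.
    return (src_stat.st_size != dst_stat.st_size
            or int(src_stat.st_mtime) > int(dst_stat.st_mtime))


def incremental_copy(source: Path, destination: Path) -> List[str]:
    """Copy only new or changed files, preserving the directory structure."""
    copied = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src.relative_to(source)
        dst = destination / relative
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also carries over the mtime
            copied.append(relative.as_posix())
    return copied


with tempfile.TemporaryDirectory() as tmp:
    source, destination = Path(tmp, "source"), Path(tmp, "destination")
    (source / "sub").mkdir(parents=True)
    (source / "a.txt").write_text("v1")
    (source / "sub" / "b.txt").write_text("v1")
    first_run = incremental_copy(source, destination)   # copies everything
    second_run = incremental_copy(source, destination)  # nothing changed
    (source / "a.txt").write_text("version 2")
    third_run = incremental_copy(source, destination)   # only the edited file

print(first_run, second_run, third_run)
```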
A: Yes. You can use AWS DataSync with your Direct Connect link to access public service endpoints or private VPC endpoints. When using VPC endpoints, data transferred between the DataSync agent and AWS services does not traverse the public internet or need public IP addresses, increasing the security of data as it is copied over the network.
Q: Can I use AWS DataSync with AWS Direct Connect?
A: Yes. You can use VPC endpoints to ensure data transferred between your AWS DataSync agent, either deployed on-premises or in-cloud, doesn't traverse the public internet or need public IP addresses. Using VPC endpoints increases the security of your data by keeping network traffic within your Amazon Virtual Private Cloud (Amazon VPC). VPC endpoints for DataSync are powered by AWS PrivateLink, a highly available, scalable technology that enables you to privately connect your VPC to supported AWS services.
Q: Does AWS DataSync support VPC endpoints or AWS PrivateLink?
A: Yes. When configuring an S3 bucket for use with AWS DataSync, you can select the S3 storage class that DataSync uses to store objects. DataSync supports storing data directly into S3 Standard, S3 Intelligent-Tiering, S3 Standard-Infrequent Access (S3 Standard-IA), S3 One Zone-Infrequent Access (S3 One Zone-IA), Amazon S3 Glacier (S3 Glacier), and Amazon S3 Glacier Deep Archive (S3 Glacier Deep Archive). More information on Amazon S3 storage classes can be found in the Amazon Simple Storage Service Developer Guide.
Q: Can I copy my data into Amazon S3 Glacier, Amazon S3 Glacier Deep Archive, or other S3 storage classes?
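Selecting a storage class through the API could look like the sketch below, which builds the parameters for boto3's `create_location_s3` call. The storage class identifiers correspond to the classes listed in the card above; the helper name and the ARNs are hypothetical placeholders.

```python
from typing import Any, Dict

# API identifiers for the storage classes named in the card above.
SUPPORTED_STORAGE_CLASSES = {
    "STANDARD",
    "INTELLIGENT_TIERING",
    "STANDARD_IA",
    "ONEZONE_IA",
    "GLACIER",
    "DEEP_ARCHIVE",
}


def s3_location_params(bucket_arn: str, role_arn: str,
                       storage_class: str = "STANDARD",
                       subdirectory: str = "/") -> Dict[str, Any]:
    """Build keyword arguments for a DataSync S3 location."""
    if storage_class not in SUPPORTED_STORAGE_CLASSES:
        raise ValueError(f"unsupported storage class: {storage_class}")
    return {
        "S3BucketArn": bucket_arn,
        "S3StorageClass": storage_class,
        "S3Config": {"BucketAccessRoleArn": role_arn},
        "Subdirectory": subdirectory,
    }


# Placeholder ARNs for illustration only.
params = s3_location_params(
    bucket_arn="arn:aws:s3:::example-archive-bucket",
    role_arn="arn:aws:iam::123456789012:role/example-datasync-role",
    storage_class="DEEP_ARCHIVE",
)
# With boto3: boto3.client("datasync").create_location_s3(**params)
print(params["S3StorageClass"])  # DEEP_ARCHIVE
```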
considerations when working with Amazon S3 storage classes
Q: Can I copy data out of S3 Standard-IA and S3 One Zone-IA storage classes?
considerations when working with Amazon S3 storage classes
Q: Can I copy data out of S3 Glacier and Amazon S3 Glacier Deep Archive?
manually configure a role
Q: How does AWS DataSync access my Amazon S3 bucket?
S3 user metadata
Q: How does AWS DataSync convert files and folders to or from objects in Amazon S3?
A: To avoid minimum capacity charge per object, AWS DataSync automatically stores small objects in S3 Standard. To minimize data retrieval fees, you can configure DataSync to verify only files that were transferred by a given task. To avoid minimum storage duration charges, DataSync has controls for overwriting and deleting objects. Read about considerations when working with Amazon S3 storage classes in our documentation.
Q: Which Amazon S3 request and storage costs apply when using S3 storage classes with AWS DataSync?
A: AWS DataSync accesses your Amazon EFS file system using the NFS protocol. The DataSync service mounts your file system from within your VPC from Elastic Network Interfaces (ENIs) managed by the DataSync service. DataSync fully manages the creation, use, and deletion of these ENIs on your behalf.
Q: How does AWS DataSync access my Amazon EFS file system?
A: Yes. You can use AWS DataSync to copy files into EFS and configure EFS Lifecycle Management to migrate files that have not been accessed for a set period of time to the Infrequent Access (IA) storage class.
Q: Can I use AWS DataSync with all EFS storage classes?
A: AWS DataSync accesses your Amazon FSx file system using the SMB protocol, authenticating with the username and password you configure in the AWS Console or CLI. The DataSync service mounts your file system from within your VPC from Elastic Network Interfaces (ENIs) managed by the DataSync service. DataSync fully manages the creation, use, and deletion of these ENIs on your behalf.
Q: How does DataSync access my Amazon FSx file system?
documentation
Q: What Windows metadata is transferred when copying from SMB shares to Amazon FSx?
A: The DataSync agent is pre-installed on your Snowcone device as an AMI. To transfer data online to AWS, connect the AWS Snowcone device to the external network and use AWS OpsHub or the CLI to launch the DataSync agent AMI. Activate the agent using the AWS Management Console or CLI, and set up your online data transfer task between AWS Snowcone’s NFS store and Amazon S3, Amazon EFS, or Amazon FSx for Windows File Server.
Q: How do I transfer data between AWS Snowcone and AWS storage services?
A: The rate at which AWS DataSync can copy a given dataset is a function of the amount of data, the I/O bandwidth achievable from the source and destination storage, the network bandwidth available, and network conditions. A single DataSync agent is capable of saturating a 10 Gbps network link.
Q: How fast can AWS DataSync copy my file system to AWS?
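To make the dependency in the card above concrete, here is a back-of-the-envelope estimate that treats the slowest of the three rates as the bottleneck. It is an idealized sketch: it ignores network conditions, per-file overhead, and verification time, and the function name and the sample figures are assumptions.

```python
def estimate_transfer_seconds(data_bytes: int,
                              network_bps: float,
                              source_io_bps: float,
                              destination_io_bps: float) -> float:
    """Ideal transfer time: data size in bits divided by the slowest rate."""
    bottleneck_bps = min(network_bps, source_io_bps, destination_io_bps)
    if bottleneck_bps <= 0:
        raise ValueError("all rates must be positive")
    return data_bytes * 8 / bottleneck_bps


# Hypothetical example: 10 TB over a 10 Gbps link, where the source
# storage can only be read at 4 Gbps and the destination accepts 8 Gbps.
seconds = estimate_transfer_seconds(
    data_bytes=10 * 10**12,
    network_bps=10e9,
    source_io_bps=4e9,
    destination_io_bps=8e9,
)
print(f"{seconds:.0f} s, about {seconds / 3600:.1f} hours")  # 20000 s, about 5.6 hours
```

In this example the source storage, not the network link, sets the pace, which is why a faster link alone would not shorten the transfer.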
A: Yes. You can control the amount of network bandwidth that AWS DataSync will use by configuring the built-in bandwidth throttle. This can help to minimize impact on other users or applications who rely on the same network connection.
Q: Can I control the amount of network bandwidth that an AWS DataSync task uses?
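In the API, the bandwidth limit is expressed through the task option `BytesPerSecond`, where `-1` means no limit. The sketch below converts a limit given in megabits per second; the helper name and the choice of Mbps as the input unit are assumptions.

```python
from typing import Dict, Optional


def throttle_options(limit_mbps: Optional[float]) -> Dict[str, int]:
    """Convert a limit in megabits per second to the BytesPerSecond option."""
    if limit_mbps is None:
        return {"BytesPerSecond": -1}  # -1 means use all available bandwidth
    if limit_mbps <= 0:
        raise ValueError("limit_mbps must be positive")
    return {"BytesPerSecond": int(limit_mbps * 1_000_000 / 8)}


# With boto3 this would be passed roughly as:
#   client.start_task_execution(TaskArn=..., OverrideOptions=options)
options = throttle_options(100)
print(options)  # {'BytesPerSecond': 12500000}
```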
A: AWS DataSync generates Amazon CloudWatch Metrics to provide granular visibility into the transfer process. Using these metrics, you can see the number of files and amount of data which has been copied, as well as file discovery and verification progress. You can see CloudWatch Graphs with these metrics directly in the DataSync Console.
Q: How can I monitor the performance of AWS DataSync?
A: Depending on the capacity of your on-premises file store, and the quantity and size of files to be transferred, AWS DataSync may affect the response time of other clients when accessing the same source data store, because the agent reads or writes data from that storage system. Configuring a bandwidth limit for a task will reduce this impact by limiting the I/O against your storage system.
Q: Will AWS DataSync affect the performance of my source file system?
A: Yes. All data transferred between the source and destination is encrypted via Transport Layer Security (TLS), which replaced Secure Sockets Layer (SSL). Data is never persisted in AWS DataSync itself. The service supports using default encryption for S3 buckets, Amazon EFS file system encryption of data at rest, and Amazon FSx for Windows File Server encryption at rest and in transit.
Q: Is my data encrypted while being transferred and stored?
A: AWS DataSync uses an agent that you deploy into your IT environment or into Amazon EC2 to access your files through the NFS or SMB protocol. This agent connects to DataSync service endpoints within AWS, and is securely managed from the AWS Management Console or CLI.
Q: How does AWS DataSync access my NFS server or SMB file share?
A: No. When copying data to or from your premises, there is no need to set up a VPN or tunnel, or to allow inbound connections. Your AWS DataSync agent can be configured to route through a firewall using standard network ports. You can also deploy DataSync within your Amazon Virtual Private Cloud (Amazon VPC) using VPC endpoints. When using VPC endpoints, data transferred between the DataSync agent and AWS services does not need to traverse the public internet or use public IP addresses.
Q: Does AWS DataSync require setting up a VPN to connect to my destination storage?
A: Your AWS DataSync agent connects to DataSync service endpoints within your chosen AWS Region. You can choose to have the agent connect to public internet facing endpoints, Federal Information Processing Standards (FIPS) validated endpoints, or endpoints within one of your VPCs. Activating your agent securely associates it with your AWS account. To learn more, see Choose a Service Endpoint and Activate Your Agent.
Q: How do my AWS DataSync agents securely connect to AWS?
A: Updates to the agent VM, including both the underlying operating system and the AWS DataSync software packages, are managed by the service once the agent is activated. Updates are applied non-disruptively when the agent is idle and not executing a data transfer task.
Q: How is my AWS DataSync agent patched and updated?
A: Yes. AWS DataSync is PCI-DSS compliant, which means you can use it to transfer payment information. You can download the PCI Compliance Package in AWS Artifact to learn more about how to achieve PCI Compliance on AWS.
Q: Is AWS DataSync PCI compliant?
A: Yes. AWS DataSync is HIPAA eligible, which means if you have a HIPAA BAA in place with AWS, you can use DataSync to transfer protected health information (PHI).
Q: Is AWS DataSync HIPAA eligible?
A: AWS DataSync has received a Provisional Authority to Operate (P-ATO) from the Joint Authorization Board (JAB) at the Federal Risk and Authorization Management Program (FedRAMP) Moderate baseline in the US East/West Regions. If you are a federal or commercial customer, you can use AWS DataSync in the AWS US East/West Regions' authorization boundary with data up to the moderate impact level.
Q: Does AWS DataSync have FedRAMP JAB Moderate Provisional Authorization in the AWS US East/West?
A: AWS DataSync has received a Provisional Authority to Operate (P-ATO) from the Joint Authorization Board (JAB) at the Federal Risk and Authorization Management Program (FedRAMP) High baseline in the US GovCloud Region. If you are a federal or commercial customer, you can use AWS DataSync in the AWS GovCloud (US) Region’s authorization boundary with data up to the high impact level.
Q: Does AWS DataSync have FedRAMP JAB High Provisional Authorization in the AWS GovCloud (US) Regions?
A: AWS DataSync fully automates and accelerates moving large active datasets to AWS, up to 10 times faster than command line tools. It is natively integrated with Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, Amazon CloudWatch, and AWS CloudTrail, which provides seamless and secure access to your storage services, as well as detailed monitoring of the transfer.
Q: How is AWS DataSync different from using command line tools such as rsync or the Amazon S3 command line interface?
A: AWS DataSync is ideal for online data transfers. You can use DataSync to migrate active data to AWS, transfer data to the cloud for analysis and processing, archive data to free up on-premises storage capacity, or replicate data to AWS for business continuity.
Q: When do I use AWS DataSync and when do I use AWS Snowball Edge?
A: Use AWS DataSync to migrate existing data to Amazon S3, and then use the File Gateway configuration of AWS Storage Gateway to retain access to the migrated data and for ongoing updates from your on-premises file-based applications.
Q: When do I use AWS DataSync and when do I use AWS Storage Gateway?
A: If your applications are already integrated with the Amazon S3 API, and you want higher throughput for transferring large files to S3, you can use S3 Transfer Acceleration. If you want to transfer data from existing storage systems (e.g. Network Attached Storage), or from instruments that cannot be changed (e.g. DNA sequencers, video cameras), or if you want multiple destinations, use AWS DataSync. DataSync also automates and simplifies the data transfer by providing additional functionality, such as built-in retry and network resiliency mechanisms, data integrity verification, and flexible configuration options such as bandwidth throttling.
Q: When do I use AWS DataSync, and when do I use Amazon S3 Transfer Acceleration?
A: If you currently use SFTP to exchange data with third parties, AWS Transfer for SFTP provides a fully managed SFTP transfer directly into and out of Amazon S3, while reducing your operational burden.
Q: When do I use AWS DataSync and when do I use AWS Transfer for SFTP?