Amazon Kinesis Data Streams FAQ Flashcards (Amazon Web Services Data Streaming Service)
Amazon Kinesis Data Streams manages the infrastructure, storage, networking, and configuration needed to stream your data at the level of your data throughput. You do not have to worry about provisioning, deployment, or ongoing maintenance of hardware, software, or other services for your data streams. In addition, Amazon Kinesis Data Streams synchronously replicates data across three Availability Zones, providing high availability and data durability.
Q: What does Amazon Kinesis Data Streams manage on my behalf?
Amazon Kinesis Data Streams is useful for rapidly moving data off data producers and then continuously processing the data, be it to transform the data before emitting to a data store, run real-time metrics and analytics, or derive more complex data streams for further processing. The following are typical scenarios for using Amazon Kinesis Data Streams:
Q: What can I do with Amazon Kinesis Data Streams?
After you sign up for Amazon Web Services, you can start using Amazon Kinesis Data Streams by:
Q: How do I use Amazon Kinesis Data Streams?
The throughput of an Amazon Kinesis data stream is designed to scale without limits via increasing the number of shards within a data stream. However, there are certain limits you should keep in mind while using Amazon Kinesis Data Streams:
Q: What are the limits of Amazon Kinesis Data Streams?
Amazon Kinesis Data Streams enables real-time processing of streaming big data. It provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Amazon Kinesis Applications. The Amazon Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, making it easier to build multiple applications reading from the same Amazon Kinesis data stream (for example, to perform counting, aggregation, and filtering).
Q: How does Amazon Kinesis Data Streams differ from Amazon SQS?
We recommend Amazon Kinesis Data Streams for use cases with requirements that are similar to the following:
Q: When should I use Amazon Kinesis Data Streams, and when should I use Amazon SQS?
After you sign up for Amazon Web Services, you can create an Amazon Kinesis data stream through either the Amazon Kinesis Management Console or the CreateStream operation.
Q: How do I create an Amazon Kinesis data stream?
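As a minimal sketch using the boto3 SDK (the stream name and shard count here are hypothetical, and configured AWS credentials plus a default region are assumed), creating a stream with the CreateStream operation looks like:

```python
import boto3

# Assumes AWS credentials and a default region are configured.
kinesis = boto3.client("kinesis")

# CreateStream is asynchronous: the stream starts in CREATING status
# and becomes ACTIVE once shard provisioning completes.
kinesis.create_stream(StreamName="my-stream", ShardCount=2)

# Block until the stream is ACTIVE before writing to it.
waiter = kinesis.get_waiter("stream_exists")
waiter.wait(StreamName="my-stream")
```

The waiter polls DescribeStream on your behalf, so producers do not race the CREATING-to-ACTIVE transition.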
The throughput of an Amazon Kinesis data stream is determined by the number of shards within the data stream. Follow the steps below to estimate the initial number of shards your data stream needs. Note that you can dynamically adjust the number of shards within your data stream via resharding.
Q: How do I decide the throughput of my Amazon Kinesis data stream?
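One common sizing calculation, sketched here as an illustrative helper (the function name is not part of any AWS SDK), derives the initial shard count from the per-shard limits stated elsewhere in this FAQ: 1 MB/sec (1,000 KB/sec) of input and 2 MB/sec (2,000 KB/sec) of output per shard:

```python
import math

def estimate_initial_shards(avg_record_kb, records_per_sec, num_consumers):
    """Illustrative sizing helper based on per-shard limits:
    1000 KB/sec input and 2000 KB/sec output."""
    incoming_kb_per_sec = avg_record_kb * records_per_sec
    outgoing_kb_per_sec = incoming_kb_per_sec * num_consumers
    return max(
        math.ceil(incoming_kb_per_sec / 1000),  # write-bound shard count
        math.ceil(outgoing_kb_per_sec / 2000),  # read-bound shard count
        1,                                      # always at least one shard
    )

# 4 KB records at 500 records/sec, read by 3 consumers:
# 2000 KB/sec in (2 shards), 6000 KB/sec out (3 shards)
print(estimate_initial_shards(4, 500, 3))  # -> 3
```

Because you can reshard dynamically, this estimate only needs to be roughly right at launch.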
The throughput of an Amazon Kinesis data stream scales by unit of shard. A single shard provides the smallest throughput of a data stream: 1 MB/sec of data input and 2 MB/sec of data output.
Q: What is the minimum throughput I can request for my Amazon Kinesis data stream?
The throughput of an Amazon Kinesis data stream is designed to scale without limits. By default, each account can provision 10 shards per region. You can use the Amazon Kinesis Data Streams Limits form to request more than 10 shards within a single region.
Q: What is the maximum throughput I can request for my Amazon Kinesis data stream?
A shard provides a 1 MB/sec data input rate and supports up to 1,000 PUT records per second. Therefore, if the record size is less than 1 KB, the actual data input rate of a shard will be less than 1 MB/sec, limited by the maximum number of PUT records per second.
Q: How can record size affect the throughput of my Amazon Kinesis data stream?
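The interaction of the two per-shard caps above can be made concrete with a small arithmetic sketch (the helper name is illustrative, not an AWS API):

```python
def effective_input_kb_per_sec(record_kb):
    """Per-shard input rate: capped at 1000 KB/sec of bandwidth AND
    1000 records/sec, whichever limit is hit first (illustrative)."""
    return min(1000.0, 1000 * record_kb)

# 0.5 KB records: 1000 records/sec * 0.5 KB = 500 KB/sec (record-count bound)
print(effective_input_kb_per_sec(0.5))  # -> 500.0
# 2 KB records: capped by the 1 MB/sec (1000 KB/sec) bandwidth limit
print(effective_input_kb_per_sec(2))    # -> 1000.0
```

For small records, batching multiple logical messages into one PUT record (as the KPL does) recovers the lost bandwidth.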
You can add data to an Amazon Kinesis data stream via PutRecord and PutRecords operations, Amazon Kinesis Producer Library (KPL), or Amazon Kinesis Agent.
Q: How do I add data to my Amazon Kinesis data stream?
The PutRecord operation allows a single data record within an API call, and the PutRecords operation allows multiple data records within an API call. For more information about the PutRecord and PutRecords operations, see PutRecord and PutRecords.
Q: What is the difference between PutRecord and PutRecords?
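A sketch of both operations using the boto3 SDK (the stream name, payloads, and partition keys are hypothetical; configured AWS credentials are assumed):

```python
import boto3

kinesis = boto3.client("kinesis")  # assumes configured credentials/region
stream = "my-stream"               # hypothetical stream name

# PutRecord: one record per API call.
kinesis.put_record(
    StreamName=stream, Data=b'{"event": 1}', PartitionKey="user-42"
)

# PutRecords: several records batched into a single API call.
resp = kinesis.put_records(
    StreamName=stream,
    Records=[
        {"Data": b'{"event": 2}', "PartitionKey": "user-42"},
        {"Data": b'{"event": 3}', "PartitionKey": "user-7"},
    ],
)
# PutRecords can partially fail; inspect FailedRecordCount and
# retry only the failed entries.
print(resp["FailedRecordCount"])
```

Batching with PutRecords reduces per-request overhead, but the partial-failure check is essential because successful records in a failed batch are not rolled back.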
The Amazon Kinesis Producer Library (KPL) is an easy-to-use, highly configurable library that helps you put data into an Amazon Kinesis data stream. KPL presents a simple, asynchronous, and reliable interface that enables you to quickly achieve high producer throughput with minimal client resources.
Q: What is Amazon Kinesis Producer Library (KPL)?
Amazon Kinesis API is available in Amazon Web Services SDKs. For a list of programming languages or platforms for Amazon Web Services SDKs, see Tools for Amazon Web Services.
Q: What programming languages or platforms can I use to access Amazon Kinesis API?
The core of the Amazon Kinesis Producer Library (KPL) is built as a C++ module and can be compiled to work on any platform with a recent C++ compiler. The library is currently available with a Java interface. We are looking to add support for other programming languages.
Q: What programming language is Amazon Kinesis Producer Library (KPL) available in?
Amazon Kinesis Agent currently supports Amazon Linux and Red Hat Enterprise Linux.
Q: What platforms does Amazon Kinesis Agent support?
You can download and install Amazon Kinesis Agent using the following command and link:
Q: Where do I get Amazon Kinesis Agent?
After installing Amazon Kinesis Agent on your servers, you configure it to monitor certain files on the disk and then continuously send new data to your Amazon Kinesis data stream. For more information, see Writing with Agents.
Q: How do I use Amazon Kinesis Agent?
The capacity limits of an Amazon Kinesis data stream are defined by the number of shards within the data stream. The limits can be exceeded by either data throughput or the number of PUT records. While the capacity limits are exceeded, the put data call will be rejected with a ProvisionedThroughputExceeded exception. If this is due to a temporary rise of the data stream’s input data rate, retries by the data producer will eventually lead to completion of the requests. If this is due to a sustained rise of the data stream’s input data rate, you should increase the number of shards within your data stream to provide enough capacity for the put data calls to consistently succeed. In both cases, Amazon CloudWatch metrics let you track the change in the data stream’s input data rate and the occurrence of ProvisionedThroughputExceeded exceptions.
Q: What happens if the capacity limits of an Amazon Kinesis data stream are exceeded while the data producer adds data to the data stream?
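The retry-until-completion behavior described above can be sketched as an exponential-backoff wrapper. This is an illustrative pattern, not an AWS API: `put_fn` stands in for any throttled call (in real code it would wrap `kinesis.put_record`), and the stub below simulates two throttled attempts before success:

```python
import time

def put_with_retries(put_fn, record, max_attempts=5, base_delay=0.1):
    """Retry a put call with exponential backoff when throughput is
    exceeded (illustrative; put_fn wraps the real PutRecord call)."""
    for attempt in range(max_attempts):
        try:
            return put_fn(record)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # sustained throttling: surface the error
            time.sleep(base_delay * (2 ** attempt))  # back off, then retry

# Demo with a stub producer that is throttled twice, then succeeds.
calls = {"n": 0}
def flaky_put(record):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("ProvisionedThroughputExceeded")
    return "ok"

print(put_with_retries(flaky_put, b"data"))  # -> ok, on the third attempt
```

If retries keep exhausting, that is the signal (per the answer above) to add shards rather than retry harder.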
Your data blob, partition key, and data stream name are required parameters of a PutRecord or PutRecords call. The size of your data blob (before Base64 encoding) and partition key will be counted against the data throughput of your Amazon Kinesis data stream, which is determined by the number of shards within the data stream.
Q: What data is counted against the data throughput of an Amazon Kinesis data stream during a PutRecord or PutRecords call?
Consumers use enhanced fan-out by retrieving data with the SubscribeToShard API. The name of the registered consumer is passed in the SubscribeToShard call, which directs Kinesis to deliver data with the enhanced fan-out benefit provided to that registered consumer.
Q: How is enhanced fan-out utilized by a consumer?
Yes, you can have multiple consumers using enhanced fan-out and others not using enhanced fan-out at the same time. The use of enhanced fan-out does not impact the limits of shards for traditional GetRecords usage.
Q: Can I have consumers using enhanced fan-out, and others not?
There is a default limit of 20 consumers using enhanced fan-out per data stream. If you need more than 20, please submit a limit increase request through AWS Support. Keep in mind that you can have more than 20 total consumers reading from a stream by having 20 consumers using enhanced fan-out and other consumers not using enhanced fan-out at the same time.
Q: Is there a limit on the number of consumers using enhanced fan-out on a given stream?
We recommend using KCL 2.x, which will automatically register your consumer and use both enhanced fan-out and the HTTP/2 SubscribeToShard API. Otherwise, you can manually register a consumer using the RegisterStreamConsumer API and then you can use the SubscribeToShard API with the name of the consumer you registered.
Q: How do consumers register to use enhanced fan-out and the HTTP/2 SubscribeToShard API?
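The manual RegisterStreamConsumer-then-SubscribeToShard path can be sketched with boto3 (the stream ARN, consumer name, and shard ID are hypothetical placeholders):

```python
import boto3

kinesis = boto3.client("kinesis")
# Hypothetical stream ARN and consumer name.
stream_arn = "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream"

# Registering a named consumer enables enhanced fan-out for it.
consumer = kinesis.register_stream_consumer(
    StreamARN=stream_arn, ConsumerName="my-consumer"
)["Consumer"]

# SubscribeToShard opens an HTTP/2 push connection
# (lasting up to 5 minutes, after which you re-subscribe).
events = kinesis.subscribe_to_shard(
    ConsumerARN=consumer["ConsumerARN"],
    ShardId="shardId-000000000000",
    StartingPosition={"Type": "LATEST"},
)["EventStream"]
for event in events:
    for record in event["SubscribeToShardEvent"]["Records"]:
        print(record["Data"])
```

In practice KCL 2.x handles the registration, re-subscription, and shard discovery for you, which is why it is the recommended path.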
Yes, there is an on-demand hourly cost for every combination of shard in a stream and consumer (a consumer-shard hour) registered to use enhanced fan-out, in addition to a data retrieval cost for every GB retrieved. See the Kinesis Data Streams pricing page for more details.
Q: Is there a cost associated with the use of enhanced fan-out?
Yes, you will only pay for the prorated portion of the hour the consumer was registered to use enhanced fan-out.
Q: Does consumer-shard hour billing for enhanced fan-out automatically prorate if I terminate or start a consumer within the hour?
You pay a low per GB rate that is metered per byte of data retrieved by consumers using enhanced fan-out. There is no payload roundup or delivery minimum.
Q: How does billing for enhanced fan-out data retrievals work?
No, enhanced fan-out can be activated without impacting data producers or data streams.
Q: Do I need to change my producers or my data stream to use enhanced fan-out?
An Amazon Kinesis Application is a data consumer that reads and processes data from an Amazon Kinesis data stream. You can build your applications using Amazon Kinesis Data Analytics, the Amazon Kinesis API, or the Amazon Kinesis Client Library (KCL).
Q: What is an Amazon Kinesis Application?
Amazon Kinesis Client Library (KCL) for Java | Python | Ruby | Node.js | .NET is a pre-built library that helps you easily build Amazon Kinesis Applications for reading and processing data from an Amazon Kinesis data stream.
Q: What is Amazon Kinesis Client Library (KCL)?
Visit the Kinesis Data Streams user documentation to learn how to upgrade from KCL 1.x to KCL 2.x.
Q: How do I upgrade from KCL 1.x to 2.x to use SubscribeToShard and enhanced fan-out?
No, SubscribeToShard requires the use of enhanced fan-out, which means you also need to register your consumer with the Kinesis Data Streams service before you can use SubscribeToShard.
Q: Can I use SubscribeToShard without using enhanced fan-out?
The persistent connection can last up to 5 minutes.
Q: How long does the SubscribeToShard persistent connection last?
Yes, version 2.x of the KCL uses SubscribeToShard and enhanced fan-out to retrieve data with high performance from a Kinesis data stream.
Q: Does the Kinesis Client Library (KCL) support SubscribeToShard?
No, there is no additional cost associated with SubscribeToShard. However, you must use SubscribeToShard with enhanced fan-out, which does have an additional hourly cost for each consumer-shard combination and per GB of data delivered by enhanced fan-out.
Q: Is there a cost associated with using SubscribeToShard?
Yes, to use SubscribeToShard you need to register your consumers, and registration activates enhanced fan-out. By default, your consumer will utilize enhanced fan-out automatically when data is retrieved via SubscribeToShard.
Q: Do I need to use enhanced fan-out if I want to use SubscribeToShard?
Amazon Kinesis Client Library (KCL) is currently available in Java, Python, Ruby, Node.js, and .NET. Amazon Kinesis Connector Library and Amazon Kinesis Storm Spout are currently available in Java. We are looking to add support for other programming languages.
Q: What programming language are Amazon Kinesis Client Library (KCL), Amazon Kinesis Connector Library, and Amazon Kinesis Storm Spout available in?
No, you can also use Amazon Kinesis API to build your Amazon Kinesis Application. However, we recommend using Amazon Kinesis Client Library (KCL) for Java | Python | Ruby | Node.js | .NET if applicable because it performs heavy-lifting tasks associated with distributed stream processing, making it more productive to develop applications.
Q: Do I have to use Amazon Kinesis Client Library (KCL) for my Amazon Kinesis Application?
Amazon Kinesis Client Library (KCL) for Java | Python | Ruby | Node.js | .NET acts as an intermediary between Amazon Kinesis Data Streams and your Amazon Kinesis Application. KCL uses the IRecordProcessor interface to communicate with your application. Your application implements this interface, and KCL calls into your application code using the methods in this interface.
Q: How does Amazon Kinesis Client Library (KCL) interact with an Amazon Kinesis Application?
An Amazon Kinesis Application can have multiple application instances and a worker is the processing unit that maps to each application instance. A record processor is the processing unit that processes data from a shard of an Amazon Kinesis data stream. One worker maps to one or more record processors. One record processor maps to one shard and processes records from that shard.
Q: What is a worker and a record processor generated by Amazon Kinesis Client Library (KCL)?
Amazon Kinesis Client Library (KCL) for Java | Python | Ruby | Node.js | .NET automatically creates an Amazon DynamoDB table for each Amazon Kinesis Application to track and maintain state information such as resharding events and sequence number checkpoints. The DynamoDB table shares the same name as the application, so you need to make sure your application name doesn’t conflict with any existing DynamoDB tables under the same account within the same region.
Q: How does Amazon Kinesis Client Library (KCL) keep track of data records being processed by an Amazon Kinesis Application?
You can create multiple instances of your Amazon Kinesis Application and have these application instances run across a set of Amazon EC2 instances that are part of an Auto Scaling group. As processing demand increases, an Amazon EC2 instance running your application instance is automatically instantiated. Amazon Kinesis Client Library (KCL) for Java | Python | Ruby | Node.js | .NET will generate a worker for the new instance and automatically move record processors from overloaded existing instances to the new instance.
Q: How can I automatically scale up the processing capacity of my Amazon Kinesis Application using Amazon Kinesis Client Library (KCL)?
One possible reason is that there is no record at the position specified by the current shard iterator. This can happen even if you are using TRIM_HORIZON as the shard iterator type. An Amazon Kinesis data stream represents a continuous stream of data. You should call the GetRecords operation in a loop, and the record will be returned when the shard iterator advances to the position where the record is stored.
Q: Why does GetRecords call return empty result while there is data within my Amazon Kinesis data stream?
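The GetRecords loop described above can be sketched with boto3 (the stream name is hypothetical, and only the first shard is read for brevity; a real consumer iterates over every shard or uses KCL):

```python
import time
import boto3

kinesis = boto3.client("kinesis")
stream = "my-stream"  # hypothetical stream name

# Read from the first shard only, for illustration.
shard_id = kinesis.describe_stream(StreamName=stream)[
    "StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
)["ShardIterator"]

# GetRecords may legitimately return an empty result; keep looping
# with NextShardIterator until the iterator reaches stored records.
while iterator:
    resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in resp["Records"]:
        print(record["ApproximateArrivalTimestamp"], record["Data"])
    iterator = resp.get("NextShardIterator")
    time.sleep(1)  # avoid tight-looping against the shard's read limits
```

The sleep matters: each shard supports a bounded number of read transactions per second, so an unthrottled loop invites ProvisionedThroughputExceeded exceptions.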
Each record includes a value called ApproximateArrivalTimestamp. It is set when the record is successfully received and stored by Amazon Kinesis. This timestamp has millisecond precision and there are no guarantees about the timestamp accuracy. For example, records in a shard or across a data stream might have timestamps that are out of order.
Q: What is ApproximateArrivalTimestamp returned in GetRecords operation?
The capacity limits of an Amazon Kinesis data stream are defined by the number of shards within the data stream. The limits can be exceeded by either data throughput or the number of read data calls. While the capacity limits are exceeded, the read data call will be rejected with a ProvisionedThroughputExceeded exception. If this is due to a temporary rise of the data stream’s output data rate, retries by the Amazon Kinesis Application will eventually lead to completion of the requests. If this is due to a sustained rise of the data stream’s output data rate, you should increase the number of shards within your data stream to provide enough capacity for the read data calls to consistently succeed. In both cases, Amazon CloudWatch metrics let you track the change in the data stream’s output data rate and the occurrence of ProvisionedThroughputExceeded exceptions.
Q: What happens if the capacity limits of an Amazon Kinesis data stream are exceeded while Amazon Kinesis Application reads data from the data stream?
There are two ways to change the throughput of your data stream. You can scale the number of shards in a data stream with a single call to the UpdateShardCount API or from the AWS Management Console, or you can adjust the number of shards yourself by splitting and merging individual shards (resharding).
Q: How do I change the throughput of my Amazon Kinesis data stream?
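The single-call scaling path can be sketched with boto3 (the stream name and target count are hypothetical):

```python
import boto3

kinesis = boto3.client("kinesis")

# Scale the stream to 4 shards; UNIFORM_SCALING has Kinesis
# perform the underlying shard splits/merges evenly on your behalf.
kinesis.update_shard_count(
    StreamName="my-stream",  # hypothetical stream name
    TargetShardCount=4,
    ScalingType="UNIFORM_SCALING",
)
```

The stream stays readable and writable while the scaling operation runs, as noted below.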
Typical scaling requests should take a few minutes to complete. Larger scaling requests will take longer than smaller ones.
Q: How long does it take to change the throughput of my Amazon Kinesis data stream using UpdateShardCount or the AWS Management Console?
For information about limitations of UpdateShardCount, see the Amazon Kinesis Data Streams Service API Reference.
Q: What are the limitations of UpdateShardCount?
Yes. You can continue adding data to and reading data from your Amazon Kinesis data stream while you use UpdateShardCount or reshard to change the throughput of the data stream.
Q: Does Amazon Kinesis Data Streams remain available when I change the throughput of my Amazon Kinesis data stream using UpdateShardCount or via resharding?
A resharding operation such as shard split or shard merge takes a few seconds. You can only perform one resharding operation at a time. Therefore, for an Amazon Kinesis data stream with only one shard, it takes a few seconds to double the throughput by splitting one shard. For a data stream with 1,000 shards, it takes 30,000 seconds (8.3 hours) to double the throughput by splitting 1,000 shards. We recommend increasing the throughput of your data stream ahead of the time when extra throughput is needed.
Q: How often can I and how long does it take to change the throughput of my Amazon Kinesis data stream by resharding it?
Amazon Kinesis stores your data for up to 24 hours by default. You can raise the data retention period to up to 7 days by enabling extended data retention.
Q: How do I change the data retention period of my Amazon Kinesis data stream?
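Enabling extended retention can be sketched with boto3 (the stream name is hypothetical):

```python
import boto3

kinesis = boto3.client("kinesis")

# Extend retention from the 24-hour default up to 7 days (168 hours).
kinesis.increase_stream_retention_period(
    StreamName="my-stream",     # hypothetical stream name
    RetentionPeriodHours=168,
)
# DecreaseStreamRetentionPeriod lowers it again (minimum 24 hours).
```

Records already in the stream become subject to the new retention window; extended retention is billed separately.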
Amazon Kinesis Data Streams Management Console displays key operational and performance metrics such as throughput of data input and output of your Amazon Kinesis data streams. Amazon Kinesis Data Streams also integrates with Amazon CloudWatch so that you can collect, view, and analyze CloudWatch metrics for your data streams and shards within those data streams. For more information about Amazon Kinesis Data Streams metrics, see Monitoring Amazon Kinesis Data Streams with Amazon CloudWatch.
Q: How do I monitor the operations and performance of my Amazon Kinesis data stream?
Amazon Kinesis Data Streams integrates with AWS Identity and Access Management (IAM), a service that enables you to securely control access to your AWS services and resources for your users. For example, you can create a policy that only allows a specific user or group to add data to your Amazon Kinesis data stream. For more information about access management and control of your data stream, see Controlling Access to Amazon Kinesis Data Streams Resources using IAM.
Q: How do I manage and control access to my Amazon Kinesis data stream?
Amazon Kinesis integrates with Amazon CloudTrail, a service that records AWS API calls for your account and delivers log files to you. For more information about API call logging and a list of supported Amazon Kinesis API operations, see Logging Amazon Kinesis API calls Using Amazon CloudTrail.
Q: How do I log API calls made to my Amazon Kinesis data stream for security analysis and operational troubleshooting?
Amazon Kinesis Data Streams allows you to tag your Amazon Kinesis data streams for easier resource and cost management. A tag is a user-defined label expressed as a key-value pair that helps organize AWS resources. For example, you can tag your data streams by cost centers so that you can categorize and track your Amazon Kinesis Data Streams costs based on cost centers. For more information about Amazon Kinesis Data Streams tagging, see Tagging Your Amazon Kinesis Data Streams.
Q: How do I effectively manage my Amazon Kinesis data streams and the costs associated with these data streams?
You can understand how you’re utilizing your shard limit for an account using the DescribeLimits API. The DescribeLimits API will return the shard limit and the number of open shards in your account. If you need to raise your shard limit, please request a limit increase.
Q: How can I describe how I’m utilizing my shard limit?
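A sketch of the DescribeLimits call with boto3 (configured AWS credentials are assumed):

```python
import boto3

kinesis = boto3.client("kinesis")

# DescribeLimits takes no parameters; it reports account-level
# shard usage for the current region.
limits = kinesis.describe_limits()
print("Shard limit:", limits["ShardLimit"])
print("Open shards:", limits["OpenShardCount"])
print("Headroom:   ", limits["ShardLimit"] - limits["OpenShardCount"])
```

Polling this before a large UpdateShardCount call avoids scaling requests that would fail against the account limit.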
Yes, you can privately access Kinesis Data Streams APIs from your Amazon Virtual Private Cloud (VPC) by creating VPC Endpoints. With VPC Endpoints, the routing between the VPC and Kinesis Data Streams is handled by the AWS network without the need for an Internet gateway, NAT gateway, or VPN connection. The latest generation of VPC Endpoints used by Kinesis Data Streams are powered by AWS PrivateLink, a technology that enables private connectivity between AWS services using Elastic Network Interfaces (ENI) with private IPs in your VPCs. To learn more about PrivateLink, visit the PrivateLink documentation.
Q: Can I privately access Kinesis Data Streams APIs from my Amazon Virtual Private Cloud (VPC) without using public IPs?
Yes, and there are two options for encrypting the data you put into a Kinesis data stream. You can use server-side encryption, which is a fully managed feature that automatically encrypts and decrypts data as you put and get it from a data stream. Or you can write encrypted data to a data stream by encrypting and decrypting on the client side.
Q: Can I encrypt the data I put into a Kinesis data stream?
Customers often choose server-side encryption over client-side encryption for one of the following reasons:
Q: Why should I use server-side encryption instead of client-side encryption?
Yes, there is a getting started guide in the user documentation.
Q: Is there a server-side encryption getting started guide?
Possibly. This depends on the key you use for encryption and the permissions governing access to the key.
Q: Does server-side encryption interfere with how my applications interact with Kinesis Data Streams?
Yes, however if you are using the AWS-managed CMK for Kinesis and are not exceeding the free tier KMS API usage costs, then your use of server-side encryption is free. The following describes the costs by resource:
Q: Is there an additional cost associated with the use of server-side encryption?
Kinesis Data Streams server-side encryption is available in the AWS GovCloud region and all public regions except the China (Beijing) region.
Q: Which AWS regions offer server-side encryption for Kinesis Data Streams?
All of these operations can be completed using the AWS management console or using the AWS SDK. To learn more, see the Kinesis Data Streams server-side encryption getting started guide.
Q: How do I start, update, or remove server-side encryption from a data stream?
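Starting and removing server-side encryption can be sketched with boto3 (the stream name is hypothetical; the AWS-managed CMK alias `alias/aws/kinesis` is used here, but you can supply your own KMS key):

```python
import boto3

kinesis = boto3.client("kinesis")
stream = "my-stream"  # hypothetical stream name

# Start server-side encryption with the AWS-managed CMK for Kinesis.
kinesis.start_stream_encryption(
    StreamName=stream, EncryptionType="KMS", KeyId="alias/aws/kinesis"
)

# Remove server-side encryption; new records are then written
# unencrypted, while already-encrypted records stay encrypted.
kinesis.stop_stream_encryption(
    StreamName=stream, EncryptionType="KMS", KeyId="alias/aws/kinesis"
)
```

To update the key rather than remove encryption, call StartStreamEncryption again with the new KeyId.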
Kinesis Data Streams uses an AES-GCM 256 algorithm for encryption.
Q: What encryption algorithm is used for server-side encryption?
No, only new data written into the data stream will be encrypted (or left decrypted) by the new application of encryption.
Q: If I encrypt a data stream that already has data written to it, either in plain text or ciphertext, will all of the data in the data stream be encrypted or decrypted if I update encryption?
Server-side encryption encrypts the payload of the message along with the partition key, which is specified by the data stream producer applications.
Q: What does server-side encryption for Kinesis Data Streams encrypt?
Server-side encryption is a stream specific feature.
Q: Is server-side encryption a shard specific feature or a stream specific feature?
Yes, using the AWS management console or the AWS SDK you can choose a new master key to apply to a specific data stream.
Q: Can I change the CMK that is used to encrypt a specific data stream?
The following walks you through how Kinesis Data Streams uses AWS KMS CMKs to encrypt a message before it is stored in the PUT path, and to decrypt it after it is retrieved in the GET path. Kinesis and AWS KMS perform the following actions (including decryption) when you call putRecord(s) or getRecords on a data stream with server-side encryption enabled.
Q: Can you walk me through the encryption lifecycle of my data from the point in time when I send it to a Kinesis data stream with server-side encryption enabled, and when I retrieve it?
No. Amazon Kinesis Data Streams is not currently available in the AWS Free Tier. The AWS Free Tier is a program that offers free trials for a group of AWS services. For more details about the AWS Free Tier, see AWS Free Tier.
Q: Is Amazon Kinesis Data Streams available in AWS Free Tier?
Amazon Kinesis Data Streams uses simple pay-as-you-go pricing. There are no upfront costs or minimum fees, and you only pay for the resources you use. The cost of Amazon Kinesis Data Streams has two core dimensions and three optional dimensions:
Q: How much does Amazon Kinesis Data Streams cost?
The PUT Payload Unit charge is calculated based on the number of 25 KB payload units added to your Amazon Kinesis data stream. The PUT Payload Unit cost is the same whether you use the PutRecords operation or the PutRecord operation.
Q: Does my PUT Payload Unit cost change by using PutRecords operation instead of PutRecord operation?
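The 25 KB rounding can be made concrete with a small sketch (the helper name is illustrative, not an AWS API):

```python
import math

def put_payload_units(record_bytes):
    """Number of 25 KB PUT Payload Units consumed by one record
    (illustrative; each record rounds up to the next 25 KB unit)."""
    return max(1, math.ceil(record_bytes / (25 * 1024)))

print(put_payload_units(1 * 1024))   # 1 KB record  -> 1 unit
print(put_payload_units(40 * 1024))  # 40 KB record -> 2 units
```

Because each record rounds up independently, many tiny records cost more payload units than the same bytes batched into fewer, larger records.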
A shard could be in "CLOSED" state after resharding. You will not be charged for shards in "CLOSED" state.
Q: Am I charged for shards in "CLOSED" state?
If you use Amazon EC2 for running your Amazon Kinesis Applications, you will be charged for Amazon EC2 resources in addition to Amazon Kinesis Data Streams costs.
Q: Other than Amazon Kinesis Data Streams costs, are there any other costs that I might incur from my Amazon Kinesis Data Streams usage?
Our Amazon Kinesis Data Streams SLA guarantees a Monthly Uptime Percentage of at least 99.9% for Amazon Kinesis Data Streams.
Q: What does the Amazon Kinesis Data Streams SLA guarantee?
You are eligible for an SLA credit for Amazon Kinesis Data Streams under the Amazon Kinesis Data Streams SLA if more than one Availability Zone in which you are running a task within the same region has a Monthly Uptime Percentage of less than 99.9% during any monthly billing cycle.
Q: How do I know if I qualify for a SLA Service Credit?