Overview of Amazon Kinesis & Kinesis Firehose

Christian Talavera
2 min read · May 17, 2021

What is Data Streaming?

‘Data streaming’ is the continuous, real-time flow of data generated by one or more sources. Data streams can be sent to a centralized location for ‘data ingestion’.

‘Data ingestion’ is the process of moving a data stream to a destination where it is stored and analyzed. The results can be kept as analytic data for later use.

What is Amazon Kinesis?

Amazon Kinesis is a scalable ‘data streaming’ service for ingesting data. Thousands of data sources, running on many different platforms and services, can ‘stream’ data to Kinesis in real time, where it is made available for analysis.

A real-world example: a mobile app with millions of users tracks its users’ touch inputs and sends them to Kinesis as a stream. The stream can then be analysed for useful information, such as the most frequently used areas of the screen.

Architecture of Using Amazon Kinesis

Producers — any device or computing instance that uses the Kinesis APIs to write data to the stream; this could be servers, EC2 instances, IoT devices, mobile devices, etc. The records that producers send make up the data stream.

Amazon Kinesis Service — this hosts the ‘data stream’ itself; it ‘ingests’ data from ‘producers’ and retains it for 24 hours by default, so that ‘consumers’ can read it during that window.

Shard — the capacity unit of a ‘data stream’, which allows the stream to scale; by default each stream starts with 1 shard, shared by all attached ‘producers’ and ‘consumers’. Each shard allows 1 MB per second of ‘ingestion’ and 2 MB per second of ‘consumption’. Shards can be added and removed as needed to control scaling.

Consumers — any device or computing instance that is able to use Kinesis APIs to consume data streams.
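To make the producer and consumer roles more concrete, here is a minimal sketch of both sides in Python. It assumes the boto3 Kinesis client, passed in as `client`; the stream name, shard ID, and event fields are hypothetical. Note that `put_record` also requires a partition key, which this overview does not cover; the sketch simply uses a user ID for it.

```python
import json


def send_event(client, stream_name, user_id, event):
    """Producer side: write one JSON-encoded event to the stream."""
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        # Assumption: the user ID is used as the partition key
        PartitionKey=str(user_id),
    )


def read_events(client, stream_name, shard_id, limit=100):
    """Consumer side: read one batch, starting at the oldest retained record."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    response = client.get_records(ShardIterator=iterator, Limit=limit)
    return [json.loads(record["Data"]) for record in response["Records"]]


# With AWS credentials configured, usage might look like this
# (hypothetical stream name and event):
#   import boto3
#   client = boto3.client("kinesis")
#   send_event(client, "touch-events", 42, {"x": 120, "y": 300})
```

Passing the client in as an argument keeps the functions easy to try out against a stub before pointing them at a real stream.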

Creating a Data Stream

This can be done at AWS Console > Services > Amazon Kinesis > Data Streams > Create Data Stream.

Data stream configuration:

  • Data stream name — the name assigned to the created data stream

A ‘shard estimator’ is available to properly estimate needed capacity:

  • Writing to the stream:
    Average record size in KB — default is 1024
    Max records written per second — default is 1
  • Reading from the stream:
    Total number of consumers — default is 1
  • Estimated number of open shards — default is 1
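The arithmetic behind an estimate like this can be sketched in a few lines of Python. This is not the console's actual implementation, only an approximation built on the per-shard limits of 1 MB per second in and 2 MB per second out. It also assumes two things not stated above: a per-shard write limit of 1,000 records per second, and that all consumers share a shard's read capacity. The function and parameter names are made up for illustration.

```python
import math

WRITE_KB_PER_SHARD = 1024       # 1 MB per second of ingestion per shard
READ_KB_PER_SHARD = 2048        # 2 MB per second of consumption per shard
WRITE_RECORDS_PER_SHARD = 1000  # assumption: per-shard record limit for writes


def estimate_shards(avg_record_kb, records_per_second, consumers=1):
    """Estimate how many open shards a stream needs (rough sketch)."""
    write_kb = avg_record_kb * records_per_second
    # Assumption: every consumer reads the full stream from shared capacity
    read_kb = write_kb * consumers
    needed = max(
        write_kb / WRITE_KB_PER_SHARD,
        read_kb / READ_KB_PER_SHARD,
        records_per_second / WRITE_RECORDS_PER_SHARD,
    )
    return max(1, math.ceil(needed))


print(estimate_shards(1024, 1, 1))   # the default values listed above
print(estimate_shards(2, 5000, 3))   # a busier, hypothetical workload
```

With the default inputs the sketch comes out at 1 shard, which matches the estimator's default.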

Data stream capacity:

  • Number of open shards — how many shards are applied to the stream
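The same two settings, stream name and number of open shards, can also be supplied through the API instead of the console. The sketch below assumes the boto3 Kinesis client, passed in as `client`; the function name and the stream name in the usage comment are hypothetical.

```python
def create_data_stream(client, name, open_shards=1):
    """Create a stream and return its open shard count once it exists."""
    client.create_stream(StreamName=name, ShardCount=open_shards)
    # Stream creation is asynchronous, so wait until the stream is available
    client.get_waiter("stream_exists").wait(StreamName=name)
    summary = client.describe_stream_summary(StreamName=name)
    return summary["StreamDescriptionSummary"]["OpenShardCount"]


# With AWS credentials configured, usage might look like this:
#   import boto3
#   client = boto3.client("kinesis")
#   create_data_stream(client, "touch-events", open_shards=1)
```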

What is Amazon Kinesis Firehose?

Amazon Kinesis Firehose is a service that takes data streams and, instead of handing the data to a ‘consumer’, delivers it to persistent storage in Amazon S3 or Amazon Redshift.

In addition to storage, Firehose can also deliver data to third-party services like Elasticsearch or Splunk.
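From the sending side, writing to Firehose looks much like writing to a data stream, except that there is no consumer to write: Firehose handles delivery to the configured destination. A minimal sketch, assuming the boto3 Firehose client is passed in as `firehose_client`; the delivery stream name and event are hypothetical.

```python
import json


def deliver_event(firehose_client, delivery_stream_name, event):
    """Send one JSON-encoded event to a Firehose delivery stream."""
    # A trailing newline keeps records separable in the delivered output
    payload = json.dumps(event) + "\n"
    return firehose_client.put_record(
        DeliveryStreamName=delivery_stream_name,
        Record={"Data": payload.encode("utf-8")},
    )


# With AWS credentials configured, usage might look like this:
#   import boto3
#   firehose_client = boto3.client("firehose")
#   deliver_event(firehose_client, "touch-events-to-s3", {"x": 120, "y": 300})
```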
