Overview of Amazon Kinesis & Kinesis Firehose

Christian Talavera
May 17, 2021

What is Data Streaming?

‘Data streaming’ is the continuous, real-time flow of data generated by one or more sources. Data streams can be moved to a centralized location for ‘data ingestion’.

‘Data ingestion’ is the process of moving a data stream to a destination where it is stored and analyzed. The results of that analysis can be kept for later use.

What is Amazon Kinesis?

Amazon Kinesis is a scalable ‘data streaming’ service for ingesting data. Thousands of data sources, running on many different platforms and services, can ‘stream’ data to Kinesis in real time, and that data can then be sent to a location for analysis.

A real-world example would be a mobile app with millions of users that tracks its users' touch inputs and sends them to Kinesis as a stream. The stream can then be analyzed for useful information, such as the most frequently used areas of the screen.

Architecture of Using Amazon Kinesis

Producers — any device or computing instance that can use the Kinesis APIs to send data; this could be servers, EC2 instances, IoT devices, mobile devices, etc. Producers supply the data that makes up the data stream.

Amazon Kinesis Service — this hosts the ‘data stream’; it ‘ingests’ data from ‘producers’ and retains it for 24 hours by default, during which time ‘consumers’ can read it.

Shard — the capacity unit that allows the ‘data stream’ to scale; by default each stream starts with 1 shard, which is shared by all attached ‘producers’ and ‘consumers’. Each shard allows 1 MB per second of ‘ingestion’ and 2 MB per second of ‘consumption’. Shards can be added and removed as needed to control scaling.

Consumers — any device or computing instance that can use the Kinesis APIs to read data from the stream.
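
The producer and consumer roles above can be sketched against the Kinesis client that boto3 provides. This is a minimal sketch under stated assumptions, not a definitive implementation: the helper names `send_event` and `read_events`, the stream name and the event fields are all hypothetical. Both functions take the client as a parameter; in practice you would pass `boto3.client("kinesis")`.

```python
import json


def send_event(client, stream_name, event, partition_key):
    """Producer side: write one JSON-encoded record to the stream."""
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,  # determines which shard receives the record
    )


def read_events(client, stream_name, shard_id, limit=100):
    """Consumer side: read one batch of records from a single shard."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest retained record
    )["ShardIterator"]
    response = client.get_records(ShardIterator=iterator, Limit=limit)
    # Each record's Data field holds the raw bytes the producer wrote
    return [json.loads(record["Data"]) for record in response["Records"]]
```

A long-running consumer would keep calling `get_records` with the `NextShardIterator` value from each response; the sketch reads a single batch to keep the two roles easy to compare.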

Creating a Data Stream

This can be done in the AWS Console under Services > Amazon Kinesis > Data Streams > Create Data Stream.

Data stream configuration:

  • Data stream name — the name assigned to the created data stream

A ‘shard estimator’ is available to estimate the needed capacity:

  • Writing to the stream:
    Average record size in KB — default is 1024
    Max records written per second — default is 1
  • Reading from the stream:
    Total number of consumers — default is 1
  • Estimated number of open shards — default is 1
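
The arithmetic behind such an estimate can be sketched from the per-shard limits mentioned earlier (1 MB per second in, 2 MB per second out). This is a rough sketch, not the console's actual estimator: the function name `estimate_shards` is hypothetical, and it assumes every consumer reads every record.

```python
import math


def estimate_shards(avg_record_kb, records_per_second, consumers=1):
    """Estimate open shards from write and read throughput in KB per second."""
    write_kb_per_second = avg_record_kb * records_per_second
    # Assumption: each consumer reads the full stream
    read_kb_per_second = write_kb_per_second * consumers
    shards_for_writes = math.ceil(write_kb_per_second / 1024)  # 1 MB/s in per shard
    shards_for_reads = math.ceil(read_kb_per_second / 2048)    # 2 MB/s out per shard
    return max(1, shards_for_writes, shards_for_reads)


print(estimate_shards(1024, 1))               # the default inputs listed above
print(estimate_shards(4, 1000, consumers=3))  # a busier, hypothetical workload
```

With the default inputs (1024 KB records, 1 record per second, 1 consumer) the sketch gives 1 shard, matching the default estimate. In the second call the read side dominates, because three consumers each pull the full stream.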

Data stream capacity:

  • Number of open shards — how many shards are applied to the stream
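
The same configuration can also be applied through the API instead of the console. The following is a minimal sketch, assuming a boto3 Kinesis client is passed in (for example `boto3.client("kinesis")`); the helper name `create_data_stream` is hypothetical.

```python
def create_data_stream(client, stream_name, shard_count=1):
    """Create a data stream with a fixed number of open shards."""
    client.create_stream(StreamName=stream_name, ShardCount=shard_count)
    # create_stream returns before the stream is usable, so wait until it is active
    client.get_waiter("stream_exists").wait(StreamName=stream_name)
```

The two arguments map directly onto the console fields above: `stream_name` is the data stream name and `shard_count` is the number of open shards.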

What is Amazon Kinesis Firehose?

Amazon Kinesis Firehose is a service that takes data streams and, instead of handing the data to a ‘consumer’, delivers it to persistent storage in S3 or Amazon Redshift.

In addition to storage, Firehose also supports sending data to services like Elasticsearch or Splunk.
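
Writing to Firehose from a producer looks much like writing to a data stream, except that there is no partition key and no consumer to write. This is a minimal sketch, assuming a boto3 Firehose client is passed in (for example `boto3.client("firehose")`); the helper name `send_to_firehose` is hypothetical, and the trailing newline is an assumption that makes the delivered objects easier to split into records.

```python
import json


def send_to_firehose(client, delivery_stream_name, event):
    """Send one JSON-encoded record to a Firehose delivery stream."""
    payload = json.dumps(event) + "\n"  # assumption: newline-delimited JSON
    return client.put_record(
        DeliveryStreamName=delivery_stream_name,
        Record={"Data": payload.encode("utf-8")},
    )
```

The destination (S3, Redshift and so on) is configured on the delivery stream itself, so the producer code does not change when the destination does.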
