Overview of Amazon Kinesis & Kinesis Firehose

What is Data Streaming?

‘Data streaming’ is the continuous, real-time flow of data generated by multiple sources. Data streams can be sent to a centralized location for ‘data ingestion’.

‘Data ingestion’ is the process of moving a data stream to a destination where it is stored and analyzed. The results of that analysis can be kept for later use.

What is Amazon Kinesis?

Amazon Kinesis is a scalable ‘data streaming’ service for ingesting data. Thousands of data sources, running on many different platforms and services, can ‘stream’ data to Kinesis in real time, and that data can then be sent to a location for analysis.

A real-world example would be a mobile app with millions of users that tracks its users’ touch inputs and sends them to Kinesis as a stream. The stream can then be analyzed for useful information, such as the most frequently used areas of the screen.
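As a rough illustration of the kind of analysis described above, the sketch below groups touch coordinates into a coarse grid and counts which cells are touched most often. The screen size, the 3x3 grid, and the sample events are all assumptions made up for this example; a real app would choose its own.

```python
from collections import Counter


def screen_region(x, y, width=1080, height=1920, cols=3, rows=3):
    """Map a touch coordinate to a (row, col) cell of a coarse grid."""
    # min() keeps touches on the far edge inside the last cell
    col = min(int(x * cols / width), cols - 1)
    row = min(int(y * rows / height), rows - 1)
    return row, col


def most_used_regions(events, top=3):
    """Return the `top` most frequently touched grid cells with their counts."""
    counts = Counter(screen_region(x, y) for x, y in events)
    return counts.most_common(top)


# Hypothetical touch events as (x, y) pixel coordinates
events = [(100, 1800), (120, 1850), (540, 960), (90, 1900), (1000, 50)]
print(most_used_regions(events, top=1))  # [((2, 0), 3)]
```

Here the bottom-left cell of the grid receives three of the five touches, so it is reported as the most used area.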

Architecture of Using Amazon Kinesis

Producers — any device or computing instance that is able to use Kinesis APIs to send data to the stream; this could be servers, EC2 instances, IoT devices, mobile devices, etc. The data they send is what makes up the data stream.

Amazon Kinesis Service — this hosts the ‘data stream’ itself; it ‘ingests’ data from ‘producer devices’ and keeps each record in the ‘data stream’ for 24 hours by default, so that it is available for ‘consumption’ by ‘consumer devices’.

Shard — the capacity unit of the ‘data stream’, which allows the data stream to scale; by default each stream starts with 1 shard that is shared by all attached ‘producers’ and ‘consumers’. Each shard allows 1 MB per second of ‘ingestion’ and 2 MB per second of ‘consumption’. Shards can be added and removed as needed to control scaling.

Consumers — any device or computing instance that is able to use Kinesis APIs to consume data streams.
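To make the producer and consumer roles more concrete, here is a minimal sketch using the boto3 Kinesis client. The helper names, the stream name, and the JSON event format are assumptions for illustration only. The `PartitionKey` argument is required by `put_record`; Kinesis uses it to decide which shard receives the record.

```python
import json


def make_client(region_name="us-east-1"):
    """Create a real Kinesis client (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers below work with any client object
    return boto3.client("kinesis", region_name=region_name)


def send_event(client, stream_name, user_id, event):
    """Producer side: write one JSON record to the stream."""
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=user_id,
    )


def read_events(client, stream_name, shard_id, limit=100):
    """Consumer side: read one batch, starting at the oldest record in a shard."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    response = client.get_records(ShardIterator=iterator, Limit=limit)
    return [json.loads(record["Data"]) for record in response["Records"]]
```

A producer would call something like `send_event(make_client(), "touch-events", "user-1", {"x": 10, "y": 20})`, and a consumer would call `read_events` with a shard ID from the same stream. A long-running consumer would keep polling with the `NextShardIterator` value returned by `get_records`, which this sketch leaves out.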

Creating a Data Stream

This can be done in the AWS Console under Services > Amazon Kinesis > Data Streams > Create Data Stream.

Data stream configuration:

  • Data stream name — the name assigned to the created data stream

A ‘shard estimator’ is available to properly estimate needed capacity:

  • Writing to the stream:
    Average record size in KB — default is 1024
    Max records written per second — default is 1
  • Reading from the stream:
    Total number of consumers — default is 1
  • Estimated number of open shards — default is 1
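The estimator's arithmetic can be approximated from the per-shard limits of 1 MB per second of ingestion and 2 MB per second of consumption. The sketch below is an assumption about how such an estimate could be computed, not the console's exact implementation, and the function and constant names are made up for this example.

```python
import math

SHARD_WRITE_KB_PER_SEC = 1024  # 1 MB per second of ingestion per shard
SHARD_READ_KB_PER_SEC = 2048   # 2 MB per second of consumption per shard


def estimate_shards(avg_record_kb, records_per_sec, consumers):
    """Estimate the shards needed to cover both write and read throughput."""
    write_kb = avg_record_kb * records_per_sec
    # Assumes every consumer reads the full stream
    read_kb = write_kb * consumers
    needed = max(write_kb / SHARD_WRITE_KB_PER_SEC, read_kb / SHARD_READ_KB_PER_SEC)
    return max(1, math.ceil(needed))


print(estimate_shards(1024, 1, 1))  # 1, matching the estimator's default inputs
print(estimate_shards(2, 5000, 3))  # 15, because reads are the bottleneck here
```

In the second call, writes need about 9.8 shards' worth of capacity, but three consumers reading everything need about 14.6, so the read side decides the result.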

Data stream capacity:

  • Number of open shards — how many shards are applied to the stream
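The same stream can also be created programmatically. The following is a minimal sketch assuming the boto3 Kinesis client; the helper name and the stream name in the usage note are hypothetical.

```python
def create_stream(client, stream_name, shard_count=1):
    """Create a stream with a fixed shard count and wait until it is usable."""
    client.create_stream(StreamName=stream_name, ShardCount=shard_count)
    # The 'stream_exists' waiter polls until the stream is ready
    client.get_waiter("stream_exists").wait(StreamName=stream_name)
    summary = client.describe_stream_summary(StreamName=stream_name)
    return summary["StreamDescriptionSummary"]["OpenShardCount"]
```

With boto3 installed and AWS credentials configured, `create_stream(boto3.client("kinesis"), "touch-events", shard_count=1)` would create a one-shard stream and return its open shard count.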

What is Amazon Kinesis Firehose?

Amazon Kinesis Firehose is a service that takes data streams and, instead of leaving the data to be read by a ‘consumer’, delivers it to persistent storage in Amazon S3 or Amazon Redshift.

Along with being sent to storage, Firehose also supports sending data to third-party services like Elasticsearch or Splunk.
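Sending data to Firehose looks much like writing to a data stream. Here is a minimal sketch assuming the boto3 Firehose client and an already configured delivery stream; the helper name, the delivery stream name in the usage note, and the newline-delimited JSON format are assumptions for illustration.

```python
import json


def deliver_event(client, delivery_stream_name, event):
    """Send one JSON record to a Firehose delivery stream."""
    # A trailing newline keeps records separable after they are batched together
    data = (json.dumps(event) + "\n").encode("utf-8")
    response = client.put_record(
        DeliveryStreamName=delivery_stream_name,
        Record={"Data": data},
    )
    return response["RecordId"]
```

With boto3 installed, `deliver_event(boto3.client("firehose"), "touch-events-to-s3", {"x": 10, "y": 20})` would hand the record to Firehose, which then delivers it to whatever destination the delivery stream was set up with.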

DevOps Engineer writing about breaking into the industry

Christian Talavera
