Eager vs. Lazy Architectures
Choice: Eager or Lazy (a.k.a. Push vs. Pull)
Every data-driven software architecture must make a fundamental choice: be eager or be lazy.
Streaming data is a movement toward building less lazy data-driven applications. The eager-vs.-lazy debate is sometimes called push vs. pull.
Not a black-and-white decision (how eager, or how lazy?)
It’s possible to be eager about some things and lazy about others.
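To make the choice concrete, here is a minimal Python sketch of one question (a running total per order) answered both lazily and eagerly. The order-total scenario and the class names are hypothetical illustrations, not taken from any particular system.

```python
from typing import Dict, List, Tuple


class LazyOrderTotals:
    """Pull: store raw events and do nothing until someone asks."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, float]] = []

    def record(self, order_id: str, amount: float) -> None:
        # Cheap write: no computation happens here.
        self.events.append((order_id, amount))

    def total(self, order_id: str) -> float:
        # All the work happens at query time, on every query.
        return sum(amount for oid, amount in self.events if oid == order_id)


class EagerOrderTotals:
    """Push: update the answer as each event arrives."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = {}

    def record(self, order_id: str, amount: float) -> None:
        # The work happens at write time, so the answer is always current.
        self.totals[order_id] = self.totals.get(order_id, 0) + amount

    def total(self, order_id: str) -> float:
        # Constant-time read of a precomputed answer.
        return self.totals.get(order_id, 0)
```

The lazy version makes writes cheap and queries expensive; the eager version pays on every write so that reads are immediate. A system can mix the two per piece of state, which is what being eager about some things and lazy about others looks like in practice.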
How eager is the state of the art?
The status quo is very lazy. Most applications do nothing until asked (queried).
Streaming data is mostly used to eagerly truck data between lazy warehouses.
Stream processors eagerly transform and count data as it moves.
What you get are data pipelines
Recreating the monoliths the industry spent the last two decades dismantling.
The oil and gas industry is not a good reference architecture for streaming data
Data may be the new oil economically.
But data is not oil.
The value of OIL is that it’s made up of CARBON atoms. The value of DATA is NOT that it’s made up of BITS
How do you control what you receive? (pub-sub)
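One common answer is topic-based publish-subscribe: consumers declare which topics they care about, and only matching messages reach them. Below is a minimal in-process sketch in Python, assuming plain string topic names; the `PubSub` class is hypothetical, and a real deployment would put a broker or log between publishers and subscribers.

```python
from typing import Any, Callable, Dict, List


class PubSub:
    """Minimal topic-based publish-subscribe (in-process, synchronous)."""

    def __init__(self) -> None:
        # topic -> callbacks registered for that topic
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = {}

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic: str, message: Any) -> None:
        # Only callbacks subscribed to this exact topic receive the message.
        for callback in self._subscribers.get(topic, []):
            callback(message)
```

The subscriber, not the publisher, decides what it receives: a consumer that only subscribes to `"alerts"` never sees messages published to `"metrics"`.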
Streaming data is about reducing latency
Use Case: Observability
Situational awareness
Monitoring & alerting
Experience scoring
Early fault warning
Use Case: Automation
Root cause analysis
Logistics routing
Self-healing systems
Performance optimization
Cost optimization
Message brokers vs. distributed transaction logs
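The core difference can be sketched in a few lines of Python. This is a deliberately simplified model with hypothetical class names: real brokers add acknowledgements and redelivery, and real logs (Apache Kafka, for example) add partitioning and retention policies. The point it shows is that a broker-style queue forgets a message once it is consumed, while a log keeps every entry and lets each reader track its own offset, so data can be replayed.

```python
from collections import deque
from typing import Any, List, Optional


class MessageQueue:
    """Broker-style: each message is delivered once, then it is gone."""

    def __init__(self) -> None:
        self._messages: deque = deque()

    def send(self, message: Any) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[Any]:
        # Consuming removes the message; no other reader will see it.
        return self._messages.popleft() if self._messages else None


class TransactionLog:
    """Log-style: append-only; readers track their own offsets."""

    def __init__(self) -> None:
        self._entries: List[Any] = []

    def append(self, message: Any) -> int:
        self._entries.append(message)
        return len(self._entries) - 1  # offset of the new entry

    def read(self, offset: int, max_count: int = 10) -> List[Any]:
        # Reading does not remove anything, so any offset can be re-read.
        return self._entries[offset:offset + max_count]
```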
Stream processors are good at counting and conversions
None of the above use cases are counting or conversion problems.
KSQL can transform streams and enrich them with static context.
KSQL struggles to join multiple streams together, yet most use cases involve more than one stream.
Flink Stateful Functions can transform streams and access limited global state.
Global state doesn’t scale. Flink Stateful Functions are just a facade of statefulness: not optimized for locality, just a hidden query.
Challenge: Real-Time Enrichment
Can’t get rich without context. Stream processors aren’t built for context.
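A minimal Python sketch of real-time enrichment, assuming a hypothetical setup with two streams: device readings, and device-to-site assignments that change over time. The latest context is held locally, next to the code that needs it, so each reading is enriched with an in-memory lookup rather than a remote query. All names here are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class EnrichedReading:
    device_id: str
    temperature: float
    site: Optional[str]  # context joined in from the assignment stream


class Enricher:
    def __init__(self) -> None:
        # Latest known site for each device, kept in local memory.
        self._site_by_device: Dict[str, str] = {}

    def on_assignment(self, device_id: str, site: str) -> None:
        # Context stream: remember where each device currently lives.
        self._site_by_device[device_id] = site

    def on_reading(self, device_id: str, temperature: float) -> EnrichedReading:
        # Event stream: attach context with a local lookup (None if unknown).
        return EnrichedReading(device_id, temperature,
                               self._site_by_device.get(device_id))
```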
Challenge: Streaming Aggregation
Need to aggregate streaming data up into higher-level views without adding latency. Stream processors aren’t built for this either.
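One way to aggregate without adding latency is to maintain the aggregate incrementally: each new event retracts the previous contribution of its source and adds the new one, so the rolled-up value is current after every update with no batch recomputation. A minimal Python sketch, assuming hypothetical per-device temperature readings rolled up into a per-site average:

```python
from typing import Dict, Optional, Tuple


class SiteAverage:
    """Per-site average of each device's latest reading, kept incrementally."""

    def __init__(self) -> None:
        self._latest: Dict[str, Tuple[str, float]] = {}  # device -> (site, value)
        self._sum: Dict[str, float] = {}
        self._count: Dict[str, int] = {}

    def update(self, device_id: str, site: str, value: float) -> Optional[float]:
        previous = self._latest.get(device_id)
        if previous is not None:
            old_site, old_value = previous
            # Retract the device's previous contribution (possibly from another site).
            self._sum[old_site] -= old_value
            self._count[old_site] -= 1
        self._latest[device_id] = (site, value)
        self._sum[site] = self._sum.get(site, 0.0) + value
        self._count[site] = self._count.get(site, 0) + 1
        return self.average(site)

    def average(self, site: str) -> Optional[float]:
        count = self._count.get(site, 0)
        return self._sum[site] / count if count else None
```

Each update costs the same regardless of how many devices a site has, which is what keeps the aggregate from lagging behind the stream.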
Challenge: Real-Time Business Logic
Business logic drives enterprise software. Business logic takes action, has side effects.
Business logic needs lots of context as input. Stream processors aren’t designed to run business logic.
Challenge: Streaming APIs
Need to disseminate granular, derivative data without breaking the stream.
Without streaming APIs, you can’t build low-latency service-oriented architectures.
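A minimal Python sketch of what a streaming API might look like, assuming a hypothetical in-process `DerivedValueStream`: clients subscribe to a derived value, receive its current state once, and are then pushed each change as it happens, instead of polling a request-response endpoint. Only actual changes are pushed, so consumers see granular, derivative updates rather than raw inputs.

```python
from typing import Any, Callable, Dict, List


class DerivedValueStream:
    """Pushes changes to a derived value to every subscriber."""

    def __init__(self, compute: Callable[[Dict[str, Any]], Any]) -> None:
        self._compute = compute  # derives the published value from current state
        self._state: Dict[str, Any] = {}
        self._current: Any = None
        self._subscribers: List[Callable[[Any], None]] = []

    def subscribe(self, callback: Callable[[Any], None]) -> None:
        self._subscribers.append(callback)
        callback(self._current)  # send the current value first, then changes

    def update(self, key: str, value: Any) -> None:
        self._state[key] = value
        new_value = self._compute(self._state)
        if new_value != self._current:  # push only when the derived value changes
            self._current = new_value
            for callback in self._subscribers:
                callback(new_value)
```

For example, a stream built with `lambda s: max(s.values())` pushes a new maximum to subscribers, but stays silent when an update does not change the maximum.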
Challenge: Real-Time UIs
Need to keep humans in the loop. Humans need to validate and oversee automation.