Event-driven architectures are well suited to improving agility and moving quickly. They’re commonly found in modern applications that have decoupled components. When adopting an event-driven architecture, you may need to rethink the way you view your application design.
If you need to process every single event, the event source should be reliable and guarantee delivery. That is why we use Amazon SNS and Amazon SQS as the hub for events. The event source should also be able to handle the asynchronous nature of events.
Amazon provides two other solutions for event-driven applications, Amazon EventBridge and Amazon MSK (Apache Kafka), but in my estimate these would cost more than SNS + SQS for this workload.
A long time ago, I worked on what was then the biggest royalty-free stock image marketplace in Latin America. Back then, we did not use the cloud as much as we could have. Looking back, I set myself a challenge: re-architect the system using the AWS Well-Architected Framework and cloud best practices, making the entire system decoupled, serverless, event-driven and extremely scalable at the lowest possible cost.
The system consists of three pillars. The first is the homesite, which sells packs and bundles and lets the user search and navigate through millions of images, filtered by tags, colors, contexts and similarity.
The second is the photographer panel, which lets the user upload a huge number of large, original-sized images. Last but not least is the admin panel, used to control and visualize everything that’s happening in the system.
The biggest issue and point of attention is file upload and categorization, which must be consistent, resilient, scalable and recoverable.
My initial idea is to use S3 with a CloudFront distribution to reach edge locations with low latency, speeding up uploads and downloads. A completed upload triggers a Lambda that calls Amazon Rekognition for image analysis.
Amazon Rekognition is one of the cloud options for image analysis. We could also use Google Cloud Vision, but for study purposes we will use Amazon Web Services resources wherever possible.
Rekognition takes an image and breaks it down into objects, context and people; it can also detect celebrities. These features will feed an Amazon CloudSearch index and will also be used to filter inappropriate content automatically.
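To make that concrete, here is a minimal sketch of how the Rekognition output could be turned into a search document and a moderation flag. It only works on plain dictionaries shaped like the `DetectLabels` and `DetectModerationLabels` responses (`Labels` and `ModerationLabels` lists with `Name` and `Confidence`), so it runs without AWS credentials. The function names, the document layout and the 80% threshold are my own assumptions, not something the original system had.

```python
from typing import Any

# Assumed confidence threshold (percent); tune it for your own catalogue.
MIN_CONFIDENCE = 80.0


def build_search_document(image_id: str, labels_response: dict[str, Any]) -> dict[str, Any]:
    """Flatten a DetectLabels-style response into a document for the search index."""
    tags = sorted(
        label["Name"].lower()
        for label in labels_response.get("Labels", [])
        if label.get("Confidence", 0.0) >= MIN_CONFIDENCE
    )
    # Hypothetical document layout: an id plus a list of lowercase tags.
    return {"id": image_id, "tags": tags}


def is_inappropriate(moderation_response: dict[str, Any]) -> bool:
    """Flag the image if any moderation label reaches the threshold."""
    return any(
        label.get("Confidence", 0.0) >= MIN_CONFIDENCE
        for label in moderation_response.get("ModerationLabels", [])
    )


sample_labels = {
    "Labels": [
        {"Name": "Beach", "Confidence": 97.1},
        {"Name": "Dog", "Confidence": 55.0},  # below the threshold, dropped
    ]
}
document = build_search_document("img-1", sample_labels)
```

In a real Lambda the two dictionaries would come from the Rekognition client, and a flagged image would go to the admin panel for review rather than straight to the index.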
We can also use the Rekognition output to train an Amazon SageMaker model for autonomous content moderation, and use Amazon QuickSight to detect trends and demand.
Since it is the main function of the system, the photographer and admin panel architecture would look like this:
Looking deeper inside the VPC, we see this flow for a file upload:
The main principles here are scalability and reliability. The system is event-driven and queue-managed, so every file is delivered for processing at least once, and a file that fails can be retried instead of being lost.
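The queue-managed part is where the reliability comes from, so here is a small sketch of a queue consumer, assuming a Lambda subscribed to the SQS queue with partial batch responses (`ReportBatchItemFailures`) enabled. The handler returns the IDs of the messages that failed, so only those go back to the queue for a retry. `start_publication` is a hypothetical callable standing in for whatever starts the processing of one file.

```python
import json
from typing import Any, Callable


def handle_batch(
    event: dict[str, Any],
    start_publication: Callable[[dict[str, Any]], None],
) -> dict[str, Any]:
    """Process an SQS batch and report only the failed messages for retry."""
    failures = []
    for record in event.get("Records", []):
        try:
            start_publication(json.loads(record["body"]))
        except Exception:
            # Returning the messageId puts this message, and only this one, back in the queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


def fake_start(message: dict[str, Any]) -> None:
    """Stand-in for the real processing; fails for one specific key."""
    if message["key"] == "broken.jpg":
        raise RuntimeError("simulated failure")


sample_event = {
    "Records": [
        {"messageId": "m-1", "body": json.dumps({"key": "ok.jpg"})},
        {"messageId": "m-2", "body": json.dumps({"key": "broken.jpg"})},
    ]
}
result = handle_batch(sample_event, fake_start)
```

Messages that keep failing should end up in a dead-letter queue, so a bad file is parked for inspection instead of blocking the rest.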
We are using Lambdas, so we can run a thousand requests concurrently. They are small Node.js functions that run as fast as possible, since the cost of a Lambda goes up with its execution time.
The flow starts on a static website hosted on S3, which uploads a file to a private S3 bucket through CloudFront’s edge locations to minimize upload time. Once the file is completely uploaded to this private S3 bucket, a Lambda is triggered and saves the original metadata and file path into DynamoDB. Another triggered Lambda adds the new file to an Amazon SQS queue.
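As an illustration of the first Lambda in this flow, the sketch below extracts what we need from an S3 event notification. It is pure Python over the event dictionary, so the DynamoDB write and the SQS send are left out. The returned item layout (`bucket`, `key`, `size_bytes`, `status`) is a hypothetical one I made up for the example.

```python
from typing import Any
from urllib.parse import unquote_plus


def extract_uploads(event: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn an S3 event notification into items describing each uploaded file."""
    uploads = []
    for record in event.get("Records", []):
        # Only react to object creation events (Put, Post, CompleteMultipartUpload...).
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        uploads.append(
            {
                "bucket": s3["bucket"]["name"],
                # Object keys arrive URL-encoded in the event (spaces become '+').
                "key": unquote_plus(s3["object"]["key"]),
                "size_bytes": s3["object"].get("size", 0),
                "status": "Uploaded",  # hypothetical status attribute
            }
        )
    return uploads


sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "originals-bucket"},
                "object": {"key": "photographer-7/sunset+beach.jpg", "size": 1048576},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {"bucket": {"name": "originals-bucket"}, "object": {"key": "old.jpg"}},
        },
    ]
}
uploads = extract_uploads(sample_event)
```

Each returned item could then be written to DynamoDB and its key sent to the queue.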
The queue starts an AWS Step Functions state machine called Publication Process. Inside it, the image is sent to Amazon Rekognition for analysis, and the captured metadata is passed through a Lambda and saved into DynamoDB. Then the original file is moved to S3 Glacier, saving money and resources, and two other state machines are called.
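A rough sketch of how the Publication Process could be written in Amazon States Language, built here as a Python dictionary. I simplified a few things, so treat them as assumptions: every step is a Lambda task, the two follow-up flows are modelled as parallel branches instead of separate state machines, and the ARNs, account ID and function names are placeholders.

```python
import json
from typing import Any, Optional

# Placeholder ARN prefix; the account ID and function names are hypothetical.
LAMBDA_ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function"


def task(function_name: str, next_state: Optional[str] = None) -> dict[str, Any]:
    """Build a Task state that invokes a Lambda and either continues or ends."""
    state: dict[str, Any] = {"Type": "Task", "Resource": f"{LAMBDA_ARN_PREFIX}:{function_name}"}
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


publication_process = {
    "Comment": "Publication Process (sketch)",
    "StartAt": "AnalyzeImage",
    "States": {
        "AnalyzeImage": task("analyze-image", "SaveMetadata"),
        "SaveMetadata": task("save-metadata", "ArchiveOriginal"),
        "ArchiveOriginal": task("archive-original", "TreatImages"),
        "TreatImages": {
            "Type": "Parallel",
            "End": True,
            "Branches": [
                {
                    "StartAt": "PublicImageTreatment",
                    "States": {"PublicImageTreatment": task("public-image-treatment")},
                },
                {
                    "StartAt": "PrivateImage",
                    "States": {"PrivateImage": task("private-image")},
                },
            ],
        },
    },
}


def linear_order(definition: dict[str, Any]) -> list[str]:
    """Follow the Next pointers from StartAt to list the top-level steps in order."""
    order = []
    name = definition["StartAt"]
    while name is not None:
        order.append(name)
        name = definition["States"][name].get("Next")
    return order


definition_json = json.dumps(publication_process, indent=2)
steps = linear_order(publication_process)
```

The JSON string is what you would hand to Step Functions when creating the state machine. Keeping each step as its own state is what makes it cheap to add a new one later.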
The first is the Public Image Treatment. In this state machine the images are resized and watermarked, then put into a private S3 bucket served through a public CloudFront distribution.
The second is the Private Image flow, for the purchasable images. In this step, we resize the image to the downloadable sizes and put the results into a private S3 bucket behind an encrypted CloudFront distribution, using KMS to store the decryption key. With this setup, each request for an image uses a unique URL, providing security and reliability to the service.
All other parts of the system consist of S3 static sites combined with Lambda through API Gateway, with persistence in DynamoDB. The public website is served with low latency from edge locations by a CloudFront distribution and uses CloudSearch to index the images’ metadata.
The purchase flow and logged-in area are similar to the public website, but use KMS to encrypt the user’s purchased images and generate encrypted URLs in CloudFront. Each time the user clicks “Download” on a purchased image, a new encrypted CloudFront URL is generated that is accessible only once.
Each user has their own CMK in KMS, so no user can access another user’s image, and since the URL is unique and readable only once, it cannot be shared publicly.
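One caveat on the "readable only once" part: as far as I know, CloudFront signed URLs are limited by an expiry time (and optionally an IP range) and are signed with an RSA key pair, so the single-use behaviour would have to be tracked by the application. The sketch below shows that idea with the standard library only. It is not CloudFront's signing scheme; the class, the token format and the in-memory set are all assumptions for illustration. In Lambda, the "already used" record would have to live somewhere shared, such as a DynamoDB conditional write.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional


class OneTimeLinks:
    """Issue signed download tokens that expire and can be redeemed only once."""

    def __init__(self, secret: bytes, ttl_seconds: int = 300) -> None:
        self._secret = secret
        self._ttl = ttl_seconds
        self._used: set[str] = set()  # in-memory for the sketch only

    def _sign(self, payload: str) -> str:
        return hmac.new(self._secret, payload.encode(), hashlib.sha256).hexdigest()

    def issue(self, user_id: str, image_key: str, now: Optional[float] = None) -> str:
        issued_at = time.time() if now is None else now
        expires = int(issued_at + self._ttl)
        nonce = secrets.token_hex(8)  # makes every token unique
        payload = f"{user_id}|{image_key}|{expires}|{nonce}"
        return f"{payload}|{self._sign(payload)}"

    def redeem(self, token: str, now: Optional[float] = None) -> bool:
        payload, _, signature = token.rpartition("|")
        expected = self._sign(payload)
        # Constant-time comparison; a tampered payload no longer matches its signature.
        if not hmac.compare_digest(signature.encode(), expected.encode()):
            return False
        _, expires, _nonce = payload.rsplit("|", 2)
        current = time.time() if now is None else now
        if current > int(expires) or token in self._used:
            return False
        self._used.add(token)
        return True


links = OneTimeLinks(b"demo-secret", ttl_seconds=60)
token = links.issue("user-1", "purchased/img-1.jpg", now=1000)
first_try = links.redeem(token, now=1010)
second_try = links.redeem(token, now=1020)
```

The first redemption succeeds and the second is rejected, which is the behaviour the download button needs.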
When the user purchases an image, a Lambda saves this information in DynamoDB and sets it as “Processing”. This insert into DynamoDB triggers another Lambda that puts the purchase request in a queue. When it is processed, a Step Functions state machine handles the user’s information, generates the user’s unique encrypted image, generates the unique URL and notifies the user through Amazon SNS. It can also push the purchase info into a Purchase AI Model, which learns the user’s preferences and feeds a CloudSearch algorithm.
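The Lambda triggered by the DynamoDB insert could look roughly like the sketch below. It reads a DynamoDB Streams event, keeps only new items whose status is “Processing”, and builds the message bodies to send to the queue. The attribute names (`purchaseId`, `userId`, `imageId`, `status`) are hypothetical, and the actual SQS send is left out so the example runs on its own.

```python
import json
from typing import Any


def purchases_to_enqueue(event: dict[str, Any]) -> list[str]:
    """Build one queue message body per newly inserted purchase in 'Processing' state."""
    messages = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # ignore updates and deletes
        # Stream images use DynamoDB's typed format, e.g. {"S": "value"} for strings.
        image = record.get("dynamodb", {}).get("NewImage", {})
        if image.get("status", {}).get("S") != "Processing":
            continue
        messages.append(
            json.dumps(
                {
                    "purchaseId": image["purchaseId"]["S"],
                    "userId": image["userId"]["S"],
                    "imageId": image["imageId"]["S"],
                },
                sort_keys=True,
            )
        )
    return messages


sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "NewImage": {
                    "purchaseId": {"S": "p-1"},
                    "userId": {"S": "user-1"},
                    "imageId": {"S": "img-1"},
                    "status": {"S": "Processing"},
                }
            },
        },
        {
            "eventName": "MODIFY",
            "dynamodb": {"NewImage": {"purchaseId": {"S": "p-0"}, "status": {"S": "Done"}}},
        },
    ]
}
bodies = purchases_to_enqueue(sample_event)
```

Filtering on the event name matters because the same stream also fires when the state machine later updates the purchase status.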
This flow is fairly large and might seem overengineered, but it lets you grow, develop and add new items without affecting the running process. We could add, for example, a Lambda inside the state machine that calls a blockchain client on an EC2 instance to register the purchase as an NFT, without needing to change the entire flow.
The original project was a PHP monolith using the Yii Framework. Back then, we didn’t have the knowledge we have now, so today I would consider using Node.js and Python in Lambdas to reduce execution time as much as possible. Small Lambdas that each execute pretty much a single function are the best way to be cost-effective, and plain Node.js and Python applications can deliver that.
For Lambdas that retrieve data for the static frontend, we should use Node.js with a fast framework, like HapiJS or Next.js. These frameworks come with good support for caching and great tools to ease development.
Lambdas that process and resize images, apply watermarks, read metadata and move objects from one bucket to another could be written in plain Python. That way, these longer-running jobs will at least use as little memory as possible.
The frontend would be written in React, since we can use static components from S3, generated dynamically through Lambdas. For example, we could have a page builder in which the admin user uses a WYSIWYG editor to build an HTML component that is dynamically added to S3, and React reads it directly from there.
We will use us-east-1 as the region for our test, and distribute to specific places using CloudFront.
As for pricing, the cost will naturally go up as the product grows, but Lambda has a nice free tier, with one million requests and 400,000 GB-seconds of compute time per month.
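To see how run time drives the Lambda bill, here is a small worked calculation. The rates are assumptions on my side (the public us-east-1 x86 prices as I know them: 0.20 USD per million requests and 0.0000166667 USD per GB-second), and so is the 1 GB memory size, which I picked because it reproduces the 27,47 USD Lambda line in the estimate for 5M calls averaging 400 ms.

```python
from decimal import Decimal

# Assumed public rates (us-east-1, x86); check the current pricing page before relying on them.
PRICE_PER_MILLION_REQUESTS = Decimal("0.20")
PRICE_PER_GB_SECOND = Decimal("0.0000166667")
# Monthly free tier.
FREE_REQUESTS = Decimal(1_000_000)
FREE_GB_SECONDS = Decimal(400_000)


def monthly_lambda_cost(requests: int, avg_seconds: Decimal, memory_gb: Decimal) -> Decimal:
    """Estimate the monthly Lambda bill in USD after the free tier."""
    gb_seconds = Decimal(requests) * avg_seconds * memory_gb
    billable_gb_seconds = max(gb_seconds - FREE_GB_SECONDS, Decimal(0))
    billable_requests = max(Decimal(requests) - FREE_REQUESTS, Decimal(0))
    cost = (
        billable_gb_seconds * PRICE_PER_GB_SECOND
        + billable_requests / Decimal(1_000_000) * PRICE_PER_MILLION_REQUESTS
    )
    return cost.quantize(Decimal("0.01"))


# 5M calls, 400 ms average, 1 GB of memory (the memory size is my assumption).
estimate = monthly_lambda_cost(5_000_000, Decimal("0.4"), Decimal("1"))
# Same traffic with the average run time cut in half.
faster = monthly_lambda_cost(5_000_000, Decimal("0.2"), Decimal("1"))
```

Under these assumptions `estimate` comes out at 27.47 and `faster` at 10.80, so halving the run time cuts the bill by more than half, because the free tier absorbs a larger share of the compute.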
DynamoDB and EC2 t2.micro instances are also free-tier eligible, and both will be used in the project.
Our price was calculated using calculator.aws with deliberately large numbers, simulating a scenario in which the company is already big.
- S3 and S3 Glacier Storage for 1TB Data — 33,40 USD
- API Gateway Rest API for 3M requests, cacheless — 10,50 USD
- Lambda servers 5M calls with an average runtime of 400ms — 27,47 USD
- Cloudfront Distributions, 10TB/m — 2,00 USD
- Step Functions — Free tiers cover it all.
- DynamoDB (without free tier) for a 5M read/write op. — 138,75 USD
- Amazon SQS, 5M calls — 2,00 USD
- AWS SNS, 5M distributed notifications — 63,00 USD
- KMS key storage — 20,00 USD
- CloudSearch c4.2xLarge — 428,00 USD
This gives us an estimate of 721,73 USD monthly, which could be reduced by 10% to 20% if we commit to savings plans for some of the services.
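For anyone who wants to redo the arithmetic, the sketch below adds up the line items exactly as listed and applies the 10% to 20% reduction. Note that the listed items add up to 725.12 USD, a few dollars above the total I quoted; I assume the gap comes from rounding or from a line that differed in the calculator. The dictionary keys are just labels I chose.

```python
from decimal import Decimal

# Monthly line items in USD, copied from the list above.
line_items = {
    "S3 and S3 Glacier": Decimal("33.40"),
    "API Gateway": Decimal("10.50"),
    "Lambda": Decimal("27.47"),
    "CloudFront": Decimal("2.00"),
    "Step Functions": Decimal("0.00"),  # covered by the free tier
    "DynamoDB": Decimal("138.75"),
    "SQS": Decimal("2.00"),
    "SNS": Decimal("63.00"),
    "KMS": Decimal("20.00"),
    "CloudSearch": Decimal("428.00"),
}

total = sum(line_items.values(), Decimal("0"))

# Range after the 10% to 20% reduction mentioned above.
with_10_percent_off = (total * Decimal("0.90")).quantize(Decimal("0.01"))
with_20_percent_off = (total * Decimal("0.80")).quantize(Decimal("0.01"))

# Share of the bill taken by the single largest item.
largest = max(line_items, key=lambda name: line_items[name])
largest_share = (line_items[largest] / total * 100).quantize(Decimal("0.1"))
```

With these numbers the reduced range is roughly 580 to 653 USD, and CloudSearch alone is about 59% of the bill, so that is the first place to look if the cost needs to come down.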
Remember that we did the estimate for a heavily accessed application. A startup with just a few requests per month would pay close to nothing, because most of the services listed have a good free tier.
Besides the cost, this architecture would allow us to work in a decoupled way, with clear boundaries for each service and each function. With less operational work than Kubernetes or a Docker-based service mesh, a Lambda event-driven system can scale, if not quite to infinity and beyond, then as far as your account’s concurrency limits allow.
A decoupled architecture supports maintainability and sustainability, helping the development team focus on the code and on delivering features regardless of the infrastructure.
Imagine building this entire environment using Kubernetes or a Docker service mesh or, in a far worse scenario, microservices running on EC2 instances. You would need to run load balancers, Elastic Beanstalk environments and CloudWatch alarms, configure auto scaling, and manage operating systems, security groups, and so on. Your team wouldn’t be able to focus only on development.
Also, decoupled event-driven applications allow the team to develop new features asynchronously. These run concurrently alongside the features already in production, without the need to change their functionality.
Apache Kafka could also be used for an event-driven application, but its cost is much higher than SQS + SNS, which fulfills our needs for this project. Kafka fits scenarios where consumers need to replay or re-read events, because Kafka’s event log is retained for a configurable period (indefinitely, if you choose) and can be re-read from any offset.
That’s it. I think this approach would ease development and would have allowed my former company to grow. Please let me know if you have any insights, questions or adjustments to this article.