Issue
The solution has the following request flow: Client -> AWS CloudFront -> AWS ALB -> AWS EC2.
The AWS CloudFront distribution is configured to allow POST requests.
- Client sends binary data to the AWS CF endpoint (e.g. curl -v -H "Connection: close" --data-binary @<path to file> http://example.com/api)
- Origin (Node.js, Express app on an AWS EC2 instance) receives the request and just pipes it to the response (pseudocode provided below)
- AWS CF stops reading the request body as soon as the first byte is written to the response stream.
As a result, the request stream never emits the 'end' event (because CF stops sending request data), the app never closes the response stream, and the client eventually times out.
Using AWS ALB directly (without AWS CloudFront) does not have this problem. So my working theory is that AWS CF expects the origin to consume all of the request data before it sends the first response byte.
That's a huge and non-obvious restriction...
Is that correct? Or are there headers/options that make AWS CloudFront handle this appropriately?
Application example:
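// no body-parser, etc
import express from "express";
import { pipeline } from "stream";

const router = express.Router();

router.post("/api", (req: express.Request, res: express.Response) => {
  // Echo the request body straight back; response bytes go out before the request is fully read.
  pipeline(req, res, error => console.log("done", error));
});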
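To check whether a given proxy chain shows this behavior, it can help to take Express out of the picture. The following is a minimal sketch, not the original application: it uses only Node's built-in http module, and the names startEchoServer, serverUrl and postAndCollect are hypothetical helpers made up for this illustration. The client rejects if no complete response arrives within the timeout, which is the symptom described above.

```typescript
import * as http from "http";
import { AddressInfo } from "net";
import { pipeline } from "stream";

// Echo server: pipes the request body straight into the response,
// so response bytes are written before the request has been fully read.
export function startEchoServer(): Promise<http.Server> {
  const server = http.createServer((req, res) => {
    pipeline(req, res, (error) => {
      if (error) console.error("echo failed", error);
    });
  });
  // Port 0 lets the OS pick a free port.
  return new Promise((resolve) =>
    server.listen(0, "127.0.0.1", () => resolve(server))
  );
}

// Builds the local URL of a server started by startEchoServer.
export function serverUrl(server: http.Server): string {
  const { port } = server.address() as AddressInfo;
  return `http://127.0.0.1:${port}/api`;
}

// POSTs `payload` to `url` and resolves with the full response body.
// Rejects if the socket stays idle for `timeoutMs` or the request fails.
export function postAndCollect(
  url: string,
  payload: Buffer,
  timeoutMs = 5000
): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const req = http.request(
      url,
      { method: "POST", headers: { Connection: "close" } },
      (res) => {
        const chunks: Buffer[] = [];
        res.on("data", (chunk: Buffer) => chunks.push(chunk));
        res.on("end", () => resolve(Buffer.concat(chunks)));
        res.on("error", reject);
      }
    );
    req.setTimeout(timeoutMs, () =>
      req.destroy(new Error("timed out waiting for response"))
    );
    req.on("error", reject);
    req.end(payload);
  });
}
```

Run against the local server, the round trip should complete and return the payload unchanged. Pointing postAndCollect at the same origin through each proxy layer in turn, with a payload large enough to span many chunks, is one way to see which layer stops forwarding the request body.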
As a temporary workaround, I decided to use an intermediate in-memory buffer that holds the response body back until all of the request data has been read.
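// no body-parser, etc
// Assumes Buffered.ts (below) sits next to this file.
import { Buffered } from "./Buffered";

router.post("/api", (req: express.Request, res: express.Response) => {
  // Buffered holds every chunk until the request ends, so nothing is written to res early.
  pipeline(req, new Buffered(), res, error => console.log("done", error));
});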
// Buffered.ts
import { Transform, TransformCallback } from "stream";
export class Buffered extends Transform {
private _acc: Array<Buffer> = [];
_transform(
chunk: Buffer,
encoding: BufferEncoding,
callback: TransformCallback
) {
this._acc.push(chunk);
callback();
}
_flush(callback: TransformCallback) {
callback(null, Buffer.concat(this._acc));
}
}
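The same idea (read the whole request first, respond afterwards) can also be written without a custom Transform. The sketch below is an alternative to Buffered, not a replacement confirmed by the original code: readAll and its maxBytes parameter are hypothetical names introduced here. The cap is there because either variant keeps the whole body in memory.

```typescript
import { Readable } from "stream";

// Reads `source` to the end and returns everything as a single Buffer.
// Throws once more than `maxBytes` bytes have been received.
export async function readAll(
  source: Readable,
  maxBytes: number = Infinity
): Promise<Buffer> {
  const chunks: Buffer[] = [];
  let total = 0;
  for await (const chunk of source) {
    // Streams may yield strings if an encoding was set; normalize to Buffer.
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
    total += buf.length;
    if (total > maxBytes) {
      throw new Error(`request body exceeds ${maxBytes} bytes`);
    }
    chunks.push(buf);
  }
  return Buffer.concat(chunks);
}
```

In the Express handler this would be used roughly as res.end(await readAll(req, limit)), with the limit picked to match the largest upload the endpoint should accept.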
Solution
The problem is partially solved.
AWS CloudFront and AWS Application Load Balancer do not provide enough transparency about the request/response flow. Initially, I thought it was AWS CloudFront that stopped reading the request data, but it turns out AWS ALB behaves the same way (retested on much larger files).
Finally, I found a similar question, "AWS Loadbalancer terminating http call before complete http response is sent", and tried AWS Classic Load Balancer, and... it worked: the application could still read request data even after it started writing the response.
So the entire solution was adjusted to:
Client1 (api.example.com) -> AWS CLB -> EC2
^
| (only for some API)
Client2 (cdn.example.com) -> -> -> -> -> S3
All API endpoints are available via api.example.com, but some of them (with plain request/response logic) are also available via cdn.example.com to speed them up.
The downside of this approach is that AWS CLB is a 'previous generation' load balancer. AWS NLB is probably more suitable for this use case.
Answered By - Eduard Bondarenko Answer Checked By - Dawn Plyler (WPSolving Volunteer)