Wednesday, April 27, 2022

[SOLVED] CloudWatch Agent: batch size equal to "1" - is it a bad idea?

Issue

If I understand correctly, the CloudWatch Agent publishes events to CloudWatch in batches, whose size is controlled by two params (see http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html):

batch_count:

Specifies the max number of log events in a batch, up to 10000. The default value is 1000.

batch_size:

Specifies the max size of log events in a batch, in bytes, up to 1048576 bytes. The default value is 32768 bytes. This size is calculated as the sum of all event messages in UTF-8, plus 26 bytes for each log event.
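For context, these parameters are set per log stream in the agent's configuration file (for the older awslogs agent, typically /etc/awslogs/awslogs.conf). A sketch of such a section with the defaults written out explicitly; the file path, group, and stream names here are illustrative, not from the original question:

```ini
[/var/log/app.log]
file = /var/log/app.log
log_group_name = my-app            ; hypothetical log group name
log_stream_name = {instance_id}
batch_count = 1000                 ; max events per batch (default 1000, cap 10000)
batch_size = 32768                 ; max bytes per batch (default 32768, cap 1048576)
buffer_duration = 5000             ; ms the agent buffers before publishing
```

Lowering batch_count to 1 in a section like this would force the agent to make one PutLogEvents call per event.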

I guess that, in order to eliminate the possibility of losing any log data if an EC2 instance is terminated (since termination destroys all logs stored on the instance), batch_count should be set to 1. Am I right that this is the only way to achieve that, and how would it affect performance? Will it have any noticeable side effects?


Solution

Yes, it's a bad idea. You are probably more likely to lose data that way. The PutLogEvents API that the agent uses is limited to 5 requests per second per log stream (source). With a batch_count of 1, you'd only be able to publish 5 log events per second. If the application were to produce more than that consistently, the agent wouldn't be able to keep up.
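The arithmetic behind that claim can be sketched as follows. This is a back-of-the-envelope estimate assuming the documented limit of 5 PutLogEvents requests per second per log stream; the function name is ours, not part of any AWS SDK:

```python
# Upper bound on sustained log throughput for a single log stream,
# assuming the PutLogEvents limit of 5 requests/second/stream.
MAX_REQUESTS_PER_SECOND = 5

def max_events_per_second(batch_count: int) -> int:
    """Max log events/second the agent can publish to one stream."""
    return MAX_REQUESTS_PER_SECOND * batch_count

print(max_events_per_second(1))     # batch_count = 1  -> 5 events/s
print(max_events_per_second(1000))  # default config   -> 5000 events/s
```

So with batch_count = 1, any application logging more than 5 events per second will outpace the agent, and the backlog (and risk of loss) only grows.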

If you absolutely can't afford to lose any log data, you should probably be writing that data to a database instead. There will always be some risk of losing log data, even with a batch_count of 1: the host could always crash before the agent polls the log file... which, by the way, happens every 5 seconds by default (source).



Answered By - Daniel Vassallo
Answer Checked By - David Goodson (WPSolving Volunteer)