Thursday, October 28, 2021

[SOLVED] Python SMTP Server (Response Time) is unusually high PRTG

Issue

I'm a newbie so excuse my basic knowledge. I have a Python script to receive SMTP messages 'Server' and send it to SMS Service Provider's API, the script is running on Ubuntu Server OS 18.04 with nohup and & ... While I have no issue with functionality, every morning between 05:00 and 05:30 I am having abnormally late responses from the server (I can see that both from PRTG and the logs). From the code logs below you can see that the functionality still works but the it's making requests wait longer than normal 'doing so makes the sent code with SMS unusable.'

Your help will be highly appreciated.

OS info:

    Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-101-generic x86_64)

  System load:  0.08              Processes:             193
  Usage of /:   9.8% of 97.93GB   Users logged in:       0
  Memory usage: 9%                IP address for ens160: XXXX
  Swap usage:   0%

Python Code:

import asyncio
from aiosmtpd.controller import Controller
import asyncore
import time
import requests
import logging
from smtplib import SMTP as Client


logging.basicConfig(filename='smtpServer.log',level=logging.INFO, format='%(asctime)s %(message)s')


class DataHandler: #this is the server taking the data and processing it.
    async def handle_RCPT(self, server, session, envelope, address, rcpt_options):
        envelope.rcpt_tos.append(address)
        return '250 OK'

    async def handle_DATA(self, server, session, envelope):
        msg_content = envelope.content.decode('utf8', errors='replace')
        sent_to = envelope.rcpt_tos[0].split('@')[0][1:]
        api_key = "API_KEY"
        word_list = msg_content.split()
        code = " "
        if 'AuthCode:' in word_list:
            code_index = word_list.index('AuthCode:')
            code = word_list[code_index+1]
        msg= 'Your Token Code to access XXX VPN is ' + code +'. Do not share the password with anyone'
        headers = {'Content-Type': 'application/json'}


   --- code omitted ---

        elif sent_to[:2] == '90': #if it's a local number 'starting with 90'
            try:
                url = "SMS_ServiceProvider_URL"
                payload = '{"api_key":"'+api_key+'","title":"8507013986","text":"'+msg+'","sentto":"'+sent_to+'"}'
                print('{} Code has been sent to +{}'.format(code, sent_to))
                requests.request("POST", url, headers=headers, data = payload)
                logging.info('{} Code has been sent to +{} by local server'.format(code, sent_to))
            except Exception as e:
                raise e ('Failed sending to local number ERROR')

   --- code omitted ---

        return '250 Message accepted for delivery'

controller = Controller(DataHandler(), hostname='XXXX', port=25)
controller.start()

while True:
    time.sleep(60)
    print(end='')

Logging Messages:

-- Normal case:

2020-06-04 10:35:25,327 Peer: ('<IP>', 16750)
2020-06-04 10:35:25,327 ('<IP>', 16750) handling connection
2020-06-04 10:35:25,327 ('<IP>', 16750) Data: b'EHLO example.com'
2020-06-04 10:35:25,328 ('<IP>', 16750) Data: b'MAIL FROM:<[email protected]>'
2020-06-04 10:35:25,328 ('<IP>', 16750) sender: [email protected]
2020-06-04 10:35:25,328 ('<IP>', 16750) Data: b'RCPT TO:<+9*******[email protected]>'
2020-06-04 10:35:25,329 ('<IP>', 16750) recip: +9*******[email protected]
2020-06-04 10:35:25,329 ('<IP>', 16750) Data: b'DATA'
2020-06-04 10:35:26,568 043396 Code has been sent to +9*******9794 by local server
2020-06-04 10:35:26,569 ('<IP>', 16750) Data: b'QUIT'
2020-06-04 10:35:26,570 ('<IP>', 16750) connection lost

-- During Downtime:

2020-06-04 05:12:12,797 Peer: ('<IP>', 59450)
2020-06-04 05:12:12,797 Peer: ('<IP>', 6845)
2020-06-04 05:12:12,797 Peer: ('<IP>', 59798)
2020-06-04 05:12:12,797 Peer: ('<IP>', 10667)
2020-06-04 05:12:12,797 Peer: ('<IP>', 24380)
2020-06-04 05:12:12,797 Peer: ('<IP>', 60299)
2020-06-04 05:12:12,797 Peer: ('<IP>', 23987)
2020-06-04 05:12:12,797 ('<IP>', 7834) connection lost
2020-06-04 05:12:12,798 Connection lost during _handle_client()
2020-06-04 05:12:12,798 ('<IP>', 18527) Data: b'QUIT'
2020-06-04 05:12:12,798 ('<IP>', 9209) Data: b'QUIT'
2020-06-04 05:12:12,798 ('<IP>', 59450) handling connection
2020-06-04 05:12:12,798 ('<IP>', 6845) handling connection
2020-06-04 05:12:12,798 ('<IP>', 59798) handling connection
2020-06-04 05:12:12,797 ('<IP>', 7834) connection lost
2020-06-04 05:12:12,798 Connection lost during _handle_client()
2020-06-04 05:12:12,798 ('<IP>', 18527) Data: b'QUIT'
2020-06-04 05:12:12,798 ('<IP>', 9209) Data: b'QUIT'
2020-06-04 05:12:12,798 ('<IP>', 59450) handling connection
2020-06-04 05:12:12,798 ('<IP>', 6845) handling connection
2020-06-04 05:12:12,798 ('<IP>', 59798) handling connection
2020-06-04 05:12:12,798 ('<IP>', 10667) handling connection
2020-06-04 05:12:12,799 ('<IP>', 24380) handling connection
2020-06-04 05:12:12,799 ('<IP>', 60299) handling connection
2020-06-04 05:12:12,799 ('<IP>', 23987) handling connection
2020-06-04 05:12:12,799 ('<IP>', 18527) connection lost
2020-06-04 05:12:12,799 Connection lost during _handle_client()
2020-06-04 05:12:12,799 ('<IP>', 9209) connection lost
2020-06-04 05:12:12,799 Connection lost during _handle_client()
2020-06-04 05:12:12,799 ('<IP>', 59798) EOF received
2020-06-04 05:12:12,799 ('<IP>', 59450) connection lost
2020-06-04 05:12:12,799 Connection lost during _handle_client()
2020-06-04 05:12:12,800 ('<IP>', 59798) connection lost
2020-06-04 05:12:12,800 ('<IP>', 6845) Data: b'EHLO example.com'
2020-06-04 05:12:12,800 ('<IP>', 10667) Data: b'EHLO example.com'
2020-06-04 05:12:12,800 ('<IP>', 24380) Data: b'EHLO example.com'
2020-06-04 05:12:12,801 ('<IP>', 60299) Data: b'EHLO TWPRTGIST01'
2020-06-04 05:12:12,801 ('<IP>', 6845) Data: b'MAIL FROM:<[email protected]>'
2020-06-04 05:12:12,801 ('<IP>', 6845) sender: [email protected]
2020-06-04 05:12:12,801 ('<IP>', 10667) Data: b'MAIL FROM:<[email protected]>'
2020-06-04 05:12:12,802 ('<IP>', 10667) sender: [email protected]
2020-06-04 05:12:12,802 ('<IP>', 24380) Data: b'MAIL FROM:<[email protected]>'
2020-06-04 05:12:12,802 ('<IP>', 24380) sender: [email protected]
2020-06-04 05:12:12,802 ('<IP>', 23987) Data: b'MAIL FROM:<[email protected]>'
2020-06-04 05:12:12,802 ('<IP>', 23987) sender: [email protected]
2020-06-04 05:12:12,803 ('<IP>', 60299) Data: b'QUIT'
2020-06-04 05:12:12,803 ('<IP>', 6845) Data: b'RCPT TO:<+9**********@<IP>>'
2020-06-04 05:12:12,803 ('<IP>', 6845) recip: +9**********@<IP>
2020-06-04 05:12:12,803 ('<IP>', 10667) Data: b'RCPT TO:<+9**********@<IP>>'
2020-06-04 05:12:12,803 ('<IP>', 10667) recip: +9**********@<IP>
2020-06-04 05:12:12,803 ('<IP>', 24380) Data: b'RCPT TO:<+9**********@<IP>>'
2020-06-04 05:12:12,803 ('<IP>', 24380) recip: +9**********@<IP>
2020-06-04 05:12:12,804 ('<IP>', 60299) connection lost
2020-06-04 05:12:12,804 Connection lost during _handle_client()
2020-06-04 05:12:12,804 ('<IP>', 23987) Data: b'RCPT TO:<+9**********@<IP>>'
2020-06-04 05:12:12,804 ('<IP>', 23987) recip: +9**********@<IP>
2020-06-04 05:12:12,804 ('<IP>', 6845) Data: b'DATA'
2020-06-04 05:12:12,804 ('<IP>', 10667) Data: b'DATA'
2020-06-04 05:12:12,804 ('<IP>', 24380) Data: b'DATA'
2020-06-04 05:12:12,805 ('<IP>', 23987) Data: b'DATA'
2020-06-04 05:13:52,957 708496 Code has been sent to +9********** by local server
2020-06-04 05:15:33,051 619421 Code has been sent to +9********** by local server
2020-06-04 05:17:13,215 035670 Code has been sent to +9********** by local server
2020-06-04 05:18:53,341 861227 Code has been sent to +9********** by local server

Solution

Short answer:

thinking out of the box i can see that between each logged message in the end is 1 minute and 40 seconds - that cannot be coincidence.

Check what is the limit for your SMS Gateway - how many messages can you send in a minute and if there is any shaping algorithm being applied once some amount is sent.

Longer answer:

From the log i can see that python handles concurrent connections without any delays, but the final POST request is delayed. It means that your server should have enough CPU/Memory to work, but last operation is the bottleneck. Since you run it with async it should not be the issue on the Python side, but on the receiving side - SMS Gateway. This service is most probably taking all your requests and add them to some queue, so it doesn't matter if they are processed asynchronously on your end.

This issue has nothing to do with PRTG/monitoring as far i can say, but still you can prove me wrong - add few sensors to that server to watch for disk I/O, CPU load, Memory usage and see if the server has any performance issues during day or night



Answered By - jana