Issue
Two of my Docker containers for this project are a Python image running a Scrapy project and a Postgres image.
docker-compose.yml
version: '3.8'
services:
  app:
    container_name: app
    build:
      context: ./app
      dockerfile: dockerfile
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_HOST=${POSTGRES_HOST}
      - POSTGRES_PORT=${POSTGRES_PORT}
      - MAILTO=${MAILTO}
    depends_on:
      - db
  db:
    container_name: db
    build:
      context: ./db
      dockerfile: dockerfile
      args:
        POSTGRES_USER: ${POSTGRES_USER}
        POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
        POSTGRES_DB: ${POSTGRES_DB}
    ports:
      - "${POSTGRES_PORT}:${POSTGRES_PORT}"
  admin:
    container_name: admin
    image: dpage/pgadmin4
    environment:
      - PGADMIN_DEFAULT_EMAIL=${PGADMIN_DEFAULT_EMAIL}
      - PGADMIN_DEFAULT_PASSWORD=${PGADMIN_DEFAULT_PASSWORD}
    ports:
      - "8888:80"
    depends_on:
      - db
  visualizer:
    container_name: visualizer
    image: grafana/grafana
    ports:
      - "3000:3000"
    depends_on:
      - db
dockerfile for app
FROM python:3.10-bookworm
RUN apt-get update -q
RUN apt-get install -y cron
COPY . .
RUN pip3 install -r requirements.txt
COPY shell_scripts/scrape_cron /etc/cron.d/scrape_cron
RUN chmod 0744 /etc/cron.d/scrape_cron
RUN crontab /etc/cron.d/scrape_cron
RUN touch /var/log/cron.log
CMD cron && tail -f /var/log/cron.log
dockerfile for db
FROM postgres:15.0
USER postgres
ARG POSTGRES_USER
ARG POSTGRES_PASSWORD
ARG POSTGRES_DB
ENV POSTGRES_USER=$POSTGRES_USER
ENV POSTGRES_PASSWORD=$POSTGRES_PASSWORD
ENV POSTGRES_DB=$POSTGRES_DB
RUN pg_createcluster 15 main && \
    /etc/init.d/postgresql start && \
    psql --command "CREATE ROLE $POSTGRES_USER WITH SUPERUSER PASSWORD '$POSTGRES_PASSWORD';" && \
    createdb -O $POSTGRES_USER $POSTGRES_DB
EXPOSE 5432
CMD ["postgres"]
The Scrapy project within the app container connects to the database in the db container through a standard psycopg connection.
pipeline.py
import logging
import os

import psycopg

# Inside the pipeline's __init__:
hostname = os.environ.get('POSTGRES_HOST', "Hostname not found")
username = os.environ.get('POSTGRES_USER', "Username not found")
password = os.environ.get('POSTGRES_PASSWORD', "Password not found")
database = os.environ.get('POSTGRES_DB', "Database name not found")
port = os.environ.get('POSTGRES_PORT', "Port not found")

logging.debug("Connecting to database...")
try:
    self.connection = psycopg.connect(host=hostname, user=username, password=password, dbname=database, port=port)
    self.cursor = self.connection.cursor()
    logging.info("Connected to database.")
except Exception:
    logging.error("Could not connect to database.")
    raise
The issue occurs with the crontab I implemented to automate the project.
cron
30 5 * * 0 sh /shell_scripts/scrape.sh
scrape.sh
#!/bin/bash
export PATH=$PATH:/usr/local/bin
export POSTGRES_USER=$POSTGRES_USER
export POSTGRES_PASSWORD=$POSTGRES_PASSWORD
export POSTGRES_DB=$POSTGRES_DB
export POSTGRES_HOST=$POSTGRES_HOST
export POSTGRES_PORT=$POSTGRES_PORT
cd "/scrape"
scrapy crawl spider
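As an aside on the `PATH` line above: cron typically starts jobs with a minimal `PATH` (on Debian-based images usually `/usr/bin:/bin`), so the `scrapy` entry point that pip installs under `/usr/local/bin` would not be found without extending it. A small sketch, using `env -i` to simulate a stripped-down environment:

```shell
#!/bin/sh
# cron jobs usually get a minimal PATH, so tools under /usr/local/bin
# (where pip installs console scripts like `scrapy`) are not found.
env -i PATH=/usr/bin:/bin sh -c 'echo "cron-like PATH: $PATH"'

# Extending PATH, as scrape.sh does, makes those tools visible again.
env -i PATH=/usr/bin:/bin:/usr/local/bin sh -c 'echo "extended PATH: $PATH"'
```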
It took me a while to get this far with cron. When the job fires, the shell script itself executes, but my Scrapy program fails to establish the database connection, with the following message:
CRITICAL:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/usr/local/lib/python3.10/site-packages/scrapy/crawler.py", line 134, in crawl
    self.engine = self._create_engine()
  File "/usr/local/lib/python3.10/site-packages/scrapy/crawler.py", line 148, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/usr/local/lib/python3.10/site-packages/scrapy/core/engine.py", line 99, in __init__
    self.scraper = Scraper(crawler)
  File "/usr/local/lib/python3.10/site-packages/scrapy/core/scraper.py", line 109, in __init__
    self.itemproc: ItemPipelineManager = itemproc_cls.from_crawler(crawler)
  File "/usr/local/lib/python3.10/site-packages/scrapy/middleware.py", line 67, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python3.10/site-packages/scrapy/middleware.py", line 44, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "/usr/local/lib/python3.10/site-packages/scrapy/utils/misc.py", line 194, in create_instance
    instance = objcls(*args, **kwargs)
  File "/scrape/scrape/pipelines.py", line 26, in __init__
    self.connection = psycopg.connect(host=hostname, user=username, password=password, dbname=database, port=port)
  File "/usr/local/lib/python3.10/site-packages/psycopg/connection.py", line 738, in connect
    raise ex.with_traceback(None)
psycopg.OperationalError: connection is bad: No such file or directory
        Is the server running locally and accepting connections on that socket?
I believe this issue is caused by cron running the job in its own stripped-down environment. Everything runs successfully when I run the shell script manually from the terminal. Since cron does not inherit the container's environment variables, the script's `export POSTGRES_HOST=$POSTGRES_HOST` line exports an empty value, and with no host set psycopg falls back to a local Unix socket instead of the Docker network hostname, which matches the error above.
I'm not sure how to solve this. I knew using cron within Docker was tricky, but this has been kind of a nightmare.
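The effect can be reproduced without cron: `env -i` starts a process with an almost empty environment, much like the one cron gives a job. A minimal sketch (the variable name is taken from the project above):

```shell
#!/bin/sh
# In the container's normal environment the variable is set:
export POSTGRES_HOST=db
sh -c 'echo "interactive: host=${POSTGRES_HOST:-<unset>}"'

# Under a cron-like empty environment it is gone, so scrape.sh's
# `export POSTGRES_HOST=$POSTGRES_HOST` line exports an empty value
# and psycopg attempts the local Unix socket.
env -i sh -c 'echo "cron-like:   host=${POSTGRES_HOST:-<unset>}"'
```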
Values for environment variables:
- POSTGRES_USER = username
- POSTGRES_PASSWORD = password for the user
- POSTGRES_DB = name of the database
- POSTGRES_HOST = db (Using the name of the service for the hostname)
- POSTGRES_PORT = 5432
Solution
Finally solved the problem. Because cron runs jobs in its own minimal environment, I had to bake the values of the environment variables directly into the shell script by passing them as build arguments, through the following changes:
docker-compose.yml
version: '3.8'
services:
  app:
    container_name: app
    build:
      context: ./app
      dockerfile: dockerfile
      args:
        - POSTGRES_USER=${POSTGRES_USER}
        - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
        - POSTGRES_DB=${POSTGRES_DB}
        - POSTGRES_HOST=${POSTGRES_HOST}
        - POSTGRES_PORT=${POSTGRES_PORT}
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_HOST=${POSTGRES_HOST}
      - POSTGRES_PORT=${POSTGRES_PORT}
    depends_on:
      - db
    links:
      - db
dockerfile for app
FROM python:3.10-bookworm
RUN apt-get update -q
RUN apt-get install -y cron
COPY . .
ARG POSTGRES_USER
ARG POSTGRES_PASSWORD
ARG POSTGRES_DB
ARG POSTGRES_HOST
ARG POSTGRES_PORT
RUN sed -i "s/\$POSTGRES_USER/${POSTGRES_USER}/g" shell_scripts/scrape.sh
RUN sed -i "s/\$POSTGRES_PASSWORD/${POSTGRES_PASSWORD}/g" shell_scripts/scrape.sh
RUN sed -i "s/\$POSTGRES_DB/${POSTGRES_DB}/g" shell_scripts/scrape.sh
RUN sed -i "s/\$POSTGRES_HOST/${POSTGRES_HOST}/g" shell_scripts/scrape.sh
RUN sed -i "s/\$POSTGRES_PORT/${POSTGRES_PORT}/g" shell_scripts/scrape.sh
RUN pip3 install -r requirements.txt
COPY shell_scripts/scrape_cron /etc/cron.d/scrape_cron
RUN chmod 0744 /etc/cron.d/scrape_cron
RUN crontab /etc/cron.d/scrape_cron
RUN touch /var/log/cron.log
CMD cron && tail -f /var/log/cron.log
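The `sed` steps rewrite the `$POSTGRES_*` references in `scrape.sh` into literal values at build time, so the script no longer depends on cron's environment. A quick local sketch of the same substitution (the file path and values here are illustrative):

```shell
#!/bin/sh
# Illustrative stand-in for shell_scripts/scrape.sh:
printf 'export POSTGRES_HOST=$POSTGRES_HOST\nexport POSTGRES_PORT=$POSTGRES_PORT\n' > /tmp/scrape_demo.sh

# The same substitution the Dockerfile performs with build args:
POSTGRES_HOST=db
POSTGRES_PORT=5432
sed -i "s/\$POSTGRES_HOST/${POSTGRES_HOST}/g" /tmp/scrape_demo.sh
sed -i "s/\$POSTGRES_PORT/${POSTGRES_PORT}/g" /tmp/scrape_demo.sh

cat /tmp/scrape_demo.sh
# export POSTGRES_HOST=db
# export POSTGRES_PORT=5432
```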
Answered By - clark_s
Answer Checked By - Cary Denson (WPSolving Admin)