Friday, October 7, 2022

[SOLVED] How to add local python files as a library to python virtual environment the same way as pip install to Airflow Docker

Issue

  • How do I install my own Python package into a Python virtual environment?
  • The final goal is to add that package to an Airflow Docker environment.
  • My Dockerfile:
FROM apache/airflow:latest-python3.8
COPY requirements.txt .
RUN pip install -r requirements.txt

Solution

First, see the answer at https://stackoverflow.com/a/56483981/11609051 for how to install your package into the venv.
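Independently of the linked answer, here is a minimal sketch of the layout that makes a folder of local Python files installable with pip. It assumes a setuptools-based project; the package name mytools, the version number and the python_requires value are hypothetical placeholders, not something from the original post.

```python
from pathlib import Path
from textwrap import dedent


def scaffold_package(root, name="mytools"):
    """Create a minimal setuptools project under `root` and return its path.

    `name` is a hypothetical package name used only for illustration.
    """
    project = Path(root) / name
    package = project / name
    package.mkdir(parents=True, exist_ok=True)

    # An __init__.py marks the inner directory as an importable package.
    (package / "__init__.py").write_text('__version__ = "0.1.0"\n')

    # setup.py lets pip treat the project directory as a distribution.
    (project / "setup.py").write_text(dedent(f"""\
        from setuptools import setup, find_packages

        setup(
            name="{name}",
            version="0.1.0",
            packages=find_packages(),
            python_requires=">=3.8",  # assumption: match the Airflow image
        )
        """))
    return project
```

With the venv activated, running pip install -e <path-to-project> (or the same command without -e) against the generated directory should then install the package like any other pip package.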

After that, you need to extend your Docker image. Your Dockerfile.txt should look similar to:

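# Base image to extend
FROM apache/airflow:2.4.0
# Working directory inside the image (created if it does not exist)
WORKDIR /Users/Desktop/tools/airflow2.4-lite
# Copy the requirements file from the build context into the image
COPY requirements.txt /requirements.txt
RUN pip install --user --upgrade pip
# Install the listed packages without keeping pip's download cache
RUN pip install --no-cache-dir --user -r /requirements.txt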

Then build the image from the directory that contains Dockerfile.txt:

docker build . -f Dockerfile.txt --tag updated_image

Then set the Airflow image name in your docker-compose.yaml file to updated_image, so that it matches the tag used in the build command:

image: ${AIRFLOW_IMAGE_NAME:-updated_image:latest}
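The ${AIRFLOW_IMAGE_NAME:-...} syntax means "use the AIRFLOW_IMAGE_NAME environment variable if it is set and non-empty, otherwise fall back to the default after :-". A rough Python equivalent of that lookup, assuming the default is the updated_image tag from the build step (resolve_image is a hypothetical helper, not part of Docker Compose):

```python
import os


def resolve_image(env=None, default="updated_image:latest"):
    """Mimic ${AIRFLOW_IMAGE_NAME:-default}: fall back when unset or empty."""
    env = os.environ if env is None else env
    # `or` treats an empty string like a missing variable, as `:-` does.
    return env.get("AIRFLOW_IMAGE_NAME") or default
```

So if AIRFLOW_IMAGE_NAME is exported in your shell or defined in a .env file, that value takes precedence over the image name written in docker-compose.yaml.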

and run the commands:

docker-compose up airflow-init
docker-compose up -d --build airflow-webserver airflow-scheduler

After this rebuild, you can use the standard command to bring the containers up:

docker-compose up -d

You can check whether the packages are installed with this simple DAG:

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    'owner': 'foo',
    'retries': 5,  # retry a failed task up to 5 times
    'retry_delay': timedelta(minutes=5)  # wait 5 minutes between retries
}


with DAG(
    default_args=default_args,
    dag_id="checker_dag",
    start_date=datetime(2022, 8, 27),
    schedule_interval='@daily'
) as dag:
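    # List the Python packages installed in the container.
    task1 = BashOperator(
        task_id="pip_task",
        bash_command='pip freeze',
    )

    # Print the environment variables visible to the task.
    task2 = BashOperator(
        task_id="printenv_task",
        bash_command='printenv',
    )

    # List the system packages installed through apt.
    task3 = BashOperator(
        task_id="apt_task",
        bash_command='apt list --installed',
    )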

    task1 >> task2 >> task3
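If you would rather get a direct answer than read through the pip freeze log, a small helper can report which packages are missing. This is only a sketch using the standard library's importlib.metadata (available from Python 3.8); missing_packages is a hypothetical helper name, and the names you pass in would be the entries of your own requirements.txt.

```python
from importlib import metadata


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            # version() raises PackageNotFoundError for unknown distributions.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Run inside the container (for example as the python_callable of a PythonOperator), an empty list would mean every listed package is installed in that environment.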

However, a few points need to be taken into account:

  • Your requirements.txt file should be in your Airflow Docker project directory, since that directory is the build context for docker build.
  • None of the packages in your venv should require a Python version newer than the one in your Airflow image (Python 3.8 here), and neither should any of their dependencies.
  • To match that version locally, you can use Homebrew's python@3.8 and install packages into its site-packages directory with: pip3.8 install <package-name>
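To see which Python versions your installed packages declare support for, you can read the Requires-Python field from their metadata. The sketch below uses importlib.metadata from the standard library; requires_python_report is a hypothetical helper, and it only reports the declared constraint for manual inspection rather than evaluating it.

```python
from importlib import metadata


def requires_python_report(names):
    """Map each distribution name to its declared Requires-Python value.

    The value is None when the package declares no constraint and
    "not installed" when the distribution cannot be found.
    """
    report = {}
    for name in names:
        try:
            report[name] = metadata.metadata(name).get("Requires-Python")
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report
```

Any entry whose constraint excludes 3.8 (for instance a value such as >=3.9) would be a candidate for pinning to an older release before building the image.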


Answered By - tryagain
Answer Checked By - Timothy Miller (WPSolving Admin)