Thursday, September 1, 2022

[SOLVED] sre_constants.error: bad character range when building python from source code

Issue

I am setting up a Docker service for bacnet using Debian 10 and Python 2.7. When I install Python with the command apt-get install -y python-pip, my application works fine. However, when I install Python 2.7 by downloading, extracting and building the source, the Docker service throws the following error:

  File "src/main.py", line 1, in <module>
    from bacpypes.core import run as runbacpypes
  File "/usr/local/lib/python2.7/site-packages/bacpypes/__init__.py", line 75, in <module>
    from . import local
  File "/usr/local/lib/python2.7/site-packages/bacpypes/local/__init__.py", line 7, in <module>
    from . import object
  File "/usr/local/lib/python2.7/site-packages/bacpypes/local/object.py", line 140, in <module>
    local_name_re = re.compile(u"^" + PN_LOCAL + u"$", re.UNICODE)
  File "/usr/local/lib/python2.7/re.py", line 194, in compile
    return _compile(pattern, flags)
  File "/usr/local/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range

The portion of the Dockerfile that causes the error:

  • Installing with apt-get install -y python-pip (works):
FROM debian:10

RUN apt-get update && apt-get install -y python-pip
  • Downloading and building Python from source (causes the error):
FROM debian:10

RUN apt-get update \
  && apt-get install -y wget gcc make openssl libffi-dev libgdbm-dev libsqlite3-dev libssl-dev zlib1g-dev \
  && apt-get clean
WORKDIR /tmp/

# Build python from source
ARG PYTHON_VERSION=2.7.18
RUN wget https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz \
  && tar --extract -f Python-$PYTHON_VERSION.tgz \
  && cd ./Python-$PYTHON_VERSION/ \
  && ./configure --with-ensurepip=install --prefix=/usr/local \
  && make && make install \
  && cd ../ \
  && rm -r ./Python-$PYTHON_VERSION*

The rest of the Dockerfile (bacnet dependencies):

ARG PAHO_MQTT_VERSION=1.6.1
ADD bacnet/source-code-dependencies/paho_mqtt-$PAHO_MQTT_VERSION/ ./paho_mqtt-$PAHO_MQTT_VERSION/
RUN cd ./paho_mqtt-$PAHO_MQTT_VERSION/ \
  && python setup.py install \
  && cd ../ \
  && rm -r ./paho_mqtt-$PAHO_MQTT_VERSION*

# RUN pip install bacpypes==0.18.0
ARG BACPYPES_VERSION=0.18.0
ADD bacnet/source-code-dependencies/bacpypes-$BACPYPES_VERSION/ ./bacpypes-$BACPYPES_VERSION/
RUN cd ./bacpypes-$BACPYPES_VERSION/ \
  && pip install bacpypes-$BACPYPES_VERSION-py2-none-any.whl \
  && cd ../ \
  && rm -r ./bacpypes*

WORKDIR /bacnet
ADD bacnet/site-packages ./site-packages
ADD bacnet/src ./src

WORKDIR /bacnet
ENV PYTHONPATH=/bacnet/src/Bacnet
CMD ["python", "src/main.py"]

What could be the reason for this error when Python is built from source? I've tried different versions of Python 2.7 and it always happens.

I have read in several threads that sre_constants.error: bad character range is caused by placing a hyphen between characters, but I have no idea why merely changing the way the same Python version is installed decides whether it happens or not.
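The threads are right about the mechanism: inside a [...] class, a hyphen between two characters defines a range, and re raises this error when the range's start code point is greater than its end. A minimal sketch (runs on Python 2.7 and 3; try_compile is a hypothetical helper name, not part of re):

```python
import re

def try_compile(pattern):
    """Return None if pattern compiles, otherwise the re.error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

print(try_compile(u"[a-z]"))  # None: 'a' (0x61) precedes 'z' (0x7a)
print(try_compile(u"[z-a]"))  # message contains "bad character range"
print(try_compile(u"[-az]"))  # None: a leading hyphen is a literal '-'
```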

Reproducing the procedure without Docker gives the same error. I printed the argument passed to the compile call where the error happens (local_name_re = re.compile(u"^" + PN_LOCAL + u"$", re.UNICODE)) and got:

u"^([A-Za-z\xc0-\xd6\xd8-\xf6\xf8-\u02ff\u0370-\u037d\u037f-\u1fff\u200c-\u200d\u2070-\u218f\u2c00-\u2fef\u3001-\ud7ff\uf900-\ufdcf\ufdf0-\ufffd\U00010000-\U000effff_:0-9]|(%[0-9A-Fa-f][0-9A-Fa-f]|[-\~.!$&'()*+,;=/?#@%]))(([-A-Za-z\xc0-\xd6\xd8-\xf6\xf8-\u02ff\u0370-\u037d\u037f-\u1fff\u200c-\u200d\u2070-\u218f\u2c00-\u2fef\u3001-\ud7ff\uf900-\ufdcf\ufdf0-\ufffd\U00010000-\U000effff_0-9\xb7\u0300-\u036f\u203f-\u2040.:]|(%[0-9A-Fa-f][0-9A-Fa-f]|[-\~.!$&'()+,;=/?#@%]))([-A-Za-z\xc0-\xd6\xd8-\xf6\xf8-\u02ff\u0370-\u037d\u037f-\u1fff\u200c-\u200d\u2070-\u218f\u2c00-\u2fef\u3001-\ud7ff\uf900-\ufdcf\ufdf0-\ufffd\U00010000-\U000effff_0-9\xb7\u0300-\u036f\u203f-\u2040:]|(%[0-9A-Fa-f][0-9A-Fa-f]|[-\_~.!$&'()*+,;=/?#@%])))?$"


Solution

Trimming pieces off the regex quickly reveals that the problematic expression is:

>>> re.compile(u"^([\U00010000-\U000effff])")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tripleee/.pyenv/versions/2.7.16/lib/python2.7/re.py", line 194, in compile
    return _compile(pattern, flags)
  File "/Users/tripleee/.pyenv/versions/2.7.16/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range
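
The trimming can also be automated. Below is a sketch that compiles each range of the first character class from the printed pattern in a class of its own and lists the ones the running interpreter rejects. RANGES and rejected_ranges are hypothetical names, and the ranges are copied from the pattern in the question. On a narrow Python 2 build only the non-BMP range would be expected to fail; on a wide build or on Python 3 the list is empty.

```python
import re

# Ranges copied from the first character class of the printed pattern.
RANGES = [
    u"A-Z", u"a-z", u"\xc0-\xd6", u"\xd8-\xf6", u"\xf8-\u02ff",
    u"\u0370-\u037d", u"\u037f-\u1fff", u"\u200c-\u200d", u"\u2070-\u218f",
    u"\u2c00-\u2fef", u"\u3001-\ud7ff", u"\uf900-\ufdcf", u"\ufdf0-\ufffd",
    u"\U00010000-\U000effff", u"0-9",
]

def rejected_ranges(ranges):
    """Compile each range inside its own [...] class and collect the failures."""
    bad = []
    for r in ranges:
        try:
            re.compile(u"[" + r + u"]")
        except re.error:
            bad.append(r)
    return bad

print(rejected_ranges(RANGES))  # [] on a wide build or Python 3
```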

It looks like a Python 2.7 built with the default configure options simply cannot accept a Unicode character range outside the BMP (Basic Multilingual Plane, U+0000 to U+FFFF).
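
Whether a given interpreter is a narrow or a wide build can be checked at runtime: sys.maxunicode is 0xFFFF on a narrow Python 2 build and 0x10FFFF on a wide one. A small sketch (is_wide_build is a hypothetical helper name); running it under both the apt-installed and the source-built interpreter would show which one is narrow.

```python
import sys

def is_wide_build():
    """True if each code point is stored as a single character.

    sys.maxunicode is 0xFFFF on a narrow Python 2 build and 0x10FFFF on a
    wide one; Python 3.3+ always reports 0x10FFFF.
    """
    return sys.maxunicode > 0xFFFF

print("maxunicode = %#x, wide build: %s" % (sys.maxunicode, is_wide_build()))
# A single non-BMP character has length 1 on a wide build and 2 on a narrow one.
print(len(u"\U00010000"))
```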

Whether Python 2 supports this depends on how it was built: it needs a special option at build time (a so-called "wide build"). Going further back in time, it was unavailable in the re library because of a bug even when you had a "wide build".

https://bugs.python.org/issue12749

The Debian python source package reveals exactly how it is compiled, but your easiest way forward is probably to simply install the precompiled package.

A quick DuckDuckGo search suggests that passing --enable-unicode=ucs4 to ./configure might be what you need if you really have to compile your own version.
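
To confirm that a from-source build actually picked up the flag, the configure arguments recorded in the interpreter can be inspected with the standard sysconfig module (available in 2.7 and 3). A sketch under these assumptions: configure_args and built_with_ucs4 are hypothetical helper names, and CONFIG_ARGS may be None on some platforms such as Windows.

```python
import sysconfig

def configure_args():
    """Return the ./configure arguments this interpreter was built with.

    CONFIG_ARGS can be None (e.g. on Windows builds), so fall back to "".
    """
    return sysconfig.get_config_var("CONFIG_ARGS") or ""

def built_with_ucs4():
    # Only looks for the Python 2 configure flag; a simple substring check
    # works even though CONFIG_ARGS stores each argument in quotes.
    return "--enable-unicode=ucs4" in configure_args()

print(configure_args())
print(built_with_ucs4())
```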

(For brief background, the \u0123 syntax specifies Unicode characters inside the BMP, i.e. code points with a value less than or equal to 0xFFFF, which fit in 16 bits (2 bytes) and thus need four hex digits after the lowercase \u. Code points outside this range require the uppercase \U00012345 syntax with eight hex digits. A narrow build can only store such a code point as a pair of 16-bit surrogates; a "wide build" uses 4 bytes per code point and stores it as a single character. On a narrow build the class [\U00010000-\U000effff] therefore reaches the regex parser as [\ud800\udc00-\udb7f\udfff], whose range \udc00-\udb7f runs backwards, hence "bad character range".)
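A narrow build stores a non-BMP code point as two 16-bit surrogate units, so the hyphen in [\U00010000-\U000effff] ends up joining the second unit of the start to the first unit of the end. The arithmetic can be checked with a short sketch: encoding each endpoint as UTF-16 yields the same code units a narrow build holds internally, and compiling the spelled-out surrogates reproduces the error. utf16_units and narrow_view_fails are hypothetical helper names; the final check relies on Python 3 keeping the four \u escapes as separate code points.

```python
import re
import struct

def utf16_units(ch):
    """Return the 16-bit code units UTF-16 (and a narrow build) uses for ch."""
    data = ch.encode("utf-16-be")
    # Each code unit is a big-endian unsigned 16-bit integer.
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

start = utf16_units(u"\U00010000")
end = utf16_units(u"\U000effff")
print([hex(u) for u in start])  # ['0xd800', '0xdc00']
print([hex(u) for u in end])    # ['0xdb7f', '0xdfff']

# The hyphen sits between the 2nd and 3rd code unit, so the range the
# parser sees is \udc00-\udb7f.
print(start[1] > end[0])  # True: the range runs backwards

def narrow_view_fails():
    # Spell out the surrogates explicitly to mimic the narrow build's view.
    try:
        re.compile(u"[\ud800\udc00-\udb7f\udfff]")
    except re.error as exc:
        return "bad character range" in str(exc)
    return False

print(narrow_view_fails())  # True on Python 3
```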



Answered By - tripleee
Answer Checked By - David Goodson (WPSolving Volunteer)