Issue
Before getting into the problem: I have read many Stack Overflow questions and Python bug reports about this error, but I have not been able to find the root cause.
I am getting a UnicodeEncodeError on a CentOS machine. Python is not installed on the machine itself; the virtual environment with the required Python version (3.6.7) is built elsewhere and copied over. To start the server, we activate the virtual environment and then launch it.
The issue is observed in two scenarios:
- Logging an input request parameter that contains Unicode characters
- Printing the same Unicode string from code; we pipe print output to a log file, and the error shows up there
The error looks as follows:
print("\u6211\u7684\u7535\u8111\u603b\u662f\u51fa\u73b0Windows\u9700\u8981\u6fc0\u6d3b")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 56-63: ordinal not in range(128)
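The failure can be reproduced without touching the real stdout. This is a minimal sketch, assuming the server's stdout behaved like an ASCII text stream; the `write_text` helper is hypothetical and exists only for illustration.

```python
import io

TEXT = "\u6211\u7684\u7535\u8111\u603b\u662f\u51fa\u73b0Windows\u9700\u8981\u6fc0\u6d3b"


def write_text(text, encoding):
    """Write text through a text stream with the given encoding; return the bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)   # raises UnicodeEncodeError if the codec cannot encode text
    stream.flush()
    stream.detach()      # release the buffer so it stays readable
    return raw.getvalue()


try:
    write_text(TEXT, "ascii")
except UnicodeEncodeError as exc:
    print("ascii stream failed:", exc.reason)

print(len(write_text(TEXT, "utf-8")), "bytes written with utf-8")
```

The same string fails or succeeds depending only on the encoding attached to the stream, which is why the value of sys.stdout.encoding inside the server process matters more than the string itself.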
I verified the following from a Python terminal:
- sys.getdefaultencoding() - utf-8
- sys.getfilesystemencoding() - utf-8
- sys.stdout.encoding
- LANG is set to en_us.utf-8
- LC_ALL is not set
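The checks above can be collected into one report. This is a sketch, with `encoding_report` as a hypothetical helper name. Note that sys.getdefaultencoding() is always utf-8 on Python 3, so it says nothing about how stdout is encoded; sys.stdout.encoding and the preferred locale encoding are the values to watch.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the settings that influence how Python encodes text output."""
    return {
        "default": sys.getdefaultencoding(),
        "filesystem": sys.getfilesystemencoding(),
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
        "preferred": locale.getpreferredencoding(False),
        "env": {
            name: os.environ.get(name)
            for name in ("LANG", "LC_ALL", "LC_CTYPE", "PYTHONIOENCODING")
        },
    }


for key, value in encoding_report().items():
    print(key, "=", value)
```

Running this from inside the server process (for example at startup) rather than from an interactive shell is more telling, because the two may not share the same environment.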
I went through some solutions that suggest changing LC_ALL or adding PYTHONIOENCODING to the environment variables, but I am reluctant to change those without knowing the side effects, because this is a production environment.
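On the side-effect concern: PYTHONIOENCODING only has to be set for the one process that needs it. A sketch, assuming the variable is passed to a child interpreter through its environment; `child_stdout_encoding` is a hypothetical helper, and the code avoids anything newer than Python 3.6.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child Python with PYTHONIOENCODING set; return its stdout encoding."""
    # Copy the current environment and override one variable for the child only.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    name = result.stdout.decode("ascii").strip()
    return codecs.lookup(name).name  # normalise the codec name's spelling


for requested in ("ascii", "utf-8"):
    print(requested, "->", child_stdout_encoding(requested))
```

The parent's environment is left untouched, so the setting affects nothing outside the launched process. In a shell start script, the equivalent is to prefix the launch command with the variable rather than exporting it system-wide.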
Edit: I tried printing the same characters that break the code from an interactive Python terminal, and they print without any issue. I printed them this way:
import sys
print("日本語")
sys.stdout.write("日本語\n")
But when the same thing runs through the server code, it raises UnicodeEncodeError.
How can I resolve this?
Thanks
Solution
I finally got rid of this issue as follows.
I observed the issue described in the question under two different circumstances.
The first scenario: with the settings posted in the question, where all language-related encodings are UTF-8, it started working after a restart of our production server, without any other changes. I still don't know why it failed before and worked after the restart.
The second scenario: all LC_* variables are set to POSIX in our client's environment. Many solutions suggested changing LANG or LC_ALL to UTF-8, but changing every locale category can cause problems elsewhere, such as locale-dependent date and time conversion.
Fix: I changed only LC_CTYPE, the locale category that controls character encoding, to a UTF-8 locale. In our case that is
en_US.UTF-8
export LC_CTYPE="en_US.UTF-8"
and it worked.
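To check what a given LC_CTYPE value does before changing production, one option is to start a throwaway interpreter with that value and ask for its stdout encoding. This is a sketch, not the author's procedure. `child_encoding_with_lc_ctype` is a hypothetical helper, and the result depends on which locales are installed on the machine and on the Python version (3.7 and later may coerce the C/POSIX locale to UTF-8, unlike the 3.6 in the question).

```python
import os
import subprocess
import sys


def child_encoding_with_lc_ctype(lc_ctype):
    """Start a child Python with only LC_CTYPE changed; return its stdout encoding."""
    env = dict(os.environ, LC_CTYPE=lc_ctype)
    # LC_ALL would override LC_CTYPE, and the Python-specific variables would
    # override the locale entirely, so drop them for the child.
    for name in ("LC_ALL", "PYTHONIOENCODING", "PYTHONUTF8"):
        env.pop(name, None)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


for value in ("POSIX", "en_US.UTF-8"):
    print(value, "->", child_encoding_with_lc_ctype(value))
```

Because stdout is a pipe here, this mirrors the case of print output being redirected to a log file rather than written to a terminal.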
Answered By - Satyaaditya Answer Checked By - Mildred Charles (WPSolving Admin)