Issue
To be clear, I'm not asking about setting permissions in HDFS, but rather in ext3 or whatever filesystem is being used on the individual datanode machines that HDFS is running on top of.
I know that we run sudo chown hduser:hadoop /app/hadoop/tmp, so user hduser owns the files, but I'd like to know guidelines for the permission bits (chmod) on these files.
Solution
If you set the permissions to 755 (or worse, 777), the files in the underlying filesystem can be read by anyone, which is clearly a security issue. A restrictive configuration such as 700 makes sense: it prevents an unauthorized user from simply opening and reading the files from local disk rather than going through the HDFS API.
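As a sketch, locking down a datanode data directory on Linux looks like this (the path here is illustrative, not the one from your cluster):

```shell
# Create a stand-in for a datanode data directory (path is illustrative)
mkdir -p /tmp/datanode-demo/disk1/datanode

# Owner-only access: the owning user can read/write/traverse, everyone else is denied
chmod 700 /tmp/datanode-demo/disk1/datanode

# Verify the permission bits (GNU coreutils stat)
stat -c '%a' /tmp/datanode-demo/disk1/datanode   # prints 700
```

On a real cluster you would also chown the directory to the user the datanode runs as, since 700 denies access to everyone else.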
In a securely configured cluster, as of Hadoop 0.22 and 0.23, the permissions on datanode data directories (configured by dfs.datanode.data.dir.perm) default to 0700. On startup, the datanode automatically changes the permissions to match the configured value.
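The property is set in hdfs-site.xml; a minimal fragment, with the value shown being the 0700 default mentioned above:

```xml
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>700</value>
</property>
```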
In 1.0 the datanode instead checks that the actual and configured permissions match and refuses to start if they differ. You may see exceptions such as the following when the existing permissions on the data directory violate the value configured for Hadoop:
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /disk1/datanode, expected: rwxr-xr-x, while actual: rwxrwxr-x
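If you hit that warning, aligning the directory's mode with the expected value clears it. In the message above the expected mode is rwxr-xr-x (755) and the actual is rwxrwxr-x (775), so the extra group write bit must be removed; a sketch using an illustrative stand-in path:

```shell
# Stand-in for /disk1/datanode from the warning (path is illustrative)
mkdir -p /tmp/disk1/datanode

# Reproduce the problematic mode: rwxrwxr-x (775)
chmod 775 /tmp/disk1/datanode

# Drop the group write bit to reach the expected rwxr-xr-x (755)
chmod g-w /tmp/disk1/datanode

stat -c '%a' /tmp/disk1/datanode   # prints 755
```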
I'm not quite sure what happens in other versions, though; you may want to check for yourself.
Answered By - SSaikia_JtheRocker