This article explains how to manually configure Kerberos for HDFS and YARN.
1. Install the MIT Kerberos 5 KDC
Please follow all the steps in the article Installing the MIT Kerberos 5 KDC. Here is my test environment architecture:
HDFS:
| NameNode           | hdm              |
| Secondary NameNode | hdw3             |
| DataNode           | hdw1, hdw2, hdw3 |
YARN:
| Resource Manager   | hdm              |
| Node Manager       | hdw1, hdw2, hdw3 |
| Job History Server | hdm              |
2. Install Kerberos Workstation and Libraries on Cluster Hosts
If you are using MIT krb5, run:
yum install krb5-libs krb5-workstation
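If massh is already set up (it is used throughout the rest of this article), you can install the packages on all hosts in one pass; a minimal sketch, assuming the same ~/hostfile_all host list and a yum-based distribution:
massh ~/hostfile_all verbose "yum install -y krb5-libs krb5-workstation"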
3. Distribute the Kerberos Client Configuration File to all Cluster Hosts
If you are using MIT Kerberos 5, the file is /etc/krb5.conf. This file must exist on all cluster hosts. For PivotalHD you can use massh to push the file, and then copy it to the proper place.
massh ~/hostfile_all push /etc/krb5.conf
massh ~/hostfile_all verbose "cp ~/krb5.conf /etc/krb5.conf"
massh ~/hostfile_all verbose "ls -altr /etc/krb5.conf"
4. Create the Principals
These instructions are for MIT Kerberos 5; command syntax for other Kerberos versions may differ. Principals (Kerberos users) are of the form name/role@REALM. For our purposes the name will be a PivotalHD service name (for example, hdfs),
and the role will be a DNS-resolvable fully-qualified hostname (host_fqdn); one you could use to connect to the host in question.
Important:
- Replace REALM with the KDC realm you are using for your PHD cluster, where it appears.
- The host names used MUST be resolvable to an address on all the cluster hosts and MUST be of the form host.domain, as some Hadoop components require at least one "." part in the host names used for principals.
- The names of the principals seem to matter, as some processes may throw exceptions if you change them. Hence, it is safest to use the specified Hadoop principal names.
- Hadoop supports an _HOST tag in the site XML that is interpreted as the host_fqdn, but it must be used properly. In this test, I will use the _HOST tag.
For YARN services, you will need to create a yarn/host_fqdn principal for each host running a YARN service (resource manager, node manager, proxy server).
For MapReduce services, you need to create a principal, mapred/host_fqdn for the Job History Server.
To create the required secure HD principals (running kadmin.local):
- For each cluster host (excepting client-only hosts) run:
addprinc -randkey HTTP/<host_fqdn>@<REALM>
eg:
addprinc -randkey HTTP/hdw1.xxx.com@OPENKBINFO.COM
addprinc -randkey HTTP/hdw2.xxx.com@OPENKBINFO.COM
addprinc -randkey HTTP/hdw3.xxx.com@OPENKBINFO.COM
addprinc -randkey HTTP/hdm.xxx.com@OPENKBINFO.COM
- HDFS (name node, secondary name node, data nodes), for each HDFS service host run:
addprinc -randkey hdfs/<host_fqdn>@<REALM>
eg:
addprinc -randkey hdfs/hdw1.xxx.com@OPENKBINFO.COM
addprinc -randkey hdfs/hdw2.xxx.com@OPENKBINFO.COM
addprinc -randkey hdfs/hdw3.xxx.com@OPENKBINFO.COM
addprinc -randkey hdfs/hdm.xxx.com@OPENKBINFO.COM
- YARN (resource manager, node managers, proxy server), for each YARN service host run:
addprinc -randkey yarn/<host_fqdn>@<REALM>
eg:
addprinc -randkey yarn/hdw1.xxx.com@OPENKBINFO.COM
addprinc -randkey yarn/hdw2.xxx.com@OPENKBINFO.COM
addprinc -randkey yarn/hdw3.xxx.com@OPENKBINFO.COM
addprinc -randkey yarn/hdm.xxx.com@OPENKBINFO.COM
- MAPRED (job history server): for each JobHistoryServer service host run:
addprinc -randkey mapred/<host_fqdn>@<REALM>
eg:
addprinc -randkey mapred/hdm.xxx.com@OPENKBINFO.COM
Important:
If you have 1000 cluster hosts running HDFS and YARN, you will need 2000 HDFS and YARN principals, and need to distribute their keytab files.
It is recommended that you use a cluster-local KDC for this purpose and configure cross-realm trust to your organizational Active Directory or other
Kerberos KDC.
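At that scale you would script the principal creation rather than type it; a minimal sketch, run as root on the KDC, that generates the addprinc commands from a plain host list (the hosts.txt file, one FQDN per line, is a hypothetical input) and pipes them to kadmin.local:
# generate addprinc commands for the HTTP, hdfs and yarn principals of every host
while read host; do
  for svc in HTTP hdfs yarn; do
    echo "addprinc -randkey ${svc}/${host}@OPENKBINFO.COM"
  done
done < hosts.txt | kadmin.local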
5. Create the Keytab Files
Important: You MUST use kadmin.local (or the equivalent in your KDC) for this step on the KDC, as kadmin does not support -norandkey.
Important: You can put the keytab files anywhere during this step. In this document, we created the directory /etc/security/phd/keytab/ and are using it on the cluster hosts, so, for consistency, we place the files in a similarly named directory on the KDC. If the node you are on already has files in /etc/security/phd/keytab/, it may be advisable to create a separate, empty directory for this step.
Each service's keytab file for a given host will have the service principal for that host and the HTTP principal for that host in the file.
5.1 Create a directory on all hosts to store the keytab files.
massh hostfile_all verbose "mkdir -p /etc/security/phd/keytab/"
5.2 Create HDFS keytabs.
For each host having an HDFS process (name node, secondary name node, data nodes), run:
kadmin.local: ktadd -norandkey -k /etc/security/phd/keytab/hdfs-hostid.service.keytab hdfs/<host_fqdn>@<REALM> HTTP/<host_fqdn>@<REALM>
where hostid is the short name for the host, for example, vm1, vm2, etc. This is to differentiate the files by host. You can use the hostname if desired.
eg:
ktadd -norandkey -k /etc/security/phd/keytab/hdfs-hdw1.service.keytab hdfs/hdw1.xxx.com@OPENKBINFO.COM HTTP/hdw1.xxx.com@OPENKBINFO.COM
ktadd -norandkey -k /etc/security/phd/keytab/hdfs-hdw2.service.keytab hdfs/hdw2.xxx.com@OPENKBINFO.COM HTTP/hdw2.xxx.com@OPENKBINFO.COM
ktadd -norandkey -k /etc/security/phd/keytab/hdfs-hdw3.service.keytab hdfs/hdw3.xxx.com@OPENKBINFO.COM HTTP/hdw3.xxx.com@OPENKBINFO.COM
ktadd -norandkey -k /etc/security/phd/keytab/hdfs-hdm.service.keytab hdfs/hdm.xxx.com@OPENKBINFO.COM HTTP/hdm.xxx.com@OPENKBINFO.COM
5.3 Create YARN keytabs.
For each host having a YARN process (resource manager, node manager, or proxy server), run:
kadmin.local: ktadd -norandkey -k /etc/security/phd/keytab/yarn-hostid.service.keytab yarn/<host_fqdn>@<REALM> HTTP/<host_fqdn>@<REALM>
eg:
ktadd -norandkey -k /etc/security/phd/keytab/yarn-hdw1.service.keytab yarn/hdw1.xxx.com@OPENKBINFO.COM HTTP/hdw1.xxx.com@OPENKBINFO.COM
ktadd -norandkey -k /etc/security/phd/keytab/yarn-hdw2.service.keytab yarn/hdw2.xxx.com@OPENKBINFO.COM HTTP/hdw2.xxx.com@OPENKBINFO.COM
ktadd -norandkey -k /etc/security/phd/keytab/yarn-hdw3.service.keytab yarn/hdw3.xxx.com@OPENKBINFO.COM HTTP/hdw3.xxx.com@OPENKBINFO.COM
ktadd -norandkey -k /etc/security/phd/keytab/yarn-hdm.service.keytab yarn/hdm.xxx.com@OPENKBINFO.COM HTTP/hdm.xxx.com@OPENKBINFO.COM
5.4 Create MAPRED keytabs.
For each host having a MapReduce job history server, run:
kadmin.local: ktadd -norandkey -k /etc/security/phd/keytab/mapred-hostid.service.keytab mapred/<host_fqdn>@<REALM> HTTP/<host_fqdn>@<REALM>
eg:
ktadd -norandkey -k /etc/security/phd/keytab/mapred-hdm.service.keytab mapred/hdm.xxx.com@OPENKBINFO.COM HTTP/hdm.xxx.com@OPENKBINFO.COM
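Before distributing the keytabs, it can be worth confirming that each file really contains both the service principal and the HTTP principal for its host; for example, for hdw1's HDFS keytab:
klist -kte /etc/security/phd/keytab/hdfs-hdw1.service.keytab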
6. Distribute the Keytab Files
6.1 Move all the keytab files for a given host to the keytab directory on that host.
eg:
cd /etc/security/phd/keytab
scp *hdw1*.keytab hdw1:/etc/security/phd/keytab/
scp *hdw2*.keytab hdw2:/etc/security/phd/keytab/
scp *hdw3*.keytab hdw3:/etc/security/phd/keytab/
scp *hdm*.keytab hdm:/etc/security/phd/keytab/
Then perform the following on each host (done here via massh from the admin node):
6.2 Change the permissions on all keytabs to read-only by owner.
massh ~/hostfile_all verbose "chmod 400 /etc/security/phd/keytab/*.keytab"
6.3 Change the group on all keytab files to hadoop.
massh ~/hostfile_all verbose "chgrp hadoop /etc/security/phd/keytab/*.keytab"
6.4 Change the owner of each keytab to the relevant principal name.
massh ~/hostfile_all verbose "chown yarn /etc/security/phd/keytab/yarn*.keytab"
massh ~/hostfile_all verbose "chown hdfs /etc/security/phd/keytab/hdfs*.keytab"
massh ~/hostfile_all verbose "chown mapred /etc/security/phd/keytab/mapred*.keytab"
6.5 Create links to the files of the form principalname.service.keytab.
massh ~/hostfile_all verbose "cd /etc/security/phd/keytab/; if ls yarn*.service.keytab &> /dev/null; then ln -s yarn*.service.keytab yarn.service.keytab ; fi"
massh ~/hostfile_all verbose "cd /etc/security/phd/keytab/; if ls hdfs*.service.keytab &> /dev/null; then ln -s hdfs*.service.keytab hdfs.service.keytab ; fi"
massh ~/hostfile_all verbose "cd /etc/security/phd/keytab/; if ls mapred*.service.keytab &> /dev/null; then ln -s mapred*.service.keytab mapred.service.keytab ; fi"
Here is how it looks for all nodes:
[root@admin ~]# massh ~/hostfile_all verbose "ls -altr /etc/security/phd/keytab/*.keytab"
hdm : -r--------. 1 hdfs hadoop 894 Jan 7 15:11 /etc/security/phd/keytab/hdfs-hdm.service.keytab
hdm : -r--------. 1 yarn hadoop 894 Jan 7 15:11 /etc/security/phd/keytab/yarn-hdm.service.keytab
hdm : -r--------. 1 mapred hadoop 906 Jan 7 15:11 /etc/security/phd/keytab/mapred-hdm.service.keytab
hdm : lrwxrwxrwx. 1 root root 23 Jan 7 15:24 /etc/security/phd/keytab/yarn.service.keytab -> yarn-hdm.service.keytab
hdm : lrwxrwxrwx. 1 root root 23 Jan 7 15:24 /etc/security/phd/keytab/hdfs.service.keytab -> hdfs-hdm.service.keytab
hdm : lrwxrwxrwx. 1 root root 25 Jan 7 15:31 /etc/security/phd/keytab/mapred.service.keytab -> mapred-hdm.service.keytab
hdw2 : -r--------. 1 hdfs hadoop 906 Jan 7 15:11 /etc/security/phd/keytab/hdfs-hdw2.service.keytab
hdw2 : -r--------. 1 yarn hadoop 906 Jan 7 15:11 /etc/security/phd/keytab/yarn-hdw2.service.keytab
hdw2 : lrwxrwxrwx. 1 root root 24 Jan 7 15:24 /etc/security/phd/keytab/yarn.service.keytab -> yarn-hdw2.service.keytab
hdw2 : lrwxrwxrwx. 1 root root 24 Jan 7 15:24 /etc/security/phd/keytab/hdfs.service.keytab -> hdfs-hdw2.service.keytab
hdw3 : -r--------. 1 hdfs hadoop 906 Jan 7 15:11 /etc/security/phd/keytab/hdfs-hdw3.service.keytab
hdw3 : -r--------. 1 yarn hadoop 906 Jan 7 15:11 /etc/security/phd/keytab/yarn-hdw3.service.keytab
hdw3 : lrwxrwxrwx. 1 root root 24 Jan 7 15:24 /etc/security/phd/keytab/yarn.service.keytab -> yarn-hdw3.service.keytab
hdw3 : lrwxrwxrwx. 1 root root 24 Jan 7 15:24 /etc/security/phd/keytab/hdfs.service.keytab -> hdfs-hdw3.service.keytab
hdw1 : -r--------. 1 hdfs hadoop 906 Jan 7 15:11 /etc/security/phd/keytab/hdfs-hdw1.service.keytab
hdw1 : -r--------. 1 yarn hadoop 906 Jan 7 15:11 /etc/security/phd/keytab/yarn-hdw1.service.keytab
hdw1 : lrwxrwxrwx. 1 root root 24 Jan 7 15:24 /etc/security/phd/keytab/yarn.service.keytab -> yarn-hdw1.service.keytab
hdw1 : lrwxrwxrwx. 1 root root 24 Jan 7 15:24 /etc/security/phd/keytab/hdfs.service.keytab -> hdfs-hdw1.service.keytab
7. Configure the Linux Container Executor
7.1 Edit /usr/lib/gphd/hadoop-yarn/etc/hadoop/container-executor.cfg as follows (shown here on hdm):
# NOTE: these next two should be set to the same values they have in yarn-site.xml
yarn.nodemanager.local-dirs=/data/1/yarn/nm-local-dir,/data/2/yarn/nm-local-dir,/data/3/yarn/nm-local-dir
yarn.nodemanager.log-dirs=/data/1/yarn/userlogs,/data/2/yarn/userlogs,/data/3/yarn/userlogs
# configured value of yarn.nodemanager.linux-container-executor.group
yarn.nodemanager.linux-container-executor.group=yarn
# comma separated list of users who cannot run applications
banned.users=hdfs,yarn,mapred,bin
# Prevent other super-users
min.user.id=500
Note: The min.user.id varies by Linux distribution; for CentOS it is 500, RedHat is 1000.
7.2 Sync this file to other nodes:
scp /usr/lib/gphd/hadoop-yarn/etc/hadoop/container-executor.cfg hdw1:/usr/lib/gphd/hadoop-yarn/etc/hadoop/container-executor.cfg
scp /usr/lib/gphd/hadoop-yarn/etc/hadoop/container-executor.cfg hdw2:/usr/lib/gphd/hadoop-yarn/etc/hadoop/container-executor.cfg
scp /usr/lib/gphd/hadoop-yarn/etc/hadoop/container-executor.cfg hdw3:/usr/lib/gphd/hadoop-yarn/etc/hadoop/container-executor.cfg
7.3 Check the permissions on /usr/lib/gphd/hadoop-yarn/bin/container-executor.
massh ~/hostfile_all verbose "ls -altr /usr/lib/gphd/hadoop-yarn/bin/container-executor"
They should look like:
---Sr-s--- 1 root yarn 364 Jun 11 00:08 container-executor
If they do not, then set the owner, group and permissions as follows:
chown root:yarn container-executor
chmod 050 container-executor
chmod u+s container-executor
chmod g+s container-executor
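If the permissions need fixing on every node, the same massh pattern used elsewhere in this article should work; note that chmod 6050 is equivalent to the chmod 050 / u+s / g+s sequence above:
massh ~/hostfile_all verbose "chown root:yarn /usr/lib/gphd/hadoop-yarn/bin/container-executor; chmod 6050 /usr/lib/gphd/hadoop-yarn/bin/container-executor"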
7.4 Check the permissions on /usr/lib/gphd/hadoop-yarn/etc/hadoop/container-executor.cfg.
massh ~/hostfile_all verbose "ls -altr /usr/lib/gphd/hadoop-yarn/etc/hadoop/container-executor.cfg"
They should look like:
-rw-r--r-- 1 root root 363 Jul 4 00:29 /usr/lib/gphd/hadoop-yarn/etc/hadoop/container-executor.cfg
If they do not, then set them as follows:
massh ~/hostfile_all verbose "chown root:root /usr/lib/gphd/hadoop-yarn/etc/hadoop/container-executor.cfg" massh ~/hostfile_all verbose "chmod 644 /usr/lib/gphd/hadoop-yarn/etc/hadoop/container-executor.cfg"
8. STOP the cluster, if it is running.
icm_client stop -l <cluster name>
9. Edit the Environment on the Datanodes.
Important: You only need to perform the steps below on the data nodes. In my example, these are hdw1, hdw2, and hdw3.
9.1 Uncomment the lines at the bottom of /etc/default/hadoop-hdfs-datanode:
# secure operation stuff
export HADOOP_SECURE_DN_USER=hdfs
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/hdfs
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
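Since only the data nodes need this change, one way to apply it in bulk is a sed pass over the file; this is just a sketch, and it assumes the three export lines are commented out with a leading '#' and that you have a hypothetical ~/hostfile_datanodes listing only hdw1, hdw2 and hdw3:
massh ~/hostfile_datanodes verbose "sed -i 's/^#[[:space:]]*\(export HADOOP_SECURE_DN_\)/\1/' /etc/default/hadoop-hdfs-datanode"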
9.2 Set the JSVC variable:
If you are using the included jsvc, the JSVC_HOME variable in /etc/default/hadoop should already be properly set:
export JSVC_HOME=/usr/libexec/bigtop-utils
If, however, you built or hand-installed JSVC, your JSVC_HOME will be /usr/bin, so you must set it appropriately. Modify /etc/default/hadoop and set the proper JSVC_HOME:
export JSVC_HOME=/usr/bin
Important: Make sure JSVC_HOME points to the correct jsvc binary. As long as HADOOP_SECURE_DN_USER is set, the datanode will try to start in secure mode.
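A quick way to confirm the setting on every host:
massh ~/hostfile_all verbose "grep JSVC_HOME /etc/default/hadoop"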
10. Site XML Changes
Using _HOST in Site XML: You can maintain consistent site XML files by using the _HOST keyword for the host_fqdn part in the site XML if:
- Your cluster nodes were identified with fully qualified domain names when configuring the cluster.
- hostname -f on all nodes yields the proper fully qualified hostname (same as the one used when creating the principals).
You can only use _HOST in the site XML; files such as jaas.conf, needed for ZooKeeper and HBase, must use actual FQDNs for hosts.
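A quick check of the second condition across the cluster, using the same massh pattern as above:
massh ~/hostfile_all verbose "hostname -f"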
Edit the Site XML:
Finally, we are ready to edit the site XML to turn on secure mode. Before getting into this, it is good to
understand who needs to talk to whom. By "talk" we mean using Kerberos authentication to establish a
communication channel. Doing this requires that you know your own principal, to identify yourself, and that
you know the principal of the service you want to talk to. To be able to use its principal, a service
needs to be able to log in to Kerberos without a password, using a keytab file.
- Each service needs to know its own principal name.
- Each running service on a node needs a service/host specific keytab file to start up.
- Each data node needs to talk to the name node.
- Each node manager needs to talk to the resource manager and the job history server.
- Each client/gateway node needs to talk to the name node, resource manager and job history server.
- Redundant keytab files on some hosts do no harm, and having consistent files makes management easier. Remember, though, that the host_fqdn MUST be correct for each entry. Keeping this in mind helps when setting up and troubleshooting the site XML files.
- Before making changes, backup the current site xml files so that you can return to non-secure operation, if needed.
Unfortunately, because data node and node manager principals are hostname-dependent (more precisely, the role part of the principal is the host_fqdn), the hdfs-site.xml and yarn-site.xml files would differ across the cluster if actual hostnames were used; the _HOST tag avoids this.
10.1 Backup configurations.
massh ~/hostfile_all verbose "cp /usr/lib/gphd/hadoop/etc/hadoop/core-site.xml /usr/lib/gphd/hadoop/etc/hadoop/not-secure-core-site.xml"
massh ~/hostfile_all verbose "cp /usr/lib/gphd/hadoop/etc/hadoop/hdfs-site.xml /usr/lib/gphd/hadoop/etc/hadoop/not-secure-hdfs-site.xml"
massh ~/hostfile_all verbose "cp /usr/lib/gphd/hadoop/etc/hadoop/yarn-site.xml /usr/lib/gphd/hadoop/etc/hadoop/not-secure-yarn-site.xml"
massh ~/hostfile_all verbose "cp /usr/lib/gphd/hadoop/etc/hadoop/mapred-site.xml /usr/lib/gphd/hadoop/etc/hadoop/not-secure-mapred-site.xml"
10.2 Edit /usr/lib/gphd/hadoop/etc/hadoop/core-site.xml as follows on hdm.
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
<!-- THE PROPERTY BELOW IS OPTIONAL: IT ENABLES ON WIRE RPC ENCRYPTION -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>
Sync to the other nodes:
scp /usr/lib/gphd/hadoop/etc/hadoop/core-site.xml hdw1:/usr/lib/gphd/hadoop/etc/hadoop/core-site.xml
scp /usr/lib/gphd/hadoop/etc/hadoop/core-site.xml hdw2:/usr/lib/gphd/hadoop/etc/hadoop/core-site.xml
scp /usr/lib/gphd/hadoop/etc/hadoop/core-site.xml hdw3:/usr/lib/gphd/hadoop/etc/hadoop/core-site.xml
10.3 Edit /usr/lib/gphd/hadoop/etc/hadoop/hdfs-site.xml as follows on hdm.
<!-- WARNING: do not create duplicate entries: check for existing entries and modify if they exist! -->
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>
<!-- short circuit reads do not work when security is enabled for PHD VERSIONS LOWER THAN 2.0, so disable ONLY for them -->
<!-- For PHD greater than or equal to 2.0, set this to true -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<!-- name node secure configuration info -->
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/security/phd/keytab/hdfs.service.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@OPENKBINFO.COM</value>
</property>
<property>
  <name>dfs.namenode.kerberos.http.principal</name>
  <value>HTTP/_HOST@OPENKBINFO.COM</value>
</property>
<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/_HOST@OPENKBINFO.COM</value>
</property>
<!-- (optional) secondary name node secure configuration info -->
<property>
  <name>dfs.secondary.namenode.keytab.file</name>
  <value>/etc/security/phd/keytab/hdfs.service.keytab</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@OPENKBINFO.COM</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.http.principal</name>
  <value>HTTP/_HOST@OPENKBINFO.COM</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/_HOST@OPENKBINFO.COM</value>
</property>
<!-- data node secure configuration info -->
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>700</value>
</property>
<!-- these ports must be set < 1024 for secure operation -->
<!-- conversely they must be set back to > 1024 for non-secure operation -->
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:1004</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:1006</value>
</property>
<!-- remember the principal for the datanode is the principal for the host this hdfs-site.xml file is on -->
<!-- these (next three) need only be set on data nodes -->
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hdfs/_HOST@OPENKBINFO.COM</value>
</property>
<property>
  <name>dfs.datanode.kerberos.http.principal</name>
  <value>HTTP/_HOST@OPENKBINFO.COM</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/etc/security/phd/keytab/hdfs.service.keytab</value>
</property>
<!-- OPTIONAL - set these to enable secure WebHDFS -->
<!-- on all HDFS cluster nodes (namenode, secondary namenode, datanodes) -->
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@OPENKBINFO.COM</value>
</property>
<!-- since we included the HTTP principal in all keytabs, we can use it here -->
<property>
  <name>dfs.web.authentication.kerberos.keytab</name>
  <value>/etc/security/phd/keytab/hdfs.service.keytab</value>
</property>
<!-- THE PROPERTIES BELOW ARE OPTIONAL AND REQUIRE RPC PRIVACY (core-site): THEY ENABLE ON WIRE HDFS BLOCK ENCRYPTION -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
<property>
  <name>dfs.encrypt.data.transfer.algorithm</name>
  <value>rc4</value>
  <description>may be "rc4" or "3des" - 3des has a significant performance impact</description>
</property>
Sync to the other data nodes (no need for the name node or secondary name node):
scp /usr/lib/gphd/hadoop/etc/hadoop/hdfs-site.xml hdw1:/usr/lib/gphd/hadoop/etc/hadoop/hdfs-site.xml
scp /usr/lib/gphd/hadoop/etc/hadoop/hdfs-site.xml hdw2:/usr/lib/gphd/hadoop/etc/hadoop/hdfs-site.xml
scp /usr/lib/gphd/hadoop/etc/hadoop/hdfs-site.xml hdw3:/usr/lib/gphd/hadoop/etc/hadoop/hdfs-site.xml
10.4 Edit /usr/lib/gphd/hadoop/etc/hadoop/yarn-site.xml as follows:
<!-- resource manager secure configuration info -->
<property>
  <name>yarn.resourcemanager.principal</name>
  <value>yarn/_HOST@OPENKBINFO.COM</value>
</property>
<property>
  <name>yarn.resourcemanager.keytab</name>
  <value>/etc/security/phd/keytab/yarn.service.keytab</value>
</property>
<!-- remember the principal for the node manager is the principal for the host this yarn-site.xml file is on -->
<!-- these (next four) need only be set on node manager nodes -->
<property>
  <name>yarn.nodemanager.principal</name>
  <value>yarn/_HOST@OPENKBINFO.COM</value>
</property>
<property>
  <name>yarn.nodemanager.keytab</name>
  <value>/etc/security/phd/keytab/yarn.service.keytab</value>
</property>
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>yarn</value>
</property>
<!-- OPTIONAL - set these to enable secure proxy server node -->
<property>
  <name>yarn.web-proxy.keytab</name>
  <value>/etc/security/phd/keytab/yarn.service.keytab</value>
</property>
<property>
  <name>yarn.web-proxy.principal</name>
  <value>yarn/_HOST@OPENKBINFO.COM</value>
</property>
Sync from one node manager (eg, hdw1) to the other node managers (no need for the resource manager node):
scp /usr/lib/gphd/hadoop/etc/hadoop/yarn-site.xml hdw2:/usr/lib/gphd/hadoop/etc/hadoop/yarn-site.xml
scp /usr/lib/gphd/hadoop/etc/hadoop/yarn-site.xml hdw3:/usr/lib/gphd/hadoop/etc/hadoop/yarn-site.xml
10.5 Edit /usr/lib/gphd/hadoop/etc/hadoop/mapred-site.xml as follows on hdm:
<!-- job history server secure configuration info -->
<property>
  <name>mapreduce.jobhistory.keytab</name>
  <value>/etc/security/phd/keytab/mapred.service.keytab</value>
</property>
<property>
  <name>mapreduce.jobhistory.principal</name>
  <value>mapred/_HOST@OPENKBINFO.COM</value>
</property>
11. Complete the HDFS/YARN Secure Configuration
11.1 Start the cluster.
icm_client start -l <cluster name>
11.2 Check that all the processes listed below start up.
Control processes: namenode, resourcemanager, and historyserver should all be running. Cluster worker processes: datanode and nodemanager should be running.
Note: Until you do HBase security configuration, HBase will not start up on a secure cluster.
massh ~/hostfile_all verbose "service hadoop-hdfs-namenode status"
massh ~/hostfile_all verbose "service hadoop-mapreduce-historyserver status"
massh ~/hostfile_all verbose "service hadoop-yarn-resourcemanager status"
massh ~/hostfile_all verbose "service hadoop-hdfs-datanode status"
massh ~/hostfile_all verbose "service hadoop-yarn-nodemanager status"
massh ~/hostfile_all verbose "service hadoop-hdfs-secondarynamenode status"
Also make sure HDFS is usable:
su - hdfs
hdfs dfsadmin -report
hadoop fs -copyFromLocal .bash_history /tmp/
If you run into any issues, please refer to Common Issues when configuring Kerberos on PivotalHD.
11.3 Create a principal for a standard user (the user must exist as a Linux user on all cluster hosts):
Set the password when prompted.
massh ~/hostfile_all verbose "useradd testuser"
[root@admin ~]# kadmin.local
Authenticating as principal root/admin@OPENKBINFO.COM with password.
kadmin.local: addprinc testuser
WARNING: no policy specified for testuser@OPENKBINFO.COM; defaulting to no policy
Enter password for principal "testuser@OPENKBINFO.COM":
Re-enter password for principal "testuser@OPENKBINFO.COM":
Principal "testuser@OPENKBINFO.COM" created.
11.4 Login as that user on a client box (or any cluster box, if you do not have specific client purposed systems).
11.5 Get your Kerberos TGT by running kinit and entering the password:
kinit testuser
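You can confirm the ticket was granted by listing the credential cache; it should show a krbtgt/OPENKBINFO.COM@OPENKBINFO.COM entry for testuser:
klist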
11.6 Test simple HDFS file list and directory create:
hadoop fs -ls /
hadoop fs -mkdir /tmp/testdir2
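As a sanity check that Kerberos is actually being enforced, destroy the ticket and repeat the listing; without valid credentials the command should fail with a GSS "No valid credentials provided" type error (the exact message varies by version), and work again after a fresh kinit:
kdestroy
hadoop fs -ls /
kinit testuser
hadoop fs -ls /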
12. [Optional] Set the sticky bit on the /tmp directory (prevents non-super-users from moving or deleting other users' files in /tmp).
12.1 Login as gpadmin on any HDFS service node (namenode, datanode).
12.2 Execute the following:
sudo -u hdfs kinit -k -t /etc/security/phd/keytab/hdfs.service.keytab hdfs/this-host_fqdn@REALM
eg (on hdm):
sudo -u hdfs kinit -k -t /etc/security/phd/keytab/hdfs.service.keytab hdfs/hdm.xxx.com@OPENKBINFO.COM
12.3 Execute the following:
sudo -u hdfs hadoop fs -chmod 1777 /tmp
12.4 Run a simple MapReduce job such as the Pi example:
hadoop jar /usr/lib/gphd/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100
If this all works, then you are ready to configure other services. Again, if you run into any issues, please refer to Common Issues when configuring Kerberos on PivotalHD.