

Datanode read-only disk start failure

hellotheresy 2017. 1. 6. 10:08


While operating a Hadoop cluster, one of the DataNodes went down, and restarting it produced the following output:



/usr/lib/python2.6/site-packages/resource_management/core/environment.py:165: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  Logger.info("Skipping failure of {0} due to ignore_failures. Failure reason: {1}".format(resource, ex.message))
 stdout:
2017-01-05 14:13:19,719 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.3.0.0-2557
2017-01-05 14:13:19,719 - Checking if need to create versioned conf dir /etc/hadoop/2.3.0.0-2557/0
2017-01-05 14:13:19,720 - call[('ambari-python-wrap', '/usr/bin/conf-select', 'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.3.0.0-2557', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2017-01-05 14:13:19,744 - call returned (1, '/etc/hadoop/2.3.0.0-2557/0 exist already', '')
2017-01-05 14:13:19,745 - checked_call[('ambari-python-wrap', '/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', '--stack-version', '2.3.0.0-2557', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
2017-01-05 14:13:19,770 - checked_call returned (0, '/usr/hdp/2.3.0.0-2557/hadoop/conf -> /etc/hadoop/2.3.0.0-2557/0')
2017-01-05 14:13:19,771 - Ensuring that hadoop has the correct symlink structure
2017-01-05 14:13:19,771 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2017-01-05 14:13:19,906 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.3.0.0-2557
2017-01-05 14:13:19,906 - Checking if need to create versioned conf dir /etc/hadoop/2.3.0.0-2557/0
2017-01-05 14:13:19,907 - call[('ambari-python-wrap', '/usr/bin/conf-select', 'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.3.0.0-2557', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2017-01-05 14:13:19,930 - call returned (1, '/etc/hadoop/2.3.0.0-2557/0 exist already', '')
2017-01-05 14:13:19,931 - checked_call[('ambari-python-wrap', '/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', '--stack-version', '2.3.0.0-2557', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
2017-01-05 14:13:19,956 - checked_call returned (0, '/usr/hdp/2.3.0.0-2557/hadoop/conf -> /etc/hadoop/2.3.0.0-2557/0')
2017-01-05 14:13:19,957 - Ensuring that hadoop has the correct symlink structure
2017-01-05 14:13:19,957 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2017-01-05 14:13:19,959 - Group['hadoop'] {}
2017-01-05 14:13:19,960 - Group['users'] {}
2017-01-05 14:13:19,961 - Group['spark'] {}
2017-01-05 14:13:19,961 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2017-01-05 14:13:19,962 - User['hbase'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2017-01-05 14:13:19,962 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2017-01-05 14:13:19,963 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2017-01-05 14:13:19,964 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2017-01-05 14:13:19,965 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2017-01-05 14:13:19,965 - User['spark'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2017-01-05 14:13:19,966 - User['ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2017-01-05 14:13:19,967 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2017-01-05 14:13:19,969 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2017-01-05 14:13:19,974 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2017-01-05 14:13:19,974 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'create_parents': True, 'mode': 0775, 'cd_access': 'a'}
2017-01-05 14:13:19,975 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2017-01-05 14:13:19,977 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2017-01-05 14:13:19,981 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] due to not_if
2017-01-05 14:13:19,981 - Group['hdfs'] {}
2017-01-05 14:13:19,982 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'hdfs']}
2017-01-05 14:13:19,982 - FS Type:
2017-01-05 14:13:19,982 - Directory['/etc/hadoop'] {'mode': 0755}
2017-01-05 14:13:20,000 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2017-01-05 14:13:20,001 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 01777}
2017-01-05 14:13:20,015 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2017-01-05 14:13:20,021 - Skipping Execute[('setenforce', '0')] due to not_if
2017-01-05 14:13:20,022 - Directory['/var/log/hadoop'] {'owner': 'root', 'create_parents': True, 'group': 'hadoop', 'mode': 0775, 'cd_access': 'a'}
2017-01-05 14:13:20,025 - Directory['/var/run/hadoop'] {'owner': 'root', 'create_parents': True, 'group': 'root', 'cd_access': 'a'}
2017-01-05 14:13:20,025 - Changing owner for /var/run/hadoop from 501 to root
2017-01-05 14:13:20,025 - Changing group for /var/run/hadoop from 502 to root
2017-01-05 14:13:20,026 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'create_parents': True, 'cd_access': 'a'}
2017-01-05 14:13:20,031 - File['/usr/hdp/current/hadoop-client/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'hdfs'}
2017-01-05 14:13:20,033 - File['/usr/hdp/current/hadoop-client/conf/health_check'] {'content': Template('health_check.j2'), 'owner': 'hdfs'}
2017-01-05 14:13:20,034 - File['/usr/hdp/current/hadoop-client/conf/log4j.properties'] {'content': ..., 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2017-01-05 14:13:20,051 - File['/usr/hdp/current/hadoop-client/conf/hadoop-metrics2.properties'] {'content': Template('hadoop-metrics2.properties.j2'), 'owner': 'hdfs', 'group': 'hadoop'}
2017-01-05 14:13:20,052 - File['/usr/hdp/current/hadoop-client/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
2017-01-05 14:13:20,053 - File['/usr/hdp/current/hadoop-client/conf/configuration.xsl'] {'owner': 'hdfs', 'group': 'hadoop'}
2017-01-05 14:13:20,059 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group': 'hadoop'}
2017-01-05 14:13:20,062 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2017-01-05 14:13:20,239 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.3.0.0-2557
2017-01-05 14:13:20,240 - Checking if need to create versioned conf dir /etc/hadoop/2.3.0.0-2557/0
2017-01-05 14:13:20,240 - call[('ambari-python-wrap', '/usr/bin/conf-select', 'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.3.0.0-2557', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2017-01-05 14:13:20,263 - call returned (1, '/etc/hadoop/2.3.0.0-2557/0 exist already', '')
2017-01-05 14:13:20,264 - checked_call[('ambari-python-wrap', '/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', '--stack-version', '2.3.0.0-2557', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
2017-01-05 14:13:20,289 - checked_call returned (0, '/usr/hdp/2.3.0.0-2557/hadoop/conf -> /etc/hadoop/2.3.0.0-2557/0')
2017-01-05 14:13:20,290 - Ensuring that hadoop has the correct symlink structure
2017-01-05 14:13:20,290 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2017-01-05 14:13:20,295 - Stack Feature Version Info: stack_version=2.3, version=2.3.0.0-2557, current_cluster_version=2.3.0.0-2557 -> 2.3.0.0-2557
2017-01-05 14:13:20,297 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.3.0.0-2557
2017-01-05 14:13:20,297 - Checking if need to create versioned conf dir /etc/hadoop/2.3.0.0-2557/0
2017-01-05 14:13:20,298 - call[('ambari-python-wrap', '/usr/bin/conf-select', 'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.3.0.0-2557', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2017-01-05 14:13:20,321 - call returned (1, '/etc/hadoop/2.3.0.0-2557/0 exist already', '')
2017-01-05 14:13:20,322 - checked_call[('ambari-python-wrap', '/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', '--stack-version', '2.3.0.0-2557', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
2017-01-05 14:13:20,346 - checked_call returned (0, '/usr/hdp/2.3.0.0-2557/hadoop/conf -> /etc/hadoop/2.3.0.0-2557/0')
2017-01-05 14:13:20,347 - Ensuring that hadoop has the correct symlink structure
2017-01-05 14:13:20,347 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2017-01-05 14:13:20,355 - checked_call['rpm -q --queryformat '%{version}-%{release}' hdp-select | sed -e 's/\.el[0-9]//g''] {'stderr': -1}
2017-01-05 14:13:20,398 - checked_call returned (0, '2.3.0.0-2557', '')
2017-01-05 14:13:20,404 - Directory['/etc/security/limits.d'] {'owner': 'root', 'create_parents': True, 'group': 'root'}
2017-01-05 14:13:20,414 - File['/etc/security/limits.d/hdfs.conf'] {'content': Template('hdfs.conf.j2'), 'owner': 'root', 'group': 'root', 'mode': 0644}
2017-01-05 14:13:20,415 - XmlConfig['hadoop-policy.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2017-01-05 14:13:20,428 - Generating config: /usr/hdp/current/hadoop-client/conf/hadoop-policy.xml
2017-01-05 14:13:20,428 - File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2017-01-05 14:13:20,439 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2017-01-05 14:13:20,450 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-client.xml
2017-01-05 14:13:20,450 - File['/usr/hdp/current/hadoop-client/conf/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2017-01-05 14:13:20,457 - Directory['/usr/hdp/current/hadoop-client/conf/secure'] {'owner': 'root', 'create_parents': True, 'group': 'hadoop', 'cd_access': 'a'}
2017-01-05 14:13:20,458 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf/secure', 'configuration_attributes': {}, 'configurations': ...}
2017-01-05 14:13:20,469 - Generating config: /usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml
2017-01-05 14:13:20,469 - File['/usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2017-01-05 14:13:20,476 - XmlConfig['ssl-server.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2017-01-05 14:13:20,487 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-server.xml
2017-01-05 14:13:20,487 - File['/usr/hdp/current/hadoop-client/conf/ssl-server.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2017-01-05 14:13:20,495 - XmlConfig['hdfs-site.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {'final': {'dfs.support.append': 'true', 'dfs.datanode.data.dir': 'true', 'dfs.namenode.http-address': 'true', 'dfs.namenode.name.dir': 'true', 'dfs.webhdfs.enabled': 'true', 'dfs.datanode.failed.volumes.tolerated': 'true'}}, 'configurations': ...}
2017-01-05 14:13:20,506 - Generating config: /usr/hdp/current/hadoop-client/conf/hdfs-site.xml
2017-01-05 14:13:20,506 - File['/usr/hdp/current/hadoop-client/conf/hdfs-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2017-01-05 14:13:20,559 - XmlConfig['core-site.xml'] {'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'mode': 0644, 'configuration_attributes': {'final': {'fs.defaultFS': 'true'}}, 'owner': 'hdfs', 'configurations': ...}
2017-01-05 14:13:20,570 - Generating config: /usr/hdp/current/hadoop-client/conf/core-site.xml
2017-01-05 14:13:20,570 - File['/usr/hdp/current/hadoop-client/conf/core-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2017-01-05 14:13:20,588 - File['/usr/hdp/current/hadoop-client/conf/slaves'] {'content': Template('slaves.j2'), 'owner': 'hdfs'}
2017-01-05 14:13:20,590 - Directory['/var/lib/hadoop-hdfs'] {'owner': 'hdfs', 'create_parents': True, 'group': 'hadoop', 'mode': 0751}
2017-01-05 14:13:20,590 - Directory['/var/lib/ambari-agent/data/datanode'] {'create_parents': True, 'mode': 0755}
2017-01-05 14:13:20,594 - Host contains mounts: ['/', '/proc', '/sys', '/dev/pts', '/dev/shm', '/boot', '/home', '/usr', '/var', '/data01', '/data02', '/data03', '/data04', '/data05', '/data06', '/data07', '/data08', '/data09', '/data10', '/proc/sys/fs/binfmt_misc', '/repo'].
2017-01-05 14:13:20,595 - Mount point for directory /data02/hadoop/hdfs/data is /data02
2017-01-05 14:13:20,595 - Mount point for directory /data03/hadoop/hdfs/data is /data03
2017-01-05 14:13:20,595 - Mount point for directory /data04/hadoop/hdfs/data is /data04
2017-01-05 14:13:20,595 - Mount point for directory /data05/hadoop/hdfs/data is /data05
2017-01-05 14:13:20,595 - Mount point for directory /data06/hadoop/hdfs/data is /data06
2017-01-05 14:13:20,595 - Mount point for directory /data07/hadoop/hdfs/data is /data07
2017-01-05 14:13:20,596 - Mount point for directory /data08/hadoop/hdfs/data is /data08
2017-01-05 14:13:20,596 - Mount point for directory /data09/hadoop/hdfs/data is /data09
2017-01-05 14:13:20,596 - Mount point for directory /data10/hadoop/hdfs/data is /data10
2017-01-05 14:13:20,596 - Mount point for directory /data02/hadoop/hdfs/data is /data02
2017-01-05 14:13:20,596 - Forcefully ensuring existence and permissions of the directory: /data02/hadoop/hdfs/data
2017-01-05 14:13:20,597 - Directory['/data02/hadoop/hdfs/data'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'ignore_failures': True, 'mode': 0755, 'owner': 'hdfs'}
2017-01-05 14:13:20,597 - Changing permission for /data02/hadoop/hdfs/data from 750 to 755
2017-01-05 14:13:20,597 - Mount point for directory /data03/hadoop/hdfs/data is /data03
2017-01-05 14:13:20,598 - Forcefully ensuring existence and permissions of the directory: /data03/hadoop/hdfs/data
2017-01-05 14:13:20,598 - Directory['/data03/hadoop/hdfs/data'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'ignore_failures': True, 'mode': 0755, 'owner': 'hdfs'}
2017-01-05 14:13:20,598 - Changing permission for /data03/hadoop/hdfs/data from 750 to 755
2017-01-05 14:13:20,598 - Mount point for directory /data04/hadoop/hdfs/data is /data04
2017-01-05 14:13:20,599 - Forcefully ensuring existence and permissions of the directory: /data04/hadoop/hdfs/data
2017-01-05 14:13:20,599 - Directory['/data04/hadoop/hdfs/data'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'ignore_failures': True, 'mode': 0755, 'owner': 'hdfs'}
2017-01-05 14:13:20,599 - Changing permission for /data04/hadoop/hdfs/data from 750 to 755
2017-01-05 14:13:20,599 - Mount point for directory /data05/hadoop/hdfs/data is /data05
2017-01-05 14:13:20,600 - Forcefully ensuring existence and permissions of the directory: /data05/hadoop/hdfs/data
2017-01-05 14:13:20,600 - Directory['/data05/hadoop/hdfs/data'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'ignore_failures': True, 'mode': 0755, 'owner': 'hdfs'}
2017-01-05 14:13:20,600 - Changing permission for /data05/hadoop/hdfs/data from 750 to 755
2017-01-05 14:13:20,600 - Mount point for directory /data06/hadoop/hdfs/data is /data06
2017-01-05 14:13:20,600 - Forcefully ensuring existence and permissions of the directory: /data06/hadoop/hdfs/data
2017-01-05 14:13:20,601 - Directory['/data06/hadoop/hdfs/data'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'ignore_failures': True, 'mode': 0755, 'owner': 'hdfs'}
2017-01-05 14:13:20,601 - Changing permission for /data06/hadoop/hdfs/data from 750 to 755
2017-01-05 14:13:20,601 - Mount point for directory /data07/hadoop/hdfs/data is /data07
2017-01-05 14:13:20,601 - Forcefully ensuring existence and permissions of the directory: /data07/hadoop/hdfs/data
2017-01-05 14:13:20,602 - Directory['/data07/hadoop/hdfs/data'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'ignore_failures': True, 'mode': 0755, 'owner': 'hdfs'}
2017-01-05 14:13:20,602 - Changing permission for /data07/hadoop/hdfs/data from 750 to 755
2017-01-05 14:13:20,602 - Skipping failure of Directory['/data07/hadoop/hdfs/data'] due to ignore_failures. Failure reason:
2017-01-05 14:13:20,602 - Mount point for directory /data08/hadoop/hdfs/data is /data08
2017-01-05 14:13:20,603 - Forcefully ensuring existence and permissions of the directory: /data08/hadoop/hdfs/data
2017-01-05 14:13:20,603 - Directory['/data08/hadoop/hdfs/data'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'ignore_failures': True, 'mode': 0755, 'owner': 'hdfs'}
2017-01-05 14:13:20,603 - Changing permission for /data08/hadoop/hdfs/data from 750 to 755
2017-01-05 14:13:20,603 - Mount point for directory /data09/hadoop/hdfs/data is /data09
2017-01-05 14:13:20,604 - Forcefully ensuring existence and permissions of the directory: /data09/hadoop/hdfs/data
2017-01-05 14:13:20,604 - Directory['/data09/hadoop/hdfs/data'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'ignore_failures': True, 'mode': 0755, 'owner': 'hdfs'}
2017-01-05 14:13:20,604 - Changing permission for /data09/hadoop/hdfs/data from 750 to 755
2017-01-05 14:13:20,604 - Mount point for directory /data10/hadoop/hdfs/data is /data10
2017-01-05 14:13:20,604 - Forcefully ensuring existence and permissions of the directory: /data10/hadoop/hdfs/data
2017-01-05 14:13:20,605 - Directory['/data10/hadoop/hdfs/data'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'ignore_failures': True, 'mode': 0755, 'owner': 'hdfs'}
2017-01-05 14:13:20,605 - Changing permission for /data10/hadoop/hdfs/data from 750 to 755
2017-01-05 14:13:20,609 - Host contains mounts: ['/', '/proc', '/sys', '/dev/pts', '/dev/shm', '/boot', '/home', '/usr', '/var', '/data01', '/data02', '/data03', '/data04', '/data05', '/data06', '/data07', '/data08', '/data09', '/data10', '/proc/sys/fs/binfmt_misc', '/repo'].
2017-01-05 14:13:20,609 - Mount point for directory /data02/hadoop/hdfs/data is /data02
2017-01-05 14:13:20,609 - Mount point for directory /data03/hadoop/hdfs/data is /data03
2017-01-05 14:13:20,610 - Mount point for directory /data04/hadoop/hdfs/data is /data04
2017-01-05 14:13:20,610 - Mount point for directory /data05/hadoop/hdfs/data is /data05
2017-01-05 14:13:20,610 - Mount point for directory /data06/hadoop/hdfs/data is /data06
2017-01-05 14:13:20,610 - Mount point for directory /data07/hadoop/hdfs/data is /data07
2017-01-05 14:13:20,610 - Mount point for directory /data08/hadoop/hdfs/data is /data08
2017-01-05 14:13:20,611 - Mount point for directory /data09/hadoop/hdfs/data is /data09
2017-01-05 14:13:20,611 - Mount point for directory /data10/hadoop/hdfs/data is /data10
2017-01-05 14:13:20,611 - File['/var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist'] {'content': ..., 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2017-01-05 14:13:20,613 - Directory['/var/run/hadoop'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 0755}
2017-01-05 14:13:20,613 - Changing owner for /var/run/hadoop from 0 to hdfs
2017-01-05 14:13:20,613 - Changing group for /var/run/hadoop from 0 to hadoop
2017-01-05 14:13:20,614 - Directory['/var/run/hadoop/hdfs'] {'owner': 'hdfs', 'group': 'hadoop', 'create_parents': True}
2017-01-05 14:13:20,614 - Directory['/var/log/hadoop/hdfs'] {'owner': 'hdfs', 'group': 'hadoop', 'create_parents': True}
2017-01-05 14:13:20,615 - File['/var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'] {'action': ['delete'], 'not_if': 'ambari-sudo.sh  -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid && ambari-sudo.sh  -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'}
2017-01-05 14:13:20,633 - Deleting File['/var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid']
2017-01-05 14:13:20,633 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode''] {'environment': {'HADOOP_LIBEXEC_DIR': '/usr/hdp/current/hadoop-client/libexec'}, 'not_if': 'ambari-sudo.sh  -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid && ambari-sudo.sh  -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'}


At first glance the start itself looks fine, but a closer look shows permission-related messages at the disk mount locations. In particular, the permission change on /data07/hadoop/hdfs/data is skipped because of ignore_failures.


Since the process goes down rather than staying up, something is clearly wrong, so I checked the DataNode log on that server; a quick way to tail it is sketched below, and the relevant part of the log follows.
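
A minimal sketch for pulling up the DataNode log, assuming the usual HDP layout shown in the Ambari output above (logs under /var/log/hadoop/hdfs); the exact file name depends on the host name:

# tail the DataNode log for this host
sudo tail -n 200 /var/log/hadoop/hdfs/hadoop-hdfs-datanode-$(hostname).log

# or search recent problems across all DataNode log files
sudo grep -E 'FATAL|ERROR' /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log | tail -n 50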




2017-01-05 14:13:24,632 INFO  datanode.DataNode (DataNode.java:initStorage(1386)) - Setting up storage: nsid=1743022974;bpid=BP-115950507-192.168.100.142-1481771120609;lv=-56;nsInfo=lv=-63;cid=CID-8874f4d0-1bf4-46a0-a858-7d385346ad10;nsid=1743022974;c=0;bpid=BP-115950507-192.168.100.142-1481771120609;dnuuid=90438e58-e394-4ce3-995d-9eec4a564aa2
2017-01-05 14:13:24,647 FATAL datanode.DataNode (BPServiceActor.java:run(807)) - Initialization failed for Block pool <registering> (Datanode Uuid 90438e58-e394-4ce3-995d-9eec4a564aa2) service to xxxx/192.168.100.142:8020. Exiting.
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 8, volumes configured: 9, volumes failed: 1, volume failures tolerated: 0
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:289)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1396)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1348)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:795)
        at java.lang.Thread.run(Thread.java:745)
2017-01-05 14:13:24,647 WARN  datanode.DataNode (BPServiceActor.java:run(828)) - Ending block pool service for: Block pool <registering> (Datanode Uuid 90438e58-e394-4ce3-995d-9eec4a564aa2) service to xxxx/192.168.100.142:8020
2017-01-05 14:13:24,750 INFO  datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool <registering> (Datanode Uuid 90438e58-e394-4ce3-995d-9eec4a564aa2)
2017-01-05 14:13:25,723 INFO  datanode.DataNode (DataXceiverServer.java:closeAllPeers(263)) - Closing all peers.
2017-01-05 14:13:25,725 ERROR datanode.DataNode (DataXceiver.java:run(278)) - xxxx:50010:DataXceiver error processing unknown operation  src: /192.168.101.124:40816 dst: /192.168.101.124:50010
java.io.IOException: Server closed.
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.addPeer(DataXceiverServer.java:221)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:187)
        at java.lang.Thread.run(Thread.java:745)
2017-01-05 14:13:26,750 WARN  datanode.DataNode (DataNode.java:secureMain(2520)) - Exiting Datanode
2017-01-05 14:13:26,752 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0
2017-01-05 14:13:26,753 INFO  datanode.DataNode (LogAdapter.java:info(45)) - SHUTDOWN_MSG:



So the DataNode could not bring up one of its disk volumes. Checking the server, the disk in question turned out to be mounted read-only. (What happened there?)
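
A rough sketch for confirming which data volume went read-only before restarting the DataNode. In the Ambari output above, /data07 is the only data directory whose permission change gets skipped, so it is the likely suspect here; treat both the path and the commands as illustrative. A filesystem usually goes read-only because the kernel hit an I/O error, so checking dmesg and running fsck (or replacing the disk) is the real fix, not simply remounting it read-write.

# list data mounts that the kernel has remounted read-only
grep ' ro[ ,]' /proc/mounts | grep '/data'

# or simply try to write to each data directory; the read-only one will fail
for d in /data0{2..9} /data10; do
  if touch "$d/.rw_test" 2>/dev/null; then
    rm -f "$d/.rw_test"; echo "$d writable"
  else
    echo "$d READ-ONLY (or missing)"
  fi
done

# look for the filesystem error that forced the remount
dmesg | grep -i -E 'read-only|remount'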


A look through the Hortonworks community turned up a matching case:

https://community.hortonworks.com/questions/36427/unable-to-start-datanode-too-many-failed-volumes.html


If you want the DataNode to handle this kind of failure more gracefully, another option is to adjust the following property in hdfs-site.xml:


dfs.datanode.failed.volumes.tolerated (default: 0)
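
A minimal example of the property block in hdfs-site.xml. The value of 1 is just an illustration (it lets the DataNode keep running with a single failed volume instead of exiting); on an Ambari-managed cluster like this one, change it through Ambari so it is not overwritten on the next restart.

<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <!-- number of data volumes allowed to fail before the DataNode shuts itself down -->
  <value>1</value>
</property>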





