- Type: Bug
- Status:
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: 2.4.0
- Component/s: None
- Labels: None
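Ambari agent version: 2.1.2.1
Monitoring reported that the Ambari agent was using a large amount of swap, which could cause the OS to hang.
(That version seems to have a surprising number of issues, for all sorts of reasons.)
Since this happens intermittently rather than continuously, debugging it directly is realistically difficult.
Searching the Hortonworks Community for it turned up a related article: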
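Because the symptom only shows up now and then, one practical step is to sample the agent's memory on a schedule. That way the numbers already exist the next time it happens.

The sketch below makes a few assumptions:
- It runs on a Linux host and reads the `VmRSS` and `VmSwap` lines (reported in kB) from `/proc/<pid>/status`.
- You already have the agent's PID, for example from `ps`.
- The names `parse_vm_fields`, `sample` and `watch` and the default interval are hypothetical choices for illustration, not part of Ambari.

```python
import re
import time


def parse_vm_fields(status_text, fields=("VmRSS", "VmSwap")):
    """Extract the requested memory fields (values in kB) from the text of
    /proc/<pid>/status. Fields the kernel does not report are simply absent."""
    values = {}
    for line in status_text.splitlines():
        # Lines look like "VmRSS:\t  55574528 kB"
        match = re.match(r"^(\w+):\s+(\d+)\s+kB$", line)
        if match and match.group(1) in fields:
            values[match.group(1)] = int(match.group(2))
    return values


def sample(pid):
    """Read the current VmRSS/VmSwap of one process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_vm_fields(f.read())


def watch(pid, interval=60.0, samples=10):
    """Print `samples` timestamped readings, `interval` seconds apart.
    A missing field (e.g. VmSwap on an older kernel) prints as None."""
    for i in range(samples):
        if i:
            time.sleep(interval)
        mem = sample(pid)
        print("%s pid=%d rss_kb=%s swap_kb=%s" % (
            time.strftime("%Y-%m-%d %H:%M:%S"), pid,
            mem.get("VmRSS"), mem.get("VmSwap")))
```

For example, `watch(agent_pid, interval=300, samples=288)` would log one reading every 5 minutes for a day. A steadily climbing `rss_kb` between agent restarts would match the leak described in the article below.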
Very high memory utilization by Ambari Agent
Short Description:
Performance issue due to a memory leak in Ambari Agent
ENVIRONMENT: All Ambari versions prior to 2.4.x
SYMPTOMS: Intermittent loss of heartbeat to cluster nodes, freeze of ambari-agent service, intermittent issues in Ambari alerts and service status updates in Ambari dashboard.
Ambari agent logs:
- INFO 2016-08-21 19:10:20,080 Heartbeat.py:78 - Building Heartbeat: {responseId = 139566, timestamp = 1471821020080, commandsInProgress = False, componentsMapped = True}
- ERROR 2016-08-21 19:10:20,102 HostInfo.py:228 - Checking java processes failed
  Traceback (most recent call last):
    File "/usr/lib/python2.6/site-packages/ambari_agent/HostInfo.py", line 211, in javaProcs
      cmd = open(os.path.join('/proc', pid, 'cmdline'), 'rb').read()
  IOError: [Errno 2] No such file or directory: '/proc/24270/cmdline'
Top command output:

| PID | USER | PR | NI | VIRT | RES | SHR | S | %CPU | %MEM | TIME+ | SWAP | TIME | DATA | COMMAND |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10098 | root | 20 | 0 | 54.4g | 53g | 4540 | S | 54.5 | 14.0 | 18000:11 | 224 | 300,00 | 54g | /usr/bin/python2 /usr/lib/python2.6/site-packages/ambari_agent/main.py start --expected-hostname=123.example.com |
ROOT CAUSE: A race condition in the Python subprocess module. Because of this race condition, in some unlucky cases Python garbage collection was left disabled. This usually happened when running alerts, since many of our alerts run shell commands and do so from different threads. This is a known issue reported in AMBARI-17539.
SOLUTION: Upgrade to Ambari 2.4.x
WORKAROUND: Restart ambari-agent, which fixes the issue temporarily. Log a case with HWX support to get a patch for the bug fix.
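The root cause is easier to follow with the `gc` calls written out. As I understand it, Python 2's `subprocess.Popen` works like this around `fork()`:
1. It saves `gc.isenabled()`.
2. It calls `gc.disable()` before forking.
3. It calls `gc.enable()` afterwards, but only if the saved value was `True`.

The sketch below replays one unlucky interleaving of two threads (A and B) running that pattern. It runs in a single thread so the result is deterministic. The function name is hypothetical, and the steps are a model of the race, not Ambari's or CPython's actual code.

```python
import gc


def simulate_unlucky_interleaving():
    """Replay, step by step, one interleaving of two threads that each
    save the GC state, disable GC around fork(), and restore it afterwards.
    Returns whether GC is enabled once both 'threads' are done."""
    gc.enable()                      # normal starting state

    # Thread A: remember the state, then disable GC before fork()
    a_was_enabled = gc.isenabled()   # True
    gc.disable()

    # Thread B: reads the state while A still has GC disabled
    b_was_enabled = gc.isenabled()   # False

    # Thread A: fork() finished, so it restores what it saw (enabled)
    if a_was_enabled:
        gc.enable()

    # Thread B: only now reaches its own gc.disable()
    gc.disable()

    # Thread B: fork() finished, but B believes GC was off to begin with,
    # so nobody turns it back on
    if b_was_enabled:
        gc.enable()

    return gc.isenabled()


if __name__ == "__main__":
    print("GC enabled after both threads: %s" % simulate_unlucky_interleaving())
    gc.enable()  # restore normal behaviour for the rest of this process
```

Once this happens, cyclic garbage is never collected again for the lifetime of the process. That fits the steadily growing RES/SWAP in the `top` output above and explains why restarting the agent helps only temporarily.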
Let's look up the issue number.
AMBARI-17539
https://issues.apache.org/jira/browse/AMBARI-17539
Ambari Agent memory Leak fix.
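In the end, a patch for the Python GC handling is available. The issue is reported as fixed in version 2.4 and later, so upgrading should resolve it. Alternatively, since the patch is a simple change to an import statement, applying it directly is also a reasonable option.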
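If neither upgrading nor patching is possible right away, there is a general way to rule out this particular interleaving in your own threaded code: let only one thread at a time create a child process. This is a sketch of that idea, not the AMBARI-17539 patch. `locked_popen`, `run_and_capture` and `run_in_threads` are hypothetical helpers. Only calls routed through them are protected; `subprocess.call` or `check_output` used elsewhere still create processes without the lock.

```python
import gc
import subprocess
import sys
import threading

# Hypothetical process-wide lock: only one thread at a time may be inside
# Popen(), so the GC save/disable/re-enable steps of two threads cannot
# interleave.
_POPEN_LOCK = threading.Lock()


def locked_popen(*args, **kwargs):
    """Create a child process while holding _POPEN_LOCK.

    Only process creation is serialized; the caller waits for the child
    outside the lock, so a slow command does not block other threads."""
    with _POPEN_LOCK:
        return subprocess.Popen(*args, **kwargs)


def run_and_capture(cmd):
    """Run cmd via locked_popen and return (returncode, stripped stdout)."""
    proc = locked_popen(cmd, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return proc.returncode, out.decode("utf-8").strip()


def run_in_threads(cmd, n_threads=8):
    """Run the same command from several threads, similar to concurrent alerts."""
    results = []
    results_lock = threading.Lock()

    def worker():
        result = run_and_capture(cmd)
        with results_lock:
            results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_in_threads([sys.executable, "-c", "print('ok')"]))
    print("gc enabled: %s" % gc.isenabled())
```

The trade-off is that process creation becomes sequential. For a handful of alert scripts this is usually negligible compared with the commands' own run time.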