Ambari agent version: 2.1.2.1



Monitoring reported that the Ambari agent was using a large amount of swap, which could cause the OS to hang.
(That version seems to have quite a few problems of various kinds.)

Because the symptom is intermittent rather than persistent, debugging it directly is impractical.
A search of the Hortonworks community turned up a related article:



Very high memory utilization by Ambari Agent


Short Description:

Performance issue due to memory Leak in Ambari agent

Article

ENVIRONMENT: All Ambari versions prior to 2.4.x

SYMPTOMS: Intermittent loss of heartbeat to cluster nodes, freeze of ambari-agent service, intermittent issues in Ambari alerts and service status updates in Ambari dashboard.

Ambari-agent logs:

  INFO 2016-08-21 19:10:20,080 Heartbeat.py:78 - Building Heartbeat: {responseId = 139566, timestamp = 1471821020080, commandsInProgress = False, componentsMapped = True}
  ERROR 2016-08-21 19:10:20,102 HostInfo.py:228 - Checking java processes failed
  Traceback (most recent call last):
    File "/usr/lib/python2.6/site-packages/ambari_agent/HostInfo.py", line 211, in javaProcs
      cmd = open(os.path.join('/proc', pid, 'cmdline'), 'rb').read()
  IOError: [Errno 2] No such file or directory: '/proc/24270/cmdline'

Top command output:

  PID   USER PR NI VIRT  RES SHR  S %CPU %MEM TIME+    SWAP TIME   DATA COMMAND
  10098 root 20 0  54.4g 53g 4540 S 54.5 14.0 18000:11 224  300,00 54g  /usr/bin/python2 /usr/lib/python2.6/site-packages/ambari_agent/main.py start --expected-hostname=123.example.com
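Since the symptom is intermittent, it may help to sample the agent's memory periodically instead of watching top by hand. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/status exposes VmRSS and VmSwap in kB. The function names `parse_proc_status_kb` and `read_memory_kb` are made up for this illustration and are not part of Ambari.

```python
import re


def parse_proc_status_kb(status_text, fields=("VmRSS", "VmSwap")):
    """Return {field: kilobytes} for the requested fields of a /proc/<pid>/status dump."""
    result = {}
    for line in status_text.splitlines():
        # Memory lines look like "VmRSS:     1234 kB"
        match = re.match(r"^(\w+):\s+(\d+)\s+kB$", line.strip())
        if match and match.group(1) in fields:
            result[match.group(1)] = int(match.group(2))
    return result


def read_memory_kb(pid):
    """Read resident and swapped memory (in kB) of a running process on Linux."""
    with open("/proc/%d/status" % pid) as handle:
        return parse_proc_status_kb(handle.read())
```

Calling `read_memory_kb` with the agent's PID from a cron job or a monitoring script and logging the result would show whether resident or swapped memory keeps growing between restarts.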

ROOT CAUSE: Race condition in subprocess python module. Due to this race condition, at some unlucky cases python garbage collection was disabled. This usually happened when running alerts, as a bunch of our alerts run shell commands and they do it in different threads. This is a known issue reported in AMBARI-17539.
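To make the race easier to picture: Python 2's subprocess module remembers whether gc was enabled, disables it around fork(), and re-enables it only if it had been enabled. The sketch below is a simulation of that check-then-act pattern, not the real subprocess code. It replays one unlucky ordering of two threads deterministically, using generators in place of threads; `fork_like_section` and `simulate_unlucky_interleaving` are hypothetical names.

```python
import gc


def fork_like_section():
    """Mimic the gc handling around fork(); each yield is a possible thread switch."""
    was_enabled = gc.isenabled()   # step 1: remember the current state
    yield
    gc.disable()                   # step 2: keep gc off while forking
    yield
    if was_enabled:                # step 3: restore only if gc was on at step 1
        gc.enable()
    yield


def simulate_unlucky_interleaving():
    """Replay an ordering of threads A and B that leaves gc switched off."""
    gc.enable()
    a, b = fork_like_section(), fork_like_section()
    try:
        next(a)   # A: was_enabled = True
        next(a)   # A: gc.disable()
        next(b)   # B: was_enabled = False (B sees A's temporary disable)
        next(a)   # A: gc.enable()
        next(b)   # B: gc.disable()
        next(b)   # B: was_enabled is False, so gc is never re-enabled
        return gc.isenabled()
    finally:
        gc.enable()   # do not leave the demo process without gc
```

In this ordering `simulate_unlucky_interleaving()` returns False: after both "threads" finish, nothing turns the collector back on. In a long-running agent that means objects caught in reference cycles are never collected, which would be consistent with the steady memory growth described above.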

SOLUTION: Upgrade to Ambari 2.4.x

WORKAROUND: Restart ambari-agent which would fix issue temporarily. Log a case with HWX support to get a patch for the bug fix.



Let's look up the issue number.

AMBARI-17539

https://issues.apache.org/jira/browse/AMBARI-17539


Ambari Agent memory Leak fix.

Details

    • Type: Bug
    • Status: RESOLVED
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4.0
    • Component/s: None
    • Labels: None

    Description

      Reason of memory leak:
      Race condition in subprocess python module. 
      Due to this race condition at some unlucky cases python garbage collection was disabled. 
      This usually happened when running alerts, as a bunch of our alerts run shell commands and they do it in different threads.

      Fix for the issue:
      Synchronizing subprocess is not the best option. Since some people can still use it without synchronization not knowing about the issue. 
      Also synchronizing will provide some unnecessary slowdown. So for this issue the proposed fix is to monkey patch subprocess.gc.isenabled.
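      The idea behind the proposed fix can be sketched as follows. This is not the content of AMBARI-17539.patch. It is a variation that assumes subprocess looks up gc through its own module-level name, as Python 2 does, and it swaps in a stand-in object instead of touching the real gc module. `patch_subprocess_gc` is a hypothetical helper name.

```python
import gc
import subprocess
import types


def patch_subprocess_gc():
    """Give subprocess a gc stand-in whose isenabled() always answers True.

    With that answer, the "re-enable gc if it was enabled" step after fork()
    always runs, so a racing thread can no longer leave gc switched off.
    """
    proxy = types.ModuleType("gc_proxy_for_subprocess")
    for name in dir(gc):
        if not name.startswith("__"):
            # disable(), enable(), collect() etc. still reach the real gc
            setattr(proxy, name, getattr(gc, name))
    proxy.isenabled = lambda: True   # the only behavior that changes
    subprocess.gc = proxy            # rebinds the name subprocess uses
    return proxy
```

      A patch like this would have to run early in startup, before any thread creates a subprocess. On Python versions where subprocess.py no longer refers to a module-level gc, the assignment only sets an unused attribute and has no effect. The trade-off is that a program which deliberately runs with gc disabled would get it switched back on after each fork.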

      Attachments

      1. AMBARI-17539.patch (text file, 1 kB)

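So a patch for the Python GC handling is already available.
The issue is reported as fixed in version 2.4 and later, so upgrading should resolve it.
Alternatively, since the patch is only a small change around an import statement, applying it by hand is also a reasonable option.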











         











