Saturday, March 27, 2010

LOADER BACKLOG - OEM GRID CONTROL

Loader backlog (files) in OEM


The Loader is a part of the Management Service that pushes metric data into the Management Repository at periodic intervals. If the Loader Backlog chart indicates that the backlog is high and Loader output is low, there is data pending load, which may indicate a system bottleneck or the need for another Management Service. The chart shows the total backlog of files, summed over all Oracle Management Services, for the past 24 hours. Click the image to display loader backlog charts for each individual Management Service over the past 24 hours.

Sometimes /ora becomes 100% full. It is then difficult to start the services using "opmnctl", which throws errors like:

ahc55(grid):/ora/product/oem/10203/oms10g/opmn/bin>opmnctl startall

opmnctl: starting opmn and all managed processes...

================================================================================

opmn id=ahc55:6200

5 of 6 processes started.



ias-instance id=EnterpriseManager0.ahc55

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

ias-component/process-type/process-set:

HTTP_Server/HTTP_Server/HTTP_Server



Error

--> Process (pid=28195)

failed to start a managed process after the maximum retry limit

Log:

/ora/product/oem/10203/oms10g/opmn/logs/HTTP_Server~1



In this case we need to clear the Sysman and Apache logs to make room. If that fails, we need to restart the repository database.



We can use the "opmnctl status" command, run from /ora/product/oem/10203/oms10g/opmn/bin, to check the status of the managed processes.



[opmnctl is the supported tool for starting and stopping all components in an Oracle instance, with the exception of the Fusion Middleware Control Console. opmnctl provides a centralized way to control and monitor Oracle Application Server components from the command line]



opmnctl status


It lists the managed processes and shows whether each one is running.



ahc55():/ora/product/oem/10203/oms10g/opmn/bin>opmnctl status



Processes in Instance: EnterpriseManager0.ahc55

-------------------+--------------------+---------+---------
ias-component      | process-type       |     pid | status
-------------------+--------------------+---------+---------
DSA                | DSA                |     N/A | Down
HTTP_Server        | HTTP_Server        |   17007 | Alive
LogLoader          | logloaderd         |     N/A | Down
dcm-daemon         | dcm-daemon         |     N/A | Down
OC4J               | home               |   17008 | Alive
OC4J               | OC4J_EM            |   17010 | Alive
WebCache           | WebCache           |   17011 | Alive
WebCache           | WebCacheAdmin      |   17012 | Alive
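
When the status listing is long, it can help to pull out only the components that are down. The following is a minimal sketch, not part of the original procedure: the function name list_down_processes is made up for illustration, and it assumes the status is the last column and the first two columns are ias-component and process-type, as in the listing above.

```shell
#!/bin/sh
# Sketch: print the components that "opmnctl status" reports as Down.
# Reads the status output on stdin, so it also works on saved output.
# list_down_processes is a hypothetical helper name, not an Oracle tool.
list_down_processes() {
    # Turn any "|" column separators into spaces, then split on whitespace.
    # A row matches when its last field is exactly "Down".
    tr '|' ' ' | awk '$NF == "Down" { print $1 "/" $2 }'
}

# Usage (run as the OMS owner from the opmn/bin directory):
#   opmnctl status | list_down_processes
```

For the listing above, this would print the DSA, LogLoader, and dcm-daemon rows in component/process-type form.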


Solving this issue
Normally we follow these steps to solve this issue. If everything fails, we restart the repository database.

Steps

Solution:


1) Clear the Apache/Sysman logs

2) Stop and start the OMS using opmnctl

3) File upload should start

4) If that fails, go to step 5

5) If everything fails, restart the repository database [pgrid]

1) Check the disk space

ahc55():/ora>df -k /ora

Filesystem kbytes used avail capacity Mounted on

/ora 18588650 18402767 0 100% /ora
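
The disk check can also be scripted so that it raises a flag before /ora reaches 100%. This is only a sketch under assumptions: the function names and the 95% default threshold are made up for illustration, and the parsing assumes df prints a header line followed by a line that contains a field such as "100%".

```shell
#!/bin/sh
# Sketch: read "df -k <mount>" output on stdin and report how full it is.
# fs_capacity_pct and fs_is_nearly_full are hypothetical helper names.

# Print the capacity percentage (without the "%" sign).
fs_capacity_pct() {
    # Skip the header line, then take the first field that looks like "NN%".
    awk 'NR > 1 {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+%$/) { sub(/%/, "", $i); print $i; exit }
    }'
}

# Succeed (exit status 0) when usage is at or above the threshold.
# The default of 95 is an assumed value; pick one that suits the system.
fs_is_nearly_full() {
    pct=$(fs_capacity_pct)
    [ -n "$pct" ] && [ "$pct" -ge "${1:-95}" ]
}

# Usage:
#   df -k /ora | fs_is_nearly_full 95 && echo "/ora is nearly full"
```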

2) Check the file upload status for backlog files at the loader console, or check it from the prompt by counting the files in the recv directory:

ahc55():/ora/product/oem/10203/oms10g/sysman/recv>ls | wc -l

93455

ahc55():/ora/product/oem/10203/oms10g/sysman/recv>ls | wc -l

93487

ahc55():/ora/product/oem/10203/oms10g/sysman/recv>ls | wc -l

93530

ahc55():/ora/product/oem/10203/oms10g/sysman/recv>ls | wc -l

93566



If the count is increasing, we need to follow the next steps.

[When the loader is keeping up, this number should decrease instead of increase.]
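
Rather than rerunning the count by hand, the two samples can be compared in a small script. This is a minimal sketch: the function names and the 60-second default interval are assumptions for illustration, not values from the original procedure.

```shell
#!/bin/sh
# Sketch: sample the backlog directory twice and report the trend.
# count_files, backlog_trend and watch_backlog are hypothetical names.

# Count directory entries, like "ls | wc -l" in the steps above.
count_files() {
    ls "$1" | wc -l | tr -d ' '
}

# Compare an earlier and a later count and print the trend.
backlog_trend() {
    if [ "$2" -gt "$1" ]; then
        echo increasing
    elif [ "$2" -lt "$1" ]; then
        echo decreasing
    else
        echo unchanged
    fi
}

# Take two samples, "interval" seconds apart (default 60, an assumed value).
watch_backlog() {
    dir=$1
    interval=${2:-60}
    first=$(count_files "$dir")
    sleep "$interval"
    last=$(count_files "$dir")
    echo "$first -> $last: $(backlog_trend "$first" "$last")"
}

# Usage:
#   watch_backlog /ora/product/oem/10203/oms10g/sysman/recv 60
```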





3) Stop and start the OMS



a) ahc55(grid):/ora/product/oem/10203/oms10g/opmn/bin>opmnctl stopall

b) ahc55(grid):/ora/product/oem/10203/oms10g/opmn/bin>opmnctl startall
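
The stop/start pair can be wrapped in one function so that a failed start is not missed. This is a sketch only: restart_oms and the OPMNCTL variable are made-up names, and it assumes that opmnctl returns a non-zero exit status when startall fails, which should be verified on the installed release.

```shell
#!/bin/sh
# Sketch: stop and start all OPMN-managed processes, then show their status.
# restart_oms and OPMNCTL are hypothetical names used for illustration.
# ASSUMPTION: opmnctl exits non-zero when startall fails.
restart_oms() {
    opmnctl=${OPMNCTL:-opmnctl}   # override with a full path if not on PATH

    # stopall may complain if OPMN is already down; carry on regardless.
    "$opmnctl" stopall

    "$opmnctl" startall || {
        echo "opmnctl startall reported a failure; check the opmn logs" >&2
        return 1
    }

    # Show which managed processes came up.
    "$opmnctl" status
}

# Usage:
#   OPMNCTL=/ora/product/oem/10203/oms10g/opmn/bin/opmnctl
#   restart_oms
```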



4) Check whether the file count is decreasing. If it is not, continue with the following.


Clean /ora/product/oem/10203/oms10g/Apache/Apache/logs



NOTE: You can move everything except fastcgi and httpd.pid to /tmp.



ahc55():/ora/product/oem/10203/oms10g/Apache/Apache/logs>ls -ltr

total 45502

drwx------ 3 oracle oinstall 512 Nov 27 15:35 fastcgi

-rw------- 1 oracle oinstall 1056768 Mar 22 11:35 mm.23113.mem

-rw------- 1 oracle oinstall 0 Mar 22 11:35 mm.23113.sem

-rw-r--r-- 1 oracle oinstall 0 Mar 22 11:35 ssl_request_log

-rw-r--r-- 1 oracle oinstall 6 Mar 22 11:35 httpd.pid

-rw------- 1 oracle oinstall 1056768 Mar 22 11:35 mod_oc4j.23113.shm.mem

-rw------- 1 oracle oinstall 0 Mar 22 11:35 mod_oc4j.23113.shm.sem

-rw------- 1 oracle oinstall 0 Mar 22 11:35 ssl_mutex.23113

-rw------- 1 oracle oinstall 0 Mar 22 11:35 ssl_scache.sem

-rw-r--r-- 1 oracle oinstall 257 Mar 22 11:35 ssl_engine_log

-rw------- 1 oracle oinstall 0 Mar 22 11:35 dms_metrics.23113.shm.sem

-rw------- 1 oracle oinstall 3072000 Mar 22 11:35 dms_metrics.23113.shm.mem

-rw------- 1 oracle oinstall 1572864 Mar 22 11:41 ssl_scache.mem

-rw-r--r-- 1 oracle oinstall 892604 Mar 22 12:01 error_log

-rw-r--r-- 1 oracle oinstall 11694 Mar 22 12:41 error_log.1269216000

-rw-r--r-- 1 oracle oinstall 454916 Mar 22 12:59 access_log.1269216000

-rw-r--r-- 1 oracle oinstall 5070917 Mar 22 19:06 access_log.1269259200

-rw-r--r-- 1 oracle oinstall 14718281 Mar 22 19:08 access_log

ahc55():/ora/product/oem/10203/oms10g/Apache/Apache/logs>
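
The move described in the note can be sketched as a small function that skips fastcgi and httpd.pid. The function name move_apache_logs and the DRY_RUN switch are made up for illustration. The sketch assumes the destination directory already exists and sits on a different filesystem with enough free space, since moving files within /ora would not free anything.

```shell
#!/bin/sh
# Sketch: move everything in an Apache logs directory, except fastcgi and
# httpd.pid, into a destination directory.
# move_apache_logs and DRY_RUN are hypothetical names for illustration.
move_apache_logs() {
    src=$1
    dest=$2
    for f in "$src"/*; do
        [ -e "$f" ] || continue              # empty directory: nothing matched
        case $(basename "$f") in
            fastcgi|httpd.pid) continue ;;   # keep these, per the note above
        esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "would move $f"             # preview only, nothing is moved
        else
            mv "$f" "$dest"/
        fi
    done
}

# Usage: preview first, then move for real.
#   DRY_RUN=1 move_apache_logs /ora/product/oem/10203/oms10g/Apache/Apache/logs /tmp
#   move_apache_logs /ora/product/oem/10203/oms10g/Apache/Apache/logs /tmp
```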



Next, go to sysman/log and clear the logs, except pafLogs. This gives us enough space to restart the OMS.

/ora/product/oem/10203/oms10g/sysman/log



ahc55():/ora/product/oem/10203/oms10g/sysman/log>ls -ltr

total 9794

drwxr-xr-x 2 oracle oinstall 512 Jul 30 2008 pafLogs

-rw-r--r-- 1 oracle oinstall 2498438 Mar 22 19:03 emoms.log

-rw-r--r-- 1 oracle oinstall 2498438 Mar 22 19:03 emoms.trc

ahc55():/ora/product/oem/10203/oms10g/sysman/log>
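
The post does not say whether the logs are deleted or emptied. One option, shown here as a sketch and not as the original method, is to truncate each log file in place, which frees the space while leaving the file where a running process expects it. The function name truncate_logs is made up for illustration. Directories such as pafLogs are left alone because only regular files are touched.

```shell
#!/bin/sh
# Sketch: empty the regular files directly under a log directory.
# truncate_logs is a hypothetical helper name. Truncating in place is a
# swapped-in technique; the original steps only say to "clear" the logs.
truncate_logs() {
    for f in "$1"/*; do
        # Skip directories (for example pafLogs) and unmatched globs.
        [ -f "$f" ] || continue
        : > "$f"        # set the file length to zero, keep the file itself
    done
}

# Usage:
#   truncate_logs /ora/product/oem/10203/oms10g/sysman/log
```

The log contents are lost, so copy anything needed for later diagnosis to another filesystem first.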



Check whether uploading is happening (the file count should now go down):

ahc55(grid):/ora/product/oem/10203/oms10g/sysman/recv>ls | wc -l

94045

ahc55(grid):/ora/product/oem/10203/oms10g/sysman/recv>ls | wc -l

93902

2 comments:

  1. Well explained.
    Thanks for sharing the real time experience.
    Rajesh Yogi

  2. Hi Vijay,

    Your post is interesting. I've a similar issue:
    I've a huge loader backlog. There are now some 67000 files in $OMS_HOME/sysman/recv and that number is increasing. I stopped the HTTP server in order to stop metrics coming in and give Oracle time to load those files, but that went very slowly. In about 1 hour I saw a decrease of 200 files. That is too slow, and I can't stop the HTTP server for a prolonged time.
    I've OMS 10.2.0.1 and RDBMS 10.2.0.4 under Linux.
    Any ideas how to decrease the backlog? I did create a TAR, but Oracle advised upgrading to OMS 10.2.0.5 or 11g. I think it's wise advice, but I can't upgrade right now.

    regards,

    Ivan
