Agent running, yet OEM shows unreachable/metric collection error… what gives?

One of the most confusing (and frustrating) things about Oracle Enterprise Manager is figuring out why agents are not uploading from time to time. This issue was worse in previous versions. In OEM12c the upload problems have been somewhat corrected and no longer seem to be a big issue. What I have found to be more of an issue in OEM12c is an agent that OEM12c reports as unreachable.
An unreachable agent can be caused by almost anything: a firewall blocking the required upload port, invalid DNS entries, hostname configuration problems, or other network-related issues. In most of these cases, apart from agent configuration issues, the DBA doesn't have the access needed to resolve them and must get help from other departments within IT. Sometimes those resources aren't available to help troubleshoot and resolve network issues; for example, when you are configuring OEM12c at home, as I have.
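Before involving other teams, it can help to rule out two of the usual suspects yourself: name resolution and the agent port. The following is a minimal bash sketch, assuming a Linux host with getent and coreutils' timeout available. The function names are hypothetical, and the default port of 3872 is simply the agent port used in the example notification URL in this post.

```shell
#!/usr/bin/env bash
# Minimal connectivity sketch for an "unreachable" agent.
# Assumes bash (for /dev/tcp), getent and coreutils' timeout on Linux.

# Return 0 if NAME resolves via the system resolver (DNS or /etc/hosts).
host_resolves() {
  getent hosts "$1" > /dev/null
}

# Return 0 if a TCP connection to HOST:PORT opens within 5 seconds.
port_open() {
  local host=$1 port=$2
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2> /dev/null
}

# Report on the agent endpoint. The port defaults to 3872, the agent
# port used in this post's example URL (an assumption for your setup).
check_agent_endpoint() {
  local host=$1 port=${2:-3872}
  if ! host_resolves "$host"; then
    echo "FAIL: $host does not resolve (check DNS or /etc/hosts)"
    return 1
  fi
  if ! port_open "$host" "$port"; then
    echo "FAIL: cannot open TCP $host:$port (agent down or firewall?)"
    return 2
  fi
  echo "OK: $host:$port is reachable"
}
```

For example, running check_agent_endpoint myhost.example.com (a hypothetical host name) from the OMS host reports whether the name resolves and whether port 3872 accepts connections. On a server without DNS, a resolution failure points at a missing or wrong /etc/hosts entry.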
The problem I recently ran into, while adding an agent to a local server without DNS, was that the agent would install and run, yet the OMS couldn't sync with it because of metric collection errors. After the install the agent was up and running, but OEM still wouldn't recognize it. This prompted me to uninstall the agent and push it again. From the OEM12c management server, I was able to push the agent and add the target successfully through the Add Host Targets wizard. Even though the host target was added successfully, its status in OEM12c still showed “Unreachable” due to a Metric Collection Error. Why is this target unreachable when I had just added the agent and had no problems with the push from OEM12c?
I was perplexed, to say the least; however, searching MOS turned up a note that helped resolve this issue. For reference, the note number is 1440682.1. It describes symptoms similar to the ones I was seeing and provides a workable solution. What I found interesting is that the note attributes this issue to an unpublished bug. It also gives examples of the messages OEM may send in notifications.
The misleading message that may be received via notification emails is:
Message=Agent is Unreachable (REASON = unable to connect to the agent at https://hostname.domain:3872/emd/main/ [Connection timed out]). Host is unreachable (REASON = Unknown Error pinging the host of URL https://hostname.domain:3872/emd/main/.1)
As outlined in the MOS note, the workaround for this issue is to check whether the ping property is set. If it is, it needs to be removed so that the target host status can change. The following steps help resolve this issue:
On the OMS, check the property with emctl:
./emctl get property -name oracle.sysman.core.omsAgentComm.ping.pingCommand
If it returns this value:
Emdrep.ping.pingCommand=%EM_PING_COMMAND%
then this property value is the reason for the invalid agent status and the metric collection error in OEM12c.
Remove the property to disable the ping command:
./emctl delete property -name "oracle.sysman.core.omsAgentComm.ping.pingCommand"
With this property removed, the OMS pings targets using an alternative method (getPingCmdForOS) that succeeds.
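To avoid deleting a property that holds a valid value, the check and delete steps above can be combined so that the delete only runs when the value still contains %EM_PING_COMMAND%. This is a sketch of my own, not something from the MOS note. EMCTL is a hypothetical variable pointing at the OMS emctl (it defaults to ./emctl, matching the commands above). The sketch has not been run against a live OMS, and if emctl prompts for credentials, it does not handle that.

```shell
#!/usr/bin/env bash
# Sketch: delete the OMS ping property only if it still holds the
# %EM_PING_COMMAND% value described in MOS note 1440682.1.
# Assumption: EMCTL points at the OMS emctl (run from the OMS bin dir).
EMCTL=${EMCTL:-./emctl}
PING_PROP=oracle.sysman.core.omsAgentComm.ping.pingCommand

# Return 0 if "emctl get property" output shows the problem value.
is_placeholder_ping_command() {
  grep -q '%EM_PING_COMMAND%' <<< "$1"
}

# Read the property, print it, and delete it only when it is the
# problem value; otherwise leave it alone.
fix_ping_property() {
  local current
  current=$("$EMCTL" get property -name "$PING_PROP") || return 1
  echo "$current"
  if is_placeholder_ping_command "$current"; then
    echo "Found %EM_PING_COMMAND%; deleting $PING_PROP"
    "$EMCTL" delete property -name "$PING_PROP"
  else
    echo "Ping command is not %EM_PING_COMMAND%; leaving it unchanged"
  fi
}
```

Run fix_ping_property from the OMS bin directory, then restart the OMS as described next.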
Now stop and restart the OMS:
./emctl stop oms
./emctl start oms
Lastly, make sure the agent is started and that OEM shows its status as Up (this may take a few minutes while the agent uploads).
Let me know what you think about this resolution to an interesting problem.
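The final check can also be scripted on the agent host. This sketch assumes AGENT_EMCTL (a hypothetical variable) points at the agent's emctl, and that a healthy 12c agent prints "Agent is Running and Ready" in its emctl status agent output. It polls that output and, once the agent is ready, runs emctl upload agent so collected data goes to the OMS without waiting for the next scheduled upload.

```shell
#!/usr/bin/env bash
# Sketch for the agent host: wait for the agent to report ready, then
# force an upload. Assumptions: AGENT_EMCTL points at the agent's emctl
# and a healthy agent prints "Agent is Running and Ready".
AGENT_EMCTL=${AGENT_EMCTL:-./emctl}

# Return 0 if "emctl status agent" output reports a ready agent.
agent_is_ready() {
  grep -q 'Agent is Running and Ready' <<< "$1"
}

# Poll up to TRIES times, DELAY seconds apart; upload once ready.
wait_for_agent() {
  local tries=${1:-10} delay=${2:-30} i status
  for (( i = 1; i <= tries; i++ )); do
    status=$("$AGENT_EMCTL" status agent)
    if agent_is_ready "$status"; then
      "$AGENT_EMCTL" upload agent
      return
    fi
    echo "Attempt $i/$tries: agent not ready yet; retrying in ${delay}s"
    sleep "$delay"
  done
  echo "Agent still not ready after $tries attempts"
  return 1
}
```

For example, wait_for_agent 10 30 checks up to ten times, 30 seconds apart. Even after a successful upload, the status in the OEM console can take a few minutes to change to Up.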
