Ever have one of those days when someone calls and says “We/I accidentally deleted the whole directory; can you get it back for me?” Well, over the weekend I had that happen with an OEM 12c agent on an Exadata, where the core directory for the agent was deleted by mistake. Before I could evaluate the situation, I had to reassure the end user that the removal of the core directory under the agent home wasn’t a major issue. The agent was still running in memory and reporting to OEM. For all intents and purposes, the agent was still monitoring the Exadata node.
After the end user was assured that the problem could be fixed, the question became: How can the missing core directory be replaced?
The simplest way is to reinstall the agent silently; however, a full reinstall takes a lot of time and effort to get working again. I wanted the shortest possible way to recover the directory so there would not be a huge window of unmonitored time.
In this post, I want to show you how I recovered the agent’s core directory without losing any targets or having to resync the agent from OEM afterwards.
Note: I have a good post on agent silent installs located here, which is helpful for understanding some of the process that was used. Although I cover every environment in my post on silent installs, Maaz Anjum covers silent installs for Windows pretty well too; check it out here.
As I mention in the note above, I needed to pull the correct agent binaries from the OMS library (outlined in the silent install post). Once I had the binaries extracted to a temporary location, I needed to edit the response file (agent.rsp). The response file was edited according to the silent install post.
The values that were changed within the response file were for:
All the values for these variables need to match what the existing agent had (information can be found in OEM under Setup –> Manage Cloud Control –> Agents).
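Before running the deployment, it can help to confirm that the response file actually has a value for each variable you care about. The following is only a minimal sketch: check_rsp is a hypothetical helper (not part of the agent software), and it assumes the response file uses plain KEY=value lines. The key names are passed in by you; the ones in the usage line are simply taken from the agentDeploy.sh help examples.

```shell
#!/bin/sh
# check_rsp FILE KEY...
# Hypothetical helper: report every KEY that has no non-empty
# KEY=value line in FILE. Assumes plain KEY=value syntax.
# Returns 0 when all keys are set, 1 otherwise.
check_rsp() {
    rsp_file="$1"
    shift
    missing=0
    for key in "$@"; do
        # Match "KEY=" at the start of a line followed by at least one character.
        if ! grep -q -E "^${key}=.+" "$rsp_file"; then
            echo "missing or empty: ${key}"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage would look something like: check_rsp /tmp/agent_12030/agent.rsp OMS_HOST EM_UPLOAD_PORT. This only checks that values are present; comparing them to what OEM shows for the existing agent is still a manual step.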
With the response file ready, the currently running agent needs to be stopped before I can run a silent install. With the core directory gone, the only way to stop the running agent is to use “ps -ef | grep agent” and “kill -9 <process id>”.
Reminder: there will be two (2) processes that need to be killed: one is a Perl process and the other is a Java process. Once the processes are killed, the agent that was running is down.
Once the agent is down, installing the agent software using the silent install method can be done. Now, here is where the install process becomes different from a normal agent installation. Earlier I said the core directory under the agent home was deleted. This means that everything else is still in place; I only have to relink the core directory of the binaries. How can I get only the core directory out of the binaries?
In trying to answer this question, I used the -help option with the agentDeploy.sh script. The -help option provides a few examples of how to use the agentDeploy.sh script. I have listed these examples below:
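To make it easier to pick out the two processes, the ps output can be filtered down to just the PIDs. This is a rough sketch, not the only way to do it: find_agent_pids is a hypothetical helper, and it assumes the Perl and Java command lines contain the agent base directory path and that ps -ef prints the PID in column 2 and the command starting in column 8 (as it does on Linux).

```shell
#!/bin/sh
# find_agent_pids AGENT_BASE
# Hypothetical helper: reads `ps -ef` style output on stdin and prints
# the PID of every perl or java process whose command line mentions
# AGENT_BASE. Assumes PID is column 2 and the command starts in column 8.
find_agent_pids() {
    agent_base="$1"
    awk -v base="$agent_base" '
        # Skip the header (non-numeric PID); keep lines that mention the
        # agent base directory and whose executable ends in perl or java.
        $2 ~ /^[0-9]+$/ && index($0, base) > 0 && $8 ~ /(perl|java)$/ { print $2 }
    '
}
```

For example: ps -ef | find_agent_pids /u01/app/oracle/product/agent12c. Review the PIDs it prints before passing them to kill -9; the grep or awk process itself is not listed because its executable is neither perl nor java.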
agentDeploy.sh AGENT_BASE_DIR=/scratch/agent OMS_HOST=hostname.domain.com EM_UPLOAD_PORT=1000 INVENTORY_LOCATION=/scratch AGENT_REGISTRATION_PASSWORD=2Bor02B4
This command is expected to do the complete agent install and configuration with the provided inputs.
agentDeploy.sh AGENT_BASE_DIR=/scratch/agt RESPONSE_FILE=/scratch/agent.rsp -softwareOnly -invPtrLoc /scratch/agent/oraInst.loc -debug
This command is expected to copy the agent bits to the agent base directory.
agentDeploy.sh AGENT_BASE_DIR=/scratch/agent OMS_HOST=hostname.domain.com EM_UPLOAD_PORT=1000 -forceConfigure
This command is expected to do the agent install and also force the agent configuration even though the oms host and port are not available.
agentDeploy.sh AGENT_BASE_DIR=/scratch/agent AGENT_INSTANCE_HOME=/scratch/agent/agent_inst -configOnly
This command is expected to do the agent configuration only with the provided inputs.
agentDeploy.sh AGENT_BASE_DIR=/scratch/agent s_agentHomeName=myAgent -ignorePrereqs
This command is expected to skip the prereqs and then continue with the agent deployment. Also notice in the inventory that, instead of the default home name, the myAgent home name will be assigned to the agent home.
As I looked at the examples, I noticed the second example (the one using -softwareOnly): a software-only install. I decided to give that a try. Keep in mind all I needed was the core directory. The command I used to do a software-only install was:
./agentDeploy.sh AGENT_BASE_DIR=/u01/app/oracle/product/agent12c RESPONSE_FILE=/tmp/agent_12030/agent.rsp -softwareOnly
As the deployment started, I noticed that the rebuilds and relinks for all the binaries were being performed on the agent home. Once the deployment finished updating all the dependencies, it completed successfully and returned me to the command prompt.
The software-only deployment of the silent install replaced the missing core directory in the agent home. Now the only question left was: will the agent start?
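A quick way to confirm that the core directory really is back is to look for an executable emctl under it. The snippet below is just a sketch under the assumption that emctl lives at core/<version>/bin/emctl beneath the agent base directory; find_emctl is a hypothetical helper name.

```shell
#!/bin/sh
# find_emctl AGENT_BASE
# Hypothetical helper: print every executable emctl found under
# AGENT_BASE/core/<version>/bin. Returns 0 if at least one was found,
# 1 otherwise.
find_emctl() {
    agent_base="$1"
    found=1
    for candidate in "$agent_base"/core/*/bin/emctl; do
        # If the glob matches nothing, the literal pattern fails the -x test.
        if [ -x "$candidate" ]; then
            echo "$candidate"
            found=0
        fi
    done
    return "$found"
}
```

For example: find_emctl /u01/app/oracle/product/agent12c. An empty result would suggest the software-only deployment did not restore the directory.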
To test if the agent would start, I needed to go into the agent home:
$ cd /u01/app/oracle/product/agent12c/core/12.1.0.3.0/bin
$ ./emctl status agent
In running the above commands, I was expecting to see the agent status as down since I had just completed the agent deployment. What I received instead was an unusual error. The error was:
$ ./emctl status agent
EM Configuration issue. #DEFAULT_EMSTATE# not found.
In researching this error (#DEFAULT_EMSTATE#) in My Oracle Support (MOS), I found only two notes (1607805.1 and 1543473.1). From reading the notes and reviewing the emctl file under the core directory, I identified that the problem was a configuration problem. In order to fix this configuration problem, what needed to be done?
To make a long story short, the simplest way to fix this issue was to copy an emctl from another Exadata node. The reason this was the simplest is that all the nodes have the same agent home configuration. Once the updated emctl was put in place, I was able to start the agent and get all the information I wanted from it.
With the agent running, my next question was: how would OEM react to the agent being reconfigured/rebuilt this way? To my surprise, OEM didn’t have a problem. The agent was able to upload with no issues and OEM reported that no re-syncing was needed. The only thing I can conclude from this is that the configuration files were never deleted, so when the core directory was relinked, OEM saw everything as it was before the core directory was deleted.
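When comparing the freshly deployed emctl against the one from the other node, it can be handy to scan the file for leftover template tokens. This is a sketch only: it assumes the unconfigured emctl contains literal tokens of the form #NAME# (such as the #DEFAULT_EMSTATE# from the error message), and find_placeholders is a hypothetical helper.

```shell
#!/bin/sh
# find_placeholders FILE
# Hypothetical helper: list "line:token" for every #UPPER_CASE# style
# token in FILE. Exits non-zero when no such token is found.
find_placeholders() {
    # -n prefixes the line number, -o prints only the matching token.
    grep -n -o -E '#[A-Z][A-Z0-9_]*#' "$1"
}
```

For example, run find_placeholders against the emctl under the core bin directory before and after copying the file over. No output after the copy suggests the copied emctl has no unresolved tokens of this form.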