Perforce Public Knowledge Base - Fixing a hung Helix Server
Problem
What steps should I take if my Helix Server is not responding?
Solution
If the Helix Server is not responding for a user, follow these steps to isolate the problem.  It is important to determine whether the Helix Server is down, slow, or hung.

1. Run p4 info from the end user workstation.

Have the user who is experiencing the problem run

p4 -p <server IP>:1666 info 

where 1666 is the port. Using the IP address rather than the host name bypasses DNS. If p4 is not installed, have the user download the "P4: Command-Line Client". The p4 info command uses very few resources on the Perforce database.

If p4 info returns a "check $P4PORT" error, double-check that you entered the correct IP address and port number after the -p flag. If they are correct, the error indicates a network connectivity issue, a firewall issue, or, most likely, that Perforce is down.

If p4 info does return, then the Helix Server is up.  The Helix Server may be slow, but it is not down.

If p4 info returns quickly, have the user run a command that uses more resources, such as

p4 changes -t -m 10

Check how fast this runs, or if it hangs. 
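The difference between a server that is down, slow, or hung can be sketched with a small shell helper. This is only an illustration and not part of Perforce: the function name "probe" is hypothetical, it assumes the GNU coreutils "timeout" utility is installed (timeout exits with status 124 when the limit is reached), and it assumes the probed command exits with a non-zero status on failure, as p4 is expected to do when it cannot connect.

```shell
# probe SECONDS COMMAND...
# Hypothetical helper: run COMMAND with a time limit and classify the result.
# Assumes GNU coreutils timeout(1), which exits with 124 on timeout.
probe() (
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    case $? in
        0)   echo "responding" ;;      # command finished within the limit
        124) echo "hung or slow" ;;    # time limit reached
        *)   echo "failed" ;;          # for example, a connection error
    esac
)
```

For example, "probe 10 p4 -p <server IP>:1666 info" followed by "probe 30 p4 -p <server IP>:1666 changes -t -m 10" gives a rough first classification. The time limits are arbitrary and should be adjusted to what is normal for your site.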

You can tell whether the Helix Server is up but waiting on the database by checking the output of

p4 lockstat
p4 lockstat -C

The "p4 lockstat" command shows whether the Helix Server is processing commands that are currently locking the database. If database tables are locked, run this command repeatedly to check whether the locks are being released.

The "p4 lockstat -C" command checks for client workspace or global metadata locks, as described in "Client Workspace and Global Metadata Locks".

If locks are present, you may have to ask the user to stop their processes or perhaps reboot their workstation, or run

p4 monitor show
p4 monitor terminate <pid>

Before running "p4 monitor terminate", make sure that db.monitor.interval is set by checking

p4 configure show allservers

You can set db.monitor.interval by running

p4 configure set db.monitor.interval=30
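The check for db.monitor.interval can be scripted. The sketch below is an illustration only: the function name "monitor_interval_set" is hypothetical, and it assumes that "p4 configure show allservers" prints one configurable per line in a form such as "any: db.monitor.interval = 30". Verify the format against the output of your own server before relying on it.

```shell
# monitor_interval_set
# Hypothetical helper: reads a configurable listing on standard input and
# succeeds only if db.monitor.interval appears with a non-zero value.
# The "name = value" line format is an assumption about the p4 output.
monitor_interval_set() {
    grep -Eq 'db\.monitor\.interval *= *[1-9][0-9]*'
}
```

A possible use is: p4 configure show allservers | monitor_interval_set || echo "db.monitor.interval is not set"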


2. Run p4 info while logged into the Helix Server.
 
    Remote desktop or ssh into the Helix Server

A.  If p4 info does not run on the Helix Server

If p4 info returns on the server, then the Helix Server is up. If p4 info cannot connect, check whether the Helix Server parent process is running.
On Unix, run

ps -elf | grep p4d 

Make sure you do not mistake the grep command line for the actual p4d process.
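One common way to keep the grep command out of its own results is to put one character of the pattern in brackets. The pattern "[p]4d" still matches the text "p4d", but the grep command line contains the literal text "[p]4d", which the pattern does not match. The function name "only_p4d" below is hypothetical and exists only to make the idea easy to reuse.

```shell
# only_p4d
# Hypothetical helper: filters a process listing on standard input down to
# lines that mention p4d, without matching the grep command line itself.
only_p4d() {
    grep '[p]4d'
}
```

A possible use is: ps -elf | only_p4d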

On Windows, check that the Perforce service under Control Panel, Services, is running.

If you cannot run p4 info on the Helix Server machine itself, the server is basically unusable, so you might as well stop the Helix Server. First, check whether CPU and memory are adequate.

On Unix, run

top

and press the number 1 to see each processor.  Determine if the p4d process is consuming all the CPU or memory.

On Windows, run Task Manager and determine if the p4d or p4s process is consuming all the CPU or memory.

Note that overall CPU may be low, but a single processor may be at 100% CPU.

Kill the parent (not child!) pid, then wait up to five minutes for all p4d processes to exit, running "ps -elf | grep p4d" frequently to check. On Unix, use the "kill" or "kill -15" command; there is usually no need for "kill -9". On Windows servers, kill the parent process by stopping the Perforce service or by using Task Manager. If child processes still remain after five minutes, make sure that your running journal is not growing, then kill some of the child p4d processes until the rest terminate normally. Wait until all p4d or p4s processes are gone, then restart Perforce. If Perforce does not start, view the Perforce log for clues.
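The waiting step on Unix can be sketched as a polling loop. This is an illustration only: the function name "wait_until_gone" is hypothetical, and the usage shown after the block assumes that the "pgrep" utility is available and that the server processes are named exactly p4d, which may not be true for your installation.

```shell
# wait_until_gone TRIES INTERVAL COMMAND...
# Hypothetical helper: runs COMMAND up to TRIES times, INTERVAL seconds
# apart, and stops as soon as COMMAND prints nothing.
wait_until_gone() (
    tries=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        if [ -z "$("$@" 2>/dev/null)" ]; then
            echo "all exited"
            exit 0
        fi
        sleep "$interval"
        i=$((i + 1))
    done
    echo "still running"
    exit 1
)
```

After sending "kill -15" to the parent pid, running "wait_until_gone 60 5 pgrep -x p4d" polls every five seconds for about five minutes. If it reports "still running", continue with the journal check and child process cleanup described above.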

B.  If p4 info does run properly on the Helix Server

If p4 info works properly on the Helix Server, the server is up. If p4 info runs without errors on the server but not on the end user's workstation, the cause is most likely a firewall or other network issue. Assuming p4 info runs, try a command that uses more resources, such as

p4 changes -t -m 10

If this command hangs, the Helix Server is running slowly.
In any case, run

p4 lockstat 

to see if the Helix Server database is locked. If the database is locked, run

p4 monitor show -ael 

and look at the oldest commands that are not waiting on a form. (Ignore form commands such as "p4 client": one may be the oldest process, but it is only waiting for the user to complete the form.) The reported command times may not always be accurate, but they provide candidates to kill. To free up commands that are locking the Helix Server, contact the user to stop their command or, assuming db.monitor.interval is set as described earlier, run

p4 monitor terminate <pid>

where <pid> is found from "p4 monitor show -ael".

If the Perforce database is not locked according to p4 lockstat, check whether the Helix Server journal is growing.

tail -f <pathto>/journal 

If the journal is growing rapidly, then the Helix Server is processing commands. Simply wait for the running command to complete. Killing a sub-process that is accessing the Helix Server can corrupt the database files.
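As an alternative to watching "tail -f", journal growth can be measured by comparing the file size at two points in time. The sketch below is an illustration only; the function name "journal_growth" and the default pause of five seconds are hypothetical choices.

```shell
# journal_growth FILE [SECONDS]
# Hypothetical helper: prints how many bytes FILE grew during the pause.
journal_growth() (
    file=$1
    pause=${2:-5}
    before=$(wc -c < "$file")
    sleep "$pause"
    after=$(wc -c < "$file")
    echo $((after - before))
)
```

For example, "journal_growth <pathto>/journal 10" prints the number of bytes written to the journal in ten seconds. A result of 0 on several consecutive runs suggests that the server is not writing to the journal.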

If the journal seems sluggish, then check hardware resources for adequate CPU and memory.  On Unix, use

top

and on Windows, use Task Manager.  If "top" or Task Manager shows a lack of memory or 100% CPU, find the process or thread that uses up the memory or CPU. 

In any case, you may want to use "p4 monitor terminate <pid>" to remove a process or thread. If CPU or memory is at 100%, you may have to run the command on fewer files (such as running "p4 revert" on a smaller directory at a time).
 

3.  From here, you will have plenty to go on.
  1. From p4 lockstat and p4 lockstat -C, you will know if the Helix Server is running, but a command is locking the database.
  2. From running p4 info commands on the same machine as the server, you will isolate network issues from your client to the server.
  3. From "top" or Task Manager, you will know whether you are out of CPU or memory.
  4. From "p4 monitor show -ael", you will know if the Helix Server is overwhelmed with processes and you can start guessing which process you can stop by running "p4 monitor terminate".
Feel free to run "p4 monitor terminate <pid>" anytime. It is a safe way to kill a process and will not harm the Perforce database. If you do this, let the end user know you terminated their command.

If you need off-hours support, note that each of our UK, Canada, US, and Australia offices will support you from Monday through Friday excluding holidays.  Use our support page and dial our international number, or send a new email to support@perforce.com.  If you have an existing case number, place this into the body of the message. 