Even Linux servers can occasionally run into an unrecoverable error that results in a system halt or server shutdown. Such errors are usually caused by malfunctioning or misconfigured software, but tracking down the real culprit is not always easy. Outlined below are some simple steps for restoring your services and narrowing down the reason for the outage.
Get the server back running
When your web service no longer responds and you cannot connect to your server via SSH, it could be that your system encountered an error which it could not resolve, and was forced to shut down to prevent any further damage. In this kind of situation, your first step should be to log in to your UpCloud control panel and check your server status.
The indicator colour at the beginning of each server line shows the state your server is currently in. Green means the server is running, red means the server is powered down, and yellow indicates maintenance mode, for example, while the server is in the process of powering up or down. Click on your server to open the Server settings for more information.
If your server is shown to be shut down, click on Start in the Server Management options to power it up.
Once your server is on again and the status shows green, or if the server was already on, try connecting to it using the browser console. Go to the Console tab and click on Open the console connection.
When the console connects and you are greeted by your server’s login dialogue, you may proceed using the console or by connecting with your preferred method. Usually, an SSH connection is possible again after logging in through the console once, even if it didn’t work right after restarting the server.
If the first console connection does not land you at the login screen but instead shows something completely different, it might be helpful in later troubleshooting to take a screenshot of the console display before restarting the server.
Check the logs for possible cause
With your server up and running, you should check the logs as usual for any indication as to what could have caused the system shutdown to begin with. Start by looking into when the system was shut down. One handy command for this is the following.
sudo last -1x shutdown
The printout line will tell when the most recent shutdown occurred.
shutdown system down 3.16.0-4-amd64 Fri Jul 24 12:41 - 12:45 (00:04)
In the example output above, the system was last shut down on the 24th of July at 12:41. The second time in the interval shows when the system was turned on again, and the final value in parentheses is the length of the downtime.
To find out when the server was last restarted, run the same command with reboot instead.
sudo last -1x reboot
The output, as in the example below, closely resembles the shutdown version, with minor differences in what the timestamps mean.
reboot system boot 3.16.0-4-amd64 Fri Jul 24 12:52 - 15:37 (02:45)
This output indicates the system was last started on the 24th of July at 12:52. The second time shows when the log was last updated, and the final value in parentheses is the system uptime at the time of that update.
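To see more than the single most recent event, the two lookups above can be combined into one. The following is a minimal sketch, assuming a version of last that accepts several names at once and supports -F for full timestamps including the year. The function name recent_power_events is made up for illustration. Reading the login records does not normally require root, so sudo is left out here; run it from a root shell if the records are not readable on your system.

```shell
# Hypothetical helper: list the most recent shutdown and reboot records.
# $1 is the number of records to show (defaults to 5).
recent_power_events() {
    # -x includes shutdown and runlevel entries, -F prints full dates
    # with the year, -n limits the number of lines shown.
    last -x -F -n "${1:-5}" shutdown reboot
}

# Usage on a live server:
#   recent_power_events 10
```

Listing several events together makes it easier to spot a pattern, such as repeated shutdowns at the same time of day.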
Depending on the reason why the system was shut down or rebooted, different levels of logs might be incomplete, as the error might have caused the services responsible for writing logs to stop functioning. This is important to remember when checking logs, as they might not have recorded everything leading up to the system shutdown. Make sure you go back far enough in the logs to find anything relevant, usually 10 to 20 minutes before the shutdown.
A good way to start troubleshooting is to search through all the various logs at once with the following command.
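As a rough illustration of going back 10 to 20 minutes, the sketch below prints only the entries from a chosen time window on a given day. It assumes the traditional syslog line format that begins with month, day and time, such as "Jul 24 12:41:03", and a window that does not cross midnight. The function name log_window and the example times are illustrative and not taken from a real log.

```shell
# Hypothetical helper: print lines from a syslog-style file that fall
# within a time window on one day.
# Usage: log_window MONTH DAY START END FILE
log_window() {
    awk -v m="$1" -v d="$2" -v s="$3" -v e="$4" '
        # Field 1 is the month name, field 2 the day, field 3 HH:MM:SS.
        # Zero-padded times compare correctly as plain strings.
        $1 == m && $2 + 0 == d + 0 && $3 >= s && $3 <= e
    ' "$5"
}

# Example: the 20 minutes leading up to a 12:41 shutdown on 24 July.
#   log_window Jul 24 12:20:00 12:41:59 /var/log/syslog
```

Run it as a user that is allowed to read the log file in question. Logs that use a different timestamp format would need a different filter.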
sudo grep -E -i -r 'error|warning|panic' /var/log/
Important logs to look out for are syslog, messages and dmesg.
The command line tool ‘grep’ is powerful for filtering logs, and you may wish to narrow the search down to specific files or try different keywords. Here is a quick explanation of the parameters used in our example search: -E enables extended regular expressions, such as multiple alternative words separated by | as used here, -i tells grep to ignore the difference between upper and lower case, and finally -r makes the search recursive, so that all the files in the specified folder are searched.
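To illustrate narrowing the search, the sketch below counts matching lines per file so that the noisiest logs stand out before you read any of them in detail. It adds only the standard grep option -c, which prints a count of matching lines instead of the lines themselves. The function name count_log_matches and its default pattern are illustrative choices, and the sorting assumes that file paths contain no colons.

```shell
# Hypothetical helper: count matching lines per file under a directory,
# busiest file first. Files without any match are left out.
# Usage: count_log_matches DIR [PATTERN]
count_log_matches() {
    # grep -c prints "path:count" for every file it reads; read errors
    # such as permission denied are hidden by the redirect.
    grep -E -i -r -c -- "${2:-error|warning|panic}" "$1" 2>/dev/null |
        grep -v ':0$' |
        sort -t: -k2,2 -n -r
}

# Example, from a shell with read access to the logs:
#   count_log_matches /var/log/
#   count_log_matches /var/log/ 'oom|out of memory'
```

Once a file stands out, running the original search against that single file gives a much shorter list to read through.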
Check for any announcements
In some situations a server shutdown might have been caused by maintenance or other actions on the underlying hardware. In such cases UpCloud will always notify affected customers via the email address used to register. It is important to keep your contact details up to date so that you receive these notifications. You can check and update your contact information under your contact details in your UpCloud control panel.
Further information on the UpCloud services is available at http://status.upcloud.com/ where you can subscribe to updates via additional email accounts, SMS notices, and Atom or RSS feeds.