Even Linux servers occasionally encounter an unrecoverable error, which may result in a system halt or server shutdown. Such errors are usually caused by malfunctioning or misconfigured software, but identifying the real culprit is not always easy. Below are some simple steps to restore your services and narrow down the reason for the outage.
Get the server back running
When your web service no longer responds, and you cannot connect to your server via SSH, your system could have encountered an error that it cannot resolve and was forced to shut down to prevent any further damage. In this situation, your first step should be to log in to your UpCloud Control Panel and check your server status.
The indicator colour at the beginning of each server line tells you the state your server is currently in. Green means the server is running, red shows when the server is powered down, and yellow indicates a maintenance mode, for example, when the server is in the process of powering up or down. Click on your server details to go to the Server settings for more information.
If your server is shown to be shut down, click on Start in the Server Management options to power it up.
Once your server is on again and the status shows green, or if the server was already on, try to connect to it using the browser Console. Move to the Console tab and click on Open the console connection.
When the console connects, if your server’s login dialogue greets you, you may proceed using the console or connect using your preferred method. Usually, an SSH connection is again possible after logging in through the console once, even if it doesn’t work right after restarting the server.
If the first console connection does not land you at the login screen but instead shows something completely different, it might be helpful in later troubleshooting to take a screenshot of the console display before restarting the server.
Check the logs for possible causes
With your server up and running, you should check the logs as usual for any indication of what could have caused the system shutdown. Start by looking into when the system was shut down. One handy command for this is the following.
sudo last -1x shutdown
The output line shows when the most recent shutdown occurred.
shutdown system down 3.16.0-4-amd64 Fri Jul 24 12:41 - 12:45 (00:04)
In the example output above, the system was last shut down on the 24th of July at 12:41. The second time in the interval estimates when the system was turned on again, and the value in parentheses is the resulting downtime.
The last command can also show when the server was previously restarted. Use the next command for that.
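To read several shutdown entries at a glance, the fields described above can be pulled out with a few lines of awk. This is only a sketch: shutdown_summary is a hypothetical helper name, and it assumes the column layout shown in the example output, which may differ between versions of last.

```shell
#!/bin/sh
# Hypothetical helper: summarise "shutdown" lines from `last -x` read on stdin.
# Assumes the column layout of the example output in this guide:
#   shutdown system down <kernel> <weekday> <month> <day> <time> - <time> (<downtime>)
shutdown_summary() {
    awk '$1 == "shutdown" {
        downtime = $NF                    # last field, e.g. "(00:04)"
        gsub(/[()]/, "", downtime)        # strip the parentheses
        printf "down %s %s %s at %s, back after %s\n", $5, $6, $7, $8, downtime
    }'
}
```

For example, piping the output of sudo last -x shutdown into shutdown_summary prints one short line per recorded shutdown. Lines that do not start with shutdown are ignored.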
sudo last -1x reboot
The example output for this command, shown below, closely resembles the shutdown version, but the timestamps have slightly different meanings.
reboot system boot 3.16.0-4-amd64 Fri Jul 24 12:52 - 15:37 (02:45)
This output indicates the system was last started on July 24th at 12:52. The second time shows when the log was last updated, and the value in parentheses is the system uptime at the time of that update.
Depending on why the system was shut down or rebooted, some logs might be incomplete, as the error might have stopped the services responsible for writing them. Keep this in mind when checking logs, as they might not have recorded everything leading up to the shutdown. Make sure you go back far enough in the logs to find anything relevant, usually 10 to 20 minutes before the shutdown.
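One way to look at just the minutes before a shutdown is to filter a log by time of day. The sketch below uses a hypothetical helper, log_window, and assumes the traditional syslog timestamp prefix of the form "Jul 24 12:30:05"; logs written with other timestamp formats would need a different filter.

```shell
#!/bin/sh
# Hypothetical helper: print syslog-style lines from stdin that fall on a
# given month and day between two times (inclusive).
# Assumes each line starts with "Mon DD HH:MM:SS", e.g. "Jul 24 12:30:05".
# Usage: log_window MONTH DAY FROM TO
log_window() {
    awk -v m="$1" -v d="$2" -v from="$3" -v to="$4" '
        # Month must match, day is compared numerically (handles "Jul  4"),
        # and fixed-width HH:MM:SS times compare correctly as strings.
        $1 == m && $2 + 0 == d + 0 && $3 >= from && $3 <= to
    '
}
```

With the shutdown time from the earlier example, sudo cat /var/log/syslog | log_window Jul 24 12:20:00 12:41:59 would print the roughly 20 minutes leading up to the shutdown. On systems that use the systemd journal with persistent storage, journalctl with its --since and --until options can serve the same purpose.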
A good way to start troubleshooting with logs is to use this handy command to search through all the various logs at once.
sudo grep -E -i -r 'error|warning|panic' /var/log/
Important logs to look out for are syslog, messages and dmesg.
The command-line tool grep is a powerful way to filter logs, and you may wish to narrow the search down to specific files or try different keywords. Here is a quick explanation of the parameters used in our example search: -E enables extended regular expressions, such as several alternative words separated by | as used here; -i makes the search case-insensitive; and -r makes the search recursive, so that it covers all the files in the specified folder.
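A recursive search over the whole log folder can produce a lot of output, so it may help to first see which files contain the most matches. The following is a minimal sketch: count_matches is a hypothetical helper, and the extra grep options -c (count per file), -s (hide errors about unreadable files) and -I (skip binary files) are additions not used in the search above.

```shell
#!/bin/sh
# Hypothetical helper: count matching lines per log file, busiest file first.
# Usage: count_matches [LOG_DIR] [PATTERN]
# Defaults reuse the folder and keywords from the search in this guide.
count_matches() {
    log_dir="${1:-/var/log}"
    pattern="${2:-error|warning|panic}"
    # grep prints "file:count" for every file; drop files with zero matches,
    # then sort numerically on the count, highest first.
    grep -E -i -r -c -s -I "$pattern" "$log_dir" \
        | grep -v ':0$' \
        | sort -t: -k2,2nr
}
```

Run as root, count_matches with no arguments lists every file under /var/log that has at least one match. A single file from the top of that list can then be searched on its own, for example with sudo grep -E -i 'error|warning|panic' /var/log/syslog, assuming that file exists on your distribution.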
Check for any announcements
In some situations, a server shutdown might have been caused by maintenance or other actions on the underlying hardware. In such cases, UpCloud will always notify affected customers via the email address used at registration. It is important to keep your contact details up to date so that you receive the notification. You can check and update your contact information in your UpCloud Control Panel under your contact details.
Another source of information on the UpCloud services is http://status.upcloud.com/, where you can subscribe to updates via additional email accounts, SMS notices, and Atom or RSS feeds.