OpenNMS Failing to Start

This post outlines some troubleshooting steps to take when OpenNMS refuses to start. In this scenario, the service status looked like the following before eventually stopping:

OpenNMS.Eventd         : start_pending
OpenNMS.Trapd          : start_pending
OpenNMS.Queued         : start_pending
OpenNMS.Actiond        : start_pending
OpenNMS.Notifd         : start_pending
OpenNMS.Scriptd        : start_pending
OpenNMS.Rtcd           : start_pending
OpenNMS.Pollerd        : start_pending
OpenNMS.PollerBackEnd  : start_pending
OpenNMS.Ticketer       : start_pending
OpenNMS.Collectd       : start_pending
OpenNMS.Discovery      : start_pending
OpenNMS.Vacuumd        : start_pending
OpenNMS.EventTranslator: start_pending
OpenNMS.PassiveStatusd : start_pending
OpenNMS.Statsd         : start_pending
OpenNMS.Provisiond     : start_pending
OpenNMS.Reportd        : start_pending
OpenNMS.Alarmd         : start_pending
OpenNMS.Ackd           : start_pending
OpenNMS.JettyServer    : start_pending
opennms is partially running
[00:11:56]-> service opennms status
Could not connect to 127.0.0.1 on port 8181 (OpenNMS might not be running or could be starting up or shutting down): Connection refused
opennms is stopped

With the web server rendering:

Service Temporarily Unavailable

The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

A key step is analysing the daemon logs for potential issues. They can be found in /var/log/opennms/daemon/ (run tail -f /var/log/opennms/daemon/* to follow them in real time while you start the service in another terminal).

You may see exceptions like the following, but they are commonly only a symptom of another problem. In other words, OpenNMS is unable to call stop because the service could never start:

2014-06-15 00:19:29,166 DEBUG [Main] Invoker: Invoking stop on object OpenNMS:Name=Vacuumd
2014-06-15 00:19:29,172 ERROR [Main] Invoker: An error occurred invoking operation stop on MBean OpenNMS:Name=Vacuumd: javax.management.RuntimeMBeanException: java.lang.NullPointerException
javax.management.RuntimeMBeanException: java.lang.NullPointerException
2014-06-15 00:19:29,231 DEBUG [Main] Manager: Thread dump completed.
2014-06-15 00:19:29,232 DEBUG [Main] Manager: memory usage (free/used/total/max allowed): 47401760/116815072/164216832/1200160768
2014-06-15 00:19:29,232 INFO  [Main] Manager: calling System.exit(1)
An error occurred while attempting to start the "OpenNMS:Name=Notifd" service (class org.opennms.netmgt.notifd.jmx.Notifd).  Shutting down and exiting.
javax.management.RuntimeMBeanException: java.lang.reflect.UndeclaredThrowableException
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)

However, this is indicative of a startup problem, and a common reason for it is broken config files. A good step is to look closely at any files you or others have modified and check them. Alternatively, you can parse the XML for syntax errors using xmllint:

xmllint --noout /etc/opennms/*xml

At least in my case, this showed that a “<” was missing at the beginning of a file I had been working on; I had somehow accidentally removed the character.

notifd-configuration.xml:1: parser error : Start tag expected, '<' not found
?xml version="1.0" encoding="UTF-8"?>
^

Fix this up and perform another service start on OpenNMS. It can take a while to start but you can keep checking the status to see if the service remains active until it eventually gets into a permanent working state.
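If xmllint is not installed, the same well-formedness check can be sketched with Python's standard library. This is a minimal sketch rather than anything OpenNMS ships: the function name find_broken_xml is made up for illustration, and like xmllint --noout it only checks XML syntax, not whether the configuration is valid for OpenNMS.

```python
import glob
import xml.etree.ElementTree as ET


def find_broken_xml(paths):
    """Return {path: parser error} for every file that is not well-formed XML."""
    broken = {}
    for path in paths:
        try:
            ET.parse(path)
        except ET.ParseError as err:
            # The message includes the line and column where parsing failed.
            broken[path] = str(err)
    return broken


if __name__ == "__main__":
    # Same set of files as the xmllint command above.
    config_files = sorted(glob.glob("/etc/opennms/*.xml"))
    for path, error in find_broken_xml(config_files).items():
        print(f"{path}: {error}")
```

An empty result means every file parsed, so the cause of the failed start is probably elsewhere.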


Salt Failing to Highstate due to Keys on Upgrade to 2014

Upon upgrading from a 0.17.x to a 2014.x version of SaltStack, some care needs to be taken to switch out the keys in the process. “Deleting” the keys (with ‘salt-key -D’) will not be sufficient on its own.

After upgrading an existing Salt Master, you may notice issues when attempting to highstate a Minion (even with the Minion and Master on the same box). Running the highstate in debug mode will show delays and repeated attempts to load the keys.

[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/master.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/master.conf
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem

The solution is to delete everything within /etc/salt/pki/* on the Master and Minion. Then, delete the existing key from the Master (salt-key -D for all keys, or salt-key -d $key for a single key). Proceed to restart the Minions and Master. You can test connectivity via “sudo salt \* test.ping” and proceed with another highstate to see if the problem remains.


Apache Redirect to Alternate Domain Whilst Passing URI

This covers moving a site such as domain.com/oldblog to domain.com/blog using Apache/Apache2. The requirements were to maintain the URI and to issue a 301 for SEO purposes. The first rule catches a hit on domain.com/oldblog directly. The second rule catches the trailing slash, so it redirects when you hit domain.com/oldblog/ . The last rule carries across the rest of the URI, so hitting domain.com/oldblog/something/wat.jpg will forward through to domain.com/blog/something/wat.jpg . The lack of “L” in the first two rules tells the web server that this is not the final rule for this pattern, and to continue parsing other rewrites.

RewriteRule ^/oldblog$ /blog [R=301]
RewriteRule ^/oldblog/$ /blog [R=301]
RewriteRule ^/oldblog/(.*)$ /blog/$1 [R=301,L]
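The intended mapping can be sanity-checked outside Apache with a few lines of Python. This is only a sketch of the behaviour described above, not a model of mod_rewrite itself, and the function name redirect_target is made up for illustration.

```python
import re


def redirect_target(path):
    """Return the path a request should be 301-redirected to, or None."""
    # Bare /oldblog, with or without the trailing slash.
    if path in ("/oldblog", "/oldblog/"):
        return "/blog"
    # Anything below /oldblog/ keeps the remainder of its path.
    match = re.match(r"^/oldblog/(.*)$", path)
    if match:
        return "/blog/" + match.group(1)
    # Unrelated paths are left alone.
    return None


if __name__ == "__main__":
    for path in ("/oldblog", "/oldblog/", "/oldblog/something/wat.jpg", "/about"):
        print(path, "->", redirect_target(path))
```

Running a handful of real URLs from the old site through a check like this before deploying the rules is a cheap way to catch a pattern that swallows or duplicates part of the path.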


Cloudflare and Transfer-Encoding of Absent or Chunking in a 403 Java App

There was a problem where a Java application was returning this error after a recent switch to Cloudflare:

Application Error
Application Error	 	A general application error has occurred. 
(java.io.IOException) 
Server returned HTTP response code: 403 for URL: https://url.com:443/somepath.xml

It was noticed that the HTTP response header “Transfer-Encoding: chunked” was absent through Cloudflare, whereas it was present when hitting the web server directly.

One option was to look into Cloudflare's different levels of caching. They have basic, simplified and aggressive.

Transfer-Encoding denotes the encoding HTTP uses to transfer the response body to the client, for example chunked.
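A quick way to spot header differences like this is to fetch the same path from the origin and through the proxy and compare the two sets of headers. The following is a minimal sketch using only the Python standard library; the function names fetch_headers and header_diff are made up for illustration, and you would pass in your own origin and proxied URLs.

```python
import urllib.error
import urllib.request


def fetch_headers(url):
    """Return the response headers for url as a dict with lower-cased names."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            headers = resp.headers
    except urllib.error.HTTPError as err:
        # Error responses such as a 403 still carry headers worth comparing.
        headers = err.headers
    return {name.lower(): value for name, value in headers.items()}


def header_diff(direct, proxied):
    """Return {header: (direct value, proxied value)} for headers that differ."""
    diff = {}
    for name in sorted(set(direct) | set(proxied)):
        if direct.get(name) != proxied.get(name):
            # None marks a header that is missing on that side.
            diff[name] = (direct.get(name), proxied.get(name))
    return diff
```

Calling header_diff(fetch_headers(origin_url), fetch_headers(proxied_url)) lists every header that was added, dropped or changed along the way. A difference in Transfer-Encoding on its own is not necessarily the cause of an error, as turned out to be the case here.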

It turns out the problem was that the Cloudflare Web Application Firewall was adding rules to block IPs. This feature is hard to find in the current design of their website; you can find and configure it here: https://www.cloudflare.com/waf . Please note this is only for paid accounts.


Github Pull Request HTTP Request Failed

Receiving the message below when attempting a git push on a repo can be caused by a few things, but take note that it may be a permissions issue. If you do not have permission to create and push a new branch, or to push to master, this message will occur.

error: The requested URL returned error: 403 while accessing https://github.com/someproject.git/info/refs
fatal: HTTP request failed

If you are looking to create a pull request against a project of this nature, the process is to fork the project and commit to your fork. Then navigate to the original project, click through to create a pull request and choose to compare against a remote fork; it will pick up your commits. This allows you to create a pull request on the original project. For the future, you may also like to set the original as the upstream of your fork, so you are able to git fetch and merge changes from the original into your clone.


Postfix and Sendmail MTA Rejecting Mail

A recent issue I’ve experienced is that a person is unable to send mail to a certain email address from a script operating on another server, using its local MTA. Emails from all other places are delivered to this address successfully. Investigation on the receiving server shows no evidence of rejection in the mail log.

The sending server returns something like the following when attempting to send a message via telnet to localhost on port 25:

Recipient address rejected: User unknown in virtual alias table

This is a bit confusing at first because it suggests that the receiving server is incorrectly configured and will not accept mail for this alias. In reality, the issue is with the sending server. Grepping around in /etc/postfix on the sending server shows:

virtual:$domain OK | virtual:problemalias@$domain.com $mailbox 

Delete this virtual alias configuration from the sending server to fix the issue, or else arrange to relay via an alternate MTA.


Presence of javac (Java Compiler)

Even with some openjdk jdk or jre installed, javac may not be present on a Debian system. You may receive a message like “The program ‘javac’ can be found in the following packages:” or “bash: javac: command not found” when trying to use it.

It may be worth seeing whether “find” turns up anything named ‘javac’ in /usr/lib/jvm/*, or, if you have been playing with multiple versions, what is configured in /etc/alternatives or /usr/bin .

Otherwise, javac can be pulled in via the following:

 apt-get install default-jdk

You should then be able to run it by typing “javac” in your shell or by calling /usr/bin/javac , which is a symlink to /etc/alternatives/javac , which could in turn be a symlink to [depending on architecture] /usr/lib/jvm/java-6-openjdk-amd64/bin/javac

Note you may also want to try the following on CentOS:
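To see exactly where javac ends up on a given machine, the symlink chain can be followed one hop at a time. A minimal Python sketch follows; the function name symlink_chain is made up for illustration and the fallback path is simply the one mentioned above.

```python
import os
import shutil


def symlink_chain(path, max_hops=40):
    """Follow symlinks one hop at a time and return every path visited."""
    chain = [path]
    while os.path.islink(path) and len(chain) <= max_hops:
        target = os.readlink(path)
        # Relative link targets are resolved against the link's own directory.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        chain.append(path)
    return chain


if __name__ == "__main__":
    # Use whatever javac is on the PATH, falling back to the usual location.
    start = shutil.which("javac") or "/usr/bin/javac"
    for hop in symlink_chain(start):
        print(hop)
```

If the last path printed does not exist, the alternatives entry is pointing at a JDK that has been removed.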

yum install java-devel


Troubleshooting Python/Django and Mysql/Percona Display Error

Here are a few quick steps in relation to a Django website that began rendering an error page with no apparent cause. The developer claimed no changes to the codebase had been made, and timestamps of relevant files and recent logins seemed to support this. It ended up being an unusual environment problem; the notes below cover some basic troubleshooting steps (found via Google) that failed, followed by the solution at the end.

It was discovered that some packages had been upgraded as part of scheduled updates:

Mar 18 23:01:45 Updated: Percona-Server-shared-55-5.5.36-rel34.1.el5.x86_64
Mar 18 23:01:45 Updated: Percona-Server-client-55-5.5.36-rel34.1.el5.x86_64
Mar 18 23:02:12 Updated: Percona-Server-server-55-5.5.36-rel34.1.el5.x86_64
Mar 18 23:02:13 Updated: gnutls-1.4.1-14.el5_10.x86_64
Mar 18 23:02:13 Updated: postgresql-libs-8.1.23-10.el5_10.x86_64
Mar 18 23:02:16 Updated: tzdata-2014a-1.el5.x86_64
Mar 18 23:02:17 Updated: tzdata-java-2014a-1.el5.x86_64
Mar 18 23:02:17 Updated: sudo-1.7.2p1-29.el5_10.x86_64

This timing lines up with when the proxy server began logging errors:

[Tue Mar 18 23:17:02 2014] [error] (111)Connection refused: proxy: HTTP: attempt to connect to 127.0.0.1:8001 (127.0.0.1) failed
[Tue Mar 18 23:17:02 2014] [error] ap_proxy_connect_backend disabling worker for (127.0.0.1)
[Tue Mar 18 23:17:02 2014] [error] proxy: HTTP: disabled connection for (127.0.0.1)
[Tue Mar 18 23:17:22 2014] [error] proxy: HTTP: disabled connection for (127.0.0.1)
[Tue Mar 18 23:17:22 2014] [error] proxy: HTTP: disabled connection for (127.0.0.1)
[Tue Mar 18 23:17:24 2014] [error] proxy: HTTP: disabled connection for (127.0.0.1)

The error from the web server returned:

2014-03-20 15:05:20.722777500 return getattr(connections[DEFAULT_DB_ALIAS], item)
2014-03-20 15:05:20.722816500 File "/home/site/.pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/django/db/utils.py", line 92, in __getitem__
2014-03-20 15:05:20.7264646460 backend = load_backend(db['ENGINE'])
2014-03-20 15:05:20.724544540 File "/home/site/.pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/django/db/utils.py", line 24, in load_backend
2014-03-20 15:05:20.64276766 return import_module('.base', backend_name)
2014-03-20 15:05:20.332423400 File "/home/site/.pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/django/utils/importlib.py", line 35, in import_module
2014-03-20 15:05:20.724548500 __import__(name)
2014-03-20 15:05:20.723067500 File "/home/site/.pythonbrew/pythons/Python-2.7.3/lib/python2.7/site-packages/django/db/backends/mysql/base.py", line 16, in
2014-03-20 15:05:20.723464640 raise ImproperlyConfigured("Error loading MySQLdb module: %s" % e)
2014-03-20 15:05:20.35350 django.core.exceptions.ImproperlyConfigured: Error loading MySQLdb module: libmysqlclient.so.18: cannot open shared object file: No such file or directory
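The last line of that traceback is a dynamic linker error rather than a Django one, so it can be reproduced without starting the site. The following is a small sketch using ctypes from the standard library; the function name can_load is made up for illustration.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic linker can find and load the shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the file is missing or cannot be opened.
        return False
    return True


if __name__ == "__main__":
    # The library named in the traceback above.
    print(can_load("libmysqlclient.so.18"))
```

Run with the same Python interpreter the site uses, a False result confirms that the problem is the missing shared object and not the Django code.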

Attempting to upgrade or otherwise manipulate the mysql-python package resulted in:

:~# pip install --upgrade mysql-python
Downloading/unpacking mysql-python
Downloading MySQL-python-1.2.5.zip (108Kb): 108Kb downloaded
Running setup.py egg_info for package mysql-python
sh: mysql_config: command not found
Traceback (most recent call last):
File "", line 14, in
File "/home/site/.pythonbrew/venvs/Python-2.7.3/site/build/mysql-python/setup.py", line 17, in
metadata, options = get_config()
File "setup_posix.py", line 43, in get_config
libs = mysql_config("libs_r")
File "setup_posix.py", line 25, in mysql_config
raise EnvironmentError("%s not found" % (mysql_config.path,))
EnvironmentError: mysql_config not found
Complete output from command python setup.py egg_info:
sh: mysql_config: command not found

Traceback (most recent call last):
File "", line 14, in
File "/home/site/.pythonbrew/venvs/Python-2.7.3/site/build/mysql-python/setup.py", line 17, in
metadata, options = get_config()
File "setup_posix.py", line 43, in get_config
libs = mysql_config("libs_r")
File "setup_posix.py", line 25, in mysql_config
raise EnvironmentError("%s not found" % (mysql_config.path,))

The libmysqlclient-dev package is not available on this CentOS 5 install, so an attempt was made to upgrade mysql-devel instead, which was also unsuccessful.

In the end, the problem came down to the fact that this server is running Percona (a MySQL fork) rather than stock MySQL. The absence of the following package was the root cause and installing it fixed the issue:

Percona-Server-devel-55.x86_64 0:5.5.36-rel34.1.el5

As for why this was missing in the first place? It looks like, after some very unusual updates, the existing packages no longer listed it as a dependency, so it was never brought in and installed.


Checking How a Linux Server Reboot was Triggered

Let’s say you suspect a reboot has occurred on a Linux machine (a Pingdom alert or other monitoring detects it, etc.) and you want to find out why and how. This guide is based on a Debian install, so details may differ slightly between distros.

Start off with one of the following to ascertain how long the machine has been up [and so the time since the last reboot]:

uptime

Or a display of the time of last system boot:

who -b

Or the last reboot through ‘last’:

last reboot

Or use last to also display logged in users, to see if a user was logged in prior to reboot:

last -a

Or the ‘w’ utility can also help:

w
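If you want the boot time as a timestamp to line up against log entries, it can be derived from the uptime figure. This is a minimal Python sketch, assuming a Linux system that exposes /proc/uptime; the function names are made up for illustration.

```python
from datetime import datetime, timedelta


def read_uptime_seconds(path="/proc/uptime"):
    """Return the seconds since boot, taken from the first field of /proc/uptime."""
    with open(path) as handle:
        return float(handle.read().split()[0])


def boot_time(uptime_seconds, now):
    """Return the moment the machine booted, given its uptime and the current time."""
    return now - timedelta(seconds=uptime_seconds)
```

Calling boot_time(read_uptime_seconds(), datetime.now()) gives roughly the same answer as who -b, in a form that is easy to compare with the timestamps in the logs checked in the following steps.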

Now that we’ve established that, have a look through the bash history of any user you think may have triggered it themselves and compare the relevant timestamps (only possible if bash history is configured to record timestamps, though the history can still be useful if not):

less /home/$user/.bash_history
less /root/.bash_history

Check the package manager log to ascertain whether any updates occurred near the time in question and could have triggered a reboot:

less /var/log/dpkg.log

Peruse the general message and system logs for timestamps near the reboot. Take note that most of the logging will be irrelevant and will simply show what the machine was doing while it was booting:

less /var/log/messages
less /var/log/syslog

You may like to explore possibilities based off what information you discover. For example, a reference to CRON may cause you to look through the following log and then look at what cronjobs are active, etc:

 less /var/log/cron 

If by now you still haven’t determined a cause, it’s possible that the reboot was triggered from outside of the machine. This could be the equivalent of pressing the power button on a physical machine or triggering a reboot of a virtual machine guest on a hypervisor. For KVM in particular, “virsh destroy $guest” could explain the symptoms.


Using Mysql Instead of SQLite For a Drum (Mezzanine) Project

This post demonstrates how to use MySQL instead of SQLite for a Mezzanine project. The documentation is a tad unclear, so hopefully this will be of use to someone. You can replace “mysql” with “postgresql_psycopg2” or “oracle” to make use of an alternate RDBMS, but this procedure is tailored for MySQL and Debian.

Get your environment ready; in this case we’ll be using a virtualenv. Note that failing to install python-dev will cause the Mezzanine install to fail.

 apt-get install python-pip python-virtualenv python-dev python-setuptools python-imaging build-essential libmysqlclient-dev

Create your virtualenv and activate it:

 virtualenv $nameofenv 
 source $nameofenv/bin/activate

Install some things:

 pip install mezzanine fabric virtualenvwrapper drum

Create the mezzanine project:

 mezzanine-project -a drum $nameofsite
 cd $nameofsite 

Install mysql-python:

 pip install mysql-python

Installing it now avoids the following error, which appears if you carry on without it:

django.core.exceptions.ImproperlyConfigured: Error loading MySQLdb module: No module named MySQLdb

Also, if the install of mysql-python fails as below, it means the dependency libmysqlclient-dev is not installed:

Traceback (most recent call last):
File "", line 14, in
File "/home/$nameofsite/build/mysql-python/setup.py", line 17, in
metadata, options = get_config()
File "setup_posix.py", line 43, in get_config
libs = mysql_config("libs_r")
File "setup_posix.py", line 25, in mysql_config
raise EnvironmentError("%s not found" % (mysql_config.path,))
EnvironmentError: mysql_config not found
----------------------------------------
Command python setup.py egg_info failed with error code 1 in /home/mezz/build/mysql-python
Storing complete log in /root/.pip/pip.log

Now we will update the config to make use of MySQL before running the command that sets up the database (which will use SQLite by default). To do so, open up local_settings.py in your present working directory.

 Set ENGINE to "django.db.backends.mysql",
 Set NAME, USER and PASSWORD to the details of your mysql database (you'll need to create one)
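Put together, the relevant part of local_settings.py might look something like the sketch below. The database name, user and password are placeholders invented for illustration, so substitute the details of the database you created.

```python
# local_settings.py (sketch): NAME, USER and PASSWORD are placeholder values.
DATABASES = {
    "default": {
        # Use the MySQL backend instead of the SQLite default.
        "ENGINE": "django.db.backends.mysql",
        "NAME": "drum_db",         # hypothetical database name
        "USER": "drum_user",       # hypothetical database user
        "PASSWORD": "change-me",   # hypothetical password
        # Empty strings mean localhost and the default MySQL port.
        "HOST": "",
        "PORT": "",
    }
}
```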

You can then leave HOST and PORT empty if it’s a local database (localhost) on the default port. Now we’ll have Mezzanine use the database and create all the tables and the base install:

 python manage.py createdb --noinput 

You may now run your installation and test that the site appears correctly. Append 0.0.0.0:8000 to make it listen on all interfaces (localhost by default), changing 8000 to the port of your choosing:

python manage.py runserver 0.0.0.0:8000

Make sure your port is open and you can navigate to “$ipofhost:8000” in a browser to see the site. Now navigate to:

$ipofhost:8000/admin

And log in with “username: admin, password: default” to access the admin panel to your site.

