Fail2ban for 404

Today we had an issue where a server was continuously getting requests from IP addresses in a particular country. The requests were obviously automated and were searching for PHP pages that didn’t exist within the application we were running.

Even so, Apache still had to process and log every one of these requests, which slowed the server down.

We knew fail2ban should do what we wanted but weren’t quite sure how to implement it. So here is a quick run-through of how to set up a fail2ban policy for continuous requests for pages that don’t exist.

First, install fail2ban. Our setup is Debian, so we just used apt-get:

sudo apt-get update
sudo apt-get install fail2ban

In the current version of fail2ban, filters are added to the filter.d folder under the default install directory, /etc/fail2ban. We used the following commands to create a new filter:

cd /etc/fail2ban/filter.d/
sudo nano apache-404.conf

The filter needs a regex that registers a failure (failregex) and, optionally, a regex for requests to ignore (ignoreregex). We only wanted to block requests, so we only needed the fail regex. The fail regex must include the <HOST> tag, which tells fail2ban where in the log line to find the IP address to ban.

[Definition]
failregex = ^<HOST> - .* "(GET|POST|HEAD).*HTTP.*" 404 .*$
ignoreregex =

This regex matches any GET, POST or HEAD request that returns a 404, e.g.

XX.XXX.XXX.XXX - - [1/Jan/2020:12:30:00 +0000] "GET /url/path/file.php HTTP/1.1" 404 465 "-"

We eventually refined it to fail only on requests for unknown PHP pages by adding "php" to the regex:

failregex = ^<HOST> - .* "(GET|POST|HEAD).*php.*HTTP.*" 404 .*$
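Before restarting fail2ban it can be worth checking the patterns against a few sample log lines. The Ruby sketch below is only an approximation: fail2ban itself uses Python regular expressions and expands the <HOST> tag to its own host-matching expression, whereas here a plain non-whitespace group stands in for it. The names HOST_GROUP, to_regexp and offending_host are made up for this example, and the IP address comes from a documentation range.

```ruby
# Rough check of the two failregex patterns against sample access-log lines.
# Assumption: fail2ban's <HOST> tag is approximated by a plain "\S+" group;
# the real expansion is stricter.
HOST_GROUP = '(?<host>\S+)'

ANY_404 = '^<HOST> - .* "(GET|POST|HEAD).*HTTP.*" 404 .*$'
PHP_404 = '^<HOST> - .* "(GET|POST|HEAD).*php.*HTTP.*" 404 .*$'

# Turn a failregex string into a Ruby Regexp by substituting the <HOST> tag.
def to_regexp(failregex)
  Regexp.new(failregex.sub('<HOST>') { HOST_GROUP })
end

# Return the offending host if the line matches, otherwise nil.
def offending_host(failregex, line)
  match = to_regexp(failregex).match(line)
  match && match[:host]
end

PHP_MISS  = '203.0.113.7 - - [01/Jan/2020:12:30:00 +0000] "GET /url/path/file.php HTTP/1.1" 404 465 "-"'
HTML_MISS = '203.0.113.7 - - [01/Jan/2020:12:30:01 +0000] "GET /missing.html HTTP/1.1" 404 465 "-"'
PAGE_OK   = '203.0.113.7 - - [01/Jan/2020:12:30:02 +0000] "GET /index.php HTTP/1.1" 200 1234 "-"'

puts offending_host(ANY_404, PHP_MISS).inspect   # "203.0.113.7"
puts offending_host(PHP_404, HTML_MISS).inspect  # nil (404, but not a PHP page)
puts offending_host(ANY_404, PAGE_OK).inspect    # nil (not a 404)
```

For an authoritative check, fail2ban ships with a fail2ban-regex tool that runs a filter file against a real log file.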

Save the file and then we can add it into the fail2ban configuration.

Now create a file for our local configuration. The default configuration is stored in the jail.conf file, and any settings in jail.local are automatically added to it or override it.

cd /etc/fail2ban
sudo nano jail.local

Add the following lines:

# jail name
[apache-404]
# enable this jail
enabled = true
# ports to block for a banned address
port = http,https
# name of the filter to use (filter.d/apache-404.conf)
filter = apache-404
# the log file(s) to run the regex check on
logpath = /var/log/apache2/*access.log
# number of failures before banning
maxretry = 3
# how long to ban for, in seconds
bantime = 600
# exempt IP addresses
ignoreip =

We are getting the filter to check the Apache access log, but you could just as easily get it to check the error log or any other log for a match. You can also easily change the number of allowed failures and how long to ban for.
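To get a feel for how these numbers interact, here is a simplified Ruby model of the counting. It is a sketch, not fail2ban's actual implementation: it assumes an address is banned once it has produced maxretry matching log lines within a time window, which fail2ban controls with its findtime option (not set in the jail above). The value 600 for the window, the banned_ips helper and the sample data are all chosen purely for illustration.

```ruby
# Simplified model of fail2ban's counting: an address is banned once it has
# produced MAXRETRY matching log lines within FINDTIME seconds.
# Assumption: FINDTIME mirrors fail2ban's `findtime` option; 600 is an
# illustrative value, and the exact window boundary is a guess.
MAXRETRY = 3
FINDTIME = 600

# failures: array of [unix_timestamp, ip] pairs in chronological order.
# Returns the list of IPs that would have been banned.
def banned_ips(failures, maxretry: MAXRETRY, findtime: FINDTIME)
  recent = Hash.new { |hash, ip| hash[ip] = [] }
  banned = []
  failures.each do |time, ip|
    next if banned.include?(ip)
    # keep only this IP's failures that are still inside the window
    recent[ip] = recent[ip].select { |t| time - t < findtime } << time
    banned << ip if recent[ip].size >= maxretry
  end
  banned
end

failures = [
  [0,    '203.0.113.7'],
  [10,   '203.0.113.7'],
  [15,   '198.51.100.2'],
  [20,   '203.0.113.7'],   # third failure inside the window: banned
  [900,  '198.51.100.2'],  # the failure at 15 has aged out by now
  [1000, '198.51.100.2']   # only two failures inside the window
]
p banned_ips(failures)  # ["203.0.113.7"]
```

Lowering maxretry or widening the window makes the jail more aggressive, so slow scanners are caught at the cost of more false positives.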

Restart fail2ban and you should be ready to go. You can check the status of the filter using the following command:

sudo fail2ban-client status apache-404

If you called your jail something different, substitute its name in the above command. This should show you a count of bans and the currently banned IP addresses.

Full text search using Yomu and Cloudsearch

At the request of a client, I started to look at solutions for full text search across all documents containing text. My first thought was that there must be a service out there where you can upload or reference files, which will then index them and allow you to query the data. Unfortunately I didn’t find one. Most of the indexing and searching services out there are for structured data. However, they do become helpful later, as you will see.

Since I had to come up with a more customised solution, I started to see if there were any libraries that could parse documents containing text. This led me to Apache Tika. “The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types”. All the file types I needed were covered by Tika’s extensive list of supported file types.

Since I was working in Ruby on Rails I went looking for a gem that would utilise Tika’s toolkit for me to use in my project. The Yomu gem is a great wrapper for the toolkit and easily allows for the reading of data from any file passed to it.

For your PHP projects there is a library called PhpTikaWrapper available using Composer.

The files I had to read were being stored on Amazon S3 so using Ruby’s ‘open-uri’ module I passed in the S3 url and read out the data, which allowed Yomu to extract the text.

require 'open-uri'
require 'yomu'
data = URI.open(s3_url).read  # a bare open() no longer fetches URLs on Ruby 3+
text = Yomu.read :text, data

My next step could have been to save the text to a database table along with an id for the document and then query the database and this would have sufficed as a solution. However, not to waste some of the research, I decided to use one of the indexing and search services that I previously referred to, namely Amazon Cloudsearch. Since the files were already hosted on S3 and the account was already set up for the project it seemed like the logical option.
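As a rough illustration of that simpler alternative, the sketch below keeps the id and extracted text together and returns the ids of documents containing a search term. It uses an in-memory Hash as a stand-in for the database table, and DocumentIndex and its methods are hypothetical names invented for this example.

```ruby
# In-memory stand-in for the "id plus text" table described above.
# Assumption: DocumentIndex is a hypothetical class for this sketch; a real
# project would store the rows in a database table instead.
class DocumentIndex
  def initialize
    @texts = {} # document id => extracted text
  end

  # Store (or replace) the extracted text for a document.
  def add(id, text)
    @texts[id] = text
  end

  # Forget a document, e.g. when the file is deleted.
  def remove(id)
    @texts.delete(id)
  end

  # Return the ids of all documents whose text contains the query,
  # ignoring case.
  def search(query)
    needle = query.downcase
    @texts.select { |_id, text| text.downcase.include?(needle) }.keys
  end
end

index = DocumentIndex.new
index.add(1, 'Quarterly report for the sales team')
index.add(2, 'Meeting notes: roadmap and hiring')
p index.search('REPORT')  # [1]
p index.search('budget')  # []
```

A plain substring match like this has no ranking or stemming, which is part of what a dedicated search service adds.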

All I had to do was create a new Cloudsearch document with an id and a text field, which I could subsequently query to get back a list of ids for the documents containing the text I was searching for. The AWSCloudSearch gem seemed to be the most up-to-date gem, and with a few lines of code I could create, search and remove documents from Cloudsearch. Here is an example of adding a document:

ds = AWSCloudSearch::CloudSearch.new('your-domain-name-53905x4594jxty')
doc = AWSCloudSearch::Document.new(true)
doc.id = id
doc.lang = 'en'
doc.add_field('text', text)
batch = AWSCloudSearch::DocumentBatch.new
batch.add_document doc
ds.documents_batch(batch)

I integrated the search into my search box for the document list and, hey presto, documents containing the requested text were returned.

Automatic restart of php cgi using monit

Once monit is installed, add the following lines to /etc/monit/monitrc or to one of its included files.
You will need to replace [USER] and [GROUP] with the user and group the process should run as.

check process php5-cgi with pidfile /var/run/fastcgi-php.pid
  start program = "/usr/bin/spawn-fcgi -a 127.0.0.1 -p 9000 -u [USER] -g [GROUP] -f /usr/bin/php5-cgi -P /var/run/fastcgi-php.pid" with timeout 60 seconds
  stop program = "/sbin/start-stop-daemon -K -n php5-cgi"