Web Developer

I am a web developer located in Coleraine, on the north coast of Ireland. I have been developing commercially for the web since 2005. I have worked in both the public and private sectors and across a range of business types, from charity promotion to chemical compliance data management.

My blog is a record of things I have learned and problems I have solved. The posts may seem unrelated, but that is mostly because I am continually learning new technologies and approaches.

Maybe you will find something of interest. If you would like to know more about me or some things I have worked on, there is additional information on other profiles.

MySQL causing hard disk to run out of space

Recently we have had a couple of servers run out of hard disk space, causing inaccessibility and crashing. On further investigation the hard drive was full, so many operations were failing. The disk space was being consumed by a very large file in the /var/lib/mysql directory called ibtmp1, so MySQL was causing the server to run out of disk space.

It turns out that this is InnoDB's temporary tablespace file. When you restart the MySQL server the file is removed and recreated, and the server runs without issue until the file grows too large again.

After some quick research we found that adding or editing a few settings in the MySQL configuration lets you limit the size of this file and prevent the issue from recurring.

Open the mysqld.cnf file. Ours was located at /etc/mysql/mysql.conf.d/mysqld.cnf. According to the comments in the file, you can also copy this configuration into /etc/mysql/my.cnf; our my.cnf simply included the mysqld.cnf file.

The file should have some sections like:

[mysqld_safe]
[mysqld]

There should be a section for InnoDB settings. If not, it is fine to add these configuration settings at the bottom of the file, as long as they sit under the [mysqld] section.

Here is the configuration variable:

innodb_temp_data_file_path = ibtmp1:12M:autoextend:max:1G

This value can be tweaked to suit your server setup and needs, but these are the settings we used, which make the file start at 12 MB and cap it at 1 GB.

While setting this we also set a few other variables for optimisation:

innodb_buffer_pool_size=2048M
innodb_log_file_size=25M
innodb_log_buffer_size=80M
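
After saving the configuration, restart MySQL so the new limits take effect, then check that the setting has been picked up and that ibtmp1 has shrunk back down. The service name and file path below assume a typical Ubuntu install:

sudo service mysql restart
mysql -u root -p -e "SHOW VARIABLES LIKE 'innodb_temp_data_file_path';"
sudo ls -lh /var/lib/mysql/ibtmp1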

If you have had any other experiences on optimising MySQL, please let us know.

Removing and recreating all tables in a PostgreSQL database

One of the things I do on a regular basis is validating database backups. This involves taking a copy of the database that was dumped out to an SQL file and loading it back into a database to be checked.

Sometimes, depending on the format of the dump file, the database complains about tables already existing, duplicate keys and other such errors. There are flags and parameters that can be added to stop it re-adding ids and overwriting existing tables, but I found the simplest approach was to remove the existing tables from the database and recreate the structure with no content.

Here is the SQL query that quickly drops and recreates the database schema.

DROP SCHEMA public CASCADE; CREATE SCHEMA public;
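
With the public schema recreated empty, the dump should load cleanly. A rough command-line example, assuming a plain SQL dump and a database called mydb (adjust the database name, user and file name to suit your setup):

psql -d mydb -c "DROP SCHEMA public CASCADE; CREATE SCHEMA public;"
psql -d mydb -f backup.sql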


Free SSL using Let’s Encrypt

Update – May 2017

After adding a Ruby on Rails application to my website, I realised Let's Encrypt wasn't renewing. I had been using Unicorn for the application and had set up a proxy forward for incoming traffic, which meant the .well-known path was no longer being recognised. I had to point that location at the application's public folder to get renewal working again.

Replaced

location ~ /.well-known {
    allow all;
}

with

location ~ /.well-known {
    allow all;
    root /path/to/rails/app/public;
}


I recently set up a couple of sites with free SSL from Let's Encrypt. Here is a summary of the commands I used for Nginx, taken from a nice tutorial on DigitalOcean. There is also an Apache version.

sudo apt-get update
sudo apt-get -y install git bc
sudo git clone https://github.com/letsencrypt/letsencrypt /opt/letsencrypt
sudo nano /etc/nginx/sites-available/default

Change SSL lines in nginx config
ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

And add the following inside the server block
location ~ /.well-known {
    allow all;
}

cd /opt/letsencrypt
./letsencrypt-auto certonly -a webroot --webroot-path=/usr/share/nginx/html -d example.com -d www.example.com

Add your email address and agree to the T&Cs

Reload Nginx
sudo service nginx reload

Automatic renewal with crontab
sudo crontab -e
30 2 * * 1 /opt/letsencrypt/letsencrypt-auto renew >> /var/log/le-renew.log
35 2 * * 1 /etc/init.d/nginx reload
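
To confirm renewals are actually going through, you can check the expiry dates on the certificate being served (swap in your own domain):

echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null | openssl x509 -noout -dates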

DoS Attack on WordPress

For the last couple of days my WordPress site has been getting hit by a DoS attack from a single IP address repeatedly requesting the 'xmlrpc.php' page. Even though the page is not there, processing the requests was still causing a lot of slowdown.

Once I had established the source address, a quick Google search gave me a solution: block the address using Linux's iptables.

sudo apt-get install iptables
sudo iptables -A INPUT -s IP_ADDRESS -j DROP
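
To check the rule is in place, and to make it survive a reboot, the iptables-persistent package is one option on Debian/Ubuntu:

sudo iptables -L INPUT -n --line-numbers
sudo apt-get install iptables-persistent
sudo sh -c "iptables-save > /etc/iptables/rules.v4"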

Done and done. Now let's just hope that my site hasn't been added to a list somewhere and I don't get pummelled from somewhere else.

Full text search using Yomu and CloudSearch

At the request of a client, I started to look at solutions for full text search across all documents containing text. My first thought was that there must be a service out there where you can upload or reference files, and it will index them and let you query the data. Unfortunately, I didn't find one. Most of the indexing and search services out there are for structured data. However, this does become helpful later, as you will see.

Since I had to come up with a more customised solution, I started looking for libraries that could parse documents containing text. This led me to Apache Tika: “The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types”. All the file types I needed were covered by Tika's extensive list of supported formats.

Since I was working in Ruby on Rails I went looking for a gem that would utilise Tika’s toolkit for me to use in my project. The Yomu gem is a great wrapper for the toolkit and easily allows for the reading of data from any file passed to it.

For your PHP projects there is a library called PhpTikaWrapper available using Composer.

The files I had to read were stored on Amazon S3, so using Ruby's 'open-uri' module I read the data from the S3 URL and passed it to Yomu to extract the text.

require 'open-uri'
require 'yomu'
data = open(s3_url).read
text = Yomu.read :text, data

My next step could have been to save the text to a database table along with an id for the document and then query that table, which would have sufficed as a solution. However, so as not to waste the research, I decided to use one of the indexing and search services I referred to earlier, namely Amazon CloudSearch. Since the files were already hosted on S3 and the account was already set up for the project, it seemed like the logical option.

All I had to do was create a new CloudSearch document with an id and a text field that I could subsequently query, returning a list of ids for documents containing the text I was searching for. The AWSCloudSearch gem seemed to be the most up to date and, with a few lines of code, I could create, search and remove documents from CloudSearch. Here is an example of adding a document:

ds = AWSCloudSearch::CloudSearch.new('your-domain-name-53905x4594jxty')
doc = AWSCloudSearch::Document.new(true)
doc.id = id
doc.lang = 'en'
doc.add_field('text', text)
batch = AWSCloudSearch::DocumentBatch.new
batch.add_document doc
ds.documents_batch(batch)

I integrated the search into my search box for the document list and, hey presto, documents containing the requested text were returned.
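
For completeness, querying the domain is just an HTTP request against its search endpoint. The endpoint host comes from the CloudSearch console, and the parameters below are from memory of the 2011-02-01 API that the gem targets, so treat this as a rough shape rather than something to copy and paste:

curl "https://search-your-domain-name-53905x4594jxty.us-east-1.cloudsearch.amazonaws.com/2011-02-01/search?q=searchterm&return-fields=text"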

Heroku timeout management

Recently I have been migrating a Rails application to Heroku. The application is a few years old and very large. Deploying to Heroku was fairly straightforward, but within a short time I realised that Heroku's maximum 30 second request timeout was going to be a problem.

Heroku's timeout is completely non-negotiable, and probably rightly so. Any application that takes more than 30 seconds to serve a page has some serious problems. Unfortunately, my application has some serious problems. Heroku compounds them in one swift move: once a dyno runs out of memory it starts using the swap file, and once an application is swapping, processing time skyrockets.

So, here are a few ways I have used to try and alleviate the problem:

  1. Move any long or processor-intensive tasks to a worker dyno. Using the delayed_job gem, I was able to pass a number of intensive tasks over to a Heroku worker instance. The worker instance then churns away in the background, freeing up the web instance to continue serving the user. Also, by using the workless gem, I was able to switch off the worker dyno when it wasn't required, saving those all-important $$ (see the commands after this list).
  2. Pre-empt the Heroku timeout to rescue the application. If the Heroku timeout limit or your web server's timeout limit is reached, you lose the ability to manage the problem within your application. By using the rack-timeout gem and setting the rack timeout to a second less than the web server timeout, you can catch the exception yourself. You can then manage it within the application by logging data and displaying useful information to the user.
  3. Size your instance correctly to stop it switching to the swap file. If you are using a web server like Unicorn, you can set the number of concurrent processes serving your application. But beware: each process increases the memory usage on your dyno. New Relic will give you a good indication of the average memory usage of your application. Once you have set the Unicorn concurrency, you can work out your required instance size [average_memory * concurrency = instance_size]. Quick tip: the derailed_benchmarks gem shows the memory usage of your gems, which might help reduce your application's footprint.
  4. If you hit the swap file, restart the workers. Using the unicorn-worker-killer gem, you can have the Unicorn workers restart once they exceed a memory limit. This may also allow you to retry the page using the exception catching explained in step 2. By doing redirect_to request.path in your exception rescue, you are effectively retrying the page with the freshly restarted worker.
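
On the Heroku side, a couple of commands are handy here (the app name is a placeholder): the first scales up a worker dyno for delayed_job, and the second watches the logs for R14 "Memory quota exceeded" errors, which tell you a dyno has started hitting the swap file.

heroku ps:scale worker=1 -a your-app-name
heroku logs --tail -a your-app-name | grep "R14"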

Using these methods, you should be able to significantly reduce the impact of application timeouts if not avoid them completely.

Heroku remote database backup

Heroku's new toolbelt commands have had a few updates which make copying backups a little trickier. Also, if you want to automate these commands with Heroku Scheduler, your rake task needs to authenticate before it can run them.

Here is my rake task to backup the database and copy the backup to another location:

namespace :pgbackup do
  desc "Database backup and copy"
  task :db_backup => :environment do

    heroku_server = ENV['APP_NAME']
    timestamp = Time.now.strftime('%Y%m%d%H%M%S')
    temp_path = "latest.dump"

    Bundler.with_clean_env do
      # Authenticate with the Heroku toolbelt and capture a new backup
      db_list = `echo "#{ENV['BACKUP_USERNAME']}\n#{ENV['BACKUP_PASSWORD']}" | heroku pg:backups capture -a #{heroku_server}`

      # Pull the backup id out of the command output
      res = db_list.split("\n")[6]
      db_id = res.split(" ")[2].strip

      # Get a public URL for the backup and download it locally
      restore_url = `heroku pg:backups public-url #{db_id} -a "#{heroku_server}"`
      restore_url = URI.extract(restore_url)
      restore_url = restore_url[0]
      p `curl -o "#{temp_path}" "#{restore_url}"`
    end

    file = YAML.load_file("#{Rails.root}/config/s3.yml")
    config = file[Rails.env]

    connection = Fog::Storage.new({
      :provider => 'AWS',
      :aws_access_key_id => config['access_key_id'],
      :aws_secret_access_key => config['secret_access_key'],
      :region => 'eu-west-1'
    })

    directory = connection.directories.get(ENV["APP_NAME"])

    # Upload the dump to S3 under a timestamped key
    file = directory.files.create(
      :key => "db-backups/#{timestamp}.dump",
      :body => File.open("#{temp_path}"),
      :public => false
    )

  end
end

As you can see, I have used the fog gem to connect to S3. Fog can connect to other storage providers too, so you are not limited to S3.

You will also have to add the following environment variables to Heroku:

  1. APP_NAME
  2. BACKUP_USERNAME
  3. BACKUP_PASSWORD

The backup username and password are the Heroku login details for an owner or collaborator account.
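
These can be set from the command line with the Heroku toolbelt (the values here are placeholders):

heroku config:set APP_NAME=your-app-name BACKUP_USERNAME=you@example.com BACKUP_PASSWORD=your-password -a your-app-name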

Remove file from git history

During a careless commit to my git repository I managed to push a database dump file before realising my mistake. Even though I removed the file and pushed a new version, the file still existed in the repository history, making the repository very large to re-download or clone.

Here is what I found. To remove the file you need to rewrite the repository history and remove the reference to the offending file.

git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch db_dump.sql" -- --all
Thanks to Drew

Then all I had to do was push the local repository back up.
git push --force origin --all
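
One thing to note: filter-branch leaves the old objects in your local clone until the reflog entries expire, so if you want the local repository to shrink as well you can expire the reflog and garbage collect. Anyone else with an existing clone will need to re-clone or rebase onto the rewritten history.

git reflog expire --expire=now --all
git gc --prune=now --aggressive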

Mono server for ASP.NET

#install mono server
sudo apt-get install mono-fastcgi-server4

Add the following to /etc/nginx/fastcgi_params
fastcgi_param PATH_INFO "";
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;

Nginx configuration
server {
    listen 80;
    server_name [HOST_NAME];

    location / {
        root [APP_ROOT];
        index index.html index.htm default.aspx Default.aspx;
        fastcgi_index Default.aspx;
        fastcgi_pass 127.0.0.1:9000;
        include /etc/nginx/fastcgi_params;
    }
}

Restart Nginx
sudo service nginx restart

Start mono server
fastcgi-mono-server4 /applications="[HOST_NAME]:/:[APP_ROOT]" /socket=tcp:127.0.0.1:9000
e.g. fastcgi-mono-server4 /applications="192.168.0.0:/:/var/www/net" /socket=tcp:127.0.0.1:9000

Mono server initialisation script, saved as /etc/init.d/monoserve:
#!/bin/sh

### BEGIN INIT INFO
# Provides: monoserve.sh
# Required-Start: $local_fs $syslog $remote_fs
# Required-Stop: $local_fs $syslog $remote_fs
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Start fastcgi mono server with hosts
### END INIT INFO

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/local/bin/mono
NAME=monoserver
DESC=monoserver

MONOSERVER=$(which fastcgi-mono-server4)
MONOSERVER_PID=$(ps auxf | grep fastcgi-mono-server4.exe | grep -v grep | awk '{print $2}')

WEBAPPS="/:[APP_ROOT]"

case "$1" in
start)
if [ -z "${MONOSERVER_PID}" ]; then
echo "starting mono server"
${MONOSERVER} /applications=${WEBAPPS} /socket=tcp:127.0.0.1:9000 &
echo "mono server started"
else
echo ${WEBAPPS}
echo "mono server is running"
fi
;;
stop)
if [ -n "${MONOSERVER_PID}" ]; then
kill ${MONOSERVER_PID}
echo "mono server stopped"
else
echo "mono server is not running"
fi
;;
esac

exit 0

Make the script executable:
chmod +x /etc/init.d/monoserve

And install the script:
update-rc.d monoserve defaults
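
The mono server can then be managed like any other service, assuming the script was saved as /etc/init.d/monoserve:

sudo service monoserve start
sudo service monoserve stop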