Reverse Proxy with Mojolicious

Over the past month I worked on a project for a customer who wanted a reverse proxy, so that he could modify his web pages before they are delivered.

This posed a very interesting problem, which I chose to solve with good old Perl.
Why Perl? Because it’s exactly the right tool when you want to manipulate text!

It’s very fast, very stable and easy on resources, and Perl has everything you need to build a proxy without any additional software like Apache, Nginx, Squid or the like.

Mojolicious to the Rescue

It gets even easier when you have an amazing framework like Mojolicious, which I stumbled upon a few months ago!

It calls itself “A next generation web framework for the Perl programming language”. It has no (I repeat: NO!) mandatory module dependencies, features a full HTTP 1.1 web server (even non-blocking, if you like) and also an HTML5/XML client, complete with a DOM parser that uses CSS3 selectors, jQuery style, to access elements.

Additionally, it has a preforking, non-blocking I/O server with hot deployment called “Hypnotoad”, plus a heck of a lot of Futurama jokes in the source code comments.

One of its primary goals is to bring the fun back to web development in Perl. And boy, did I have fun!
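To give a taste of the DOM parser’s jQuery-style interface, here is a minimal sketch of the kind of rewriting a proxy does: absolute links to the upstream host are pointed at the proxy instead. The function rewrite_links and both host names are made up for illustration, and the sketch assumes a current Mojolicious release.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper: point absolute links to the upstream host at the proxy
sub rewrite_links {
    my ($html, $upstream, $proxy) = @_;
    my $dom = Mojo::DOM->new($html);

    # CSS3 attribute selector: all <a> elements whose href starts with $upstream
    # (assumes $upstream contains no double quotes)
    $dom->find(qq{a[href^="$upstream"]})->each(sub {
        my $link = shift;
        (my $href = $link->attr('href')) =~ s/^\Q$upstream\E/$proxy/;
        $link->attr(href => $href);
    });

    # Serialize the modified tree back to HTML
    return $dom->to_string;
}

print rewrite_links(
    '<p><a href="http://upstream.example/page">Page</a> <a href="/local">Local</a></p>',
    'http://upstream.example', 'http://proxy.example',
), "\n";
```

Relative links are left alone, because they already resolve against whatever host served the page, which is the proxy.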

no Moose

Mojolicious has a great base class called Mojo::Base. It has all the syntactic sugar one wants, enables all the good pragmas (strict, warnings, etc.), and provides a default constructor and a 'has' accessor generator. Good riddance, Moose, you always bugged me with your weight!
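For readers who have not seen Mojo::Base, here is a minimal sketch of a class built on it. The class name Proxy::Upstream and its attributes are hypothetical. It assumes a current Mojolicious, where use Mojo::Base -base also turns on strict, warnings and utf8.

```perl
use strict;
use warnings;

# Hypothetical class, for illustration only
package Proxy::Upstream {
    use Mojo::Base -base;    # inherit from Mojo::Base and enable strict/warnings/utf8

    # 'has' generates accessors; a code ref default is built lazily per object
    has host    => 'localhost';
    has port    => 80;
    has headers => sub { {} };

    sub url {
        my $self = shift;
        return 'http://' . $self->host . ':' . $self->port;
    }
}

package main;

# The default constructor takes attribute values as a hash (or hash ref)
my $upstream = Proxy::Upstream->new(host => 'example.com');
print $upstream->url, "\n";    # prints http://example.com:80
```

Setters return the object itself, so calls can be chained, e.g. $upstream->port(8080)->url.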

So everything was already there for this little project; I just had to put the bricks together. And it went smoothly!

The whole project took under 10 man-days to finish!

Lessons Learned

Configuration Files

One always needs configuration files, at least to store the database credentials somewhere. This application was no different.

Catalyst comes with a configuration file by default, which can be in any format one could possibly think of, thanks to Config::Any.
Mojolicious doesn’t load a configuration file by default, so I had to write my own code for it. At first I built a class using YAML::Any, because I like YAML for configuration files very much.

But after working with Mojolicious for a while, your mindset changes: Why introduce this absolutely unnecessary dependency? With Catalyst, one more on top of the hundreds doesn’t hurt. But Mojolicious is different: Why not use the bundled Mojo::JSON instead?

So I quickly put together a class that reads the configuration from a JSON file. While JSON is not as flexible as YAML, it works well enough for the task. The big plus: no extra dependency. Smooth.
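A configuration class along those lines might look like the following minimal sketch. The class name Proxy::Config, its get method and the config keys are hypothetical. It relies only on Mojo::JSON’s decode_json function, which is exported on request.

```perl
use strict;
use warnings;

# Hypothetical configuration class reading a JSON file
package Proxy::Config {
    use Carp qw(croak);
    use Mojo::JSON qw(decode_json);

    sub load {
        my ($class, $file) = @_;

        # Read raw bytes: decode_json expects UTF-8 encoded input
        open my $fh, '<:raw', $file or croak "Cannot open $file: $!";
        my $json = do { local $/; <$fh> };
        close $fh;

        # decode_json dies on invalid JSON, which is what we want at startup
        my $data = decode_json($json);
        croak "$file: top level must be a JSON object" unless ref $data eq 'HASH';

        return bless { data => $data }, $class;
    }

    # Look up a value by a list of nested keys, e.g. get('db', 'dsn');
    # returns undef if any key along the path is missing
    sub get {
        my ($self, @path) = @_;
        my $node = $self->{data};
        for my $key (@path) {
            return undef unless ref $node eq 'HASH' && exists $node->{$key};
            $node = $node->{$key};
        }
        return $node;
    }
}
```

In an application you would load it once at startup, e.g. my $config = Proxy::Config->load('proxy.json'); and then read values with $config->get('db', 'dsn'). Current Mojolicious releases also ship a JSONConfig plugin ($app->plugin('JSONConfig')), which may make a hand-rolled class unnecessary.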

Charset Guessing

While the DOM parser Mojo::DOM is very complete, I missed one thing: automatic charset guessing. Mojo tries to keep its hands off the encoding as much as possible, which makes it easy to proxy a file unmodified. But if you want to modify the content, you must know the encoding, or you will mangle it.

While it’s not a problem to build one’s own charset guesser, I would nevertheless like to see one in the framework, because that’s where I think it belongs. (If it’s there already, please point me to it, because I couldn’t find it!)

I built the following guesser purely by instinct, so it can most probably be improved. It checks the HTTP Content-Type header first, then the XML declaration, then the HTML meta tags:

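sub guess_charset {
    my ($self, $content_type, $body) = @_;

    # Avoid undef warnings
    $content_type //= '';
    $body         //= '';

    # 1. HTTP Content-Type header, e.g. "text/html; charset=UTF-8"
    #    (stop at quotes, semicolons and whitespace instead of taking the rest)
    if ($content_type =~ /charset\s*=\s*["']?([\w.:-]+)/i) {
        return $1;
    }

    # 2. XML declaration at the start of the body (leading whitespace tolerated)
    if ($body =~ /\A\s*<\?xml[^>]*?\bencoding\s*=\s*["']([\w.:-]+)["']/i) {
        return $1;
    }

    # Needs "use Mojo::DOM;" in the enclosing package
    my $dom = Mojo::DOM->new($body);

    # 3. <meta http-equiv="Content-Type" content="text/html; charset=...">
    my $meta = $dom->at('meta[content*="charset"]');
    if ($meta && ($meta->attr('content') // '') =~ /charset\s*=\s*["']?([\w.:-]+)/i) {
        return $1;
    }

    # 4. HTML5 <meta charset="...">
    $meta = $dom->at('meta[charset]');
    if ($meta) {
        return $meta->attr('charset');
    }

    # Nothing found: let the caller fall back to a default
    return;
}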
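Once a charset has been guessed, the body still has to be decoded before it is modified and encoded again before it is delivered. The following sketch shows that round trip with the core Encode module. rewrite_body and its callback argument are hypothetical helpers, not part of Mojolicious.

```perl
use strict;
use warnings;
use Encode qw(decode encode find_encoding);

# Hypothetical helper: decode, modify as characters, re-encode in the same charset
sub rewrite_body {
    my ($bytes, $charset, $modify) = @_;

    # Unknown or unsupported charset: pass the body through untouched,
    # because modifying undecoded bytes risks mangling the text
    return $bytes unless defined $charset && length $charset;
    my $enc = find_encoding($charset) or return $bytes;

    my $text = $enc->decode($bytes);    # bytes -> Perl characters
    $text = $modify->($text);           # work on characters, not bytes
    return $enc->encode($text);         # characters -> bytes in the original charset
}
```

For example, rewrite_body($body, 'ISO-8859-1', sub { my $t = shift; $t =~ s/foo/bar/g; $t }) returns Latin-1 bytes again, so the Content-Type header of the response can stay as it was.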

Database Access

I came to love DBIx::Class as an ORM for database access. You need some patience to get acquainted with it, but then it’s great to work with, because it’s so complete. No comparison to the poor excuse for an ORM called ActiveRecord used in Rails.

The downside, again, is that it pulls in so many dependencies that you make your customer cry when they try to deploy your application.

Since my customers were not Perl people, using DBIx::Class was out of the question. Instead I wrote a small model class for database access, which supports named, pre-prepared statements and polishes all the rough edges off DBI.

I hate having raw SQL statements scattered through program code, because I so often see it lead to bad style, SQL injections and inefficient database access. Therefore I put the statements in the config file and prepare them when the model is constructed. I know there are issues with this approach, but I keep them in mind. Promise.

I’ve done stuff like this so many times, and it seems goofy to reinvent the wheel once again. But there are so many ways to talk to your database that nobody seems able to agree on a standard somewhere between a full-fledged ORM and the absolute lowest level.
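Named, pre-prepared statements of this kind could be sketched roughly as follows with plain DBI. Proxy::Model, its run and rows methods, and the statement names are hypothetical. In the real setup the statements hash would come from the configuration file.

```perl
use strict;
use warnings;

# Hypothetical model class with named, pre-prepared statements
package Proxy::Model {
    use Carp qw(croak);
    use DBI;

    sub new {
        my ($class, %args) = @_;
        my $dbh = DBI->connect($args{dsn}, $args{user}, $args{password},
            { RaiseError => 1, PrintError => 0, AutoCommit => 1 });

        # Prepare every named statement once; placeholders keep values out of the SQL
        my %sth = map { $_ => $dbh->prepare($args{statements}{$_}) }
            keys %{ $args{statements} };

        return bless { dbh => $dbh, sth => \%sth }, $class;
    }

    # Execute a named statement with bind values and return its handle
    sub run {
        my ($self, $name, @binds) = @_;
        my $sth = $self->{sth}{$name} or croak "Unknown statement '$name'";
        $sth->execute(@binds);
        return $sth;
    }

    # Convenience: all rows of a named query as an array ref of hash refs
    sub rows {
        my ($self, $name, @binds) = @_;
        return $self->run($name, @binds)->fetchall_arrayref({});
    }
}
```

One known limitation of this design: a statement handle shared by name cannot be iterated by two callers at once, and every handle becomes invalid when the connection is re-established.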

Anyway, the one issue I had with Mojolicious in this regard is this:

The Hypnotoad server preforks its worker processes, and a database connection opened on startup would then be shared by all of them, which breaks it. So one needs to connect only after forking.

I came across a wiki page that explains this, and at first I opted for the DBIx::Connector solution, which doesn’t introduce too many dependencies and sounds like the fire-and-forget solution to all database connection problems.

It did solve that problem, but we observed another issue: the connections dropped after some hours of operation, especially after a few hours of traffic followed by a few hours without any.

So I got rid of DBIx::Connector again and now handle dropped connections manually in the model class. Ironically, that’s far more stable, and another dependency saved!
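Connecting only after forking and handling dropped connections by hand can be combined in a lazily connected handle. This is a minimal sketch under stated assumptions, not the model class from this project. Proxy::DB is hypothetical. A process ID check detects that the code now runs in a forked worker, and DBI’s ping detects a dropped connection.

```perl
use strict;
use warnings;

# Hypothetical fork-aware, self-reconnecting database handle holder
package Proxy::DB {
    use DBI;

    sub new {
        my ($class, %args) = @_;
        # Do NOT connect here: this may run in the parent before workers fork
        return bless { %args, dbh => undef, pid => undef }, $class;
    }

    sub dbh {
        my $self = shift;
        my $dbh  = $self->{dbh};

        # Reuse the handle only if this very process opened it and the
        # server still answers; ping catches connections dropped while idle
        return $dbh if $dbh && $self->{pid} == $$ && $dbh->ping;

        # (Re)connect; AutoInactiveDestroy stops a child that discards an
        # inherited handle from closing the parent's connection
        $self->{dbh} = DBI->connect($self->{dsn}, $self->{user}, $self->{password},
            { RaiseError => 1, PrintError => 0, AutoCommit => 1,
              AutoInactiveDestroy => 1 });
        $self->{pid} = $$;
        return $self->{dbh};
    }
}
```

AutoInactiveDestroy requires DBI 1.614 or later. Statement handles prepared on an old connection have to be prepared again after a reconnect.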

Static Routes

When building a reverse proxy, Mojolicious::Static gets in your way. As nice as it is to have some default output to start from, you definitely don’t want the default Mojolicious favicon served up instead of the proxied site’s.

To get around this, I built a custom static file class that does exactly nothing:

package Proxy::NoStatic;
use Mojo::Base 'Mojolicious::Static';

# Never serve a static file: every request, /favicon.ico included, goes to the routes
sub dispatch {}

1;

A little hacky, and maybe this could be done more simply. However, it was the quickest solution I came up with.
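The no-op class still has to be installed as the application’s static dispatcher. A minimal sketch, assuming a full Mojolicious application class, replaces the static attribute in startup. The application name Proxy and its catch-all route are hypothetical.

```perl
use strict;
use warnings;

# The no-op static class, repeated so this sketch runs on its own
package Proxy::NoStatic {
    use Mojo::Base 'Mojolicious::Static';
    sub dispatch { return 0 }    # decline every request
}

# Hypothetical application class
package Proxy {
    use Mojo::Base 'Mojolicious';

    sub startup {
        my $self = shift;

        # 'static' is an application attribute and can be replaced wholesale
        $self->static(Proxy::NoStatic->new);

        # Catch-all route: paths like /favicon.ico now reach the proxy code
        $self->routes->any('/*path' => sub {
            my $c = shift;
            $c->render(text => 'proxied: /' . $c->stash('path'));
        });
    }
}
```

Because the replacement happens in startup, it takes effect before the first request is dispatched.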

Bottom Line

Mojolicious is by far the best framework for server-side web development I’ve come across in a long time. And I’ve seen a few: Spring 3, Rails, Catalyst, CakePHP, Yii, FuelPHP, Zend, CodeIgniter, not to mention the various homemade ones I had the “pleasure” of working with…

I enjoyed it so much, and I can’t wait to use it in my next project!

Thank you Sebastian Riedel for giving it to us!
And by the way, Sebastian: Mission accomplished! 🙂