A better way (package) to link R and PostgreSQL

Recently I was loading a large (~420mb) CSV file for Kaggle’s Adzuna job salary competition into R and ran into some speed problems.  Specifically R was crunching endlessly and my Macbook pro turned into a paper weight.  It was probably naive of me to attempt loading a CSV of such size into R, but I assumed some well written functions in C would handle this quickly and not block the rest of my processes (actually it didn’t block them but it may as well have).  Anyway for the most part I load data into R via ODBC so I figured I’d throw the data into a local PostgreSQL database and get on with my data munging.

However, OS X didn’t want to play ball.  Apparently, some big cats ago, Apple decided ODBC wasn’t super important (?!) and decided to remove it from the stock install (I recently upgraded to Mountain Lion so my mac would run slower and sometimes fall into an infinite loop during startup).  I searched around a bit and found the usual ODBC suspects that I could install on OS X but quickly recalled the small pains I dealt with doing the same thing with my Ubuntu machine at work.  Anyway I played around with ODBC for a bit and wasn’t really getting the optimal configuration I wanted so I looked a little further and found RPostgreSQL.

Now, having used RODBC over the years I don’t have many complaints except that if you don’t already have a stock ODBC install setting everything up can take time from your cause (fun data analysis in this case) that you generally don’t really have to spare.  What would be better is a package that’s wicked fast and a configuration that’s handled during package install.  Check out the CRAN entry.

After installation I loaded the entire CSV into R in about 3 seconds:

drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, db="postgres", user="postgres");
rs <- dbSendQuery(con, statement = "select * from rev1");
df <- fetch(rs, n = -1);

Cheers!

_Chris


Hack Your Mac Back to Life

About a week and a half ago my Macbook Pro’s keyboard and trackpad stopped working upon sitting down at my favorite coffee shop ‘The Dripolator’.  Actually I’m not completely sure if I was at the Drip yet, but I recall ending up in my apartment with a USB keyboard and mouse searching the internet for users with similar problems.  I quickly found dozens of forums, and hundreds of posts, related to this very issue.

To summarize, there is a ribbon cable that attaches the trackpad to the motherboard, power button, bluetooth, keyboard, etc..  It makes the connection to the trackpad beneath the battery (Macbook Pro 4,1 Spring 2008, non – unibody) and there is a visible bulge where the connector slides into the trackpad circuitry.  People reported fixes ranging from taping folded paper to the battery bottom to put pressure on the connection to cleaning the contacts.  Essentially, something was wrong with the connector on the trackpad, or the cable:

For such a simple issue I assumed a quick call to my local mac authorized repair shop would solve the problem quickly I would be back to work with a portable computer.  Success?  They would take care of it, for $300 (a new top case) + $60 (diagnostic) + labor????

$400 – $500 plus no computer for a few days?  No Way.

My computer was a hack, maybe a new ribbon cable or trackpad from being functional.  A brand new top case seemed excessive and needlessly expensive.  So I began trying my own fixes and started a search for a ribbon cable (which I suspected would solve my issue if the hacks failed).  I tried everything from the folded paper trick to cleaning the contacts (99% isopropyl alcohol is surprisingly hard to find!).  I even added small strips of masking tape beneath the ribbon to help seat it more securely in the connector.  No luck, everything failed rather quickly.

Meanwhile, finding a ribbon cable was much harder than I had expected.  You would think Apple would sell them, but the only parts I could come across had been pulled off old machines and were a few days of ebay bidding away or out of stock.  Further complicating matters, the old Macbook Pro models vary somewhat and have different revisions of the ribbon cable!

This was very frustrating.  Apple clearly had a solid lock on the repair process their machines went through from start to finish and did NOT want hackers poking around fixing stuff for free.  The Apple authorized repair shops told me they had to follow protocol (IE running the diagnostic and replacing the entire top case) and there was little room for creativity.  For me, this is a major Apple turn off.  I love my Macbook Pro and OS X, its one of the only expensive, proprietary systems I support.  But making it difficult to repair at home is silly.

A week and a half of searching and failed hacks led me to some old parts pulled off a warrantied machine in exchange for a sandwich.  Thanks to all the techs that helped me find what I needed outside of Apple’s ‘system’.  Didn’t Apple used to make fun of IBM for being a monolithic beast in the 80’s (http://www.youtube.com/watch?v=OYecfV3ubP8)?  Who’s throwing the sledge hammer at the imac in 2012?

Oh and the actual fix?  I replaced that flex ribbon pictured earlier.  So far it has withstood the ‘shake’ test, the ‘biking to the Dripolator’ test, and the ‘thrown on the couch in the bag’ test.  Never made it this far with the previous fixes!!

Still love my Macbook Pro…


Serving up geospatial data from the Ec2 cloud

Ec2 Free Tier

Recently I’ve been building a Debian Squeeze image on Amazon’s ec2 cloud for free (http://aws.amazon.com/free/).  The purpose (aside from messing around) is to develop a nice POC (proof of concept) platform on which to stage geospatial web services (viewing, analysis, hosting, etc).  Theoretically, whatever I build can be extended and expanded in terms of power and space so time well spent.

Ec2 has come a ways since I first started playing with it a few years ago.  Aside from offering a dizzying array of services, the AWS management portal has become full featured and very useful.  Needless to say I didn’t bother to install the ec2 API this time around.  The goal here is to get up and running, plenty of time to automate the client end of this down the road.

Getting Connected

Getting up and running was pretty straightforward.  Sign up for an account, navigate to the ‘quickstart’ guide buried in ‘Documentation’, alter the instructions slightly (I prefer to use a Debian image rather than the guide’s default image), and log in.  The only issue I ran into was permissions on my machine when using the public key to access the remote server (ssh -i key.pem <server>).  Without knowing it my public key was not being read by ssh, so the server was asking me for a password (which nobody has) and I was fairly confused.  The solution:

sudo ssh -i key.pem <server>

I’m not exactly sure why this works ('vim key.pem' reads the file).  Maybe its a Mac thing.

Installing your software

Now that we’re logged in, we can begin installing and customizing our image (best practice is to create a user and sudo your way to glory).  We’re going to use GeoServer to serve up our GIS data, so we’ll need Java.  Depending on your linux distro you’ll have access to different packaging tools.  I’m a big fan of apt (Debian, Ubuntu, etc.):

apt-get install sun-java6-sdk

If you have a permissions problem ‘sudo’ before the previous code.  Now that we have Java up and running, let’s get GeoServer (http://geoserver.org/display/GEOS/Welcome) up.  Grab the universal binary from the downloads section:

wget http://downloads.sourceforge.net/geoserver/geoserver-2.1.1-bin.zip

This will plop it in whatever directory you ran the command from.  Geoserver is self contained in this form.  Unzip it:

apt-get install unzip
unzip geoserver_plus_version

Move into the directory and check out the README.  Long story short, you need to go into bin/ and run the startup script.  before you do so you’ll have to deal with JAVA_HOME.  The quick and dirty way:

JAVA_HOME="/usr/lib/jvm/java-6-sun"
export JAVA_HOME

Best to stick that somewhere that runs on startup, but that’s out of the scope of this article.  Also note the path to Java above works on Debian Squeeze with Sun’s JDK.  If you deviated from the instructions you’ll have to find it yourself.

Ok!  Assuming all went well navigate your way into geoserver_plus_version/bin and:

./startup.sh

You should see GeoServer setting itself up.  Head back to the AWS management console and navigate to ‘Security Groups’, ‘your group name’ (could be ‘default’) and edit your security rules so that all ips can access port 8080 (0.0.0.0/0).  Make sure to click the ‘Apply Rule Changes’ button or your changes will not take effect.  Type into your web browser:

<server>:8080/geoserver

Where <server> is the address you used to connect to the server via ssh.  If all went well you should see GeoServer’s default startup screen!  Impress your friends by going to ‘layer preview’ and clicking on an ‘OpenLayers’ link and pasting the ridiculously long URL into Skype or gChat!

What Now?

That’s really up to you.  I installed PostgreSQL 9 and PostGIS for GeoServer.  For me this is a sandbox / development machine I will use to show friends what I’m up to.  There’s no reason why a dynamic DNS server script couldn’t be installed and a web presence maintained (although there are some security issues to work out prior).  A final tip:

The default EBS storage space is 1 Gig.  I was able to get GeoServer up and running without issue.  However once I started installing PostgreSQL I ran out of space rather quickly.  To increase your space use the AWS management console and navigate to ‘EBS Volumes’.  Take a snapshot of your volume (while the instance is running) and use it to make a new EBS volume (free tier will allow 10 Gig).  Stop (don’t terminate) your instance, detach the 1 Gig volume and attack the 10 Gig volume to ‘sda1’.  Startup the instance, log in, and enter the command:

resize2fs /dev/xvda1

This assumes your device is name ‘xvda1’.  You can check with:

df -h

Essentially you are telling the operating system to check the size of the hard drive, as it has changed.

Have fun!