Bare-metal Restore

As you can see from my previous post, in my quest to squeeze more req/sec from the server, I decided to try out Gentoo (again – last time was four years ago). Now, I like Gentoo, there is no doubt about that. However, I realized things took just too long to get set up. I guess that is the disadvantage of a source-based package manager. Back to Debian I go.

Two hours later everything was up and running – and I guess I can’t complain about a two-hour bare-metal restore from one distro to another. And let me reiterate: this isn’t just a typical LAMP box. It’s got:

  • apache/mod_php/ssl with ~10 domains
  • apache/mod_python/ssl with ~4 domains
  • lighttpd with ~5 domains (static files)
  • about 8 gigs of web data/images
  • svn repos + web_dav access
  • mysql restored
  • postfix(chroot)/dovecot/sasl + mysql auth
  • home dirs restored
  • chrooted users again

I’m sure I missed something on this list – I was typing pretty quickly. Well, that’s the update. I’m gonna go tinker with mod_cache some.

The Gentoo test

I have a love-hate relationship with Linux. I love it because if there is a problem, I can actually tinker and find the problem and fix it. But I hate it because I like to tinker.

Recently I’ve been doing a fair amount of Django programming – and enjoying every minute of it. After completing several of my projects I decided to do some benchmarks, and the results are in! Generally I can serve cached/semi-cached pages at about 200 req/sec. 200 req/sec! Considering that works out to some 5,000,000 requests a day – a number I am never going to reach – I still began to wonder: why isn’t it higher? I mean, serving a static HTML page runs at 1000+ req/sec, so why would a cached page be significantly different? I started exploring and noticed that Apache would spike the CPU. Ok, time to be thorough, and as I said, I like to tinker.
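For the curious, a quick way to ballpark req/sec from Python is a dumb timing loop like the one below. This is just a minimal single-client sketch with a placeholder URL – not a substitute for a real benchmarking tool like ab:

```python
import time

def measure_rps(fetch, n=100):
    """Time n calls to fetch() and return requests per second."""
    start = time.time()
    for _ in range(n):
        fetch()
    elapsed = time.time() - start
    # Guard against a zero-length clock delta on very fast loops
    return float(n) / max(elapsed, 1e-9)

# Example usage (placeholder URL -- point it at the page under test):
#   import urllib2  # urllib.request on Python 3
#   print(measure_rps(lambda: urllib2.urlopen('http://localhost/').read()))
```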

I tried lighttpd with FastCGI to Python – no significant difference, basically the same results. Next I tried several versions of Python – one 2.4 and one 2.5, one from a package and one built from source – same results: high CPU usage. Thinking it could be something related to my VPS (or some odd limit within Debian) I decided, ok, I’ll reinstall.

I reinstalled and got things working pretty quickly. The only slight hiccup was postfix/dovecot, because postfix insists on being in a jail (and my configs are all set up for that). Also, Chinese support in Apache isn’t working. Regardless, I re-ran the benchmarks and the results were the same – so it isn’t related to my previous install after all. Doh.

I’ll evaluate Gentoo as a server setup for a little while, but I’m thinking I’ll do a quick reinstall of Debian.

Django SVN Update Goes Splat

I’m writing this just in case somebody runs into the same issue. I’m about to go live with a website and figured it would be best to have the latest SVN snapshot of Django checked out. I updated, and noticed that my voting module didn’t quite work as expected. I was getting the following error:

'module' object has no attribute 'GenericForeignKey'

I jumped into Trac and noticed that just yesterday some things were rearranged. In short, if you are using generic relations, you’ll need to change two parts of your code. First, the generic relations field must now be imported from django.contrib.contenttypes:

from django.contrib.contenttypes import generic

And second, you’ll need to change the “location prefix” (for lack of a better description):
From:

generic_field = models.GenericRelation(SomeOtherModel)

To:

generic_field = generic.GenericRelation(SomeOtherModel)

All should be fine from there on out. For more information, take a look at the reference wiki article.

PSAD and Syslog-NG

I really like using PSAD, both on my server and my laptop – you never know where the mean people are. I also tend to use syslog-ng quite often, while PSAD is oriented toward plain syslog. This is fine, and I’m pretty sure the install.pl in the source build will configure syslog-ng.conf automatically. However, I almost always stick with packages if I can – at least when they are even remotely close to the current version.
Anyways, if you need to configure syslog-ng.conf for PSAD by hand, this is what you need to do.
Add this line to the “# pipes” section (maybe stick to keeping it alphabetical):

destination psadpipe { pipe("/var/lib/psad/psadfifo"); };

Next, go down a little to the “# filters” section and add this:

filter f_kerninfo { facility(kern); };

And finally in the last section, add this:

log {
        source(s_all);
        filter(f_kerninfo);
        destination(psadpipe);
};

Restart syslog-ng, and you are good to go. Cheers to Michael Rash at Cipherdyne for his work on PSAD.

A Dying Laptop

I have the pleasure of owning an old T23 laptop. To show you how old this puppy is: the current T series is at the T60, and those have been out for over a year. This laptop was made in 2001, and I picked it up somewhat discounted late in 2003. It is now March 2007, and this puppy is still rock solid.

You heard me, it is almost six years old and still working fine – that is testimony to how well this laptop was built. There are several small cracks around the case, but nothing you would notice by just walking by. This laptop has been to more countries than many people.

I had the first problem this weekend, and it isn’t even related to the laptop itself. The hard drive, a 30GB I put in at some point, started to crap out on me. Bad sectors were everywhere, so some programs were slightly unhappy (e.g. I couldn’t boot into X).

I’m going to buy a new laptop soon, I promise, about the time my MBA goals are reached. Until then, I’ll continue to be frugal, and deal with the bad sectors. Being a good IT nerd, everything is backed up to an external hard drive (and most stuff backed up remotely).

Luckily I’m using Linux – so I was able to run fsck/smartmontools a few times in recovery mode, make the bad blocks happy, and continue as “normal.” Phew, disaster averted.

One More Point Linux

It should come as no surprise that I enjoy using Linux. For the record, the first time I booted into Linux on my own was 1997, just before entering high school. So, while some of my tech friends played with NT, I was rumbling with the Penguin. Starting in 2000 I was using Linux as my main operating system, sometimes supplemented by OS X, and only using Windows when the gaming urge surfaced. In 2004 I mostly stopped playing games, which meant dropping Windows – and besides for work, I haven’t used it since.

For me, I’ll admit, there are three things that Linux still lacks:

  • Simple video conferencing support
  • Video editing support
  • Gaming

I know that all of these are supported, but, in my opinion, not particularly well. I don’t care about any of them enough to actually need Windows, but it would be nice to see them improve.

So, I’m set. I’m 100% legal (I don’t steal a single piece of software), and I don’t have to be too afraid of viruses. What prompted me to write this little excerpt? A recent article at the Washington Post scared the beejeepers out of me, and makes me wish even more that Vista would either cure these security problems, or everybody would move over to Linux. The article details the aftermath a virus can cause – not in damaging one’s computer, but in capturing information. The author further details his experience hunting down the data. This was one of the better articles I’ve read, and I thoroughly enjoyed the extra detail. If you want a little more motivation to move to Linux (or just to tighten up your machine), I suggest you take a few moments to read the article as well.

The Risk in Risk Mitigation

Back in the day, the barrier to entry for the Internet was quite high. The technology required a steep learning curve, and the equipment was extremely expensive – sometimes even hard to acquire. Fast forward to 2007 and things have certainly changed. If you know any tech people you can likely get free hosting for a small website, and even more demanding websites can be hosted for not much. The cost of dedicated servers has dropped even more. And the final kicker: web services. I’ve started to think of some web services not as services, but as outsourced requirements.


One of the nice things about outsourcing requirements is that risk is mitigated. I’ll use SmugMug as an example: in short, they moved their storage to Amazon’s S3 network, which is something I will be utilizing as well. Amazon’s S3 (and their other web services) continue to drive down the barrier to entry – now you don’t even need to purchase hugely expensive servers for the sole purpose of storage! And if you don’t need to purchase them, you also don’t need to manage them. Risk mitigated.
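As a side note on what using S3 looks like in practice: every REST request to S3 gets signed with an HMAC-SHA1 over a canonical string. Below is a minimal sketch of building that Authorization header, based on my reading of Amazon’s signing docs – the bucket path and keys are placeholders, and I’ve left out the optional x-amz header canonicalization, so double-check against the official docs before relying on it:

```python
import base64
import hashlib
import hmac
from email.utils import formatdate  # S3 expects an RFC 2822 date

def s3_auth_header(access_key, secret_key, verb, resource,
                   content_md5='', content_type='', date=None):
    """Build the 'AWS access:signature' Authorization header value for S3."""
    if date is None:
        date = formatdate(usegmt=True)
    # Canonical string: verb, md5, content type, date, resource
    # (x-amz header canonicalization omitted for brevity)
    string_to_sign = '\n'.join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode('utf-8'),
                      string_to_sign.encode('utf-8'), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode('ascii')
    return 'AWS %s:%s' % (access_key, signature), date

# auth, date = s3_auth_header('my-access-key', 'my-secret-key',
#                             'PUT', '/my-bucket/photos/img.jpg')
```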

However, continuing the slight allusion from The Other Blog’s article on mashups, I see a slight problem with outsourcing requirements. The thought isn’t particularly innovative: mitigating risk by outsourcing creates a dependency on the third party. That dependency adds risk of its own, and when your entire web application platform revolves around a third party, as is the case with mashups, you incur great risk.

But, as is evident by the fact that I’ve had stitches nine different times, I’m still going to do some cool mashups anyways, so stay tuned.

Python, AST and SOAP

For one of my projects I need to generate thumbnails for a page. And lots and lots and lots of them. Even though I can generate them via a Python script and a very light “gtk browser”, I would prefer to mitigate the server load. To do this I’ve decided to tap into the Alexa Thumbnail Service. It allows two methods: REST and SOAP. After several hours of testing things out, I’ve decided to throw in the towel and settle on REST. If you can spot the error in my SOAP setup, I owe you a beer.
I’m using the ZSI module for python.

1. wsdl2py

I pull in the needed classes by using wsdl2py.

wsdl2py -b http://ast.amazonaws.com/doc/2006-05-15/AlexaSiteThumbnail.wsdl

2. Look at the code generated.

See AlexaSiteThumbnail_types.py and AlexaSiteThumbnail_client.py.

3. Write python code to access AST over SOAP.


#!/usr/bin/env python
import sys
import datetime
import hmac
import sha
import base64
from AlexaSiteThumbnail_client import *

print 'Starting...'

AWS_ACCESS_KEY_ID = 'super-duper-access-key'
AWS_SECRET_ACCESS_KEY = 'super-secret-key'

print 'Generating signature...'

def generate_timestamp(dtime):
    return dtime.strftime("%Y-%m-%dT%H:%M:%SZ")

def generate_signature(operation, timestamp, secret_access_key):
    my_sha_hmac = hmac.new(secret_access_key, operation + timestamp, sha)
    my_b64_hmac_digest = base64.encodestring(my_sha_hmac.digest()).strip()
    return my_b64_hmac_digest

timestamp_datetime = datetime.datetime.utcnow()
timestamp_list = list(timestamp_datetime.timetuple())
timestamp_list[6] = 0
timestamp_tuple = tuple(timestamp_list)
timestamp_str = generate_timestamp(timestamp_datetime)

signature = generate_signature('Thumbnail', timestamp_str, AWS_SECRET_ACCESS_KEY)

print 'Initializing Locator...'

locator = AlexaSiteThumbnailLocator()
port = locator.getAlexaSiteThumbnailPort(tracefile=sys.stdout)

print 'Requesting thumbnails...'

request = ThumbnailRequestMsg()
request.Url = "alexa.com"
request.Signature = signature
request.Timestamp = timestamp_tuple
request.AWSAccessKeyId = AWS_ACCESS_KEY_ID
request.Request = [request.new_Request()]

resp = port.Thumbnail(request)

4. Run, and see error.


ZSI.EvaluateException: Got None for nillable(False), minOccurs(1) element 
(http://ast.amazonaws.com/doc/2006-05-15/,Url), 



 xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" 
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" 
xmlns:ZSI="http://www.zolera.com/schemas/ZSI/" 
xmlns:ns1="http://ast.amazonaws.com/doc/2006-05-15/" 
xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

[Element trace: /SOAP-ENV:Body/ns1:ThumbnailRequest]

5. Conclusion

I’m not entirely certain what I’m doing wrong. I’ve also written another version, this time using NPBinding to connect to the wsdl file. It seems to work much better, in that it fully connects and I get a 200 back, but it doesn’t return the thumbnail location in the response, and I get a:

TypeError: Response is "text/plain", not "text/xml"

So, while I have things working fine with REST, I would like to get the SOAP calls working. One beer reward.

AWS in Python (REST)

As some of you may know, I have some projects cooked up. I don’t expect to make a million bucks (wish me luck!), but a few extra bills in the pocket wouldn’t hurt. Plus, I’m highly considering further education, which will set me back a few-thirty grand. That said, one of my projects will rely heavily on Amazon Web Services. Amazon has, for quite some time now, opened up its data via REST and SOAP. I’ve been trying (for virtually the entire day) to get SOAP to work, but keep getting snagged on a few issues. Stay tuned.
However, in my quest to read every RTFM, I stumbled upon a post about using Python+REST to access Alexa Web Search. After staring at Python code all day, especially while trying to grapple with why SOAP isn’t working, updating the outdated REST code was a five-minute hack. So, if you are interested in using Alexa Web Search from Python via REST, look below:

websearch.py


#!/usr/bin/python

"""
Test script to run a WebSearch query on AWS via the REST interface.  Written
 originally by Walter Korman ([email protected]), based on urlinfo.pl script from 
  AWIS-provided sample code, updated to the new API by  
Kelvin Nicholson ([email protected]). Assumes Python 2.4 or greater.
"""

import base64
import datetime
import hmac
import sha
import sys
import urllib
import urllib2

AWS_ACCESS_KEY_ID = 'your-access-key'
AWS_SECRET_ACCESS_KEY = 'your-super-secret-key'

def get_websearch(searchterm):
    def generate_timestamp(dtime):
        return dtime.strftime("%Y-%m-%dT%H:%M:%SZ")
    
    def generate_signature(operation, timestamp, secret_access_key):
        my_sha_hmac = hmac.new(secret_access_key, operation + timestamp, sha)
        my_b64_hmac_digest = base64.encodestring(my_sha_hmac.digest()).strip()
        return my_b64_hmac_digest
    
    timestamp_datetime = datetime.datetime.utcnow()
    timestamp = generate_timestamp(timestamp_datetime)
    
    signature = generate_signature('WebSearch', timestamp, AWS_SECRET_ACCESS_KEY)
    
    def generate_rest_url(access_key, secret_key, query):
        """Returns the AWS REST URL to run a web search query on the specified
        query string."""
    
        params = urllib.urlencode(
            { 'AWSAccessKeyId':access_key,
              'Timestamp':timestamp,
              'Signature':signature,
              'Action':'WebSearch',
              'ResponseGroup':'Results',
              'Query':query, })
        return "http://websearch.amazonaws.com/?%s" % (params)
    
    # print "Querying '%s'..." % (query)
    url = generate_rest_url(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, searchterm)
    # print "url => %s" % (url)
    print urllib2.urlopen(url).read()

You run it like this:

>>> from websearch import get_websearch
>>> get_websearch('python')

Hamachi Rules

I’ve been playing around more with Hamachi, and have decided that it officially rules. Since I’m a big Linux guy I don’t have access to some features, but the program seems to be a gem. It is brainlessly easy to install (even when doing 20 things at once), and works quite well. Thanks to Ben and Sean for helping me test it out.