Error opening /dev/sda: No medium found

Published on Saturday, March 1, 2014

I have had this issue before, solved it, and had it again.

Let's say you plug a USB drive into a Linux machine and try to access it (mount it, partition it with fdisk/parted, or format it), and you get the error:

Error opening /dev/sda: No medium found

Naturally the first thing you will do is ensure that it appeared when you plugged it in, so you run 'dmesg' and get:

sd 2:0:0:0: [sda] 125045424 512-byte logical blocks: (64.0 GB/59.6 GiB)

And it appears in /dev

Computer:~ $ ls /dev/sd*
Computer:~ $

Now what? Here's what has bitten me twice: make sure the drive has enough power. Let's say you plugged a 2.5" USB drive into a Raspberry Pi. The Pi's USB port probably can't supply enough current to spin the drive up, but it can supply enough for the drive to be recognised. Or, if you are like me, the USB charger powering the drive is faulty, so even though the drive has power, it doesn't have enough.

The next troubleshooting step should be obvious: give the drive enough power to completely spin up.
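Part of what makes this failure misleading is that the dmesg line reports a perfectly sensible capacity. As a quick sanity check, here is a minimal sketch that redoes the arithmetic from the block count and block size in that line. The function name is my own, not part of any tool.

```python
# Figures copied from the dmesg line earlier in this post.
LOGICAL_BLOCKS = 125045424
BLOCK_SIZE_BYTES = 512


def capacity(blocks, block_size):
    """Return (decimal gigabytes, binary gibibytes) for a block device."""
    total_bytes = blocks * block_size
    return total_bytes / 1e9, total_bytes / float(2 ** 30)


gb, gib = capacity(LOGICAL_BLOCKS, BLOCK_SIZE_BYTES)
# Prints the same two figures the kernel reported: 64.0 GB / 59.6 GiB
print("{:.1f} GB / {:.1f} GiB".format(gb, gib))
```

So the drive's electronics answered correctly even though the platters never spun up.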

Continuous Flow Through Worm Bin

Published on Sunday, January 5, 2014

Status: Done!

A few months ago we decided we wanted a worm bin, as we were eating a lot of vegetables, and tossing away bits that weren't used. We were also buying soil for our plants, so it made sense to try to turn one into another.

One of our friends gave us some worms from her compost - no idea what kind - and I built an experimental CFT worm bin (sample plans). We harvested once at about two months, but I don't think it was quite ready. We'll keep experimenting.

Free Splunk Hosting

Published on Thursday, November 28, 2013

I first used Splunk about 10 years ago after an old colleague installed it on a computer in the corner, and ever since then I have preached about it. If you have log data, of any kind, I'd recommend you give it a go.

The Splunk people have a few pretty good options for trying Splunk out: you can use either Splunk Storm or Splunk Free. The first option is hosted, and has a generous storage allowance, but does not allow long term storage of data. I send system log data to Splunk Storm.

However, what if you don't have a lot of data, but you want to keep that data forever? After reading Ed Hunsinger's Go Splunk Yourself entry about using it for Quantified Self data, I knew I had to do the same.

From personal experience, Splunk requires at least 1GB of memory to even start. You can probably get it to run on less, but I haven't had much success. This leaves two options: look at Low End Box for a VPS with enough memory (as cheap as $5/month), or use OpenShift. Red Hat generously provides three "gears" to host applications, for free, each with 1GB of memory. I have sort of a love-hate relationship with OpenShift, maybe a bit like using OAuth. Red Hat calls OpenShift the "Open Hybrid Cloud Application Platform", and I can attest that it really is this. They have provided a method to bundle an application stack and push it into production without needing to fuss about infrastructure, or even provisioning and management of the application. It feels like what would happen if Google App Engine and Amazon's EC2 had a child. Heroku or dotCloud might be its closest alternatives.

Anyways, this isn't a review of OpenShift, although it would be a positive review, but instead a guide on how to use OpenShift to host Splunk. I first installed Splunk in a gear using Nginx as a proxy, and it worked. However, this felt overly complex, and after one of my colleagues started working on installing Splunk in a cartridge, I eventually agreed this would be the way to go. The result was a Splunk cartridge that can be installed inside any existing gear. Here are the instructions; you need an OpenShift account, obviously. The install should take less than ten clicks of your mouse, and one copy/paste.

From the cartridge's GitHub README:

  1. Create an Application based on existing web framework. If in doubt, just pick "Do-It-Yourself 0.1" or "Python 2.7"
  2. Click on "Continue to the application overview page."
  3. On the Application page, click on "Or, see the entire list of cartridges you can add".
  4. Under "Install your own cartridge" enter the following URL:
  5. Click Next and Add Cartridge. Wait a few minutes for Splunk to download and install.
  6. Log on to Splunk at:

More details can be read on the cartridge's GitHub page, and I would especially direct you to the limitations of this configuration. This will all stop working if Splunk makes the installer file unavailable, but I will deal with that when the time comes. Feel free to alert me if this happens.

Finding The Same (Misspelled) Name Using Python/NLTK

Published on Friday, September 13, 2013

I have been meaning to play around with the Natural Language Toolkit for quite some time, but I had been waiting for a time when I could experiment with it and actually create some value (as opposed to just play with it). A suitable use case appeared this week: matching strings. In particular, matching two different lists of many, many thousands of names.

To give you an example, let's say you had two lists of names, but with some of the names spelled incorrectly in one list:

List 1:
Leonard Hofstadter
Sheldon Cooper
Penny
Howard Wolowitz
Raj Koothrappali
Leslie Winkle
Bernadette Rostenkowski
Amy Farrah Fowler
Stuart Bloom
Alex Jensen
Barry Kripke

List 2:
Leonard Hofstadter
Sheldon Coopers
Howie Wolowits
Rav Toothrapaly
Ami Sarah Fowler
Stu Broom
Alexander Jensen

This could easily occur if somebody was manually typing in the lists, dictating names over the phone, or if people spelled their names differently (e.g. Phil vs. Phillip) at different times.

If we wanted to match people on List 1 to List 2, how could we go about that? For a small list like this you can just look and see, but with many thousands of people, something more sophisticated would be useful. One tool could be NLTK's edit_distance function. The following Python script shows how easy this is:

import nltk

list_1 = ['Leonard Hofstadter', 'Sheldon Cooper', 'Penny', 'Howard Wolowitz', 'Raj Koothrappali', 'Leslie Winkle', 'Bernadette Rostenkowski', 'Amy Farrah Fowler', 'Stuart Bloom', 'Alex Jensen', 'Barry Kripke']

list_2 = ['Leonard Hofstadter', 'Sheldon Coopers', 'Howie Wolowits', 'Rav Toothrapaly', 'Ami Sarah Fowler', 'Stu Broom', 'Alexander Jensen']

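for person_1 in list_1:
    for person_2 in list_2:
        # Levenshtein distance between the two names; 0 means identical
        distance = nltk.metrics.edit_distance(person_1, person_2)
        # str.format keeps the output the same under Python 2 and Python 3
        print("{} {} {}".format(distance, person_1, person_2))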

0 Leonard Hofstadter Leonard Hofstadter
15 Leonard Hofstadter Sheldon Coopers
14 Leonard Hofstadter Howie Wolowits
15 Leonard Hofstadter Rav Toothrapaly
14 Leonard Hofstadter Ami Sarah Fowler
16 Leonard Hofstadter Stu Broom
15 Leonard Hofstadter Alexander Jensen
14 Sheldon Cooper Leonard Hofstadter
1 Sheldon Cooper Sheldon Coopers
13 Sheldon Cooper Howie Wolowits
13 Sheldon Cooper Rav Toothrapaly
12 Sheldon Cooper Ami Sarah Fowler
11 Sheldon Cooper Stu Broom
12 Sheldon Cooper Alexander Jensen
16 Penny Leonard Hofstadter
13 Penny Sheldon Coopers
13 Penny Howie Wolowits
14 Penny Rav Toothrapaly
16 Penny Ami Sarah Fowler
9 Penny Stu Broom
13 Penny Alexander Jensen
11 Howard Wolowitz Leonard Hofstadter
13 Howard Wolowitz Sheldon Coopers
4 Howard Wolowitz Howie Wolowits
15 Howard Wolowitz Rav Toothrapaly
13 Howard Wolowitz Ami Sarah Fowler
13 Howard Wolowitz Stu Broom
14 Howard Wolowitz Alexander Jensen
16 Raj Koothrappali Leonard Hofstadter
14 Raj Koothrappali Sheldon Coopers
16 Raj Koothrappali Howie Wolowits
4 Raj Koothrappali Rav Toothrapaly
14 Raj Koothrappali Ami Sarah Fowler
14 Raj Koothrappali Stu Broom
16 Raj Koothrappali Alexander Jensen
14 Leslie Winkle Leonard Hofstadter
13 Leslie Winkle Sheldon Coopers
11 Leslie Winkle Howie Wolowits
14 Leslie Winkle Rav Toothrapaly
14 Leslie Winkle Ami Sarah Fowler
12 Leslie Winkle Stu Broom
12 Leslie Winkle Alexander Jensen
17 Bernadette Rostenkowski Leonard Hofstadter
18 Bernadette Rostenkowski Sheldon Coopers
18 Bernadette Rostenkowski Howie Wolowits
19 Bernadette Rostenkowski Rav Toothrapaly
20 Bernadette Rostenkowski Ami Sarah Fowler
20 Bernadette Rostenkowski Stu Broom
17 Bernadette Rostenkowski Alexander Jensen
15 Amy Farrah Fowler Leonard Hofstadter
14 Amy Farrah Fowler Sheldon Coopers
15 Amy Farrah Fowler Howie Wolowits
14 Amy Farrah Fowler Rav Toothrapaly
3 Amy Farrah Fowler Ami Sarah Fowler
14 Amy Farrah Fowler Stu Broom
13 Amy Farrah Fowler Alexander Jensen
15 Stuart Bloom Leonard Hofstadter
12 Stuart Bloom Sheldon Coopers
12 Stuart Bloom Howie Wolowits
14 Stuart Bloom Rav Toothrapaly
13 Stuart Bloom Ami Sarah Fowler
4 Stuart Bloom Stu Broom
14 Stuart Bloom Alexander Jensen
15 Alex Jensen Leonard Hofstadter
12 Alex Jensen Sheldon Coopers
13 Alex Jensen Howie Wolowits
15 Alex Jensen Rav Toothrapaly
13 Alex Jensen Ami Sarah Fowler
10 Alex Jensen Stu Broom
5 Alex Jensen Alexander Jensen
15 Barry Kripke Leonard Hofstadter
13 Barry Kripke Sheldon Coopers
13 Barry Kripke Howie Wolowits
12 Barry Kripke Rav Toothrapaly
13 Barry Kripke Ami Sarah Fowler
10 Barry Kripke Stu Broom
14 Barry Kripke Alexander Jensen

As you can see, this displays the Levenshtein distance between each pair of names: the lower the number, the closer the match. Another option we have is to look at the ratio.

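# Note: len() of a list counts its entries, so lensum is 11 + 7 = 18 names,
# not the combined length of the two strings being compared.
len1 = len(list_1)
len2 = len(list_2)
lensum = len1 + len2
for person_1 in list_1:
    for person_2 in list_2:
        levdist = nltk.metrics.edit_distance(person_1, person_2)
        # Scale the distance to a ratio: 1.0 is an exact match
        nltkratio = (float(lensum) - float(levdist)) / float(lensum)
        # Keep only the likely matches
        if nltkratio > 0.70:
            print("{} {} {}".format(nltkratio, person_1, person_2))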

1.0 Leonard Hofstadter Leonard Hofstadter
0.944444444444 Sheldon Cooper Sheldon Coopers
0.777777777778 Howard Wolowitz Howie Wolowits
0.777777777778 Raj Koothrappali Rav Toothrapaly
0.833333333333 Amy Farrah Fowler Ami Sarah Fowler
0.777777777778 Stuart Bloom Stu Broom
0.722222222222 Alex Jensen Alexander Jensen
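One thing to note about the ratio above: lensum is the number of names in both lists combined (18), so every distance is scaled by the same constant. A variant that might hold up better on names of very different lengths is to scale each distance by the combined length of the two strings being compared, and to keep only the best candidate per name. The sketch below is one way to do that, not the method used for the output above. It uses a small pure-Python Levenshtein function so it runs without NLTK, and the helper names (levenshtein, similarity, best_matches) are my own.

```python
def levenshtein(a, b):
    """Edit distance where insert, delete and substitute each cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def similarity(a, b):
    """Ratio in [0, 1], scaled by the combined length of both strings."""
    lensum = len(a) + len(b)
    if lensum == 0:
        return 1.0
    return (lensum - levenshtein(a, b)) / float(lensum)


def best_matches(names, candidates, threshold=0.70):
    """Map each name to its closest candidate, if it clears the threshold."""
    matches = {}
    for name in names:
        best = max(candidates, key=lambda c: similarity(name, c))
        if similarity(name, best) > threshold:
            matches[name] = best
    return matches


list_1 = ['Leonard Hofstadter', 'Sheldon Cooper', 'Penny', 'Howard Wolowitz',
          'Raj Koothrappali', 'Leslie Winkle', 'Bernadette Rostenkowski',
          'Amy Farrah Fowler', 'Stuart Bloom', 'Alex Jensen', 'Barry Kripke']
list_2 = ['Leonard Hofstadter', 'Sheldon Coopers', 'Howie Wolowits',
          'Rav Toothrapaly', 'Ami Sarah Fowler', 'Stu Broom',
          'Alexander Jensen']

matches = best_matches(list_1, list_2)
for name in list_1:
    if name in matches:
        print("{:.3f} {} -> {}".format(similarity(name, matches[name]),
                                       name, matches[name]))
```

The 0.70 threshold is carried over from the script above; with a different scaling it is only a starting point and would need tuning against real data.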

Sydney's Education Levels Mapped

Published on Sunday, September 8, 2013

I was talking with my wife about what education levels might look like across Sydney, so she challenged me to map it. The map below is my first draft.

The map was derived by combining three datasets from the Australian Bureau of Statistics (ABS - a department releasing some great datasets). The first dataset was the spatial data for "SA2" level boundaries, the second the population data for various geographic areas, and the third from the 2011 Census on Non-School Qualification Level of Education (e.g. Certificates, Diplomas, Masters, Doctorates). I aggregated all people with bachelors or higher in an SA2 region, and then divided that number by the total number of people in that region. A different methodology could have been used.
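The calculation itself is simple. Here is a minimal sketch of it, with made-up counts for two made-up regions (these are not ABS figures), and with qualification category names that are my own shorthand rather than the column names in the ABS files.

```python
# Which qualification levels count as "bachelor or higher" in this sketch.
BACHELOR_OR_HIGHER = ("bachelor", "graduate_diploma", "postgraduate")

# Hypothetical counts per region; real data would come from the ABS tables.
regions = {
    "Region A": {"bachelor": 1200, "graduate_diploma": 150,
                 "postgraduate": 450, "population": 6000},
    "Region B": {"bachelor": 90, "graduate_diploma": 10,
                 "postgraduate": 20, "population": 4000},
}


def share_with_degree(counts):
    """People with a bachelor or higher divided by total population."""
    degree_holders = sum(counts[level] for level in BACHELOR_OR_HIGHER)
    return degree_holders / float(counts["population"])


for name in sorted(regions):
    print("{}: {:.0%}".format(name, share_with_degree(regions[name])))
```

Swapping "population" for the number of residents in a given age band would give an age-specific version of the same percentage.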

EDIT: I should have paid more attention to mapping education levels. I mapped the percentage of overall population, but should have mapped the percentage of 25 to 34 year olds, as this would have aligned to various government metrics.

Reported education levels differ vastly by region, e.g. "North Sydney - Lavender Bay" (40%) vs. "Bidwill - Hebersham - Emerton" (3%). It is interesting to look at the different urban density levels of the areas, as well as the commute times to the nearest centre.

Without trying to sound too elitist, I was hoping to use this map to guide me where to consider buying our next property (i.e. looking for a well educated, clean area with decent schools and frequent public transport). It was interesting to discover that the SA2 region we currently live in has the second highest percentage in NSW.

Feel free to take a look at the aggregated data yourself or download it (attribution to ABS for source datasets).


Sydney Commute Times Mapped Part 2

Published on Monday, August 5, 2013

In Sydney Commute Times Mapped Part 1 I took a small step towards a bigger goal of mashing together public transport in Sydney and the Metropolitan Strategy for Sydney to 2031. The question I wanted to answer is this: how well aligned are Sydney's public transport infrastructure and the Metropolitan Strategy's vision of a "city of cities"?

I decided to find out.

Thanks to the release of GTFS data by 131500 it is possible to visualise how long it takes via public transport to commute to the nearest "centre".

Cities and Corridors - Metropolitan Strategy for Sydney to 2031

The Australian Bureau of Statistics collects data based on "mesh blocks", each an area containing roughly 50 dwellings. Last week I had some fun mapping the mesh blocks, as well as looking at Sydney's urban densities. These mesh blocks are a good size to look at for calculating commute times.

The simplified process I used was this, for the technically minded:

  1. Calculate the centre of each mesh block
  2. Calculate the commute time via public transport from each block to every "centre" (using 131500's GTFS and OpenTripPlanner's Analyst tool)
  3. Import the times into a database, and calculate the lowest commute time from each block to any centre
  4. Visualise in TileMill
  5. Serve tiles in TileStache and visualise with Leaflet
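Steps 1 and 3 can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the database queries I actually ran: it treats coordinates as flat x/y values, the block outline and the commute times are invented, and the function names are my own.

```python
def polygon_centroid(points):
    """Centroid of a simple polygon given as a list of (x, y) vertices."""
    twice_area = 0.0
    cx = cy = 0.0
    for i in range(len(points)):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % len(points)]  # wrap around to the start
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("polygon has no area")
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


def nearest_centre(minutes_by_centre):
    """Return (centre, minutes) for the quickest centre to reach."""
    centre = min(minutes_by_centre, key=minutes_by_centre.get)
    return centre, minutes_by_centre[centre]


# Hypothetical mesh block outline and commute times (in minutes).
block_outline = [(0, 0), (2, 0), (2, 2), (0, 2)]
commute_minutes = {"Centre A": 48, "Centre B": 27, "Centre C": 35}

print(polygon_centroid(block_outline))
print(nearest_centre(commute_minutes))
```

In practice the centroid would be the origin handed to the trip planner, and the minimum would be taken once per mesh block across all centres.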

The first map I created simply indicates how long it would take to reach the nearest centre. Accessibility appears to drop off rapidly on the fringe of Sydney. I was also surprised by what appears to be a belt of higher times running from Wetherill Park all the way to Marrickville. There also appears to be poorer accessibility in parts of Western Sydney. It is worth noting that I offer no guarantee of the integrity of the data in these maps, and I have seen a few spots where the commute times increase significantly between adjacent mesh blocks. This tells me the street data (from OpenStreetMap) might not be connected correctly.


My next map shows what areas are within 30 minutes.


These maps were both created using open data and open source tools, which I find quite neat. In that spirit, I have exported the database (probably a bit hard for most to work with) to a Shapefile. You can open this in TileMill and experiment, if you wish. Download it from here (note: 250MB zip file):

I have been interested in mapping traffic for a number of years, maybe ever since arriving in Sydney. It is sort of a hobby; I find making maps relaxing. My first little map was way back in 2008, where I visualised speed from a GPS unit. A little later I added some colour to the visualisations, and then used this as an excuse to create a little GUI for driving speed. My interest in visualising individual vehicles has decreased recently, as it has now shifted to mapping wider systems. Have an idea you would like to see mapped? Leave a note in the comments.

Quantified Self Interview

Published on Saturday, July 27, 2013

YS and I were recently interviewed about self-tracking and Quantified Self by one of the major news channels in Australia. I will reflect on the experience after the show has aired, but it was an overall great experience. We have a new respect for filming what may ultimately be just a two minute segment. Depending on how the editing is done it will either provoke the hosts to contemplate the value of a data-centric macroscopic view of the world, or give them lots of fodder.

That said, as you would expect, I had to track my heart rate during the interview - see below. My interpretation is that my heart rate jumped at the start of every question, and went down as I answered it. It also dropped when the interview finished. I wish I had a more expensive heart rate monitor (e.g. Zephyr BioHarness or Scanadu) that tracked skin temperature and breathing. My hands felt cold by the end.