Gather and harken unto my tale of woe!

Well, this roadtrip has been rather difficult so far. Not necessarily bad, but definitely difficult.

It was about a week before the weather would let up enough for us to even escape our home state. I came down with a cold as we were leaving. The campground we were originally going to be staying at on the second night was mysteriously closed for the season despite supposedly being open year-round. Panoramio appears to have forgotten that I exist and won’t let me log in to upload more of my photos (and I’ve not yet heard back from the email contacts there about getting back in). And then on the third night, neither of the truckstops next to our campground had sour cream. And the following morning, after stopping briefly to pick up some food for breakfast, the truck sputtered and died on the way up the onramp to continue the trip. And then we had some stress and confusion getting things worked out initially with the RV’ers organization to get towed to a repair shop and a campground. And then my wife has apparently picked up the cold that I’m getting over now. And then someone took a doody in my sandbox…oh, wait. That was just a “song” on one of my CDs. Never mind.

On the other hand, we did manage to finally escape our home state, we did find a replacement campground for the second night, and we did get everything worked out okay. Our truck’s problem turned out to be a relatively minor issue with the distributor (though it evidently took a fair amount of labor to extract, fix, reassemble, and reinstall it), and we should be able to get back on the road in the morning. So, enough whining from me for now.

Meanwhile, I’ve thought about my “geotagging arbitrary files” issue a bit more. At this point I’m favoring the “geostrings” approach, split into what I’m calling “Where, When, and Whither” fields, which is to say, a field containing location (latitude, longitude, elevation), a field containing time-related information (timestamp, track-id), and a field containing direction (heading and angle) information. I’ve actually started putting geostrings in this form into some of the pictures I’ve been taking, just to get a feel for how easy or hard they are to work with. An example containing all information including the optional stuff would look like this:

geostr:35.068531033,-106.5019369,1716.0905m:20080104T122418-06,track01:60,20:geostr

The “where” field is latitude, longitude, and elevation, separated by commas. The “when” field is an ISO 8601 “basic format” timestamp and a track ID, and the “whither” field indicates a heading of 60° and an upward angle of 20°. The colon-separated fields and the comma-separated data within each field are in order from (as I perceive it) most important to least important. Aside from the latitude and longitude, and the “geostr” markers on either side, everything is optional.
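To get a feel for how hard such a string is to work with, here is a minimal parsing sketch. It is only an illustration under assumptions: the names `GeoStr` and `parse_geostr` are made up, elevation and timestamp are kept as raw text, and omitted trailing fields are treated the same as empty ones (the description above does not say which is intended).

```python
from typing import NamedTuple, Optional


class GeoStr(NamedTuple):
    """Hypothetical container for the Where / When / Whither fields."""
    lat: float
    lon: float
    elev: Optional[str] = None       # kept as text, e.g. "1716.0905m"
    timestamp: Optional[str] = None  # ISO 8601 basic format, kept as text
    track_id: Optional[str] = None
    heading: Optional[float] = None
    angle: Optional[float] = None


def parse_geostr(text: str) -> GeoStr:
    """Parse one 'geostr:where:when:whither:geostr' string."""
    parts = text.strip().split(":")
    if len(parts) < 3 or parts[0] != "geostr" or parts[-1] != "geostr":
        raise ValueError("missing 'geostr' markers")
    # Assumption: pad so that omitted trailing fields behave like empty ones.
    fields = parts[1:-1] + ["", ""]
    where, when, whither = (field.split(",") for field in fields[:3])
    if len(where) < 2:
        raise ValueError("latitude and longitude are required")

    def get(items, index):
        # Return the item at index, or None if it is absent or empty.
        if index < len(items) and items[index] != "":
            return items[index]
        return None

    heading, angle = get(whither, 0), get(whither, 1)
    return GeoStr(
        lat=float(where[0]),
        lon=float(where[1]),
        elev=get(where, 2),
        timestamp=get(when, 0),
        track_id=get(when, 1),
        heading=float(heading) if heading is not None else None,
        angle=float(angle) if angle is not None else None,
    )
```

Since the basic-format timestamp contains no colons, splitting on “:” first and on “,” second is enough to separate every value.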

Comments?

Linking this more-relevant latter portion of the post to the whining at the beginning is the fact that the cold I’m now getting over has messed up my voice. I did bring microphones and both my computer and some cheap portable recording gadgets, so at some point along the way I still want to do at least one short audio recording, geotagged and including an embedded image to go with it. I just need to wait for my voice to properly return (and to spot something about which I feel an urge to inflict my blabbering on people.)

Proposed format(s) for geotagging arbitrary types of media

Yet more thoughts on geotagging – here’s what I’ve come up with so far.

The format needs to handle only two fundamental data types – points and polygons. It also obviously needs to handle “lines” or tracks, but those are made of “points”. Polygon, for my purposes, might be unnecessary and I’m not sure if I should leave it in. I’m reluctant to leave it out – that way you could easily georeference media to a building or field’s outline, for example. On the other hand, I’m trying to keep this format terse and concise – I’m not trying to merely embed .gpx or .kml files in things.

A “point”, as I am thinking of defining it here, is made of up to seven attributes (more or less in order of importance): a latitude/longitude pair, elevation, timestamp, track-ID, heading, and angle. A polygon is the same, except that it contains a list of at least three lat/lon/optional-elevation sets. It still only has a single timestamp, though, just like a “point”. I suppose in some odd cases one could even define a track as a series of polygons – defining the field of view in a video taken from the bottom of an airplane that’s taking off, for example.
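As a rough sketch of the data model just described, the two types could be written down like this. The class and function names (`Point`, `Polygon`, `group_tracks`) are illustrative assumptions, not part of any proposed standard, and the timestamp is kept as plain text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Point:
    lat: float                          # required
    lon: float                          # required
    elev: Optional[float] = None
    timestamp: Optional[str] = None
    track_id: Optional[str] = None
    heading: Optional[float] = None
    angle: Optional[float] = None


@dataclass
class Polygon:
    # Each vertex is (lat, lon, elevation-or-None).
    vertices: List[Tuple[float, float, Optional[float]]]
    timestamp: Optional[str] = None     # a single timestamp for the whole outline
    track_id: Optional[str] = None
    heading: Optional[float] = None
    angle: Optional[float] = None

    def __post_init__(self):
        if len(self.vertices) < 3:
            raise ValueError("a polygon needs at least three vertices")


def group_tracks(points: List[Point]) -> Dict[str, List[Point]]:
    """Collect points that share a track ID; untracked points are skipped."""
    tracks = defaultdict(list)
    for point in points:
        if point.track_id is not None:
            tracks[point.track_id].append(point)
    return dict(tracks)
```

A “line” then needs no type of its own: it is simply whatever `group_tracks` returns for one track ID.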

Leaving aside the question of polygons for now, I’m envisioning two possible formats which I will arbitrarily name “geotag” (XML-type) and “geostring”(simple text) for the moment.

I picture a geotag entry looking something like this:

<geotag:point lat="41.228063" lon="-115.058119" elev="1720.901m" datetime="20071115T143000-06" trackid="1" heading="340" angle="-5.0">Metropolis Hotel</geotag:point>

In this format, the optional description of the point is between the opening and closing tags there. “lat” and “lon” might be better as a single “latlon” or “coord” attribute, with the latitude and longitude separated by commas (i.e. <geotag:point coord="41.228063,-115.058119">:</geotag:point>)
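For illustration, here is a small sketch that writes out an entry in the separate-attribute form. The function name `geotag_point` is hypothetical; the attribute names are the ones from the example above, and the escaping helpers come from Python’s standard `xml.sax.saxutils` module.

```python
from xml.sax.saxutils import escape, quoteattr


def geotag_point(lat, lon, description="", **optional):
    """Build a <geotag:point> element as text.

    lat and lon are required; any other attribute (elev, datetime,
    trackid, heading, angle) is written only if it is supplied.
    """
    attrs = {"lat": lat, "lon": lon}
    attrs.update({k: v for k, v in optional.items() if v is not None})
    attr_text = " ".join(
        f"{name}={quoteattr(str(value))}" for name, value in attrs.items()
    )
    return f"<geotag:point {attr_text}>{escape(description)}</geotag:point>"


print(geotag_point("41.228063", "-115.058119", "Metropolis Hotel",
                   elev="1720.901m", heading="340"))
```

One caveat: a namespace-aware XML parser will reject the `geotag:` prefix unless the enclosing document declares it with an `xmlns:geotag` attribute, so a real version of this format would need to settle on a namespace URI.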

A “geostring” point might look something like this instead:

geostring:point:41.228063:-115.058119:1720.901m:20071115T143000-06:1:340:-5.0:geostring

Not sure if the closing “geostring” is really necessary here, but it would make backwards-compatibility easier if fields were added to future revisions. As with the geotag, it might be better to treat the lat/lon pair (the only mandatory information for a minimal “point” definition) as a single field, so the minimal “geotag” example above done as a “geostring” would look something like: geostring:41.228063,-115.058119::::::geostring
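A minimal sketch of reading and writing the combined-lat/lon variant follows. The names `format_geostring` and `parse_geostring` and the field order in `FIELDS` are assumptions taken from the examples above; optional values are kept as raw text.

```python
# Optional fields, in the order used by the examples above.
FIELDS = ("elev", "timestamp", "trackid", "heading", "angle")


def format_geostring(lat, lon, **optional):
    """Write a geostring; missing optional fields become empty slots."""
    tail = [str(optional.get(name, "")) for name in FIELDS]
    return ":".join(["geostring", f"{lat},{lon}", *tail, "geostring"])


def parse_geostring(text):
    """Read a geostring into a dict, skipping empty optional fields."""
    parts = text.strip().split(":")
    if len(parts) < 3 or parts[0] != "geostring" or parts[-1] != "geostring":
        raise ValueError("missing 'geostring' markers")
    body = parts[1:-1]
    lat, lon = body[0].split(",")
    result = {"lat": float(lat), "lon": float(lon)}
    # zip() stops at the known fields, so anything a future revision
    # appends before the closing marker is simply ignored.
    for name, value in zip(FIELDS, body[1:]):
        if value:
            result[name] = value
    return result
```

This is where the closing marker earns its keep: the parser knows where the entry ends without knowing how many fields it has, so an older reader can skip fields added later.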

Even as I write this, I find myself leaning towards combining the latitude and longitude into a single field, if for no other reason than it means each point only has one required field. Either way, I currently think the fields ought to be defined thus:

  • Latitude and longitude are decimal degrees. Either may be prefixed by a + or - (lat: + = “Northern Hemisphere”, - = “Southern Hemisphere”; lon: + = “East”, - = “West”) – if neither is there, + will be assumed. Latitude and longitude are required for every point.
  • Elevation may be suffixed by “m” or “f” (for “meters” or “feet”). If neither is specified, meters are assumed.
  • Timestamp is in the ISO 8601 “basic format”. If neither “Z” nor an offset from UTC is specified, “the viewer’s local time” should be assumed (which is kind of silly, but it still would allow one to synchronize a track with, say, an audio recording or video.)
  • trackid is any arbitrary alphanumeric term with a maximum of, say, 16 characters (is that enough?) Any points with the same trackid are assumed to be part of the same track. If unspecified, the point is assumed to be unrelated to any other points (if any exist) that may be in the same file.
  • Heading is in decimal degrees from 0 to 360. This represents facing a particular (horizontal) direction from the point in question. “Which direction the camera was pointing” in the case of a photograph.
  • Angle is in decimal degrees from -90 to 90. This represents an angle above or below the current elevation at that point (for a picture, this would represent the upward or downward angle that the camera was pointing when the picture was taken.)
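The field rules above could be checked with something like the following sketch. The function names are hypothetical, and one detail is an assumption of mine: the timestamp pattern accepts only the `YYYYMMDDThhmmss` shape used in the examples, with an optional “Z” or a ±hh / ±hhmm offset.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed shape: YYYYMMDDThhmmss, optionally followed by Z, +hh/-hh or +hhmm/-hhmm.
_TIMESTAMP = re.compile(r"^(\d{8}T\d{6})(Z|[+-]\d{2}(?:\d{2})?)?$")


def elevation_in_meters(text):
    """Accept '1720.901m', '1720.901' (meters assumed) or '5646f' (feet)."""
    if text.endswith("f"):
        return float(text[:-1]) * 0.3048
    if text.endswith("m"):
        return float(text[:-1])
    return float(text)


def parse_timestamp(text):
    """Return a datetime; it is naive (no tzinfo) when no zone is given."""
    match = _TIMESTAMP.match(text)
    if match is None:
        raise ValueError(f"not a basic-format timestamp: {text!r}")
    stamp = datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
    zone = match.group(2)
    if zone is None:
        return stamp  # to be read as "the viewer's local time"
    if zone == "Z":
        return stamp.replace(tzinfo=timezone.utc)
    sign = -1 if zone[0] == "-" else 1
    offset = timedelta(hours=int(zone[1:3]), minutes=int(zone[3:5] or 0))
    return stamp.replace(tzinfo=timezone(sign * offset))


def in_range(name, text, low, high):
    """Convert text to float and insist that it lies within low..high."""
    value = float(text)
    if not low <= value <= high:
        raise ValueError(f"{name} {value} is outside {low}..{high}")
    return value


def check_heading(text):
    return in_range("heading", text, 0.0, 360.0)


def check_angle(text):
    return in_range("angle", text, -90.0, 90.0)
```

The offset is parsed by hand because an hours-only offset such as “-06” is valid ISO 8601 but is not something `strptime`’s `%z` directive can be relied on to accept.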

Hmmm, if I shorten “geostring” to “geostr” and either eliminate the “data type” field (“point”) or just reduce it to a single letter, that entire and complete “geostring” example would shrink from 87 characters to about 75, and the minimal latitude-and-longitude form would fit easily even into a single tiny 64-character comment field, if there are any file formats still floating around limited to that kind of small metadata size.

My main goal here is to make it easy to create files tagged with this information. So long as it’s easily read and not likely to get separated from the file it describes, using the data for anything ought to be easy, even if one has to do it “by hand”. As was mentioned on the “Into the Pudding” blog (found via the GeoRSS blog), having applications that can read metadata is useless if nobody’s putting the metadata in their files to begin with. If an acceptable format can be worked out, I intend to start making as much georeferenced information available as possible.

Who’s with me? Comments, suggestions, offers of patronage, anyone?

More on geotagging

Some good comments came up in the last post on georeferencing. I thought a followup post was merited.

The itch I’m trying to scratch here is that I want to be able to georeference just about any kind of data, and I want to be able to embed the georeference information directly in the data file, whether it’s a graphic, or audio, or video, or gene sequence data, or anything else. I want to have a standard form for tagging any of these files. And I don’t want to store the location metadata in a separate file.

What I think I need, then, is a standard, simple way of making geographic notations in a terse, concise format that is both easily parsed by and readily recognizable to a computer, is reasonably human readable, and can be made to fit just about anywhere that arbitrary text is allowed.

Right now, there are only two types of files that have some way of embedding geographic information into them that I know of. The obvious one is that EXIF data in JPEG files can contain “GPS” tags. For hardcore GIS people, GeoTIFF is the other one. Both are for photographs or other still-image data only. What about the rest?

A variation of one of the current geotagging XML formats like the W3C (“<geo:lat>41.4354840</geo:lat><geo:lon>-112.6660845</geo:lon>”) or GeoRSS is an obvious possibility. XML has two potential problems though, as I see it. First, it’s not very terse – the markup substantially increases the amount of space the information takes up. I think in most cases that wouldn’t necessarily be a problem, but I suspect there are a few file formats out there with only comparatively small spaces set aside for a “comment” or “description” field.

The second potential “problem” is something odd that occurred to me today: it’s hard to pronounce out loud. There are some popular audio formats (e.g. “.wav”) that as far as I know have no space whatsoever for arbitrary text…but if my little standard was something that could be distinctly spoken, someone making a recording could literally speak the metadata in a format that a speech-to-text engine (like Sphinx) might be able to recognize and convert to a compatible string of text which could be parsed just like data from anywhere else. This is something of a corner case, I admit, but I think it’s at least worth considering.

Another good point that came up was what you do if your data extends beyond a single point. For example, if I want to georeference an audio recording I might make while narrating what I’m seeing out the window of a speeding train, it makes good sense to at least try to store line segments rather than just a point. That way, if someone wants to find the spot within a several-mile stretch where I suddenly exclaim “Hey, wow, look at that!” they can. The ability to define areas with a polygon or a point-and-radius seems like it would be handy, too, though obviously much more optional.

So, let’s see, I’m looking for a format with minimal markup, but which is easily recognized, is made of plain text which could be crammed into, say, a PNG tEXt chunk, an mp3 comment frame, a Genbank “Source” field, or any other field which allows arbitrary text. I want a form that’s minimally objectionable to anyone else who might be willing to use it. And I think I want it to be able to handle points consisting of at least latitude, longitude, optional elevation, optional timestamp, and possibly even an optional heading and angle, and to handle more than one point per file (for the case of lines). Am I forgetting anything?

Besides “going to bed before 3am”?
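To illustrate what “easily recognized” could mean in practice, here is a sketch that pulls marked-up strings out of an arbitrary blob of bytes without knowing anything about the file format around them. It assumes a pair of “geostr” markers around the data (one possible delimiter, not a settled one) and plain ASCII content; the name `find_geostrs` is made up.

```python
import re
from typing import List

# Shortest run of printable, non-space ASCII between a pair of markers.
_GEOSTR = re.compile(rb"geostr:[\x21-\x7e]*?:geostr")


def find_geostrs(blob: bytes) -> List[str]:
    """Return every 'geostr:...:geostr' entry found in the raw bytes."""
    return [match.group(0).decode("ascii") for match in _GEOSTR.finditer(blob)]


# Works the same on a comment field, a text chunk, or a whole file's bytes.
sample = b"\x00tEXtComment\x00On the road geostr:35.0685,-106.5019:geostr\xff"
print(find_geostrs(sample))
```

Because the entries are found by their markers alone, a file can carry more than one of them, which is all that is needed to store the points of a line.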

I want to geotag something besides photographs!

Cornelia - Queen of the Snow!

For no particular reason, here is a picture of The Dog in her natural habitat. This picture really has nothing to do with today’s blog post, but since this is supposed to be a happy time of year, I suppose a happy picture is in order.

In case anyone is wondering if I’ve forgotten the supposed microbiological emphasis on this blog, the answer is no. In fact, I’ve got a post on amateur yeast culture brewing, but I’m still researching it a bit.

Meanwhile, it seems reasonable to post about geolocation, which after all is an important and useful trick for associating information with its place in The Big Room.

Geolocation of photographs is well established, at least for JPEG images. There are standard ways of tagging a JPEG file with an ICBM address, and I’ve been having a lot of fun doing this with my own pictures. (If you’re bored, you can browse them on Panoramio, and perhaps in a few weeks may stumble on some of them in Google Earth.)

There doesn’t appear to be any standard way of tagging other forms of media files, though. What if I want to geotag an .mp3 or OGG/Vorbis audio file recorded at a particular spot? Or a “DivX/Xvid” or OGG/Theora video?

Irritatingly, it seems as though a few people have mused about it, but nobody seems to have addressed it. There are projects like The Freesound Project, which does geolocate sounds, but the geographic information is not actually embedded into the sound files in any way. As far as I can tell, the location is tracked in their own server’s database only. A Google search turned up a post on the “Random Connections” Blog musing about this, but the only application mentioned is adding georss tags to the RSS for a podcast feed, not to the podcast’s audio file itself. Even the otherwise excellent Mapping Hacks book (written before O’Reilly’s current decline into yet another “Proprietary Product® How-To Guides” publisher over the last couple of years) mentions the topic in Hack #59, but disappointingly that hack appears to have less to do with tagging files than with “interpolating a position from a GPS track, given a timestamp”.

This all comes up because we’re about to go on a roadtrip to check out a part of the country where we seem likely to end up living next year. I’ve been told I’ve got a pretty good voice, so I was considering generating a travelogue series along the way. It appears to be relatively easy to generate a “narrated picture” as a standard mp3 file, the picture being loaded as though it were “album art”. The only aspect of the whole thing that’s missing is geolocation. For now, I’d settle for just being able to easily obtain the ICBM address associated with the file while playing it, so that one could plug the coordinates into Google Maps to see where the recording was done, but ideally I’d like to do it in a way that could be considered standardized, so that later on people might be encouraged to add geolocalization plugins to their media-playing software.

Sure, I can just generate a .kml file with a track of where we were, with markers containing picture and audio links. In fact, I probably will, but I don’t want people to have to use Google Maps or Google Earth to make use of the geolocation information associated with the audio.

Any suggestions, anyone?

I’m having too much fun with this.

I finally managed to get Hugin to work, as you can see from the picture of the Dead Fish Museum above.

Okay, it’s the visitor’s center at the Fossil Butte National Monument, but it really is a museum of dead fish. And other fossils. If you click the image to get to the Panoramio page, you can even see where it is on the map: in fact if you zoom in, the building itself is visible in the aerial photo imagery.

Between digiKam’s ability to handle geocorrelation with tracks from my GPS, Panoramio’s support for geolocation and mapping (and connection to Google Earth…), playing with High Dynamic Range digital photography, and now panoramas, I’m beginning to develop an increased urge to travel around and take pictures again…

Nerd Photography in the Big Room

Readers may have noticed by now that I have a cheap but serviceable digital camera that I’ve been using to take pictures which occasionally show up here on the blog. (Hey, there’s another thing that the External Deliverer, in Its benevolence, might bring me: a nicer digital camera.)

I’ve been playing with geolocation for a while now. Just recently, I started also doing some crude playing with High Dynamic Range digital photography. It’s obviously going to take me some work to get it figured out and get better results, but what I’m getting so far doesn’t look too bad, at least in my own opinion. Kind of surreal, like Mars Rover pictures…

I’ve discovered that my Handy-Dandy Linux box has access to a couple of tools that make these easy.

I noticed a few days ago that digiKam is actually able to read .gpx format files downloaded from my GPS and then correlate the track from the GPS with the timestamps on the photos automatically, so in what little spare time I have I’ve been going back through my archives of GPS tracks and timestamped photos and trying to find as many to correlate as I can. I managed to get geolocation tagged into pictures from as long ago as three years or so. I also tagged this more recent one. I saw this place half a decade ago and had been wondering if it was still there. Last week we finally had a chance to visit and sure enough, it was there. If you were wondering where one could go to learn to do the Squirrel Dance, here it is.
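The correlation step described above can be sketched in a few lines. This is not digiKam’s actual code: it assumes plain linear interpolation between the two GPS fixes on either side of the photo’s timestamp, and the function name `position_at` and the `(time, lat, lon)` tuple layout are made up for the example.

```python
from bisect import bisect_left
from datetime import datetime


def position_at(track, when):
    """Estimate (lat, lon) at time `when`.

    track is a list of (datetime, lat, lon) tuples sorted by time.
    Returns None if `when` falls outside the recorded track.
    """
    if not track or when < track[0][0] or when > track[-1][0]:
        return None
    times = [fix[0] for fix in track]
    i = bisect_left(times, when)          # first fix at or after `when`
    t1, lat1, lon1 = track[i]
    if t1 == when:
        return (lat1, lon1)
    t0, lat0, lon0 = track[i - 1]         # last fix before `when`
    fraction = (when - t0) / (t1 - t0)    # 0.0 at t0, 1.0 at t1
    return (lat0 + fraction * (lat1 - lat0),
            lon0 + fraction * (lon1 - lon0))


track = [
    (datetime(2007, 11, 15, 12, 0, 0), 40.000, -110.000),
    (datetime(2007, 11, 15, 12, 1, 0), 40.002, -110.004),
]
# A photo taken halfway between the two fixes lands halfway between them.
print(position_at(track, datetime(2007, 11, 15, 12, 0, 30)))
```

The one thing this depends on is that the camera’s clock and the GPS agree, since the timestamps are the only link between a photo and the track.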

Landscape and Sign: Don't Trespass on the 'I'

Today after classes I trudged up to the top of the hill at one corner of the campus with my trusty GPS in hand and took a few pictures, as you can tell. Since Google Earth seems to get most of its photos from Panoramio, I’ve started uploading them there. I may also get around to uploading them to flickr one of these days, too. I kind of need some pleasant distraction – I’m starting to hit the “Am I there yet???” phase of the semester. Just another week-and-a-half of classes, then finals, then I’m finally done. At least with the undergraduate stuff.

If you’re bored, there are a couple of additional pictures on the Panoramio site, here. You can also get the ICBM address there, and a .kml file for Google Earth so my pictures will pop up if you happen to run past an area where one of them is while you’re browsing the globe.

’tis the season to be greedy

Members of my immediate family start asking around this time of year about what kinds of things I’d like for Christmas presents this year.

This strikes me as a good way to break the week-long bout of blogstipation I’ve been having. Here, then, is what I want for Christmas, Xmas, Hannukah, Kwanzaa, Cephalopodmas, or whatever gift-giving winter holiday you prefer (each category is sorted roughly in order of desire at the moment):

Ridiculously Expensive Stuff

Which I only list on the off-chance that someone wins the lottery or happens to find an amazing bargain on “e-bay®” or something.

Relatively Expensive Books

Other kinda-expensive-but-maybe-you-can-find-it-at-reasonable-price stuff

Relatively Cheap Stuff (but still spiffy)

I know there was more, but my brain seems to have gone on break right now…

It’s over!

No you can't have $10,000.  Not yours.

I am proud to announce that I am 5th Loser in this 2007 College Blogging Scholarship competition!

Lacking the emotional appeal and/or existing promotional network of the top scorers, I was pretty much up the creek without a plunger. Given the popularity contest format of the competition, I’m actually pretty pleased with how I did. My regular readers (judging by the hits to the RSS feed) have approximately tripled or quadrupled, and I did get a small but useful amount of feedback to help improve things. Oh, and hey, I seem to have readers in Berlin and somewhere in Chile, among other places, so now I can say I’m “world famous™”. Though the proportion of voters who actually did check out all of the blogs was pitifully low, it does still look like it was around 1-2% of the voters, which is actually higher than I would have predicted.

I get the impression that some of us running less well known blogs were a little disappointed about the format of the competition, but there’s really no reason to be. All it means is that rather than being a contest for “highest quality” blog, it was a contest for “most effective” blog. Certainly, being able to get your “vote for me” message out to a larger range of people is a valid measure of effectiveness, so the results seem reasonable to me. And I wasn’t the bottom scorer. Judging by the way my score moved, at least some portion of the people who were examining all of the blogs actually did like what they saw here as I was getting a couple of votes a day on average, so I’m doing something right at least.

The only complaint I really have about the “popularity contest” format is this: I think one of the major benefits to humanity of “blogging” is the fact that unlike mainstream media, a blogger can afford to present unusual, less broadly popular content which otherwise would never be made available. Not having to worry about the internet equivalent of “Nielsen Ratings”, we can afford to put up obscure or strange things that only a fraction of the world might be interested in, which is why if you poke around the internet, you can find something that isn’t the latest celebrity crap or badly-reported political scandal. I actually don’t know how much of a role it played in this particular competition, but this sort of approach in general strikes me as something that would be strongly biased towards “mainstream” content. I think a little more love for all of us off-center folks would be in order.

I also hope they’re offering runner-up prizes again this year. Even if *I* don’t win, at least one of “my people” (nerds, that is – hey, you don’t go for a PhD in Neuroscience without being at least a little bit of a nerd…) would get something again this year if they do.

This does mean, though, that I won’t have $10,000 to buy a microscope with. Woe is me. On the other hand, that means I’ve got no excuse not to try begging in front of scientific conferences. I figure that ought to be worth some entertainment, once I get some time to try it. Perhaps by this time next year, I’ll have a bit more fame and popularity and have a better shot at the prize.

Hey, scienceblogs.com, if you want to promote my blog next year when I’m (hopefully) in graduate school, I may have a shot at the prize next time around… (UPDATE: It may not be obvious, but this should be read as good-natured jealousy, not some kind of complaint or accusation…)

And now that all that’s over, we’ll be returning once again to my usual nerdity. Stay tuned (some more).

Hello, College Blogging Scholarship reviewer and other casual viewers

I see the hits from people examining the finalist blogs (including this one) at the 2007 College Blogging Scholarship are up, presumably since this is the last weekend of voting (insert obligatory “please vote for me” plea here).

I have a favor to ask of you, and everyone else who happens to stumble on this blog one way or another (including my regular readers): Please tell me something about your impression of this blog. Even if all you have time for is a quick one-sentence comment, praise for something you like or thoughtful criticism of something you don’t like, or just something that you thought was noteworthy, it will help me improve the blog. No registration is necessary to comment.

If you have time for a more detailed comment, some opinions as to what else you might be interested in seeing here would be helpful. For example, I’m considering trying to do a regular or semi-regular podcast. Would that be of interest? More pictures? More detailed discussions of scientific matters? Naked pictures of myself? (Okay, almost none of you would really want the latter…)

I’m actually more interested in your opinions than your votes, though if I can have both I would obviously be grateful…

I shall return again to the science nerdity intended for a broad not-necessarily-nerdy audience shortly. Thank you.

#1 on Google!

Over on scienceblogs.com’s The World’s Fair, the author has started an amusing meme.

It goes like this: the challenge is to find 5 sets of search terms for which your own blog or site is the #1 hit on a Google search. Note that it is acceptable to quote specific phrases but of course it’s more impressive if you don’t. Here are 8 for which (as I type this) this blog is the #1 hit (links go to the blog address that is the hit):

There was at least one other which I’m having trouble remembering at the moment. Perhaps I’ll update later if I remember what it was.