Writing a METAR decoder in Python: lessons from the spec

METAR is a 1960s aviation format that encodes an airport's current weather observations in a terse coded string. Parsing it in Python is an instructive exercise in dealing with a spec that was written for typewriters.

The format

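A METAR looks like this: EPKK 121630Z 25015KT 9999 FEW030 BKN090 12/04 Q1015. That reads as station EPKK (Kraków), observed on the 12th at 16:30 UTC, wind from 250° at 15 knots, visibility 10 km or more, few clouds at 3,000 ft and broken cloud at 9,000 ft, temperature 12 °C with a dew point of 4 °C, and QNH 1015 hPa. Each space-separated token usually represents one observation group (US reports can split a fractional visibility such as 1 1/2SM across two tokens), and the order is mostly but not entirely fixed. Some groups are optional, some have optional sub-fields, and there are country-specific variations that diverge from the ICAO standard.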
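To make the example concrete, here is a minimal sketch that decodes that one string by hand. It assumes the ICAO conventions the example uses: wind direction in degrees and speed in knots, visibility in metres with 9999 meaning 10 km or more, cloud bases in hundreds of feet, an M prefix for negative temperatures, and QNH in hectopascals. The helpers `decode_group` and `decode_metar` are hypothetical names for this post, and the patterns cover only the groups that appear in the example.

```python
import re


def decode_temperature(field: str) -> int:
    """Decode a METAR temperature; a leading 'M' means minus (M04 is -4 °C)."""
    return -int(field[1:]) if field.startswith("M") else int(field)


def decode_group(token: str) -> dict:
    """Decode one token of the example METAR (covers only the groups it contains)."""
    if m := re.fullmatch(r"(\d{3})(\d{2,3})KT", token):
        return {"wind_dir_deg": int(m[1]), "wind_speed_kt": int(m[2])}
    if m := re.fullmatch(r"(FEW|SCT|BKN|OVC)(\d{3})", token):
        # Cloud base is coded in hundreds of feet.
        return {"cover": m[1], "base_ft": int(m[2]) * 100}
    if m := re.fullmatch(r"(M?\d{2})/(M?\d{2})", token):
        return {"temp_c": decode_temperature(m[1]), "dewpoint_c": decode_temperature(m[2])}
    if m := re.fullmatch(r"Q(\d{4})", token):
        return {"qnh_hpa": int(m[1])}
    if re.fullmatch(r"\d{4}", token):
        # Visibility in metres; 9999 is the code for 10 km or more.
        return {"visibility_m": int(token)}
    return {"raw": token}


def decode_metar(metar: str) -> list[dict]:
    """Station and day/time come first; remaining tokens are decoded one by one."""
    station, obs_time, *groups = metar.split()
    day, hour, minute = int(obs_time[0:2]), int(obs_time[2:4]), int(obs_time[4:6])
    decoded = [{"station": station}, {"day": day, "time_utc": f"{hour:02d}:{minute:02d}"}]
    decoded += [decode_group(g) for g in groups]
    return decoded


if __name__ == "__main__":
    for group in decode_metar("EPKK 121630Z 25015KT 9999 FEW030 BKN090 12/04 Q1015"):
        print(group)
```

Run on the example, this yields one dict per group, such as {'cover': 'BKN', 'base_ft': 9000} for BKN090. Any token outside these few patterns comes back as {'raw': ...}, which is where the rest of the spec starts to matter.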

The parsing strategy

The usual implementation is regex-per-group: each group type has a pattern, and you try the patterns in a defined order, consume the match, and move on. This works for well-formed METARs but fails silently on malformed input, because nothing records which tokens went unmatched.
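A sketch of the regex-per-group approach, using a simplified, hypothetical subset of group patterns. Each token is tried against the patterns from the current group onward; the group index never moves backwards, and a token that matches nothing is skipped. `GROUPS` and `parse_ordered` are illustrative names, not taken from any library.

```python
import re

# Simplified, hypothetical group patterns, listed in the order the groups
# are expected to appear.
GROUPS = [
    ("station", re.compile(r"[A-Z]{4}")),
    ("time", re.compile(r"\d{6}Z")),
    ("wind", re.compile(r"(\d{3}|VRB)\d{2,3}(G\d{2,3})?(KT|MPS)")),
    ("visibility", re.compile(r"\d{4}")),
    ("cloud", re.compile(r"(FEW|SCT|BKN|OVC)\d{3}")),
    ("temperature", re.compile(r"M?\d{2}/M?\d{2}")),
    ("pressure", re.compile(r"Q\d{4}")),
]


def parse_ordered(metar: str) -> dict[str, list[str]]:
    result: dict[str, list[str]] = {}
    group_index = 0  # never moves backwards: groups must appear in order
    for token in metar.split():
        for i in range(group_index, len(GROUPS)):
            name, pattern = GROUPS[i]
            if pattern.fullmatch(token):
                result.setdefault(name, []).append(token)
                # Stay on this group so repeatable ones (cloud layers) can match again.
                group_index = i
                break
        # A token that matches no remaining pattern falls through here and is
        # dropped; nothing records that part of the report was lost.
    return result
```

On the example METAR, parse_ordered(...)["cloud"] returns ['FEW030', 'BKN090']. Replace BKN090 with BKN09O (a letter O typed for a zero) and it returns only ['FEW030']: the second cloud layer disappears and no error is raised.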

StormVeil takes a slightly different approach: parser combinators that track position in the token stream explicitly, backtrack on failure, and record which groups could not be parsed. As a result, a METAR containing a non-standard element produces a partial decode rather than nothing.
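StormVeil's own code is not shown here, so what follows is a generic sketch of the same idea, not its actual implementation. Each parser is a function from (tokens, position) to (value, new position) or None. Because parsers never mutate shared state, backtracking is just retrying from the position you started at. The driver records every token that no remaining group accepts. The names `token`, `seq`, `alt`, `many`, `GRAMMAR` and `parse_metar` are hypothetical, and the grammar is a small subset of the real one.

```python
import re
from typing import Callable, Optional

Tokens = list[str]
# A parser takes (tokens, position) and returns (value, new_position) or None.
# Parsers never mutate shared state, so backtracking is simply calling the
# next alternative with the position you started from.
Parser = Callable[[Tokens, int], Optional[tuple[object, int]]]


def token(name: str, pattern: str) -> Parser:
    """Match exactly one token against a regex."""
    regex = re.compile(pattern)

    def parse(tokens: Tokens, pos: int):
        if pos < len(tokens) and regex.fullmatch(tokens[pos]):
            return {name: tokens[pos]}, pos + 1
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """All parsers in order; fails as a whole if any one fails."""
    def parse(tokens: Tokens, pos: int):
        values = []
        for p in parsers:
            result = p(tokens, pos)
            if result is None:
                return None  # the caller still holds the original position
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def alt(*parsers: Parser) -> Parser:
    """First alternative that succeeds, each tried from the same position."""
    def parse(tokens: Tokens, pos: int):
        for p in parsers:
            result = p(tokens, pos)
            if result is not None:
                return result
        return None
    return parse


def many(p: Parser) -> Parser:
    """Zero or more repetitions (p must consume at least one token)."""
    def parse(tokens: Tokens, pos: int):
        values = []
        while (result := p(tokens, pos)) is not None:
            value, pos = result
            values.append(value)
        return values, pos
    return parse


# Simplified, hypothetical grammar, in the order groups normally appear.
GRAMMAR = [
    token("station", r"[A-Z]{4}"),
    token("time", r"\d{6}Z"),
    token("wind", r"(\d{3}|VRB)\d{2,3}(G\d{2,3})?(KT|MPS)"),
    alt(
        # US fractional visibility such as "1 1/2SM" spans two tokens.
        seq(token("vis_whole", r"\d"), token("vis_fraction", r"\d/\dSM")),
        token("visibility", r"\d{4}|\d{1,2}SM|\d/\dSM|CAVOK"),
    ),
    many(token("cloud", r"(FEW|SCT|BKN|OVC)\d{3}")),
    token("temperature", r"M?\d{2}/M?\d{2}"),
    token("pressure", r"[QA]\d{4}"),  # Q: hectopascals, A: US inches of mercury
]


def parse_metar(metar: str, grammar: list[Parser] = GRAMMAR):
    """Decode what the grammar accepts and record every token it cannot place."""
    tokens = metar.split()
    pos, next_element = 0, 0
    decoded, unparsed = [], []
    while pos < len(tokens):
        for i in range(next_element, len(grammar)):
            result = grammar[i](tokens, pos)
            if result is not None and result[1] > pos:  # must consume something
                value, pos = result
                decoded.append(value)
                next_element = i + 1
                break
        else:
            unparsed.append(tokens[pos])  # skip it, but remember it
            pos += 1
    return decoded, unparsed
```

For "KJFK 121651Z 31008KT 1 1/2SM BR OVC005 08/07 A2990", this sketch grammar has no weather group, so BR ends up in unparsed while everything around it still decodes. The visibility alternative is where backtracking shows up: if the token after a lone digit is not a fraction, seq fails and alt retries the single-token forms from the same position.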

The duct tape

Runway visual range (RVR) groups have at least four different format variants in the wild. Wind shear groups appear in some countries' METARs but not others. Trend groups at the end of some METARs (introduced by BECMG or TEMPO) carry their own wind, visibility, weather and cloud groups, so they require a recursive mini-parse. This is the kind of thing that makes parsing interesting, and writing parsers a reliable way to become opinionated about format design.
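A sketch of the trend mini-parse, assuming trends start at BECMG, TEMPO or NOSIG and run until the next indicator, and that anything after RMK is kept as raw remarks. The same group parser is called recursively on each trend's tokens. The group table is a simplified, hypothetical subset; its single RVR pattern covers a few common spellings (R24/P2000N, R06/1000V2000FT, R24L/0600D), and anything else falls into unparsed. All function names are illustrative.

```python
import re

# Trend indicators that start a nested block of groups.
TREND_INDICATORS = {"BECMG", "TEMPO", "NOSIG"}

# Simplified, hypothetical group table shared by the main body and every trend.
GROUP_PATTERNS = [
    ("wind", re.compile(r"(\d{3}|VRB)\d{2,3}(G\d{2,3})?(KT|MPS)")),
    ("visibility", re.compile(r"\d{4}|CAVOK")),
    # One pattern for several RVR spellings: optional L/C/R runway suffix,
    # P/M (above/below) prefixes, a V range, FT units and a U/D/N tendency.
    ("rvr", re.compile(r"R\d{2}[LCR]?/[PM]?\d{4}(V[PM]?\d{4})?(FT)?[UDN]?")),
    ("weather", re.compile(r"[-+]?(VC)?(SH|TS|FZ)?(RA|SN|DZ|BR|FG)|NSW")),
    ("cloud", re.compile(r"(FEW|SCT|BKN|OVC)\d{3}(CB|TCU)?|NSC")),
    ("temperature", re.compile(r"M?\d{2}/M?\d{2}")),
    ("pressure", re.compile(r"Q\d{4}")),
    ("time_qualifier", re.compile(r"(FM|TL|AT)\d{4}")),
]


def classify(token: str):
    """Return the name of the first group pattern that matches, or None."""
    for name, pattern in GROUP_PATTERNS:
        if pattern.fullmatch(token):
            return name
    return None


def parse_groups(tokens: list[str]) -> dict:
    """Decode a run of groups; a trend indicator triggers a nested parse."""
    groups, trends, unparsed = [], [], []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in TREND_INDICATORS:
            # The trend's own groups run until the next indicator or the end.
            j = i + 1
            while j < len(tokens) and tokens[j] not in TREND_INDICATORS:
                j += 1
            # Recursive mini-parse: the same parser decodes the trend body.
            trends.append({"indicator": tok, **parse_groups(tokens[i + 1:j])})
            i = j
            continue
        name = classify(tok)
        if name is None:
            unparsed.append(tok)
        else:
            groups.append((name, tok))
        i += 1
    return {"groups": groups, "trends": trends, "unparsed": unparsed}


def parse_metar(metar: str) -> dict:
    """Split off remarks, take station and time, then parse the rest."""
    main, _, remarks = metar.partition(" RMK ")
    station, obs_time, *rest = main.split()
    result = parse_groups(rest)
    result.update(station=station, time=obs_time, remarks=remarks)
    return result
```

In this sketch a trend body cannot contain another trend, so the recursion is only one level deep. The point of the design is that trend bodies are decoded by the same code as the main body, not by a second copy that drifts out of sync.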