Parsing XML and CSV with CAD LISP

Parsing these files is not hard. The steps are:

  • Open a file.
  • Read each line with a while statement.
  • Use wcmatch to match a pattern (such as a tag name).
  • When a wcmatch hit occurs:
    • Use vl-string-search to find delimiter positions.
    • Next, use substr to get the values between these delimiters.
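The steps above can be sketched as a small reading loop. This is a minimal sketch, not a definitive implementation: the function name scan-file and its arguments are hypothetical, and it only collects the matching lines so that they can be taken apart afterwards.

```lisp
(vl-load-com) ; makes the vl-* functions available; harmless if already loaded

;; scan-file is a hypothetical helper name.
;; Returns a list of all lines in filename that match the wcmatch pattern,
;; or nil if the file cannot be opened or nothing matches.
(defun scan-file (filename pattern / f line hits)
  (if (setq f (open filename "r"))
    (progn
      ;; read-line returns nil at end of file, which ends the loop
      (while (setq line (read-line f))
        (if (wcmatch line pattern)
          (setq hits (cons line hits))
        )
      )
      (close f)
      (reverse hits) ; restore file order
    )
  )
)
```

A call could look like (scan-file "C:/data/track.gpx" "*<trkpt *"), where the path is only an illustration.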

The following example parses a line in which two values, latitude and longitude, sit between four commas (the second to the fifth comma on the line), and assigns them to lat and lon:

;; str is a line from the file
(setq
   posns (vl-string-search "," str (+ 1 (vl-string-search "," str))) ; Position Northing Start
   posne (vl-string-search "," str (+ 1 posns)) ; Northing End
   poses (vl-string-search "," str (+ 1 posne)) ; Easting Start
   posee (vl-string-search "," str (+ 1 poses)) ; Easting end
   lat (atof (substr str (+ posns 2) (- posne posns 1))) ; Latitude
   lon (atof (substr str (+ poses 2) (- posee poses 1))) ; Longitude
)
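The same combination of vl-string-search and substr can be wrapped in a helper that splits a whole line into fields. The sketch below is an assumption about how one might do this, not part of the original example; the name split-string is hypothetical, and quoted CSV fields that contain the delimiter themselves are not handled.

```lisp
(vl-load-com)

;; split-string is a hypothetical helper name.
;; Splits str at every occurrence of delim and returns a list of strings.
(defun split-string (str delim / pos result)
  ;; vl-string-search returns a 0-based position, or nil when delim is absent
  (while (setq pos (vl-string-search delim str))
    (setq result (cons (substr str 1 pos) result)       ; text before delim
          str    (substr str (+ pos 1 (strlen delim)))) ; text after delim
  )
  (reverse (cons str result)) ; the remainder is the last field
)
```

For example, (split-string "a,b,c" ",") returns ("a" "b" "c"). A numeric field can then be picked with nth and converted with atof.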

This works fine for CSV files as well. The example below uses " as delimiter...
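As a sketch of the quote-delimited case, the function below takes the lat and lon attributes from a trkpt line like those in the GPX excerpt further down. The name parse-trkpt is hypothetical, and the code assumes that lat precedes lon and that both attributes are on the line, so all four quotes are present.

```lisp
(vl-load-com)

;; parse-trkpt is a hypothetical helper name.
;; Returns (lat lon) as reals for a line containing
;; <trkpt lat="..." lon="...">, or nil for any other line.
;; Assumes lat comes before lon and all four quotes exist.
(defun parse-trkpt (str / q1 q2 q3 q4)
  (if (wcmatch str "*<trkpt *")
    (progn
      (setq q1 (vl-string-search "\"" str)           ; opening quote of lat
            q2 (vl-string-search "\"" str (+ 1 q1))  ; closing quote of lat
            q3 (vl-string-search "\"" str (+ 1 q2))  ; opening quote of lon
            q4 (vl-string-search "\"" str (+ 1 q3))) ; closing quote of lon
      (list (atof (substr str (+ q1 2) (- q2 q1 1)))
            (atof (substr str (+ q3 2) (- q4 q3 1))))
    )
  )
)
```

Calling (parse-trkpt "<trkpt lat=\"52.981938\" lon=\"5.478888\">") returns a list of two reals, the latitude followed by the longitude.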

Pretty Printing

This works fine for files whose XML tags are on separate lines, i.e. pretty-printed files. It becomes a problem when all data is on a single line, because wcmatch only compares roughly the first 500 characters of a string.
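Whether a file is affected can be checked before parsing. The sketch below reports whether any line is longer than 500 characters; the name has-long-lines-p and the threshold variable are hypothetical, and how read-line copes with a line of many megabytes is not verified here.

```lisp
;; has-long-lines-p is a hypothetical helper name.
;; Returns T if any line in filename is longer than limit characters,
;; nil otherwise (also nil if the file cannot be opened).
(defun has-long-lines-p (filename / f line found limit)
  (setq limit 500) ; approximate wcmatch comparison limit
  (if (setq f (open filename "r"))
    (progn
      ;; stop at the first long line or at end of file
      (while (and (not found) (setq line (read-line f)))
        (if (> (strlen line) limit)
          (setq found T)
        )
      )
      (close f)
    )
  )
  found
)
```

If the function returns T, the file is a candidate for pretty printing before it is parsed.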

A cat command shows this at the end of a .GPX file from a Garmin device, with many megabytes on one line of text:

...
<trkpt lat="52.981938" lon="5.478888"><ele>-8.0</ele><time>2017-06-09T14:34:51.015Z</time></trkpt><trkpt lat="52.98198933" lon="5.4786485"><ele>-8.0</ele><time>2017-06-09T14:34:52.015Z</time></trkpt><trkpt lat="52.98204033" lon="5.4784125"><ele>-7.0</ele><time>2017-06-09T14:34:53.015Z</time></trkpt><trkpt lat="52.98209017" lon="5.47818"><ele>-7.0</ele><time>2017-06-09T14:34:54.015Z</time></trkpt></trkseg></trk></gpx>

As a consequence, pretty printing is needed before the file is processed.

For Linux users this is easy:

xmllint --format filename

xmllint writes the formatted result to standard output, so redirect it to a file if needed. The output looks as follows and is ready to be parsed with CAD LISP:

...
      <trkpt lat="52.981938" lon="5.478888">
        <ele>-8.0</ele>
        <time>2017-06-09T14:34:51.015Z</time>
      </trkpt>
      <trkpt lat="52.98198933" lon="5.4786485">
        <ele>-8.0</ele>
        <time>2017-06-09T14:34:52.015Z</time>
      </trkpt>
      <trkpt lat="52.98204033" lon="5.4784125">
        <ele>-7.0</ele>
        <time>2017-06-09T14:34:53.015Z</time>
      </trkpt>
      <trkpt lat="52.98209017" lon="5.47818">
        <ele>-7.0</ele>
        <time>2017-06-09T14:34:54.015Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>

xmllint on Windows

There are at least two ways for the command line (and automation):

Several GUI solutions exist:
