Actually this problem is not so hard. The steps to take:
- Open a file.
- Read each line with a while loop.
- Use wcmatch to test the line against a pattern (such as a tag).
- When wcmatch reports a hit:
  - use vl-string-search to find the positions of the delimiters;
  - then use substr to extract the values between these delimiters.
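The steps above can be sketched as a small AutoLISP function. This is a minimal sketch, not a tested production routine: the name read-tagged-lines is hypothetical, and it only collects the matching lines so they can be parsed afterwards.

```lisp
;; Minimal sketch (hypothetical helper): return every line of a text
;; file that matches a wcmatch pattern, e.g. "*<trkpt *", in file order.
(defun read-tagged-lines (fname pattern / f str hits)
  (if (setq f (open fname "r"))          ; open returns nil if the file cannot be opened
    (progn
      (while (setq str (read-line f))    ; read-line returns nil at end of file
        (if (wcmatch str pattern)        ; a wcmatch hit
          (setq hits (cons str hits))    ; keep the line for parsing
        )
      )
      (close f)
    )
  )
  (reverse hits)                         ; cons built the list backwards
)
```

Usage: (read-tagged-lines "C:/temp/track.gpx" "*<trkpt *"), where the path is only an example. Keep in mind that wcmatch treats characters such as , . # * ? and @ specially; escape them with a backquote (`) when they must match literally.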
The following example extracts two values (latitude and longitude) from a comma-separated line and assigns them to lat and lon. The latitude lies between the second and third comma, the longitude between the fourth and fifth:
;; str is a line from the file
(setq posns (vl-string-search "," str (+ 1 (vl-string-search "," str))) ; Position Northing Start
      posne (vl-string-search "," str (+ 1 posns))                        ; Northing End
      poses (vl-string-search "," str (+ 1 posne))                        ; Easting Start
      posee (vl-string-search "," str (+ 1 poses))                        ; Easting end
      lat   (atof (substr str (+ posns 2) (- posne posns 1)))             ; Latitude
      lon   (atof (substr str (+ poses 2) (- posee poses 1)))             ; Longitude
)
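The same position arithmetic can be wrapped in a reusable function that also copes with lines that have too few commas (in the example above, vl-string-search would then return nil and the + call would fail). A hedged sketch: nth-field is a hypothetical name, and it assumes a single-character delimiter.

```lisp
;; Hypothetical helper: return the text between the n-th and (n+1)-th
;; occurrence of a single-character delimiter in str (n = 1 means the
;; first delimiter), or nil when str has too few delimiters.
(defun nth-field (str delim n / pos start end)
  (setq pos -1)
  (repeat n                                    ; locate the n-th delimiter
    (if pos
      (setq pos (vl-string-search delim str (+ pos 1)))
    )
  )
  (if pos
    (setq start pos
          end   (vl-string-search delim str (+ start 1)))  ; next delimiter
  )
  (if (and start end)
    ;; vl-string-search is 0-based, substr is 1-based
    (substr str (+ start 2) (- end start 1))
  )
)
```

With this helper, (atof (nth-field str "," 2)) corresponds to lat in the example above and (atof (nth-field str "," 4)) to lon; check for nil before calling atof, which does not accept nil.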
This works just as well for other CSV files. In the GPX example below, the double quote (") serves as the delimiter instead.
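With the double quote as delimiter, an attribute value such as lat="52.981938" can be cut out of a <trkpt> line in the same way. A minimal sketch, assuming each attribute is preceded by a single space and written with double quotes; xml-attr is a hypothetical name.

```lisp
;; Hypothetical helper: return the value of attribute name (e.g. "lat")
;; in a line such as <trkpt lat="52.981938" lon="5.478888">, using the
;; double quote as delimiter. Returns nil when the attribute is absent.
(defun xml-attr (name str / key start end)
  (setq key (strcat " " name "=\""))            ; e.g. the text  lat="
  (if (setq start (vl-string-search key str))
    (progn
      (setq start (+ start (strlen key)))       ; 0-based index of the value
      (if (setq end (vl-string-search "\"" str start))  ; closing quote
        (substr str (+ start 1) (- end start))  ; substr is 1-based
      )
    )
  )
)
```

The leading space in the search key prevents a match inside a longer attribute name that happens to end in "lat".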
Pretty Printing
This works well for files that have each XML tag on its own line, i.e. pretty-printed files. It becomes a problem when all the data is on a single line, because wcmatch only compares roughly the first 500 characters of a string.
A cat command shows this at the end of a .GPX file from a Garmin device, which holds many megabytes of data on one line of text:
... <trkpt lat="52.981938" lon="5.478888"><ele>-8.0</ele><time>2017-06-09T14:34:51.015Z</time></trkpt><trkpt lat="52.98198933" lon="5.4786485"><ele>-8.0</ele><time>2017-06-09T14:34:52.015Z</time></trkpt><trkpt lat="52.98204033" lon="5.4784125"><ele>-7.0</ele><time>2017-06-09T14:34:53.015Z</time></trkpt><trkpt lat="52.98209017" lon="5.47818"><ele>-7.0</ele><time>2017-06-09T14:34:54.015Z</time></trkpt></trkseg></trk></gpx>
As a consequence, pretty printing is needed before the file is processed.
For Linux users this is easy:
xmllint --format filename
After running xmllint, the output looks like this and is ready to be parsed with CAD LISP (xmllint writes to standard output, so redirect it to a new file to keep the result):
...
      <trkpt lat="52.981938" lon="5.478888">
        <ele>-8.0</ele>
        <time>2017-06-09T14:34:51.015Z</time>
      </trkpt>
      <trkpt lat="52.98198933" lon="5.4786485">
        <ele>-8.0</ele>
        <time>2017-06-09T14:34:52.015Z</time>
      </trkpt>
      <trkpt lat="52.98204033" lon="5.4784125">
        <ele>-7.0</ele>
        <time>2017-06-09T14:34:53.015Z</time>
      </trkpt>
      <trkpt lat="52.98209017" lon="5.47818">
        <ele>-7.0</ele>
        <time>2017-06-09T14:34:54.015Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>
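Once the file is pretty printed, the whole job fits in one loop: a <trkpt> line gives lat and lon, the <ele> line gives the elevation, and </trkpt> closes the point. A minimal sketch, assuming the layout matches the xmllint output above and that every <trkpt> carries both lat and lon (atof fails on nil otherwise); gpx-attr and read-gpx-points are hypothetical names.

```lisp
;; Minimal sketch: collect (lat lon ele) lists from a GPX file that was
;; pretty printed with xmllint, so every <trkpt>, <ele> and </trkpt>
;; tag starts its own line. Function names are hypothetical.

(defun gpx-attr (name str / key start end)
  ;; value of attribute name in str, using " as delimiter, or nil
  (setq key (strcat " " name "=\""))
  (if (setq start (vl-string-search key str))
    (progn
      (setq start (+ start (strlen key)))
      (if (setq end (vl-string-search "\"" str start))
        (substr str (+ start 1) (- end start))
      )
    )
  )
)

(defun read-gpx-points (fname / f str p lat lon ele pts)
  (if (setq f (open fname "r"))
    (progn
      (while (setq str (read-line f))
        (cond
          ;; start of a track point: read the lat and lon attributes
          ((wcmatch str "*<trkpt *")
           (setq lat (atof (gpx-attr "lat" str))
                 lon (atof (gpx-attr "lon" str))
                 ele nil)
          )
          ;; elevation line, e.g. <ele>-8.0</ele>
          ((wcmatch str "*<ele>*</ele>*")
           (setq p   (+ (vl-string-search "<ele>" str) 5)  ; 0-based start of value
                 ele (atof (substr str (+ p 1) (- (vl-string-search "</ele>" str) p))))
          )
          ;; end of the track point: store it
          ((wcmatch str "*</trkpt>*")
           (setq pts (cons (list lat lon ele) pts))
          )
        )
      )
      (close f)
    )
  )
  (reverse pts)                          ; points in file order
)
```

The result is a list of (lat lon ele) lists in file order; ele stays nil for a point without an <ele> line. The time values could be picked up the same way with an extra cond clause.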
xmllint on Windows
There are at least two ways to get xmllint on the command line (and use it for automation):
- Install Cygwin. This probably offers the best Linux command line experience possible. See https://en.wikipedia.org/wiki/Cygwin and https://cygwin.com/
- Use "Bash on Ubuntu on Windows" (part of what is now the Windows Subsystem for Linux); this world keeps surprising us. See for example https://www.howtogeek.com/265900/everything-you-can-do-with-windows-10s-new-bash-shell/
Several GUI solutions exist, for example:
- "XML Copy Editor" does a good job: open a file and press F11, that's all it takes. See http://xml-copy-editor.sourceforge.net/