I’ve been working on my Rudimentary-CSV reader over the past couple of weeks, and I’ve made some progress:
- First, I’ve decided to make a prototype using lists as the main data structure instead of vectors.
- Second, I’ve mostly rewritten how it parses the text. It can now properly tell the difference between a string and a number, even when the string looks mostly like a number. It can also read strings that aren’t quoted properly.
- Last, the separator can be anything you want it to be. You can now tell the parser which character separates fields in the spreadsheet. It can be #\, or #\; or #\g for all it cares.
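To make the configurable separator a bit more concrete, here is a minimal sketch of a field splitter. It is written in Python rather than Lisp and is not Rudimentary-CSV’s actual code; the name split_fields and its quoting rules are my own assumptions: a quote only opens a quoted field at the very start of a cell, a doubled quote inside a quoted field stands for one literal quote, and a stray quote anywhere else is kept as plain text, which is one way to tolerate strings that aren’t quoted properly.

```python
from typing import List


def split_fields(line: str, separator: str = ",") -> List[str]:
    """Split one line of CSV text into raw field strings (hypothetical helper)."""
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    current.append('"')  # "" inside quotes is one literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                current.append(ch)  # separators inside quotes are plain text
        elif ch == separator:
            fields.append("".join(current))  # end of field (may be empty)
            current = []
        elif ch == '"' and not current:
            in_quotes = True  # a quote only opens a field at its very start
        else:
            current.append(ch)  # includes stray quotes in badly quoted text
        i += 1
    fields.append("".join(current))  # last field, even if the line ends in a separator
    return fields


print(split_fields('a;"b;c";d', ";"))  # ['a', 'b;c', 'd']
print(split_fields('ab"c,1g2'))        # ['ab"c', '1g2']
```

Because the separator is just a parameter compared character by character, passing "g" splits "1g2g3" into three fields exactly the same way a comma would.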
The only bug that I can think of right now affects our European friends who use a comma as a decimal point: the parser will read your number 1000,01 as a string, whereas it reads 1000.01 as a number. This should be easy to fix based on the user’s choice of separator… but I probably won’t find out until we go on vacation and I get some time to hack on it again.
Anyway, I haven’t posted this newer version to GitHub yet. Before I upload this list-based version, I’m going to fix the above bug and add a way to save an in-memory CSV file to disk. Then, once I think I’ve squashed most of the bugs, I’m going to convert it back to using vectors and see what happens.
(Update: May 5th) The ‘decimal place’ bug has now been fixed. When calling the function PARSE-FILE, you can now pass two different keys:
- SEPARATOR <character> – The entered character will be interpreted as the separator between two cells.
- DECIMAL-POINT <character> – The entered character will be interpreted as the decimal point. So if you enter #\, it should correctly parse “1234,56”.
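Here is a hedged sketch of how a DECIMAL-POINT setting could feed into number detection, again in Python rather than Lisp, using a hypothetical parse_cell function. The key assumption is that a cell only counts as a number if the whole cell matches: an optional sign, then digits with an optional fractional part introduced by the chosen decimal point. Anything else stays a string.

```python
import re


def parse_cell(text: str, decimal_point: str = "."):
    """Return an int or float if the whole cell is a number, else the original text."""
    point = re.escape(decimal_point)
    # Optional sign, then digits with an optional fraction, or a bare fraction.
    number = rf"[+-]?(\d+({point}\d*)?|{point}\d+)"
    candidate = text.strip()
    if re.fullmatch(number, candidate) is None:
        return text  # e.g. "10565-71", or "1000,01" when decimal_point is "."
    if decimal_point in candidate:
        # float() only understands ".", so normalise the decimal point first.
        return float(candidate.replace(decimal_point, "."))
    return int(candidate)


print(parse_cell("1000.01"))                     # 1000.01
print(parse_cell("1000,01"))                     # 1000,01 (still a string)
print(parse_cell("1234,56", decimal_point=","))  # 1234.56
print(parse_cell("10565-71"))                    # 10565-71 (still a string)
```

Matching the whole cell (re.fullmatch) rather than looking for a number somewhere inside it is what keeps values that merely start with digits from being misread. If the separator and the decimal point are the same character, an unquoted 1,5 becomes ambiguous, so a parser probably ought to reject that combination or require such numbers to be quoted.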
This fix also has a couple of nice side effects. First, it’s made the code more generic. Second, it’s eliminated a function whose presence was shaky at best. Last, I think it makes Rudimentary-CSV compliant with the ‘CSV standard’ that I mentioned in a previous post.
(Update: May 7th) I’ve been working on a few functions that will let this script convert a parsed CSV file back to a string and save it to disk. While doing this, I found a new bug that caused strings that “almost look like numbers” (e.g. “10565-71”) to parse incorrectly and come out as symbols. I’ve now fixed this bug, so I’m hoping that in the next day or two I’ll have a working save feature and the list-based script will basically be done.
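Saving is essentially parsing in reverse. The following minimal Python sketch uses hypothetical names (format_cell, row_to_line, save_csv) and assumes RFC 4180-style output: a cell is wrapped in double quotes, with inner quotes doubled, whenever it contains the separator, a quote, or a line break; floats are written with the chosen decimal point; and lines end in CRLF.

```python
def format_cell(value, separator: str = ",", decimal_point: str = ".") -> str:
    """Turn one parsed cell back into CSV text (hypothetical helper)."""
    if isinstance(value, float):
        text = str(value).replace(".", decimal_point)  # 1234.56 -> "1234,56" if asked
    else:
        text = str(value)
    # Quote the cell if it would otherwise be split or misread when parsed again.
    if any(c in text for c in (separator, '"', "\n", "\r")):
        text = '"' + text.replace('"', '""') + '"'
    return text


def row_to_line(row, separator: str = ",", decimal_point: str = ".") -> str:
    return separator.join(format_cell(v, separator, decimal_point) for v in row)


def save_csv(rows, path: str, separator: str = ",", decimal_point: str = ".") -> None:
    # newline="" stops Python from translating the CRLF line endings we write.
    with open(path, "w", encoding="utf-8", newline="") as f:
        for row in rows:
            f.write(row_to_line(row, separator, decimal_point) + "\r\n")


print(row_to_line(["", "a;b", 1234.56, 7], separator=";", decimal_point=","))
# ;"a;b";1234,56;7
```

One caveat with this sketch: a string cell such as "42" is written without quotes and would come back as a number, so a fuller version may need to quote number-like strings and have the parser treat quoted cells as strings.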
(Update: May 14th) I think I’ve got the list-based version of Rudimentary-CSV to a point where I can release it… The problem is that I’m on vacation at a place that doesn’t provide wifi to all of its ‘caravans.’ This means that I’ll either have to wait until I get home, or find a place with free wifi and persuade the rest of the family to go there so that I can get this done.
While I was slowly working on getting the script updated, I ran into a few bugs. The first was still causing some numbers to be read as strings; I fixed it by reworking the code that decides whether something looks like a string or a number. Once that was fixed, I noticed that rows whose first cell was empty were getting an extra column when converted back to text. The cause turned out to be an incompatibility between the code that identifies and handles blank cells and the code that deals with whatever comes next. The last problem is one that I haven’t quite fixed yet: for some reason, the parser isn’t handling some quoted strings correctly. I’ll be giving that my attention next… hopefully before I upload the latest and greatest.
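The phantom extra column and the mishandled quoted strings are both the kind of regression a round-trip test catches: write rows to text, read them back, and compare. In this Python sketch the standard library’s csv module stands in for the writer and parser under test; the round_trip helper is hypothetical, and the idea is to swap your own save and parse functions into it.

```python
import csv
import io


def round_trip(rows, separator=","):
    """Write rows to CSV text, then parse that text back into rows."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, delimiter=separator).writerows(rows)  # stand-in for "save"
    text = buffer.getvalue()
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=separator)  # stand-in for "parse"
    return [row for row in reader]


rows = [
    ["", "b", "c"],             # empty first cell: must still have 3 columns
    ['say "hi"', "x;y", "z"],  # embedded quotes and an embedded separator
]
assert round_trip(rows, separator=";") == rows
print("round trip OK")
```

A test table like this grows naturally: every time a bug like the blank-first-cell one turns up, its row gets added, so it can’t quietly come back.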