rustkas

rustkas

Property-Based Testing with PropEr, Erlang, and Elixir: implementation without map module restriction is better choise for CSV parser (page 148)

CSV parsing

Dear author, @ferd, I am solely out of positive motives and a desire to improve the books, I would like to suggest that you think about update the section without the help of the maps module functionality in future editions of your very informative and useful book. As shown by the working tests and the implementation that I did without using this module, but using an older ones (lists, proplists) (and as the last test convincingly showed) - the maps module is not the best and not very visual solution for this task, moreover, it has limitations in which there is no need.

%% @doc this counterexample is taken literally from the RFC and cannot
%% work with the current implementation because maps have no dupe keys
dupe_keys_unsupported_test() ->
    CSV = "field_name,field_name,field_name\r\n"
          "aaa,bbb,ccc\r\n"
          "zzz,yyy,xxx\r\n",
    [Map1, Map2] = bday_csv:decode(CSV),
    %?debugFmt("Map1 = ~p~nMap2 = ~p~n", [Map1, Map2]),
    %?debugFmt("Map2 = ~p~n",[Map2]),
    ?assertEqual(1, length(maps:keys(Map1))),
    ?assertEqual(1, length(maps:keys(Map2))),
    ?assertMatch(#{"field_name" := _}, Map1),
    ?assertMatch(#{"field_name" := _}, Map2).

See what we can get by simplifying our CSV parser implementation:

%% @doc this counterexample is taken literally from the RFC
dupe_keys_unsupported_test() ->
    CSV = "field_name,field_name,field_name\r\n"
          "aaa,bbb,ccc\r\n"
          "zzz,yyy,xxx\r\n",
    Result = bday_csv_tuple:decode(CSV),
    List = lists:flatten(Result),
    ?assertEqual(6, length(List)),
    lists:foreach(fun(Elem) -> ?assertMatch({"field_name", _}, Elem) end, List).

Link to source code

Marked As Solved

ferd

ferd

Author of Property-Based Testing with PropEr, LYSE, & Erlang in Anger

That would make the implementation and testing shorter, but do note that the chapter has chosen to use maps as a datastructure for its ease of use to the callers.

That there is a mismatch between the chosen disk format and the useful code format is one of the interesting things that come up and we have to adjust to: either change the spec, or tweak the tests. You are suggesting the former, the book went for the latter.

There is a last gotcha implicit to the implementation of our CSV parser: since it uses maps, duplicate column names are not tolerated. Since our CSV files have to be used to represent a database, it is probably a fine assumption to make about the data set that column names are all unique. All in all, we’re probably good ignoring duplicate columns and single-columns CSV files since it’s unlikely database tables would be that way either, but it’s not fully CSV compliant.

If your CSV parser now supports multiple duplicate columns, there is now a concern that the code that uses the returned lists is able to deal with the edge case of multiple keys being returned, or that a conversion step that removes (or errors on) duplicates is added and also tested. I tend to like narrowing all of this at the edge of the system (when converting from CSV to what is now safe internally).

Your approach is fine and simplifies the CSV testing (your snippets are cleaner), but you should still expect to add specific testing elsewhere in the application that tackles that mismatch between what CSV supports and what the records represented by a database would support somewhere.

Where Next?

Popular Pragmatic Bookshelf topics Top

jeffmcompsci
Title: Design and Build Great Web APIs - typo “https://company-atk.herokuapp.com/2258ie4t68jv” (page 19, third bullet in URL list) Typo:...
New
GilWright
Working through the steps (checking that the Info,plist matches exactly), run the demo game and what appears is grey but does not fill th...
New
lirux
Hi Jamis, I think there’s an issue with a test on chapter 6. I own the ebook, version P1.0 Feb. 2019. This test doesn’t pass for me: ...
New
curtosis
Running mix deps.get in the sensor_hub directory fails with the following error: ** (Mix) No SSH public keys found in ~/.ssh. An ssh aut...
New
brunogirin
When I run the coverage example to report on missing lines, I get: pytest --cov=cards --report=term-missing ch7 ERROR: usage: pytest [op...
New
brunogirin
When installing Cards as an editable package, I get the following error: ERROR: File “setup.py” not found. Directory cannot be installe...
New
AufHe
I’m a newbie to Rails 7 and have hit an issue with the bin/Dev script mentioned on pages 112-113. Iteration A1 - Seeing the list of prod...
New
a.zampa
@mfazio23 I’m following the indications of the book and arriver ad chapter 10, but the app cannot be compiled due to an error in the Bas...
New
ggerico
I got this error when executing the plot files on macOS Ventura 13.0.1 with Python 3.10.8 and matplotlib 3.6.1: programming_ML/code/03_...
New
dachristenson
@mfazio23 Android Studio will not accept anything I do when trying to use the Transformations class, as described on pp. 140-141. Googl...
New

Other popular topics Top

Exadra37
I am thinking in building or buy a desktop computer for programing, both professionally and on my free time, and my choice of OS is Linux...
New
Exadra37
Please tell us what is your preferred monitor setup for programming(not gaming) and why you have chosen it. Does your monitor have eye p...
New
siddhant3030
I’m thinking of buying a monitor that I can rotate to use as a vertical monitor? Also, I want to know if someone is using it for program...
New
DevotionGeo
I know that -t flag is used along with -i flag for getting an interactive shell. But I cannot digest what the man page for docker run com...
New
AstonJ
You might be thinking we should just ask who’s not using VSCode :joy: however there are some new additions in the space that might give V...
New
New
PragmaticBookshelf
Author Spotlight Mike Riley @mriley This month, we turn the spotlight on Mike Riley, author of Portable Python Projects. Mike’s book ...
New
First poster: bot
zig/http.zig at 7cf2cbb33ef34c1d211135f56d30fe23b6cacd42 · ziglang/zig. General-purpose programming language and toolchain for maintaini...
New
AstonJ
Curious what kind of results others are getting, I think actually prefer the 7B model to the 32B model, not only is it faster but the qua...
New
Fl4m3Ph03n1x
Background Lately I am in a quest to find a good quality TTS ai generation tool to run locally in order to create audio for some videos I...
New

Sub Categories: