bool unbool and something to do with tests, coverage, listening

Zm4SGE

Hi hello I guess this is about a kind of tidying and refactoring that I often attempt or see attempted, and about how those attempts don't always result states of affairs that I consider extremely satisfying, and then about what has worked best for me here. It's also about stuff like unit/micro testing and test coverage. There's something about: As much as I dislike the way people do meaningless test coverage measurements and then act on the "information" they get in bizarre ways, I also think that these ideas are sometimes useful ideas that we can use when thinking about code. And I kind of want to have an example around where dealing with test coverage stuff is not a "naturally additive" activity that just consists of throwing more tests at stuff.

Also about booleans.

This is a made up example but all the steps are based on actual things I have seen happen multiple times.

The IDs of things

Let's say we're doing Enterprise Software Engineering and we're making a system for things. And let's say that the things have unique identifiers of some kind and that those identifiers have to look a certain way:

traced()

function isValidThingID(str)
  if not str:match("^%u%u[-]%d+$") then
    return false
  end
  if #str < 5 then
    return false
  end
  if #str > 10 then
    return false
  end
  return true
end

The code is a little different from how I'd probably actually implement a function like that. I wanna write Lua, but I wanna make it feel a little "enterprisey" and I also wanna make it play nice with the code coverage thing I hacked together that only deals with line numbers, so it's a little something. (The call to traced makes the code coverage thing care about that chunk of source code btw.) Anyway:

local function valid(str)
  return function() assertEqual(true, isValidThingID(str)) end
end
local function invalid(str)
  return function() assertEqual(false, isValidThingID(str)) end
end

runtests(
  valid("AS-123456"),
  invalid("as-123456"),
  invalid("AS-abcde"),
  invalid("AS-1"),
  invalid("AS-12345678910"))

This is sometimes what it's like in Enterprise Software Engineering. I've seen better and I've seen worse. Sometimes the check to see if a string is a valid ID is implemented inline here and there where it's needed, and then maybe some of those implemetations disagree with each other. Stuff like that happens. Here it's extracted into a function so that we have that knowledge in one place and we also have some tests for that function. It's not all bad or anything.

What would and would not be improvements here is kind of hard to say. It's a made up example without enough made up context. But if we think of the kind of Enterprise Software Engineering code that I have in mind, there is something that we might possibly want to make changes to. There are different ways to frame it, but one way is to say that there's a bit of boolean blindness going on:

A true or false value doesn't "contain" much information and callers of the isValidThingID function will kind of just have to know what those values "mean." For example, having microtests for isValidThingID will not catch some other code using that function in an if-statement and getting the then and else branches switched around.
Also that returned boolean will typically be acted upon and then discarded. If you're writing some code that deals with an ID you often can't know some code has previously called isValidThingID on the string in question and it is "validated" or if you should call the function and validate it yourself.

A thing ID kind of thing

A fairly enterprise-compliant thing we can do is to make a data structure for this kind of ID. We can do some smart constructor kind of stuff so that when we have a value of that kind we know that the string it contains has been through validation and is a proper thing ID.

Maybe we typically had code near the calls to isValidThingID to deal with case sensitivity and whitespace stuff in order to more leniently accept input. Since our constructor is constructing the value that is going to be used, and not just checking some value and answering yes/no, we can bake this stuff into the constructor. That's not always a good idea or anything, but like just for the sake of the example let's do it:

traced()

ThingID_ = {}
function ThingID_.new(str)
  str = trim(str):upper()
  str = str:match("^%u%u[-]%d+$")
  if not str then
    error("bad format")
  end
  if #str < 5 then
    error("few characters")
  end
  if #str > 10 then
    error("many characters")
  end
  return {
    tag = ThingID_,
    identifier = str
  }
end

local function valid(expected, str)
  return function() assertEqual(expected, ThingID_.new(str).identifier) end
end
local function invalid(str)
  return function() assertError(function() ThingID_.new(str) end) end
end

runtests(
  valid("AS-123456", "AS-123456"),
  valid("AS-123456", "as-123456"),
  valid("RT-5432", " RT-5432  "),
  invalid("asd-123456"),
  invalid("QW-1"),
  invalid("ab-123456789"))

I've seen a lot of stuff like this and it seems to usually work out pretty good. Strings can be turned into ThingIDs as soon as is reasonable and code that deals with ThingIDs can trust that those IDs are good. Code using Thing.new can't mess up in the same way that code using isValidThingID(str) could.

In a typed language you typically get some typecheking for it, and in an untyped language you typically get some "fail reasonably fast" kind of deal since you tend to use specific data structures in specific ways: If we pass a string into a function that expects a ThingID things will fail the moment we try to access its identifier, whereas passing a string that has not been validated into a function that expects a string that has been validated is more likely to lead to stuff like storing an invalid string in the database and discovering that it's invalid three years later. (Of course it's also possible to add our own tag-checking here and there if there are places where we want to make very sure we crash very fast.)

I don't love writing tests that checks for errors thrown from a function. Errors can come from further down the call stack than expected and I think tests checking return values are like more "naturally reliable" with regards to breaking when they should and not when they should not. You can make the error-checking ones more reliable by doing more work to check that you're getting the correct error, and then if the function under test isn't too complex things are fine.

Anyway this kind of thing has usually been like good enough and a definite improvement on just passing strings around.

Extracting the validation function back out again

So in this situation, a thing that sometimes happens is that the constructor that potentially throws an error isn't always what you want. Sometimes there's input from a user or something that you should expect to be sometimes mistyped and so on. If a bad ID is something kind of expected that we should handle, and not just something that can be dealt with as a generic error-of-some-kind, I generally prefer to use return values and not "errors" for it. Even if we really enjoy using error handling mechanisms for this kind of stuff, there's good chance we would like some kind of distinction here, and have like:

One function we can use when we our string comes from a trusted source and just want our general "rollback and undo and abort everything" error handling to deal with it if it's bad.
A different function we can use when our string comes from a less trusted source and we want to deal with "bad input" situations more directly.

One of the things I've seen happen here goes kind of like this:

traced()

ThingID__ = {}
local function construct(str)
  return {
    tag = ThingID__,
    identifier = str
  }
end

function ThingID__.isValid(str)
  str = str:match("^%u%u[-]%d+$")
  if not str then
    return false
  end
  if #str < 5 then
    return false
  end
  if #str > 10 then
    return false
  end
  return true
end

function ThingID__.new(str)
  str = trim(str):upper()
  if not ThingID__.isValid(str) then
    error("bad format")
  end
  return construct(str)
end

function ThingID__.maybe(str)
  str = trim(str):upper()
  if ThingID__.isValid(str) then
    return Optional.of(construct(str))
  else
    return Optional.empty()
  end
end

This kind of does the trick. It also reintroduces some of that blindless, although probably as a more isolated thing. It can make some things a little awkward when testing it.

I think it makes sense to want to test the isValid function. That's the function that does the actual checking. It is used in two functions and if we can cover it directly then we shouldn't need to test those two other functions with as many different strings.

local function valid(str)
  return function() assertEqual(true, ThingID__.isValid(str)) end
end
local function invalid(str)
  return function() assertEqual(false, ThingID__.isValid(str)) end
end

runtests(
  valid("AS-123456"),
  invalid("as-123456"),
  invalid("asd-123456"),
  invalid("QW-1"),
  invalid("AB-123456789"))

Like we can do that. We cover like the core checking stuff with tests and that's good. Same as with the isValidThingID in the beginning, testing the validation function doesn't actually give us much confidence that the results are used correctly by callers of the function. In this case the idea is that only new and maybe needs to use the function, so the problem should be a lot smaller.

The new and maybe function have just one if-else each so we should be able to cover them with just two tests for each. In practice, when things are organized like this, I tend to just test the outer functions instead of the inner, but with test data that covers all of the inner one. Something like:

local function valid(expected, str)
  return function()
    assertEqual(expected, ThingID__.new(str).identifier)
    assertEqual(expected, ThingID__.maybe(str).value.identifier)
  end
end
local function invalid(str)
  return function()
    assertError(function() ThingID__.new(str) end)
    assertEqual(nil, ThingID__.maybe(str).value)
  end
end

runtests(
  valid("AS-123456", "AS-123456"),
  valid("AS-123456", "as-123456"),
  valid("RT-5432", " RT-5432  "),
  invalid("asd-123456"),
  invalid("QW-1"),
  invalid("AB-123456789"))

That's okay I think. We can then also consider making the isValid function local/private so that other code don't get to just use it. Then we can kind of know that we have covered it its callers with tests and we're more likely to catch it if other callers start cropping up.

And I think there's something a little unsatisfying about it. It's not majorly major and important stuff, but:

- We are kind of double-testing the isValid function.
- isValid is basically a pure function and uh. It might not be a function we particularly want to expose to the surrounding world anyway, but it does feel maybe a little silly to consider hiding it because it's like dangerous. Like, because it returns booleans, we know that if anyone uses it they're probably going to if-else on it or something like that, and we know that that's a thing that can be done the wrong way around. Like I dunno are we "tainting" callers with the booleans?

Results

Yeah I guess I think that is what is happening: Our function solves a little logic problem. It ifs and it elses and it figures things out. But in the end it sort of returns a tiny logic problem to its caller. Like here's a boolean value for you to deal with, and let's see if you if-else on it the way you should. IME a lot of problems can be avoided by returning something more "usable." In this case I think a "result" value that potentially contains the validated ID string is a good fit:

traced()

ThingID = {}
local function construct(str)
  return {
    tag = ThingID,
    identifier = str
  }
end

function ThingID.valid(str)
  str = trim(str):upper()
  str = str:match("^%a%a[-]%d+$")
  if not str then
    return Result.bad("bad format")
  end
  if #str < 5 then
    return Result.bad("few characters")
  end
  if #str > 10 then
    return Result.bad("many characters")
  end
  return Result.good(str)
end

function ThingID.new(str)
  return construct(ThingID.valid(str).orError())
end

function ThingID.maybe(str)
  return ThingID.valid(str).map(construct)
end

(Since we, when validation succeeds, return a data structure containing the actually validated string, it's possible to move the stuff dealing with casing and whitespace into the validation function. Again, this may or may not be a good idea, depending on circumstances, but now it's an option.)

local function valid(expected, str)
  return function() assertEqual(expected, ThingID.valid(str).value) end
end
local function invalid(str)
  return function() assertEqual(nil, ThingID.valid(str).value) end
end

runtests(
  valid("AS-123456", "AS-123456"),
  valid("AS-123456", "as-123456"),
  valid("RT-5432", " RT-5432  "),
  invalid("asd-123456"),
  invalid("QW-1"),
  invalid("AB-123456789"))

I tend to be happier with designs like this. ThingID.valid feels more complete to me. If I test that one function, or do some other formal or informal analysis of it, it doesn't feel like I'm leaving some important part of it out.

So in this case I'm more comfortable with testing only the ThingID.valid function and not its callers. If I want to cover it more fully, I'll just add one test for each of the outer functions. They basically just compose other functions without having any branching of their own. We could imagine testing for both valid and invalid IDs, but that'd more or less just amount to additional testing of ThingID.valid and Result's map and orError functions.

local function valid(expected, str)
  return function() assertEqual(expected, ThingID.valid(str).value) end
end
local function invalid(str)
  return function() assertEqual(nil, ThingID.valid(str).value) end
end

runtests(
  valid("AS-123456", "AS-123456"),
  valid("AS-123456", "as-123456"),
  valid("RT-5432", " RT-5432  "),
  invalid("asd-123456"),
  invalid("QW-1"),
  invalid("AB-123456789"),
  function()
    assertEqual("AS-123456", ThingID.new("AS-123456").identifier)
  end,
  function()
    assertEqual("AS-123456", ThingID.maybe("AS-123456").value.identifier)
  end)

Are those last two tests important? Depends I guess. I mean, that's the case with all the tests, but. For those last two I think there's stuff like:

How bad are different kinds of errors? I'm usually more worried about errors where the program might carry on and keep acting on bad data than about errors where things throw and stop and crash. Mistakes caught by the last two tests seem more likely to be in the second category. But then in some contexts errors like that are more important than in the ones I'm typically in. Eh...
If we have some larger tests outside of this anyway, maybe we're happy with some of them covering ThingID.new and ThinkID.maybe indirectly instead.
Maybe we trust that some other stuff will catch mistakes here. I guess this could be a case where it might be like "I trust the type system/some tooling thing to catch any mistake that could really be introduced here" in one language/environment while in another it could make more sense to add like one test for each outer function.
...

Bla blah

I don't care about stuff like test coverage as a number, but I think coverage can sometimes be a tool for thinking about the way something feels off or something. Testing a piece of logic directly and "where it actually is" sometimes feels unsatisfactory. That could sometimes be a smell worth investigating, and then thinking about it in terms of coverage could be useful.

A thing: This is an example of something where, to me, it does kind of feel like the tests are "driving" something. I "listen to the tests" and I make some changes. But like it pushes the design of the code in the direction of a design that I happen to like. What a surprise! I sometimes see people, I dunno, "expecting" something from their tests? Like they expect that if they follow some steps in a particular order the design will "improve" in a kind of mechanical way that isn't much related to the tastes and skills of the programmer. I don't think that's the case at all.

Anyway there's one way to look at it where, if we push the branching into like the leaf nodes of our computation, then we can microtest those and e.g. have a few tests covering the branches of one such node, and then get away with single tests covering larger things. Few tests more large things and more tests for small things seems more sustainable than some other approaches at least.

And like booleans are mean.

The post has ended, but here's the testing framework and some stuff:

local files = {}
function traced()
  local n = #files + 1
  local source = debug.getinfo(2).source
  files[source] = n
  files[n] = source
end

web.html([[<style>
  .coveredgood { background-color: #d4fcbc; color: #000000; }
  .coveredbad { background-color: #ffbbbb; color: #555555; }
</style>]])

local good, bad = ' class="coveredgood"', ' class="coveredbad"'

local currentrun
local currenttest

local function applygood(all, res)
  for f, lines in pairs(res) do
    local allf = all[f] or {}
    for l, _ in pairs(lines) do
      if not allf[l] then allf[l] = good end
    end
    all[f] = allf
  end
end

local function applybad(all, res)
  for f, lines in pairs(res) do
    local allf = all[f] or {}
    for l, _ in pairs(lines) do allf[l] = bad end
    all[f] = allf
  end
end

local function trace(event, line)
  if not currenttest then return end
  local file = files[debug.getinfo(2).source]
  if not file then return end
  lastline, lastfile = line, file
  local lines = currenttest[file] or {}
  lines[line] = true
  currenttest[file] = lines
end


local function linehtml(str, class)
  return "  <span" .. (class or "") .. ">" .. str .. "</span><br>"
end

local function blank(s)
  return not not s:match("^%s*$")
end

local function endline(s)
  return not not s:match("^%s*end%s*$")
end

local function elseline(s)
  return not not s:match("^%s*else%s*$")
end

local function reportcoverage(t)
  local res = { "<details><summary>Code coverage report</summary><pre><code>" }
  for n, v in pairs(t) do
    res[#res + 1] = "file " .. n .. ":\n"
    local prev = false
    local i = 0
    for line in files[n]:gmatch("[^\n]*") do
      i = i + 1
      local covered =
        v[i]
        or ((blank(line) or endline(line)) and prev)
        or (elseline(line) and v[i + 1])
      prev = covered
      res[#res + 1] = linehtml(line, covered)
    end
  end
  res[#res + 1] = "</code></pre></details>"
  web.html(table.concat(res))
end

local escapechar = {
  ["<"] = "&lt;",
  [">"] = "&gt;",
  ['"'] = "&quot;",
  ["'"] =  "&apos;",
  ["&"] = "&amp;"
}

local function escape(str)
  local res, _ = (str or ""):gsub("[<>\"'&]", escapechar)
  return res
end

function runtests(...)
  currentrun = {}
  debug.sethook(trace, "l")

  local str = { '<p>' }
  local passes, failures = 0, 0

  for i, f in ipairs({...}) do
    lastline, lastfile = nil, nil
    currenttest = {}
    local success, res = pcall(f)
    local coverage = currenttest
    currenttest = nil
    if success then
      passes = passes + 1
      applygood(currentrun, coverage)
    else
      failures = failures + 1
      table.insert(str, 'test ' .. i .. ' failed: ')
      table.insert(str, escape(tostring(res)) .. '<br>')
      applybad(currentrun, coverage)
    end
  end
  table.insert(str, '<span style="background-color:')
  if failures == 0 then
    table.insert(str, 'green;">this is a green bar ')
  else
    table.insert(str, 'red;">this is a red bar ')
  end
  table.insert(str,  '(' .. passes .. ' tests passed. ')
  table.insert(str,  failures .. ' tests failed)</span></p>')
  web.html(table.concat(str))
  reportcoverage(currentrun)
end

function assertEqual(e, a)
  if e ~= a then
    error("expected: " .. tostring(e).. ", actual: " .. tostring(a))
  end
end

function assertError(f)
  local success, res = pcall(f)
  if success then error("expected: error, got: success") end
  return res
end

Optional = {}
function Optional.of(v)
  return {
    value = v,
    orError = function() return v end,
    map = function(f) return Optional.of(f(v)) end
  }
end

function Optional.empty(m)
  local self
  self = {
    orError = function() error("no such element") end,
    map = function(f) return self end
  }
  return self
end

Result = {}
function Result.good(v)
  return {
    value = v,
    orError = function() return v end,
    map = function(f) return Result.good(f(v)) end
  }
end

function Result.bad(m)
  local self
  self = {
    message = m,
    orError = function() error(m) end,
    map = function(f) return self end
  }
  return self
end

function trim(s)
  return s:match("^%s*(.-)%s*$")
end