Hash constructors are a convenient way to default hash values.
But when the default value is a complex object, you can get unexpected results and hard-to-find bugs.
Let’s say we have a structure of counters, which we want to increment:
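A minimal sketch of such a counter structure, with every key initialized by hand. The key names are assumptions, chosen to match the ones used later in this post:

```ruby
# Hypothetical counter names; every key has to be listed up front.
counters = {
  :get_requests  => 0,
  :post_requests => 0
}

counters[:get_requests] += 1     # fine, the key exists

begin
  counters[:put_requests] += 1   # unknown key: this evaluates nil + 1
rescue NoMethodError => e
  puts "failed: #{e.class}"
end
```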
This way of defaulting the hash has the drawback that the incrementer needs to know which keys are available; otherwise we’d get a NoMethodError, because nil has no + method.
A hash with default values is created via Hash::new.
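As a short sketch (the key names are again just illustrative):

```ruby
counters = Hash.new(0)          # 0 is returned for every missing key

counters[:get_requests] += 1    # reads the default 0, then stores 1
counters[:get_requests]         # => 1
counters[:post_requests]        # => 0 (default, nothing is stored)
counters.keys                   # => [:get_requests]
```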
Pretty straightforward, right? This works because we told Hash to return the object given as the constructor argument whenever a key is not found (the default is nil), and because += assigns the result back to the key.
Hash Constructors with complex objects
Now let’s tackle another example:
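A sketch of what collecting values per key might look like without a default. The request paths are made-up sample data:

```ruby
requests = {}

# Every writer has to create the array first if the key is still missing.
requests[:get_requests] = [] unless requests.key?(:get_requests)
requests[:get_requests] << "/index"

requests[:post_requests] = [] unless requests.key?(:post_requests)
requests[:post_requests] << "/login"
```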
Blech! Hash::new to the rescue:
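Roughly like this, assuming the same illustrative key and path as before:

```ruby
requests = Hash.new([])               # [] as the default for missing keys

requests[:get_requests] << "/index"   # no explicit initialization needed
requests[:get_requests]               # => ["/index"]
```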
Neat! But wait:
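Inspecting the hash itself after the same two steps, as a sketch:

```ruby
requests = Hash.new([])
requests[:get_requests] << "/index"

requests.empty?   # => true
requests.keys     # => []
```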
What?? Where did our values go? Weird, we now have a hash with “hidden” values! Even more weirdness:
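For example, reading a key that was never touched (a sketch with the same assumed names):

```ruby
requests = Hash.new([])
requests[:get_requests] << "/index"

requests[:post_requests]   # => ["/index"]
```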
The symptoms are obvious: the key :post_requests should give a default value of [], but it returns :get_requests’s value!
To find out why the assignments are “leaking” into other hash values, let’s rewrite the erroneous code by unrolling the constructor call and the array access:
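One way to unroll it, as a sketch. The local variable names default_value and value are introduced here only for illustration:

```ruby
default_value = []                    # the one and only default object
requests = Hash.new(default_value)

value = requests[:get_requests]       # key is missing, so this IS default_value
value << "/index"                     # mutates the shared default object

# Note what is absent: there is no `requests[:get_requests] = ...` anywhere.
default_value                         # => ["/index"]
```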
Can you now see what actually happened? We passed a reference to [] as the default object for the hash constructor. In Ruby, an Array is a mutable object, and what gets handed around is a reference to that object, not a copy of it.
This way we always modify the default value, because Hash always returns a reference to that same object when we access an unknown key. We never actually modified the value for the key :get_requests!
Proof:
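A sketch of such a proof, using Hash#default and Object#equal? to check object identity:

```ruby
requests = Hash.new([])
requests[:get_requests] << "/index"

# Both lookups hand back the very same object: the hash's default.
requests[:get_requests].equal?(requests[:post_requests])   # => true
requests[:get_requests].equal?(requests.default)           # => true
requests.default                                           # => ["/index"]
requests.key?(:get_requests)                               # => false

# A real assignment stores a value. += builds a new array and assigns it.
requests[:get_requests] += ["/about"]
requests.keys                                              # => [:get_requests]
requests[:get_requests]                                    # => ["/index", "/about"]
requests.default                                           # => ["/index"]
```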
Hash Constructor Blocks
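The solution is the block syntax of the Hash constructor. The Ruby core documentation for Hash.new makes two statements about it: if a block is specified, it is called with the hash object and the key and should return the default value; and it is the block’s responsibility to store the value in the hash if required.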
The first statement alone is not sufficient in our case:
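A sketch of a block that only returns a default, with the same assumed key and path:

```ruby
requests = Hash.new { [] }            # the block returns a fresh array but stores nothing

requests[:get_requests] << "/index"   # pushes onto a temporary array
requests[:get_requests]               # => [] (a brand-new array again)
requests.empty?                       # => true
```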
This made things even worse! We’re losing the values because, again, what looks like an array push or assignment is only a read access, and thus the value won’t get stored (instead, the freshly returned default [] is modified and then garbage collected!).
The second part of the documentation is very important for complex object default values: we need to store the value in the hash, so that on the next call (and on an immediate assignment!) the default value won’t be returned:
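A sketch of the storing variant. The block parameter names hash and key are arbitrary:

```ruby
# The block creates a new array AND stores it under the missing key.
requests = Hash.new { |hash, key| hash[key] = [] }

requests[:get_requests]  << "/index"
requests[:post_requests] << "/login"

requests[:get_requests]    # => ["/index"]
requests[:post_requests]   # => ["/login"]
requests.keys              # => [:get_requests, :post_requests]
```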
Unresolved Weirdness
Now every time an unknown hash key is read, the block is evaluated. Note that the block not only returns a new object (an empty hash, {}, in the following case), it also stores that object under the key on read. This has the following side effect:
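A sketch of the side effect, assuming a counters hash that defaults to {}. The last lines use Hash#key? and Hash#fetch, which do not invoke the default block, as one possible way to read without storing:

```ruby
counters = Hash.new { |hash, key| hash[key] = {} }

counters.keys               # => []
counters[:get_requests]     # => {} (just a read...)
counters.keys               # => [:get_requests] (...but the hash changed)

# Reads that bypass the default block leave the hash untouched.
counters.key?(:post_requests)        # => false
counters.fetch(:post_requests, {})   # => {}
counters.keys                        # => [:get_requests]
```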
This might not seem like a problem, since implicitly every hash key of counters has the value {}. But it’s confusing that counters gets modified on read, and you definitely wouldn’t expect counters.keys to return different values before and after the read access!
Conclusion
- flat Hash constructors with atomic default values are elegant and easy to use
- always use assignment-block syntax for hash constructors whose default value is a mutable object such as an Array or a Hash
- hash constructors with block syntax have side effects
The errors stemming from a seemingly simple use of Hash.new([]) were really hard to find, as is always the case with unexpected behavior or messed-up object references.
This constructor call looks so harmless that I promised myself to steer clear of hash constructors in the future - except for number values!
The correct implementation, supplying a block, does have its own problems, which might bite anyone who reads from a defaulting hash and then queries Hash#keys.