

8.3.9 Set manipulation in M4

Sometimes it is necessary to track a set of data in which order does not matter and no value appears more than once. The following macros facilitate set manipulations. Each set is an opaque object, which can only be accessed via these basic operations. Both set names and values can be arbitrary strings, except for unbalanced quotes.

The underlying implementation guarantees linear scaling for set creation, which is more efficient than using the quadratic m4_append_uniq. It ties up memory for removed elements until the next operation that must traverse all the elements of a set; although that may slow down some operations until the memory for removed elements is pruned, linear performance is still guaranteed.

— Macro: m4_set_add (set, value, [if-uniq], [if-dup])

Adds the string value as a member of set set. Expands if-uniq if the element was added, or if-dup if it was already in the set. Operates in amortized constant time, so that set creation scales linearly.

— Macro: m4_set_add_all (set, value...)

Adds each value to the set set. This is slightly more efficient than repeatedly invoking m4_set_add.
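
As a minimal sketch, assuming a fresh M4sugar session in which the set name s (chosen here purely for illustration) has not been used before, a value passed more than once to m4_set_add_all is stored only once:

          m4_set_add_all([s], [x], [y], [x])
          =>
          m4_set_size([s])
          =>2
          m4_set_contains([s], [y], [yes], [no])
          =>yes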

— Macro: m4_set_contains (set, value, [if-present], [if-absent])

Expands if-present if the string value is a member of set, otherwise if-absent.

          m4_set_contains([a], [1], [yes], [no])
          =>no
          m4_set_add([a], [1], [added], [dup])
          =>added
          m4_set_add([a], [1], [added], [dup])
          =>dup
          m4_set_contains([a], [1], [yes], [no])
          =>yes
          m4_set_remove([a], [1], [removed], [missing])
          =>removed
          m4_set_contains([a], [1], [yes], [no])
          =>no
          m4_set_remove([a], [1], [removed], [missing])
          =>missing
     
— Macro: m4_set_contents (set, [sep])
— Macro: m4_set_dump (set, [sep])

Expands to a single string consisting of all the members of the set set, each separated by sep, which is not expanded. m4_set_contents leaves the elements in set but reclaims any memory occupied by removed elements, while m4_set_dump is a faster one-shot action that also deletes the set. No provision is made for disambiguating members that contain a non-empty sep as a substring; use m4_set_empty to distinguish between an empty set and the set containing only the empty string. The order of the output is unspecified; in the current implementation, part of the speed of m4_set_dump results from using a different output order than m4_set_contents. These macros scale linearly in the size of the set before memory pruning, and m4_set_contents([set], [sep]) is faster than m4_joinall([sep]m4_set_listc([set])).

          m4_set_add_all([a], [1], [2], [3])
          =>
          m4_set_contents([a], [-])
          =>1-2-3
          m4_joinall([-]m4_set_listc([a]))
          =>1-2-3
          m4_set_dump([a], [-])
          =>3-2-1
          m4_set_contents([a])
          =>
          m4_set_add([a], [])
          =>
          m4_set_contents([a], [-])
          =>
     
— Macro: m4_set_delete (set)

Delete all elements of set and release the memory associated with it. This is linear in the set size, and faster than removing one element at a time.
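
The following sketch, which assumes a fresh session and uses the illustrative set name tmp, shows that a deleted set behaves like one that was never created:

          m4_set_add_all([tmp], [1], [2], [3])
          =>
          m4_set_delete([tmp])
          =>
          m4_set_empty([tmp], [yes], [no])
          =>yes
          m4_set_size([tmp])
          =>0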

— Macro: m4_set_difference (seta, setb)
— Macro: m4_set_intersection (seta, setb)
— Macro: m4_set_union (seta, setb)

Compute the relation between seta and setb, and output the result as a list of quoted arguments without duplicates and with a leading comma. Set difference selects the elements in seta but not setb, intersection selects only elements in both sets, and union selects elements in either set. These actions are linear in the sum of the set sizes. The leading comma is necessary to distinguish between no elements and the empty string as the only element.

          m4_set_add_all([a], [1], [2], [3])
          =>
          m4_set_add_all([b], [3], [], [4])
          =>
          m4_set_difference([a], [b])
          =>,1,2
          m4_set_difference([b], [a])
          =>,,4
          m4_set_intersection([a], [b])
          =>,3
          m4_set_union([a], [b])
          =>,1,2,3,,4
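
Because the result begins with a comma, it can be appended directly after the set name in a call to m4_set_add_all in order to store the computed relation as a new set. A minimal sketch, assuming a fresh session and the illustrative set names lhs, rhs, and both:

          m4_set_add_all([lhs], [1], [2], [3])
          =>
          m4_set_add_all([rhs], [3], [4])
          =>
          m4_set_add_all([both]m4_set_union([lhs], [rhs]))
          =>
          m4_set_size([both])
          =>4
          m4_set_contains([both], [4], [yes], [no])
          =>yes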
     
— Macro: m4_set_empty (set, [if-empty], [if-elements])

Expand if-empty if the set set has no elements, otherwise expand if-elements. This macro operates in constant time. Using this macro can help disambiguate output from m4_set_contents or m4_set_list.
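
For instance, the output of m4_set_contents is the same for an empty set and for a set holding only the empty string, while m4_set_empty tells the two apart. This sketch assumes a fresh session and uses the illustrative set name e:

          m4_set_contents([e])
          =>
          m4_set_empty([e], [yes], [no])
          =>yes
          m4_set_add([e], [])
          =>
          m4_set_contents([e])
          =>
          m4_set_empty([e], [yes], [no])
          =>no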

— Macro: m4_set_foreach (set, variable, action)

For each element in the set set, expand action with the macro variable defined as the set element. Behavior is unspecified if action recursively lists the contents of set (although listing other sets is acceptable), or if it modifies the set in any way other than removing the element currently contained in variable. This macro is faster than the corresponding m4_foreach([variable], m4_indir([m4_dquote]m4_set_listc([set])), [action]).

          m4_set_add_all([a]m4_for([i], [1], [5], [], [,i]))
          =>
          m4_set_contents([a])
          =>12345
          m4_set_foreach([a], [i],
            [m4_if(m4_eval(i&1), [0], [m4_set_remove([a], i, [i])])])
          =>24
          m4_set_contents([a])
          =>135
     
— Macro: m4_set_list (set)
— Macro: m4_set_listc (set)

Produce a list of arguments, where each argument is a quoted element from the set set. The variant m4_set_listc is unambiguous because it adds a leading comma whenever the set has any elements, whereas the variant m4_set_list cannot distinguish between an empty set and a set containing only the empty string. These can be directly used in macros that take multiple arguments, such as m4_join or m4_set_add_all, or wrapped by m4_dquote for macros that take a quoted list, such as m4_map or m4_foreach. Any memory occupied by removed elements is reclaimed during these macros.

          m4_set_add_all([a], [1], [2], [3])
          =>
          m4_set_list([a])
          =>1,2,3
          m4_set_list([b])
          =>
          m4_set_listc([b])
          =>
          m4_count(m4_set_list([b]))
          =>1
          m4_set_empty([b], [0], [m4_count(m4_set_list([b]))])
          =>0
          m4_set_add([b], [])
          =>
          m4_set_list([b])
          =>
          m4_set_listc([b])
          =>,
          m4_count(m4_set_list([b]))
          =>1
          m4_set_empty([b], [0], [m4_count(m4_set_list([b]))])
          =>1
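
The two styles of use described above can be sketched as follows, assuming a fresh session, the illustrative set name n, and the illustrative loop variable x. The element order shown reflects the current implementation and is not guaranteed:

          m4_set_add_all([n], [1], [2], [3])
          =>
          m4_join([+], m4_set_list([n]))
          =>1+2+3
          m4_foreach([x], m4_dquote(m4_set_list([n])), [<x>])
          =><1><2><3>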
     
— Macro: m4_set_remove (set, value, [if-present], [if-absent])

If value is an element in the set set, then remove it and expand if-present. Otherwise expand if-absent. This macro operates in constant time so that multiple removals will scale linearly rather than quadratically; but when used outside of m4_set_foreach, it leaves memory occupied until the set is later compacted by m4_set_contents or m4_set_list. Several other set operations are then less efficient between the time of element removal and subsequent memory compaction, but still maintain their guaranteed scaling performance.

— Macro: m4_set_size (set)

Expand to the size of the set set. This implementation operates in constant time, and is thus more efficient than m4_eval(m4_count(m4_set_listc([set])) - 1).
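
A short sketch, assuming a fresh session and the illustrative set names q and unused, shows the size tracking additions and removals and agreeing with the slower counting expression:

          m4_set_add_all([q], [1], [2], [3])
          =>
          m4_set_size([q])
          =>3
          m4_eval(m4_count(m4_set_listc([q])) - 1)
          =>3
          m4_set_remove([q], [2])
          =>
          m4_set_size([q])
          =>2
          m4_set_size([unused])
          =>0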