Sunday, April 12, 2020

bit string permutations with sed(1)


Consider an alphabet with symbols {0,1}, how can we cycle through all permutations? How might we do it with regular expressions?

An example, for w = 2:

1. (00) = (00)
2. (01) = (01), (10)
3. (10) = (01), (10)
4. (11) = (11)

Discarding the simple cases of 1 and 4, we can start to see that we need to also think about the order of the generation for a given starting point. HAKMEM #175 produces these in lexicographic order by due to the cascading effect of addition, but strings are generally read left to right. So, a reflected ordering is perhaps the most straightforward approach here:

sed -n ':p; p; s/^\(1*\)\(0*\)01/\2\110/; tp;'

which produces, in reverse colexicographic (!) order, the permutations of the string. For w = 4, and (0011):

$ echo 0011 | sed -n ':p; p; s/^\(1*\)\(0*\)01/\2\110/; tp;'
0011
0101
1001
0110
1010
1100
$

Note we can rev(1) and tac(1) to put it into lexicographic ordering. However, if we instead choose to invert the symbol replacement in sed(1), we can do reflected ordering at which point it's enough to pipe through rev(1).

01 ⇒ 10 transformations, reverse colexicographic order


$ (echo rev-colex lex colex rev-lex; echo sed tac-rev tac rev
> (echo 0011 |
> sed -n ':p; p; s/^\(1*\)\(0*\)01/\2\110/; tp;' |
> tee >(rev) >(tac) >(tac|rev) | pr -t4)) | column -t
rev-colex  lex      colex  rev-lex
sed        tac-rev  tac    rev
0011       0011     1100   1100
0101       0101     1010   1010
1001       0110     0110   1001
0110       1001     1001   0110
1010       1010     0101   0101
1100       1100     0011   0011
$

10 ⇒ 01 transformations, colexicographic order


$ (echo colex rev-lex rev-colex lex; echo sed tac-rev tac rev
> (echo 1100 |
> sed -n ':p; p; s/^\(0*\)\(1*\)10/\2\101/; tp;' |
> tee >(rev) >(tac) >(tac|rev) | pr -t4)) | column -t
colex  rev-lex  rev-colex  lex
sed    tac-rev  tac        rev
1100   1100     0011       0011
1010   1010     0101       0101
0110   1001     1001       0110
1001   0110     0110       1001
0101   0101     1010       1010
0011   0011     1100       1100
$

Stay safe!

No comments: