Sunday, April 28, 2019

CMWCs in mod w+1

While re-reading TAoCP Volume 2, there's a curious passage during the discussion of mod +/- 1 LCGs regarding the use of x_{n}=a*x_{n-1}+c, which states: "For reasons to be explained later, we will generally want c=0 when m=w+1, so we merely need to compute (aX) mod (w+1)". After a few hours I can understand why. For LCGs it's not so dire, because we can treat 0 simply as w+1, because no operation will result in 0. I touched on this topic a few years ago at [1] for LCGs (mod w+1), and [2] for MWC and CMWCs (mod w-1).

For the case of CMWCs, it's beneficial for us to use use w-1. As noted by Marsaglia, this is because we can have w-1 as a primitive root. Briefly, with g as w-1 and n as a*g+1, g^{phi(n)/f} != 1 for all factors of phi(n). This is made easier when n is prime, because phi(n)=n-1. When n is composite, more work is required for establising the order: we need to find the least common multiple across the order of all the factors.

With w-1=65535, let's consider an a of 65518 (as used in [1]), and then arbitrarily its neighbour 65517.

# 65518*65535+1
65535^{4293722130/2} mod 4293722131 == 4293722130
65535^{4293722130/3} mod 4293722131 == 1070428133
65535^{4293722130/5} mod 4293722131 == 3696002153
65535^{4293722130/17} mod 4293722131 == 2477700139
65535^{4293722130/41} mod 4293722131 == 2876866897
65535^{4293722130/47} mod 4293722131 == 1432659653
65535^{4293722130/257} mod 4293722131 == 1579676435


... which indeed has a period of 4293722130, and we needn't look further because none of the results are 1. However, for the composite 65517*65535+1 we need to perform an exhaustive search.

# 65517*65535+1
65535^{1036399840/2} mod 4293656596 == 1
65535^{1036399840/5} mod 4293656596 == 1
65535^{1036399840/7} mod 4293656596 == 3849485225
65535^{1036399840/19} mod 4293656596 == 318890497
65535^{1036399840/113} mod 4293656596 == 3330338425
65535^{1036399840/431} mod 4293656596 == 2842235365


Walking the generator gives the period `12954998`. Now, to prove it we need to do something like:

n = 4293656596
# 4293656596: 2 2 29 37014281
# ...
# 1:
# 1:
# 28: 2 2 7
# 37014280: 2 2 2 5 19 113 431

# product of 28 and 37014280's factors
phi_n = 1036399840
for f in [2,5,7,19,113,431]:
  print '# factor',f
  for x in xrange(1,phi_n/f):
    if pow(65535,f*x,n) == 1:
    print(f, x)
    break


# (2, 6477499)
# (5, 12954998)
# (7, 1850714)
# (19, 681842)
# (113, 114646)
# (431, 30058)

and then the least common multiple of the orders:

>>> reduce(lcm,(6477499,12954998,1850714,681842,114646,30058))
12954998

Bingo. Now, returning to Knuth's quote.  There is indeed some more work to be done with w+1 when we consider c != 0, but not a lot more than handling w-1.  The catch with using w+1, is that we'll always lose at least a factor of 2 for our order because of word sizes being 2^n.  So, adjusting for w+1 doesn't actually give us a CMWC of longer period than MWC, but CMWCs do have other nice properties such as always being a multiple of the base for instance.  While CMWCs in base 2^n will at least lose a factor of 2^4, 2^n+1 can optimally lose 2^1.

With 2^n+1=65537 here, we can use the following code to generate the expansion 65514*65537+1 with a period of `2146795509`.  We handle the case of a real zero by stepping to the next state, ensuring the 0=256 mapping.

rand    proc near
        push dx
        mov dx,65514
        db 0xb8                 ; mov ax,imm16
x       dw 1
        mul dx
        db 0x05                 ; add ax,imm16
c       dw 1
        adc dx,0

        ; handle wrap
        sub ax,dx
        ja done
        jnz wrap

        ; skip an actual 0
        xchg ax,dx
        mov dx,65514
        sub ax,dx
wrap:   inc ax
        dec dx

done:   neg ax                  ; (b-1-x) == 0-x == neg ax
        mov cs:[x],ax
        mov cs:[c],dx
        pop dx
        ret
rand    end

I wrote it for density, not performance. Modern processors will invalidate their I$ if modified, resulting in a performance hit.  Changing the self-modifying code to point x and c into a data section is left as an exercise for the reader.

[1] - https://b2d-f9r.blogspot.com/2010/10/lcg-mod-65537.html
[2] - https://b2d-f9r.blogspot.com/2010/11/multiply-with-carry-generator-in-16-bit.html

No comments: