Tuesday, August 31, 2010

16 bit xorshift rng (now with more period)

So, in Marsaglia's paper, the basic xorshift rng the code is:
yˆ=(y<<a); y^=(y>>b); return (yˆ=(y<<c));

I looked at this in my last post, and had a brief look at the (a,b,c) triplets to use. However, this only yields a period length of the word size used. In my case, that's 2^16-1 -- not much. However, further on in the paper, section 3.1, a method is described for longer periods through additional state and promotion:
t=(xˆ(x<<a)); x=y; return y=(yˆ(y>>c))ˆ(tˆ(t>>b));

This is listed as 2^64-1 using 32-bit words, but there's no reason we can't use 16-bit words. This reduces the period to 2^32-1, and requires new triplets. I scanned (1,1,1) through till (15,15,15), here's the list:


Values marked with an asterisk score well on the majority of diehard tests. It's probably not a good idea to use the other values, even though they do have a complete period. Notably, all values fail binary matrices tests for 31x31 and 32x32 - I think this is expected, and mentioned in Marsaglia2003.

So, the routine in C:
uint16_t rnd_xorshift_32() {
static uint16_t x=1,y=1;
uint16_t t=(x^(x<<5));
return y=(y^(y>>1))^(t^(t>>3));

and in assembly:
; returns a 16-bit value >0, has a 2^32-1 period
push cx
push dx
db 0b8h ; t=(x^(x shl a))
random_x: dw 1
mov dx,ax
mov cl,5
shl dx,cl
xor ax,dx
mov dx,ax ; y=(y^(y shr c))^(t^(t shr b))
mov cl,3 ; ^^^^^^^^^^^
shr dx,cl
xor ax,dx
push ax ; save t^(t shr b)
db 0b8h
random_y: dw 1
mov cs:[random_x],ax ; x=y
mov dx,ax ; y=(y^(y shr c))^(t^(t shr b))
shr dx,1 ; ^^^^^^^^^^^
xor ax,dx
pop dx
xor ax,dx
mov cs:[random_y],ax
pop dx
pop cx


Ben Jamin' said...

Which triplet passes the most tests? (5,3,1)?

Brad Forschinger said...

From memory the asterisked triplets performed equally well.

Russell Borogove said...

Thanks for this - I was looking for a suitable RNG to run on an AVR microcontroller (8-bit, but the register file is generous enough that the compiler can handle wider operations reasonably) and 64-bit xorshift seemed overkill.

Brad Forschinger said...

To avoid register churn you can use 8-bit sizes and the form mentioned in 3.1:

static uint8_t Q[4]={1,1,1,1};
uint8_t g(unsigned int i, unsigned int j, unsigned int k, unsigned int l)
uint8_t t;
t = (Q[0] ^ (Q[0] << i)) ^
(Q[1] ^ (Q[1] >> j)) ^
(Q[2] ^ (Q[2] << k)) ^
(Q[3] ^ (Q[3] << l));
Q[0] = Q[1]; Q[1] = Q[2]; Q[2] = Q[3]; Q[3] = t;
return t;

where (i,j,k,l) is one of: