Friday, December 24, 2010

Sort of introduction to J.

Firstly, what is J? Yeah, it's a letter in English alphabet. But other than that, J is a programming language. As I suppose, this is a language for lazy mathematicians—so hopelessly lazy that they don't want to type long function names. Thus it's VERY concise language.

Let's consider just one example. Suppose we wanna investigate the sequence of first digits of powers of 2. (1,2,4,8,16,32,64,128,256,...) The question is: what's the frequency of the digit 7? (You can find via search engines several versions of this problem: this, for example.)
Using J, we can find (well, estimate) the answer in one line:
(+%~(#&I.&(7=<.&(10^1|(10^.2)&*&i.)))) 1000000
0.05799
The exact answer is
.
That follows from the equidistribution theorem.

So, what does this mess of symbols mean?


Actually, programs in J are way easier to write than to read. So let's start from the beginning. To start, download the interpreter here (there's versions for Windows, Linux, Mac and even PocketPC).

The first question is, how do we determine the first digit of an arbitrary positive number? The answer is given by the formula

I hope, this is not too hard to understand: since the integer part of the logarithm (with base 10, of course) has nothing to with digits of the number, we deal with the fractional part only.

How to code that in J? We need some standard functions, of course:
x ^. y is base x logarithm of y ;
x | y is y mod x; fractional part of y is thus 1 | y ;
x ^ y is x to the power y;
<. x is the floor of x.

So let's write the function firstDigitOf. 
firstDigitOf =. (<.) & (10&^) & (1&|) & (10&^.)
(Notice that in J a function can't have more than 2 arguments.)

How does that work?
If you write & between two functions you get their composition. And if you write & between the binary function name and one of it's arguments you get unary function: this argument is fixed. Therefore:
(10&^.) is base 10 logarithm;
(1 & |) returns the fractional part;
(10&^) returns 10 to the given power.
Thus, the argument of firstDigitOf goes through the chain of all these functions and also the floor function.

Now, what about 2n? Trying to get its first digit via the composition of firstDigitOf and (2&^) is not very smart since 21024 gives floating-point overflow. So it's better to rewrite part of firstDigitOf function:
firstDigitOf2N =. (<.) & (10&^) & (1&|) & ((10^.2)&*)
(I hope now you are able to understand what's going on here.)
The function testing if 2n starts with 7 looks like that:
(7 = firstDigitOf2N)
Now we need some more J standard functions:
i. x returns list 0 1 2 ... (x-1);
I. x returns indices of 1 in boolean list x;
# x returns the length of x.
The function (7 = firstDigitOf2N) being applied to (i. M) returns a boolean list (from 0s and 1s) of the length M. Then we can count the number of powers of 2 starting with seven as follows:
# I. (7 = firstDigitOf2N) i. 1000000
57990
(The parentheses around i. 1000000 are not needed because J interpreter reads the expression from right to left.)

And the last question is, how to calculate the frequency not typing 1000000 twice?
You can, of course, do that using the variable:
freq7 =. monad : '(# I. (7 = firstDigitOf2N) i. y) % y'
But that's too boring... Let's use another J facility with respect to composing functions: if you write (F G H) x, J treats that as (F x) G (H x). Looking at the definition of freq7, we immediately see that G is % (division), and F is (# & I. & (7 = firstDigitOf2N) & i.) (Notice the ampersands: once we get rid of y, it's no longer a valid expression—to create a function we need to place ampersands where compositions take place.) So, what is H? It's simply + . Really, what (+100500) returns is just 100500. Thus we've got
freq7 =. (# & I. & (7 = firstDigitOf2N) & i.) % +
And it indeed works:
   freq7 1000000
0.05799

No comments:

Post a Comment