Tags

Project Euler #3 looks for the largest prime factor of 400 billion and something. It is a straightforward problem, but it can be a study in efficiency. The trouble is that the given number is so large that the usual brute force method of counting up from 1 and checking every number would take WAY too long. Like it would still be running, and I wrote it weeks ago. Here’s the problem, as posited by projecteuler.net:

`The prime factors of 13195 are 5, 7, 13 and 29.`

`What is the largest prime factor of the number 600851475143 ?`

My first thought was to simply start at 3, count up by 2’s (thus eliminating wasted time spent testing even numbers, which obviously aren’t prime), and test every odd number to see if it is a) prime, and b) a factor of 400 billion whatever. This method actually finds the answer very quickly, but it doesn’t KNOW it has found the correct answer until it tests a bajillion more numbers to prove that they AREN’T prime factors.

So I had to rethink my strategy. And what I came up with was to work backward, using reciprocals. Like this:

1. Start a loop as described above, starting at 3 and counting up by 2’s (a note on where you need to count up to in a moment).
2. On each loop iteration, check the current (odd) number ONLY to see if it’s a factor of 600 bizizzle. If it isn’t, move on.
3. If it is, then we have actually found a really LARGE factor, in the reciprocal. i.e., 600,851,475,143 / 71 = 8,462,696,833. So, by counting up from 3, we quickly find that 71 is a factor; but we also have found that 8,462,696,833 is a factor, too. And not just any factor, but the largest factor.
4. Next, check that reciprocal to see if it happens to be prime. If not, keep going.
5. If it is, then stop and deliver the answer. Because with this method, the very first prime that we come across will necessarily be the largest prime factor.

In other words, this algorithm identifies factors of 600 billionzy, in decreasing order, starting with the largest. My implementation printed them as it went, so the output (spoiler alert) looks like this:

Thing is, this still isn’t the most efficient method. Because in my zeal for working backwards, I completely discarded the smaller factors that I found. (i.e., I tested 8,462,696,833, but ignored 71.)

The unconscious assumption was that the largest prime factor would be among the largest 50% of all factors. But it isn’t.

Another rewrite follows the same algorithm as above, except instead of only looking at the large factors and assuming the first prime it finds will be the largest, it tests the smaller factors on the way up, as well, and keeps track of the largest prime it has found at all times. So when it finds 71, and determines its reciprocal of 8,462,696,833, it tests both for primality before moving on to the next pair.

That change took the execution time from ~1 second to 0.007 seconds.

Now, regarding the loop controls. A big question I had when I started was “how far up do I have to count before I know I’ve found all the factors?” Because 400 billion is big. And I had essentially the same question for the boolean function isPrime() (which checks for primality): “how far up do I have to count before I know the number is prime?”

At first, I had isPrime(x) testing every number up to x/2 looking for factors. (ie, if I called isPrime(17), my original implementation tried to divide {1, 2, 3, … 8} into 17 before it gave up and determined that 17 is prime.

But what I figured out was that you only need to check up to the square root of the number – again, because of reciprocals.

For example, 100 is divisible by 2, 4, 5, 10, 20, 25 and 50. Notice anything? Look at those same numbers like this:

```{2,     4,     5,    10}
{50,   25,    20,    10}```

If I Divide 100 by 2, I get 50. So I know that it’s divisible not just by 2, but by BOTH of those numbers. Similarly, testing 4 yields both 4 and 25 as factors, testing 5 yields both 5 and 20 as factors, and testing 10 reveals 10 as the square root. In terms of our isPrime() algorithm, what that means is that once I test a given number up to its square root, I have effectively tested everything after the square root, as well. Because reciprocals. Make sense?

In any case, that applies to the main function, as well – once you’ve reached the square root, you’ve identified all the factors.

So I started with a perfectly correct solution that would have taken just over 13 days to complete (really – I checked), then found another that takes just under 1 second, and finally arrived at one that takes 0.007 seconds. All valid solutions, but what a difference in efficiency.

Here’s the final code in C++:

```#include <iostream>
#include <math.h>

using namespace std;

bool isPrime(long long x){
if ((x == 2) || (x == 3)){return true;}
if ((x == 1) || (x%2 == 0)) {return false;}
for (long long i = 3; i <= sqrt(x); i+=2){
if (x%i == 0){return false;}
}
return true;
}

int main()
{
long long number = 600851475143;
long long largestFactor = 1;

for (long long i = 3; i < sqrt(number); i+=2){
if (number%i == 0){
cout << endl << "Factor: " << i;
if (isPrime(i)){
cout << " ---------- Prime";
if (i > largestFactor) {largestFactor = i;}
}
cout << endl << "Factor: " << number/i;
if (isPrime(number/i)){
cout << " ---------- Prime";
if (number/i > largestFactor) {largestFactor = number/i;}
}
}
}

cout << endl << endl << "The largest prime factor of 600,851,475,143 is " << largestFactor << "." << endl << endl;
return 0;
}
```

And here it is in Python:

```import math

def isPrime(x):
if (x is 2) or (x is 3):
return True
if (x is 1) or (x%2 is 0):
return False
for i in range(3,int(math.sqrt(x)+1),2):
if (x%i is 0):
return False
return True

number = 600851475143
largestFactor = 1

for i in range(3,int(math.sqrt(number)),2):
if (number%i is 0):
print "Factor: " + str(i)
if isPrime(i):
print "        ^^^^ Prime"
if (i > largestFactor):
largestFactor = i
print "Factor: " + str(number/i)
if (isPrime(number/i)):
largestFactor = number/i
print "        ^^^^ Prime"

print "The largest prime factor of 600,851,475,143 is " + str(largestFactor) + "."
```