1. Introduction
The RSA (Rivest–Shamir–Adleman) public-key primitive, named after its inventors, has been the most widely used primitive in cryptography since its introduction in 1977 [1]. The generation of RSA keys in public key cryptosystems is based on a modulus $N$, a positive integer where $N = p \times q$ is a semi-prime, i.e., a product of two large primes $p$ and $q$ [2,3]. The computational intractability of factoring large semi-primes has been the fundamental premise for using RSA keys for computer security in several applications [4,5,6]. Modern cryptosystems make wide use of RSA encryption and digital signatures for secure message exchange and communication over different types of networks within government as well as various industry sectors [7,8,9]. These include popular transactions in applications such as the Internet of Things (IoT), mobile banking, online shopping, smart card payments, e-health and e-mail communications that are available to the general public over the Internet [10,11]. Despite the difficulty of breaking RSA keys (i.e., of semi-prime factorization), cybercrimes and RSA key attacks are still on the rise [12,13,14].
A cryptanalytic attack on RSA with a short secret exponent, published by M. J. Wiener in 1990, was the first of its kind [15,16]. Hence, making the RSA modulus $N$ hard to factor by choosing strong prime factors $p$ and $q$ was considered a solution to these attacks. Since then, it has become common practice to employ 512-bit RSA to provide the required strong primes for many cryptographic security protocols [17]. Even though a 512-bit RSA modulus was first factored in 1999, factoring 512-bit moduli in real-time applications remains difficult even with today's computational power [18]. Therefore, 512-bit RSA keys are still actively used by popular protocols: DomainKeys Identified Mail (DKIM) for email authentication; Post Office Protocol 3 (POP3), Simple Mail Transfer Protocol (SMTP) and Internet Message Access Protocol (IMAP) for sending and retrieving email; Hypertext Transfer Protocol Secure (HTTPS) for data exchange on the web; Secure Shell (SSH) for connecting two computers; and Pretty Good Privacy (PGP) for privacy [19,20,21].
Enthusiastic amateurs have tried to factor 512-bit RSA keys; in 2012, Zachary Harris factored the 512-bit DKIM RSA keys used by Google and several other major companies in 72 h per key using distributed cloud services [22,23,24]. The Factoring as a Service (FaaS) project demonstrated that a 512-bit RSA key can be factored reliably within four hours in a public cloud environment [19]. Factorization of the 768-bit RSA key, which is several thousand times more difficult than that of 512-bit RSA keys, has also been demonstrated using methods such as the number field sieve. The larger the key size, the more difficult it is to factor the RSA modulus. Nevertheless, with new methods and continuing progress in computing power, there are risks and implications for the future of RSA. This forms the motivation for studying the mathematical principles of semi-prime factorization and for proposing a novel method that increases the likelihood of breaking an RSA key.
Euler’s factorization of a semi-prime $N = p \times q$ is based on the property that $p$ and $q$ are both congruent to 1 mod 4 [25]. These constructions are based on Pythagorean primes, which are applied in RSA [26,27,28]. Such primes account for only a quarter of the possible combinations of primes; the remaining combinations, which involve primes congruent to 3 mod 4, are handled by the Gaussian prime extension method [29]. The semi-prime has only two representations as a sum of two squares in the whole range of possible squares up to $N$, and therefore we extend previously established methods to increase the likelihood that a sum of two squares can be found [30,31]. Even though many sums of squares may exist, we show that finding any two of them is sufficient to factorize the original semi-prime. Our enhanced factorization method improves on our previous work. We apply our method to case scenarios and state the necessary conjectures. Our algorithm is simple in practice and is implemented using rudimentary arithmetic operations that require minimal memory; the search cycle time is the main factor to be considered for very large semi-primes. Further, we demonstrate the breaking of RSA keys such as the 768-bit RSA key, verified through an implementation of our algorithm in Java. We present results highlighting the complexity of our enhanced factorization algorithm, compare its performance with other factorization algorithms, and outline the scope for future research.
The rest of the paper is organized as follows.
Section 2 discusses related work and the uniqueness of our work.
Section 3 provides our enhanced semi-prime factorization by extending the sum of squares method and specifies our algorithm with implementation steps. In addition, the performance of the algorithm, its complexity and comparison with other factoring methods are described.
Section 4 demonstrates the correctness of our algorithm when applied to break the 768-bit RSA key.
Section 5 discusses the results, and finally, conclusions along with future research directions are given in Section 6.
2. Background
The difficulty of the semi-prime factorization problem is essential to the security of an RSA cryptosystem. Revisiting RSA cryptography [1,3,17], assume that $p$ and $q$ are two primes used to generate a semi-prime $N = p \times q$. Euler’s totient function $\varphi(N)$ is then given by
$$\varphi(N) = (p - 1)(q - 1).$$
In RSA-based public key cryptography, two different keys, known as the public and private keys, are used to perform the encryption and decryption of data or messages [32,33]. Any sensitive information is encrypted with a public key, and a matching private key is required to decrypt it. The public key is chosen as a pair $(e, N)$, where $e$ is an integer with $1 < e < \varphi(N)$ that shares no common factor with $\varphi(N)$, i.e., $\gcd(e, \varphi(N)) = 1$. The private key is the pair $(d, N)$, such that
$$e \cdot d \equiv 1 \pmod{\varphi(N)},$$
where $d$ is determined using the extended Euclidean algorithm. The public key $(e, N)$ is used to encrypt a message $M$ into a cipher text $C$, such that
$$C \equiv M^{e} \pmod{N}.$$
To retrieve the original message, the corresponding private key $(d, N)$ is used to decrypt the encrypted message, such that
$$M \equiv C^{d} \pmod{N}.$$
RSA cryptosystems use public and private key generation techniques to secure data and to provide end-to-end security of transmitted information, so that it cannot be understood by anyone except the intended recipients. These techniques are employed to authenticate the sender and the receiver of a message and to ensure the integrity of the data or message received, i.e., that it has not been tampered with. However, the problem of determining $d$ from $(e, N)$ is equivalent to finding the factors of the RSA modulus $N$. Hence, choosing strong primes $p$ and $q$ is very important, so that factorizing the semi-prime $N$ becomes computationally infeasible for an adversary. In practical applications, a small private exponent $d$ may be used for faster decryption, to improve the computational speed of online transactions. Once the private key is found, RSA is totally broken, posing a great security risk. Hence, an enhanced prime factorization method, or an attack on a small decryption exponent, could pose serious security challenges to RSA cryptosystems, which remain widely adopted today.
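To make the key relations above concrete, here is a minimal Java sketch of an RSA round trip using the textbook primes 61 and 53, which are far too small for real use. The class name `ToyRsa` and the chosen values are our own illustration; `BigInteger.modInverse` stands in for an explicit extended Euclidean algorithm, and `modPow` performs the modular exponentiations.

```java
import java.math.BigInteger;

// Toy RSA round trip with textbook-sized primes (illustration only; real moduli are 1024+ bits).
final class ToyRsa {
    static final BigInteger P = BigInteger.valueOf(61);
    static final BigInteger Q = BigInteger.valueOf(53);
    // Modulus N = p * q
    static final BigInteger N = P.multiply(Q);
    // Euler's totient phi(N) = (p - 1)(q - 1)
    static final BigInteger PHI = P.subtract(BigInteger.ONE).multiply(Q.subtract(BigInteger.ONE));
    // Public exponent e with gcd(e, phi(N)) = 1
    static final BigInteger E = BigInteger.valueOf(17);
    // Private exponent d with e * d = 1 (mod phi(N)); modInverse replaces an explicit extended Euclid
    static final BigInteger D = E.modInverse(PHI);

    // C = M^e mod N
    static BigInteger encrypt(BigInteger m) {
        return m.modPow(E, N);
    }

    // M = C^d mod N
    static BigInteger decrypt(BigInteger c) {
        return c.modPow(D, N);
    }

    public static void main(String[] args) {
        BigInteger m = BigInteger.valueOf(65);
        BigInteger c = encrypt(m);
        System.out.println("N = " + N + ", phi(N) = " + PHI + ", d = " + D);
        System.out.println("C = " + c + ", decrypted M = " + decrypt(c));
    }
}
```

Running `main` should report N = 3233, φ(N) = 3120 and d = 2753, and the message M = 65 encrypts to C = 2790 and decrypts back to 65. Recovering d from (e, N) alone would require factoring 3233.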
Several studies have surveyed attacks on the RSA cryptosystem since its introduction [16,19,34]. In 1990, Wiener introduced the first RSA cryptanalytic attack, showing that if the decryption exponent $d$ is small, with an upper bound given by $d < \frac{1}{3} N^{1/4}$, it can be recovered using the continued fractions method [35]. Subsequently, Boneh and Durfee [36] proposed an attack on short decryption exponents with the improved upper bound $d < N^{0.292}$, while the RSA attack by Blomer and May [37] demonstrated an upper bound of $d < N^{0.290}$ with lattices of smaller dimension. Coppersmith’s technique used a lattice reduction approach for finding small solutions of modular bivariate integer polynomial equations [38].
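The continued-fractions idea behind Wiener's bound can be illustrated with a short Java sketch. It is our own rendering of the textbook attack, not part of this paper's method; the class `WienerSketch` and the example key (N = 90581, e = 17993, secret d = 5) are illustrative assumptions. Each convergent k/d of e/N is tested by computing a candidate φ(N) = (ed − 1)/k and checking whether the quadratic with sum N − φ(N) + 1 and product N has integer roots p and q. It needs Java 9 or later for `BigInteger.sqrt`.

```java
import java.math.BigInteger;
import java.util.Optional;

// Sketch of Wiener's continued-fraction attack on a small private exponent d.
// If d < N^(1/4) / 3 (and q < p < 2q), some convergent k/d of e/N reveals d,
// phi(N) = (e*d - 1) / k, and p, q are the roots of x^2 - (N - phi + 1) x + N = 0.
final class WienerSketch {

    /** Returns {d, p, q} if one of the convergents of e/n factors n, otherwise empty. */
    static Optional<BigInteger[]> attack(BigInteger e, BigInteger n) {
        BigInteger num = e, den = n;                                 // continued fraction of e/n
        BigInteger kPrev = BigInteger.ONE, kPrev2 = BigInteger.ZERO; // convergent numerators (candidate k)
        BigInteger dPrev = BigInteger.ZERO, dPrev2 = BigInteger.ONE; // convergent denominators (candidate d)
        while (den.signum() != 0) {
            BigInteger[] qr = num.divideAndRemainder(den);
            BigInteger a = qr[0];                                    // next partial quotient
            BigInteger k = a.multiply(kPrev).add(kPrev2);
            BigInteger d = a.multiply(dPrev).add(dPrev2);
            if (k.signum() != 0) {
                BigInteger edMinus1 = e.multiply(d).subtract(BigInteger.ONE);
                if (edMinus1.mod(k).signum() == 0) {
                    BigInteger phi = edMinus1.divide(k);
                    BigInteger s = n.subtract(phi).add(BigInteger.ONE);       // candidate p + q
                    BigInteger disc = s.multiply(s).subtract(n.shiftLeft(2)); // candidate (p - q)^2
                    if (disc.signum() >= 0) {
                        BigInteger r = disc.sqrt();                           // floor square root, Java 9+
                        if (r.multiply(r).equals(disc) && !s.add(r).testBit(0)) {
                            BigInteger p = s.add(r).shiftRight(1);
                            BigInteger q = s.subtract(r).shiftRight(1);
                            if (q.compareTo(BigInteger.ONE) > 0 && p.multiply(q).equals(n)) {
                                return Optional.of(new BigInteger[]{d, p, q});
                            }
                        }
                    }
                }
            }
            kPrev2 = kPrev;
            kPrev = k;
            dPrev2 = dPrev;
            dPrev = d;
            num = den;
            den = qr[1];
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Textbook example: N = 379 * 239, e = 17993, secret d = 5.
        BigInteger[] r = attack(BigInteger.valueOf(17993), BigInteger.valueOf(90581)).get();
        System.out.println("d = " + r[0] + ", p = " + r[1] + ", q = " + r[2]);
    }
}
```

For the example key the first usable convergent, 1/5, already yields d = 5, p = 379 and q = 239. When d is well above Wiener's bound, no convergent passes the checks and the method returns empty.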
Much recent research has also focused on extending the upper bounds on $d$ and $e$ in the RSA private and public keys by working around the limitations of the Wiener and Coppersmith methods and approaching the problem differently. One recent work [39] considers the prime sum $p + q$ using sublattice reduction techniques and Coppersmith’s methods for finding small roots of modular polynomial equations, achieving a slight improvement in the upper bound with reduced lattice dimension. Another work [17] uses a small prime difference $|p - q|$ method, which is then developed into a continued fraction as per Wiener’s original method. While these extend the range of Boneh and Durfee’s original bound, the Lenstra–Lenstra–Lovász (LLL) lattice basis reduction method, first proposed by Lenstra, Lenstra and Lovász in 1982, is still seen as the state of the art [40].
A survey of the history of number theory reveals that it has been explored widely by mathematicians to establish different representations of integers, in particular prime numbers, with a view to arriving at more efficient methods for deriving them [41,42,43,44,45,46,47]. Until about a decade ago, there was strong mathematical interest in polynomials that generate sums of squares. More recently, there has been a revival of interest in their application to practical experimentation, establishing their rudimentary computations and implications for cryptography [29,48,49,50,51,52]. In line with these approaches, we proved in our previous work that semi-primes can be represented as the sum of four squares [30]. A new factorization method was proposed as a faster alternative to Euler’s method [25] by establishing the relationship among the four squares. Our interest is to apply this method in a novel way to the semi-prime factorization problem, as part of our ongoing research. This paper explores a further finding of the method: once one sum of squares is known, it can be used to find the other.
3. Proposed Method
We build on earlier work [25,42] showing that a semi-prime $N$ constructed from two primes $p \equiv 1 \pmod 4$ and $q \equiv 1 \pmod 4$ is also congruent to $1 \pmod 4$. Further, in [30], it was established that a sum of four squares could be reduced to two sums of two squares using the Brahmagupta–Fibonacci identity given in [49]. We note that Gaussian primes of the form $4k + 3$ cannot be represented as the sum of two squares [29]. On the other hand, Pythagorean primes are of the form $4k + 1$ [26,27,28,53]. In accordance with Fermat’s Christmas theorem, an odd prime $p$ can be represented as the sum of two squares of integers $x$ and $y$, $p = x^2 + y^2$, if and only if $p \equiv 1 \pmod 4$ [42,54]. Fermat’s statement of this property determines which numbers can be represented as the sum of two squares; it was later proved by Euler [25].
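Fermat's two-squares criterion can be checked numerically for small primes. The brute-force Java sketch below uses class and method names of our own choosing, and its trial division and decrement search are only practical for small numbers; it lists, for each small odd prime, whether a representation $x^2 + y^2$ exists alongside the prime's residue mod 4.

```java
// Brute-force check of Fermat's two-squares theorem for small odd primes:
// an odd prime p equals x^2 + y^2 for integers x, y exactly when p = 1 (mod 4).
final class TwoSquaresCheck {

    // floor(sqrt(n)) for small non-negative n, corrected for floating-point rounding
    static long isqrt(long n) {
        long r = (long) Math.sqrt((double) n);
        while (r * r > n) r--;
        while ((r + 1) * (r + 1) <= n) r++;
        return r;
    }

    // Simple trial-division primality test (adequate for small n only)
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long f = 2; f * f <= n; f++) {
            if (n % f == 0) return false;
        }
        return true;
    }

    /** Returns {x, y} with x^2 + y^2 == n and x >= y >= 0, or null if no such pair exists. */
    static long[] sumOfTwoSquares(long n) {
        for (long x = isqrt(n); 2 * x * x >= n; x--) {
            long rest = n - x * x;
            long y = isqrt(rest);
            if (y * y == rest) return new long[]{x, y};
        }
        return null;
    }

    public static void main(String[] args) {
        for (long p = 3; p < 60; p += 2) {
            if (!isPrime(p)) continue;
            long[] r = sumOfTwoSquares(p);
            String rep = (r == null) ? "none" : r[0] + "^2 + " + r[1] + "^2";
            System.out.println(p + " = " + (p % 4) + " (mod 4): " + rep);
        }
    }
}
```

For instance, `sumOfTwoSquares(13)` returns {3, 2}, whereas the primes 7 and 11, both congruent to 3 (mod 4), have no representation.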
In an earlier work [30], it was proved that if a semi-prime is constructed from two Pythagorean primes of the form $4k + 1$, Euler’s factorization can be applied once its two representations as the sum of two squares are found. Finding these two representations is non-trivial and computationally intensive for large numbers, even with high-performance computers. We make use of a previously established parametric representation of all Pythagorean triples [28], which provides a computationally simpler search using coarse increments of one parameter and fine convergence using another. In this paper, we extend our previous work to show that once one sum of squares is known, it can be used to find the other. We present our proposed method step by step. First, we apply our method to the semi-prime factorization of two simple case examples and arrive at a conjecture. Following this, we apply our proposed method to two further semi-primes as case examples. Finally, in the next section, we demonstrate the breaking of 768-bit RSA using our proposed method.
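Case Example 1: Consider the semi-prime $N = 65$.
The semi-prime 65 consists of two primes, $5 = 1^2 + 2^2$ and $13 = 2^2 + 3^2$.
By applying the Brahmagupta identity [49],
$$65 = (1^2 + 2^2)(2^2 + 3^2) = (1 \cdot 2 + 2 \cdot 3)^2 + (1 \cdot 3 - 2 \cdot 2)^2 = 8^2 + 1^2 = (1 \cdot 2 - 2 \cdot 3)^2 + (1 \cdot 3 + 2 \cdot 2)^2 = 4^2 + 7^2.$$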
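Case Example 2: Consider the semi-prime $N = 2501$.
The semi-prime 2501 consists of two primes, $41 = 4^2 + 5^2$ and $61 = 5^2 + 6^2$.
By applying the Brahmagupta identity [49],
$$2501 = (4^2 + 5^2)(5^2 + 6^2) = (4 \cdot 5 + 5 \cdot 6)^2 + (4 \cdot 6 - 5 \cdot 5)^2 = 50^2 + 1^2 = (4 \cdot 5 - 5 \cdot 6)^2 + (4 \cdot 6 + 5 \cdot 5)^2 = 10^2 + 49^2.$$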
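Both case examples apply the same composition rule, $(a^2 + b^2)(c^2 + d^2) = (ac + bd)^2 + (ad - bc)^2 = (ac - bd)^2 + (ad + bc)^2$. A minimal Java sketch of that rule (the names `Brahmagupta` and `compose` are our own) reproduces both pairs of representations from the primes' own sums of two squares.

```java
import java.util.Arrays;

// Brahmagupta–Fibonacci identity:
// (a^2 + b^2)(c^2 + d^2) = (ac + bd)^2 + (ad - bc)^2 = (ac - bd)^2 + (ad + bc)^2
final class Brahmagupta {

    /** Returns the two representations {{ac + bd, |ad - bc|}, {|ac - bd|, ad + bc}} of the product. */
    static long[][] compose(long a, long b, long c, long d) {
        return new long[][]{
            {a * c + b * d, Math.abs(a * d - b * c)},
            {Math.abs(a * c - b * d), a * d + b * c}
        };
    }

    public static void main(String[] args) {
        // Case Example 1: 5 = 1^2 + 2^2, 13 = 2^2 + 3^2
        System.out.println("65:   " + Arrays.deepToString(compose(1, 2, 2, 3)));
        // Case Example 2: 41 = 4^2 + 5^2, 61 = 5^2 + 6^2
        System.out.println("2501: " + Arrays.deepToString(compose(4, 5, 5, 6)));
    }
}
```

It prints [[8, 1], [4, 7]] for 65 and [[50, 1], [10, 49]] for 2501. Absolute values are taken because the sign of a term does not affect its square.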
From Case Example 1 and Case Example 2, a general equation can be derived as follows:
By using the Brahmagupta identity [49], we get
For Case Example 1,
For Case Example 2,
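A general form consistent with both case examples can be written down under an assumption of our own: that the two primes are sums of consecutive squares, $p = a^2 + (a+1)^2$ and $q = (a+1)^2 + (a+2)^2$ (a = 1 gives 5 and 13; a = 4 gives 41 and 61). This is a sketch of one possible generalization and may differ from the paper's Equations (1)–(3).

```latex
\[
\begin{aligned}
N = pq &= \bigl(a(a+1) + (a+1)(a+2)\bigr)^2 + \bigl(a(a+2) - (a+1)^2\bigr)^2
        = \bigl(2(a+1)^2\bigr)^2 + 1^2 \\
       &= \bigl(a(a+1) - (a+1)(a+2)\bigr)^2 + \bigl(a(a+2) + (a+1)^2\bigr)^2
        = \bigl(2(a+1)\bigr)^2 + \bigl(2a^2 + 4a + 1\bigr)^2
\end{aligned}
\]
```

Setting a = 1 recovers $65 = 8^2 + 1^2 = 4^2 + 7^2$, and a = 4 recovers $2501 = 50^2 + 1^2 = 10^2 + 49^2$. The first line also shows why such semi-primes become perfect squares when 1 is subtracted.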
Let
Considering Case Example 2, we have,
Conjecture: A composite consisting of unique primes has sums of squares
Let
The four possible combinations are expanded to produce the 4 sums of 4 squares. Using the Brahmagupta identity [49] and Equations (1)–(3), we obtain 8 sums of 2 squares.
In accordance with the conjecture, we can arrive at the following:
By employing Equations (4)–(8) with $65 \times 2501 = 162565$, the 8 sums of 2 squares can be derived as follows:
Let us consider $\sqrt{162565}$, which approximates to 403.
The first 3 sums of two squares can easily be found by decrementing from 403 until a perfect square remains. By applying Equations (1)–(5), we arrive at the following sequence of steps:
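The decrement search described above can be sketched in Java as follows. This is a minimal illustration with our own class and method names, using `long` arithmetic that is only adequate for small inputs such as 162565: start at ⌊√n⌋, subtract x² and keep x whenever the remainder is a positive perfect square, stopping once x² falls below n/2 (beyond that point x < y and pairs repeat).

```java
import java.util.ArrayList;
import java.util.List;

// Finds sums of two squares x^2 + y^2 = n (x >= y > 0) by decrementing x from floor(sqrt(n))
// and testing whether n - x^2 is a perfect square.
final class DecrementSearch {

    // floor(sqrt(n)) for small non-negative n, corrected for floating-point rounding
    static long isqrt(long n) {
        long r = (long) Math.sqrt((double) n);
        while (r * r > n) r--;
        while ((r + 1) * (r + 1) <= n) r++;
        return r;
    }

    static List<long[]> sumsOfTwoSquares(long n) {
        List<long[]> found = new ArrayList<>();
        for (long x = isqrt(n); 2 * x * x >= n; x--) {
            long rest = n - x * x;          // remainder after subtracting x^2
            long y = isqrt(rest);
            if (y > 0 && y * y == rest) found.add(new long[]{x, y});
        }
        return found;
    }

    public static void main(String[] args) {
        long k = 65L * 2501L; // 162565
        for (long[] s : sumsOfTwoSquares(k)) {
            System.out.println(s[0] + "^2 + " + s[1] + "^2 = " + k);
        }
    }
}
```

For 162565 the list starts with (402, 31), (401, 42) and (399, 58), and it contains 8 representations in total, in line with the 8 sums of 2 squares stated above.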
In an earlier work [30], the modified Euler factorization was given by
Consider the solutions (402,31), (401,42)
Consider the solutions (401,42), (399,58)
A factorization of $162565 = 65 \times 2501 = 5 \times 13 \times 41 \times 61$ follows.
In this way, by using the modified Euler factorization from previous work [30], the factors of a composite number can be found. This can be extended to composite numbers whose factorization is not known.
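The modified Euler factorization of [30] is not reproduced here. As a point of comparison, the following Java sketch uses the standard gcd form of Euler's method (the names `TwoSquaresFactor` and `factor` are hypothetical): from two different representations $n = a^2 + b^2 = c^2 + d^2$, it returns $\gcd(n, ad - bc)$, which divides n because $b^2 \equiv -a^2$ and $c^2 \equiv -d^2 \pmod n$ give $(ad - bc)(ad + bc) = a^2 d^2 - b^2 c^2 \equiv 0 \pmod n$.

```java
import java.math.BigInteger;

// Factor from two sums of two squares, n = a^2 + b^2 = c^2 + d^2 (different representations).
// Since (ad - bc)(ad + bc) = a^2 d^2 - b^2 c^2 = 0 (mod n), gcd(n, ad - bc) is typically a proper factor.
final class TwoSquaresFactor {

    static BigInteger factor(long n, long a, long b, long c, long d) {
        BigInteger bn = BigInteger.valueOf(n);
        BigInteger cross = BigInteger.valueOf(a).multiply(BigInteger.valueOf(d))
                .subtract(BigInteger.valueOf(b).multiply(BigInteger.valueOf(c)));
        return bn.gcd(cross); // gcd works on absolute values, so the sign of ad - bc is irrelevant
    }

    public static void main(String[] args) {
        // 162565 = 402^2 + 31^2 = 401^2 + 42^2 = 399^2 + 58^2
        System.out.println(factor(162565, 402, 31, 401, 42)); // 61, from gcd(162565, 4453)
        System.out.println(factor(162565, 401, 42, 399, 58)); // 65, from gcd(162565, 6500)
        System.out.println(factor(65, 8, 1, 7, 4));           // 5, since 65 = 8^2 + 1^2 = 7^2 + 4^2
        System.out.println(factor(2501, 50, 1, 49, 10));      // 41, since 2501 = 50^2 + 1^2 = 49^2 + 10^2
    }
}
```

With the solution pairs above, it gives 162565 = 61 × 2665 and 162565 = 65 × 2501. A result of 1 or n indicates that the two representations supplied were not genuinely different.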
Factorizing a Semi-Prime
With the increasing application of number theory in information security, researchers have been drawn to the interesting problem of factorizing a semi-prime, i.e., a positive integer $N = p \times q$ that is the product of two primes. Encryption algorithms such as RSA and other public-key cryptosystems rely on the large prime factors of a semi-prime for encoding a sender’s message and decoding it at the receiver’s end. Even if the semi-prime is revealed, an interceptor can decode the message only after recovering both prime factors. Hence, with the evolution of information and communication technologies, a fast and efficient method for factorizing very large semi-primes is of great interest to mathematicians and information security researchers.
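With the current state of knowledge, we apply our proposed method of using sums of squares (SoS) to factorize a semi-prime and illustrate our algorithm with case examples. In broad terms, our proposed algorithm for factorizing a semi-prime consists of three parts:
- N-SoS1: finding a first sum of two squares for the semi-prime;
- P-SoS2: using N-SoS1 to find a second sum of two squares;
- E-SoS3: a modified Euler’s factorization using the two sums of squares, which yields $p$ and $q$.
The N-SoS1 algorithm involves the following seven key steps: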
- Step 1.
Generate a special number such that
- Step 2.
Multiply the semi-prime by the special number, recalling
- Step 3.
Find the square root of the product
- Step 4.
Subtract the square of the integer part of the square root from the product
- Step 5.
Test the remainder for a perfect square
- Step 6.
If it is a perfect square, recover N-SoS1 for the semi-prime and GOTO P-SoS2
- Step 7.
If it is NOT a perfect square, increment and GOTO Step 1.
In the above algorithm, we take advantage of the special number having special attributes which, when known, allow N-SoS1 to be recovered using “simple” algebra as given below:
This number has special properties to be discussed in a future paper. In short, it generates two numbers
of the form:
Numbers whose squares are one apart enable special factorizing properties to be maintained. Hence, multiplying the semi-prime to be factored by such a number allows N-SoS1 for the semi-prime to be recovered. We conjecture in this paper that multiplying the semi-prime by this number leads to N-SoS1 being determined more quickly. The algebra takes the form of 8 possible equations as explained below:
We posit that once an SoS for the product of the semi-prime and the special number is found, N-SoS1 for the semi-prime can be recovered using one of 8 equations.
Let × =
The greatest common divisor (GCD) of the result of each of these equations with the product is then computed, and if the GCD equals the special number for any one (or two) of these equations, N-SoS1 for the semi-prime is recovered.
This is the essence of the N-SoS1 algorithm we have proposed in this work.
Once an N-SoS1 for the product becomes known, a simple GCD test whose result equals the special number determines which of the 8 equations will yield N-SoS1 for the semi-prime. Only a simple division is then required to obtain the N-SoS1 solution. Once N-SoS1 is known, it is used to find P-SoS2. Further, once N-SoS1 and P-SoS2 are known, a modified Euler’s factorization using sums of squares (E-SoS3) yields $p$ and $q$, and hence the factorization is achieved.
A Java implementation of N-SoS1 uses simple arithmetic operations such as +, −, * and / as well as GCD and is provided as Supplementary Resources.
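Since the supplementary implementation is not part of this section, the sketch below is only our reading of Steps 2 to 6 and of the GCD-based recovery, under stated assumptions: the special number K and its sums of two squares are known; K is held fixed while x is decremented from ⌊√(N × K)⌋, rather than incrementing a parameter as in Step 7; and the "8 equations" are interpreted as dividing the Gaussian integer x + iy by u ± iv for each representation K = u² + v². The names `NSoS1Sketch`, `recover` and `nSoS1` are hypothetical, and Java 9+ is required for `BigInteger.sqrt`.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical N-SoS1-style recovery (our reading of the description, not the authors' code).
final class NSoS1Sketch {

    /** All x^2 + y^2 == k with x > y > 0, found by decrementing x from floor(sqrt(k)). */
    static List<BigInteger[]> sumsOfTwoSquares(BigInteger k) {
        List<BigInteger[]> reps = new ArrayList<>();
        for (BigInteger x = k.sqrt(); x.multiply(x).shiftLeft(1).compareTo(k) > 0; x = x.subtract(BigInteger.ONE)) {
            BigInteger rest = k.subtract(x.multiply(x));
            BigInteger y = rest.sqrt();
            if (y.signum() > 0 && y.multiply(y).equals(rest)) reps.add(new BigInteger[]{x, y});
        }
        return reps;
    }

    /** Divides x + iy by u + iv and by u - iv; if k divides both parts, the quotient s + it has s^2 + t^2 = (x^2 + y^2) / k. */
    static BigInteger[] recover(BigInteger x, BigInteger y, BigInteger k, List<BigInteger[]> kReps) {
        for (BigInteger[] uv : kReps) {
            BigInteger u = uv[0], v = uv[1];
            BigInteger[][] candidates = {
                // (x + iy)(u - iv) = (xu + yv) + i(yu - xv), i.e. k * (x + iy) / (u + iv)
                {x.multiply(u).add(y.multiply(v)), y.multiply(u).subtract(x.multiply(v))},
                // (x + iy)(u + iv) = (xu - yv) + i(yu + xv), i.e. k * (x + iy) / (u - iv)
                {x.multiply(u).subtract(y.multiply(v)), y.multiply(u).add(x.multiply(v))}
            };
            for (BigInteger[] c : candidates) {
                // GCD test: both parts are multiples of k exactly when their gcd with k equals k
                if (c[0].gcd(k).equals(k) && c[1].gcd(k).equals(k)) {
                    return new BigInteger[]{c[0].divide(k).abs(), c[1].divide(k).abs()};
                }
            }
        }
        return null;
    }

    /** Searches x^2 + y^2 = n * k downward from floor(sqrt(n * k)) and recovers a sum of two squares for n. */
    static BigInteger[] nSoS1(BigInteger n, BigInteger k) {
        List<BigInteger[]> kReps = sumsOfTwoSquares(k);
        BigInteger m = n.multiply(k);                               // product N x K
        for (BigInteger x = m.sqrt();                               // integer part of sqrt(N x K)
             x.multiply(x).shiftLeft(1).compareTo(m) > 0;
             x = x.subtract(BigInteger.ONE)) {                      // decrement and retry
            BigInteger rest = m.subtract(x.multiply(x));            // subtract x^2
            BigInteger y = rest.sqrt();
            if (!y.multiply(y).equals(rest)) continue;              // perfect-square test
            BigInteger[] st = recover(x, y, k, kReps);              // GCD test, then division
            if (st != null) return st;
        }
        return null;
    }

    public static void main(String[] args) {
        BigInteger n = BigInteger.valueOf(493);  // 17 x 29, a toy semi-prime
        BigInteger k = BigInteger.valueOf(65);   // 65 = 8^2 + 1^2 = 7^2 + 4^2
        BigInteger[] st = nSoS1(n, k);
        System.out.println(n + " = " + st[0] + "^2 + " + st[1] + "^2");
    }
}
```

With N = 493 = 17 × 29 and K = 65, the search stops immediately on 32045 = 179² + 2², and dividing 179 + 2i by 8 − i gives 22 + 3i, i.e. 493 = 22² + 3². That this Gaussian-integer reading matches the authors' eight equations in every case is an assumption.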
It is possible to avoid P-SoS2 once the first square becomes known, as shown in Case Example 3 (below), by decrementing the square further. However, Case Example 4 illustrates that, even for small numbers, skipping P-SoS2 makes finding the second SoS unpredictable. As with Case Examples 3 and 4, the RSA case illustrates that a solution exists, but a poor parameter selection leads to an intractable search for RSA768.
Case Example 3: Consider a semi-prime along with an upscaling number
Let
We provide the results obtained after search iterations as follows:
Iteration 1:
Iteration 16:
Iteration 19:
(18 search cycles required)
A factorization of
From which the factorization of .
Case Example 4: Consider another semi-prime along with an upscaling number
Let
We provide the results obtained after search iterations as follows:
Iteration 1:
Iteration 14:
Iteration 27:
Iteration 155:
(154 search cycles required)
A factorization of
However, this did not lead to the factorization of
and an additional sum of squares is required.
A factorization of
Hence, from the above, the factorization of
4. Results
We employ our proposed method to demonstrate the factorization of a large RSA key. The commonly adopted 512-bit RSA key has been successfully factored several times with different factorization methods [22,23,24]. To take the challenge of RSA key attacks further, in this work we consider a larger key size, the 768-bit RSA key, and apply our proposed semi-prime factorization method with the enhanced sum of squares of primes.
4.1. The Case of 768-bit RSA Key (RSA768)
Consider the semi-prime
Let
From the modified Euler factorization of previous work [30], we have
p1
The factorization of
The RSA case illustrates that a solution exists, but a poor parameter selection leads to an intractable search for RSA768. This selection can be better determined. As presented in this paper, as n is incremented, a range of m values is searched through, and this continues until a perfect square is found. The attributes of the parameter have so far been determined via experimentation. The authors are currently characterizing it with a view to reducing the search field for a suitable value. This is an area of ongoing research.
4.2. Performance and Comparison
In Table 1, we summarize the cost performance of our proposed extended sum of squares approach for factorizing semi-primes of different lengths (key sizes). For the various case examples of semi-primes computed in this work, including RSA768, we provide the cost factors of memory used and the non-optimized search time, measured in decrement loops of the square, for completing the semi-prime factorization using our method. If a linear search is used from the square root of
the number of iterations is given thus. However, this can be significantly reduced by using
congruent to
.
Table 1 demonstrates that the memory required for our algorithm is minimal and the search cycle time is the main factor that needs to be contained for large RSA keys. It is important to note that each search cycle refers to only six steps involving rudimentary algebra consisting of multiplication, addition, subtraction and division operations exhibiting very low processing time. Next, we elaborate on the complexity of our algorithm and its comparison to other factoring algorithms.
4.2.1. Complexity of the Algorithm
We summarize the complexity of our proposed semi-prime factorization algorithm in terms of memory and computational time. These complexity measures follow similar lines to existing methods reported in the literature [55].
The memory requirement of our algorithm is minimal, with most computations operating on the accumulator (using BigInteger arithmetic). The number of memory variables used for each major part and step of our algorithm is given below:
The manipulation of () to generate requires the use of three variables.
The resultant SoS ( requires the use of two variables.
The recovery ( requires the use of two variables.
Hence, these requirements occupy 7 BigInteger values in memory.
Studying the time complexity of our proposed algorithm, we find that each of the algorithm steps is rudimentary. Steps 3 and 5 use a square root function, which is the most expensive function in this algorithm. Only the integer part is needed, so a Newton–Raphson algorithm is used here. Step 6 uses a GCD function. The remaining parts are simply multiplication, addition, subtraction and division operations that require minimal processing time.
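Because only the integer part of the square root is needed, the Newton–Raphson step can run entirely in integer arithmetic. A minimal Java sketch (the class name `NewtonSqrt` is our own) computes ⌊√n⌋ with the iteration $x_{k+1} = \lfloor (x_k + \lfloor n / x_k \rfloor) / 2 \rfloor$, started above √n and stopped as soon as the sequence no longer decreases, and builds the perfect-square test on it. Since Java 9, `BigInteger.sqrt()` returns the same floor square root directly.

```java
import java.math.BigInteger;

// Integer square root by Newton–Raphson iteration on BigInteger, plus a perfect-square test built on it.
final class NewtonSqrt {

    /** floor(sqrt(n)) for n >= 0. */
    static BigInteger isqrt(BigInteger n) {
        if (n.signum() < 0) throw new ArithmeticException("negative input");
        if (n.signum() == 0) return BigInteger.ZERO;
        // Start above sqrt(n): 2^(bitLength / 2 + 1) > sqrt(n)
        BigInteger x = BigInteger.ONE.shiftLeft(n.bitLength() / 2 + 1);
        while (true) {
            BigInteger next = x.add(n.divide(x)).shiftRight(1); // x_{k+1} = (x_k + n / x_k) / 2
            if (next.compareTo(x) >= 0) return x;               // stops decreasing exactly at floor(sqrt(n))
            x = next;
        }
    }

    static boolean isPerfectSquare(BigInteger n) {
        if (n.signum() < 0) return false;
        BigInteger r = isqrt(n);
        return r.multiply(r).equals(n);
    }

    public static void main(String[] args) {
        BigInteger k = BigInteger.valueOf(162565);
        System.out.println("isqrt(" + k + ") = " + isqrt(k));
        System.out.println("162565 - 402^2 is a square: " + isPerfectSquare(k.subtract(BigInteger.valueOf(402 * 402))));
    }
}
```

For 162565 it returns 403, and the remainder 162565 − 402² = 961 passes the perfect-square test.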
4.2.2. Comparison to other Factoring Algorithms
Part 1 of our proposed factorization algorithm (N-SoS1) currently requires two input variables is usually small and : .
The current algorithm calculates and each cycle of creates a larger and larger range for . when . For .
Currently, a brute force search is conducted from ( until an SoS is found. This stochastic search is no better than Fermat’s factorization method. However, the following Case Example 5 illustrates the gain in the search cycle time performance of our proposed factorization method when is contained within a search field.
Case Example 5: For when defined achieves the result in much shorter overall search cycle time.
A result for = 15325 generates the perfect square required.
From this, an SoS of is obtained.
Further, N-SoS1 =
P-SoS2 =
(using E-SoS3)
Continued research is underway to characterize this parameter so that a stochastic search can be undertaken in a reduced search field. Our proposed semi-prime factorization method is simple to understand and to implement. It uses rudimentary multiplication, addition, subtraction and division operations that require minimal processing time, unlike other algorithms that rely on time-consuming lattice reduction (LR) techniques. This work is limited to an empirical study of our proposed method; formal proofs are extensive and are reserved for future work.
5. Discussion
By factoring the RSA modulus, an attacker can compute the private key from the corresponding public key. Various studies have surveyed RSA key sizes and limitations across public key infrastructure, and new methods for attacking commonly used 512-bit and 768-bit RSA keys continue to interest researchers [19,56]. Most previous studies have examined specific classes of attacks on RSA cryptosystems. For instance, partial key exposure (PKE) attacks were studied in [57,58]. A PKE attack on a low public-exponent key of a variant of the RSA public-key cryptosystem was found to be less effective than for standard RSA. However, with a large public exponent, such keys were reported to be more vulnerable to these attacks than standard RSA. Some generalized studies have also comprehensively surveyed, at a practical Internet scale, vulnerabilities of RSA key generation for various protocols such as SSH, HTTPS and DKIM [19,20,21,59]. In this paper, however, we consider the number-theoretic properties of semi-primes, which underlie all RSA key generation, to be the fundamental cause of any RSA attack. For instance, when an RSA modulus is chosen with a small difference between its prime factors, private key attacks can be improved [60]. Some early studies considered polynomial equations to study vulnerabilities of low-exponent RSA and its variants [61,62]. Recent number theory-based studies in this domain have focused on lattice reduction (LR) techniques combined with prime number theory to study RSA key attacks [31,38,39]. Such LR-based techniques come under the general category of Coppersmith attacks. The uniqueness of our contribution, however, is that we propose factorization using sums of squares, more in line with Euler’s sum of squares approach than with Coppersmith’s approach.
Whilst there is active research in LR methods, the sum of squares remains an area of interest that continues to yield surprises, and it has been the motivation of this work. The subtleties in the value of the special number are an area of interest that we explored in this study. The value used in this paper has special properties, which are described briefly. It was constructed from two semi-primes, $65 = 5 \times 13$ and $2501 = 41 \times 61$, and each of these primes has special properties. They are all congruent to 1 (mod 4), and each is a sum of two squares whose roots are one apart: $5 = 1^2 + 2^2$, $13 = 2^2 + 3^2$, $41 = 4^2 + 5^2$ and $61 = 5^2 + 6^2$. In both semi-primes, the larger square of the smaller prime is the smaller square of the larger prime. Both semi-primes are perfect squares when 1 is subtracted ($64 = 8^2$ and $2500 = 50^2$). Multiplying the two semi-primes together creates 8 sums of 2 squares, three of which are very close together (402, 401, 399). When such a number is multiplied by a semi-prime to be factored, 32 sums of 2 squares are available. As shown previously, only two sums of two squares are required to factor the semi-prime in question, so this increases the likelihood of finding two such sums of squares. Equations (4)–(8) describe the general form of this number; in this case it is $65 \times 2501 = 162565$.
It would appear counter-intuitive to make the semi-prime to be factored larger; however, this is offset by the increased likelihood of finding two sums of two squares near the square root of the product. In the case of RSA768, the first solution was some way off from the square root. The search can be restricted by only considering congruent co-prime sums of squares, consistent with a previous study [30]. In this way only sums of two squares which approximate
need be tested and of those, only the ones equal to
yield a valid solution. This substantially reduces the search cost. Since a linear search is conducted, the search field can easily be divided up, which lends itself very well to parallel processing. The mathematics in our approach is rudimentary, consisting of multiplications, additions and subtractions. The perfect square test does require a square root, which is computationally costly, but this too can be avoided if the approach in [30] is used to find solutions equivalent to
all of which are congruent to
.
Overall, the search area to find the first sum of squares is key to finding the second. Hence, the search is focused on finding the first sum of two squares. The relationship between the two squares of the sum of two squares of the semi-prime
determines the spread of, and the likelihood of, finding the first sum of two squares for
.
Table 1 shows the search cycle time for different key sizes explored in this work. The search area can be limited by
and
. This can be narrowed by considering the properties of
and the closeness of the sums of squares of
(402, 401, 399). The smearing of
across
by
determines the search field. This field is then divided over a number of parallel processes running a very simple (but fast)
algorithm. This search field to find the first sum of two squares is an area of ongoing research.
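Splitting the search field across parallel processes can be sketched on a single machine with Java parallel streams, which partition a `LongStream` range across the common fork-join pool. The names `ParallelSoSSearch` and `search` are our own, a distributed search would instead hand each process an explicit sub-range of x, and the `long` arithmetic limits this sketch to small inputs.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Splits the decrement search for x^2 + y^2 = m (x > y > 0) across worker threads with a parallel stream.
final class ParallelSoSSearch {

    // floor(sqrt(n)) for non-negative n well below 2^52, corrected for floating-point rounding
    static long isqrt(long n) {
        long r = (long) Math.sqrt((double) n);
        while (r * r > n) r--;
        while ((r + 1) * (r + 1) <= n) r++;
        return r;
    }

    static List<long[]> search(long m) {
        long lo = isqrt(m / 2);  // for x below this bound, x < y, so those pairs are skipped
        long hi = isqrt(m);      // integer part of sqrt(m)
        return LongStream.rangeClosed(lo, hi)
                .parallel()                                  // the framework splits [lo, hi] into chunks
                .filter(x -> 2 * x * x > m)                  // keep only x > y
                .filter(x -> {
                    long rest = m - x * x;
                    long y = isqrt(rest);
                    return y > 0 && y * y == rest;           // perfect-square test
                })
                .mapToObj(x -> new long[]{x, isqrt(m - x * x)})
                .sorted((p, q) -> Long.compare(q[0], p[0]))  // largest x first, as in the sequential search
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long k = 162565L;   // 65 x 2501
        long m = 493L * k;  // toy semi-prime 493 = 17 x 29 multiplied by k
        System.out.println("sums of two squares for k: " + search(k).size());
        System.out.println("sums of two squares for 493 x k: " + search(m).size());
    }
}
```

For 162565 it finds 8 sums of two squares, and for 493 × 162565 it finds 32, consistent with the statement that multiplying the special number by a semi-prime makes 32 sums of 2 squares available (here 17 and 29 are both congruent to 1 mod 4 and coprime to 162565).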