Algebraic Structures Induced by the Insertion and Detection of Malware

Cañadas, Agustín Moreno; Mendez, Odette M.; Vega, Juan David Camacho

doi:10.3390/computation11070140



by Agustín Moreno Cañadas ¹,*, Odette M. Mendez ² and Juan David Camacho Vega ¹

¹ Departamento de Matemáticas, Universidad Nacional de Colombia, Edificio Yu Takeuchi 404, Kra 30 No. 45-03, Bogotá 11001000, Colombia

² Departamento de Matemáticas, Universidad Nacional de Colombia, La Nubia, Manizales 170003, Colombia

* Author to whom correspondence should be addressed.

Computation 2023, 11(7), 140; https://doi.org/10.3390/computation11070140

Submission received: 7 June 2023 / Revised: 7 July 2023 / Accepted: 7 July 2023 / Published: 11 July 2023


Abstract

Since its introduction, malware research has had two main goals. On the one hand, malware writers have focused on developing software that can cause as much damage as possible to a targeted host for as long as possible. On the other hand, one of the main purposes of malware analysts is the development of tools such as malware detection systems (MDS) or network intrusion detection systems (NIDS) to prevent and detect possible threats to information systems. Obfuscation techniques, such as the encryption of the virus’s code lines, have been developed to avoid detection. In contrast, shallow machine learning and deep learning algorithms have recently been introduced to detect such malware. This paper is devoted to some theoretical implications derived from these investigations. We prove that hidden algebraic structures, such as equipped posets and their categories of representations, underlie the research on some infections. Properties of these categories are given to provide a better understanding of different infection techniques.

Keywords:

additive functor; cybersecurity; computer virus; malware; metamorphic virus; poset; poset representation

MSC:

68M25; 16G20; 16G30; 16G60

1. Introduction

Nowadays, the daily life of human beings is significantly affected by computers and information systems, making cybersecurity one of the main concerns to be addressed by government agencies and companies to protect users from diverse threats arising from Internet use. Such threats are mainly provoked by malware, i.e., malicious software designed to perform unauthorized, often harmful or undesirable acts. Computer viruses, Trojan horses, worms, and ransomware are examples of malware [1,2,3,4].

According to Cohen [5], who is considered a pioneer of computer virus research, a virus is a program that is able to infect other programs by modifying them to include a possibly evolved copy of itself. Cohen wrote the first program of this type, which is currently known as the Stoned boot virus. Another example of a computer virus is Stuxnet [6], considered the first cyber-warfare weapon ever.

Perhaps the simplest kind of malware is the Trojan horse, which tries to attract the user with some useful functionality to entice them to run the program. In particular, Trojan horses have been used to steal passwords. Rootkits, AIDS TROJAN DISK, Qbot (malware specialized in stealing user data), and TrickBot (malware focused on stealing financial data) are examples of Trojan horses.

Worms are also examples of malware. They are network viruses, primarily replicating on networks. Usually, these programs execute themselves automatically on a remote machine with minimal user intervention. In particular, worms do not require a host program. SQL Slammer, Melissa (which is a macro virus), Morris, as well as Netbus, Subseven, Deep Throat, Back Orifice, and Concept, are some of the best-known worms [1].

Ransomware is a kind of malware that encrypts data on a computer to prevent users from accessing their files or systems. Cybercriminals hold the data until a ransom is paid. It is worth pointing out that the FBI has observed that ransomware is behind some of the most frequent attacks carried out by cybercriminals over the last few years. WannaCry, LockBit, CryptoLocker, Sodinokibi/REvil, and Phobos are examples of this type of attack. According to Ploszek et al. [7], crypto-ransomware is the most dangerous among the different ransomware attacks. These attacks allow the encryption of images, videos, and any other valuable user files.

At the beginning of the antivirus industry, malware detection was based on heuristic features that identified a particular malware by creating a reliable fingerprint. During detection, an antivirus engine checked a file for the presence of any of the known malware fingerprints stored in the antivirus database. String scanning, wildcards, and mismatches are examples of the first virus detection techniques. Wildcards were used to detect the metamorphic virus w32/Regswap. These techniques allowed finding the sequence 83EB 0274 1683 EB0E 740A 81EB 0301 0000, which identifies the w32/Beast virus [1].
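String scanning with wildcards can be sketched in Python as follows. This is a minimal illustration, assuming signatures are written as hex byte pairs in which `??` is a wildcard matching any single byte; the wildcard syntax and the function names `compile_signature` and `scan` are hypothetical and do not reproduce any particular antivirus engine. The signature is the w32/Beast sequence quoted above.

```python
import re

def compile_signature(sig: str) -> re.Pattern:
    """Compile a hex signature such as '83EB 02?? 1683' into a bytes regex.

    Hypothetical syntax: each pair of hex digits is a literal byte and '??'
    matches any single byte. Whitespace is ignored.
    """
    hexstr = "".join(sig.split())
    if len(hexstr) % 2:
        raise ValueError("signature must consist of whole bytes")
    parts = []
    for i in range(0, len(hexstr), 2):
        pair = hexstr[i:i + 2]
        if pair == "??":
            parts.append(b".")  # wildcard byte; DOTALL lets '.' also match 0x0A
        else:
            parts.append(re.escape(bytes.fromhex(pair)))
    return re.compile(b"".join(parts), re.DOTALL)

def scan(data: bytes, signatures: dict) -> list:
    """Return the names of the signatures that occur anywhere in `data`."""
    return [name for name, sig in signatures.items()
            if compile_signature(sig).search(data)]

BEAST = "83EB 0274 1683 EB0E 740A 81EB 0301 0000"
# Hypothetical variant in which the fourth byte is left open.
BEAST_WILDCARD = "83EB 02?? 1683 EB0E 740A 81EB 0301 0000"
```

A file containing the exact sequence is flagged by both signatures, whereas a copy in which the fourth byte was altered is flagged only by the wildcard signature, which is the kind of small change that defeated exact fingerprints.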

Fingerprints associated with infected files were sensitive to small changes in the files. Furthermore, malware writers invented metamorphic and polymorphic viruses, which give rise to hundreds of thousands of new virus versions, making the previous detection approaches ineffective. Along these lines, malware detection systems have been developed based on traditional machine learning (support vector machines, decision trees, naive Bayes classifiers, etc.) and on deep learning algorithms such as recurrent neural networks (RNNs) [8,9].

It is worth pointing out that several authors are not convinced of the effectiveness of RNNs for intrusion detection, due to their vulnerability to adversarial attacks. These authors have preferred to use images and to train convolutional neural networks to learn malware features [10,11,12]. Another advance in the use of machine learning in NIDS was proposed by Iglesias and Criado [13], who used time series, visibility graphs, and multiplex networks to analyze the behavior of attackers’ computers. They pointed out that tools such as Snort, used to analyze network traffic and protocols, have disadvantages for network intrusion detection (e.g., they cannot detect zero-day attacks [14]).

Kaspersky [15] developed malware detection tools based on machine learning techniques. In this approach, hash functions and unsupervised learning are combined to extract file features that can be computed quickly and retrieved directly from the structure of the executable, such as a file format description. We refer the reader to [16,17] for good surveys of recent trends in the use of deep learning for malware detection, in particular, for descriptions of cloud-based, mobile-device-based, and IoT-based malware detection.

1.1. Motivations

Currently, few malware investigations deal with the relation between malware and the representation theory of algebras. A comprehensive algebraic study of malware insertion and detection will give rise to a better understanding of different cyber-attacks; work in this direction has been proposed by Webster [18]. This paper proves that attacks of type Linux/Slapper or Scalper and some other metamorphic attacks, combined with detection techniques such as those presented by Kaspersky based on machine learning methods, give rise to categories of representations of partially ordered sets. In particular, obfuscation techniques associated with metamorphic attacks define categorical equivalences between these categories.

1.2. Contributions

The main results of this paper are Theorems 2–4 and Corollary 1. Theorems 2 and 3 prove that some malware insertion-detection algorithms associated with certain hierarchical attacks define particular families of partially ordered sets (posets).

Corollary 1 proves that posets introduced in Theorem 3 define hierarchical attacks without hidden malware.

Theorem 4 proves that malware insertion-detection algorithms give rise to categorical equivalences between categories of representations of posets.

This paper is structured as follows. Main definitions and notation are given in Section 2, where we present an overview of definitions and notation regarding malware (Section 2.1) and posets (Section 2.2). The main results are presented in Section 3. Section 4 gives an example of the results obtained in Section 3. Concluding remarks are given in Section 5.

2. Preliminaries

This section is devoted to reviewing basic definitions and notation regarding malware insertion and detection, as well as partially ordered sets and their $\mathbb{F}$-linear representations [1,2,3,4,19,20,21,22].

2.1. Malware

As explained in the introduction, malware is malicious software designed to perform unauthorized, often harmful or undesirable acts [1]. The development of malware research has encouraged the introduction of sophisticated malware infection and detection techniques. Recently discovered computer viruses and worms, such as Stuxnet [6] and its variations, are examples of the research progress on the subject.

2.1.1. Computer Viruses

The typical structure of a computer virus consists of the following three subroutines [1]:

Infect-executable. This routine finds available executable files and infects them by copying its code into them.
Do-damage or Payload. This routine is responsible for delivering the malicious part of the virus.
Trigger-Pulled. This routine determines whether all the conditions required to deliver the payload are satisfied.

In the earlier stages of the antivirus industry, the main goal of malware detection on computers was to create a reliable fingerprint of a malicious file from heuristic features such as:

Code fragments.
Hashes of code fragments or the whole file.
File properties.
Combinations of these features.
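As an illustration of fingerprints built from hashes of code fragments and of the whole file, the following sketch stores SHA-256 digests in a set that plays the role of the antivirus database. The choice of SHA-256, the 64-byte fragment size, and the helper names `fingerprint` and `matches` are assumptions made for the example, not a description of any specific product.

```python
import hashlib

def fingerprint(data: bytes, fragment_size: int = 64) -> dict:
    """Hypothetical fingerprint: SHA-256 of the whole file and of each fragment."""
    return {
        "file": hashlib.sha256(data).hexdigest(),
        "fragments": [hashlib.sha256(data[i:i + fragment_size]).hexdigest()
                      for i in range(0, len(data), fragment_size)],
    }

def matches(data: bytes, database: set) -> bool:
    """True if the whole-file digest or any fragment digest is in the database."""
    fp = fingerprint(data)
    return fp["file"] in database or any(h in database for h in fp["fragments"])
```

Changing a single byte of a sample changes its whole-file digest, so a database that holds only that digest no longer recognizes the modified file, while the digest of an untouched fragment still does. This is the sensitivity to small changes mentioned in the introduction.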

The obtained fingerprint is compared with those stored in an antivirus database. However, malware writers introduced new versions of virus code for which the fingerprint approach is ineffective. Currently, computer viruses include decryptors to hide their functionality. Encryption keys can be generated in different ways (constant, random but fixed, sliding, and shifting keys), and the encryption is often carried out by applying an xor operation (e.g., the W95/Memorial virus). Other encryption techniques based on symmetric-key cryptography (e.g., the IDEA family of viruses) and public-key cryptography have also been used to encrypt viruses. Polymorphic and metamorphic computer viruses are examples of the use of decryptors.
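To make two of these key schedules concrete, the sketch below encrypts the same byte string with a constant key and with a sliding key. The sliding key is modeled here, as an assumption, as a one-byte key incremented by a fixed `delta` after each byte; real viruses differ in the details. Since xor is its own inverse and the key sequence does not depend on the data, applying the same function twice restores the body, while every new key produces a different ciphertext.

```python
def xor_constant(data: bytes, key: int) -> bytes:
    """XOR every byte with the same one-byte key (0 <= key < 256)."""
    return bytes(b ^ key for b in data)

def xor_sliding(data: bytes, key: int, delta: int) -> bytes:
    """XOR with a key that changes by `delta` (mod 256) after each byte.

    Assumed model of a 'sliding' key; 0 <= key < 256.
    """
    out = bytearray()
    for b in data:
        out.append(b ^ key)
        key = (key + delta) % 256
    return bytes(out)
```

Because each copy can pick a different key, a fingerprint taken from the encrypted body does not carry over from one copy to the next; only the decryptor stub remains constant, which is what polymorphic viruses then mutate.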

Polymorphic viruses can mutate their decryptors into a large number of different instances, which can take millions of different forms. The 1260 virus is an example of a polymorphic virus; it uses two sliding keys to decrypt its body and includes junk instructions, which are nothing but garbage in the code [1].

Metamorphic viruses create new virus generations that look different. They have a single code body that carries data as code.

Formally speaking, a metamorphic virus can be defined as follows [4]. Let $\Psi_P(d, p)$ be a function computed by a computer program $P$. Then a pair $v$ and $v'$ of recursive functions is said to be a metamorphic virus if they satisfy the following identities:

$$
\Psi_{v(\delta)}(d, p) =
\begin{cases}
D(d, p), & \text{if } T(d, p),\\
\Psi_{\delta}(d, p(v'(S(p)))), & \text{if } I(d, p),\\
\Psi_{\delta}(d, p), & \text{otherwise},
\end{cases}
$$

and

$$
\Psi_{v'(\delta)}(d, p) =
\begin{cases}
D'(d, p), & \text{if } T(d, p),\\
\Psi_{\delta}(d, p(v'(S'(p)))), & \text{if } I'(d, p),\\
\Psi_{\delta}(d, p), & \text{otherwise},
\end{cases}
$$

where $(d, p)$ is a running environment consisting of data $d$ and programs $p$ stored on computers, and $D(d, p)$, $D'(d, p)$, $S(p)$, and $S'(p)$ are recursive functions. Moreover, $T(d, p)$ is called the injury condition, and $I(d, p)$ and $I'(d, p)$ are called infection conditions.

The difference between polymorphic and metamorphic viruses is that all forms of a polymorphic virus share the same kernel, whereas each form of a metamorphic virus has its own kernel.

As an example, the following are two generations of the metamorphic virus W95/Regswap (machine code and disassembly; first generation | second generation):

1. 5A pop edx | 58 pop eax
2. BF04000000 mov edi, 0004h | BB04000000 mov ebx, 0004h
3. 8BF5 mov esi, ebp | 8BD5 mov edx, ebp
4. B80C000000 mov eax, 000Ch | BF0C000000 mov edi, 000Ch
5. 81C288000000 add edx, 0088h | 81C088000000 add eax, 0088h
6. 8B1A mov ebx, [edx] | 8B30 mov esi, [eax]
7. 899C8618110000 mov [esi+eax*4+00001118], ebx | 89B4BA18110000 mov [edx+edi*4+00001118], esi
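The listing shows that these two Regswap generations differ only in the choice of registers: the opcode bytes change, but the instruction skeleton does not. A minimal sketch of this observation renames registers in order of first appearance (a hypothetical normalization, not taken from the cited works) and checks that both generations collapse to the same sequence. It deliberately ignores instruction reordering and junk code, which other metamorphic engines also use.

```python
import re

# The two W95/Regswap generations above, disassembly only.
GEN1 = ["pop edx", "mov edi, 0004h", "mov esi, ebp", "mov eax, 000Ch",
        "add edx, 0088h", "mov ebx, [edx]", "mov [esi+eax*4+00001118], ebx"]
GEN2 = ["pop eax", "mov ebx, 0004h", "mov edx, ebp", "mov edi, 000Ch",
        "add eax, 0088h", "mov esi, [eax]", "mov [edx+edi*4+00001118], esi"]

REGISTER = re.compile(r"\b(eax|ebx|ecx|edx|esi|edi|ebp|esp)\b")

def normalize(code: list) -> list:
    """Rename 32-bit registers to R0, R1, ... in order of first appearance."""
    names = {}

    def rename(match: re.Match) -> str:
        reg = match.group(1)
        if reg not in names:
            names[reg] = "R" + str(len(names))
        return names[reg]

    return [REGISTER.sub(rename, line) for line in code]
```

Both generations normalize to the same seven instructions, starting with `pop R0` and ending with `mov [R2+R4*4+00001118], R5`, which is why a signature that leaves the register-encoding bytes open (a wildcard) can match every generation.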

Figure 1 shows examples of different generations produced by metamorphic viruses.

Konstantinou [4] implemented a Hidden Markov Model to detect metamorphic attacks. He implemented code obfuscation techniques (via a virus construction kit), such as instruction reordering and garbage insertion, to produce the metamorphic versions of a virus. We remind the reader that instruction substitution, instruction permutation, garbage code insertion, variable substitution, and altering the control flow are examples of obfuscation techniques. They have been used by viruses and worms such as Evol (2000), Zmist, Zperm, Regswap, and Metaphor [1,2,3,4].
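HMM-based detection scores a sequence of opcodes by its likelihood under a model trained on one virus family. The sketch below shows only the scoring step, the forward algorithm; the two hidden states, the three-opcode alphabet, and all probabilities are made-up values, not the model trained by Konstantinou [4], and training (e.g., Baum–Welch) is omitted. An exponential-time brute-force sum over hidden paths is included to check the recursion.

```python
import math
from itertools import product

# Hypothetical two-state model over a three-opcode alphabet (made-up numbers).
START = {"A": 0.6, "B": 0.4}
TRANS = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
EMIT = {"A": {"mov": 0.5, "add": 0.3, "pop": 0.2},
        "B": {"mov": 0.1, "add": 0.2, "pop": 0.7}}

def forward(obs, start, trans, emit) -> float:
    """P(obs) under the HMM, computed with the forward recursion.

    alpha[t] holds P(prefix of obs seen so far, current hidden state = t).
    """
    states = list(start)
    alpha = {s: start[s] * emit[s].get(obs[0], 0.0) for s in states}
    for o in obs[1:]:
        alpha = {t: sum(alpha[s] * trans[s][t] for s in states) * emit[t].get(o, 0.0)
                 for t in states}
    return sum(alpha.values())

def brute_force(obs, start, trans, emit) -> float:
    """Exponential-time reference: sum the joint probability over every state path."""
    total = 0.0
    for path in product(list(start), repeat=len(obs)):
        p = start[path[0]] * emit[path[0]].get(obs[0], 0.0)
        for k in range(1, len(obs)):
            p *= trans[path[k - 1]][path[k]] * emit[path[k]].get(obs[k], 0.0)
        total += p
    return total

def score(obs) -> float:
    """Length-normalized log-likelihood of an opcode sequence under the toy model."""
    return math.log(forward(obs, START, TRANS, EMIT)) / len(obs)
```

In a detector of this kind, a threshold on the normalized score would separate members of the modeled family from other files; choosing that threshold is an empirical matter and is not addressed here.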

Some computer worms, such as Linux/Scalper, carry out so-called hierarchical attacks to control remote networks. In this case, each infected node receives crucial information, such as the IP address of the adversary host and the addresses of the infected nodes. This information is passed on to the remaining nodes until all target network nodes are infected.

Classical fingerprint-based approaches to malware detection became ineffective due to their vulnerability to zero-day attacks. Recently, Kaspersky [15] implemented machine learning methods to detect packed routines. Their method consists of analyzing file features that are resistant to small changes. According to this approach, the machine learns suitable hash values $h(x)$ associated with scanned files, and a similarity function is defined to determine whether or not two of these files are similar.

Similar files constitute a so-called hash bucket. Hash buckets classify the scanned files into two kinds of regions, named simple regions and hard regions. Files in simple regions of a hash bucket are either purely benign or purely malware, and no further feature analysis is required. Similarity pairs in these regions are of the form $(h(x_1), 0)$ or $(0, h(x_2))$. In hard regions of a hash bucket, the files can be both benign and malware, and deeper feature analysis is carried out for more precise detection. Similarity pairs in hard regions are of the form $(h(x_1), h(x_2))$.
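The split into simple and hard regions can be sketched as follows. Here a coarse, hypothetical feature hash (size class plus the first two bytes) stands in for the learned hash values $h(x)$, which the text does not specify; a bucket whose files all carry the same label is treated as a simple region, and a bucket mixing benign and malware files as a hard region that would be passed on to deeper analysis.

```python
from collections import defaultdict

def coarse_hash(content: bytes) -> tuple:
    """Hypothetical stand-in for a learned hash h(x): size class and first two bytes."""
    return (len(content) // 1024, content[:2])

def bucket_regions(files, h=coarse_hash) -> dict:
    """Group labelled files by h(x) and classify every bucket.

    files -- iterable of (content, label) pairs, label "benign" or "malware"
    Returns {bucket key: (region, labels)}, where region is "simple" when all
    labels in the bucket agree and "hard" when benign and malware are mixed.
    """
    buckets = defaultdict(list)
    for content, label in files:
        buckets[h(content)].append(label)
    return {key: ("simple" if len(set(labels)) == 1 else "hard", labels)
            for key, labels in buckets.items()}
```

Files falling into simple buckets can be decided at once, mirroring the pairs $(h(x_1), 0)$ and $(0, h(x_2))$, while hard buckets correspond to pairs $(h(x_1), h(x_2))$ that still need finer features.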

Suppose the infection builds a hierarchical network, as in the case of a Scalper attack. Each node contains benign and malware files in simple and hard regions. If bit vectors are used to denote such files, then a fixed node $N_i$ has a structure $c(N_i)$ of the form

$$
\begin{aligned}
c(N_i) &= S_i \cup H_i, \quad 1 \leq i \leq s_i, \quad f_{rs}, g_{ln}, h_{l'n} \in \{0, 1\},\\
S_i &= \{f_{1i}, f_{2i}, \dots, f_{mi}\},\\
H_i &= \{(g_{l_1,(t_i+j)}, h_{l'_1,(t_i+j)}), (g_{l_1+1,(t_i+j)}, h_{l'_1+1,(t_i+j)}), \dots, (g_{l_1+r_i,(t_i+j)}, h_{l'_1+r_i,(t_i+j)})\},\\
H_{i,1} &= \{g_{l_1,(t_i+j)}, g_{l_1+1,(t_i+j)}, \dots, g_{l_1+r_i,(t_i+j)}\},\\
H_{i,2} &= \{h_{l'_1,(t_i+j)}, h_{l'_1+1,(t_i+j)}, \dots, h_{l'_1+r_i,(t_i+j)}\},
\end{aligned}
\tag{1}
$$

where $S_i$ ($H_i$) denotes the set of simple (hard) files contained in $N_i$. It is assumed that all the files have the same size.

Figure 2 shows the entries of the matrix

$$
c(N_i) = \begin{pmatrix} S_i & 0 \\ 0 & H_{i,1} \\ 0 & i H_{i,2} \end{pmatrix}
$$

of the node $N_i$, where $S_i$, $H_{i,1}$, and $H_{i,2}$ are matrix blocks of suitable size associated with the files in $S_i$ and $H_i$ (see identities (1)). In this case, we add as many zeroes as necessary to satisfy the restrictions related to the inclusion of garbage instructions and to the size of the files. We also use the notation $g + i h'$ for each pair of the form $(g, h')$.

Obfuscation techniques such as xor operations and row and column permutations can be applied to the elements of the node $c(N_i)$ to obtain new versions of the detected viruses.

A node $N_i$ in a hierarchical attack is said to be strong (weak) if its files belong to a simple (hard) region. Files in a strong node are either of purely benign type or of purely malware type. We let ⊙ (⊖) denote a strong (weak) node in a hierarchical attack.

Henceforth, we will assume that nodes associated with a hierarchical attack have the structure given by the matrix shown in Figure 2.

2.1.2. Using Information Theory to Detect and Insert Malware

As we have seen in previous sections, polymorphism makes the static detection of viruses infeasible. We remind the reader that there are two kinds of polymorphism (that obtained by data encryption and that obtained by data compression). Machine learning methods have been developed to detect different file features, such as N-grams, statistical features, and entropy. In particular, entropy features are based on computing the entropy of the file or of some of its areas, since benign files tend to have low entropy values, whereas obfuscated or packed files tend to have high entropy values [23].
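The entropy feature mentioned above is usually the Shannon entropy of the byte distribution, measured in bits per byte: 0 for a file made of a single repeated byte and 8 when all 256 byte values are equally frequent. A minimal sketch follows; any threshold for calling a file packed would have to be chosen empirically and is not fixed here.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 for empty input)."""
    n = len(data)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())
```

For example, `shannon_entropy(b"ab")` is 1.0, and `shannon_entropy(bytes(range(256)))` reaches the maximum of 8.0.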

Lyda and Hamrock [24] introduced the idea of using entropy (over the entire file) to classify packed malware. It is worth noting that, nowadays, distinguishing between packed and non-packed executable files is an active line of investigation for malware analysts. For instance, Mantovani et al. [23] implemented a machine learning classifier based on the union of features to identify different forms of packing. Lee et al. [25] used machine learning to recover original files from backup system files (infected with ransomware) via entropy techniques. Perdisci et al. [26] proposed studying specific packer features in the portable executable file format, whereas Ugarte-Pedrero et al. [27] suggested that entropy is the main feature for detecting packed files. They used the Zeus botnet, one of the first bot families to adopt low-entropy packing schemes.

Raphel et al. [28] used entropy to recognize polymorphic samples that use xor-based encoders. Their approach is based on five steps: extraction of files or appropriate file fragments, computation and concatenation of such fragments, computation of the entropy of the concatenated fragments, and construction of a suitable similarity distance matrix.

We also recall that Lim et al. [29] proposed treating files as vectors or streams of bytes in order to analyze some of their statistical features.

Entropy has also been used as a helpful feature for inserting malicious files. In this case, the analyst splits a target file into shares or chunks and inserts a low-entropy pattern of bytes between consecutive shares; the malicious file is then reconstructed in memory to bypass high-entropy file detectors. Menéndez et al. [30,31] used the entropy-based tools EnTS and EEE to detect and conceal malicious files in executables. They also used VirusTotal to reproduce the behavior of some antivirus engines. Detect It Easy (DIE), PEiD, PackerID, NFD, ExeScan, and Manalyze are popular tools for analyzing malware. In particular, DIE and PEiD have a component for entropy analysis [23,32].
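The insertion trick described above is effective against a detector that looks only at whole-file entropy. The following sketch, with hypothetical helper names and deterministic stand-in data instead of a real payload, shows that interleaving low-entropy padding lowers the whole-file entropy, whereas the maximum entropy over fixed-size windows still exposes the high-entropy chunks. For simplicity the windows are non-overlapping and aligned with the chunks; a realistic detector would use sliding windows, since the attacker controls the alignment.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input)."""
    n = len(data)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def max_window_entropy(data: bytes, window: int = 256) -> float:
    """Largest entropy over consecutive non-overlapping windows of `window` bytes."""
    return max((shannon_entropy(data[i:i + window])
                for i in range(0, len(data), window)), default=0.0)

def interleave_padding(data: bytes, chunk: int, pad: bytes) -> bytes:
    """Split `data` into chunks and insert `pad` between consecutive chunks."""
    return pad.join(data[i:i + chunk] for i in range(0, len(data), chunk))

# Stand-in for a high-entropy payload: every byte value occurs equally often.
payload = bytes(range(256)) * 4
padded = interleave_padding(payload, 256, b"\x00" * 768)
```

Here the whole-file entropy drops from 8 bits per byte to roughly 3.3, while the largest windowed entropy of `padded` is still 8, so a windowed measurement keeps flagging the embedded chunks.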

Nowadays, an interesting problem in cryptography is proving the leakage resilience of cryptographic implementations. Side-channel attacks (SCA) may be among the most significant threats to these implementations [33]. In this kind of attack, a secret key implemented in a device (e.g., a smart card) is retrieved by analyzing the side-channel signals obtained from its physical implementation. Low-entropy masking schemes (LEMS) have been introduced to guarantee high security against SCA with less randomness than traditional masking schemes. These schemes have been analyzed by Li et al. [34], who studied the leakage characteristics of multiplicative LEMS, whereas Zhang et al. [35] trained deep learning models assisted by a new metric to improve SCA. The security of LEMS has also been studied by Grosso et al. [36], Ye et al. [37], and Zhang et al. [38].

Network security and channel capacity have been studied by Hua et al. [39], Adesso et al. [40], and Yilmaz et al. [41], who introduced a method to estimate the maximum amount of information leaked by signals generated by the execution of some instructions in a processor.

2.2. Partially Ordered Sets and Their Representations

A partially ordered set or poset is a pair $(P, \leq)$, where $P$ is a possibly empty set endowed with a relation $\leq$ that is

Reflexive, i.e., $x \leq x$ for any $x \in P$,
Antisymmetric, i.e., $x \leq y$ and $y \leq x$ imply $x = y$, for any $x, y \in P$,
Transitive, i.e., $x \leq y$ and $y \leq z$ imply $x \leq z$, for any $x, y, z \in P$.
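For a finite set, the three axioms can be checked mechanically. In this sketch (the function name is hypothetical), the relation $\leq$ is given as a set of pairs $(x, y)$ meaning $x \leq y$.

```python
from itertools import product

def is_partial_order(points, leq) -> bool:
    """Check reflexivity, antisymmetry and transitivity of a finite relation.

    points -- finite iterable of elements
    leq    -- set of pairs (x, y) meaning x <= y
    """
    pts = list(points)
    reflexive = all((x, x) in leq for x in pts)
    antisymmetric = all(x == y for x, y in product(pts, repeat=2)
                        if (x, y) in leq and (y, x) in leq)
    transitive = all((x, z) in leq for x, y, z in product(pts, repeat=3)
                     if (x, y) in leq and (y, z) in leq)
    return reflexive and antisymmetric and transitive
```

For instance, divisibility on $\{1, 2, 3, 6\}$ passes all three checks, while a relation containing $1 \leq 2$ and $2 \leq 3$ but not $1 \leq 3$ fails transitivity.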

Henceforth, if there is no confusion, we will write $P$ instead of the pair $(P, \leq)$ to denote a poset.

Often, finite posets are described by their Hasse diagram, which is a system of sets of the form $\{P; C_r, L\}$, where $r$ is a fixed (sufficiently small) positive real number and each point $p \in P$ is assigned a unique point $(x_0, y_0) \in \mathbb{R}^2$ and a unique circle $c \in C_r$ with center $(x_0, y_0)$ and radius $r$.

The set $L$ consists of non-horizontal lines connecting circles of $C_r$ according to the following rule:

A line $l$ connects two circles $c$ and $c'$ with centers $(x_0, y_0)$ and $(x'_0, y'_0)$, associated with points $p$ and $p'$ in $P$, if and only if one of these points covers the other, i.e., $p < p'$ (or $p' < p$) and, whenever $z \in P$ lies between them, then $z = p$ or $z = p'$.

As an example, Figure 3 shows the Hasse diagram of the finite poset $P = \{a, b, c, d, e, f\}$ such that $a < d$, $b < d$, $b < e$, $c < e$, and $c < f$.
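The lines of a Hasse diagram are exactly the covering pairs. The sketch below computes them from the strict order; for the poset of Figure 3, no relation factors through a third point, so every listed relation is a covering and the diagram has five lines. The function and variable names are illustrative.

```python
def covers(points, less) -> set:
    """Covering pairs (x, y): x < y and there is no z with x < z < y.

    `less` is the strict order as a set of pairs, assumed transitively closed.
    """
    return {(x, y) for (x, y) in less
            if not any((x, z) in less and (z, y) in less for z in points)}

# The poset of Figure 3, with the strict relations listed in the text.
FIG3_POINTS = ["a", "b", "c", "d", "e", "f"]
FIG3_LESS = {("a", "d"), ("b", "d"), ("b", "e"), ("c", "e"), ("c", "f")}
```

By contrast, for the chain $1 < 2 < 3$ the relation $1 < 3$ is not a covering, so only the two lines $1$–$2$ and $2$–$3$ are drawn.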

A poset $P$ is said to be a chain if, for any pair of points $x, y \in P$, it holds that $x \leq y$ or $y \leq x$ (i.e., any two points in a chain are comparable). A poset $P$ is an antichain if its points are pairwise incomparable.

The width $w(P)$ of a poset $P$ is the size of its largest antichain (e.g., the width of a chain is 1).
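For small posets, the width can be found by brute force over subsets, as in the sketch below (function name hypothetical). For larger posets one would instead rely on Dilworth's theorem, which equates the width with the minimum number of chains covering $P$ and leads to polynomial-time algorithms via bipartite matching.

```python
from itertools import combinations

def width(points, less) -> int:
    """Size of a largest antichain, by brute force (small posets only).

    `less` is the strict order as a set of pairs (x, y) with x < y.
    """
    def comparable(x, y):
        return (x, y) in less or (y, x) in less

    pts = list(points)
    for k in range(len(pts), 0, -1):
        for subset in combinations(pts, k):
            if not any(comparable(x, y) for x, y in combinations(subset, 2)):
                return k
    return 0
```

For the poset of Figure 3 the result is 3 (e.g., the antichain $\{a, b, c\}$), and for a chain it is 1.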

If $R$ is a commutative ring and $P$ is a finite poset, then a $P$-subspace $U$ is a system of modules of the form

$$U = (U_0; U_x \mid x \in P), \tag{2}$$

where $U_0$ is an $R$-module, $U_x$ is a submodule of $U_0$ for any $x \in P$, and

$$U_x \subseteq U_y \quad \text{provided that } x \leq y \text{ in } P. \tag{3}$$

If $R$ is a field, then a $P$-subspace is said to be an $R$-linear representation (or representation) of the poset $P$ [19,20,21,22].
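Conditions (2) and (3) can be checked mechanically over the two-element field $\mathbb{Z}_2$, the field used for the file vectors in Section 3. The sketch below, with hypothetical function names, encodes vectors of $U_0 = \mathbb{Z}_2^{d_0}$ as Python integers (bit $k$ is the $k$-th coordinate) and tests $U_x \subseteq U_y$ by comparing ranks: $\operatorname{span}(A) \subseteq \operatorname{span}(B)$ exactly when adding $A$ to $B$ does not increase the rank.

```python
def rank_gf2(vectors) -> int:
    """Rank over GF(2) of vectors encoded as ints (bit k = k-th coordinate)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b from v when it is set
        if v:
            basis.append(v)
    return len(basis)

def contained_in(small, big) -> bool:
    """span(small) is contained in span(big) iff adding `small` keeps the rank."""
    return rank_gf2(list(big) + list(small)) == rank_gf2(list(big))

def is_representation(dim0: int, U: dict, leq) -> bool:
    """Check (2)-(3): every U_x lies in GF(2)^dim0 and U_x is in U_y when x <= y.

    U maps each point x to a list of generators of U_x; leq is a set of pairs (x, y).
    """
    inside_u0 = all(0 <= v < 2 ** dim0 for gens in U.values() for v in gens)
    return inside_u0 and all(contained_in(U[x], U[y]) for (x, y) in leq)
```

For a two-point chain $x < y$ in $\mathbb{Z}_2^3$, taking $U_x = \operatorname{span}\{e_1\}$ and $U_y = \operatorname{span}\{e_1, e_2\}$ gives a representation, whereas $U_x = \operatorname{span}\{e_3\}$ violates (3).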

If $U = (U_0; U_x \mid x \in P)$ and $V = (V_0; V_x \mid x \in P)$ are representations of a poset $P$, then their sum $U \oplus V$ is the representation given by the following identity:

$$U \oplus V = (U_0 \oplus V_0; U_x \oplus V_x \mid x \in P). \tag{4}$$

The representation 0 has 0 as ground vector space. Furthermore, a representation $U$ is said to be indecomposable if $U \neq 0$ and, whenever $U = U_1 \oplus U_2$, either $U_1 = 0$ or $U_2 = 0$. Otherwise, $U$ is said to be decomposable.

A morphism between two representations $U = (U_0; U_x \mid x \in P)$ and $V = (V_0; V_x \mid x \in P)$ is an $R$-linear map $\varphi\colon U_0 \to V_0$ such that $\varphi(U_x) \subseteq V_x$ for any $x \in P$. The morphism $\varphi$ is an isomorphism if it is bijective and $\varphi(U_x) = V_x$ for any $x \in P$.

The composition of morphisms between representations is given by the usual composition of $R$-linear maps. The identity morphism associated with a representation $U$ is denoted $1_U$, so that if $\varphi\colon U \to V$ is a morphism, then $\varphi 1_U = \varphi = 1_V \varphi$.

We let $\operatorname{rep} P$ denote the category of representations of a poset $P$, which is a Krull–Schmidt category.

$\underline{\dim}\, U$ denotes the dimension vector of a representation $U$ of a poset $P$. It is an integral vector of the form

$$\underline{\dim}\, U = (d_0; d_x \mid x \in P), \tag{5}$$

where $d_0 = \dim_R U_0$ is the dimension of $U_0$ as a vector space, whereas $d_x = \dim_R \big( U_x / \sum_{z \in x_{\blacktriangle}} U_z \big)$ for any $x \in P$, with $x_{\blacktriangle}$ as defined in (6).

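$$
\begin{aligned}
x_{\triangle} &= \{z \in P \mid z \leq x\}, & x_{\blacktriangle} &= x_{\triangle} \setminus \{x\},\\
x^{\triangledown} &= \{z \in P \mid x \leq z\}, & x^{\blacktriangledown} &= x^{\triangledown} \setminus \{x\}.
\end{aligned}
\tag{6}
$$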
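Over $\mathbb{Z}_2$, the coordinates of (5) reduce to rank computations: since $\sum_{z \in x_{\blacktriangle}} U_z \subseteq U_x$ for a representation, $d_x$ is the rank of $U_x$ minus the rank of that sum. The sketch below assumes the same hypothetical bit-vector encoding as before (vectors as Python integers) and is meant for small hand-made examples.

```python
def rank_gf2(vectors) -> int:
    """Rank over GF(2) of vectors encoded as ints (bit k = k-th coordinate)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b from v when it is set
        if v:
            basis.append(v)
    return len(basis)

def dimension_vector(dim0: int, U: dict, less) -> tuple:
    """Return (d_0, {x: d_x}) for a representation over GF(2).

    U    -- maps each point x to a list of generators of U_x
    less -- set of pairs (z, x) meaning z < x
    d_x = dim U_x - dim(sum of U_z for z < x); valid because that sum lies in U_x.
    """
    d = {}
    for x, gens in U.items():
        below = [v for (z, y) in less if y == x for v in U[z]]
        d[x] = rank_gf2(gens) - rank_gf2(below)
    return dim0, d
```

For the chain $a < b$ with $U_0 = \mathbb{Z}_2^2$, $U_a = \operatorname{span}\{e_1\}$ and $U_b = \mathbb{Z}_2^2$, the result is $(2; 1, 1)$.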

One of the problems of the representation theory of posets consists of giving a complete description of the indecomposable representations of the categories $\operatorname{rep} P$ defined by finite posets $P$.

To date, differentiation algorithms have been the main tool for classifying posets. The algorithm of differentiation with respect to a maximal point, introduced by Nazarova and Roiter, and the algorithm of differentiation with respect to a suitable pair of points are the most remarkable algorithms for reaching such a classification. They are functors whose main goal is to reduce the dimension of the posets involved in the classification process.

The following is the definition of the algorithm of differentiation with respect to a suitable pair of points, also known as D–I or DI [19]. Let $a$ and $b$ be two points of a finite poset $P_{(a,b)}$. The pair $(a, b)$ is said to be suitable for DI if $P_{(a,b)}$ can be written as a sum of the form

$$P_{(a,b)} = a^{\triangledown} + b_{\triangle} + C, \tag{7}$$

where $C = \{c_1 < c_2 < \dots < c_n\}$ is an $n$-point chain ($n \geq 0$).

The derived poset $P'_{(a,b)}$ is a subset of the modular lattice generated by $P_{(a,b)}$ such that

$$P'_{(a,b)} = (P_{(a,b)} + C^{+} + C^{-} + b_{\triangle}) \setminus C, \tag{8}$$

where $C^{+} = \{c_1^{+} < c_2^{+} < \dots < c_n^{+}\}$ and $C^{-} = \{c_1^{-} < c_2^{-} < \dots < c_n^{-}\}$ are $n$-point chains with $c_i^{-} < c_i^{+}$ for all $1 \leq i \leq n$ and $a < c_1^{+}$. Points in $P \setminus C$ inherit the relations given by $P_{(a,b)}$. In particular, the relations between these points and the points $c_i^{+}$ and $c_i^{-}$ are given by the relations between them and the points $c_i$.

Figure 4 shows the Hasse diagrams of a poset $P_{(a,b)}$ with a suitable pair of points and of its corresponding derived poset.

The differentiation DI, or $D_{(a,b)}\colon \operatorname{rep} P_{(a,b)} \to \operatorname{rep} P'_{(a,b)}$, is defined by the following identities for a representation $U = (U_0; U_x \mid x \in P_{(a,b)})$:

$$
\begin{aligned}
D_{(a,b)}(U) &= U' = (U'_0; U'_x \mid x \in P'_{(a,b)}),\\
U'_0 &= U_0,\\
U'_{c_i^{+}} &= U_{c_i} + U_a,\\
U'_{c_i^{-}} &= U_{c_i} \cap U_b,\\
U'_x &= U_x \quad \text{for the remaining points } x \in P'_{(a,b)},\\
\varphi' &= \varphi \in \operatorname{Hom}_R(U, V) \quad \text{for any morphism (linear transformation) } \varphi\colon U \to V \text{ in } \operatorname{rep} P.
\end{aligned}
\tag{9}
$$
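Over $\mathbb{Z}_2$, the object part of identities (9) can be tried out on tiny examples by storing each subspace as the explicit set of its vectors, so that $U_{c_i} + U_a$ is the set of all sums and $U_{c_i} \cap U_b$ is a plain set intersection. This sketch is exponential in the dimension and only meant for checking small cases; the function names and the labels `(c, "+")`, `(c, "-")` standing for $c_i^{+}$, $c_i^{-}$ are hypothetical, and the order relations of the derived poset are not modeled.

```python
from functools import reduce
from itertools import product
from operator import xor

def span(gens) -> frozenset:
    """All GF(2) linear combinations of the generators (ints as bit vectors)."""
    gens = list(gens)
    return frozenset(
        reduce(xor, (g for g, c in zip(gens, coeffs) if c), 0)
        for coeffs in product((0, 1), repeat=len(gens)))

def differentiate(U: dict, a, b, chain) -> dict:
    """Apply the subspace identities of (9) to a representation over GF(2).

    U maps each point to its subspace, stored as the full set of its vectors;
    chain lists the points c_1, ..., c_n of C. U_0 and morphisms are unchanged,
    so they are not represented here.
    """
    derived = {x: S for x, S in U.items() if x not in chain}
    for c in chain:
        derived[(c, "+")] = frozenset(u ^ v for u in U[c] for v in U[a])  # U_c + U_a
        derived[(c, "-")] = U[c] & U[b]                                     # U_c ∩ U_b
    return derived
```

For instance, with $U_a = \operatorname{span}\{e_1\}$, $U_b = \operatorname{span}\{e_2\}$ and $U_c = \operatorname{span}\{e_1 + e_2\}$ in $\mathbb{Z}_2^2$, the new point $c^{+}$ receives the whole space and $c^{-}$ receives $0$.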

The following theorem is the main result regarding DI. For each $i$, $1 \leq i \leq n$, let $p(a, c_i) = (U_0; U_x \mid x \in P_{(a,b)})$ denote the indecomposable representation for which $U_0 = R$ (here $R$ is a field), $U_x = R$ for any $x \in \{a, c_i\}^{\triangledown}$, and $U_x = 0$ for the remaining points of the poset.

Theorem 1

(Theorem 5.6, [19]). The two-point differentiation with completion functor $F_{(a,b)} = C_{(a,b)} D_{(a,b)}$ induces a categorical equivalence between the quotient categories

$$\operatorname{rep} P / \langle p(a, c_1), p(a, c_2), \dots, p(a, c_n) \rangle \xrightarrow{\ \sim\ } \operatorname{rep} \bar{P}'_{(a,b)} / \langle p(a) \rangle, \tag{10}$$

where $\langle p(a, c_1), p(a, c_2), \dots, p(a, c_n) \rangle$ ($\langle p(a) \rangle$) is the ideal consisting of the morphisms that factor through direct sums of objects $p(a, c_i)$ ($p(a)$).

The Matrix Problem

The indecomposable representations of a poset $P$ can be obtained as solutions of a matrix problem. To see this, note that each representation of $P$ gives rise to a matrix $M = M_P$ (a matrix representation) whose columns are partitioned into vertical strips $M_x$ labeled by the points of the poset. The columns of the strip $M_x$ are the coordinates, with respect to a fixed basis $B$ of $U_0$, of a set of generators of the subspace $U_x$. Thus, if $C_x$ is the set of columns in the strip $M_x$, then $\operatorname{span} C_x = U_x$.

If $M$ and $M'$ are matrix representations of a poset $P = \{x_i \mid 1 \leq i \leq t\}$ with $M = (M_{x_1} \mid \cdots \mid M_{x_t})$ and $M' = (M'_{x_1} \mid \cdots \mid M'_{x_t})$, then the direct sum $M \oplus M'$ of $M$ and $M'$ is given by the formula

$$
M \oplus M' = \left(\begin{array}{cc|c|cc}
M_{x_1} & 0 & \cdots & M_{x_t} & 0\\
0 & M'_{x_1} & \cdots & 0 & M'_{x_t}
\end{array}\right).
$$
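Strip by strip, the formula above places $M_{x_i}$ and $M'_{x_i}$ on a block diagonal. A sketch with matrices stored as lists of rows (a hypothetical encoding in which each strip keeps its row count even when it has no columns, and both representations are assumed to be over the same poset) is:

```python
def direct_sum(M: dict, N: dict) -> dict:
    """Direct sum of two matrix representations of the same poset.

    M and N map each point x to its strip, a list of rows; all strips of M
    have the same number of rows (dim U_0), and likewise for N. Strip x of
    the result is the block-diagonal matrix [[M_x, 0], [0, N_x]].
    """
    result = {}
    for x in M:
        A, B = M[x], N[x]
        a_cols = len(A[0]) if A else 0
        b_cols = len(B[0]) if B else 0
        top = [row + [0] * b_cols for row in A]     # M_x followed by zeros
        bottom = [[0] * a_cols + row for row in B]  # zeros followed by N_x
        result[x] = top + bottom
    return result
```

For example, if $M$ has strips $M_x = (1, 0)^T$ and $M_y = (0, 1)^T$, and $N$ has $N_x = (1)$ and an empty strip $N_y$ with one row, then the strip $x$ of the sum has three rows and two columns, while the strip $y$ has three rows and one column.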

Two matrix representations $M$ and $M'$ are said to be equivalent if one can be obtained from the other by means of the following admissible transformations:

Elementary transformations of the rows of the whole matrix.
Elementary transformations of the columns within each vertical strip.
Additions of columns of a strip $M_{x_i}$ to columns of a strip $M_{x_j}$ whenever $x_i < x_j$.

Equivalent matrices give rise to isomorphic representations of the associated poset.

3. Main Results

We remind readers that an equipped poset $P$ is a poset whose points are partitioned as $P = P^{\ominus} + P^{\odot}$. If $x \in P^{\odot}$ ($x \in P^{\ominus}$), then $x$ is said to be a strong (weak) point. The relations $R$ of an equipped poset are also partitioned into two sets, $R = R^{\ominus} + R^{\odot}$; if a pair $(x, y) \in R^{\odot}$ ($(x, y) \in R^{\ominus}$), then we write $x \trianglelefteq y$ ($x \preceq y$). In this setting, if $x \leq y$, i.e., $(x, y) \in R$, and $y \trianglelefteq z$, i.e., $(y, z) \in R^{\odot}$, then $x \trianglelefteq z$. Also, if $x \trianglelefteq y \leq z$, then $x \trianglelefteq z$ [20,22].

We assume that the hierarchical attack model (see Figure 2) satisfies the following additional conditions:

1. All the files associated with the malware infecting a network belong to an isolated strong node denoted $M$.
2. Each infected node $x$ is encoded by finite sets of $\{0, 1\}$-vector columns, $S_x \cup H_x$. Columns in $S_x$ encode either benign files or malware. Columns in $H_x$ encode hidden malware in hard regions.
3. The files in the malware node $M$ are distributed among a fixed set of weak nodes $N_0, N_1, N_2, \dots, N_n$, where $N_0$ denotes the initial stage of the infection (hidden malware associated with hard regions is contained in $H_0$), and $c(N_j) = S_j \cup H_j \subset S_{j+1} \cup H_{j+1}$ for any $0 \leq j \leq n - 1$.
4. If a node $P$ in the attacked network is infected by a node $N_j$ for some $0 \leq j \leq n$, then $S_0 \cup H_0 \subset S_p$, where $P$ is encoded by $S_p \cup H_p$. In particular, if $P$ is also infected by a weak node $N_j$, then either $S_j \cup H_j \subset S_p \cup H_p$ or $S_j \cup H_j \subset S_p$.

The following result proves that a hierarchical attack structured by matrices $c(N_i)$ (Figure 2) defines an equipped poset.

Theorem 2.

A hierarchical attack defined by a strong node $M$ as above and weak nodes $N_0, N_1, \dots, N_n$, with the structure given by a matrix $c(N_i)$ (see Figure 2) and satisfying conditions (1)–(4), defines an equipped poset.

Proof.

We note that the nodes in the infected network are the points of the equipped poset $P$. Strong nodes correspond to strong points, and weak nodes correspond to weak points of $P$. The stages of the infection start in $N_0$, continue to $N_1$, and so on. Since, for any pair of weak nodes $N_i$ and $N_j$ with $i < j$, it holds that $S_i \cup H_i \subset S_j \cup H_j$ with $H_j \neq \emptyset$, the nodes $N_i$ and $N_j$ are weakly related. Moreover, $N_0, N_1, N_2, \dots, N_n$ constitute a weak chain, in the sense that its points and the relations between them are weak; we write $C = N_0 \preceq N_1 \preceq N_2 \preceq \dots \preceq N_n$. Condition (4) proves that the relations between a weak point $N_j$ and another point of $P$ are either weak or strong, whereas the relations between $N_0$ and the points $x \in P \setminus (C + M)$ are strong. Finally, we note that, by definition, the strong point $M$ is incomparable with the other points of the poset $P$. Therefore, $P$ can be written as a sum of the form $P = N_0^{\triangledown} + C + M$, where $N_0^{\triangledown} = \{x \in P \mid N_0 \trianglelefteq x\}$. Figure 5 shows an example of an equipped poset induced by a hierarchical attack. Double (single) lines denote strong (weak) relations. In this case, $N$ represents an arbitrary set of infected nodes related to the weak chain $N_0 \preceq N_1 \preceq \dots \preceq N_n$. □

If $N_{x}$ is a node infected by a hierarchical attack with files of type $f_{ij}$, $(g_{kl}, h_{mn})$ for suitable indices $i, j, k, l, m, n$, then $\operatorname{span}_{\mathbb{Z}_{2}}\{f_{ij}, g_{kl}, h_{mn}\}$ is said to be the hull of $N_{x}$, and we let $\tilde{U}_{N_{x}}$ denote the hull of the node $N_{x}$. Note that $\operatorname{span}\{S_{x} \cup H_{x}\} = U_{x} \subseteq \tilde{U}_{N_{x}}$, and $\tilde{U}_{N_{x}} = U_{N_{x}}$ if and only if $N_{x}$ is strong. We let $U_{x}^{-}$ denote the strong subspace $\operatorname{span}\{S_{x}\}$ of $U_{x}$. In this case, $\widetilde{U_{x}^{-}} = \operatorname{span}_{\mathbb{Z}_{2} + i\mathbb{Z}_{2}}\{S_{x}\}$.

By the definition of a hierarchical attack and properties (1)–(4), hidden malware associated with hard regions can be pinpointed by XORing files in $U_{N_{0}}$ with files in the weak nodes $U_{N_{j}}$. In that case, one builds the span sum $\tilde{U}_{N_{0}} + U_{N_{j}}$, $0 \leq j \leq n$.
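XORing two $\{0, 1\}$-vectors is addition in $\mathbb{Z}_{2}^{m}$, so the nonzero entries of the result mark where an infected copy differs from the file in $N_{0}$, and the dimension of a span sum is the rank of the stacked generators. The sketch below assumes files are lists of 0/1 entries and works with plain $\mathbb{Z}_{2}$-spans; it does not model the hull over $\mathbb{Z}_{2} + i\mathbb{Z}_{2}$, and the function names are hypothetical.

```python
from typing import List

# Hypothetical encoding: a file is a list of 0/1 entries of fixed length.
Vector = List[int]


def xor(u: Vector, v: Vector) -> Vector:
    """Componentwise XOR, i.e. addition in Z_2^m; nonzero entries mark differing bits."""
    return [a ^ b for a, b in zip(u, v)]


def gf2_rank(vectors: List[Vector]) -> int:
    """Rank over Z_2 via Gaussian elimination on bit-packed rows."""
    rows = [int("".join(map(str, v)), 2) for v in vectors if any(v)]
    rank = 0
    while rows:
        pivot = max(rows)               # row with the highest leading bit
        rows.remove(pivot)
        top = pivot.bit_length() - 1
        # Clear the leading bit of `pivot` from every other row, then drop zero rows.
        rows = [r ^ pivot if (r >> top) & 1 else r for r in rows]
        rows = [r for r in rows if r]
        rank += 1
    return rank


def span_sum_dimension(U: List[Vector], W: List[Vector]) -> int:
    """dim(span U + span W) over Z_2, i.e. the rank of the stacked generators."""
    return gf2_rank(U + W)
```

For instance, `xor([1, 0, 1, 0], [1, 0, 1, 1])` returns `[0, 0, 0, 1]`, isolating the single altered bit.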

The detection procedure determines the matrix $D$ shown in Figure 6, labeled above by the infected nodes ($N_{i}$, $M$, and $N$) of a network. The bottom part (under the bold line) is labeled by the corresponding symbols $N_{i}^{-}$, $N_{i}^{+}$, which denote subspaces spanned by columns with entries in $\mathbb{Z}_{2} + i\mathbb{Z}_{2}$. We let $U_{x}$ denote the subspace associated with a point $x$.

These subspaces satisfy $U_{N_{i - 1}^{-}} \subset U_{N_{i}^{-}}$, $1 \leq i \leq n$ (these columns are denoted by the symbol ∗), and they encode pure malware (and weak relations) associated with the nodes $N_{i}$. Likewise, $U_{N_{j}^{+}} \subset U_{N_{j + 1}^{+}}$, $1 \leq j \leq n - 1$. Columns associated with the symbols $H_{i}$ encode hidden malware pinpointed by the detection process. Such malware can be inserted into the node $N_{0}$ by adding some garbage entries, denoted $I$, to the matrix $D$. The subspaces associated with points $x \in P \setminus C$, where $C = N_{0} \preceq N_{1} \preceq \dots \preceq N_{n}$, keep their relations with the other points of $P$ unchanged.
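The nested conditions $U_{N_{i-1}^{-}} \subset U_{N_{i}^{-}}$ and $U_{N_{j}^{+}} \subset U_{N_{j+1}^{+}}$ can be tested by reducing each generator of the smaller subspace against an echelon basis of the larger one. As a simplifying assumption, this sketch keeps only $\mathbb{Z}_{2}$-entries (bit-packed into Python integers) rather than $\mathbb{Z}_{2} + i\mathbb{Z}_{2}$-entries. `XorBasis` and `is_chain` are hypothetical names.

```python
from typing import Dict, Iterable, List


class XorBasis:
    """Z_2-span of bit-packed vectors, stored as an echelon basis keyed by leading bit."""

    def __init__(self, vectors: Iterable[int] = ()) -> None:
        self.pivots: Dict[int, int] = {}
        for v in vectors:
            self.add(v)

    def reduce(self, v: int) -> int:
        """Reduce v against the basis; the result is 0 exactly when v lies in the span."""
        for bit in sorted(self.pivots, reverse=True):
            if (v >> bit) & 1:
                v ^= self.pivots[bit]
        return v

    def add(self, v: int) -> bool:
        """Insert v; returns True if it enlarged the span."""
        r = self.reduce(v)
        if r:
            self.pivots[r.bit_length() - 1] = r
        return bool(r)

    def contains(self, other: "XorBasis") -> bool:
        """True if span(other) ⊆ span(self)."""
        return all(self.reduce(b) == 0 for b in other.pivots.values())


def is_chain(generators: List[List[int]]) -> bool:
    """Check U_0 ⊆ U_1 ⊆ ... ⊆ U_n for subspaces given by lists of generators."""
    bases = [XorBasis(g) for g in generators]
    return all(big.contains(small) for small, big in zip(bases, bases[1:]))
```

Keeping one basis vector per leading bit makes each membership test a single pass over the pivots.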

Relations between the infected files allow us to give the next result.

Theorem 3.

The insertion-detection matrix $D$ determines an equipped poset $P^{d} = C^{+} + C^{-} + M + N$, where $C^{+} = N_{0}^{+} \trianglelefteq N_{1}^{+} \preceq \dots \preceq N_{n}^{+}$ and $C^{-} = N_{0}^{-} \preceq N_{1}^{-} \preceq \dots \preceq N_{n}^{-}$ are chains, and $M$, $N$, and their relations are defined as for the poset $P$.

Proof.

By definition, $N_{0}^{+}$ is a strong point. Furthermore, since the files associated with the nodes $N_{i}^{-}$ constitute malware satisfying $U_{N_{i - 1}^{-}} \subset U_{N_{i}^{-}}$, $1 \leq i \leq n$, the points $N_{j}^{-}$, $0 \leq j \leq n$, constitute a weak chain. In addition, $N_{j}^{-} \preceq M$ for any $j$, since these subspaces encode pure malware. The same argument applied to the subspaces $N_{j}^{+}$ yields $N_{0}^{+} \trianglelefteq N_{1}^{+}$. Since $N_{i}^{+} \preceq N_{i + 1}^{+}$, $1 \leq i \leq n - 1$, it follows that $N_{0}^{+} \trianglelefteq N_{i}^{+}$ for $1 \leq i \leq n$. Moreover, $N_{j}^{-} \preceq N_{j}^{+} \preceq N_{j + 1}^{+}$ for any $0 \leq j \leq n - 1$, and $N_{n}^{-} \preceq N_{n}^{+}$. Finally, the relations between the points $N_{j}^{-}$, $N_{j}^{+}$ and the points in $N \cup \{M\}$ are inherited from the relations these points have with $N_{0}, \dots, N_{n}$. Figure 7 shows the poset $P^{d}$ defined by the insertion-detection matrix $D$. □
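The step "$N_{0}^{+} \trianglelefteq N_{1}^{+}$ and $N_{i}^{+} \preceq N_{i+1}^{+}$, hence $N_{0}^{+} \trianglelefteq N_{i}^{+}$" uses the equipped-poset rule that a composite of relations passing through a strong relation is strong. The sketch below computes the transitive closure of a relation table under that rule. The point names are hypothetical strings, and the example uses $n = 2$.

```python
from itertools import product
from typing import Dict, Tuple

WEAK, STRONG = "weak", "strong"
# (x, y) -> kind of the relation x <= y; point names are hypothetical strings.
Relations = Dict[Tuple[str, str], str]


def close(rel: Relations) -> Relations:
    """Transitive closure in which any composite through a strong relation is strong."""
    closed = dict(rel)
    changed = True
    while changed:
        changed = False
        for (x, y), (y2, z) in product(list(closed), repeat=2):
            if y != y2 or x == z:
                continue
            kind = STRONG if STRONG in (closed[(x, y)], closed[(y2, z)]) else WEAK
            old = closed.get((x, z))
            # Add a missing relation, or upgrade a weak one that a strong path implies.
            if old is None or (old == WEAK and kind == STRONG):
                closed[(x, z)] = kind
                changed = True
    return closed


rel = {
    ("N0+", "N1+"): STRONG, ("N1+", "N2+"): WEAK,
    ("N0-", "N1-"): WEAK, ("N1-", "N2-"): WEAK,
    ("N0-", "N0+"): WEAK, ("N1-", "N1+"): WEAK, ("N2-", "N2+"): WEAK,
}
closed = close(rel)
```

In this example `closed[("N0+", "N2+")]` is strong, whereas `closed[("N1-", "N2+")]` stays weak because no path from $N_{1}^{-}$ to $N_{2}^{+}$ passes through $N_{0}^{+} \trianglelefteq N_{1}^{+}$.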

Corollary 1.

The hierarchical attack defined by an equipped poset of type $P^{d}$ has no hidden malware.

Proof.

The malware used in this type of attack is encoded by the subspaces $U_{N_{j}^{-}}$ and $H_{j}$, which are induced by simple regions. □

In a more general setting, we can define a functor $D_{(N_{0}, M)}$ induced by a hierarchical attack defined by an equipped poset of type $P$ and its associated detection algorithm, which is defined by a corresponding equipped poset $P^{d}$. Figure 8 shows the poset $P$ and its detector $P^{d}$.

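If we replace $\mathbb{Z}_{2}$ by the field of real numbers $\mathbb{R}$ and $\mathbb{Z}_{2} + i\mathbb{Z}_{2}$ by the field of complex numbers $\mathbb{C}$, then the $(\mathbb{R}, \mathbb{C})$-transformations of rows and columns of the matrices induced by the linear structure of the posets $P$ and $P^{d}$ give rise to categories of representations of the equipped posets $P$ and $P^{d}$. In this setting, a representation $U$ of an equipped poset $P$ is a system $U = (U_{0}; U_{x} \mid x \in P)$, where $U_{0}$ is a finite-dimensional $\mathbb{C}$-vector space and each $U_{x}$ is an $\mathbb{R}$-subspace of $U_{0}$, such that $U_{x} \subseteq U_{y}$ if $x \preceq y$, and $\tilde{U}_{x} \subseteq U_{y}^{-}$ if $x \trianglelefteq y$. Here $\tilde{U}_{x} = U_{x} + iU_{x}$ is the $\mathbb{C}$-hull of $U_{x}$, and $U_{y}^{-}$ is the largest $\mathbb{C}$-subspace contained in $U_{y}$.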
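The two defining conditions can be checked numerically by identifying $\mathbb{C}^{m}$ with $\mathbb{R}^{2m}$. The sketch below is an illustration under stated assumptions. Each $U_{x}$ is given by a complex generator matrix whose columns span it over $\mathbb{R}$. Real inclusion is tested by comparing ranks. Since $\tilde{U}_{x}$ is a $\mathbb{C}$-subspace, $\tilde{U}_{x} \subseteq U_{y}^{-}$ is equivalent to $\tilde{U}_{x} \subseteq U_{y}$. The function names and the tolerance are hypothetical.

```python
from typing import Dict, Tuple

import numpy as np

# A subspace of C^m is given by a complex m x k matrix whose columns span it over R.
TOL = 1e-9


def realify(G: np.ndarray) -> np.ndarray:
    """Identify C^m with R^{2m} by stacking real parts over imaginary parts."""
    return np.vstack([G.real, G.imag])


def r_rank(G: np.ndarray) -> int:
    """Real dimension of the R-span of the columns of G."""
    return 0 if G.shape[1] == 0 else int(np.linalg.matrix_rank(realify(G), tol=TOL))


def hull(G: np.ndarray) -> np.ndarray:
    """Generators of the C-hull U + iU of U = span_R(columns of G)."""
    return np.hstack([G, 1j * G])


def r_contains(B: np.ndarray, A: np.ndarray) -> bool:
    """True if span_R(A) ⊆ span_R(B)."""
    return r_rank(np.hstack([B, A])) == r_rank(B)


def is_representation(U: Dict[str, np.ndarray], rel: Dict[Tuple[str, str], str]) -> bool:
    """Check U_x ⊆ U_y for weak x ⪯ y and hull(U_x) ⊆ U_y for strong x ⊴ y."""
    for (x, y), kind in rel.items():
        A = hull(U[x]) if kind == "strong" else U[x]
        if not r_contains(U[y], A):
            return False
    return True
```

For $U_{0} = \mathbb{C}$, taking $U_{a} = \mathbb{R}$ and $U_{b} = \mathbb{R}$ is a valid representation when $a \preceq b$ is weak, but not when $a \trianglelefteq b$ is strong, since the hull of $\mathbb{R}$ is all of $\mathbb{C}$.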

A morphism $\varphi: U \to V$ between two representations $U$ and $V$ of an equipped poset $P$ is a $\mathbb{C}$-linear map $\varphi: U_{0} \to V_{0}$ such that $\tilde{\varphi}(U_{x}) \subseteq V_{x}$ for every $x \in P$, where $\tilde{\varphi}(u + iv) = \varphi(u) + i\varphi(v)$ for any pair of appropriate vectors $u, v$. Such a $\varphi$ is an isomorphism if and only if it is bijective and $\tilde{\varphi}(U_{x}) = V_{x}$ for every $x \in P$.

Each representation $U$ of an equipped poset $P$ over the pair of fields $(\mathbb{R}, \mathbb{C})$ can be given by a matrix $M$ with entries in $\mathbb{C}$, divided into vertical strips $(M_{x} \mid x \in P)$ labeled by the points of $P$; the columns of $M_{x}$ are generators of $U_{x}$.

The matrix problem associated with an equipped poset $P$ is defined as follows. Two matrix representations of an equipped poset are said to be equivalent if one can be obtained from the other via the following admissible transformations:

Elementary transformations over $\mathbb{C}$ of the rows of the whole matrix.
Elementary column transformations over $\mathbb{C}$ within each vertical strip.
Additions of columns of a strip $M_{x}$ to the columns of a strip $M_{y}$ if $x \preceq y$.
Independent additions of the real and imaginary parts of the columns of a strip $M_{x}$ to the real and imaginary parts of a strip $M_{y}$ if $x \trianglelefteq y$.

Note that if $P = c_{1} \prec c_{2} \prec \dots \prec c_{n - 1} \prec c_{n}$ is a weak chain, then $P(\varnothing)$, $P(c_{i})$ and $T(c_{i})$ with $1 \leq i \leq n$, and $T(c_{i}, c_{j})$ with $1 \leq i < j \leq n$ are its only indecomposable representations (up to isomorphism), where

$P(c_{i}) = (\mathbb{C}; P(c_{i})_{x} \mid x \in P)$, where $P(c_{i})_{x} = \mathbb{C}$ if $x = c_{j}$ with $i \leq j$, and $P(c_{i})_{x} = 0$ if $j < i$.
$P(\varnothing) = (\mathbb{C}; P(\varnothing)_{x} = 0 \mid x \in P)$.
$T(c_{i}) = (\mathbb{C}; T(c_{i})_{x} \mid x \in P)$, where $T(c_{i})_{x} = \operatorname{span}\{(1, i)^{t}\}$ if $x = c_{j}$ with $i \leq j$, and $T(c_{i})_{x} = 0$ if $j < i$.
$T(c_{i}, c_{j}) = (\mathbb{C}; T(c_{i}, c_{j})_{x} \mid x \in P)$, where, for $x = c_{s}$, $T(c_{i}, c_{j})_{x} = \operatorname{span}\{(1, i)^{t}\}$ if $i \leq s < j$, $T(c_{i}, c_{j})_{x} = \tilde{\mathbb{C}} = \operatorname{span}\{(1, 0)^{t}, (0, 1)^{t}\}$ if $j \leq s \leq n$, and $T(c_{i}, c_{j})_{x} = 0$ if $s < i$.
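The list above can be generated mechanically, which also gives the count $1 + 2n + \binom{n}{2}$ of indecomposables of a weak chain with $n$ points. This hypothetical sketch records only the type of subspace each indecomposable takes at $c_{1}, \dots, c_{n}$. The label `"line"` stands for the one-dimensional span $\operatorname{span}\{(1, i)^{t}\}$ and `"C"` for the whole space.

```python
from typing import Dict, Tuple

Profile = Tuple[str, ...]  # value of the representation at c_1, ..., c_n


def indecomposables(n: int) -> Dict[str, Profile]:
    """Point-wise profiles of P(∅), P(c_i), T(c_i), T(c_i, c_j) on a weak chain.

    "0"    : the zero subspace
    "line" : the one-dimensional span used for T(c_i) in the text
    "C"    : the whole space (written C or tilde-C in the text)
    """
    points = range(1, n + 1)
    reps: Dict[str, Profile] = {"P(∅)": ("0",) * n}
    for i in points:
        reps[f"P(c{i})"] = tuple("C" if s >= i else "0" for s in points)
        reps[f"T(c{i})"] = tuple("line" if s >= i else "0" for s in points)
        for j in range(i + 1, n + 1):
            reps[f"T(c{i},c{j})"] = tuple(
                "0" if s < i else ("line" if s < j else "C") for s in points
            )
    return reps
```

For $n = 3$ this yields ten representations. For example, $T(c_{1}, c_{3})$ has profile `("line", "line", "C")`.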

Theorem 4.

The insertion-detection matrix $D$ defined over the pair of fields $(\mathbb{R}, \mathbb{C})$ and associated with the equipped posets $P$ and $P^{d}$ induces a functor $D_{(N_{0}, M)}: \operatorname{rep} P \to \operatorname{rep} P^{d}$ such that, for $U = (U_{0}; U_{x} \mid x \in P) \in \operatorname{rep} P$,

$\begin{aligned} D_{(N_{0}, M)}(U) &= (U_{0}^{d}; U_{x}^{d} \mid x \in P^{d}), \\ U_{0}^{d} &= U_{0}, \\ U_{N_{i}^{+}}^{d} &= \tilde{U}_{N_{0}} + U_{N_{i}}, \quad 0 \leq i \leq n, \\ U_{N_{i}^{-}}^{d} &= U_{N_{i}} \cap U_{M}, \quad 0 \leq i \leq n, \\ U_{x}^{d} &= U_{x} \quad \text{for the remaining points } x \in P, \\ \varphi^{d} &= \varphi \colon U^{d} \to V^{d} \quad \text{for any morphism } \varphi \colon U_{0} \to V_{0}. \end{aligned}$

(11)

Moreover, $D_{(N_{0}, M)}$ is a categorical equivalence between the quotient categories $\mathcal{C} = \operatorname{rep} P / J$ and $\mathcal{C}^{d} = \operatorname{rep} P^{d} / J^{d}$. Here, for fixed $U, V \in \operatorname{rep} P$, $J$ is the ideal of $\operatorname{rep} P$ consisting of the morphisms $\varphi: U \to V$ that factor through direct sums of the indecomposable objects $P(N_{0})$, $T(N_{0})$, and $T(N_{0}, N_{i})$, i.e., $J = \langle P(N_{0}), T(N_{0}), T(N_{0}, N_{i}) \mid 1 \leq i \leq n \rangle$. The ideal $J^{d}$ is defined in the same fashion, i.e., $J^{d} = \langle P(N_{0}^{+}) \rangle$.

Proof.

First, we note that $D_{(N_{0}, M)}$ is an additive functor, since for all morphisms $\varphi: U \to V$ and $\psi: V \to W$ it holds that $D_{(N_{0}, M)}(\psi\varphi)(U_{x}) \subseteq W_{x}$ for any $x \in P$, $D_{(N_{0}, M)}(1_{U}) = 1_{U^{d}}$, and, for any $U, V \in \operatorname{rep} P$, $\operatorname{Hom}(U, V)$ is a $\mathbb{C}$-vector space by definition.

Note that, for fixed $U, V \in \operatorname{rep} P$, it holds that $J(U, V) \subset \operatorname{Hom}(U, V) \subseteq \operatorname{Hom}(U^{d}, V^{d})$ and $J(U, V) \subset J^{d}(U^{d}, V^{d})$. Moreover, let $[X, Y]$ denote the morphism subspace of $\operatorname{Hom}(U, V)$ whose elements satisfy the condition

$\varphi \in [X, Y] \quad \text{if and only if} \quad X \supseteq \ker \varphi \ \text{and} \ \operatorname{img} \varphi \subseteq Y.$

(12)

Then it is easy to see that, for fixed $U, V \in \operatorname{rep} P$ and a morphism $\varphi: U \to V$,

$\begin{aligned} \operatorname{Hom}(U^{d}, V^{d}) &= \operatorname{Hom}(U, V) + J^{d}(U^{d}, V^{d}), \\ \operatorname{Hom}(U, V) \cap J^{d}(U^{d}, V^{d}) &= J(U, V). \end{aligned}$

(13)

Thus, the induced linear map $\delta: \mathcal{C}(U, V) \to \mathcal{C}^{d}(U^{d}, V^{d})$ is an isomorphism.
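The passage from (13) to the isomorphism $\delta$ is an instance of the second isomorphism theorem for vector spaces. The derivation below reads the ideal appearing in (13) as $J^{d}$, the ideal defined on $\operatorname{rep} P^{d}$, and uses $\operatorname{Hom}(U, V) \subseteq \operatorname{Hom}(U^{d}, V^{d})$:

```latex
\begin{aligned}
\mathcal{C}(U,V) = \frac{\operatorname{Hom}(U,V)}{J(U,V)}
  &= \frac{\operatorname{Hom}(U,V)}{\operatorname{Hom}(U,V)\cap J^{d}(U^{d},V^{d})} \\
  &\cong \frac{\operatorname{Hom}(U,V)+J^{d}(U^{d},V^{d})}{J^{d}(U^{d},V^{d})}
   = \frac{\operatorname{Hom}(U^{d},V^{d})}{J^{d}(U^{d},V^{d})}
   = \mathcal{C}^{d}(U^{d},V^{d}).
\end{aligned}
```

The isomorphism sends $\varphi + J(U, V)$ to $\varphi + J^{d}(U^{d}, V^{d})$. Injectivity follows from the second equation of (13) and surjectivity from the first, so $D_{(N_{0}, M)}$ is full and faithful on the quotient categories.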

The density of the functor $D_{(N_{0}, M)}$ follows from the same ideas used to carry out a hierarchical attack on the network defined by the equipped poset $P$. In this case, we write $U_{N_{0}^{+}}^{d} = (U_{N_{0}^{+}}^{d} \cap U_{M}) \oplus X_{0}$, where $X_{0} = \operatorname{span}\{z_{1}, z_{2}, \dots, z_{s}\}$ is a complementary subspace. For these vectors we define corresponding vectors $w_{1}, w_{2}, \dots, w_{s}$.

We note that, for each $i$ with $1 \leq i \leq n$, any subspace $U_{N_{i}^{+}}^{d}$ of the poset $P^{d} \cup \{N_{0}^{+} \trianglelefteq M\}$ can be written in the form

$U_{N_{i}^{+}}^{d} = U_{N_{i - 1}^{+}}^{d} \oplus U_{N_{i}^{-}}^{d} \oplus H_{i} \oplus Y_{i},$

(14)

where $Y_{i}$ denotes an appropriate complementary subspace. Note that $H_{i} \subset U_{N_{0}^{+}} \cap U_{M}$ (this corresponds to hidden malware) and $U_{N_{i}^{-}} \subset U_{M}$ (this corresponds to pure malware).

Let $\{h_{i1}, h_{i2}, \dots, h_{in_{i}}\}$ be a fixed basis of $H_{i}$; then it is possible to define vectors of the form $e_{i1} + i h_{i1}, \dots, e_{in_{i}} + i h_{in_{i}}$ for some suitable vectors $e_{i1}, \dots, e_{in_{i}}$.

Let $I_{0} = \operatorname{span}\{w_{1}, \dots, w_{s}, e_{i1}, \dots, e_{in_{i}} \mid 1 \leq i \leq n\}$, and let $\{y_{i1}, \dots, y_{im_{i}}\}$ be a basis of the subspace $Y_{i}$. Then the representation $L \in \operatorname{rep} P$ given by

$\begin{aligned} L_{0} &= U_{0}^{d} \oplus I_{0}, \\ L_{N_{0}} &= (U_{N_{0}^{+}}^{d} \cap U_{M}) \oplus \operatorname{span}\{z_{1} + i w_{1}, z_{2} + i w_{2}, \dots, z_{s} + i w_{s}\} \oplus \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} (e_{ij} + i h_{ij}), \\ L_{N_{i}} &= U_{N_{i - 1}} \oplus U_{N_{i}^{-}}^{d} \oplus H_{i} \oplus \sum_{i = 1}^{n} \sum_{j = 1}^{n_{i}} (e_{ij} + y_{ij}), \\ L_{x} &= U_{x}^{d} \quad \text{for the remaining points } x \in P, \end{aligned}$

(15)

satisfies $L^{d} = U^{d} \oplus P(N_{0}^{+})^{\dim_{\mathbb{C}} I_{0}}$. We are done. □
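On objects, the functor (11) only applies sums, $\mathbb{C}$-hulls, and intersections of real subspaces, so it can be evaluated numerically. This sketch makes several assumptions. It takes $U_{0} = \mathbb{C}^{m}$. Each $U_{x}$ is a complex generator matrix spanning it over $\mathbb{R}$, with at least one column. Points are keyed by hypothetical names such as `"N0"` and `"M"`. Only the object part of (11) is covered, not morphisms. The intersection is computed from the null space of $[A \mid -B]$ via an SVD.

```python
from typing import Dict

import numpy as np

TOL = 1e-9
# Each subspace of U_0 = C^m is a complex m x k matrix whose columns span it over R.


def realify(G: np.ndarray) -> np.ndarray:
    """Identify C^m with R^{2m}: real parts stacked over imaginary parts."""
    return np.vstack([G.real, G.imag])


def r_dim(G: np.ndarray) -> int:
    """Real dimension of span_R(columns of G)."""
    return 0 if G.shape[1] == 0 else int(np.linalg.matrix_rank(realify(G), tol=TOL))


def c_hull(G: np.ndarray) -> np.ndarray:
    """Generators of the C-hull of span_R(G)."""
    return np.hstack([G, 1j * G])


def r_intersection(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Generators of span_R(A) ∩ span_R(B) from the null space of [A | -B]."""
    m, a = A.shape
    Ar, Br = realify(A), realify(B)
    _, s, vh = np.linalg.svd(np.hstack([Ar, -Br]))
    rank = int((s > TOL).sum())
    x = vh[rank:].T[:a]  # A-coefficients of the vectors with A x = B y
    v = Ar @ x
    return v[:m] + 1j * v[m:]


def detection_functor(U: Dict[str, np.ndarray], n: int) -> Dict[str, np.ndarray]:
    """Object part of (11) on the chain N0..Nn; all other points are copied."""
    chain = {f"N{i}" for i in range(n + 1)}
    Ud = {x: G for x, G in U.items() if x not in chain}
    for i in range(n + 1):
        Ud[f"N{i}+"] = np.hstack([c_hull(U["N0"]), U[f"N{i}"]])  # hull(U_N0) + U_Ni
        Ud[f"N{i}-"] = r_intersection(U[f"N{i}"], U["M"])         # U_Ni ∩ U_M
    return Ud
```

Generator matrices are returned without reduction, so dimensions should be read off with `r_dim` rather than from the column count.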

4. Experimental Data

This section applies Theorems 3 and 4 and Corollary 1 to insert and detect images. First, we show a 256 × 256 original image $I$ extracted from specialized datasets such as FERET and Kaggle. We then create subspaces associated with the poset $P = \{N_{0} \prec N_{1} \prec N_{2} \prec N_{3} \prec N_{4} \prec N_{5}\} \cup \{M\}$ as follows:

The subspace $U_{N_{0}}$ associated with the weak point $N_{0}$ is given by a linear combination of images with the form

$U_{N_{0}} = \sum_{i = 0}^{9} DW_{(0, i)} + A_{10} + 0.01\, M_{10}$

(16)

where $DW_{(0, i)} = \alpha_{i} A_{i} + \beta_{i} M_{i}$ and $0.01 \leq \alpha_{i}, \beta_{i} \leq 0.02$.
For $1 \leq j \leq 5$ , each subspace $U_{N_{j}}$ (associated with a weak point $N_{j}$ ) is given by linear combinations of images with the form

$U_{N_{j}} = \gamma_{j} U_{N_{j - 1}} + \delta_{j} M_{10 + j} + A_{10 + j}, \quad 0.01 \leq \gamma_{j}, \delta_{j} \leq 0.02.$

(17)

The embedded images $M_{j}$, $0 \leq j \leq 15$, span the subspace $U_{M}$ (associated with the strong point $M$). They are considered malware for the images $A_{j}$, and the construction of the subspaces $U_{N_{j}}$ is considered the infection stage.
For the detection process, the subspaces $U_{N_{j}^{-}}$ are given by the images $M_{10 + j}$ (i.e., $U_{N_{j}^{-}} = \operatorname{span}\{M_{10 + j}\}$), $1 \leq j \leq 5$. These constructions constitute the first step of the detection process.
The second step of the detection process consists of building subspaces $U_{N_{j}^{+}}$ given by linear combinations of images of the form

$U_{N_{j}^{+}} = \sum_{i = 0}^{10 + j} \alpha_{i} A_{i} + \sum_{i = 0}^{10 + j} \delta_{i} M_{i}.$

(18)

If $t \in \{0, \dots, 5\}$, $\delta_{t} = 1$, and $0.01 \leq \delta_{i} \leq 0.02$ for every $i \neq t$, then $U_{N_{j}^{+}}$ reveals $M_{t}$ as a kind of malware infecting the image $A_{j}$.
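Equations (16)–(18) can be reproduced on synthetic data to see why setting $\delta_{t} = 1$ exposes $M_{t}$. In this sketch, random 256 × 256 arrays stand in for the dataset images $A_{i}$ and $M_{i}$; no FERET or Kaggle data is used. The correlation-based `revealed_index` is an illustrative detector chosen here, not the authors' procedure, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
SIZE = (256, 256)
# Hypothetical stand-ins: random arrays replace the dataset images A_i and M_i.
A = [rng.random(SIZE) for _ in range(16)]
M = [rng.random(SIZE) for _ in range(16)]


def u_n0() -> np.ndarray:
    """Equation (16): sum_{i=0}^{9} (alpha_i A_i + beta_i M_i) + A_10 + 0.01 M_10."""
    alpha, beta = rng.uniform(0.01, 0.02, (2, 10))
    dw = sum(alpha[i] * A[i] + beta[i] * M[i] for i in range(10))
    return dw + A[10] + 0.01 * M[10]


def u_nj(prev: np.ndarray, j: int) -> np.ndarray:
    """Equation (17): gamma_j U_{N_{j-1}} + delta_j M_{10+j} + A_{10+j}."""
    gamma, delta = rng.uniform(0.01, 0.02, 2)
    return gamma * prev + delta * M[10 + j] + A[10 + j]


def u_nj_plus(j: int, t: int) -> np.ndarray:
    """Equation (18) with delta_t = 1 and every other delta_i in [0.01, 0.02]."""
    k = 11 + j  # indices i = 0, ..., 10 + j
    alpha, delta = rng.uniform(0.01, 0.02, (2, k))
    delta[t] = 1.0
    return sum(alpha[i] * A[i] + delta[i] * M[i] for i in range(k))


def revealed_index(image: np.ndarray) -> int:
    """Illustrative detector: index of the M_i most correlated with the image."""
    scores = [np.corrcoef(image.ravel(), m.ravel())[0, 1] for m in M]
    return int(np.argmax(scores))
```

Because every other coefficient lies in $[0.01, 0.02]$, the term $M_{t}$ dominates $U_{N_{j}^{+}}$. Its correlation with $M_{t}$ is therefore close to 1, while its correlation with every other $M_{i}$ stays small.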

Figures 9–14 show examples of images $A_{j}$ (original images) associated with the subspaces $U_{N_{j}}$, $U_{N_{j}^{-}}$ (first step), and $U_{N_{j}^{+}}$ (denoted $H_{j}$ in the second step) for $0 \leq j \leq 5$. We compare the associated histograms and note that the histograms associated with the second step suggest embedded malware.
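The histogram comparison is described qualitatively. One simple way to quantify it, offered here only as an assumption, is the L1 distance between normalized histograms of min-max rescaled pixel values. The bin count and the function names are hypothetical choices.

```python
import numpy as np


def normalized_histogram(image: np.ndarray, bins: int = 64) -> np.ndarray:
    """Histogram of the pixel values after min-max rescaling to [0, 1], summing to 1."""
    lo, hi = float(image.min()), float(image.max())
    scaled = (image - lo) / (hi - lo) if hi > lo else np.zeros_like(image, dtype=float)
    counts, _ = np.histogram(scaled, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum()


def histogram_distance(a: np.ndarray, b: np.ndarray, bins: int = 64) -> float:
    """L1 distance between normalized histograms: 0 when identical, at most 2."""
    return float(np.abs(normalized_histogram(a, bins) - normalized_histogram(b, bins)).sum())
```

Adding a strong embedded component to an image changes the shape of its pixel distribution (for instance, two uniform images add up to a triangular one), and that change shows up as a larger distance.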

5. Concluding Remarks and Future Work

Hierarchical attacks designed for peer-to-peer remote control via metamorphic worms induce different algebraic structures. First, the infection process defines so-called equipped posets. These posets constitute a mathematical model of a hierarchical attack in which the nodes are either weak or strong, according to whether the node represents an infection with hidden malware or with pure malware. Pure malware is relatively easy to detect, whereas hidden malware requires deep scanning analysis. Second, modeling such an analysis gives rise to categories of representations of equipped posets over the pair of fields $(\mathbb{R}, \mathbb{C})$, and malware insertion-detection defines a categorical equivalence between quotient categories.

Future Work

Since this work focuses on the algebraic properties of hierarchical attacks, it remains an open problem to determine the properties associated with more general types of infections and NIDS based on deep learning algorithms.

Another task for future work is to apply the proposed theoretical framework to real-world malware intrusion and detection.

Author Contributions

Investigation, writing, review and editing, A.M.C., O.M.M., J.D.C.V. All authors have read and agreed to the published version of the manuscript.

Funding

Center of Excellence in Scientific Computing (CoE-SciCo) Universidad Nacional de Colombia.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

$R$	(Real numbers)
$C$	(Complex numbers)
DI	(Algorithm of differentiation with respect to a suitable pair of points)
Poset	(Partially Ordered Set)
$P$	(Equipped poset)
⊙	(Strong point)
⊖	(Weak point)
$\| \|$	(Strong relation in an equipped poset)

References

Szor, P. The Art of Computer Virus Research and Defense; Pearson Education Inc.: Hoboken, NJ, USA, 2005.
Venkatachalam, S. Detecting Undetectable Computer Viruses. Master’s Thesis, San José State University, San José, CA, USA, 2010.
Alzarooni, K.M.A.Y. Malware Variant Detection. Ph.D. Thesis, University College London, London, UK, 2012.
Konstantinou, E. Metamorphic Virus: Analysis and Detection; Technical Report; Royal Holloway, University of London: London, UK, 2008.
Cohen, F.B. A Short Course on Computer Viruses; Wiley Professional Computing: New York, NY, USA, 1994.
Matrosov, A.; Rodionov, E.; Harley, D.; Malcho, J. Stuxnet under the microscope. ESET LLC 2010, 6, 1–85.
Ploszek, R.; Švec, P.; Debnár, P. Analysis of encryption schemes in modern ransomware. Rad Hazu Matematičke Znanosti 2021, 25, 1–13.
Cannarile, A.; Carrera, F.; Galantucci, S.; Iannacone, A.; Pirlo, G. A study on malware detection and classification using the analysis of API calls sequences through shallow learning and recurrent neural networks. In Proceedings of the TASEC’22: Italian Conference on Cybersecurity, Rome, Italy, 20–23 June 2022; Volume 3260, pp. 1–11.
Amer, E.; Zelinka, I. A dynamic Windows malware detection and prediction method based on contextual understanding of API call sequence. Comput. Secur. 2020, 92, 1–15.
Hu, W.; Tang, Y. Black-box attacks against RNN based malware detection algorithms. In Proceedings of the AAAI Workshops, New Orleans, LA, USA, 2–7 February 2018; pp. 245–251.
He, K. Malware Detection with Malware Images using Deep Learning Techniques. Bachelor’s Thesis, University of Canterbury, Canterbury, UK, 2018.
Nataraj, L.; Karthikeyan, S.; Jacob, G.; Manjunath, B.S. Malware images: Visualization and automatic classification. In VizSec ’11: Proceedings of the 8th International Symposium on Visualization for Cyber Security; ACM: Pittsburgh, PA, USA, 2011; pp. 1–7.
Iglesias Perez, S.; Criado, R. Increasing the effectiveness of network intrusion detection systems (NIDSs) by using multiplex networks and visibility graphs. Mathematics 2023, 11, 107.
Kumar, J.; Subbiah, G. Zero-day malware detection and effective malware analysis using shapley ensemble boosting and bagging approach. Sensors 2022, 22, 2798.
Kaspersky Enterprise Cybersecurity. Machine Learning for Malware Detection. 2017. Available online: media.kaspersky.com (accessed on 7 June 2023).
Tayyab, U.-E.-H.; Khan, F.B.; Durad, M.H.; Khan, A.; Lee, Y.S. A Survey of the Recent Trends in Deep Learning Based Malware Detection. J. Cybersecur. Priv. 2022, 2, 800–829.
Aslan, Ö.A.; Samet, R. A comprehensive review on malware detection approaches. IEEE Access 2020, 8, 1–23.
Webster, M.; Malcolm, G. Detection of metamorphic and virtualization-based malware using algebraic specification. J. Comp. Virol. 2009, 5, 221–245.
Zavadskij, A.G. On Two Point Differentiation and its Generalization. Algebr. Struct. Their Represent. AMS Contemp. Math. Ser. 2005, 376, 413–436.
Zavadskij, A.G. Tame equipped posets. Linear Algebra Appl. 2003, 365, 389–465.
Cañadas, A.M.; Gaviria, I.D.M. Categorical Properties of Some Algorithms of Differentiation for Equipped Posets. Algebra Discret. Math. 2022, 33, 38–86.
Cañadas, A.M.; Vargas, V.C. On the apparatus of differentiation DI-DV for posets. São Paulo J. Math. Sci. 2019, 9, 249–286.
Mantovani, A.; Aonzo, S.; Ugarte-Pedrero, X.; Merlo, A.; Balzarotti, D. Prevalence and impact of low-entropy packing schemes in the malware ecosystem. In Network and Distributed Systems Security (NDSS) Symposium; NDSS: San Diego, CA, USA, 2020; pp. 1–15.
Lyda, R.; Hamrock, J. Using entropy analysis to find encrypted and packed malware. IEEE Secur. Priv. 2007, 5, 40–45.
Lee, K.; Lee, S.-Y.; Yim, K. Machine learning based file entropy Analysis for ransomware detection in backup systems. IEEE Access 2019, 7, 110205–110215.
Perdisci, R.; Lanzi, A.; Lee, W. Classification of packed executables for accurate computer virus detection. Pattern Recognit. Lett. 2008, 29, 1941–1946.
Ugarte-Pedrero, X.; Santos, I.; Sanz, B.; Laorden, C.; Bringas, P.G. Countering entropy measure attacks on packed software detection. In Proceedings of the Consumer Communications and Networking Conference (CCNC), Las Vegas, NV, USA, 14–17 January 2012; pp. 164–168.
Raphel, J.; Vinod, P. Information theoretic method for classification of packed and encoded files. In Proceedings of the 8th International Conference on Security of Information and Networks, SIN’15, Sochi, Russia, 8–10 September 2015; ACM: New York, NY, USA, 2015; pp. 296–303.
Lim, C.; Ramli, K.; Cheng, W.; Kotualubun, Y.S. Mal-flux: Rendering hidden code of packed binary executable. Digit. Investig. 2019, 28, 83–95.
Menéndez, H.D.; Bhattacharya, S.; Clark, D.; Barr, E.T. The arms race: Adversarial search defeats entropy used to detect malware. Expert Syst. Appl. 2019, 118, 246–260.
Menéndez, H.D.; Llorente, J.L. Mimicking anti-viruses with machine learning and entropy profiles. Entropy 2019, 21, 513.
Chen, S.-W.; Chuang, T.-H.; Tien, C.-W.; Chen, C.-W. An experience in enhancing machine learning classifier against low-entropy packed malwares. Comput. Sci. Inf. Technol. 2021, 11, 4.
Cheng, W.; Guilley, S.; Carlet, C.; Danger, J.L.; Mesnager, S. Leakages in code-based masking: A unified quantification approach. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2021, 2021, 465–495.
Li, Y.; Liu, S.; Guilley, S.; Tang, M. Analysis of multiplicative low entropy masking schemes against correlation power attack. IEEE Trans. Inf. Forensics Secur. 2021, 16, 4466–4481.
Zhang, Z.; Ding, A.A.; Fei, Y. A guessing entropy-based framework for deep learning-assisted side-channel analysis. IEEE Trans. Inf. Forensics Secur. 2023, 18, 3018–3030.
Grosso, V.; Standaert, F.X.; Prouff, E. Low entropy masking schemes, Revisited. In Smart Card Research and Advanced Applications; CARDIS, 2013; Lecture Notes in Computer Science; Fr, A., Rohatgi, P., Eds.; Springer: Cham, Switzerland, 2014; Volume 8419.
Ye, X.; Eisenbarth, T. On the vulnerability of low entropy masking schemes. In Proceedings of the Smart Card Research and Advanced Application Conference, Berlin, Germany, 27–29 November 2013.
Zhang, Z.; Dofe, J.; Yu, Q. Improving power analysis attack resistance using intrinsic noise in 3D ICs. Integration 2020, 73, 30–42.
Hua, J.; Zhou, Z.; Zhong, S. Flow misleading: Worm-hole attack in software-defined networking via building in-band covert channel. IEEE Trans. Inf. Forensics Secur. 2021, 16, 1029–1043.
Adesso, P.; Cirillo, M.; Di Mauro, M.; Matta, V. ADVoIP: Adversarial detection of encrypted and concealed VoIP. IEEE Trans. Inf. Forensics Secur. 2020, 15, 943–958.
Yilmaz, B.B.; Callan, R.L.; Prvulović, M.; Zajić, A.G. Capacity of the EM covert/side-channel created by the execution of instructions in a processor. IEEE Trans. Inf. Forensics Secur. 2018, 13, 605–620.

Figure 1. Generations of a complex metamorphic virus [1].

Figure 2. Matrix $c(N_{i})$ associated with a node in a hierarchical attack.

Figure 3. Hasse diagram of the poset $P = \{a, b, c, d, e, f\}$.

Figure 4. Hasse diagrams of a poset $P_{(a, b)}$ with a suitable pair of points $(a, b)$ and its corresponding derived poset $P'_{(a, b)}$.

Figure 5. Diagram of an equipped poset $P$ induced by a hierarchical attack.

Figure 6. Diagram of an equipped poset $P^{d}$ induced by malware detection.

Figure 7. Diagram of an equipped poset $P^{d}$ induced by malware detection.

Figure 8. Diagrams of hierarchical attacks with and without hidden malware.

Figure 9. Images associated with the subspace $U_{N_{0}}$. In the first step of the malware detection process, we extract simple malware of type $M_{j}$, generating the subspaces $U_{N_{j}^{-}}$. In the second step, the algorithm extracts hidden (hard) malware $M_{j}$, defining the subspaces $U_{N_{j}^{+}}$, also denoted $H_{j}$.

Figure 10. Images associated with the subspaces $U_{N_{1}}$, $U_{N_{1}^{-}}$, and $U_{N_{1}^{+}}$.

Figure 11. Images associated with the subspaces $U_{N_{2}}$, $U_{N_{2}^{-}}$, and $U_{N_{2}^{+}}$.

Figure 12. Images associated with the subspaces $U_{N_{3}}$, $U_{N_{3}^{-}}$, and $U_{N_{3}^{+}}$.

Figure 13. Images associated with the subspaces $U_{N_{4}}$, $U_{N_{4}^{-}}$, and $U_{N_{4}^{+}}$.

Figure 14. Images associated with the subspaces $U_{N_{5}}$, $U_{N_{5}^{-}}$, and $U_{N_{5}^{+}}$.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cañadas, A.M.; Mendez, O.M.; Vega, J.D.C. Algebraic Structures Induced by the Insertion and Detection of Malware. Computation 2023, 11, 140. https://doi.org/10.3390/computation11070140

AMA Style

Cañadas AM, Mendez OM, Vega JDC. Algebraic Structures Induced by the Insertion and Detection of Malware. Computation. 2023; 11(7):140. https://doi.org/10.3390/computation11070140

Chicago/Turabian Style

Cañadas, Agustín Moreno, Odette M. Mendez, and Juan David Camacho Vega. 2023. "Algebraic Structures Induced by the Insertion and Detection of Malware" Computation 11, no. 7: 140. https://doi.org/10.3390/computation11070140

APA Style

Cañadas, A. M., Mendez, O. M., & Vega, J. D. C. (2023). Algebraic Structures Induced by the Insertion and Detection of Malware. Computation, 11(7), 140. https://doi.org/10.3390/computation11070140
