1. Introduction
The growing technological development of recent years and the evolution of communication devices have changed our daily habits. The introduction of smartphones and tablets in our lives has allowed greater globalization, in fact, anyone can access the same information at the same time, thus reducing cultural differences and barriers.
However, the technology spread too fast due to the possibility of buying models of smartphones (or tablets) of medium–high quality at affordable prices but also allowing users to download many applications freely.
As result of this rapid spread, users have not had adequate “training” for the correct use of devices. People use smartphones without thinking about the problems related to their use, because these devices handle data of different types everyday, among which, we have personal information, the principal target of cybercriminals [
1,
2].
Among the various operating systems for the mobile environment, the most popular is Android. Android is widespread and has an “open nature”, allowing attackers to hit a significant amount of users quite easily [
3,
4].
On mobile devices powered by the Android operating system, it is possible to download applications of various kinds and with different functionalities. Usually, people consider the official marketplace (for instance, Google Play in the Android environment) when attempting to download desired applications, but sometimes, they use unofficial markets (for instance, AppChina) to obtain applications for free, which usually require payment on the official store, or users will install applications not available on the official Android market [
5].
In most cases, third-party markets turn out to be untrustworthy [
6]; however, we can find infected applications even in official markets [
7]. In this scenario, there are two main actors: the
attackers, characterized by cybercriminals who implement malicious code to attack users and steal their information, and the
defenders, characterized instead by creating tools, usually called anti-malware, that are able to detect the presence of threats [
8]. Anti-malware applications, however, are unable to identify new threats, since recognition can only take place if the information about the threat is already known and stored in their repository [
9].
In order to increase the complexity of the malicious payloads (i.e., the code aimed to perform the harmful action) and thus be capable of evading anti-malware programs, the malware writers developed a new threat called Colluding Attack. A colluding attack consists of dividing the malicious action between two or more applications. In this way, the anti-malware, analyzing any single application will not able to identify the potential threat, since it is the communication between these applications that will launch the attack.
The attack occurs in the following way: we have two infected applications that communicate with each other when the user performs a certain action or when an event occurs in the system. The first application reads the user’s sensitive data and then sends them to the second one, which then transmits them to a 3rd party. We can deduce that the first application has the permission to read the data, while the second one has the permission to connect to a network, thus sharing the information.
In the Android environment, the applications may communicate through the Inter-Component Communication (ICC): this mechanism is useful for the developer by allowing functionality reuse [
10]. The presence of this mechanism means that applications are not always independent from each other, because the inter-application collaboration expects an information exchange between components belonging to the same application or to different applications [
11]. The second possibility mentioned can be used by malicious applications to attack users’ data [
12,
13].
To detect and verify the presence of colluding applications on Android, we developed a method based on the use of model checking techniques [
9]. We propose an heuristic function that reduces the number of candidate applications to the analysis. The function is defined based on the
-calculus temporal logic and model checking. In this work, we focus our attention on different ICC, in particular:
SharedPreferences,
ExternalStorage,
BroadcastReceiver, and
Remote Procedure Calls (RPC).
This kind of communications can be used to launch an attack, as follows:
- -
with the SharedPreferences, it is possible to store key–value pairs of data, configuration, and preferences. So an application can save some settings (containing data information) into a shared preference file, which could be read by the receiving application;
- -
with the ExternalStorage, we consider the ability to store the information in a file. In this case, for example, an application can read data and send them to a second application via an intent. The second application, using the external storage, can send the data though an external network (for instance, Internet);
- -
with the
BroadcastReceiver, it is possible to be notified of an event occurrence at the system level (e.g., receiving an SMS). It is useful for the instant management of special events. Broadcast receivers respond to broadcast messages sent using Intent objects, which can come from the same application or from others applications installed on the device. So, with this component, it is possible to have applications that receive data both via broadcast receivers and SMS messages [
10];
- -
with the RPC, it is possible for an Android application to call a method to execute in a different address space (typically on another Android application), which is coded as if it were a normal (i.e., local) procedure call, without the developer explicitly coding the implementations for the remote interaction: the developer writes the same code whether the subroutine is local to the executing application or remote. Android provides several distinctive components, for instance, Activity and Service, which are able to perform this kind of communication by also sharing data between these components. In this attack, the collusion happens when an Android component (for instance an Activity) starts a component in another application (by invoking a Service in the second application) by also sharing data between the Activity and the Service components.
We chose ICC attacks since they are the most common ones used to launch colluding interactions, as demonstrated in [
14,
15,
16,
17].
The detection of colluding behavior in the Android environment, by exploiting model checking, was already explored by the authors of [
18]. Below, we itemize the main points introduced in this paper with respect to the work in [
18]:
- -
in this paper, we consider the detection of four colluding attacks (i.e.,
SharedPreferences,
ExternalStorage,
BroadcastReceiver, and
RPC) whereas the work in [
18] focused only on the
SharedPreferences colluding attack. This represents the main contribution of the work and, at the best of knowledge authors, this is the first contribution devoted to the detection of these four different harmful colluding attacks;
- -
we propose an algorithm for the automatic detection of collusion. The authors in [
18] propose a set of manually designed properties for the detection of
SharedPreferences collusion. In this work, we propose an algorithm for the automatic generation of properties for the detection of whether two or more applications are performing a colluding attack through
SharedPreferences,
ExternalStorage,
BroadcastReceiver, or
RPC;
- -
the proposed method can also detect colluding attacks performed by more than two applications, whereas the method in [
18] can detect collusion between only two applications. This is important, as colluding attacks are usually perpetrated by more than two applications [
16,
19] in order to have more chances to activate the malicious behavior;
- -
we developed and evaluated 20 (10 aimed to send data and the remaining 10 for data receiving) Android applications performing a colluding attack by exploiting the RPC communication channel;
- -
we experimented with an extended set of colluding and not-colluding Android applications to evaluate the proposed method, also taking into account the colluding applications aimed to share information by exploiting
ExternalStorage,
BroadcastReceiver, and
RPC, and we obtain better performance-related results as compared to [
18], where an accuracy equal to 0.99 is obtained; our proposed method reaches an accuracy equal to 1.
It should be noted that the other works in this research field are focused on the analysis of one application at a time, whereas our method, by analyzing multiple applications together, allows us to identify the presence of an inter-application communication. Furthermore, the other research base their work on the use of machine learning techniques, unlike ours, which is based on model checking techniques. The paper proceeds as follows: in the next section, we provide background information regarding the model checking technique, so that readers can fully grasp the information presented in this paper; in
Section 3, the proposed method for the detection of colluding inter-app communication is presented;
Section 4 presents the experimental analysis results to demonstrate the effectiveness of the proposed approach; the current literature regarding collusion attack detection is discussed in
Section 5; in the last section, conclusions and future research directions are presented.
2. Model Checking Background
Preliminary concepts about the model checking technique implemented in our proposed approach are provided below. To apply model checking, we first represent a system using a formal specification language. In the following, we use the calculus of communicating systems (CCS) in a slightly extended form [
20]. Systems are described in terms of processes. Processes perform actions, evolving to become new (and usually different) processes after each action. The syntax to describe processes is shown in (1):
where
ranges over a finite set of actions
. The action
is called the
internal action. The set of
visible actions,
, ranged over by
l, is defined as
. The set
L, in processes of the form
, is a set of actions such that
; whereas the relabeling function
f, in processes of the form
, is a total function,
, such that the constraint
is respected. Each action
(resp.
) has a
complementary action (resp.
l). It holds that
. Given
, with
we denote the set
. Variable
x ranges over a set of
constant names: each constant
x is defined by a constant definition
, where
p is called the
body of
x.
is the constant defined as
that corresponds to a process whose task is to terminate performing only the action
. The process
cannot perform any action; it is also said to be “deadlocked”. We denote the set of all processes by
.
Given a set
of constant definitions, the standard
operational semantics is given by a relation
.
(
for short) is the least relation defined by the rules in
Table 1. Given a process
p the semantic of
p is the automaton is called
standard transition system for
p and is denoted
.
We now informally explain the semantics of an extended CCS process. Note that there is no rule for the process
, which thus cannot perform any action. In the
Done rule, shown in (2) in
Table 1, the process can perform
and then reaches a deadlocked state.
In
Act rule, shown in (3) in
Table 1, the process
can perform the action
to become the process
p.
Seq and
Seq rules represent the sequentialization of two processes, as shown in (4) and (5) in
Table 1. The
q process can start its execution only when the
p process has terminated its execution by performing the
action.
The
Sum rule, shown in (6) in
Table 1, states that
p and
q are alternative choices for the behavior of
. The operator ∥ expresses the parallel execution. The
Par rule, shown in (7) in
Table 1, shows how processes in a parallel composition can behave autonomously: if the process
p performs
and becomes
, then
performs
and becomes
(similarly for
q). When
Com and
Com rules (shown in (8) and (9) in
Table 1) is used, we say that a
handshake occurs. A handshake occurs only if two processes can simultaneously execute the complementary actions; the handshake results in an internal communication (the action
). When both possesses
p and
q have terminated their execution, the
process becomes
. The operator
, in the
Res rule (shown in (9) in
Table 1), prevents actions in
to be done: if
p can perform
to become
, then
can perform
to become
only if
. In the
Rel rule, the operator
renames actions by means of the relabeling function
f: if
p can perform
to become
, then
can perform
to become
. Finally, a constant
x behaves as
p if
as stated in the rule
Con, (shown in (12) in
Table 1). Roughly speaking, the
Con rule states that a process behaves like its definition.
Once we obtain the formal model of a system
S, we have to prove properties about
S. This is accomplished by using a temporal logic [
21]. We use
mu-calculus logic [
21]—the syntax is reported in (13). We suppose that
Z ranges over a set of variables,
K and
R range over sets of actions
.
The satisfaction of a formula by a state s of a transition system, denoted by , is so defined:
- -
each state satisfies
tt and no state satisfies
ff (as shown by (14) in
Table 2);
- -
a state satisfies
(
) if it satisfies
or (and)
(as shown by (15) in
Table 2).
- -
and
are the modal operators (as shown by (17) and (18) in
Table 2):
is satisfied by a state, which, for every performance of an action in
K, evolves in a state obeying
; while
is satisfied by a state, which can evolve to a state obeying
by performing an action in
K.
In
Table 2 is reported the precise definition of the satisfaction of a closed formula
by a state
s (denoted
).
and
are the fixed point formulae, where
(
) (as shown by (19) and (20) in
Table 2)
binds free occurrences of
Z in
. An occurrence of
Z is free if it is not within the scope of a binder
(
). A formula is
closed if it contains no free variables.
is the least fix-point of the recursive equation
, while
is the greatest one. A transition system
T satisfies a formula
, denoted
, if and only if
, where
q is the initial state of
T. A CCS process
p satisfies
if
.
In (21)–(23), we consider the following abbreviations (where
K ranges over sets of actions and
is the set of all actions):
Finally, once defined the model and the temporal logic properties, we need something enabling us to check whether the model satisfies the defined properties. To this aim, formal verification is considered, a system process exploiting mathematical reasoning to verify if a system (i.e., the model) satisfies some requirement (i.e., the temporal logic properties).
In recent years, several verification techniques were proposed, e.g., in [
22], model checking is considered.
In the model checking technique, the properties are formulated in temporal logic: each property is evaluated against the system i.e., the Labeled Transition System (LTS)-based model. The model checker accepts as input a model and a property, it returns “true” whether the system satisfies the formula and “false” otherwise. The performed check is an exhaustive state space search that is guaranteed to terminate since the model is finite. In this paper, we use the Concurrency WorkBench of the New Century (CWB-NC) (
https://www3.cs.stonybrook.edu/~cwb/), a widespread formal verification environment. The (CWB-NC) is our model checker and checks the presence of the PUT and GET properties in the applications, which finally verifies if there is a collusion. Note that we can easily use
CWB-NC for the extended CCS since, as shown in [
20], the new operators can be easily translated using the standard CCS ones.
3. Colluding Inter-App Communication Detection
In this section, we present the proposed method for the detection of Android-based colluding applications. We remark that our proposal represents the first method designed for the detection of different kinds of colluding attacks: ExternalStorage, SharedPreferences, BroadcastReceiver, and RPC.
Figure 1 shows an overview about the proposed approach.
We began our investigation with an Android
Device, where the installed applications are gathered by obtaining the APKs at the
/data/app url in the device internal memory that is possible to read without root permission [
23]: the permissions on that directory are
rwxrwx–x. The APK extension indicates an Android Package file. This file format, basically a variant of the JAR file, is considered for the distribution and installation of bundled components on the Android mobile platform [
24]. The APK contains all the resources for the application running, for instance, external libraries, images, audio, and the set of .class files of the application, stored in the
classes.dex file, in a format translated for Dalvik, the Android virtual machine.
Once we obtain the APKs of the
Installed Applications, through a reverse engineering process, we obtain the Java bytecode for each APK. We obtain the Java bytecode representation for each method of Android applications by invoking the
(
https://github.com/pxb1988/dex2jar) tool to obtain from the classes.dex file (the executable file targeting the Android virtual machine) the .jar one (the executable targeting the Java virtual machine), and the Byte Code Engineering Library (BCEL) (
https://commons.apache.org/proper/commons-bcel/), to obtain the classes stored in the .jar file their Java bytecode representations. We consider the Java bytecode because is already possible to obtain even the APKs were obfuscated [
25,
26,
27,
28], for instance, using the R8 compiler built-in in the last Android studio release (
https://developer.android.com/studio/build/shrink-code), providing a kind of obfuscation to shorten the names of the classes and methods, but also optimization techniques that can make the code more difficult to read, applying aggressive strategies to reduce the size of the developed application.
Once we obtained the Java bytecode related to the installed APKs, we generated a model for each application (i.e.,
Automata Generation step in
Figure 1). In the following, we explain how we generate the automata. We considered a CCS model for each method of the APK. This is obtained by translating each Java bytecode instruction in a CCS process. The precise definition of the translation can be found in [
6,
7].
With regard to the sequential Java bytecode instructions, the translation is the following:
where
represents the current instruction under analysis, whereas
is the process representing the successive instruction and finally,
is the name of the Java bytecode instruction. Note that to express constant definitions we use the syntax of the Concurrency Workbench of New Century [
20], the formal verification environment used for the experimentation. Thus, instead of
we write
.
An example of CCS translation from the translation of sequential op-code instructions is shown in Listings 1 and 2.
Listing 1. CCS process for Listing 1. |
proc P1 | = | store .M2 |
proc P2 | = | load .M3 |
proc P3 | = | return . nil |
Listing 2. CCS process for Listing 2. |
proc P1 | = | dup .M2 |
proc P2 | = | invokesubstring .M3 |
proc P3 | = | pushConstant .M4 |
proc P4 | = | newStringBuilder .M5 |
proc P5 | = | invokeinit .M6 |
proc P6 | = | pop .M7 |
proc P7 | = | return . nil |
In Listing 1, there are only three actions i.e., store, load, and return, whereas in Listing 2, there are more actions, for instance, related to method invocations (i.e., invokesubstring and invokeinit) to the definition of new Java objects (i.e., newStringBuilder), stack instruction as pop (to discard the top value on the stack) or dup (to duplicate the value on top of the stack), but also the push instruction, aimed to push a variable into the stack (in this case the variable is “Constant”).
Branch instructions are used to change the sequence of the instruction execution. We consider, as explained in
Section 2, the + operator to manage the choice.
A CCS process was built for each method of the application under analysis. Let be an application under analysis. Supposing that the has n methods, i.e., , the CCS representation has n CCS processes.
Once generated, the automata, in terms of CCS process, are considered as an input to the Model Checker with a set of formulae aimed to verify whether the applications exhibit a GET or a PUT operation related to an ExternalStorage, SharedPreferences, BroadcastReceiver, or RPC. For this verification, we exploited a series of Pre Filtering Properties. The automata that give the result TRUE from the Model Checker will compose the Candidate Colluding Apps.
With regard to the SharedPreferences, and similarly for the ExternalStorage, the BroadcastReceiver, and the RPC, an application can execute two different operations on a shared resource: PUT and GET. For the SharedPreferences, we encode the following actions by exploiting the -calculus logic:
- -
when an application executes a PUT action on a shared resource, the formula (
Table 3—
) results true if the following actions are performed:
invokegetSharedPreferences,
invokeedit,
invokeputString/
invokeputInt/
invokeputFloat, and
invokecommit;
- -
when instead an application executes a GET action on a shared resource, the formula (
Table 3—
) results true if the following actions are performed:
invokegetSharedPreferences,
invokegetString/
invokegetInt/
invokegetFloat.
Table 4 shows the
and the
detect the PUT and GET operations, respectively, on the
ExternalStorage. In this case, we encode the following actions by exploiting the
-calculus logic:
- -
when an application executes a PUT action on a shared resource, the formula (
Table 4—
) results true if the following actions are performed:
invokegetExternalStorageDirectory and
invokewrite;
- -
when instead an application executes a GET action on a shared resource, the formula (
Table 4—
) results true if the following actions are performed:
invokegetExternalStorageDirectory and
invokereadFully.
Table 5 shows the
and the
detects PUT and GET operations, respectively, on the
BroadcastReceiver. In this case, we encode the following actions by exploiting the
-calculus logic:
- -
when an application executes a PUT action on a shared resource, the formula (
Table 5—
) results true if the following actions are performed:
invokeputExtra,
invokesendBroadcast;
- -
when instead an application executes a GET action on a shared resource, the formula (
Table 5—
) results true if the following actions are performed:
invokegetStringExtra,
invokestart.
In
Table 6, we show the
and the
detect PUT and GET operations, respectively, when the
RPC channel is exploited. In this case, we encode the following actions by exploiting the
-calculus logic:
- -
when an application executes a PUT action on a Remote Procedure Call (RPC) channel, the formula (
Table 6—
) results true if the following actions are performed:
invokeputExtra,
invokestartService;
- -
when instead an application executes a GET action on a RPC channel, the formula (
Table 6—
) results true if the following actions are performed:
invokegetStringExtra and
invokevirtual.
We highlight that the CWB-NC is exploited in two different times by the proposed method: the first one to detect the methods candidate for the collusion (by detecting and ), and the second time to detect the applications that collude with each other, by identifying also the shared resources.
The idea behind these properties is to obtain in a short time window four different sets of applications, each set related to a different collusion attacks (i.e., SharedPreferences, ExternalStorage, BroadcastReceiver, and RPC), and in each set we have the classes (i.e., the CCS automata) verifying the PUT and the GET properties for each collusion attack. The reduction in processing costs is given thanks to model checking (which considers as input a class modeled in terms of CCS and a temporal logic formula), which performs a screening and selects only the classes that could potentially generate a collusion. In this way, the number of classes to be tested is significantly reduced.
Once obtained from the
Installed Applications the set of the applications that can potentially exhibit colluding behavior, the next steps depicted in
Figure 1 detects whether two or more applications can effectively share information, thus performing a collusion attack.
Thus, for each automata belonging to the Candidate Colluding Apps, we generate a simplified version containing only the action that can be involved in the collusion attack: for this reason, the actions that cannot be involved in the collusion attacks are set with actions. In particular, the models resulting from the -automata step contains the push action, symptomatic of read and write operations on a variable and a set of actions discriminating the SharedPreferences, the ExternalStorage, the BroadcastRecevier, and the RPC colluding attacks. From the -automata models we automatically generate a set of properties, aimed to verify whether two or more applications can effectively perform a colluding attack.
Let us better understand how we built the -automata with an example. In Listing 3, we show an example of automaton related to a PUT operation of an ExternalStorage collusion attack, and in Listing 4, we show the related -automaton.
Listing 3. CCS process for Listing 3. |
proc M1 | = | invokeinit .M2 |
proc M2 | = | invokegetExternalStorage .M3 |
proc M3 | = | pushConstant1 .M4 |
proc M4 | = | store .M5 |
proc M5 | = | pushConstant2 .M6 |
proc M6 | = | load .M7 |
proc M7 | = | pushConstant3 .M8 |
proc M8 | = | invokewrite .M9 |
proc M9 | = | return . nil |
Listing 4. CCS process for Listing 4. |
proc M1 | = | t .M2 |
proc M2 | = | invokegetExternalStorage .M3 |
proc M3 | = | pushConstant1 .M4 |
proc M4 | = | t .M5 |
proc M5 | = | pushConstant2 .M6 |
proc M6 | = | t .M7 |
proc M7 | = | pushConstant3 .M8 |
proc M8 | = | t .M9 |
proc M9 | = | t . nil |
The automaton shown in Listing 3 results in
true to the property related to the
ExternalStorage PUT (i.e.,
in
Table 4): as a matter of fact it shows an
action and, after an undefined number of actions, the
one. Listing 4 shows the
-automaton related to the automaton shown in Listing 3: all the actions are translated into
actions (not of interest for the collusion detection), with the exception of the actions involving a
push operation (in this example
,
, and
, respectively, to push into the stack the Constant1, Constant2, and Constant3 variables) and the
one. In fact, we recall that for the GET and PUT automata, in addition to the push actions, the
-automata consider the
action for the
ExternalStorage colluding attack whereas, for the
SharedPreferences attack we consider the
action and, for the
BroadcastReceiver attack we consider the
,
(for the PUT),
,
(for the GET), and the push under analysis (for both the GET and PUT
BroadcastReceiver properties).
Once obtained, all the
-automata for the GET and the PUT operations for the three collusion attacks we considered, we designed an algorithm, the flowchart of which is shown in
Figure 2 for the automatic
Colluding Properties Generation step, in particular, for the generation of the properties for the
ExternalStorage, the
SharedPreferences, the
BroadcastReceiver, and the
RemoteProcedureCall colluding detection.
The idea behind the designed algorithm is to automatically infer a set of properties and, whether a PUT and a GET automata (related to SharedPreferences, ExternalStorage, BroadcastReceiver, or the RemoteProcedureCall) result in true to the properties including the same push action, in this case, we mark the PUT and the GET automata as performing a colluding attack.
Below, we explain the algorithm steps in detail by considering the flowchart shown in
Figure 2.
For each attack, we consider i.e.,
SharedPreferences,
ExternalStorage,
BroadcastReceiver, and
RPC, we take as input the set of
-automata related to the PUT operation (
set of PUT t-automata in the flowchart in
Figure 2). For each PUT automaton, we obtain all the PUSH operations (by performing a
operation for displaying the set of visible actions of the automata (
http://courses.cs.vt.edu/cs5204/fall00/CWB/top-level-coms.html)) and, for each action considering a push operation we automatically generate rules involved the push and following actions:
- -
with regard to the SharedPreferences, we consider the and the push under analysis;
- -
with regard to the ExternalStorage, we consider the and the push under analysis;
- -
with regard to the BroadcastReceiver, we consider the , (for the PUT property), , and (for the GET property);
- -
with regard to the RemoteProcedureCall, we consider the , (for the PUT property), , and (for the GET property).
The properties are automatically generated by looking only the PUT
-automata, as shown from the flowchart in
Figure 2.
For instance, considering the PUT
ExternalStorage -automaton shown in Listing 4, the properties automatically generated by the proposed algorithm are shown in
Table 7: for each push action of the
-automaton a property is generated.
This process is repeated for each push action belonging to the PUT
-automata considered. When all the properties are generated for all the push actions of all the
-automata, the properties are stored in a file (
write generated properties in the flowchart in
Figure 2). Subsequently, the model checker with the PUT and GET
t-automata and the properties obtained from the
Colluding Properties Generation step in
Figure 1 is invoked: the properties are first verified on the PUT
t-automata and the formulae resulting TRUE are then checked with the GET
t-automata: if the same property is resulting TRUE also on the GET
-automata there is a collusion. In this way, we found that the colluding attack is possible between the PUT and GET
t-automata i.e., between the (two or more) applications, whose PUT and GET models result in TRUE to the automatically generated property.
Once we explained how the proposed approach works, below we present an example to better understand how the CCS automaton are built and how the automatic properties can detect the colluding attack. We consider two real-world applications performing a colluding attack by exploiting the ExternalStorage, in particular we consider a application (identified by the hash) and a application (identified by the hash).
In
Figure 3, we show the bytecode related to the Android application performing the
with the
ExternalStorage (1-Bytecode PUT in
Figure 3), the CCS automaton generated by this bytecode snippet (2-CCS PUT in
Figure 3) and the related
-automaton (indicated as 3-CCS
-PUT in
Figure 3). The instructions in the bytecode performing the
ExternalStorage are in lines 146 (the invocation for the SD card access) and 151 (the resource used fro data writing) and are translated with the processes in lines 4 and 8 of the CCS automa, as shown from
Figure 3.
Figure 4 shows the properties automatically generated. In particular, coherently with the formulae explained in
Table 7, for each
actions a formula is generated. In the method related to the
ExternalStorage behaviors there are push actions related to following variable:
, 2500,
r,
,
,
, and
.
As shows from
Figure 4 for each
action a property is generated. These properties, automatically generated, are evaluated on the
-automaton and, whether a property is resulting
, we consider this
application as candidate to perform a
in a colluding action. In this case, the property resulting in
by the model checker is the one marked by the red arrow in
Figure 4.
Once the
model is detected, with the relative property, able to potentially perform a collusion, in order to verify whether the collusion can be successfully perpetrated, there is the need to find the
application.
Figure 5 shows a snippet belonging to an Android application performing a
operation on an
ExternalStorage. In particular, we show the bytecode related to the Android application performing the
with the
ExternalStorage (1-Bytecode GET in
Figure 5), the CCS automaton generated by this bytecode snippet (2-CCS GET in
Figure 5) and the related
-automaton (indicated as 3-CCS
-GET in
Figure 5).
To detect whether a collusion attack is possible, we have to invoke the model checker and verify whether the property resulting on the -automaton is also resulting on the -automaton: this is symptomatic of a collusion attack, in this case, based on the ExternalStorage, which is perpetrated between these two applications.
Once collusions are detected, the proposed method shows a report as an output, where we indicate the applications involved in the collusion, the classes and methods of these applications that are related to the collusion, the name of the shared variable (i.e., the push action) and the kind of collusion attack (i.e., SharedPreferences, ExternalStorage, BroadcastReceiver, and/or RemoteProcedureCall).
5. State-Of-The-Art Literature
The anti-malware detection techniques actually converge their attention on the analysis of one sample at a time, which excludes the presence of communication channels between the applications. The research community is working on Inter-Component Communication to develop innovative tools and methods to identify these channels.
A particular kind of colluding attack is characterized by
covert channel; the authors in [
31], developed a multichannel communication mechanism to transfer sensitive data securely on mobile devices, called the Multichannel Communication System (MSYM). It uses the VpnService interface (
https://developer.android.com/reference/android/net/VpnService) provided by the Android platform, and thus is able to intercept the network data sent and then split it into different parts that will be disordered and encrypted through multiple transmission channels.
The accelerometer sensor represents a type of covert channel and is able to generate signals that reflect users’ motions. These data can be read by malicious applications, but correctly using the device’s vibration engine, the stolen data can be encrypted, so one application can cause acceleration data to be received and decrypted by another application. For this reason, two Android applications (representing the source and the sink, respectively) were developed and [
32] tested on three different smartphone models to verify this type of communication.
Usually the covert channels are divided into the covert storage channel and covert timing channel [
33]. The difference is that:
- -
the covert storage channel hides the secret information in the data transmission protocol and achieves covert communication using payload or non-payload fields:
- -
the covert timing channel encodes the secret information and achieves the covert communication using the time gap between packets.
There are many methods able to detect specific covert timing channels, but the authors in [
34] worked to perform the filtering, grouping, and other operations regarding the communication packets. They also chose a suitable machine learning algorithm to train the classification model, obtaining good detection performance.
The magnetometers, better known as magnetic sensors found in devices such as smartphones and tablets are typically used for positioning and orientation. They can be exploited by the attackers to exfiltrate information from isolated and non-networked computers. MAGNETO [
35] is a proposal of covert communication about air-gapped systems and nearby smartphones via magnetic fields generated from the CPU. The magnetic signals can be generated from the computers changing the CPU workload, but this covert channel works only for short distances and with low transmission rate. It is effective in unconstrained environments having the wireless communications blocked.
TaintDroid [
1] is an extension of the Android operating system and specifically tracks the flow of privacy-sensitive data passing through third-party applications. In most cases, downloaded third-party applications are not reliable, which is why the approach monitors, in real time, how these applications access users’ personal data and manipulate them.
Inter-Component Communication based Taint Analysis tool (IccTA) [
36] is a tool that uses the static taint analysis technique, its goal is to recover paths where privacy and sensitive information are sent outside without users being aware and therefore without their permission. The approach is able to detect paths within a single component or between multiple components. For testing IccTA, the developer investigated 22 applications containing ICC-based privacy leaks.
Concerning the IccTA, the researchers in [
37] have developed Amandroid, a tool that focuses its attention on the detection of leaks via an ICC analysis. The execution of the ICC analysis involves the generation of two components, called Inter-Component Data Flow Graph (ICDFG) and Data Dependence Graph (DDG), respectively. For each component, it performs a data flow and data dependency analysis, and also tracks the communication exchange between them, providing a customized analysis on Android applications.
An approach called DroidSafe [
38] performs static information flow analysis, reporting any sensitive data leaks from Android applications. The Soot Java Analysis Framework was used to develop DroidSafe and it works by analyzing one application at a time, for this reason, it is not able to detect a collusion attack that is caused by two or more applications.
Another methodology for the inter-application communication threats detection is MR-Droid [
39]. It is focused, in particular, on intent spoofing and collusion, and to work uses a framework based on MapReduce to execute a compositional application analysis.
The authors of SneakLeak [
40] developed a new tool for detection of application collusion, based on model checking. It works by analyzing multiple applications simultaneously and is able to identify set of suspicious applications that may be involved in a collusion attack. The tests were executed on a set of Android applications belonging to the DroidBench data-set, which show collusion through inter-application communication.
In [
41], researchers developed a tool to inspect the manifest file of an APK to dynamically detect an application collusion. It identifies the permission request with respect to the permission shown to the user during installation and it executes a dynamic check on the APIs used within the applications and with which other applications the original APK is communicating.
AppHolmes [
42] is a static analysis tool to detect app collusion. This tool extracts the manifest and the smali code from the APKs. Then, it performs a static analysis on the smali code, identifying possible Intent values from all callsites of startService(), bindService(), and sendBroadcast(). These callsites can cause collusion with the corresponding Intents. Finally, it execute a match between all detected Intents and IntentFilters obtained with the manifest files. If an Intent from app A matches to an IntentFilter from app B, it is reported a collusion by AppHolmes.
UpDroid [
43] is a data-set entirely composed by update attacks. It contains 21 families and 2479 samples. This data-set consists of malicious applications that use updating techniques in order to elude the detection systems. In an update attack, the original application is considered benign, so in not the installation of the application to launch the attack. The malicious payload is introduced during the application update. To address this problem, the authors in [
44] propose a technique that applies formal methods for the update attack identification, which are able to localize the code portion that implements the download. The tool transform the APKs in CCS file and then translates the Java bytecode into CCS process specifications, which will be transformed into formal models.
The formal methods are also used by the authors in [
45], where they propose a technique based on formal methods to detect ransomware on smartphone devices through the Java bytecode. Instead, the authors in [
46] propose another approach based on gathering information about system behavior using a virtual machine for the execution. If there is a Python script on the machine, it is likely that the script and its generation could be destroyed. When the program obtains information about the process, it sends it to a monitor via a secure connection channel. The monitoring systems is based on Linux in order to minimize ransomware infection.