SlideShare a Scribd company logo
In-network Aggregation Techniques for Wireless
Sensor Networks: A Survey
Elena Fasolo†
, Michele Rossi†
, J¨org Widmer and Michele Zorzi†
†
DEI, University of Padova, via Gradenigo 6/B – 35131, Padova, Italy
DoCoMo Euro-Labs, Landsberger Strasse 312 – 80687 Munich, Germany
Abstract— In this paper, we provide a comprehensive review
of the existing literature on techniques and protocols for in-
network aggregation in wireless sensor networks. We first define
suitable criteria to classify existing solutions, and then describe
them by separately addressing the different layers of the protocol
stack while highlighting the role of a cross-layer design approach,
which is likely to be needed for optimal performance. Throughout
the paper we identify and discuss open issues, and propose
directions for future research in the area.
I. INTRODUCTION
Recent advances in technology make it feasible to mass
produce small sensor nodes with sensing, computation, and
communication capabilities. This has spurred a substantial
amount of research on wireless sensor networks over the past
few years. For ease of deployment, sensor devices should be
inexpensive, small, and have a long lifetime, which makes
it important to develop very efficient software and hardware
solutions. For this reason, protocols for sensor networks should
be carefully designed so as to make the most efficient use of
the limited resources in terms of energy, computation, and
storage. These restrictions are likely to remain, since in many
cases it is desirable to exploit technological improvements
to develop smaller and more energy efficient devices rather
than making them more powerful. Typical applications envi-
sioned for sensor networks (e.g., environmental monitoring,
surveillance, tracking, etc.), along with the already mentioned
resource-constrained character of sensor nodes, usually result
in very different network requirements and communications
patterns compared to other types of ad hoc network scenarios.
The area of communications and protocol design for sensor
networks has been widely researched in the past few years,
and many solutions have been proposed and compared.
In this survey paper we focus instead on another important
aspect of sensor networks, namely in-network aggregation and
data management. These techniques allow to trade off commu-
nication for computational complexity. Given the application
area, network resource constraints, and the fact that local
computation often consumes significantly less energy than
communication, in-network data aggregation and management
are at the very heart of sensor network research. In particular,
resource efficiency, timely delivery of data to the sink node,
and accuracy or granularity of the results are conflicting goals
and the optimal tradeoff among them largely depends on the
specific application.
Initially, in-network aggregation techniques involved differ-
ent ways to route packets in order to combine data coming
from different sources but directed towards the same desti-
nation(s). In other words, these protocols were simply routing
algorithms which differed from more traditional ad hoc routing
protocols in the metric they used to select the routing paths.
More recently, many additional studies have been published,
addressing not only the routing problem but also mechanisms
to represent and combine data more efficiently. In-network
data aggregation is a complex problem that involves many
layers of the protocol stack and different aspects of protocol
design, and a characterization and classification of concepts
and algorithms is still lacking in the literature.
The aim of the present paper is to provide a taxonomy of
in-network aggregation by defining the main concepts and
covering the most important and recent work in the field.
Our major contributions are, on the one hand, to define
criteria to classify existing solutions and, on the other hand, to
identify and propose directions for future research in this area.
Compared with well-researched topics in sensor networks,
such as for example MAC and routing protocol design, data
aggregation does not seem to have received as much attention,
and we think that it provides many interesting opportunities
for relevant contributions. The goal of this paper is to help
people to get an updated view of this area and to provide a
motivation and a starting point for researchers and students
who are interested in these issues.
The paper is organized as follows. In Section II we define
the in-network aggregation paradigm, by identifying the main
problems involved and giving some criteria to classify existing
algorithms. In Section III we discuss theoretical performance
limits of in-network aggregation techniques. In Section IV we
introduce some protocol issues in the presence of in-network
processing, classify the most recent solutions, and discuss
their advantages and disadvantages. In Section V we focus on
possible techniques to combine data by means of aggregation
functions, highlight how these interact with routing protocols,
and discuss the benefits arising from a cross-layer design
(routing and aggregation). Finally, in Section VI we summarize
the in-network aggregation approaches discussed throughout
the paper, and give directions and motivations for future
research.
II. BASICS OF IN-NETWORK AGGREGATION
In typical sensor network scenarios, data is collected by
sensor nodes throughout some area, and needs to be made
available at some central sink node(s), where it is processed,
analyzed, and used by the application. In many cases, data
generated by different sensors can be jointly processed while
being forwarded towards the sink, e.g., by fusing together
sensor readings related to the same event or physical quantity,
or by locally processing raw data before this is transmitted.
In-network aggregation deals with this distributed processing
of data within the network. Data aggregation techniques are
tightly coupled with how data is gathered at the sensor nodes
as well as how packets are routed through the network,
and have a significant impact on energy consumption and
overall network efficiency (e.g., by reducing the number of
transmissions or the length of the packets to be transmitted).
In-network data aggregation can be considered a relatively
complex functionality, since the aggregation algorithms should
be distributed in the network and therefore require coordina-
tion among nodes to achieve better performance. Also, we em-
phasize that data size reduction through in-network processing
shall not hide statistical information about the monitored event.
For instance, when multiple sensors collaborate in observing
the same event, the number of nodes reporting it and the
timings of the reports may reveal the event’s size and/or
dynamics, respectively.
We define the in-network aggregation process as follows:
In-network aggregation is the global process of
gathering and routing information through a multi-
hop network, processing data at intermediate nodes
with the objective of reducing resource consumption
(in particular energy), thereby increasing network
lifetime.
We can distinguish between two approaches:
• In-network aggregation with size reduction refers to the
process of combining and compressing data coming from
different sources in order to reduce the information to
be sent over the network. As an example, assume that
a node receives two packets from two different sources
containing the locally measured temperatures. Instead of
forwarding the two packets, the sensor may compute the
average of the two readings and send it in a single packet.
• In-network aggregation without size reduction refers to
the process of merging packets coming from different
sources into the same packet without data processing:
assume to receive two packets carrying different physical
quantities, e.g., temperature and humidity. These two
values cannot be processed together but they can still be
transmitted in a single packet, thereby reducing overhead.
The first approach is better able to reduce the amount of data
to be sent over the network but it may also reduce the accuracy
with which the gathered information can be recovered at the
sink. After the aggregation operation, it is usually not possible
to perfectly reconstruct all of the original data.1
The second
approach, instead, preserves the original information (i.e., at
the sink, the original data can be perfectly reconstructed).
Which solution to use depends on many factors including the
type of application, the data rate, the network characteristics,
and so on. Both of the above strategies may involve the
treatment of data at different network layers.
In-network aggregation techniques require three basic in-
gredients: suitable networking protocols, effective aggregation
functions and efficient ways of representing the data (see
1This actually depends on the type of aggregation function in use, i.e., lossy
or lossless.
Fig. 1). In the remainder of this section we briefly introduce
each of these aspects.
Routing Protocols [1]–[9]: The most important ingredient for
in-network aggregation is a well designed routing protocol.
Data aggregation requires a different forwarding paradigm
compared to classic routing. Classic routing protocols typically
forward data along the shortest path to the destination (with re-
spect to some specified metric). If, however, we are interested
in aggregating data to minimize energy expenditure, nodes
should route packets based on the packet content and choose
the next hop in order to promote in-network aggregation. This
type of data forwarding is often referred to as data centric
routing. According to the data centric paradigm, as a node
searches for the relay nodes, it needs to use metrics which
take into account the positions of the most suitable aggregation
points, the data type, the priority of the information, etc.
Altogether, the application scenario, routing scheme, and data
aggregation mechanism are closely interrelated.
Moreover, in-network aggregation techniques may require
some form of synchronization among nodes. In particular, the
best strategy at a given node is not always to send data as soon
as it is available. Waiting for information from neighboring
nodes may lead to better data aggregation opportunities and,
in turn, improved performance. Timing strategies are required
especially in the case of monitoring applications where sensor
nodes need to periodically report their readings to the sink.
These strategies usually involve data gathering trees rooted at
the sink. The main timing strategies proposed so far in the
literature are summarized below [10]:
• Periodic simple aggregation requires each node to wait
for a pre-defined period of time, to aggregate all data
items received, and then to send out a packet with the
result of the aggregation.
• Periodic per-hop aggregation is quite similar to the
previous approach, the only difference being that the
aggregated data is transmitted as soon as the node hears
from all of its children. This requires that each node
knows the number of its children. In addition, a timeout
is used in case some children’s packets are lost.
• Periodic per-hop adjusted aggregation adjusts the timeout
of a node, after which it sends the aggregated data,
depending on the node’s position in the gathering tree.
Note that the choice of the timing strategy strongly affects
the design of the routing protocol [10]–[12].
Aggregation Functions [8], [13]–[20]: One of the most
important functionalities that in-network aggregation tech-
niques should provide is the ability to combine data coming
from different nodes. There are several types of aggregation
functions and most of them are closely related to the specific
sensor application. Nevertheless, we can identify some com-
mon paradigms for their classification:
• Lossy and lossless: Aggregation functions can compress
and merge data according to either a lossy or a lossless
approach. In the first case the original values can not
be recovered after having merged them by means of
the aggregation function. In addition, we may lose in
precision with respect to transmitting all readings un-
compressed. In contrast, the second approach (lossless)
allows to compress the data by preserving the original
information. This means that all readings can be perfectly
reconstructed from their aggregate at the receiver side.
• Duplicate sensitive and duplicate insensitive: An inter-
mediate node may receive multiple copies of the same
information. In this case, it may happen that the same
data is considered multiple times when the information is
aggregated. If the aggregation function in use is duplicate
sensitive, the final result depends on the number of
times the same value has been considered. Otherwise, the
aggregation function is said to be duplicate insensitive.
For instance, a function that takes the average is duplicate
sensitive, whereas a function that takes the minimum
value is duplicate insensitive.
Good aggregation functions for wireless sensor networks need
to meet additional requirements. In particular, they should take
into account the very limited processing and energy capabili-
ties of sensor devices, and should therefore be implementable
by means of elementary operations. Also, different devices
may be suitable for different types of operations, depending
on their energy resources and computation capabilities. These
facts need to be considered in the design of aggregation
functions and routing protocols.
Data representation [21]–[24]: Due to its limited storage
capabilities, a node may not be able to store all the re-
ceived/generated information in its internal buffer. It there-
fore needs to decide whether to store, discard, compress,
or transmit the data. All these operations require a suitable
way to represent the information. The corresponding data
structure may vary according to the application requirements.
Finally, even though the data structure is usually common to
all nodes, it should be adaptable to node-specific or location
specific characteristics. A recent and promising method to deal
with data representation and compression is distributed source
coding techniques, that compress data on the basis of some
knowledge about its correlation. More details on the approach
are given in section V-F.
Although we described routing, aggregation and data rep-
resentation in isolation, they are intimately related and should
be designed and implemented jointly for optimal performance.
Most of the related work in the literature covers only partial
aspects of the joint optimization of these functionalities, and
often neglects or oversimplifies some of the others. Further
work on cross-layer optimization for in-network aggregation
should therefore be appreciated as innovative and is very
much needed. In the sequel, we thoroughly review each of
the aforementioned functionalities. We start with a review of
recent work on the theoretical limits of aggregation techniques
in the next section.
III. THEORETICAL LIMITS OF IN-NETWORK
AGGREGATION TECHNIQUES
Several theoretical studies provide limits and bounds on the
performance of in-network data aggregation techniques and
thus assist in the design of suitable algorithms. The efficiency
of these algorithms depends on the correlation among the
data generated by different information sources (sensor units).
Such a correlation can be spatial, when the values generated
by close-by sensors are related, temporal, when the sensor
readings change slowly over time, or semantic, when the
contents of different data packets can be classified under the
same semantic group (e.g., the data is generated by sensors
placed in the same room). The gains of in-network data
aggregation can be best demonstrated in the extreme case when
data generated by different sources can be combined into a
single packet (e.g., when the sources generate identical data).
If there are K sources all close to each other and far away
from the sink, the combination of their data into a single packet
leads, on average, to a K-fold reduction in transmissions with
respect to the case where all data are sent separately. Generally,
the optimal joint routing and compression structure is a Steiner
tree, which is known to be NP hard [25]. However, there exist
polynomial solutions for special cases where the information
sources are close to each other [26]. The authors in [27]
propose a model to describe the spatial correlation in terms
of joint entropy. They analyze a symmetric line network with
different degrees of correlation among neighboring nodes. For
the uncorrelated case, the authors show that the best routing
strategy is to forward packets along shortest paths. In contrast,
in case of completely correlated information, the best strategy
is to aggregate data as soon as possible. After that, a single
packet (formed by the aggregated data) is sent to the sink along
the shortest path. In all the intermediate cases, clustering-based
solutions may be the optimal choice, although no formal proof
is given in the paper.
In [28] the authors study the impact of data correlation on
the energy expenditure of data distribution protocols. They
focus on various energy aware data aggregation trees under
different network conditions, such as node density, source
density, source distribution, and data aggregation degree. The
findings of the study are in agreement with the results in [27]
but in addition provide more quantitative results. In particular,
the authors focus on tree structures and compare the Minimum
Steiner Tree (MST) with the Shortest Path Tree (SPT). The
MST is found to be the optimal aggregation tree structure.
Although the SPT guarantees low delays and can be built
in an online fashion, its performance in terms of aggregation
effectiveness is largely inferior to that of the MST.
In addition, in [28] opportunistic aggregation is compared
to systematic aggregation in terms of cost ratio which is
the cost of the correlation unaware (SPT) tree over that of
the correlation aware (MST) tree considering the same set
of sources and sinks. The authors prove, using an analytical
model, that the expected cost improvement of MST over SPT
in sensor networks increases as O(
√
log N), where N is the
number of nodes in the network. This result makes SPT a
viable solution for many practical cases (small networks).
Based on this study, the authors propose a hybrid tree structure
called SCT (Semantic/Spatial Correlation Tree) [29]. SCT is
based on the identification of an aggregation backbone which
is used to generate efficient aggregation trees, regardless of
sources distribution and density. The aim is to efficiently build
and maintain a network structure for data aggregation. To this
end, the authors of [29] propose a ring-sector subdivision of
the network. A subset of nodes is elected as aggregation nodes
and they are organized in a spanning tree to form the data
aggregation backbone. Each node belonging to the backbone
aggregates messages coming from a certain sub-area.
A further tree-based aggregation algorithm that exploits data
correlation is presented in [30]. It is based on shallow light
trees (SLT) that unify the properties of MST and SPT. In an
SLT, the total cost of the tree is only a constant factor larger
than that of the MST, while the distances (delays) between
any node and the sink are only a constant factor larger than
the shortest paths. In [31], the authors analyze aggregation
properties of a tree structure that is based on an SPT of nodes
close to the sink node, while nodes that are further away are
connected to the leaves of the SPT via paths found by an
approximation algorithm for the traveling salesman problem.
Simulations show that these trees outperform SLTs in many
scenarios.
IV. NETWORKING PROTOCOLS AND HIERARCHIES FOR
IN-NETWORK AGGREGATION
Most of the work done so far on in-network aggregation
deals with the problem of forwarding packets in order to fa-
cilitate the in-network aggregation of the information therein.
Initially, the main ideas were to enhance existing routing
algorithms in such a way to make data aggregation possible. To
this respect, many studies proposed solutions exploiting tree-
based (or hierarchical) structures. These consist of routing
algorithms based on a tree rooted at the sink. Trees are
usually SPTs but some approaches exist which consider more
complex tree constructions. The tree based approaches are
referred to in this paper as classical approaches. Sometimes
the tree structure can be optimized to the type of data to be
gathered. Also, the nodes can be locally grouped into clusters
for improved efficiency. Recently, a few notable exceptions
looked at the problem from a different angle. These papers
address the weaknesses of the tree-based approach by focusing
on multi-path routing. Finally, some very recent schemes
implement a mixture of tree-based and multi-path solutions.
These are referred to here as hybrid approaches to emphasize
the adaptive nature of their routing algorithms.
In the following, we focus on each class of routing protocols
separately (tree-based, cluster-based, multi-path and hybrid)
by reviewing the main concepts and briefly commenting the
pros and cons of each scheme. As seen from the number
of schemes discussed in each subsection, many solutions are
proposed in the tree-based and cluster-based categories. On
the other hand, very few studies use the multi-path and hybrid
approaches. This leaves room for further work in this area.
A. Tree-based Approaches
Classic routing strategies [32], [33] are usually based on
a hierarchical organization of the nodes in the network. In
fact, the simplest way to aggregate data flowing from the
sources to the sink is to elect some special nodes which work
as aggregation points and define a preferred direction to be
followed when forwarding data.
In addition, a node may be marked as special depending on
many factors such as its position within the data gathering
tree [34], its resources [35], the type of data stored in its
queue [36], [37], or the processing cost due to aggregation
procedures [38]. According to the tree-based approach [1],
[3], [6] a spanning tree rooted at the sink is constructed
first. Subsequently, such a structure is exploited in answering
queries generated by the sink. This is done by performing in-
network aggregation along the aggregation tree by proceeding
level by level from its leaves to its root. Thus, as two or more
messages get to a given node, their aggregate can be computed
exactly. However, this way of operating has some drawbacks
as actual wireless sensor networks are not free from failures.
More precisely, when a packet is lost at a given level of the
tree, e.g., due to channel impairments, the data coming from
the related subtree are lost as well. In fact, a single message at
a given level of the tree may aggregate the data coming from
the whole related subtree. In spite of the potentially high cost
of maintaining a hierarchical structure in dynamic networks
and the scarce robustness of the system in case of link/device
failures, these approaches are particularly suitable to design
optimal aggregation functions and perform efficient energy
management. In fact, there are some studies where the sink
organizes routing paths to evenly and optimally distribute the
energy consumption while favoring the aggregation of data at
the intermediate nodes [36], [39], [40]. In [39] the authors
compute aggregation topologies by taking into account the
residual energy of each node through linear programming.
Further algorithms can be found in [34], [35], [41], [42].
In [41] the authors investigate which nodes in the network can
be exploited as aggregation points for optimal performance.
In [34], [42] the focus is on the nodes that should be entrusted
with the transmission of the sensed values, whereas in [35] the
emphasis is put on the proper scheduling of sleeping/active
periods. Often, optimal paths are calculated in a centralized
manner at the sink by exploiting different assumptions on
the data correlation and selecting the best aggregation points
by means of cost functions [43]. Recently, also tree-based
schemes for real time or time-constrained applications have
been proposed [44]–[46].
Finally, a last approach based on aggregation trees relies
on the construction of connected dominating sets [47]. These
sets consist of a small subset of nodes which form a connected
backbone and whose positions are such that they can collect
data from any point in the network. Nodes that do not belong
to these sets are allowed to sleep when they do not have data
to send. Some rotation of the nodes in the dominating set is
recommended for energy balancing.
In the following paragraphs, we review the main routing
approaches based on aggregation trees.
TAG [5] - The Tiny AGgregation (TAG) approach is a
data centric protocol. It is based on aggregation trees and is
specifically designed for monitoring applications. This means
that all nodes should produce relevant information periodically.
Therefore, it is possible to classify TAG as a periodic per-
hop adjusted aggregation approach. The implementation of
the core TAG algorithm consists of two main phases: 1)
the distribution phase, where queries are disseminated to the
sensors, and 2) the collection phase, where the aggregated
sensor readings are routed up the aggregation tree.
For the distribution phase, TAG uses a tree based routing
scheme rooted at the sink node. The sink broadcasts a message
asking nodes to organize into a routing tree and then sends its
queries. In each message there is a field specifying the level,
or distance from the root, of the sending node (the level of the
root is equal to zero). Whenever a node receives a message
and it does not yet belong to any level, it sets its own level to
be the level of the message plus one. It also elects the node
from which it receives the message as its parent. The parent is
the node that is used to route messages towards the sink. Each
sensor then rebroadcasts the received message adding its own
identifier (ID) and level. This process continues until all nodes
have been assigned an ID and a parent. The routing messages
are periodically broadcast by the sink in order to keep the tree
structure updated. After the construction of the tree, the queries
are sent along the structure to all nodes in the network. TAG
adopts the selection and aggregation facilities of the database
query languages (SQL). Accordingly, TAG queries have the
following form:
SELECT{agg(expr), attrs} from SENSOR
WHERE{selPreds}
GROUP BY{attrs}
HAVING{havingPreds}
EPOCH DURATION i
In practice, the sink sends a query, where it specifies the
quantities that it wants to collect (attrs field), how these must
be aggregated (agg(expr)) and the sensors that should be
involved in the data retrieval. This last request is specified
through the WHERE, GROUP and HAVING clauses [5]. Fi-
nally, an EPOCH duration field specifies the time (in seconds)
each device should wait before sending new sensor readings.
This means the readings used to compute an aggregate record
all belong to the same time interval, or epoch.
During the data collection phase, due to the tree structure,
each parent has to wait for data from all of its children before
it can send its aggregate up the tree. Epochs are divided into
shorter intervals called communication slots. The number of
these slots equals the maximum depth of the routing tree. The
slot mechanism gives a nice benefit. As the time is slotted,
sensor nodes can be put to sleep until the next scheduled
transmission interval. In practice, a node goes back to sleep
soon after it has finished sending its readings to its parent.
Data aggregation is performed by all intermediate nodes.
However, in order not to limit TAG to the few and very
simple aggregation functions defined by the SQL language
(such as COUNT, MIN, MAX, SUM, and AVERAGE) a
more general classification is accounted for by partitioning
aggregates according to the Duplicate Sensitivity, Exemplary
and Summary and Monotonic properties [5].
As for most tree-based schemes, TAG may be inefficient in
case of dynamic topologies or link/device failures: as discussed
above, trees are particularly sensitive to failures at intermediate
nodes as the related subtree may become disconnected. In
addition, as the topology changes, TAG has to re-organize the
tree structure and this means high costs in terms of energy
consumption and overhead.
Directed Diffusion [1] - Directed Diffusion is a reactive data
centric protocol. The routing scheme is specifically tailored
for those applications where one or few sinks ask some
specific information by flooding the network with their queries.
Directed Diffusion is organized in three phases (see Fig. 2,
originally shown in [1]): 1) interest dissemination, 2) gradient
setup and 3) data forwarding along the reinforced paths
(path reinforcement and forwarding). When a certain sink is
interested in collecting data from the nodes in the network,
it propagates an interest message (interest dissemination),
describing the type of data the node is interested in, and
setting a suitable operational mode for its collection. Each
node, on receiving the interest, re-broadcasts it to its neighbors.
In addition, the node sets up interest gradients, i.e., vectors
containing the next hop that has to be used to propagate the
result of the query back to the sink node (gradient setup).
As an illustrative example (see Fig. 2), if the Sink sends an
interest which reaches nodes a and b, and both forward the
interest to node c, then node c sets up two vectors indicating
that the data matching that interest should be sent back to
a and/or b. The strength of such a gradient can be adapted,
which may result in a different amount of information being
redirected to each neighbor. To this end, various metrics such
as the node’s energy level, its communication capability and
its position within the network can be used. Each gradient is
related to the attribute it has been set up for. As the gradient
setup phase for a certain interest is complete, only a single
path for each source is reinforced and used to route packets
towards the sink (path reinforcement and forwarding).
Data aggregation is performed when data is forwarded to
the sink by means of proper methods, which can be selected
according to application requirements. The data gathering tree
(i.e., reinforced paths) must be periodically refreshed by the
sink and this can be expensive in case of dynamic topologies.
A tradeoff, depending on the network dynamics, is involved
between the frequency of the gradient setup (i.e., energy
expenditure) and the achieved performance. A valuable feature
of Directed Diffusion consists of the local interaction among
nodes in setting up gradients and reinforcing paths. This allows
for increased efficiency as there is no need to spread the
complete network topology to all nodes in the network.
We observe that attention is to be paid to MAC Layer
design. Consider as an example the IEEE802.11 wireless
technology. As said above, queries are propagated by means
of broadcasts (basic access in IEEE802.11). However, data is
sent back to the sink via unicast transmissions. This means
that when either the node density increases or the duplicate
suppression rule is not used, due to MAC collisions and
subsequent backoffs, the delay may become excessively large.
Hence, the local traffic should be kept at an acceptably low
level in order to avoid collisions. Several approaches [36], [48],
[49] have been proposed to reduce the control traffic generated
by the local interactions among nodes with Directed Diffusion.
In these solutions, the authors use properly defined aggregation
trees with the main purpose of reducing both traffic and delay.
In [48] a modified version of Directed Diffusion, Enhanced
Directed Diffusion (EDD), is proposed. This protocol jointly
exploits Directed Diffusion to collect data and a cluster-based
architecture to increase the efficiency of the local interactions
(decreasing local traffic and related collisions). A similar
approach is investigated in [50].
PEGASIS [3] - The key idea in Power-Efficient GAthering
in Sensor Information Systems (PEGASIS) is to organize the
sensor nodes in a chain. Moreover, nodes take turns to act as
the chain leader, where at every instant the chain leader is the
only node allowed to transmit data directly to the sink. In this
way, it is possible to evenly distribute the energy expenditure
among the nodes in the network. The chain can be built either
in a centralized (by the sink) or distributed manner (by using
a greedy algorithm at each node). In both cases, however, the
construction of the chain requires global knowledge of the
network at all nodes. The chain building process starts with the
node furthest from the sink. Then the closest neighbor to this
node is chosen as the next one in the chain and so on. Nodes
take turns to act as leader according to the following rule:
node i is elected as the leader in round i. If there are N nodes
in the network, rounds cyclically take values in {1, 2, . . . , N}
according to a TDMA schedule. As a consequence, the leader
is not always the same but, during each transmission round, it
is at a different position in the chain. Note that in this scheme
a direct communication channel from each sensor to the sink
is required.
In PEGASIS, each node receives data from a neighbor and
aggregates it with its own reading by generating a single
packet of the same length. Subsequently, such an aggregate
is transmitted to the next node in the chain until the packet
reaches the current chain leader. At this point the leader
includes its own data into the packet and sends it to the sink.
A possible drawback of the scheme comes from the distance
among neighbors. In fact, when the neighbors along the chain
are too distant the energy expenditure can be very high.
In addition, transmission energies are not evenly distributed
but depend on the actual distances between the nodes and
their neighbors, i.e., nodes with distant neighbors dissipate
more energy. PEGASIS can therefore be enhanced by not
allowing such nodes to become leaders, for example using a
threshold-based leader election policy. The main disadvantages
of PEGASIS are the necessity of having a complete view of the
network topology at each node for a proper chain construction
and that all nodes must be able to transmit directly to the sink.
This makes the scheme unsuitable for those networks with a
time varying topology. In addition, also link failures and packet
losses may affect the performance of this protocol. In fact, the
failure of any intermediate node compromises the delivery of
all data aggregated and sent by the previous nodes in the chain.
Hence, some improvements to the scheme may be needed in
order to increase its robustness.
DB-MAC [7] - A different approach to route packets by
performing data aggregation is presented in [7], where the
routing and the MAC protocols are jointly designed. The
primary objective of the Delay Bounded Medium Access
Control (DB-MAC) [7] scheme is to minimize the latency
for delay bounded applications while taking advantage of
data aggregation mechanisms for increased energy efficiency.
DB-MAC adopts a CSMA/CA contention scheme based on
an RTS/CTS/DATA/ACK handshake. The protocol is most
suitable for those cases where different sources sense an event
almost at the same time and, due to the delay constraints, have
to send their measurements right away to the sink. In such
cases, the generated data flows can be dynamically aggregated
while routing them towards the sink. This gives rise to an ag-
gregation tree, which is built on the fly and without having any
knowledge about the network topology. The MAC protocol is
very similar to the IEEE802.11 RTS/CTS Access [51] with
some minor modifications: RTS/CTS messages are exploited
to perform data aggregation and backoff intervals are com-
puted by taking into account the priorities assigned to different
transmissions. In particular, each node takes advantage of the
transmissions from other nodes by overhearing CTSs in order
to facilitate data aggregation. This leads to choosing the relay
node among those nodes that already have some packets to
transmit in their queue. This is implemented to promote data
aggregation with the information stored along the path.
As an example, refer to the scenario in Fig. 3. We have
two nodes S1 and S2 which want to transmit their packets
to the sink using one of their neighbors (R1 and R2 in the
figure) as the relay. At the beginning of the contention, a node
transmits a newly generated packet by setting its priority to the
maximum value. The packet priority is subsequently decreased
at each traversed node. Because Pr(S1) > Pr(S2), S1 wins the
contention for the medium and sends its packet to R1 which
decreases its priority by one unit. After this, Pr(S2) becomes
equal to the priority of the packet just transmitted, which is
now stored at node R1. If S2 is placed in the coverage area
of both S1 and R1, it can overhear all messages exchanged
between these two nodes (remember that the packet at S2 still
has to be forwarded). If this is the case, S2 may now want to
send its packet to R1 instead of R2 as it knows that R1 already
has one packet in its queue (the packet previously transmitted
by S1). This facilitates in-network aggregation. DB-MAC
gives an example of how routing and data aggregation may
influence each other, and shows that, in most cases, energy
efficient solutions are achieved only through a cross-layer
design. The advantage of this strategy is the flexible and
distributed procedure for the construction of aggregation trees,
which appears to be suitable for wireless networks with a
dynamic topology.
Further Algorithms - Regarding the tree-based approaches,
many additional solutions have been proposed to solve the
problem of efficiently constructing aggregation trees. The
authors in [36] define an efficient, distributed and energy
aware heuristics (EADAT) to build the aggregation tree. A nice
feature of such an approach is that the tree construction process
only relies on a local knowledge of the network topology.
Hence, the costs incurred in updating the tree in response to
node mobility, device failures and duty cycles may be limited.
In addition, to further increase the energy savings, the scheme
in [36] uses an aggregation tree rooted at the sink where all
non-leaf sensors perform data aggregation while leaf nodes
can turn off their radios in order to save energy. In [52],
the problem of constructing the optimal aggregation tree is
treated from a game theoretic perspective. The authors develop
a framework including payoff functions that take into account
the path reliability, the path length and the energy constraints
of the nodes. They finally propose and evaluate a couple of
heuristics to implement opportunistic in-network aggregation
strategies. [53] presents a solution for the mobile sink case.
The authors define a protocol to maintain the aggregation tree
in the presence of mobile sinks. In their solution, they rely on
trusted nodes which work as gateways between the network
and the sinks. A further contribution can be found in [54],
where the authors combine a tree based scheme with data
compression based on polynomial regression.
An additional problem related to the aggregation tree is
addressed in [37], where the authors present a set of algo-
rithms to minimize the overall energy consumption of the
sensor nodes in the presence of latency constraints under
the assumption of perfect knowledge of the aggregation tree.
The problem is solved by devising appropriate scheduling
strategies for each node. This contribution is particularly
important for applications requiring a prompt delivery of the
information to the sink. A major drawback, however, is that the
problem of constructing the aggregation tree is not addressed.
Finally, in [55] the Secure Data Aggregation Protocol (SDAP)
scheme is presented. This algorithm addresses the problem
of delivering data over aggregation trees in a secure manner.
Further data aggregation schemes for secure communications
are collected in [56].
B. Cluster-based Approaches
Similarly to tree-based algorithms, cluster-based
schemes [2], [4], [48], [57] also consist of a hierarchical
organization of the network. However, here nodes are
subdivided into clusters. Moreover, special nodes, referred to
as cluster-heads, are elected in order to aggregate data locally
and transmit the result of such an aggregation to the sink.
The advantages and disadvantages of cluster-based schemes
are very similar to those of tree-based approaches.
LEACH [2] - Low-Energy Adaptive Clustering Hierarchy
(LEACH) is a self-organizing and adaptive clustering protocol
using randomization to evenly distribute the energy expendi-
ture among the sensors. Clustered structures are exploited to
perform data aggregation where cluster-heads act as aggre-
gation points. The protocol works in rounds and defines two
main phases: 1) a setup phase to organize the clusters and 2)
a steady-state phase which deals with the actual data transfers
to the sink node.
In the first phase the nodes organize themselves into clus-
ters. Within each cluster a node is elected as the cluster-head.
At the beginning of the setup phase, each sensor elects itself to
be the local cluster-head for the current round. This decision
is made according to a distributed probabilistic approach. The
aim is to have, on average, a percentage P of the nodes
acting as cluster-heads, where P has to be optimally chosen
according to the node density. In practice, sensors calculate
the following threshold:
T(n) =



P
1 − P(R mod (1/P))
if n ∈ G
0 otherwise,
(1)
where P is the desired percentage of cluster-heads, R is the
round number and G is the set of nodes that have not been
cluster-heads during the last 1/P rounds. A given node n
picks a random number [0, 1] and decides to be a cluster-
head if this number is lower than T(n). A cluster-head
sends advertisements to its neighbors using a CSMA MAC.
Surrounding nodes decide which cluster to join based on the
signal strength of these messages. Finally, based on the number
of nodes that are willing to be part of the cluster, each cluster-
head creates a TDMA schedule to optimally manage the local
transmissions.
The actual data transmission starts in the second phase of the
protocol. All source nodes (S in Fig. 4) send their data to their
cluster-heads according to the established schedule. The use
of a TDMA protocol in the data collection phase ensures that
there are no collisions within the clusters, saving both energy
and time. After cluster-heads (CH in Fig. 4) have received all
the data from the active sources, they send them back to the
sink using a single direct transmission (dotted lines in Fig. 4).
If the sink is placed far away from a cluster-head, a high power
may be necessary to successfully deliver the message. Also, a
doze mode is implemented to further save energy. When doze
mode is used, the nodes’ radios may be switched off until their
scheduled TDMA transmission slot. Note that cluster-heads
cannot switch their radio off as they have to receive packets
from potentially all nodes in the cluster. LEACH is completely
distributed in the sense that neither control messages from the
sink nor the distribution of global information to the sensor
nodes are required for correct operation. Moreover, LEACH
outperforms classical clustering algorithms by accounting for
adaptive clusters and rotating cluster-heads.
The LEACH framework also offers the opportunity to
implement any aggregation function at the cluster-heads.
However, several problems may arise in highly dynamic
environments. In this case continuous updates are needed
in order to keep the clusters consistent with the underlying
topology. This requires to send many control messages
which, in turn, may substantially impact the performance. In
addition, in case of mobility additional problems may arise.
A node close to a cluster-head at a given instant in time
may move away from the cluster-head. As a consequence,
the node needs to increase its power, thereby spending much
more energy to transmit to the cluster-head than expected. A
recent improvement of LEACH has been presented in [58]
where further energy savings are achieved by the introduction
of meta-data negotiation.
COUGAR [4], [59] - Cougar is most suitable for moni-
toring applications, where nodes produce relevant information
periodically. The protocol can be classified as a periodic per-
hop aggregation approach. Cougar is basically a clustering
scheme. As soon as the cluster-heads receive all data from the
nodes in their clusters, they send their partial aggregates to a
gateway node. Of course, being similar to LEACH, Cougar
is also affected by the same problems in highly dynamic
environments.
Noticeably, Cougar differs from the previous clustering
based algorithms in the way cluster-heads are elected. Unlike
in LEACH, where each node picks its cluster-head based
on signal strength measurements, in Cougar the cluster-head
selection may be driven by additional metrics. In fact, a node
could be more than one hop away from its cluster-head.
For this reason, the routing algorithm adopted to exchange
packets within clusters is based on the AODV (Ad hoc
On demand Distance Vector) technique. As AODV does not
generate duplicate data packets, Cougar is particularly suitable
to perform in-network aggregation with duplicate sensitive
aggregators. The core Cougar algorithm consists of the node
synchronization engine which ensures that data is aggregated
correctly. Each cluster-head has a waiting list containing all
nodes it expects a message from. The list is updated every
time the node receives a record from a node in its cluster.
The cluster-head does not report its reading to the gateway
until, at time tsend, it hears from all nodes in its waiting list.
A prediction mechanism is also implemented at each cluster-
head in order to infer the instant tsend. In addition, a child
node can determine whether its cluster-head is waiting for
a packet from it and can use a notification packet to refine
the prediction at the cluster-head. Timeouts and backoffs are
implemented to deal with wrong predictions.
In [4], the authors define three different data aggregation
features: Direct delivery, where data aggregation is performed
at the cluster-heads only, Packet merging which consists of
aggregation of packets without size reduction and Partial
aggregation, where data aggregation is implemented at the
intermediate nodes.
Further Algorithms - Many additional studies exploiting
a hierarchical organization of the nodes have been proposed
in the literature. Some of them are improvements of ex-
isting protocols. In [60], for instance, the authors propose
enhancements to the LEACH and PEGASIS schemes. For
the performance evaluation, the authors propose the new Data
Aggregation Quality (DAQ) metric, which is defined as the
ratio between the size of the aggregated data and its joint
entropy. DAQ is an interesting performance measure as it takes
into account both the effectiveness in reducing the size of the
data to be transmitted and the quality of the information. A
further improvement to LEACH is presented in [61], where
a code is added to the data transmission to enhance the
intra-cluster communication security. A similar approach is
proposed in [57] where the cluster-based scheme is enhanced
by a secure transmission protocol called SecureDAV.
[27] presents a location-based clustering scheme where
the sensors self-organize to form static clusters. The data
generated within each cluster is sent to the related cluster-
head along shortest paths, and in-network aggregation is
performed at the intermediate nodes. The cluster-heads send
the aggregated data to the sink through a multi-hop path
without any further aggregation. The cluster size can be varied
to tune the degree of aggregation. The authors in [62] study
the impact of partially correlated data on the performance of
clustering algorithms. They analyze the behavior of multi-hop
routing and, by combining random geometry techniques and
rate distortion theory, predict the total energy consumption
and network lifetime of their cluster-based scheme. Further
cluster-based algorithms for data aggregation can be found
in [63], [64]. An interesting work about clustering and data
aggregation is presented in [65]. Here, a cross-layer approach
is adopted and some issues concerning MAC design are
addressed.
Another work based on a hierarchical organization of the
network is proposed in [11]. Assuming that some algorithms
are used to form an aggregation tree or a cluster-based aggre-
gation structure, the authors propose a scheme to dynamically
adapt the data aggregation period (see [10]) according to the
aggregation quality required by the sink.
A different approach is presented in [66] where a semi-
structured approach, named ToD, is defined in order to alle-
viate the problem of maintaining a hierarchical organization
of nodes in case of dynamic large scale networks. This study
is enriched by simulations with 2000 nodes and experimental
results over 105 Mica2 devices.
C. Multi-path Approaches
In order to overcome the robustness problems of aggregation
trees, a new approach has been recently proposed in [8],
[9], [67]. Instead of having an aggregation tree where each
node has to send the partial result of its aggregation to a
single parent, these solutions send data over multiple paths.
The main idea is that each node can send the data to its
(possibly) multiple neighbors by exploiting the broadcast
characteristics of the wireless medium. Hence, data may
flow from the sources to the sinks along multiple paths
and aggregation may be performed by each node. Observe
that in contrast to the tree-based schemes discussed above,
multi-path approaches allow to propagate duplicates of the
same information. Clearly, such schemes trade a higher
robustness (as multiple copies of the same data can be sent
along multiple paths) for some extra overhead (due to sending
duplicates). An aggregation structure that fits well with this
methodology is called rings topology, where sensor nodes
are divided into several levels according to the number of
hops separating them from the data sink. Data aggregation is
performed over multiple paths as packets move level by level
towards the sink (see Fig. 5). Next, we review the synopsis
diffusion framework which belongs to this class of protocols.
Synopsis Diffusion [8] - The authors of [8] present the Syn-
opsis Diffusion protocol, where data aggregation is performed
through a multi-path approach. The underlying topology for
data dissemination is organized in concentric rings around
the sink. Synopsis Diffusion consists of two phases: 1) the
distribution of the queries and 2) the data retrieval phase. The
ring topology is formed when a node sends a query over the
network. In particular, two different structures, listed below,
can be taken into account. The first type of topology consists
of a simple ring structure. During the query distribution phase,
the network nodes form a set of rings around the querying node
q, which is the only sensor belonging to ring R0. A node is
in ring Ri if it is i hops away from the querying node.
The second type of topology has some improvements that
make it more robust than the previous one and able to cope
with changes in the network. This topology is called Adaptive
Rings. The distribution phase does not change but this time a
node u in ring i keeps track of the number of times, nov, the
transmissions from any node ni−1 in ring i − 1 included its
own data during the last few epochs. That is, node u checks
whether its data is aggregated with the information sent by any
node in ring i−1. If nov is small, u tries to find a better ring in
order to have more of its own data included in the subsequent
transmissions. In fact, rings i, i + 1, i − 2 and i + 2 can also
be considered for aggregating data (rings i−2 and i+2 could
be overheard in case of mobility). To allow for these checks,
the list of all node IDs participating in the construction of the
synopsis (data aggregation result) is included in the header
of each packet. This feature is also exploited at each node as
a sort of implicit acknowledgment. Finally, the decision on
which ring to join is made according to heuristics depending
on ni−1, ni, ni+1, ni+2 and ni−2 [8]. The query aggregation
period is divided into epochs and one aggregate is provided at
the end of each. Specific time slots are allocated within each
epoch and used to schedule the node transmissions in a TDMA
fashion. Sensors can be put to sleep and woken up at their
scheduled transmission slots. The aggregation starts from the
outermost ring, e.g., Ri, proceeds towards the subsequent ring,
e.g., Ri−1 and propagates level by level towards the sink. In
the example in Fig. 5, the data generated at node A can reach
the sink through seven paths: {A, B, F, I, S}, {A, B, F, H, S},
{A, B, F, G, H, S}, {A, C, D, E, I, S}, {A, C, F, H, S}, {A, C,
F, I, S}, and {A, C, G, H, S}. Note that, as the main feature of
Synopsis Diffusion is that data can flow over multiple paths,
a node may receive duplicates of the same information. This
may affect the aggregation result, especially when aggregation
functions are duplicate sensitive. This problem is addressed by
the authors in [8] by proposing proper aggregation functions
and data structures, see Section V-D. On the upside, multi-path
schemes are suitable for networks with frequent packet losses
due to mobility or channel impairments, as the extra overhead
(duplicates) pays off in terms of robustness: if a link fails, the
data may still reach the sink through a different path.
Further Algorithms - Another way to implement multi-path
schemes is based on multiple spanning trees. For instance, the
authors in [68] define a method to provide fault tolerance to
packet losses by forming a Directed Acyclic Graph (DAG).
DAG allows to have multiple parent nodes at each sensor. In
addition, the method ensures correct data transmission timing,
according to the actual hop count of the DAG edges.
D. Hybrid Data Aggregation Approaches
In order to benefit from the advantages of both tree-based
and multi-path schemes, it is possible to define hybrid
approaches which adaptively tune their data aggregation
structure for optimal performance. To the best of our
knowledge, a single work [9] has been proposed with this
aim. The related protocol is presented next.
Tributaries and Deltas [9] - The Tributaries and Deltas
protocol tries to overcome the problems of both the tree and
multi-path based structures, by combining the best features
of both schemes. The result is a hybrid algorithm where both
data aggregation structures may simultaneously run in different
regions of the network. The idea is that under low packet
loss rates, a data aggregation tree is the most suitable struc-
ture due to the possibility of implementing efficient sleeping
modes (see previous sections) and to the good efficiency in
representing and compressing the data. On the other hand,
in case of high loss rates or when transmitting partial results
which are accumulated from many sensor readings, a multi-
path approach may be the best option due to its increased
robustness. Hence, nodes are divided into two categories:
nodes using a tree-based approach to forward packets (also
called T nodes) and nodes using a multi-path scheme (M
nodes). This means that the network is organized in regions
implementing one of the two schemes. The main difficulty is
to link regions running different data aggregation structures.
In doing so, the following rules have to be satisfied [9]:
• Edge Correctness: An edge originating from an M node
can never be incident on a T node. It means that the
aggregation result in a multi-path region can only be
received by an M node (see Fig. 6).
• Path Correctness: M nodes form a subgraph including
the sink node, which is fed by trees composed of T nodes
(see Fig. 6).
According to the above rules, the sink is surrounded by M
nodes only. These form the so called delta region which can
be expanded or shrunk by switching nodes from the tree mode
(T) to the multi-path mode (M) and vice-versa, respectively. In
practice, only the nodes lying along the boundary between the
two regions are allowed to change their operating mode [9]
(see Fig. 6). Expanding the delta region corresponds to in-
creasing the number of paths towards the sink, which is good
when the packet loss probability is high. On the other hand,
shrinking the region is beneficial in case the network is static
and the packet loss probability is small. The user can set a
threshold to specify the minimum percentage of nodes that
should contribute to the aggregation operation. Note that this
percentage increases in case of a wider delta region. In fact,
this implies that more multi-path nodes are available thus
leading to a higher robustness against failures and, in turn, to
more nodes which actively contribute to the aggregation result.
The opposite holds when the delta region is shrunk. To see this,
consider node T5 in Fig. 6. This node is switched to an M
vertex (diagram on the right) and, as a consequence, can now
transmit the aggregated data flow also to nodes M4 and M5.
In particular, M5 can now contribute to the data aggregation
by passing the data coming from node M to lower levels.
In [9] the authors compare the Tributaries and Deltas algo-
rithm to TAG [5] (pure tree-based) and Synopsis Diffusion [8]
(pure multi-path). The simulation results in [9] only focus on
the quality of the gathered information (RMS error), while
disregarding the energy consumption aspect. In particular, they
demonstrate that Tributaries and Deltas guarantees smaller
errors with respect to TAG and that the approach nicely
solves the drawbacks of pure multi-path schemes (Synopsis
Diffusion). The major weakness of this approach is the pos-
sibly high overhead incurred in updating the data gathering
structure. The maintenance of the quite complex network
structure may also be a problem in case of node mobility (this
is also an open issue, not addressed in [9]).
Finally, particular attention is to be paid to the increase in
traffic and therefore to the MAC scheme in use. We stress that
most of the work on data aggregation done so far does not
consider this problem. We emphasize the need for true cross-
layer approaches which jointly consider routing, aggregation
functions and MAC aspects with particular focus on both data
representation efficiency and energy consumption.
V. DATA REPRESENTATIONS AND IN-NETWORK
AGGREGATION FUNCTIONS
As discussed in Section II, the problems of finding a
proper data representation and an optimal aggregation function
are strongly related and complex. The solutions proposed so
far mostly adopt very simple aggregation functions such as
average, median, quantile, min, max, etc. (see [3], [5]). These
strongly reduce the amount of data to be transmitted over the
network but also heavily affect the precision of the transmitted
information (lossy aggregation functions). However, in many
cases, we may be interested in a more detailed represen-
tation of the data, which calls for more complex functions
and data structures (taking into account the spatial, temporal
and semantic correlation of the readings). In this direction,
complex frameworks to provide data fusion rules have been
recently proposed in order to provide cross-layer functions
self-adaptable to the application dynamics [15], [69], [70].
A first improvement to a simple data aggregation function
to take into account the spatial correlation is presented in [13].
In this strategy, the dependence on the distance among nodes
is quantified by a decay function which may, e.g., decay
exponentially with an increasing hop distance [13]. During
the data aggregation, each reading is weighed by a decaying
factor which decreases with the distance to its source. The
framework can be extended by additionally accounting for
temporal and semantic correlation. However, this remains an
open and mostly unaddressed issue.
In the following sections, we describe a selection of in-
network aggregation functions according to our classification
in Section II. We review the simplest methods first, and
subsequently consider more complex approaches. At the end
of the section, we discuss distributed source coding techniques
which perform joint coding of correlated data from multiple
sources in a distributed manner.
A. TiNA [14]
Temporal coherency-aware in-Network Aggregation [14]
(TiNA) works on top of a routing tree (i.e., TAG or Cougar,
see Section IV-A) having the data gathering point (sink) as
its root. It exploits the temporal correlation in a sequence of
sensor readings to reduce energy consumption by suppressing
those values that do not affect the expected quality of the
aggregated data. This is implemented through a TOLERANCE
clause which is added to the SQL query. The tct parameter of
this clause is used to specify the temporal coherency tolerance
for the query. As an example, at a leaf node, each new available
value, Vnew, is compared against the last reported data point,
Vold. Vnew is transmitted (and aggregated) up the tree if and
only if it satisfies the following requirement (data suppression
rule):
|Vnew − Vold|
Vnew
> tct, (2)
TiNA uses the clause GROUP BY of the SQL query to decide
how different messages shall be processed, i.e., two data points
can only be aggregated if they belong to the same GROUP.
The data gathering procedure executed at the internal nodes is
as follows. They first gather and combine packets sent by their
children. If a given node does not receive valid data from any
of its children, it replaces the missing information using the
last reported data from the same child (previously stored in its
buffer). The node then considers its own reading. In case it can
be aggregated with some other data in its buffer (they belong
to the same GROUP), then the reading is aggregated with that
data regardless of the tct value. Doing so, internal nodes can
report their values more often than leaf nodes thus increasing
the accuracy of the aggregation. On the other hand, if the
internal node needs to create a new group, it does so and adds
the new reading only if this data satisfies Eq. (2). The idea is
that new groups are created only when the new measurements
significantly differ from old data points (Eq. (2)).
Moreover, in TiNA a very simple mechanism to counteract
link failures is used. Children, when suppressing data, must
send heartbeat messages to their parent at regular intervals.
The cost of this message is low as it is just a notification
packet. Thanks to these packets each parent knows whether
its children are still alive. Thus it can infer whether the old
readings are to be kept valid. In case of a missing notification,
the appropriate child is discarded until the parent hears from
that child again. These messages can also be used in case of
mobile sensors as nodes change their location in the network.
Finally, the periodic heartbeats allow children to reconnect to
the data gathering tree in case of parent failure.
B. DADMA [16]
Data Aggregation and Dilution by Modulus Addressing
(DADMA) [16] is a distributed data aggregation and dilution
technique for sensor networks where nodes aggregate or
dilute sensed values according to the rules given in an SQL
statement. DADMA treats a wireless sensor network as a
distributed relational database. This database has a single view
which is created by joining records which are locally stored
in the sensor nodes. This technique can be used over well
known routing schemes such as Directed Diffusion [1] and
LEACH [2], see Section IV-A. The sensor network database
view (SNDV) is temporarily created and maintained at the sink
node. The basic idea in DADMA is to aggregate data coming
from a group of sensors or to exclude some sensors from the
data gathering tree. These operations are carried out according
to two simple rules. First, a user can retrieve a subset of data
fields available in an SNDV and aggregate data by using the
following aggregate m function:
fa(x) = x div m. (3)
Moreover, sensor nodes can be excluded from a query by a
dilute m function as follows:
fd(x) = (x/r) mod (m/r). (4)
In the previous equations x is the grid location of a node with
respect to one of the axes, r is the resolution in meters and m
is the aggregation (or dilution) factor. As the sink sends a new
query, it also specifies a based on field and a command that
could be either aggregate or dilute. Each sensor node compares
the result of its aggregation or dilution function with the based
on value and decides its behavior.
For instance, on receiving a dilute m command a node
first uses Eq. (4) to calculate its location indices for both the
horizontal and vertical axes (fd(x) and fd(y)). Subsequently,
it compares these values with the x and y indices included
in the based on field of the query. If they match, the sensor
replies to the query. In a similar way, when an aggregate
m command is received, the values measured by a sensor
node are aggregated with those measured by the other nodes
having the same indices. We observe that such a strategy is a
practical way to take into account the spatial location of the
nodes by, for instance, aggregating only those values coming
from closely placed devices. The author in [16] studies the
performance of DADMA by putting particular emphasis on
the energy savings coming from the reduction of the number
of transmissions and on the probability of event detection.
Moreover, he devises a mechanism to achieve a good tradeoff
among the cost, the accuracy, and the reliability in retrieving
the wanted information. The same concepts are addressed
in [71] where, in addition to the aggregation/dilution schemes,
two location based hash functions are introduced to determine
how the sensed data can be grouped or which sensors should
be excluded from a query.
C. Data Aggregation by means of Feedback Control [72]
The authors of [72] define a strategy to tune the degree of
data aggregation while maintaining specified latency bounds
on data delivery and minimizing the energy consumption. They
consider time-constrained reference scenarios dealing with
real-time applications which impose specific time constraints
to the delivery of sensor measurements. Data is grouped
into different classes associated with different bounds on the
delivery time. The aim is to guarantee the delivery of all data at
the minimum energy cost while satisfying all time constraints.
The data aggregation degree is adapted accordingly to meet
these requirements. If the total communication load exceeds
the system capacity, the amount of data has to be reduced (the
data aggregation degree has to be increased), whereas the data
aggregation degree may be relaxed in case of low traffic. In
the former case, a so called lossy feedback loop mechanism
assigns a data aggregation degree (d) on the basis of load
and capacity estimates. This algorithm runs independently at
each node. Specifically, d is defined as the ratio between
the number of outgoing and incoming packets. For instance,
if d = 0.66, three received packets have to be aggregated
into two packets (e.g., by averaging two of them).2
In the
limiting case where d = 1 no data aggregation is performed.
Moreover, d is continuously adapted according to new load
and capacity estimates. In addition, when the system operates
in a non-overloaded regime, a further strategy called lossless
2Note that all packets have the same size in this case.
feedback loop can be used to reduce the energy consumption.
According to this scheme incoming messages are collected
and transmitted in a single packet without data size reduction.
This solution is interesting for two reasons: the control of
the data aggregation is based on physical measurements of the
network conditions, thus making the mechanism self-adaptable
to the actual network dynamics. Second, it aims at satisfying
time constraints that, in general, are rarely considered by
wireless sensor network algorithms. This solution is extended
in [15], where the authors define a complete data aggregation
framework (AIDA), by considering general aggregation rules.
D. Synopsis Diffusion Framework [8]
A recent solution to the data aggregation problem has been
proposed in [8]. The main contribution of the paper is to
define aggregation functions and data structures which are
robust to considering the same sensor readings in the data
aggregation process multiple times (double-counting problem).
This is crucial when data aggregation is used in conjunction
with multi-path routing schemes (see Section IV-C).
The approach defines order and duplicate insensitive (ODI)
properties whose role is to make sure that the final result
of the aggregation is independent of the routing topology.
That is, the computed aggregate must be the same irrespective
of the order in which the sensor readings are merged and
the number of times they are considered in the aggregation
process. A synopsis is defined as a summary of the partial
result of the overall aggregation process received at a given
node. Three functions on the synopses are possible to perform
data aggregation:
• Synopsis Generation: Given a sensor reading, a synopsis
generation function SG(·) produces the corresponding
synopsis for that data.
• Synopsis Fusion: Given two synopses, a synopsis fusion
function SF(·, ·) generates a new synopsis that summa-
rizes both.
• Synopsis Evaluation: Given a synopsis, a synopsis eval-
uation function SE(·) yields up the final result.
The exact implementations of the functions and the synopsis
definitions are strictly related to the considered aggregation
query. A simple and fast way to check whether a synopsis
diffusion algorithm is ODI-correct is based on the following
four properties:
• Preserves duplicates: if two readings contain the same
data values, the algorithm generates the same synopsis.
• The synopsis function SF(·) is commutative: for any
two synopses s1 and s2 we have that SF(s1, s2) =
SF(s2, s1).
• The synopsis function SF(·) is associative: for any
triple (s1, s2, s3) we have that SF(s1, SF(s2, s3)) =
SF(SF(s1, s2), s3).
• The synopsis function SF(·) is same-synopsis idempo-
tent: for any synopsis s we have that SF(s, s) = s.
The four properties above are necessary and sufficient for ODI-
correctness. More properties and examples can be found in the
related paper [8], where the authors also discuss the advantages
of their solution with respect to TAG [5].
E. The Quantile Digest [21]
Quantile Digest [21] (q-digest) is a data structure for
representing sensor readings with an arbitrary degree of ap-
proximation (trading data size for precision). The data com-
pression algorithm adapts its behavior to the data distribution
by automatically grouping the sensed data into variable size
buckets of almost equal weight. As in [21], we assume that
sensor readings are integer numbers falling within the range
[1, σ]. A q-digest consists of a set of buckets of different
sizes and their associated counts. More specifically, consider
a complete binary tree T. In a q-digest, each element of the
tree v ∈ T can be considered as a bucket with a specific
range. For example, the range associated with the root of the
q-digest is [1, σ] and its two children have ranges [1, σ/2] and
[σ/2 + 1, σ], respectively. In addition, every bucket v ∈ T
has a counter (count(v)) associated with it. The structure is
recursive and ranges are halved as we proceed from the root
to the leaves of the tree. A q-digest is simply a subset of
the (complete) tree which only contains those elements with
positive counts. For its construction, we say that an element
of the original tree v ∈ T is in the q-digest if and only if it
satisfies the following properties:
q1) count(v) ≤ n/k , where n is the number of
readings and k is the compression factor. This rule
ensures that the internal (non-leaf) element v in the
tree does not have a high count.
q2) count(v) + count(vp) + count(vs) > n/k where
vp and vs are the parent and the sibling of v,
respectively.
q3) Since there are no parent and sibling for the root
it can violate property q2). A leaf node is instead
allowed to violate property q1).
In Fig. 7 we show an example illustrating how a q-digest is
built. The example is the same described in [21]. n = 15 is the
number of readings at any one sensor, which are compressed
and summarized in the data structure. The leaf nodes, from
left to right, represent the values 1, 2, . . . , 8 and the number
inside the boxes represent the counts. The compression factor
k is equal to 5 which means that the q-digest has n/k = 3
levels. Finally, σ = 8 is the size of the data interval, where
we assume to collect integer values spanning from 1 to 8.
Consider a set of n = 15 readings within this range, as shown
in Fig. 7(a). The number of buckets needed to store all data is
7. In Fig. 7(a), the children of nodes a, c and d do not satisfy
the digest property (q2). Hence, we compress their values into
a single bucket by getting to the structure in Fig. 7(b). At
this point, node e still does not satisfy property (q2). Hence,
we compress the value therein by getting to Fig. 7(c). Now,
node g still does not satisfy property (q2) and hence a further
compression is needed. This last compression leads us to the
q-digest in Fig. 7(d). Note that only 5 buckets are needed
to store the final result, in spite of the 7 buckets that were
originally needed to store the data without compression. As
can be observed from this example, this procedure results in
a larger loss of accuracy for the readings with a small count.
The compression factor k is used to tune the procedure to the
desired accuracy. It also affects the memory requirements for
storing a q-digest [21].
For its practical implementation, the q-digest structure needs
two functions: 1) to construct the q-digest and 2) to merge two
or more q-digests. The first function is called compress as it
takes the uncompressed q-digest, the number of readings n and
the compression factor k as input and generates a compressed
representation of the q-digest as output (see the above exam-
ple). The second functionality is the merge function which is
used for example when two sensors send their q-digests to
the same parent. The parent merges these two q-digests into a
single q-digest and adds its own values to the new structure.
The merge function first takes the union of the two q-digests,
which is obtained by adding the counts of the buckets with the
same range. After this, it compresses the resulting q-digest by
applying the compress function above. As soon as the q-digest
structure has been built, each sensor packs it and transmits it
to its parent (predecessor node) in the data gathering tree.
In principle, this scheme can be used on top of any routing
protocol that avoids loops and duplicates of the same packet.
We observe, however, that the joint design of these data repre-
sentation and compression techniques with routing algorithms
is still a completely open research issue.
F. Distributed Source Coding
A recent paradigm to perform data aggregation exploits
Distributed Source Coding (DSC). These techniques are based
on the Slepian-Wolf theorem [73], which allows joint coding
of correlated data from multiple sources and without explicit
communication. This is possible as long as the individual
source rates satisfy certain constraints about conditional en-
tropies. These techniques require that the correlation struc-
ture is available a priori at the independent encoders. Ref-
erence [23] gives a good survey on DSC techniques and
related open issues in this emerging field. The probably most
important contribution to DSC was derived by Slepian and
Wolf in their landmark paper [24]. A simple way to encode
and transmit the data generated by two generic sources X
and Y is to apply separate coding with total rate R1 + R2 =
H(X) + H(Y ), where H(·) denotes the entropy of the data
flow. If the two sources can communicate, then they could
coordinate their coding operations and use together a total
rate of H(X, Y ) ≤ R1 + R2. The authors in [24] showed that
two correlated sources can be coded with a total rate equal to
the joint entropy H(X, Y ) even though they are not able to
communicate with each other, as long as their individual rates
are at least equal to the conditional entropies H(X|Y ) and
H(Y |X) respectively. Although different sources do not need
to communicate with each other, they do need to have some
common information about the correlation structure. Towards
this end, the sink node may first collect a certain amount of
data from the network, process it and subsequently send the
proper correlation information to all sensors. Only after this
operation, can each node start compressing its readings.
The theory has been generalized and recently applied to
wireless sensor networks. For instance, in [74] the authors fo-
cus on LDPC codes which are well known for their capacity of
approaching the Shannon limit; Slepian and Wolf proved that
the theoretical limit H(X, Y ) can be reached with equality,
but without devising practical schemes to approach it. In [75],
the authors apply Slepian-Wolf coding in its simplest form by
proving its effectiveness. Note that, in order for Slepian-Wolf
decoding to be effective we need to have a good estimate of
data correlation properties. Accordingly, the scheme in [75]
uses an algorithm, running at the sink, to measure the actual
data correlation. Then, a set of nodes is allowed to send
compressed data, where the compression is achieved locally
and decoding is performed in a centralized fashion at the data
gathering node. At the sink, the uncompressed samples coming
from the sensors that are not allowed to compress are used as
the side information for decoding. Notably, this approach has
the drawback that data is not aggregated along the path to the
sink. Hence, further savings can be achieved by exploiting in
network data fusion on top of the distributed per node data
compression. Also, this approach might be affected by packet
losses as, in such a case, the needed side information might not
be fully available at the sink (decoding entity). In the paper,
the authors discuss these issues but without giving detailed
results. In [76], the authors present and solve the minimum
cost data gathering tree problem. The network is modeled as
a graph G = (V, E), where V and E are the set of vertices
(nodes) and edges, respectively. Slepian-Wolf coding is used at
every node. Moreover, a communication cost we is associated
with each edge e ∈ E. The cost function is assumed to be
separable, i.e., f(xe, we) = xewe, where xe is the amount of
information to be sent over edge e and we is the edge cost (e.g.,
transmission power). The minimum cost data gathering tree
problem consists of finding the spanning tree of G and the rate
allocation for each node in V that minimize the cost function
of the network (i.e., the sum of the costs of all links). The
shortest path tree is optimal for any rate allocation and thus the
optimization problem can be separated into a spanning tree and
a rate allocation optimization subproblems. [76] gives exact
algorithms to solve both of them. Overall, the results in [76]
allow to code the data in a completely distributed fashion by
exploiting the side information in a recursive manner.
The main drawback of this scheme is that it involves the
calculation of an SPT and that it requires the full (centralized)
knowledge of the data correlation structure for all nodes in the
network to express the rate constraints. Lossless encoders can
then separately and independently encode data at each node
as efficiently as if its encoder would see the data values sent
by all other nodes. Notably, the scheme’s inability to tolerate
failures may eliminate this advantage. In fact, if the encoded
bits from one node are lost, the sink may not be able to
reconstruct several sensor values. The authors of reference [77]
highlight the drawbacks of previous approaches [75] [76] when
the network is error prone and, as a partial solution, propose
to exploit multi-path routing schemes. The advantages of their
approach come at the cost of a higher energy consumption to
setup/maintain multiple trees and to transmit multiple copies
(extra overhead) of the same message.
In summary, DSC effectively makes routing and coding
decisions independent of each other. On the downside, how-
ever, this solution increases the computational complexity and
requires the collection of information about joint statistics,
which may not always be easy in practice.
VI. DISCUSSIONS AND CONCLUSIONS
In this paper we have presented a detailed review of in-
network aggregation techniques for wireless sensor networks.
One of the main design aspects for sensor network archi-
tectures is energy efficiency, to keep the network operational
as long as possible. Therefore, aggregation techniques are an
essential building block, as they aim at reducing the number
of transmissions required for data collection which, in turn,
reduces energy consumption.
In this survey, we have provided a definition of in-network
data aggregation and identified its key elements: data dissem-
ination and query mechanisms (with particular focus on the
routing and MAC layer), aggregation functions, and data struc-
ture. Fig. 8 and Fig. 9 summarize the basic characteristics of
the presented solutions and provide a qualitative comparison.
By its very nature, in-network aggregation concerns several
layers of the protocol stack, and any efficient solution is likely
to require a cross-layer design. However, we note that most
of the existing research focuses on networking issues such
as routing, often considering only very simple approaches to
aggregate data. In addition, much work still remains to be done
to provide cross-layer solutions, accounting for application,
data representation, routing and MAC aspects. In fact, the
schemes proposed so far often focus on only a subset of these
aspects, typically trying to merge routing and data aggregation
techniques, but ignoring MAC, application or data representa-
tion issues. Finally, another aspect still not deeply investigated
concerns the memory and the computational resources allow
to sustain data aggregation processing [78].
For routing, many protocols are based on clustering. A
major advantage of a clustered structure is that it directly
allows aggregation of data at the cluster head. Such algorithms
work well in relatively static networks where the cluster
structure remains unchanged for a sufficiently long time, but
they may be fragile when used in more dynamic environments.
Often, the cost required to maintain the hierarchical struc-
ture is substantial. Similar considerations apply to tree-based
schemes. Initial work addresses some of these problems [53],
[79] but further research efforts are required to keep a network
functional under mobility. This last aspect is in fact largely
unexplored, and it is not clear how different protocols perform
in its presence. Also, multi-path algorithms may be able
to deal with (limited) topology changes due to their higher
robustness [8]. An interesting alternative research direction
is provided by reactive and localized routing protocols [7].
This study is also one of the very few that take MAC layer
issues into account [7], [65]. We stress that without such a
joint design, the performance gained at the routing layer may
be partially lost due to MAC inefficiencies. Hybrid algorithms
allow to combine the properties of different approaches. This
is the case for the algorithm in [9], which provides a good
tradeoff between tree-based and multi-path schemes. Hybrid
algorithms allow to tune the degree of aggregation and may
facilitate the adaptation of the aggregation scheme (e.g., to the
packet loss probability). For these reasons, they are particularly
suitable for the design of schemes that are able to adapt to
application needs.
As discussed above, only very few studies provide a deeper
analysis of the aggregation functions. Previous work mostly
takes spatial correlation [13], [80] and temporal correla-
tion [14] of data into account, but semantic correlation is not
sufficiently well studied. In this context, distributed source
coding is a fairly recent and very promising research area.
However, while many theoretical results are known, few of
them have been turned into practical algorithms applicable to
wireless sensor networks.
REFERENCES
[1] C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, and F. Silva,
“Directed diffusion for wireless sensor networking,” IEEE/ACM Trans.
Networking, vol. 11, no. 1, pp. 2–16, Feb. 2002.
[2] W. B. Heinzelman, A. P. Chandrakasan, and H. Balakrishnan, “An
application-specific protocol architecture for wireless microsensor net-
works,” IEEE Trans. Wireless Commun., vol. 1, no. 4, pp. 660–670,
Oct. 2002.
[3] S. Lindsey, C. Raghavendra, and K. M. Sivalingam, “Data Gathering
Algorithms in Sensor Networks using Energy Metrics,” IEEE Trans.
Parallel Distrib. Syst., vol. 13, no. 9, pp. 924–935, Sep. 2002.
[4] Y. Yao and J. Gehrke, “Query processing for sensor networks,” in ACM
CIDR 2003, Asilomar, CA, US, Jan. 2003.
[5] S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong, “TAG:
a Tiny AGgregation Service for Ad-Hoc Sensor Networks,” in OSDI
2002, Boston, MA, US, Dec. 2002.
[6] Y. Xu, J. Heidemann, and D. Estrin, “Geographic-Informed Energy
Conservation for Ad Hoc Routing,” in ACM/SIGMOBILE MobiCom
2001, Rome, Italy, Jul. 2001.
[7] G. Di Bacco, T. Melodia, and F. Cuomo, “A MAC Protocol for Delay-
Bounded Applications in Wireless Sensor Networks,” in Med-Hoc-Net
2004, Bodrum, Turkey, Jun. 2004.
[8] S. Nath, P. B. Gibbons, Z. R. Anderson, and S. Seshan, “Synopsis
Diffusion for Robust Aggregation in Sensor Networks,” in ACM SenSys
2004, Baltimore, MD, US, Nov. 2004.
[9] A. Manjhi, S. Nath, and P. B. Gibbons, “Tributaries and Deltas:
Efficient and Robust Aggregation in Sensor Network Stream,” in ACM
SIGMOD 2005, Baltimore, MD, US, Jun. 2005.
[10] I. Solis and K. Obraczka, “The Impact of Timing in Data Aggregation
for Sensor Networks,” in IEEE ICC 2004, Paris, France, Jun. 2004.
[11] F. Hu, C. Xiaojun, and C. May, “Optimized Scheduling for Data
Aggregation in wireless sensor networks,” in IEEE ITCC 2005, Las
Vegas, NV, US, Apr. 2005.
[12] X. Jianbo, Z. Siliang, and Q. Fengjiao, “A new In-network data
aggregation technology of wireless sensor networks.” in IEEE SKG’06,
Guilin, China, Nov. 2006.
[13] E. Cohen and H. Kaplan, “Spatially-Decaying Aggregation Over a
Network: Model and Algorithms,” in ACM SIGMOD 2004, Paris,
France, Jun. 2004.
[14] A. Sharaf, J. Beaver, A. Labrinidis, and K. Chrysanthis, “Balancing
energy efficiency and quality of aggregate data in sensor networks,”
The VLDB Journal, vol. 13, no. 4, pp. 384–403, Dec. 2004.
[15] T. He, B. M. Blum, J. A. Stankovic, and T. Abdelzaher, “AIDA:
Adaptive Apllication-Independent data aggregation in wireless sensor
networks,” ACM Trans. on Embedded Computing Systems, vol. 3, no. 2,
pp. 426–457, May 2004.
[16] E. Cayirci, “Data Aggregation and Dilution by modulus addressing in
wireless sensor networks,” IEEE Communications Letters, vol. 7, no. 8,
pp. 355–357, Aug. 2003.
[17] D. Petrovic, R. C. Shah, K. Ramchandran, and J. Rabaey, “Data
Funneling: Routing with Aggregation and Compression for Wireless
Sensor Networks,” in IEEE SNPA 2003, Anchorage, AK, US, May
2003.
[18] M. Riedewald, D. P. Agrawal, and A. El Abbadi, “pCube: Update-
efficient online aggregation with progressive feedback and error
bounds,” in IEEE SSDBM 2000, Berlin, Germany, Jul. 2000.
[19] L. Huang, B. Zhao, A. Joseph, and J. Kubiatowicz,
“Probabilistic data aggregation in distributed networks,”
EECS Department, University of California, Berkeley, Tech.
Rep. UCB/EECS-2006-11, February 6 2006. [Online]. Avail-
able: http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-
11.html
[20] X. Wu and Z. Tian, “Optimized Data Fusion in Bandwidth and Energy
Constrained Sensor Networks.” in IEEE ICASSP 2006, Toulouse,
France, May 2006.
[21] N. Shrivastava, C. Buragohain, D. P. Agrawal, and S. Suri, “Medians
and Beyond: New Aggregation Techniques for Sensor Networks,” in
ACM SenSys 2004, Baltimore, MD, US, Nov. 2004.
[22] A. Bezenchek, M. Rafanelli, and L. Tininini, “A data structure for rep-
resenting aggregate data,” in IEEE SSDBM 1996, Stockholm, Sweden,
Jun. 1996.
[23] Z. Xiong, A. D. Liveris, and S. Cheng, “Distributed source coding for
sensor networks,” IEEE Signal Processing Mag., vol. 21, no. 5, pp.
80–94, Sep. 2004.
[24] D. Slepian and J. K. Wolf, “Noiseless coding of correlated information
sources,” IEEE Trans. Inform. Theory, vol. 19, no. 4, pp. 471–480, Jul.
1973.
[25] R. M. Karp, Reducibility among combinatorial problems, in: Complex-
ity of Computer Computations, R. Miller, J.W. Thatcher , Ed. New
York, US: Plenum Press, 1972.
[26] B. Krishnamachari, D. Estrin, and S. Wicker, “The Impact of Data
Aggregation in Wireless Sensor Networks,” in IEEE ICDCS 2002,
Vienna, Austria, Jul. 2002.
[27] S. Pattem, B. Krishnamachari, and R. Govindan, “The Impact of
Spatial Correlation on Routing with Compression in Wireless Sensor
Networks,” in ACM/IEEE IPSN 2004, Berkeley, CA, US, Apr. 2004.
[28] Y. Zhu, K. Sundaresan, and R. Sivakumar, “Practical Limits on Achiev-
able Energy Improvements and Useable Delay Tolerance in Correlation
Aware Data Gathering in Wireless Sensor Networks,” in IEEE SECON
2005, Santa Clara, CA, US, Sep. 2005.
[29] Y. Zhu, R. Vedantham, S. J. Park, and R. Sivakumar, “A Scalable Cor-
relation Aware Aggregation Strategy for Wireless Sensor Networks,”
in IEEE WICON 2005, Budapest, Hungary, Jul. 2005.
[30] P. von Rickenbach and R. Wattenhofer, “Gathering correlated data
in sensor networks,” in ACM DIALM-POMC 2004, Philadelphia, PA,
USA, Oct. 2004.
[31] R. Cristescu, B. Beferull-Lozano, M. Vetterli, and R. Wattenhofer,
“Network correlated data gathering with explicit communication: Np-
completeness and algorithms,” in submitted to IEEE/ACM Transactions
on Networking.
[32] K. Akkaya and M. Younis, “A survey of routing protocols in wireless
sensor networks,” Elsevier Ad Hoc Network Journal, vol. 3, no. 3, pp.
325–349, May 2005.
[33] J. N. Al-Karaki and A. E. Kamal, “Routing techniques in wireless
sensor networks: a survey,” IEEE Wireless Communications, vol. 11,
no. 6, pp. 6–28, Dec. 2004.
[34] I. Solis and K. Obraczka, “Isolines: energy-efficient mapping in sensor
networks,” in IEEE ISCC 2005, Cartagena, Spain, Jun. 2005.
[35] V. Erramilli, I. Matta, and A. Bestavros, “On the interaction between
data aggregation and topology control in wireless sensor networks,” in
IEEE SECON 2004, Santa Clara, CA, US, Oct. 2004.
[36] M. Ding, X. Cheng, and G. Xue, “Aggregation Tree Construction in
Sensor Networks,” in IEEE VTC 2003, Orlando, FL, US, Oct. 2003.
[37] Y. Yu, B. Krishnamachari, and V. Prasanna, “Energy-Latency Tradeoffs
for Data Gathering in Wireless Sensor Networks,” in IEEE Infocom
2004, Hong Kong, Mar. 2004.
[38] H. Luo, J. Luo, Y. Liu, and S. Das, “Energy efficient routing with
adaptive data fusion in sensor Networks.” in Third ACM/SIGMOBILE
Workshop on Foundations of Mobile Computing, Cologne, Germany,
Aug. 2005.
[39] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient Clustering-
based heuristic for data gathering and aggregation in sensor networks,”
in IEEE WCNC 2003, New Orleans, LA, US, Mar. 2003.
[40] H. Albert, R. Kravets, and I. Gupta, “Building Trees Based On
Aggregation Efficiency in Sensor Networks,” in Med-Hoc-Net 2006,
Lipari, Italy, Jun. 2006.
[41] J. N. Al-Karaki, R. Ul-Mustafa, and A. E. Kamal, “Data aggregation in
wireless sensor networks: exact and approximate algorithms,” in IEEE
HPSR 2004, Phoenix, AZ, US, Apr. 2004.
[42] G. Hartl and B. Li, “infer: A Bayesian Approach towards Energy
Efficient Data Collection in Dense Sensor Networks,” in IEEE ICDCS
2005, Columbia, OH, US, Jun. 2005.
[43] H. O. Tan and I. Korpeoglu, “Power Efficient Data gathering and
aggregation in wireless sensor networks,” ACM SIGMOD Record,
vol. 32, no. 4, pp. 66–71, Dec. 2003.
[44] H. Cheng, Q. Liu, and X. Jia, “Heuristic algorithms for real-time
data aggregation in wireless sensor networks.” in ACM IWCCC 2066,
Vancouver, British Columbia, Canada, Jul. 2006.
[45] Y. Hu, N. Yu, and X. Jia, “Energy efficient real-time data aggregation
in wireless sensor networks.” in ACM IWCCC 2006, Vancouver, British
Columbia, Canada, Jul. 2006.
[46] J. Choi, J. W. Lee, K. Lee, S. Choi, W. H. Kwon, and H. S. Park,
“Aggregation Time Control Algorithm for Time constrained Data
Delivery in Wireless Sensor Networks.” in IEEE VTC 2006-Spring,
Melbourne, Australia, May 2006.
[47] H. Gupta, V. Navda, S. R. Das, and V. Chowdhary, “Efficient Gathering
of Correlated Data in Sensor Networks,” in ACM MobiHoc 2005,
Urbana-Champaign, IL, US, May 2005.
[48] B. Zhou, L. H. Ngoh, B. S. Lee, and C. P. Fu, “A Hierarchical
Scheme for Data Aggregation in Sensor Network,” in IEEE ICON 2004,
Singapore, Nov. 2004.
[49] C. Intanagonwiwat, D. Estrin, R. Govindan, and J. Heidemann, “Impact
of Network Density on Data Aggregation in Wireless Sensor Net-
works,” in IEEE ICDCS 2002, Vienna, Austria, Jul. 2002.
[50] M. Lee and V. W. S. Wong, “An energy-aware spanning tree algorithm
for data aggregation in wireless sensor networks,” in IEEE PacRrim
2005, Victoria, BC, Canada, Aug. 2005.
[51] IEEE LAN MAN Standards, Part 11: Wireless LAN Medium Access
Control (MAC) and Physical Layer (PHY) specifications”, ANSI/IEEE
Std., March 1999.
[52] R. Kannan and S. S. Iyengar, “Game-theoretic models for reliable path-
length and energy-constrained routing with data aggregation in wireless
sensor networks,” IEEE J. Select. Areas Commun., vol. 22, no. 6, pp.
1141–1150, Aug. 2004.
[53] H. S. Kim, T. F. Abdelzaher, and W. H. Kwon, “Minimum-Energy
Asynchronous Dissemination to Mobile Sinks in Wireless sensor
networks,” in ACM SenSys 2003, Los Angeles, CA, US, Nov. 2003.
[54] T. Banerjee, K. Chowdhury, and D. P. Agrawal, “Tree based data
aggregation in sensor networks using polynomial regression,” in IEEE
FUSION 2005, Philadelphia, PA, US, Jul. 2005.
[55] Y. Yang, X. Wang, S. Zhu, and G. Cao, “SDAP: A Secure HopbyHop
Data Aggregation Protocol for Sensor Networks.” in ACM MobiHoc
2006, Florence, Italy, May 2006.
[56] Y. Sang, H. Shen, Y. Inoguchi, Y. Tan, and N. Xiong, “Secure Data
Aggregation in Wireless Sensor Networks: A Survey.” in PDCAT’06,
Taipei, Taiwan, Dec 2006.
[57] A. Mahimkar and T. S. Rappaport, “SecureDAV: A secure data
aggregation and verification protocol for sensor networks,” in IEEE
GLOBECOM 2004, Dallas, TX, US, Nov. 2004.
[58] H. Chen, H. Mineno, and T. Mizuno, “A Meta-Data-Based Data
Aggregation Scheme in Clustering Wireless Sensor Networks.” in
IEEE/ACM MDM 2006, Nara, Japan, May 2006.
[59] Y. Yao and J. Gehrke, “The cougar approach to in-network query
processing in sensor networks,” ACM SIGMOD Record, vol. 31, no. 3,
pp. 9–18, Sep. 2002.
[60] T. Pham, E. J. Kim, and M. Moh, “On data aggregation quality
and energy efficiency of wireless sensor network protocols,” in IEEE
BroadNets 2004, San Jos´e, CA, US, Oct. 2004.
[61] H. Cam, S. Ozdemir, P. Nair, and D. Muthuavinashiappan, “ESPDA:
Energy-efficient and secure pattern-based data aggregation for wireless
sensor networks,” IEEE SENSORS 2003, Oct. 2003.
[62] M. Lotfinezhad and B. Liang, “Effect of partially correlated data on
clustering in wireless sensor networks,” in IEEE SECON 2004, Santa
Clara, CA, US, Oct. 2004.
[63] O. Younis and S. Fahmy, “Distributed clustering in ad-hoc sensor
networks: a hybrid, energy-efficient approach,” in IEEE Infocom 2004,
Hong Kong, Mar. 2004.
[64] V. Mhatre and C. Rosenberg, “Design guidelines for wireless sensor
networks: Communication, clustering and aggregation,” Elsevier Ad
Hoc Networks Journal, vol. 2, no. 1, pp. 45–63, Jan. 2004.
[65] P. Popovski, F. H. P. Fitzek, H. Yomo, T. K. Madsen, R. Prasad, and
N. J. Vej, “MAC-layer approach for cluster-based aggregation in sensor
networks,” in IEEE IWWAN’04, Oulu, Finland, May 2004.
[66] K.-W. Fan, S. Liu, and P. Sinha, “Scalable data aggregation for dynamic
events in sensor networks.” in ACM SenSys 2006, Boulder, CO, US,
Oct. 2006.
[67] S. Chen and Z. Zhang, “Localized algorithm for aggregate fairness in
wireless sensor networks.” in ACM/SIGMOBILE MobiCom 2006, Los
Angeles, CA, US, Sep. 2006.
[68] S. Motegi, K. Yoshihara, and H. Horiuchi, “DAG Based In-Network
Aggregation for Sensor Network Monitoring,” in IEEE SAINT 2006,
Phoenix, AZ, US, Jan. 2006.
[69] U. Ramachandran, R. Kumar, M. Wolenetz, B. Cooper, B. Agarwalla,
J. Shin, P. Hutto, and A. Paul, “Dynamic data fusion for future sensor
networks.” ACM Transactions on Sensor Networks, vol. 2, no. 3, pp.
404–443, Aug. 2006.
[70] M. Wenz and H. Worn, “Event-based Production Rules for Data Aggre-
gation in Wireless Sensor Networks.” in IEEE MFI 2006, Heidelberg,
Germany, Sep. 2006.
[71] E. Cayirci and T. Coplu, “Distributed spatial data aggregation and
dilution based on hashing and relational algebra in wireless sensor
networks,” in IEEE ISSNIP 2004, Melbourne, Australia, Dec. 2004.
[72] T. Abdelzaher, T. He, and J. Stankovic, “Feedback control of data
aggregation in sensor networks,” in IEEE CDC 2004, Atlantis, Paradise
Island, Bahamas, Dec. 2004.
[73] T. M. Cover and J. A. Thomas, Elements of Information Theory. John
Wiley, 1991.
[74] M. Sartipi and F. Fekri, “Source and channel coding in wireless sensor
networks using LDPC codes,” in IEEE SECON 2004, Santa Clara, CA,
US, Oct. 2004.
[75] J. Chou, D. Petrovic, and K. Ramchandran, “A Distributed and Adap-
tive Signal Processing Approach to Reducing Energy Consumption in
Sensor Networks,” in IEEE Infocom 2003, San Francisco, CA, US,
Mar. 2003.
[76] R. Cristescu, B. Beferull-Lozano, and M. Vetterli, “On Network
Correlated Data Gathering,” in IEEE Infocom 2004, Hong Kong, Mar.
2004.
[77] S. Coleri and P. Varaiya, “Fault Tolerant and Energy Efficient Routing
for Sensor Networks,” in IEEE GLOBECOM 2004, Dallas, TX, US,
Nov. 2004.
[78] J. Verd, J. Garca, M. Nemirovsky, and M. Valero, “The impact of traffic
aggregation on the memory performance of networking applications.”
in ACM MEDEA 2004, Antibes Juan-les-Pins, France, Oct. 2004.
[79] K.-I. Hwang, J. In, and D.-S. Eom, “Distributed Dynamic Shared Tree
for Minimum Energy Data Aggregation of Multiple Mobile Sinks in
Wireless Sensor Networks.” in IEEE EWSN06, Zurich, Switzerland,
Feb. 2006.
[80] M. Sharifzadeh and C. Shahabi, “Supporting Spatial Aggregation in
sensor network database,” in ACM GIS 2004, Washington, DC, US,
Nov. 2004.
Elena Fasolo (fasoloel@dei.unipd.it) was born in Padova,
Italy, in 1980. She received the Laurea Degree in
Telecommunications Engineering from the University of
Padova, in 2004. She is currently a PhD Student at the School
in Information Engineering at the Information Engineering
Department, University of Padova, Italy, and she is spending
six months at the NTT DoCoMO European Laboratories
to do research on network coding techniques for wireless
networks. Her main interests are on networking and MAC
protocol layers especially on efficient data gathering and
dissemination schemes.
Michele Rossi (rossi@dei.unipd.it) was born in Ferrara, Italy,
in 1974. He received both the Laurea degree and the Ph.D. in
Information Engineering from the University of Ferrara, Italy,
in 2000 and 2004, respectively. During 2003 he was on leave
at the Center for Wireless Communications at the University
of California San Diego, US. Since November 2005 he is
an Assistant Professor at the Department of Information
Engineering of the University of Padova, Italy. His research
interests include channel access, routing and data distribution
algorithms for wireless networks.
J¨org Widmer (widmer@docomolab-euro.com) is senior
manager at DoCoMo Euro-Labs in Munich, Germany. He
leads the Self-Organized Ambient Networking research group,
working on several projects in the area of wireless ad-hoc
and sensor networks. Before joining DoCoMo, he spent two
years as postdoctoral researcher at EPFL, Switzerland, doing
research on ultra-wide band networks. Joerg Widmer obtained
his M.S. degree and Ph.D. degree in computer science from
the University of Mannheim, Germany in 2000 and 2003,
respectively. In 1999/2000, he was at the ICSI Center for
Internet Research, Berkeley, CA working on equation-based
congestion control. His research interests include efficient
algorithms for wireless ad-hoc and sensor networks, network
coding, and transport protocols.
Michele Zorzi (zorzi@ing.unife.it) was born in Venice, Italy,
in 1966. He received the Laurea Degree and the Ph.D. in
Electrical Engineering from the University of Padova, Italy,
in 1990 and 1994, respectively. During the Academic Year
1992/93, he was on leave at the University of California,
San Diego (UCSD), attending graduate courses and doing
research on multiple access in mobile radio networks. In
1993, he joined the faculty of the Dipartimento di Elettronica
e Informazione, Politecnico di Milano, Italy. After spending
three years with the Center forWireless Communications at
UCSD, in 1998 he joined the School of Engineering of the
University of Ferrara, Italy, where he became a Professor in
2000. Since November 2003, he has been on the faculty at
the Information Engineering Department of the University of
Padova. His present research interests include performance
evaluation in mobile communications systems, random access
in mobile 18 radio networks, ad hoc and sensor networks, and
energy constrained communications protocols. Dr. Zorzi is the
Editor-In-Chief of the IEEE Wireless Communications Maga-
zine, and currently serves on the Editorial Boards of the IEEE
Transactions on Communications, the IEEE Transactions on
Wireless Communications, the IEEE Transactions on Mobile
Computing, the Wiley Journal of Wireless Communications
and Mobile Computing and the ACM/URSI/Kluwer Journal of
Wireless Networks. He was also guest editor for special issues
in the IEEE Personal Communications Magazine (Energy
Management in Personal Communications Systems) and the
IEEE Journal on Selected Areas in Communications (Multi-
media Network Radios).
Figure and Table captions
Fig. 1 Diagram for in-network aggregation techniques and their relation with different
protocol layers. We stress that in general data processing also interacts with the
Application, MAC and PHY layers.
Fig. 2 A simplified scheme for Directed Diffusion [1].
Fig. 3 A message exchange example in DB-MAC.
Fig. 4 LEACH clustering approach.
Fig. 5 Examples of aggregation paths over a ring structure.
Fig. 6 Example of data gathering regions in Tributary and Delta.
Fig. 7 Q-digest example [21]: the complete tree T is derived by a recursive binary splitting
of the original (root) interval [1, σ]. The q-digest consists of the non-empty boxes of the
data structure in sub-figure (d).
Fig. 8 Summary of the basic characteristics of the routing protocols.
Fig. 9 Summary of the basic characteristics of the data aggregation functions.
Data representations
Data aggregation functions
Routing protocols
Applications
MAC and PHY layers
Data representations
Data aggregation functions
Routing protocols
Applications
MAC and PHY layers
Fig. 1. Diagram for in-network aggregation techniques and their relation with different protocol layers. We stress that in general data processing also interacts
with the Application, MAC and PHY layers.
c
b Source
Sink
(a) Interest dissemination
c
b
a
Source
Sink
(b) Gradients set up
c
b
a
Source
Sink
(c) Data delivery along the reinforced path
a
c
b Source
Sink
(a) Interest dissemination
c
b
a
Source
Sink
(b) Gradients set up
c
b
a
Source
Sink
(c) Data delivery along the reinforced path
a
Fig. 2. A simplified scheme for Directed Diffusion [1].
S1
S2
R1
R2
Sink direction
Pr = 10
Pr = 9
RTS(10)
CTS(10)
CTS(10)
S1
S2
R1
R2
Sink direction
Pr = 10
Pr = 9
RTS(10)
CTS(10)
CTS(10)
Fig. 3. A message exchange example in DB-MAC.
S
S
Sink
S
CH
S
S
S
S
S
S
S S
SS
S
CH
CH
S
S
Sink
S
CH
S
S
S
S
S
S
S S
SS
S
CH
CH
Fig. 4. LEACH clustering approach.
Sink
G
I
DF
CB
A
R1
R2
R3
R4
H E
Sink
G
I
DF
CB
A
R1
R2
R3
R4
H E
Fig. 5. Examples of aggregation paths over a ring structure.
T1
T4
T2
T3
M2
M1
T5 M3
M4
M5
T1
T4
T2
T3
M2
M1
M M3
M4
M5
Delta region Expanded
Delta region
SINK SINK
T1
T4
T2
T3
M2
M1
T5 M3
M4
M5
T1
T4
T2
T3
M2
M1
M M3
M4
M5
Delta region Expanded
Delta region
T1
T4
T2
T3
M2
M1
T5 M3
M4
M5
T1
T4
T2
T3
M2
M1
M M3
M4
M5
Delta region Expanded
Delta region
SINK SINK
Fig. 6. Example of data gathering regions in Tributary and Delta.
n=15, k=5, σ=8
11 64 111
a b c
root
1
4 6
2 2
e f
d
4 6
2 2
1
g1
4 6
2 2
(a)
(d)
(b)
(c)
Fig. 7. Q-digest example [21]: the complete tree T is derived by a recursive binary splitting of the original (root) interval [1, σ]. The q-digest consists of
the non-empty boxes of the data structure in sub-figure (d).
AsynchronousAsynchronousPeriodic per hopPeriodic per hopAsynchronousPeriodic per hopAsynchronousPeriodic per
hop adjusted
Timing strategy
NoneNoneLocal route
repairs
Rotation of the
cluster-head,
sleeping periods
NoneRotation of the
leader
NoneSleeping
periods
Energy saving
methods
MediumHighLowLowHighVery LowMediumLowResilience in case of
node mobility
MediumHighLowLowHighVery LowMediumLowScalability
MediumMediumMediumMediumLowHighHighHigh
Overhead to setup/maintain
the aggregation structure
HighHighMediumLowMediumLowMediumMediumResilience to link failures
Tree/Multi-path
based, driven by the
sink
Multi-path
based, on-line,
distributed
Cluster-based,
on-line,
distributed
synchronous
Cluster-based,
on-line,
distributed
Completely
distributed,
asynchronous
Chain-based,
centralized or
distributed
Tree-based, on-
line, driven by
the sink
Tree-based,
on-line, driven
by the sinkAggregation method
Tributaries and
Deltas [9]
Synopsis
Diffusion [8]
COUGAR
[4] [49]
LEACH
[2]
DB-MAC
[7]
PEGASIS
[3]
Directed
Diffusion [1]
TAG
[5]
AsynchronousAsynchronousPeriodic per hopPeriodic per hopAsynchronousPeriodic per hopAsynchronousPeriodic per
hop adjusted
Timing strategy
NoneNoneLocal route
repairs
Rotation of the
cluster-head,
sleeping periods
NoneRotation of the
leader
NoneSleeping
periods
Energy saving
methods
MediumHighLowLowHighVery LowMediumLowResilience in case of
node mobility
MediumHighLowLowHighVery LowMediumLowScalability
MediumMediumMediumMediumLowHighHighHigh
Overhead to setup/maintain
the aggregation structure
HighHighMediumLowMediumLowMediumMediumResilience to link failures
Tree/Multi-path
based, driven by the
sink
Multi-path
based, on-line,
distributed
Cluster-based,
on-line,
distributed
synchronous
Cluster-based,
on-line,
distributed
Completely
distributed,
asynchronous
Chain-based,
centralized or
distributed
Tree-based, on-
line, driven by
the sink
Tree-based,
on-line, driven
by the sinkAggregation method
Tributaries and
Deltas [9]
Synopsis
Diffusion [8]
COUGAR
[4] [49]
LEACH
[2]
DB-MAC
[7]
PEGASIS
[3]
Directed
Diffusion [1]
TAG
[5]
Algo.
Char.
Algo.
Char.
Fig. 8. Summary of the basic characteristics of the routing protocols.
√
(Temporal, Spatial)
√ / XX√
(Spatial)
√
(Temporal)
Correlation awareness
LowHighLowMediumHighResilience to losses/failures
√X√ / X√XDuplicate sensitive
√ / X√ / X√ / X√√Lossy aggregation
Q-digest
[18]
Synopsis
Diffusion [8]
AIDA
[58]
DADMA
[15]
TiNA
[13]
√
(Temporal, Spatial)
√ / XX√
(Spatial)
√
(Temporal)
Correlation awareness
LowHighLowMediumHighResilience to losses/failures
√X√ / X√XDuplicate sensitive
√ / X√ / X√ / X√√Lossy aggregation
Q-digest
[18]
Synopsis
Diffusion [8]
AIDA
[58]
DADMA
[15]
TiNA
[13]
Algo.
Char.
Algo.
Char.
Fig. 9. Summary of the basic characteristics of the data aggregation functions.
Ad

More Related Content

What's hot (18)

Survey on Synchronizing File Operations Along with Storage Scalable Mechanism
Survey on Synchronizing File Operations Along with Storage Scalable MechanismSurvey on Synchronizing File Operations Along with Storage Scalable Mechanism
Survey on Synchronizing File Operations Along with Storage Scalable Mechanism
IRJET Journal
 
Review on Clustering and Data Aggregation in Wireless Sensor Network
Review on Clustering and Data Aggregation in Wireless Sensor NetworkReview on Clustering and Data Aggregation in Wireless Sensor Network
Review on Clustering and Data Aggregation in Wireless Sensor Network
Editor IJCATR
 
Wireless personal communication
Wireless personal communicationWireless personal communication
Wireless personal communication
Venu Madhav
 
A smart clustering based approach to
A smart clustering based approach toA smart clustering based approach to
A smart clustering based approach to
IJCNCJournal
 
A New Architecture for Group Replication in Data Grid
A New Architecture for Group Replication in Data GridA New Architecture for Group Replication in Data Grid
A New Architecture for Group Replication in Data Grid
Editor IJCATR
 
A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...
A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...
A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...
Tal Lavian Ph.D.
 
Java Abs Peer To Peer Design & Implementation Of A Tuple Space
Java Abs   Peer To Peer Design & Implementation Of A Tuple SpaceJava Abs   Peer To Peer Design & Implementation Of A Tuple Space
Java Abs Peer To Peer Design & Implementation Of A Tuple Space
ncct
 
Analytical Study of Cluster Based Routing Protocols in MANET
Analytical Study of Cluster Based Routing Protocols in MANETAnalytical Study of Cluster Based Routing Protocols in MANET
Analytical Study of Cluster Based Routing Protocols in MANET
International Journal of Computer and Communication System Engineering
 
IEEE Networking 2016 Title and Abstract
IEEE Networking 2016 Title and AbstractIEEE Networking 2016 Title and Abstract
IEEE Networking 2016 Title and Abstract
tsysglobalsolutions
 
Information Extraction from Wireless Sensor Networks: System and Approaches
Information Extraction from Wireless Sensor Networks: System and ApproachesInformation Extraction from Wireless Sensor Networks: System and Approaches
Information Extraction from Wireless Sensor Networks: System and Approaches
M H
 
SDC: A Distributed Clustering Protocol
SDC: A Distributed Clustering ProtocolSDC: A Distributed Clustering Protocol
SDC: A Distributed Clustering Protocol
CSCJournals
 
TRENDS TOWARD REAL-TIME NETWORK DATA STEGANOGRAPHY
TRENDS TOWARD REAL-TIME NETWORK DATA STEGANOGRAPHY TRENDS TOWARD REAL-TIME NETWORK DATA STEGANOGRAPHY
TRENDS TOWARD REAL-TIME NETWORK DATA STEGANOGRAPHY
IJNSA Journal
 
Proposal of a Transparent Relay System with vNIC for Encrypted Overlay Networks
Proposal of a Transparent Relay System with vNIC for Encrypted Overlay NetworksProposal of a Transparent Relay System with vNIC for Encrypted Overlay Networks
Proposal of a Transparent Relay System with vNIC for Encrypted Overlay Networks
IJCSIS Research Publications
 
AN ENTROPIC OPTIMIZATION TECHNIQUE IN HETEROGENEOUS GRID COMPUTING USING BION...
AN ENTROPIC OPTIMIZATION TECHNIQUE IN HETEROGENEOUS GRID COMPUTING USING BION...AN ENTROPIC OPTIMIZATION TECHNIQUE IN HETEROGENEOUS GRID COMPUTING USING BION...
AN ENTROPIC OPTIMIZATION TECHNIQUE IN HETEROGENEOUS GRID COMPUTING USING BION...
ijcsit
 
05688207
0568820705688207
05688207
samadivyareddy
 
Analyse the performance of mobile peer to Peer network using ant colony optim...
Analyse the performance of mobile peer to Peer network using ant colony optim...Analyse the performance of mobile peer to Peer network using ant colony optim...
Analyse the performance of mobile peer to Peer network using ant colony optim...
IJCI JOURNAL
 
Reconfigurable High Performance Secured NoC Design Using Hierarchical Agent-b...
Reconfigurable High Performance Secured NoC Design Using Hierarchical Agent-b...Reconfigurable High Performance Secured NoC Design Using Hierarchical Agent-b...
Reconfigurable High Performance Secured NoC Design Using Hierarchical Agent-b...
IJECEIAES
 
IEEE 2014 DOTNET NETWORKING PROJECTS Leveraging social networks for p2 p cont...
IEEE 2014 DOTNET NETWORKING PROJECTS Leveraging social networks for p2 p cont...IEEE 2014 DOTNET NETWORKING PROJECTS Leveraging social networks for p2 p cont...
IEEE 2014 DOTNET NETWORKING PROJECTS Leveraging social networks for p2 p cont...
IEEEMEMTECHSTUDENTPROJECTS
 
Survey on Synchronizing File Operations Along with Storage Scalable Mechanism
Survey on Synchronizing File Operations Along with Storage Scalable MechanismSurvey on Synchronizing File Operations Along with Storage Scalable Mechanism
Survey on Synchronizing File Operations Along with Storage Scalable Mechanism
IRJET Journal
 
Review on Clustering and Data Aggregation in Wireless Sensor Network
Review on Clustering and Data Aggregation in Wireless Sensor NetworkReview on Clustering and Data Aggregation in Wireless Sensor Network
Review on Clustering and Data Aggregation in Wireless Sensor Network
Editor IJCATR
 
Wireless personal communication
Wireless personal communicationWireless personal communication
Wireless personal communication
Venu Madhav
 
A smart clustering based approach to
A smart clustering based approach toA smart clustering based approach to
A smart clustering based approach to
IJCNCJournal
 
A New Architecture for Group Replication in Data Grid
A New Architecture for Group Replication in Data GridA New Architecture for Group Replication in Data Grid
A New Architecture for Group Replication in Data Grid
Editor IJCATR
 
A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...
A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...
A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...
Tal Lavian Ph.D.
 
Java Abs Peer To Peer Design & Implementation Of A Tuple Space
Java Abs   Peer To Peer Design & Implementation Of A Tuple SpaceJava Abs   Peer To Peer Design & Implementation Of A Tuple Space
Java Abs Peer To Peer Design & Implementation Of A Tuple Space
ncct
 
IEEE Networking 2016 Title and Abstract
IEEE Networking 2016 Title and AbstractIEEE Networking 2016 Title and Abstract
IEEE Networking 2016 Title and Abstract
tsysglobalsolutions
 
Information Extraction from Wireless Sensor Networks: System and Approaches
Information Extraction from Wireless Sensor Networks: System and ApproachesInformation Extraction from Wireless Sensor Networks: System and Approaches
Information Extraction from Wireless Sensor Networks: System and Approaches
M H
 
SDC: A Distributed Clustering Protocol
SDC: A Distributed Clustering ProtocolSDC: A Distributed Clustering Protocol
SDC: A Distributed Clustering Protocol
CSCJournals
 
TRENDS TOWARD REAL-TIME NETWORK DATA STEGANOGRAPHY
TRENDS TOWARD REAL-TIME NETWORK DATA STEGANOGRAPHY TRENDS TOWARD REAL-TIME NETWORK DATA STEGANOGRAPHY
TRENDS TOWARD REAL-TIME NETWORK DATA STEGANOGRAPHY
IJNSA Journal
 
Proposal of a Transparent Relay System with vNIC for Encrypted Overlay Networks
Proposal of a Transparent Relay System with vNIC for Encrypted Overlay NetworksProposal of a Transparent Relay System with vNIC for Encrypted Overlay Networks
Proposal of a Transparent Relay System with vNIC for Encrypted Overlay Networks
IJCSIS Research Publications
 
AN ENTROPIC OPTIMIZATION TECHNIQUE IN HETEROGENEOUS GRID COMPUTING USING BION...
AN ENTROPIC OPTIMIZATION TECHNIQUE IN HETEROGENEOUS GRID COMPUTING USING BION...AN ENTROPIC OPTIMIZATION TECHNIQUE IN HETEROGENEOUS GRID COMPUTING USING BION...
AN ENTROPIC OPTIMIZATION TECHNIQUE IN HETEROGENEOUS GRID COMPUTING USING BION...
ijcsit
 
Analyse the performance of mobile peer to Peer network using ant colony optim...
Analyse the performance of mobile peer to Peer network using ant colony optim...Analyse the performance of mobile peer to Peer network using ant colony optim...
Analyse the performance of mobile peer to Peer network using ant colony optim...
IJCI JOURNAL
 
Reconfigurable High Performance Secured NoC Design Using Hierarchical Agent-b...
Reconfigurable High Performance Secured NoC Design Using Hierarchical Agent-b...Reconfigurable High Performance Secured NoC Design Using Hierarchical Agent-b...
Reconfigurable High Performance Secured NoC Design Using Hierarchical Agent-b...
IJECEIAES
 
IEEE 2014 DOTNET NETWORKING PROJECTS Leveraging social networks for p2 p cont...
IEEE 2014 DOTNET NETWORKING PROJECTS Leveraging social networks for p2 p cont...IEEE 2014 DOTNET NETWORKING PROJECTS Leveraging social networks for p2 p cont...
IEEE 2014 DOTNET NETWORKING PROJECTS Leveraging social networks for p2 p cont...
IEEEMEMTECHSTUDENTPROJECTS
 

Viewers also liked (20)

Chapter 5 (master page)
Chapter 5 (master page)Chapter 5 (master page)
Chapter 5 (master page)
let's go to study
 
Cuidadano
CuidadanoCuidadano
Cuidadano
Rodrigo Egaña
 
Mg54 imagine+ events
Mg54 imagine+ eventsMg54 imagine+ events
Mg54 imagine+ events
Santiago Trevisán
 
Sitepal cultura de ka info
Sitepal cultura de ka infoSitepal cultura de ka info
Sitepal cultura de ka info
Marisol Bárcenas
 
March2004-CPerlRun
March2004-CPerlRunMarch2004-CPerlRun
March2004-CPerlRun
tutorialsruby
 
Formasti
FormastiFormasti
Formasti
FOMASTI
 
FERNANDO MENDES NOLASCO
FERNANDO MENDES NOLASCOFERNANDO MENDES NOLASCO
FERNANDO MENDES NOLASCO
FERNANDO MENDES NOLASCO
 
Food Med Final Handbook ES
Food Med Final Handbook ESFood Med Final Handbook ES
Food Med Final Handbook ES
Anna Dimitrakopoulou
 
8 steps to broadcast digital signage using NoviSign
8 steps to broadcast digital signage using NoviSign8 steps to broadcast digital signage using NoviSign
8 steps to broadcast digital signage using NoviSign
NoviSign
 
Holly Rodrigue Design Samples
Holly Rodrigue Design Samples Holly Rodrigue Design Samples
Holly Rodrigue Design Samples
Holly Rodrigue
 
Brand Studie 2009
Brand Studie 2009Brand Studie 2009
Brand Studie 2009
Lukas Stuber
 
Results 2010
Results 2010Results 2010
Results 2010
matiseg
 
Find hot news in insecticides china news 1401
Find hot news in insecticides china news 1401Find hot news in insecticides china news 1401
Find hot news in insecticides china news 1401
CCM Intelligence
 
Mactac Soignies - Solutions adhésives médicales
Mactac Soignies - Solutions adhésives médicalesMactac Soignies - Solutions adhésives médicales
Mactac Soignies - Solutions adhésives médicales
Mactac Europe
 
Ray white know how to consistently outperform in the uae property market
Ray white know how to consistently outperform in the uae property marketRay white know how to consistently outperform in the uae property market
Ray white know how to consistently outperform in the uae property market
Janna Yerman
 
14 razones para_innovar
14 razones para_innovar14 razones para_innovar
14 razones para_innovar
Arturo Ancavil Quezada
 
Comunicado de Prensa Sumavisos Financiamiento
Comunicado de Prensa Sumavisos FinanciamientoComunicado de Prensa Sumavisos Financiamiento
Comunicado de Prensa Sumavisos Financiamiento
Sumavisos
 
Regelmæssige franske verber
Regelmæssige franske verberRegelmæssige franske verber
Regelmæssige franske verber
Gammel Hellerup Gymnasium
 
ESCUELA DE COMERCIO Y ADMINISTRACION.FACULTAD DE FILOSOFIA.PAULINA CHACHA.EMP...
ESCUELA DE COMERCIO Y ADMINISTRACION.FACULTAD DE FILOSOFIA.PAULINA CHACHA.EMP...ESCUELA DE COMERCIO Y ADMINISTRACION.FACULTAD DE FILOSOFIA.PAULINA CHACHA.EMP...
ESCUELA DE COMERCIO Y ADMINISTRACION.FACULTAD DE FILOSOFIA.PAULINA CHACHA.EMP...
paulinap2
 
Formasti
FormastiFormasti
Formasti
FOMASTI
 
8 steps to broadcast digital signage using NoviSign
8 steps to broadcast digital signage using NoviSign8 steps to broadcast digital signage using NoviSign
8 steps to broadcast digital signage using NoviSign
NoviSign
 
Holly Rodrigue Design Samples
Holly Rodrigue Design Samples Holly Rodrigue Design Samples
Holly Rodrigue Design Samples
Holly Rodrigue
 
Results 2010
Results 2010Results 2010
Results 2010
matiseg
 
Find hot news in insecticides china news 1401
Find hot news in insecticides china news 1401Find hot news in insecticides china news 1401
Find hot news in insecticides china news 1401
CCM Intelligence
 
Mactac Soignies - Solutions adhésives médicales
Mactac Soignies - Solutions adhésives médicalesMactac Soignies - Solutions adhésives médicales
Mactac Soignies - Solutions adhésives médicales
Mactac Europe
 
Ray white know how to consistently outperform in the uae property market
Ray white know how to consistently outperform in the uae property marketRay white know how to consistently outperform in the uae property market
Ray white know how to consistently outperform in the uae property market
Janna Yerman
 
Comunicado de Prensa Sumavisos Financiamiento
Comunicado de Prensa Sumavisos FinanciamientoComunicado de Prensa Sumavisos Financiamiento
Comunicado de Prensa Sumavisos Financiamiento
Sumavisos
 
ESCUELA DE COMERCIO Y ADMINISTRACION.FACULTAD DE FILOSOFIA.PAULINA CHACHA.EMP...
ESCUELA DE COMERCIO Y ADMINISTRACION.FACULTAD DE FILOSOFIA.PAULINA CHACHA.EMP...ESCUELA DE COMERCIO Y ADMINISTRACION.FACULTAD DE FILOSOFIA.PAULINA CHACHA.EMP...
ESCUELA DE COMERCIO Y ADMINISTRACION.FACULTAD DE FILOSOFIA.PAULINA CHACHA.EMP...
paulinap2
 
Ad

Similar to In network aggregation techniques for wireless sensor networks - a survey (20)

Communication Cost Reduction by Data Aggregation: A Survey
Communication Cost Reduction by Data Aggregation: A SurveyCommunication Cost Reduction by Data Aggregation: A Survey
Communication Cost Reduction by Data Aggregation: A Survey
IJMTST Journal
 
[IJET-V1I4P2] Authors : Doddappa Kandakur; Ashwini B P
[IJET-V1I4P2] Authors : Doddappa Kandakur; Ashwini B P[IJET-V1I4P2] Authors : Doddappa Kandakur; Ashwini B P
[IJET-V1I4P2] Authors : Doddappa Kandakur; Ashwini B P
IJET - International Journal of Engineering and Techniques
 
B0330811
B0330811B0330811
B0330811
iosrjournals
 
Mobile Agents based Energy Efficient Routing for Wireless Sensor Networks
Mobile Agents based Energy Efficient Routing for Wireless Sensor NetworksMobile Agents based Energy Efficient Routing for Wireless Sensor Networks
Mobile Agents based Energy Efficient Routing for Wireless Sensor Networks
Eswar Publications
 
Energy Efficient Data Mining in Multi-Feature Sensor Networks Using Improved...
Energy Efficient Data Mining in Multi-Feature Sensor Networks  Using Improved...Energy Efficient Data Mining in Multi-Feature Sensor Networks  Using Improved...
Energy Efficient Data Mining in Multi-Feature Sensor Networks Using Improved...
IOSR Journals
 
A New Method for Reducing Energy Consumption in Wireless Sensor Networks usin...
A New Method for Reducing Energy Consumption in Wireless Sensor Networks usin...A New Method for Reducing Energy Consumption in Wireless Sensor Networks usin...
A New Method for Reducing Energy Consumption in Wireless Sensor Networks usin...
Editor IJCATR
 
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
ijasuc
 
Dynamic selection of cluster head in in networks for energy management
Dynamic selection of cluster head in in networks for energy managementDynamic selection of cluster head in in networks for energy management
Dynamic selection of cluster head in in networks for energy management
eSAT Publishing House
 
Dynamic selection of cluster head in in networks for energy management
Dynamic selection of cluster head in in networks for energy managementDynamic selection of cluster head in in networks for energy management
Dynamic selection of cluster head in in networks for energy management
eSAT Journals
 
Energy Efficient Clustering Algorithm based on Expectation Maximization for H...
Energy Efficient Clustering Algorithm based on Expectation Maximization for H...Energy Efficient Clustering Algorithm based on Expectation Maximization for H...
Energy Efficient Clustering Algorithm based on Expectation Maximization for H...
IRJET Journal
 
Energy Efficient Techniques for Data aggregation and collection in WSN
Energy Efficient Techniques for Data aggregation and collection in WSNEnergy Efficient Techniques for Data aggregation and collection in WSN
Energy Efficient Techniques for Data aggregation and collection in WSN
IJCSEA Journal
 
Mobile Data Gathering with Load Balanced Clustering and Dual Data Uploading i...
Mobile Data Gathering with Load Balanced Clustering and Dual Data Uploading i...Mobile Data Gathering with Load Balanced Clustering and Dual Data Uploading i...
Mobile Data Gathering with Load Balanced Clustering and Dual Data Uploading i...
1crore projects
 
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKSMULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
ijcses
 
An Efficient Approach for Data Gathering and Sharing with Inter Node Communi...
 An Efficient Approach for Data Gathering and Sharing with Inter Node Communi... An Efficient Approach for Data Gathering and Sharing with Inter Node Communi...
An Efficient Approach for Data Gathering and Sharing with Inter Node Communi...
cscpconf
 
Clustering and data aggregation scheme in underwater wireless acoustic sensor...
Clustering and data aggregation scheme in underwater wireless acoustic sensor...Clustering and data aggregation scheme in underwater wireless acoustic sensor...
Clustering and data aggregation scheme in underwater wireless acoustic sensor...
TELKOMNIKA JOURNAL
 
Developing QoS by Priority Routing for Real Time Data in Internet of Things (...
Developing QoS by Priority Routing for Real Time Data in Internet of Things (...Developing QoS by Priority Routing for Real Time Data in Internet of Things (...
Developing QoS by Priority Routing for Real Time Data in Internet of Things (...
IJCNCJournal
 
Providing A Network Encryption Approach to reduce end-to-end Delay in MANET
Providing A Network Encryption Approach to reduce end-to-end Delay in MANETProviding A Network Encryption Approach to reduce end-to-end Delay in MANET
Providing A Network Encryption Approach to reduce end-to-end Delay in MANET
Editor IJCATR
 
8 ijcse-01235
8 ijcse-012358 ijcse-01235
8 ijcse-01235
Shivlal Mewada
 
Performance evaluation of data filtering approach in wireless sensor networks...
Performance evaluation of data filtering approach in wireless sensor networks...Performance evaluation of data filtering approach in wireless sensor networks...
Performance evaluation of data filtering approach in wireless sensor networks...
ijmnct
 
ENERGY SAVINGS IN APPLICATIONS FOR WIRELESS SENSOR NETWORKS TIME CRITICAL REQ...
ENERGY SAVINGS IN APPLICATIONS FOR WIRELESS SENSOR NETWORKS TIME CRITICAL REQ...ENERGY SAVINGS IN APPLICATIONS FOR WIRELESS SENSOR NETWORKS TIME CRITICAL REQ...
ENERGY SAVINGS IN APPLICATIONS FOR WIRELESS SENSOR NETWORKS TIME CRITICAL REQ...
IJCNCJournal
 
Communication Cost Reduction by Data Aggregation: A Survey
Communication Cost Reduction by Data Aggregation: A SurveyCommunication Cost Reduction by Data Aggregation: A Survey
Communication Cost Reduction by Data Aggregation: A Survey
IJMTST Journal
 
Mobile Agents based Energy Efficient Routing for Wireless Sensor Networks
Mobile Agents based Energy Efficient Routing for Wireless Sensor NetworksMobile Agents based Energy Efficient Routing for Wireless Sensor Networks
Mobile Agents based Energy Efficient Routing for Wireless Sensor Networks
Eswar Publications
 
Energy Efficient Data Mining in Multi-Feature Sensor Networks Using Improved...
Energy Efficient Data Mining in Multi-Feature Sensor Networks  Using Improved...Energy Efficient Data Mining in Multi-Feature Sensor Networks  Using Improved...
Energy Efficient Data Mining in Multi-Feature Sensor Networks Using Improved...
IOSR Journals
 
A New Method for Reducing Energy Consumption in Wireless Sensor Networks usin...
A New Method for Reducing Energy Consumption in Wireless Sensor Networks usin...A New Method for Reducing Energy Consumption in Wireless Sensor Networks usin...
A New Method for Reducing Energy Consumption in Wireless Sensor Networks usin...
Editor IJCATR
 
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N...
ijasuc
 
Dynamic selection of cluster head in in networks for energy management
Dynamic selection of cluster head in in networks for energy managementDynamic selection of cluster head in in networks for energy management
Dynamic selection of cluster head in in networks for energy management
eSAT Publishing House
 
Dynamic selection of cluster head in in networks for energy management
Dynamic selection of cluster head in in networks for energy managementDynamic selection of cluster head in in networks for energy management
Dynamic selection of cluster head in in networks for energy management
eSAT Journals
 
Energy Efficient Clustering Algorithm based on Expectation Maximization for H...
Energy Efficient Clustering Algorithm based on Expectation Maximization for H...Energy Efficient Clustering Algorithm based on Expectation Maximization for H...
Energy Efficient Clustering Algorithm based on Expectation Maximization for H...
IRJET Journal
 
Energy Efficient Techniques for Data aggregation and collection in WSN
Energy Efficient Techniques for Data aggregation and collection in WSNEnergy Efficient Techniques for Data aggregation and collection in WSN
Energy Efficient Techniques for Data aggregation and collection in WSN
IJCSEA Journal
 
Mobile Data Gathering with Load Balanced Clustering and Dual Data Uploading i...
Mobile Data Gathering with Load Balanced Clustering and Dual Data Uploading i...Mobile Data Gathering with Load Balanced Clustering and Dual Data Uploading i...
Mobile Data Gathering with Load Balanced Clustering and Dual Data Uploading i...
1crore projects
 
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKSMULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
ijcses
 
An Efficient Approach for Data Gathering and Sharing with Inter Node Communi...
 An Efficient Approach for Data Gathering and Sharing with Inter Node Communi... An Efficient Approach for Data Gathering and Sharing with Inter Node Communi...
An Efficient Approach for Data Gathering and Sharing with Inter Node Communi...
cscpconf
 
Clustering and data aggregation scheme in underwater wireless acoustic sensor...
Clustering and data aggregation scheme in underwater wireless acoustic sensor...Clustering and data aggregation scheme in underwater wireless acoustic sensor...
Clustering and data aggregation scheme in underwater wireless acoustic sensor...
TELKOMNIKA JOURNAL
 
Developing QoS by Priority Routing for Real Time Data in Internet of Things (...
Developing QoS by Priority Routing for Real Time Data in Internet of Things (...Developing QoS by Priority Routing for Real Time Data in Internet of Things (...
Developing QoS by Priority Routing for Real Time Data in Internet of Things (...
IJCNCJournal
 
Providing A Network Encryption Approach to reduce end-to-end Delay in MANET
Providing A Network Encryption Approach to reduce end-to-end Delay in MANETProviding A Network Encryption Approach to reduce end-to-end Delay in MANET
Providing A Network Encryption Approach to reduce end-to-end Delay in MANET
Editor IJCATR
 
Performance evaluation of data filtering approach in wireless sensor networks...
Performance evaluation of data filtering approach in wireless sensor networks...Performance evaluation of data filtering approach in wireless sensor networks...
Performance evaluation of data filtering approach in wireless sensor networks...
ijmnct
 
ENERGY SAVINGS IN APPLICATIONS FOR WIRELESS SENSOR NETWORKS TIME CRITICAL REQ...
ENERGY SAVINGS IN APPLICATIONS FOR WIRELESS SENSOR NETWORKS TIME CRITICAL REQ...ENERGY SAVINGS IN APPLICATIONS FOR WIRELESS SENSOR NETWORKS TIME CRITICAL REQ...
ENERGY SAVINGS IN APPLICATIONS FOR WIRELESS SENSOR NETWORKS TIME CRITICAL REQ...
IJCNCJournal
 
Ad

In network aggregation techniques for wireless sensor networks - a survey

  • 1. In-network Aggregation Techniques for Wireless Sensor Networks: A Survey Elena Fasolo† , Michele Rossi† , J¨org Widmer and Michele Zorzi† † DEI, University of Padova, via Gradenigo 6/B – 35131, Padova, Italy DoCoMo Euro-Labs, Landsberger Strasse 312 – 80687 Munich, Germany Abstract— In this paper, we provide a comprehensive review of the existing literature on techniques and protocols for in- network aggregation in wireless sensor networks. We first define suitable criteria to classify existing solutions, and then describe them by separately addressing the different layers of the protocol stack while highlighting the role of a cross-layer design approach, which is likely to be needed for optimal performance. Throughout the paper we identify and discuss open issues, and propose directions for future research in the area. I. INTRODUCTION Recent advances in technology make it feasible to mass produce small sensor nodes with sensing, computation, and communication capabilities. This has spurred a substantial amount of research on wireless sensor networks over the past few years. For ease of deployment, sensor devices should be inexpensive, small, and have a long lifetime, which makes it important to develop very efficient software and hardware solutions. For this reason, protocols for sensor networks should be carefully designed so as to make the most efficient use of the limited resources in terms of energy, computation, and storage. These restrictions are likely to remain, since in many cases it is desirable to exploit technological improvements to develop smaller and more energy efficient devices rather than making them more powerful. Typical applications envi- sioned for sensor networks (e.g., environmental monitoring, surveillance, tracking, etc.), along with the already mentioned resource-constrained character of sensor nodes, usually result in very different network requirements and communications patterns compared to other types of ad hoc network scenarios. The area of communications and protocol design for sensor networks has been widely researched in the past few years, and many solutions have been proposed and compared. In this survey paper we focus instead on another important aspect of sensor networks, namely in-network aggregation and data management. These techniques allow to trade off commu- nication for computational complexity. Given the application area, network resource constraints, and the fact that local computation often consumes significantly less energy than communication, in-network data aggregation and management are at the very heart of sensor network research. In particular, resource efficiency, timely delivery of data to the sink node, and accuracy or granularity of the results are conflicting goals and the optimal tradeoff among them largely depends on the specific application. Initially, in-network aggregation techniques involved differ- ent ways to route packets in order to combine data coming from different sources but directed towards the same desti- nation(s). In other words, these protocols were simply routing algorithms which differed from more traditional ad hoc routing protocols in the metric they used to select the routing paths. More recently, many additional studies have been published, addressing not only the routing problem but also mechanisms to represent and combine data more efficiently. In-network data aggregation is a complex problem that involves many layers of the protocol stack and different aspects of protocol design, and a characterization and classification of concepts and algorithms is still lacking in the literature. The aim of the present paper is to provide a taxonomy of in-network aggregation by defining the main concepts and covering the most important and recent work in the field. Our major contributions are, on the one hand, to define criteria to classify existing solutions and, on the other hand, to identify and propose directions for future research in this area. Compared with well-researched topics in sensor networks, such as for example MAC and routing protocol design, data aggregation does not seem to have received as much attention, and we think that it provides many interesting opportunities for relevant contributions. The goal of this paper is to help people to get an updated view of this area and to provide a motivation and a starting point for researchers and students who are interested in these issues. The paper is organized as follows. In Section II we define the in-network aggregation paradigm, by identifying the main problems involved and giving some criteria to classify existing algorithms. In Section III we discuss theoretical performance limits of in-network aggregation techniques. In Section IV we introduce some protocol issues in the presence of in-network processing, classify the most recent solutions, and discuss their advantages and disadvantages. In Section V we focus on possible techniques to combine data by means of aggregation functions, highlight how these interact with routing protocols, and discuss the benefits arising from a cross-layer design (routing and aggregation). Finally, in Section VI we summarize the in-network aggregation approaches discussed throughout the paper, and give directions and motivations for future research. II. BASICS OF IN-NETWORK AGGREGATION In typical sensor network scenarios, data is collected by sensor nodes throughout some area, and needs to be made available at some central sink node(s), where it is processed, analyzed, and used by the application. In many cases, data generated by different sensors can be jointly processed while being forwarded towards the sink, e.g., by fusing together sensor readings related to the same event or physical quantity, or by locally processing raw data before this is transmitted.
  • 2. In-network aggregation deals with this distributed processing of data within the network. Data aggregation techniques are tightly coupled with how data is gathered at the sensor nodes as well as how packets are routed through the network, and have a significant impact on energy consumption and overall network efficiency (e.g., by reducing the number of transmissions or the length of the packets to be transmitted). In-network data aggregation can be considered a relatively complex functionality, since the aggregation algorithms should be distributed in the network and therefore require coordina- tion among nodes to achieve better performance. Also, we em- phasize that data size reduction through in-network processing shall not hide statistical information about the monitored event. For instance, when multiple sensors collaborate in observing the same event, the number of nodes reporting it and the timings of the reports may reveal the event’s size and/or dynamics, respectively. We define the in-network aggregation process as follows: In-network aggregation is the global process of gathering and routing information through a multi- hop network, processing data at intermediate nodes with the objective of reducing resource consumption (in particular energy), thereby increasing network lifetime. We can distinguish between two approaches: • In-network aggregation with size reduction refers to the process of combining and compressing data coming from different sources in order to reduce the information to be sent over the network. As an example, assume that a node receives two packets from two different sources containing the locally measured temperatures. Instead of forwarding the two packets, the sensor may compute the average of the two readings and send it in a single packet. • In-network aggregation without size reduction refers to the process of merging packets coming from different sources into the same packet without data processing: assume to receive two packets carrying different physical quantities, e.g., temperature and humidity. These two values cannot be processed together but they can still be transmitted in a single packet, thereby reducing overhead. The first approach is better able to reduce the amount of data to be sent over the network but it may also reduce the accuracy with which the gathered information can be recovered at the sink. After the aggregation operation, it is usually not possible to perfectly reconstruct all of the original data.1 The second approach, instead, preserves the original information (i.e., at the sink, the original data can be perfectly reconstructed). Which solution to use depends on many factors including the type of application, the data rate, the network characteristics, and so on. Both of the above strategies may involve the treatment of data at different network layers. In-network aggregation techniques require three basic in- gredients: suitable networking protocols, effective aggregation functions and efficient ways of representing the data (see 1This actually depends on the type of aggregation function in use, i.e., lossy or lossless. Fig. 1). In the remainder of this section we briefly introduce each of these aspects. Routing Protocols [1]–[9]: The most important ingredient for in-network aggregation is a well designed routing protocol. Data aggregation requires a different forwarding paradigm compared to classic routing. Classic routing protocols typically forward data along the shortest path to the destination (with re- spect to some specified metric). If, however, we are interested in aggregating data to minimize energy expenditure, nodes should route packets based on the packet content and choose the next hop in order to promote in-network aggregation. This type of data forwarding is often referred to as data centric routing. According to the data centric paradigm, as a node searches for the relay nodes, it needs to use metrics which take into account the positions of the most suitable aggregation points, the data type, the priority of the information, etc. Altogether, the application scenario, routing scheme, and data aggregation mechanism are closely interrelated. Moreover, in-network aggregation techniques may require some form of synchronization among nodes. In particular, the best strategy at a given node is not always to send data as soon as it is available. Waiting for information from neighboring nodes may lead to better data aggregation opportunities and, in turn, improved performance. Timing strategies are required especially in the case of monitoring applications where sensor nodes need to periodically report their readings to the sink. These strategies usually involve data gathering trees rooted at the sink. The main timing strategies proposed so far in the literature are summarized below [10]: • Periodic simple aggregation requires each node to wait for a pre-defined period of time, to aggregate all data items received, and then to send out a packet with the result of the aggregation. • Periodic per-hop aggregation is quite similar to the previous approach, the only difference being that the aggregated data is transmitted as soon as the node hears from all of its children. This requires that each node knows the number of its children. In addition, a timeout is used in case some children’s packets are lost. • Periodic per-hop adjusted aggregation adjusts the timeout of a node, after which it sends the aggregated data, depending on the node’s position in the gathering tree. Note that the choice of the timing strategy strongly affects the design of the routing protocol [10]–[12]. Aggregation Functions [8], [13]–[20]: One of the most important functionalities that in-network aggregation tech- niques should provide is the ability to combine data coming from different nodes. There are several types of aggregation functions and most of them are closely related to the specific sensor application. Nevertheless, we can identify some com- mon paradigms for their classification: • Lossy and lossless: Aggregation functions can compress and merge data according to either a lossy or a lossless approach. In the first case the original values can not be recovered after having merged them by means of the aggregation function. In addition, we may lose in precision with respect to transmitting all readings un-
  • 3. compressed. In contrast, the second approach (lossless) allows to compress the data by preserving the original information. This means that all readings can be perfectly reconstructed from their aggregate at the receiver side. • Duplicate sensitive and duplicate insensitive: An inter- mediate node may receive multiple copies of the same information. In this case, it may happen that the same data is considered multiple times when the information is aggregated. If the aggregation function in use is duplicate sensitive, the final result depends on the number of times the same value has been considered. Otherwise, the aggregation function is said to be duplicate insensitive. For instance, a function that takes the average is duplicate sensitive, whereas a function that takes the minimum value is duplicate insensitive. Good aggregation functions for wireless sensor networks need to meet additional requirements. In particular, they should take into account the very limited processing and energy capabili- ties of sensor devices, and should therefore be implementable by means of elementary operations. Also, different devices may be suitable for different types of operations, depending on their energy resources and computation capabilities. These facts need to be considered in the design of aggregation functions and routing protocols. Data representation [21]–[24]: Due to its limited storage capabilities, a node may not be able to store all the re- ceived/generated information in its internal buffer. It there- fore needs to decide whether to store, discard, compress, or transmit the data. All these operations require a suitable way to represent the information. The corresponding data structure may vary according to the application requirements. Finally, even though the data structure is usually common to all nodes, it should be adaptable to node-specific or location specific characteristics. A recent and promising method to deal with data representation and compression is distributed source coding techniques, that compress data on the basis of some knowledge about its correlation. More details on the approach are given in section V-F. Although we described routing, aggregation and data rep- resentation in isolation, they are intimately related and should be designed and implemented jointly for optimal performance. Most of the related work in the literature covers only partial aspects of the joint optimization of these functionalities, and often neglects or oversimplifies some of the others. Further work on cross-layer optimization for in-network aggregation should therefore be appreciated as innovative and is very much needed. In the sequel, we thoroughly review each of the aforementioned functionalities. We start with a review of recent work on the theoretical limits of aggregation techniques in the next section. III. THEORETICAL LIMITS OF IN-NETWORK AGGREGATION TECHNIQUES Several theoretical studies provide limits and bounds on the performance of in-network data aggregation techniques and thus assist in the design of suitable algorithms. The efficiency of these algorithms depends on the correlation among the data generated by different information sources (sensor units). Such a correlation can be spatial, when the values generated by close-by sensors are related, temporal, when the sensor readings change slowly over time, or semantic, when the contents of different data packets can be classified under the same semantic group (e.g., the data is generated by sensors placed in the same room). The gains of in-network data aggregation can be best demonstrated in the extreme case when data generated by different sources can be combined into a single packet (e.g., when the sources generate identical data). If there are K sources all close to each other and far away from the sink, the combination of their data into a single packet leads, on average, to a K-fold reduction in transmissions with respect to the case where all data are sent separately. Generally, the optimal joint routing and compression structure is a Steiner tree, which is known to be NP hard [25]. However, there exist polynomial solutions for special cases where the information sources are close to each other [26]. The authors in [27] propose a model to describe the spatial correlation in terms of joint entropy. They analyze a symmetric line network with different degrees of correlation among neighboring nodes. For the uncorrelated case, the authors show that the best routing strategy is to forward packets along shortest paths. In contrast, in case of completely correlated information, the best strategy is to aggregate data as soon as possible. After that, a single packet (formed by the aggregated data) is sent to the sink along the shortest path. In all the intermediate cases, clustering-based solutions may be the optimal choice, although no formal proof is given in the paper. In [28] the authors study the impact of data correlation on the energy expenditure of data distribution protocols. They focus on various energy aware data aggregation trees under different network conditions, such as node density, source density, source distribution, and data aggregation degree. The findings of the study are in agreement with the results in [27] but in addition provide more quantitative results. In particular, the authors focus on tree structures and compare the Minimum Steiner Tree (MST) with the Shortest Path Tree (SPT). The MST is found to be the optimal aggregation tree structure. Although the SPT guarantees low delays and can be built in an online fashion, its performance in terms of aggregation effectiveness is largely inferior to that of the MST. In addition, in [28] opportunistic aggregation is compared to systematic aggregation in terms of cost ratio which is the cost of the correlation unaware (SPT) tree over that of the correlation aware (MST) tree considering the same set of sources and sinks. The authors prove, using an analytical model, that the expected cost improvement of MST over SPT in sensor networks increases as O( √ log N), where N is the number of nodes in the network. This result makes SPT a viable solution for many practical cases (small networks). Based on this study, the authors propose a hybrid tree structure called SCT (Semantic/Spatial Correlation Tree) [29]. SCT is based on the identification of an aggregation backbone which is used to generate efficient aggregation trees, regardless of sources distribution and density. The aim is to efficiently build and maintain a network structure for data aggregation. To this end, the authors of [29] propose a ring-sector subdivision of
  • 4. the network. A subset of nodes is elected as aggregation nodes and they are organized in a spanning tree to form the data aggregation backbone. Each node belonging to the backbone aggregates messages coming from a certain sub-area. A further tree-based aggregation algorithm that exploits data correlation is presented in [30]. It is based on shallow light trees (SLT) that unify the properties of MST and SPT. In an SLT, the total cost of the tree is only a constant factor larger than that of the MST, while the distances (delays) between any node and the sink are only a constant factor larger than the shortest paths. In [31], the authors analyze aggregation properties of a tree structure that is based on an SPT of nodes close to the sink node, while nodes that are further away are connected to the leaves of the SPT via paths found by an approximation algorithm for the traveling salesman problem. Simulations show that these trees outperform SLTs in many scenarios. IV. NETWORKING PROTOCOLS AND HIERARCHIES FOR IN-NETWORK AGGREGATION Most of the work done so far on in-network aggregation deals with the problem of forwarding packets in order to fa- cilitate the in-network aggregation of the information therein. Initially, the main ideas were to enhance existing routing algorithms in such a way to make data aggregation possible. To this respect, many studies proposed solutions exploiting tree- based (or hierarchical) structures. These consist of routing algorithms based on a tree rooted at the sink. Trees are usually SPTs but some approaches exist which consider more complex tree constructions. The tree based approaches are referred to in this paper as classical approaches. Sometimes the tree structure can be optimized to the type of data to be gathered. Also, the nodes can be locally grouped into clusters for improved efficiency. Recently, a few notable exceptions looked at the problem from a different angle. These papers address the weaknesses of the tree-based approach by focusing on multi-path routing. Finally, some very recent schemes implement a mixture of tree-based and multi-path solutions. These are referred to here as hybrid approaches to emphasize the adaptive nature of their routing algorithms. In the following, we focus on each class of routing protocols separately (tree-based, cluster-based, multi-path and hybrid) by reviewing the main concepts and briefly commenting the pros and cons of each scheme. As seen from the number of schemes discussed in each subsection, many solutions are proposed in the tree-based and cluster-based categories. On the other hand, very few studies use the multi-path and hybrid approaches. This leaves room for further work in this area. A. Tree-based Approaches Classic routing strategies [32], [33] are usually based on a hierarchical organization of the nodes in the network. In fact, the simplest way to aggregate data flowing from the sources to the sink is to elect some special nodes which work as aggregation points and define a preferred direction to be followed when forwarding data. In addition, a node may be marked as special depending on many factors such as its position within the data gathering tree [34], its resources [35], the type of data stored in its queue [36], [37], or the processing cost due to aggregation procedures [38]. According to the tree-based approach [1], [3], [6] a spanning tree rooted at the sink is constructed first. Subsequently, such a structure is exploited in answering queries generated by the sink. This is done by performing in- network aggregation along the aggregation tree by proceeding level by level from its leaves to its root. Thus, as two or more messages get to a given node, their aggregate can be computed exactly. However, this way of operating has some drawbacks as actual wireless sensor networks are not free from failures. More precisely, when a packet is lost at a given level of the tree, e.g., due to channel impairments, the data coming from the related subtree are lost as well. In fact, a single message at a given level of the tree may aggregate the data coming from the whole related subtree. In spite of the potentially high cost of maintaining a hierarchical structure in dynamic networks and the scarce robustness of the system in case of link/device failures, these approaches are particularly suitable to design optimal aggregation functions and perform efficient energy management. In fact, there are some studies where the sink organizes routing paths to evenly and optimally distribute the energy consumption while favoring the aggregation of data at the intermediate nodes [36], [39], [40]. In [39] the authors compute aggregation topologies by taking into account the residual energy of each node through linear programming. Further algorithms can be found in [34], [35], [41], [42]. In [41] the authors investigate which nodes in the network can be exploited as aggregation points for optimal performance. In [34], [42] the focus is on the nodes that should be entrusted with the transmission of the sensed values, whereas in [35] the emphasis is put on the proper scheduling of sleeping/active periods. Often, optimal paths are calculated in a centralized manner at the sink by exploiting different assumptions on the data correlation and selecting the best aggregation points by means of cost functions [43]. Recently, also tree-based schemes for real time or time-constrained applications have been proposed [44]–[46]. Finally, a last approach based on aggregation trees relies on the construction of connected dominating sets [47]. These sets consist of a small subset of nodes which form a connected backbone and whose positions are such that they can collect data from any point in the network. Nodes that do not belong to these sets are allowed to sleep when they do not have data to send. Some rotation of the nodes in the dominating set is recommended for energy balancing. In the following paragraphs, we review the main routing approaches based on aggregation trees. TAG [5] - The Tiny AGgregation (TAG) approach is a data centric protocol. It is based on aggregation trees and is specifically designed for monitoring applications. This means that all nodes should produce relevant information periodically. Therefore, it is possible to classify TAG as a periodic per- hop adjusted aggregation approach. The implementation of the core TAG algorithm consists of two main phases: 1) the distribution phase, where queries are disseminated to the sensors, and 2) the collection phase, where the aggregated
  • 5. sensor readings are routed up the aggregation tree. For the distribution phase, TAG uses a tree based routing scheme rooted at the sink node. The sink broadcasts a message asking nodes to organize into a routing tree and then sends its queries. In each message there is a field specifying the level, or distance from the root, of the sending node (the level of the root is equal to zero). Whenever a node receives a message and it does not yet belong to any level, it sets its own level to be the level of the message plus one. It also elects the node from which it receives the message as its parent. The parent is the node that is used to route messages towards the sink. Each sensor then rebroadcasts the received message adding its own identifier (ID) and level. This process continues until all nodes have been assigned an ID and a parent. The routing messages are periodically broadcast by the sink in order to keep the tree structure updated. After the construction of the tree, the queries are sent along the structure to all nodes in the network. TAG adopts the selection and aggregation facilities of the database query languages (SQL). Accordingly, TAG queries have the following form: SELECT{agg(expr), attrs} from SENSOR WHERE{selPreds} GROUP BY{attrs} HAVING{havingPreds} EPOCH DURATION i In practice, the sink sends a query, where it specifies the quantities that it wants to collect (attrs field), how these must be aggregated (agg(expr)) and the sensors that should be involved in the data retrieval. This last request is specified through the WHERE, GROUP and HAVING clauses [5]. Fi- nally, an EPOCH duration field specifies the time (in seconds) each device should wait before sending new sensor readings. This means the readings used to compute an aggregate record all belong to the same time interval, or epoch. During the data collection phase, due to the tree structure, each parent has to wait for data from all of its children before it can send its aggregate up the tree. Epochs are divided into shorter intervals called communication slots. The number of these slots equals the maximum depth of the routing tree. The slot mechanism gives a nice benefit. As the time is slotted, sensor nodes can be put to sleep until the next scheduled transmission interval. In practice, a node goes back to sleep soon after it has finished sending its readings to its parent. Data aggregation is performed by all intermediate nodes. However, in order not to limit TAG to the few and very simple aggregation functions defined by the SQL language (such as COUNT, MIN, MAX, SUM, and AVERAGE) a more general classification is accounted for by partitioning aggregates according to the Duplicate Sensitivity, Exemplary and Summary and Monotonic properties [5]. As for most tree-based schemes, TAG may be inefficient in case of dynamic topologies or link/device failures: as discussed above, trees are particularly sensitive to failures at intermediate nodes as the related subtree may become disconnected. In addition, as the topology changes, TAG has to re-organize the tree structure and this means high costs in terms of energy consumption and overhead. Directed Diffusion [1] - Directed Diffusion is a reactive data centric protocol. The routing scheme is specifically tailored for those applications where one or few sinks ask some specific information by flooding the network with their queries. Directed Diffusion is organized in three phases (see Fig. 2, originally shown in [1]): 1) interest dissemination, 2) gradient setup and 3) data forwarding along the reinforced paths (path reinforcement and forwarding). When a certain sink is interested in collecting data from the nodes in the network, it propagates an interest message (interest dissemination), describing the type of data the node is interested in, and setting a suitable operational mode for its collection. Each node, on receiving the interest, re-broadcasts it to its neighbors. In addition, the node sets up interest gradients, i.e., vectors containing the next hop that has to be used to propagate the result of the query back to the sink node (gradient setup). As an illustrative example (see Fig. 2), if the Sink sends an interest which reaches nodes a and b, and both forward the interest to node c, then node c sets up two vectors indicating that the data matching that interest should be sent back to a and/or b. The strength of such a gradient can be adapted, which may result in a different amount of information being redirected to each neighbor. To this end, various metrics such as the node’s energy level, its communication capability and its position within the network can be used. Each gradient is related to the attribute it has been set up for. As the gradient setup phase for a certain interest is complete, only a single path for each source is reinforced and used to route packets towards the sink (path reinforcement and forwarding). Data aggregation is performed when data is forwarded to the sink by means of proper methods, which can be selected according to application requirements. The data gathering tree (i.e., reinforced paths) must be periodically refreshed by the sink and this can be expensive in case of dynamic topologies. A tradeoff, depending on the network dynamics, is involved between the frequency of the gradient setup (i.e., energy expenditure) and the achieved performance. A valuable feature of Directed Diffusion consists of the local interaction among nodes in setting up gradients and reinforcing paths. This allows for increased efficiency as there is no need to spread the complete network topology to all nodes in the network. We observe that attention is to be paid to MAC Layer design. Consider as an example the IEEE802.11 wireless technology. As said above, queries are propagated by means of broadcasts (basic access in IEEE802.11). However, data is sent back to the sink via unicast transmissions. This means that when either the node density increases or the duplicate suppression rule is not used, due to MAC collisions and subsequent backoffs, the delay may become excessively large. Hence, the local traffic should be kept at an acceptably low level in order to avoid collisions. Several approaches [36], [48], [49] have been proposed to reduce the control traffic generated by the local interactions among nodes with Directed Diffusion. In these solutions, the authors use properly defined aggregation trees with the main purpose of reducing both traffic and delay. In [48] a modified version of Directed Diffusion, Enhanced Directed Diffusion (EDD), is proposed. This protocol jointly exploits Directed Diffusion to collect data and a cluster-based architecture to increase the efficiency of the local interactions
  • 6. (decreasing local traffic and related collisions). A similar approach is investigated in [50]. PEGASIS [3] - The key idea in Power-Efficient GAthering in Sensor Information Systems (PEGASIS) is to organize the sensor nodes in a chain. Moreover, nodes take turns to act as the chain leader, where at every instant the chain leader is the only node allowed to transmit data directly to the sink. In this way, it is possible to evenly distribute the energy expenditure among the nodes in the network. The chain can be built either in a centralized (by the sink) or distributed manner (by using a greedy algorithm at each node). In both cases, however, the construction of the chain requires global knowledge of the network at all nodes. The chain building process starts with the node furthest from the sink. Then the closest neighbor to this node is chosen as the next one in the chain and so on. Nodes take turns to act as leader according to the following rule: node i is elected as the leader in round i. If there are N nodes in the network, rounds cyclically take values in {1, 2, . . . , N} according to a TDMA schedule. As a consequence, the leader is not always the same but, during each transmission round, it is at a different position in the chain. Note that in this scheme a direct communication channel from each sensor to the sink is required. In PEGASIS, each node receives data from a neighbor and aggregates it with its own reading by generating a single packet of the same length. Subsequently, such an aggregate is transmitted to the next node in the chain until the packet reaches the current chain leader. At this point the leader includes its own data into the packet and sends it to the sink. A possible drawback of the scheme comes from the distance among neighbors. In fact, when the neighbors along the chain are too distant the energy expenditure can be very high. In addition, transmission energies are not evenly distributed but depend on the actual distances between the nodes and their neighbors, i.e., nodes with distant neighbors dissipate more energy. PEGASIS can therefore be enhanced by not allowing such nodes to become leaders, for example using a threshold-based leader election policy. The main disadvantages of PEGASIS are the necessity of having a complete view of the network topology at each node for a proper chain construction and that all nodes must be able to transmit directly to the sink. This makes the scheme unsuitable for those networks with a time varying topology. In addition, also link failures and packet losses may affect the performance of this protocol. In fact, the failure of any intermediate node compromises the delivery of all data aggregated and sent by the previous nodes in the chain. Hence, some improvements to the scheme may be needed in order to increase its robustness. DB-MAC [7] - A different approach to route packets by performing data aggregation is presented in [7], where the routing and the MAC protocols are jointly designed. The primary objective of the Delay Bounded Medium Access Control (DB-MAC) [7] scheme is to minimize the latency for delay bounded applications while taking advantage of data aggregation mechanisms for increased energy efficiency. DB-MAC adopts a CSMA/CA contention scheme based on an RTS/CTS/DATA/ACK handshake. The protocol is most suitable for those cases where different sources sense an event almost at the same time and, due to the delay constraints, have to send their measurements right away to the sink. In such cases, the generated data flows can be dynamically aggregated while routing them towards the sink. This gives rise to an ag- gregation tree, which is built on the fly and without having any knowledge about the network topology. The MAC protocol is very similar to the IEEE802.11 RTS/CTS Access [51] with some minor modifications: RTS/CTS messages are exploited to perform data aggregation and backoff intervals are com- puted by taking into account the priorities assigned to different transmissions. In particular, each node takes advantage of the transmissions from other nodes by overhearing CTSs in order to facilitate data aggregation. This leads to choosing the relay node among those nodes that already have some packets to transmit in their queue. This is implemented to promote data aggregation with the information stored along the path. As an example, refer to the scenario in Fig. 3. We have two nodes S1 and S2 which want to transmit their packets to the sink using one of their neighbors (R1 and R2 in the figure) as the relay. At the beginning of the contention, a node transmits a newly generated packet by setting its priority to the maximum value. The packet priority is subsequently decreased at each traversed node. Because Pr(S1) > Pr(S2), S1 wins the contention for the medium and sends its packet to R1 which decreases its priority by one unit. After this, Pr(S2) becomes equal to the priority of the packet just transmitted, which is now stored at node R1. If S2 is placed in the coverage area of both S1 and R1, it can overhear all messages exchanged between these two nodes (remember that the packet at S2 still has to be forwarded). If this is the case, S2 may now want to send its packet to R1 instead of R2 as it knows that R1 already has one packet in its queue (the packet previously transmitted by S1). This facilitates in-network aggregation. DB-MAC gives an example of how routing and data aggregation may influence each other, and shows that, in most cases, energy efficient solutions are achieved only through a cross-layer design. The advantage of this strategy is the flexible and distributed procedure for the construction of aggregation trees, which appears to be suitable for wireless networks with a dynamic topology. Further Algorithms - Regarding the tree-based approaches, many additional solutions have been proposed to solve the problem of efficiently constructing aggregation trees. The authors in [36] define an efficient, distributed and energy aware heuristics (EADAT) to build the aggregation tree. A nice feature of such an approach is that the tree construction process only relies on a local knowledge of the network topology. Hence, the costs incurred in updating the tree in response to node mobility, device failures and duty cycles may be limited. In addition, to further increase the energy savings, the scheme in [36] uses an aggregation tree rooted at the sink where all non-leaf sensors perform data aggregation while leaf nodes can turn off their radios in order to save energy. In [52], the problem of constructing the optimal aggregation tree is treated from a game theoretic perspective. The authors develop a framework including payoff functions that take into account
  • 7. the path reliability, the path length and the energy constraints of the nodes. They finally propose and evaluate a couple of heuristics to implement opportunistic in-network aggregation strategies. [53] presents a solution for the mobile sink case. The authors define a protocol to maintain the aggregation tree in the presence of mobile sinks. In their solution, they rely on trusted nodes which work as gateways between the network and the sinks. A further contribution can be found in [54], where the authors combine a tree based scheme with data compression based on polynomial regression. An additional problem related to the aggregation tree is addressed in [37], where the authors present a set of algo- rithms to minimize the overall energy consumption of the sensor nodes in the presence of latency constraints under the assumption of perfect knowledge of the aggregation tree. The problem is solved by devising appropriate scheduling strategies for each node. This contribution is particularly important for applications requiring a prompt delivery of the information to the sink. A major drawback, however, is that the problem of constructing the aggregation tree is not addressed. Finally, in [55] the Secure Data Aggregation Protocol (SDAP) scheme is presented. This algorithm addresses the problem of delivering data over aggregation trees in a secure manner. Further data aggregation schemes for secure communications are collected in [56]. B. Cluster-based Approaches Similarly to tree-based algorithms, cluster-based schemes [2], [4], [48], [57] also consist of a hierarchical organization of the network. However, here nodes are subdivided into clusters. Moreover, special nodes, referred to as cluster-heads, are elected in order to aggregate data locally and transmit the result of such an aggregation to the sink. The advantages and disadvantages of cluster-based schemes are very similar to those of tree-based approaches. LEACH [2] - Low-Energy Adaptive Clustering Hierarchy (LEACH) is a self-organizing and adaptive clustering protocol using randomization to evenly distribute the energy expendi- ture among the sensors. Clustered structures are exploited to perform data aggregation where cluster-heads act as aggre- gation points. The protocol works in rounds and defines two main phases: 1) a setup phase to organize the clusters and 2) a steady-state phase which deals with the actual data transfers to the sink node. In the first phase the nodes organize themselves into clus- ters. Within each cluster a node is elected as the cluster-head. At the beginning of the setup phase, each sensor elects itself to be the local cluster-head for the current round. This decision is made according to a distributed probabilistic approach. The aim is to have, on average, a percentage P of the nodes acting as cluster-heads, where P has to be optimally chosen according to the node density. In practice, sensors calculate the following threshold: T(n) =    P 1 − P(R mod (1/P)) if n ∈ G 0 otherwise, (1) where P is the desired percentage of cluster-heads, R is the round number and G is the set of nodes that have not been cluster-heads during the last 1/P rounds. A given node n picks a random number [0, 1] and decides to be a cluster- head if this number is lower than T(n). A cluster-head sends advertisements to its neighbors using a CSMA MAC. Surrounding nodes decide which cluster to join based on the signal strength of these messages. Finally, based on the number of nodes that are willing to be part of the cluster, each cluster- head creates a TDMA schedule to optimally manage the local transmissions. The actual data transmission starts in the second phase of the protocol. All source nodes (S in Fig. 4) send their data to their cluster-heads according to the established schedule. The use of a TDMA protocol in the data collection phase ensures that there are no collisions within the clusters, saving both energy and time. After cluster-heads (CH in Fig. 4) have received all the data from the active sources, they send them back to the sink using a single direct transmission (dotted lines in Fig. 4). If the sink is placed far away from a cluster-head, a high power may be necessary to successfully deliver the message. Also, a doze mode is implemented to further save energy. When doze mode is used, the nodes’ radios may be switched off until their scheduled TDMA transmission slot. Note that cluster-heads cannot switch their radio off as they have to receive packets from potentially all nodes in the cluster. LEACH is completely distributed in the sense that neither control messages from the sink nor the distribution of global information to the sensor nodes are required for correct operation. Moreover, LEACH outperforms classical clustering algorithms by accounting for adaptive clusters and rotating cluster-heads. The LEACH framework also offers the opportunity to implement any aggregation function at the cluster-heads. However, several problems may arise in highly dynamic environments. In this case continuous updates are needed in order to keep the clusters consistent with the underlying topology. This requires to send many control messages which, in turn, may substantially impact the performance. In addition, in case of mobility additional problems may arise. A node close to a cluster-head at a given instant in time may move away from the cluster-head. As a consequence, the node needs to increase its power, thereby spending much more energy to transmit to the cluster-head than expected. A recent improvement of LEACH has been presented in [58] where further energy savings are achieved by the introduction of meta-data negotiation. COUGAR [4], [59] - Cougar is most suitable for moni- toring applications, where nodes produce relevant information periodically. The protocol can be classified as a periodic per- hop aggregation approach. Cougar is basically a clustering scheme. As soon as the cluster-heads receive all data from the nodes in their clusters, they send their partial aggregates to a gateway node. Of course, being similar to LEACH, Cougar is also affected by the same problems in highly dynamic environments. Noticeably, Cougar differs from the previous clustering based algorithms in the way cluster-heads are elected. Unlike
  • 8. in LEACH, where each node picks its cluster-head based on signal strength measurements, in Cougar the cluster-head selection may be driven by additional metrics. In fact, a node could be more than one hop away from its cluster-head. For this reason, the routing algorithm adopted to exchange packets within clusters is based on the AODV (Ad hoc On demand Distance Vector) technique. As AODV does not generate duplicate data packets, Cougar is particularly suitable to perform in-network aggregation with duplicate sensitive aggregators. The core Cougar algorithm consists of the node synchronization engine which ensures that data is aggregated correctly. Each cluster-head has a waiting list containing all nodes it expects a message from. The list is updated every time the node receives a record from a node in its cluster. The cluster-head does not report its reading to the gateway until, at time tsend, it hears from all nodes in its waiting list. A prediction mechanism is also implemented at each cluster- head in order to infer the instant tsend. In addition, a child node can determine whether its cluster-head is waiting for a packet from it and can use a notification packet to refine the prediction at the cluster-head. Timeouts and backoffs are implemented to deal with wrong predictions. In [4], the authors define three different data aggregation features: Direct delivery, where data aggregation is performed at the cluster-heads only, Packet merging which consists of aggregation of packets without size reduction and Partial aggregation, where data aggregation is implemented at the intermediate nodes. Further Algorithms - Many additional studies exploiting a hierarchical organization of the nodes have been proposed in the literature. Some of them are improvements of ex- isting protocols. In [60], for instance, the authors propose enhancements to the LEACH and PEGASIS schemes. For the performance evaluation, the authors propose the new Data Aggregation Quality (DAQ) metric, which is defined as the ratio between the size of the aggregated data and its joint entropy. DAQ is an interesting performance measure as it takes into account both the effectiveness in reducing the size of the data to be transmitted and the quality of the information. A further improvement to LEACH is presented in [61], where a code is added to the data transmission to enhance the intra-cluster communication security. A similar approach is proposed in [57] where the cluster-based scheme is enhanced by a secure transmission protocol called SecureDAV. [27] presents a location-based clustering scheme where the sensors self-organize to form static clusters. The data generated within each cluster is sent to the related cluster- head along shortest paths, and in-network aggregation is performed at the intermediate nodes. The cluster-heads send the aggregated data to the sink through a multi-hop path without any further aggregation. The cluster size can be varied to tune the degree of aggregation. The authors in [62] study the impact of partially correlated data on the performance of clustering algorithms. They analyze the behavior of multi-hop routing and, by combining random geometry techniques and rate distortion theory, predict the total energy consumption and network lifetime of their cluster-based scheme. Further cluster-based algorithms for data aggregation can be found in [63], [64]. An interesting work about clustering and data aggregation is presented in [65]. Here, a cross-layer approach is adopted and some issues concerning MAC design are addressed. Another work based on a hierarchical organization of the network is proposed in [11]. Assuming that some algorithms are used to form an aggregation tree or a cluster-based aggre- gation structure, the authors propose a scheme to dynamically adapt the data aggregation period (see [10]) according to the aggregation quality required by the sink. A different approach is presented in [66] where a semi- structured approach, named ToD, is defined in order to alle- viate the problem of maintaining a hierarchical organization of nodes in case of dynamic large scale networks. This study is enriched by simulations with 2000 nodes and experimental results over 105 Mica2 devices. C. Multi-path Approaches In order to overcome the robustness problems of aggregation trees, a new approach has been recently proposed in [8], [9], [67]. Instead of having an aggregation tree where each node has to send the partial result of its aggregation to a single parent, these solutions send data over multiple paths. The main idea is that each node can send the data to its (possibly) multiple neighbors by exploiting the broadcast characteristics of the wireless medium. Hence, data may flow from the sources to the sinks along multiple paths and aggregation may be performed by each node. Observe that in contrast to the tree-based schemes discussed above, multi-path approaches allow to propagate duplicates of the same information. Clearly, such schemes trade a higher robustness (as multiple copies of the same data can be sent along multiple paths) for some extra overhead (due to sending duplicates). An aggregation structure that fits well with this methodology is called rings topology, where sensor nodes are divided into several levels according to the number of hops separating them from the data sink. Data aggregation is performed over multiple paths as packets move level by level towards the sink (see Fig. 5). Next, we review the synopsis diffusion framework which belongs to this class of protocols. Synopsis Diffusion [8] - The authors of [8] present the Syn- opsis Diffusion protocol, where data aggregation is performed through a multi-path approach. The underlying topology for data dissemination is organized in concentric rings around the sink. Synopsis Diffusion consists of two phases: 1) the distribution of the queries and 2) the data retrieval phase. The ring topology is formed when a node sends a query over the network. In particular, two different structures, listed below, can be taken into account. The first type of topology consists of a simple ring structure. During the query distribution phase, the network nodes form a set of rings around the querying node q, which is the only sensor belonging to ring R0. A node is in ring Ri if it is i hops away from the querying node. The second type of topology has some improvements that make it more robust than the previous one and able to cope with changes in the network. This topology is called Adaptive
  • 9. Rings. The distribution phase does not change but this time a node u in ring i keeps track of the number of times, nov, the transmissions from any node ni−1 in ring i − 1 included its own data during the last few epochs. That is, node u checks whether its data is aggregated with the information sent by any node in ring i−1. If nov is small, u tries to find a better ring in order to have more of its own data included in the subsequent transmissions. In fact, rings i, i + 1, i − 2 and i + 2 can also be considered for aggregating data (rings i−2 and i+2 could be overheard in case of mobility). To allow for these checks, the list of all node IDs participating in the construction of the synopsis (data aggregation result) is included in the header of each packet. This feature is also exploited at each node as a sort of implicit acknowledgment. Finally, the decision on which ring to join is made according to heuristics depending on ni−1, ni, ni+1, ni+2 and ni−2 [8]. The query aggregation period is divided into epochs and one aggregate is provided at the end of each. Specific time slots are allocated within each epoch and used to schedule the node transmissions in a TDMA fashion. Sensors can be put to sleep and woken up at their scheduled transmission slots. The aggregation starts from the outermost ring, e.g., Ri, proceeds towards the subsequent ring, e.g., Ri−1 and propagates level by level towards the sink. In the example in Fig. 5, the data generated at node A can reach the sink through seven paths: {A, B, F, I, S}, {A, B, F, H, S}, {A, B, F, G, H, S}, {A, C, D, E, I, S}, {A, C, F, H, S}, {A, C, F, I, S}, and {A, C, G, H, S}. Note that, as the main feature of Synopsis Diffusion is that data can flow over multiple paths, a node may receive duplicates of the same information. This may affect the aggregation result, especially when aggregation functions are duplicate sensitive. This problem is addressed by the authors in [8] by proposing proper aggregation functions and data structures, see Section V-D. On the upside, multi-path schemes are suitable for networks with frequent packet losses due to mobility or channel impairments, as the extra overhead (duplicates) pays off in terms of robustness: if a link fails, the data may still reach the sink through a different path. Further Algorithms - Another way to implement multi-path schemes is based on multiple spanning trees. For instance, the authors in [68] define a method to provide fault tolerance to packet losses by forming a Directed Acyclic Graph (DAG). DAG allows to have multiple parent nodes at each sensor. In addition, the method ensures correct data transmission timing, according to the actual hop count of the DAG edges. D. Hybrid Data Aggregation Approaches In order to benefit from the advantages of both tree-based and multi-path schemes, it is possible to define hybrid approaches which adaptively tune their data aggregation structure for optimal performance. To the best of our knowledge, a single work [9] has been proposed with this aim. The related protocol is presented next. Tributaries and Deltas [9] - The Tributaries and Deltas protocol tries to overcome the problems of both the tree and multi-path based structures, by combining the best features of both schemes. The result is a hybrid algorithm where both data aggregation structures may simultaneously run in different regions of the network. The idea is that under low packet loss rates, a data aggregation tree is the most suitable struc- ture due to the possibility of implementing efficient sleeping modes (see previous sections) and to the good efficiency in representing and compressing the data. On the other hand, in case of high loss rates or when transmitting partial results which are accumulated from many sensor readings, a multi- path approach may be the best option due to its increased robustness. Hence, nodes are divided into two categories: nodes using a tree-based approach to forward packets (also called T nodes) and nodes using a multi-path scheme (M nodes). This means that the network is organized in regions implementing one of the two schemes. The main difficulty is to link regions running different data aggregation structures. In doing so, the following rules have to be satisfied [9]: • Edge Correctness: An edge originating from an M node can never be incident on a T node. It means that the aggregation result in a multi-path region can only be received by an M node (see Fig. 6). • Path Correctness: M nodes form a subgraph including the sink node, which is fed by trees composed of T nodes (see Fig. 6). According to the above rules, the sink is surrounded by M nodes only. These form the so called delta region which can be expanded or shrunk by switching nodes from the tree mode (T) to the multi-path mode (M) and vice-versa, respectively. In practice, only the nodes lying along the boundary between the two regions are allowed to change their operating mode [9] (see Fig. 6). Expanding the delta region corresponds to in- creasing the number of paths towards the sink, which is good when the packet loss probability is high. On the other hand, shrinking the region is beneficial in case the network is static and the packet loss probability is small. The user can set a threshold to specify the minimum percentage of nodes that should contribute to the aggregation operation. Note that this percentage increases in case of a wider delta region. In fact, this implies that more multi-path nodes are available thus leading to a higher robustness against failures and, in turn, to more nodes which actively contribute to the aggregation result. The opposite holds when the delta region is shrunk. To see this, consider node T5 in Fig. 6. This node is switched to an M vertex (diagram on the right) and, as a consequence, can now transmit the aggregated data flow also to nodes M4 and M5. In particular, M5 can now contribute to the data aggregation by passing the data coming from node M to lower levels. In [9] the authors compare the Tributaries and Deltas algo- rithm to TAG [5] (pure tree-based) and Synopsis Diffusion [8] (pure multi-path). The simulation results in [9] only focus on the quality of the gathered information (RMS error), while disregarding the energy consumption aspect. In particular, they demonstrate that Tributaries and Deltas guarantees smaller errors with respect to TAG and that the approach nicely solves the drawbacks of pure multi-path schemes (Synopsis Diffusion). The major weakness of this approach is the pos- sibly high overhead incurred in updating the data gathering structure. The maintenance of the quite complex network
  • 10. structure may also be a problem in case of node mobility (this is also an open issue, not addressed in [9]). Finally, particular attention is to be paid to the increase in traffic and therefore to the MAC scheme in use. We stress that most of the work on data aggregation done so far does not consider this problem. We emphasize the need for true cross- layer approaches which jointly consider routing, aggregation functions and MAC aspects with particular focus on both data representation efficiency and energy consumption. V. DATA REPRESENTATIONS AND IN-NETWORK AGGREGATION FUNCTIONS As discussed in Section II, the problems of finding a proper data representation and an optimal aggregation function are strongly related and complex. The solutions proposed so far mostly adopt very simple aggregation functions such as average, median, quantile, min, max, etc. (see [3], [5]). These strongly reduce the amount of data to be transmitted over the network but also heavily affect the precision of the transmitted information (lossy aggregation functions). However, in many cases, we may be interested in a more detailed represen- tation of the data, which calls for more complex functions and data structures (taking into account the spatial, temporal and semantic correlation of the readings). In this direction, complex frameworks to provide data fusion rules have been recently proposed in order to provide cross-layer functions self-adaptable to the application dynamics [15], [69], [70]. A first improvement to a simple data aggregation function to take into account the spatial correlation is presented in [13]. In this strategy, the dependence on the distance among nodes is quantified by a decay function which may, e.g., decay exponentially with an increasing hop distance [13]. During the data aggregation, each reading is weighed by a decaying factor which decreases with the distance to its source. The framework can be extended by additionally accounting for temporal and semantic correlation. However, this remains an open and mostly unaddressed issue. In the following sections, we describe a selection of in- network aggregation functions according to our classification in Section II. We review the simplest methods first, and subsequently consider more complex approaches. At the end of the section, we discuss distributed source coding techniques which perform joint coding of correlated data from multiple sources in a distributed manner. A. TiNA [14] Temporal coherency-aware in-Network Aggregation [14] (TiNA) works on top of a routing tree (i.e., TAG or Cougar, see Section IV-A) having the data gathering point (sink) as its root. It exploits the temporal correlation in a sequence of sensor readings to reduce energy consumption by suppressing those values that do not affect the expected quality of the aggregated data. This is implemented through a TOLERANCE clause which is added to the SQL query. The tct parameter of this clause is used to specify the temporal coherency tolerance for the query. As an example, at a leaf node, each new available value, Vnew, is compared against the last reported data point, Vold. Vnew is transmitted (and aggregated) up the tree if and only if it satisfies the following requirement (data suppression rule): |Vnew − Vold| Vnew > tct, (2) TiNA uses the clause GROUP BY of the SQL query to decide how different messages shall be processed, i.e., two data points can only be aggregated if they belong to the same GROUP. The data gathering procedure executed at the internal nodes is as follows. They first gather and combine packets sent by their children. If a given node does not receive valid data from any of its children, it replaces the missing information using the last reported data from the same child (previously stored in its buffer). The node then considers its own reading. In case it can be aggregated with some other data in its buffer (they belong to the same GROUP), then the reading is aggregated with that data regardless of the tct value. Doing so, internal nodes can report their values more often than leaf nodes thus increasing the accuracy of the aggregation. On the other hand, if the internal node needs to create a new group, it does so and adds the new reading only if this data satisfies Eq. (2). The idea is that new groups are created only when the new measurements significantly differ from old data points (Eq. (2)). Moreover, in TiNA a very simple mechanism to counteract link failures is used. Children, when suppressing data, must send heartbeat messages to their parent at regular intervals. The cost of this message is low as it is just a notification packet. Thanks to these packets each parent knows whether its children are still alive. Thus it can infer whether the old readings are to be kept valid. In case of a missing notification, the appropriate child is discarded until the parent hears from that child again. These messages can also be used in case of mobile sensors as nodes change their location in the network. Finally, the periodic heartbeats allow children to reconnect to the data gathering tree in case of parent failure. B. DADMA [16] Data Aggregation and Dilution by Modulus Addressing (DADMA) [16] is a distributed data aggregation and dilution technique for sensor networks where nodes aggregate or dilute sensed values according to the rules given in an SQL statement. DADMA treats a wireless sensor network as a distributed relational database. This database has a single view which is created by joining records which are locally stored in the sensor nodes. This technique can be used over well known routing schemes such as Directed Diffusion [1] and LEACH [2], see Section IV-A. The sensor network database view (SNDV) is temporarily created and maintained at the sink node. The basic idea in DADMA is to aggregate data coming from a group of sensors or to exclude some sensors from the data gathering tree. These operations are carried out according to two simple rules. First, a user can retrieve a subset of data fields available in an SNDV and aggregate data by using the following aggregate m function: fa(x) = x div m. (3) Moreover, sensor nodes can be excluded from a query by a dilute m function as follows: fd(x) = (x/r) mod (m/r). (4)
  • 11. In the previous equations x is the grid location of a node with respect to one of the axes, r is the resolution in meters and m is the aggregation (or dilution) factor. As the sink sends a new query, it also specifies a based on field and a command that could be either aggregate or dilute. Each sensor node compares the result of its aggregation or dilution function with the based on value and decides its behavior. For instance, on receiving a dilute m command a node first uses Eq. (4) to calculate its location indices for both the horizontal and vertical axes (fd(x) and fd(y)). Subsequently, it compares these values with the x and y indices included in the based on field of the query. If they match, the sensor replies to the query. In a similar way, when an aggregate m command is received, the values measured by a sensor node are aggregated with those measured by the other nodes having the same indices. We observe that such a strategy is a practical way to take into account the spatial location of the nodes by, for instance, aggregating only those values coming from closely placed devices. The author in [16] studies the performance of DADMA by putting particular emphasis on the energy savings coming from the reduction of the number of transmissions and on the probability of event detection. Moreover, he devises a mechanism to achieve a good tradeoff among the cost, the accuracy, and the reliability in retrieving the wanted information. The same concepts are addressed in [71] where, in addition to the aggregation/dilution schemes, two location based hash functions are introduced to determine how the sensed data can be grouped or which sensors should be excluded from a query. C. Data Aggregation by means of Feedback Control [72] The authors of [72] define a strategy to tune the degree of data aggregation while maintaining specified latency bounds on data delivery and minimizing the energy consumption. They consider time-constrained reference scenarios dealing with real-time applications which impose specific time constraints to the delivery of sensor measurements. Data is grouped into different classes associated with different bounds on the delivery time. The aim is to guarantee the delivery of all data at the minimum energy cost while satisfying all time constraints. The data aggregation degree is adapted accordingly to meet these requirements. If the total communication load exceeds the system capacity, the amount of data has to be reduced (the data aggregation degree has to be increased), whereas the data aggregation degree may be relaxed in case of low traffic. In the former case, a so called lossy feedback loop mechanism assigns a data aggregation degree (d) on the basis of load and capacity estimates. This algorithm runs independently at each node. Specifically, d is defined as the ratio between the number of outgoing and incoming packets. For instance, if d = 0.66, three received packets have to be aggregated into two packets (e.g., by averaging two of them).2 In the limiting case where d = 1 no data aggregation is performed. Moreover, d is continuously adapted according to new load and capacity estimates. In addition, when the system operates in a non-overloaded regime, a further strategy called lossless 2Note that all packets have the same size in this case. feedback loop can be used to reduce the energy consumption. According to this scheme incoming messages are collected and transmitted in a single packet without data size reduction. This solution is interesting for two reasons: the control of the data aggregation is based on physical measurements of the network conditions, thus making the mechanism self-adaptable to the actual network dynamics. Second, it aims at satisfying time constraints that, in general, are rarely considered by wireless sensor network algorithms. This solution is extended in [15], where the authors define a complete data aggregation framework (AIDA), by considering general aggregation rules. D. Synopsis Diffusion Framework [8] A recent solution to the data aggregation problem has been proposed in [8]. The main contribution of the paper is to define aggregation functions and data structures which are robust to considering the same sensor readings in the data aggregation process multiple times (double-counting problem). This is crucial when data aggregation is used in conjunction with multi-path routing schemes (see Section IV-C). The approach defines order and duplicate insensitive (ODI) properties whose role is to make sure that the final result of the aggregation is independent of the routing topology. That is, the computed aggregate must be the same irrespective of the order in which the sensor readings are merged and the number of times they are considered in the aggregation process. A synopsis is defined as a summary of the partial result of the overall aggregation process received at a given node. Three functions on the synopses are possible to perform data aggregation: • Synopsis Generation: Given a sensor reading, a synopsis generation function SG(·) produces the corresponding synopsis for that data. • Synopsis Fusion: Given two synopses, a synopsis fusion function SF(·, ·) generates a new synopsis that summa- rizes both. • Synopsis Evaluation: Given a synopsis, a synopsis eval- uation function SE(·) yields up the final result. The exact implementations of the functions and the synopsis definitions are strictly related to the considered aggregation query. A simple and fast way to check whether a synopsis diffusion algorithm is ODI-correct is based on the following four properties: • Preserves duplicates: if two readings contain the same data values, the algorithm generates the same synopsis. • The synopsis function SF(·) is commutative: for any two synopses s1 and s2 we have that SF(s1, s2) = SF(s2, s1). • The synopsis function SF(·) is associative: for any triple (s1, s2, s3) we have that SF(s1, SF(s2, s3)) = SF(SF(s1, s2), s3). • The synopsis function SF(·) is same-synopsis idempo- tent: for any synopsis s we have that SF(s, s) = s. The four properties above are necessary and sufficient for ODI- correctness. More properties and examples can be found in the related paper [8], where the authors also discuss the advantages of their solution with respect to TAG [5].
  • 12. E. The Quantile Digest [21] Quantile Digest [21] (q-digest) is a data structure for representing sensor readings with an arbitrary degree of ap- proximation (trading data size for precision). The data com- pression algorithm adapts its behavior to the data distribution by automatically grouping the sensed data into variable size buckets of almost equal weight. As in [21], we assume that sensor readings are integer numbers falling within the range [1, σ]. A q-digest consists of a set of buckets of different sizes and their associated counts. More specifically, consider a complete binary tree T. In a q-digest, each element of the tree v ∈ T can be considered as a bucket with a specific range. For example, the range associated with the root of the q-digest is [1, σ] and its two children have ranges [1, σ/2] and [σ/2 + 1, σ], respectively. In addition, every bucket v ∈ T has a counter (count(v)) associated with it. The structure is recursive and ranges are halved as we proceed from the root to the leaves of the tree. A q-digest is simply a subset of the (complete) tree which only contains those elements with positive counts. For its construction, we say that an element of the original tree v ∈ T is in the q-digest if and only if it satisfies the following properties: q1) count(v) ≤ n/k , where n is the number of readings and k is the compression factor. This rule ensures that the internal (non-leaf) element v in the tree does not have a high count. q2) count(v) + count(vp) + count(vs) > n/k where vp and vs are the parent and the sibling of v, respectively. q3) Since there are no parent and sibling for the root it can violate property q2). A leaf node is instead allowed to violate property q1). In Fig. 7 we show an example illustrating how a q-digest is built. The example is the same described in [21]. n = 15 is the number of readings at any one sensor, which are compressed and summarized in the data structure. The leaf nodes, from left to right, represent the values 1, 2, . . . , 8 and the number inside the boxes represent the counts. The compression factor k is equal to 5 which means that the q-digest has n/k = 3 levels. Finally, σ = 8 is the size of the data interval, where we assume to collect integer values spanning from 1 to 8. Consider a set of n = 15 readings within this range, as shown in Fig. 7(a). The number of buckets needed to store all data is 7. In Fig. 7(a), the children of nodes a, c and d do not satisfy the digest property (q2). Hence, we compress their values into a single bucket by getting to the structure in Fig. 7(b). At this point, node e still does not satisfy property (q2). Hence, we compress the value therein by getting to Fig. 7(c). Now, node g still does not satisfy property (q2) and hence a further compression is needed. This last compression leads us to the q-digest in Fig. 7(d). Note that only 5 buckets are needed to store the final result, in spite of the 7 buckets that were originally needed to store the data without compression. As can be observed from this example, this procedure results in a larger loss of accuracy for the readings with a small count. The compression factor k is used to tune the procedure to the desired accuracy. It also affects the memory requirements for storing a q-digest [21]. For its practical implementation, the q-digest structure needs two functions: 1) to construct the q-digest and 2) to merge two or more q-digests. The first function is called compress as it takes the uncompressed q-digest, the number of readings n and the compression factor k as input and generates a compressed representation of the q-digest as output (see the above exam- ple). The second functionality is the merge function which is used for example when two sensors send their q-digests to the same parent. The parent merges these two q-digests into a single q-digest and adds its own values to the new structure. The merge function first takes the union of the two q-digests, which is obtained by adding the counts of the buckets with the same range. After this, it compresses the resulting q-digest by applying the compress function above. As soon as the q-digest structure has been built, each sensor packs it and transmits it to its parent (predecessor node) in the data gathering tree. In principle, this scheme can be used on top of any routing protocol that avoids loops and duplicates of the same packet. We observe, however, that the joint design of these data repre- sentation and compression techniques with routing algorithms is still a completely open research issue. F. Distributed Source Coding A recent paradigm to perform data aggregation exploits Distributed Source Coding (DSC). These techniques are based on the Slepian-Wolf theorem [73], which allows joint coding of correlated data from multiple sources and without explicit communication. This is possible as long as the individual source rates satisfy certain constraints about conditional en- tropies. These techniques require that the correlation struc- ture is available a priori at the independent encoders. Ref- erence [23] gives a good survey on DSC techniques and related open issues in this emerging field. The probably most important contribution to DSC was derived by Slepian and Wolf in their landmark paper [24]. A simple way to encode and transmit the data generated by two generic sources X and Y is to apply separate coding with total rate R1 + R2 = H(X) + H(Y ), where H(·) denotes the entropy of the data flow. If the two sources can communicate, then they could coordinate their coding operations and use together a total rate of H(X, Y ) ≤ R1 + R2. The authors in [24] showed that two correlated sources can be coded with a total rate equal to the joint entropy H(X, Y ) even though they are not able to communicate with each other, as long as their individual rates are at least equal to the conditional entropies H(X|Y ) and H(Y |X) respectively. Although different sources do not need to communicate with each other, they do need to have some common information about the correlation structure. Towards this end, the sink node may first collect a certain amount of data from the network, process it and subsequently send the proper correlation information to all sensors. Only after this operation, can each node start compressing its readings. The theory has been generalized and recently applied to wireless sensor networks. For instance, in [74] the authors fo- cus on LDPC codes which are well known for their capacity of approaching the Shannon limit; Slepian and Wolf proved that the theoretical limit H(X, Y ) can be reached with equality,
  • 13. but without devising practical schemes to approach it. In [75], the authors apply Slepian-Wolf coding in its simplest form by proving its effectiveness. Note that, in order for Slepian-Wolf decoding to be effective we need to have a good estimate of data correlation properties. Accordingly, the scheme in [75] uses an algorithm, running at the sink, to measure the actual data correlation. Then, a set of nodes is allowed to send compressed data, where the compression is achieved locally and decoding is performed in a centralized fashion at the data gathering node. At the sink, the uncompressed samples coming from the sensors that are not allowed to compress are used as the side information for decoding. Notably, this approach has the drawback that data is not aggregated along the path to the sink. Hence, further savings can be achieved by exploiting in network data fusion on top of the distributed per node data compression. Also, this approach might be affected by packet losses as, in such a case, the needed side information might not be fully available at the sink (decoding entity). In the paper, the authors discuss these issues but without giving detailed results. In [76], the authors present and solve the minimum cost data gathering tree problem. The network is modeled as a graph G = (V, E), where V and E are the set of vertices (nodes) and edges, respectively. Slepian-Wolf coding is used at every node. Moreover, a communication cost we is associated with each edge e ∈ E. The cost function is assumed to be separable, i.e., f(xe, we) = xewe, where xe is the amount of information to be sent over edge e and we is the edge cost (e.g., transmission power). The minimum cost data gathering tree problem consists of finding the spanning tree of G and the rate allocation for each node in V that minimize the cost function of the network (i.e., the sum of the costs of all links). The shortest path tree is optimal for any rate allocation and thus the optimization problem can be separated into a spanning tree and a rate allocation optimization subproblems. [76] gives exact algorithms to solve both of them. Overall, the results in [76] allow to code the data in a completely distributed fashion by exploiting the side information in a recursive manner. The main drawback of this scheme is that it involves the calculation of an SPT and that it requires the full (centralized) knowledge of the data correlation structure for all nodes in the network to express the rate constraints. Lossless encoders can then separately and independently encode data at each node as efficiently as if its encoder would see the data values sent by all other nodes. Notably, the scheme’s inability to tolerate failures may eliminate this advantage. In fact, if the encoded bits from one node are lost, the sink may not be able to reconstruct several sensor values. The authors of reference [77] highlight the drawbacks of previous approaches [75] [76] when the network is error prone and, as a partial solution, propose to exploit multi-path routing schemes. The advantages of their approach come at the cost of a higher energy consumption to setup/maintain multiple trees and to transmit multiple copies (extra overhead) of the same message. In summary, DSC effectively makes routing and coding decisions independent of each other. On the downside, how- ever, this solution increases the computational complexity and requires the collection of information about joint statistics, which may not always be easy in practice. VI. DISCUSSIONS AND CONCLUSIONS In this paper we have presented a detailed review of in- network aggregation techniques for wireless sensor networks. One of the main design aspects for sensor network archi- tectures is energy efficiency, to keep the network operational as long as possible. Therefore, aggregation techniques are an essential building block, as they aim at reducing the number of transmissions required for data collection which, in turn, reduces energy consumption. In this survey, we have provided a definition of in-network data aggregation and identified its key elements: data dissem- ination and query mechanisms (with particular focus on the routing and MAC layer), aggregation functions, and data struc- ture. Fig. 8 and Fig. 9 summarize the basic characteristics of the presented solutions and provide a qualitative comparison. By its very nature, in-network aggregation concerns several layers of the protocol stack, and any efficient solution is likely to require a cross-layer design. However, we note that most of the existing research focuses on networking issues such as routing, often considering only very simple approaches to aggregate data. In addition, much work still remains to be done to provide cross-layer solutions, accounting for application, data representation, routing and MAC aspects. In fact, the schemes proposed so far often focus on only a subset of these aspects, typically trying to merge routing and data aggregation techniques, but ignoring MAC, application or data representa- tion issues. Finally, another aspect still not deeply investigated concerns the memory and the computational resources allow to sustain data aggregation processing [78]. For routing, many protocols are based on clustering. A major advantage of a clustered structure is that it directly allows aggregation of data at the cluster head. Such algorithms work well in relatively static networks where the cluster structure remains unchanged for a sufficiently long time, but they may be fragile when used in more dynamic environments. Often, the cost required to maintain the hierarchical struc- ture is substantial. Similar considerations apply to tree-based schemes. Initial work addresses some of these problems [53], [79] but further research efforts are required to keep a network functional under mobility. This last aspect is in fact largely unexplored, and it is not clear how different protocols perform in its presence. Also, multi-path algorithms may be able to deal with (limited) topology changes due to their higher robustness [8]. An interesting alternative research direction is provided by reactive and localized routing protocols [7]. This study is also one of the very few that take MAC layer issues into account [7], [65]. We stress that without such a joint design, the performance gained at the routing layer may be partially lost due to MAC inefficiencies. Hybrid algorithms allow to combine the properties of different approaches. This is the case for the algorithm in [9], which provides a good tradeoff between tree-based and multi-path schemes. Hybrid algorithms allow to tune the degree of aggregation and may facilitate the adaptation of the aggregation scheme (e.g., to the packet loss probability). For these reasons, they are particularly suitable for the design of schemes that are able to adapt to application needs.
  • 14. As discussed above, only very few studies provide a deeper analysis of the aggregation functions. Previous work mostly takes spatial correlation [13], [80] and temporal correla- tion [14] of data into account, but semantic correlation is not sufficiently well studied. In this context, distributed source coding is a fairly recent and very promising research area. However, while many theoretical results are known, few of them have been turned into practical algorithms applicable to wireless sensor networks. REFERENCES [1] C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, and F. Silva, “Directed diffusion for wireless sensor networking,” IEEE/ACM Trans. Networking, vol. 11, no. 1, pp. 2–16, Feb. 2002. [2] W. B. Heinzelman, A. P. Chandrakasan, and H. Balakrishnan, “An application-specific protocol architecture for wireless microsensor net- works,” IEEE Trans. Wireless Commun., vol. 1, no. 4, pp. 660–670, Oct. 2002. [3] S. Lindsey, C. Raghavendra, and K. M. Sivalingam, “Data Gathering Algorithms in Sensor Networks using Energy Metrics,” IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 9, pp. 924–935, Sep. 2002. [4] Y. Yao and J. Gehrke, “Query processing for sensor networks,” in ACM CIDR 2003, Asilomar, CA, US, Jan. 2003. [5] S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong, “TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks,” in OSDI 2002, Boston, MA, US, Dec. 2002. [6] Y. Xu, J. Heidemann, and D. Estrin, “Geographic-Informed Energy Conservation for Ad Hoc Routing,” in ACM/SIGMOBILE MobiCom 2001, Rome, Italy, Jul. 2001. [7] G. Di Bacco, T. Melodia, and F. Cuomo, “A MAC Protocol for Delay- Bounded Applications in Wireless Sensor Networks,” in Med-Hoc-Net 2004, Bodrum, Turkey, Jun. 2004. [8] S. Nath, P. B. Gibbons, Z. R. Anderson, and S. Seshan, “Synopsis Diffusion for Robust Aggregation in Sensor Networks,” in ACM SenSys 2004, Baltimore, MD, US, Nov. 2004. [9] A. Manjhi, S. Nath, and P. B. Gibbons, “Tributaries and Deltas: Efficient and Robust Aggregation in Sensor Network Stream,” in ACM SIGMOD 2005, Baltimore, MD, US, Jun. 2005. [10] I. Solis and K. Obraczka, “The Impact of Timing in Data Aggregation for Sensor Networks,” in IEEE ICC 2004, Paris, France, Jun. 2004. [11] F. Hu, C. Xiaojun, and C. May, “Optimized Scheduling for Data Aggregation in wireless sensor networks,” in IEEE ITCC 2005, Las Vegas, NV, US, Apr. 2005. [12] X. Jianbo, Z. Siliang, and Q. Fengjiao, “A new In-network data aggregation technology of wireless sensor networks.” in IEEE SKG’06, Guilin, China, Nov. 2006. [13] E. Cohen and H. Kaplan, “Spatially-Decaying Aggregation Over a Network: Model and Algorithms,” in ACM SIGMOD 2004, Paris, France, Jun. 2004. [14] A. Sharaf, J. Beaver, A. Labrinidis, and K. Chrysanthis, “Balancing energy efficiency and quality of aggregate data in sensor networks,” The VLDB Journal, vol. 13, no. 4, pp. 384–403, Dec. 2004. [15] T. He, B. M. Blum, J. A. Stankovic, and T. Abdelzaher, “AIDA: Adaptive Apllication-Independent data aggregation in wireless sensor networks,” ACM Trans. on Embedded Computing Systems, vol. 3, no. 2, pp. 426–457, May 2004. [16] E. Cayirci, “Data Aggregation and Dilution by modulus addressing in wireless sensor networks,” IEEE Communications Letters, vol. 7, no. 8, pp. 355–357, Aug. 2003. [17] D. Petrovic, R. C. Shah, K. Ramchandran, and J. Rabaey, “Data Funneling: Routing with Aggregation and Compression for Wireless Sensor Networks,” in IEEE SNPA 2003, Anchorage, AK, US, May 2003. [18] M. Riedewald, D. P. Agrawal, and A. El Abbadi, “pCube: Update- efficient online aggregation with progressive feedback and error bounds,” in IEEE SSDBM 2000, Berlin, Germany, Jul. 2000. [19] L. Huang, B. Zhao, A. Joseph, and J. Kubiatowicz, “Probabilistic data aggregation in distributed networks,” EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2006-11, February 6 2006. [Online]. Avail- able: http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006- 11.html [20] X. Wu and Z. Tian, “Optimized Data Fusion in Bandwidth and Energy Constrained Sensor Networks.” in IEEE ICASSP 2006, Toulouse, France, May 2006. [21] N. Shrivastava, C. Buragohain, D. P. Agrawal, and S. Suri, “Medians and Beyond: New Aggregation Techniques for Sensor Networks,” in ACM SenSys 2004, Baltimore, MD, US, Nov. 2004. [22] A. Bezenchek, M. Rafanelli, and L. Tininini, “A data structure for rep- resenting aggregate data,” in IEEE SSDBM 1996, Stockholm, Sweden, Jun. 1996. [23] Z. Xiong, A. D. Liveris, and S. Cheng, “Distributed source coding for sensor networks,” IEEE Signal Processing Mag., vol. 21, no. 5, pp. 80–94, Sep. 2004. [24] D. Slepian and J. K. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans. Inform. Theory, vol. 19, no. 4, pp. 471–480, Jul. 1973. [25] R. M. Karp, Reducibility among combinatorial problems, in: Complex- ity of Computer Computations, R. Miller, J.W. Thatcher , Ed. New York, US: Plenum Press, 1972. [26] B. Krishnamachari, D. Estrin, and S. Wicker, “The Impact of Data Aggregation in Wireless Sensor Networks,” in IEEE ICDCS 2002, Vienna, Austria, Jul. 2002. [27] S. Pattem, B. Krishnamachari, and R. Govindan, “The Impact of Spatial Correlation on Routing with Compression in Wireless Sensor Networks,” in ACM/IEEE IPSN 2004, Berkeley, CA, US, Apr. 2004. [28] Y. Zhu, K. Sundaresan, and R. Sivakumar, “Practical Limits on Achiev- able Energy Improvements and Useable Delay Tolerance in Correlation Aware Data Gathering in Wireless Sensor Networks,” in IEEE SECON 2005, Santa Clara, CA, US, Sep. 2005. [29] Y. Zhu, R. Vedantham, S. J. Park, and R. Sivakumar, “A Scalable Cor- relation Aware Aggregation Strategy for Wireless Sensor Networks,” in IEEE WICON 2005, Budapest, Hungary, Jul. 2005. [30] P. von Rickenbach and R. Wattenhofer, “Gathering correlated data in sensor networks,” in ACM DIALM-POMC 2004, Philadelphia, PA, USA, Oct. 2004. [31] R. Cristescu, B. Beferull-Lozano, M. Vetterli, and R. Wattenhofer, “Network correlated data gathering with explicit communication: Np- completeness and algorithms,” in submitted to IEEE/ACM Transactions on Networking. [32] K. Akkaya and M. Younis, “A survey of routing protocols in wireless sensor networks,” Elsevier Ad Hoc Network Journal, vol. 3, no. 3, pp. 325–349, May 2005. [33] J. N. Al-Karaki and A. E. Kamal, “Routing techniques in wireless sensor networks: a survey,” IEEE Wireless Communications, vol. 11, no. 6, pp. 6–28, Dec. 2004. [34] I. Solis and K. Obraczka, “Isolines: energy-efficient mapping in sensor networks,” in IEEE ISCC 2005, Cartagena, Spain, Jun. 2005. [35] V. Erramilli, I. Matta, and A. Bestavros, “On the interaction between data aggregation and topology control in wireless sensor networks,” in IEEE SECON 2004, Santa Clara, CA, US, Oct. 2004. [36] M. Ding, X. Cheng, and G. Xue, “Aggregation Tree Construction in Sensor Networks,” in IEEE VTC 2003, Orlando, FL, US, Oct. 2003. [37] Y. Yu, B. Krishnamachari, and V. Prasanna, “Energy-Latency Tradeoffs for Data Gathering in Wireless Sensor Networks,” in IEEE Infocom 2004, Hong Kong, Mar. 2004. [38] H. Luo, J. Luo, Y. Liu, and S. Das, “Energy efficient routing with adaptive data fusion in sensor Networks.” in Third ACM/SIGMOBILE Workshop on Foundations of Mobile Computing, Cologne, Germany, Aug. 2005. [39] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An efficient Clustering- based heuristic for data gathering and aggregation in sensor networks,” in IEEE WCNC 2003, New Orleans, LA, US, Mar. 2003. [40] H. Albert, R. Kravets, and I. Gupta, “Building Trees Based On Aggregation Efficiency in Sensor Networks,” in Med-Hoc-Net 2006, Lipari, Italy, Jun. 2006. [41] J. N. Al-Karaki, R. Ul-Mustafa, and A. E. Kamal, “Data aggregation in wireless sensor networks: exact and approximate algorithms,” in IEEE HPSR 2004, Phoenix, AZ, US, Apr. 2004. [42] G. Hartl and B. Li, “infer: A Bayesian Approach towards Energy Efficient Data Collection in Dense Sensor Networks,” in IEEE ICDCS 2005, Columbia, OH, US, Jun. 2005. [43] H. O. Tan and I. Korpeoglu, “Power Efficient Data gathering and aggregation in wireless sensor networks,” ACM SIGMOD Record, vol. 32, no. 4, pp. 66–71, Dec. 2003. [44] H. Cheng, Q. Liu, and X. Jia, “Heuristic algorithms for real-time data aggregation in wireless sensor networks.” in ACM IWCCC 2066, Vancouver, British Columbia, Canada, Jul. 2006.
  • 15. [45] Y. Hu, N. Yu, and X. Jia, “Energy efficient real-time data aggregation in wireless sensor networks.” in ACM IWCCC 2006, Vancouver, British Columbia, Canada, Jul. 2006. [46] J. Choi, J. W. Lee, K. Lee, S. Choi, W. H. Kwon, and H. S. Park, “Aggregation Time Control Algorithm for Time constrained Data Delivery in Wireless Sensor Networks.” in IEEE VTC 2006-Spring, Melbourne, Australia, May 2006. [47] H. Gupta, V. Navda, S. R. Das, and V. Chowdhary, “Efficient Gathering of Correlated Data in Sensor Networks,” in ACM MobiHoc 2005, Urbana-Champaign, IL, US, May 2005. [48] B. Zhou, L. H. Ngoh, B. S. Lee, and C. P. Fu, “A Hierarchical Scheme for Data Aggregation in Sensor Network,” in IEEE ICON 2004, Singapore, Nov. 2004. [49] C. Intanagonwiwat, D. Estrin, R. Govindan, and J. Heidemann, “Impact of Network Density on Data Aggregation in Wireless Sensor Net- works,” in IEEE ICDCS 2002, Vienna, Austria, Jul. 2002. [50] M. Lee and V. W. S. Wong, “An energy-aware spanning tree algorithm for data aggregation in wireless sensor networks,” in IEEE PacRrim 2005, Victoria, BC, Canada, Aug. 2005. [51] IEEE LAN MAN Standards, Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications”, ANSI/IEEE Std., March 1999. [52] R. Kannan and S. S. Iyengar, “Game-theoretic models for reliable path- length and energy-constrained routing with data aggregation in wireless sensor networks,” IEEE J. Select. Areas Commun., vol. 22, no. 6, pp. 1141–1150, Aug. 2004. [53] H. S. Kim, T. F. Abdelzaher, and W. H. Kwon, “Minimum-Energy Asynchronous Dissemination to Mobile Sinks in Wireless sensor networks,” in ACM SenSys 2003, Los Angeles, CA, US, Nov. 2003. [54] T. Banerjee, K. Chowdhury, and D. P. Agrawal, “Tree based data aggregation in sensor networks using polynomial regression,” in IEEE FUSION 2005, Philadelphia, PA, US, Jul. 2005. [55] Y. Yang, X. Wang, S. Zhu, and G. Cao, “SDAP: A Secure HopbyHop Data Aggregation Protocol for Sensor Networks.” in ACM MobiHoc 2006, Florence, Italy, May 2006. [56] Y. Sang, H. Shen, Y. Inoguchi, Y. Tan, and N. Xiong, “Secure Data Aggregation in Wireless Sensor Networks: A Survey.” in PDCAT’06, Taipei, Taiwan, Dec 2006. [57] A. Mahimkar and T. S. Rappaport, “SecureDAV: A secure data aggregation and verification protocol for sensor networks,” in IEEE GLOBECOM 2004, Dallas, TX, US, Nov. 2004. [58] H. Chen, H. Mineno, and T. Mizuno, “A Meta-Data-Based Data Aggregation Scheme in Clustering Wireless Sensor Networks.” in IEEE/ACM MDM 2006, Nara, Japan, May 2006. [59] Y. Yao and J. Gehrke, “The cougar approach to in-network query processing in sensor networks,” ACM SIGMOD Record, vol. 31, no. 3, pp. 9–18, Sep. 2002. [60] T. Pham, E. J. Kim, and M. Moh, “On data aggregation quality and energy efficiency of wireless sensor network protocols,” in IEEE BroadNets 2004, San Jos´e, CA, US, Oct. 2004. [61] H. Cam, S. Ozdemir, P. Nair, and D. Muthuavinashiappan, “ESPDA: Energy-efficient and secure pattern-based data aggregation for wireless sensor networks,” IEEE SENSORS 2003, Oct. 2003. [62] M. Lotfinezhad and B. Liang, “Effect of partially correlated data on clustering in wireless sensor networks,” in IEEE SECON 2004, Santa Clara, CA, US, Oct. 2004. [63] O. Younis and S. Fahmy, “Distributed clustering in ad-hoc sensor networks: a hybrid, energy-efficient approach,” in IEEE Infocom 2004, Hong Kong, Mar. 2004. [64] V. Mhatre and C. Rosenberg, “Design guidelines for wireless sensor networks: Communication, clustering and aggregation,” Elsevier Ad Hoc Networks Journal, vol. 2, no. 1, pp. 45–63, Jan. 2004. [65] P. Popovski, F. H. P. Fitzek, H. Yomo, T. K. Madsen, R. Prasad, and N. J. Vej, “MAC-layer approach for cluster-based aggregation in sensor networks,” in IEEE IWWAN’04, Oulu, Finland, May 2004. [66] K.-W. Fan, S. Liu, and P. Sinha, “Scalable data aggregation for dynamic events in sensor networks.” in ACM SenSys 2006, Boulder, CO, US, Oct. 2006. [67] S. Chen and Z. Zhang, “Localized algorithm for aggregate fairness in wireless sensor networks.” in ACM/SIGMOBILE MobiCom 2006, Los Angeles, CA, US, Sep. 2006. [68] S. Motegi, K. Yoshihara, and H. Horiuchi, “DAG Based In-Network Aggregation for Sensor Network Monitoring,” in IEEE SAINT 2006, Phoenix, AZ, US, Jan. 2006. [69] U. Ramachandran, R. Kumar, M. Wolenetz, B. Cooper, B. Agarwalla, J. Shin, P. Hutto, and A. Paul, “Dynamic data fusion for future sensor networks.” ACM Transactions on Sensor Networks, vol. 2, no. 3, pp. 404–443, Aug. 2006. [70] M. Wenz and H. Worn, “Event-based Production Rules for Data Aggre- gation in Wireless Sensor Networks.” in IEEE MFI 2006, Heidelberg, Germany, Sep. 2006. [71] E. Cayirci and T. Coplu, “Distributed spatial data aggregation and dilution based on hashing and relational algebra in wireless sensor networks,” in IEEE ISSNIP 2004, Melbourne, Australia, Dec. 2004. [72] T. Abdelzaher, T. He, and J. Stankovic, “Feedback control of data aggregation in sensor networks,” in IEEE CDC 2004, Atlantis, Paradise Island, Bahamas, Dec. 2004. [73] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley, 1991. [74] M. Sartipi and F. Fekri, “Source and channel coding in wireless sensor networks using LDPC codes,” in IEEE SECON 2004, Santa Clara, CA, US, Oct. 2004. [75] J. Chou, D. Petrovic, and K. Ramchandran, “A Distributed and Adap- tive Signal Processing Approach to Reducing Energy Consumption in Sensor Networks,” in IEEE Infocom 2003, San Francisco, CA, US, Mar. 2003. [76] R. Cristescu, B. Beferull-Lozano, and M. Vetterli, “On Network Correlated Data Gathering,” in IEEE Infocom 2004, Hong Kong, Mar. 2004. [77] S. Coleri and P. Varaiya, “Fault Tolerant and Energy Efficient Routing for Sensor Networks,” in IEEE GLOBECOM 2004, Dallas, TX, US, Nov. 2004. [78] J. Verd, J. Garca, M. Nemirovsky, and M. Valero, “The impact of traffic aggregation on the memory performance of networking applications.” in ACM MEDEA 2004, Antibes Juan-les-Pins, France, Oct. 2004. [79] K.-I. Hwang, J. In, and D.-S. Eom, “Distributed Dynamic Shared Tree for Minimum Energy Data Aggregation of Multiple Mobile Sinks in Wireless Sensor Networks.” in IEEE EWSN06, Zurich, Switzerland, Feb. 2006. [80] M. Sharifzadeh and C. Shahabi, “Supporting Spatial Aggregation in sensor network database,” in ACM GIS 2004, Washington, DC, US, Nov. 2004.
  • 16. Elena Fasolo (fasoloel@dei.unipd.it) was born in Padova, Italy, in 1980. She received the Laurea Degree in Telecommunications Engineering from the University of Padova, in 2004. She is currently a PhD Student at the School in Information Engineering at the Information Engineering Department, University of Padova, Italy, and she is spending six months at the NTT DoCoMO European Laboratories to do research on network coding techniques for wireless networks. Her main interests are on networking and MAC protocol layers especially on efficient data gathering and dissemination schemes. Michele Rossi (rossi@dei.unipd.it) was born in Ferrara, Italy, in 1974. He received both the Laurea degree and the Ph.D. in Information Engineering from the University of Ferrara, Italy, in 2000 and 2004, respectively. During 2003 he was on leave at the Center for Wireless Communications at the University of California San Diego, US. Since November 2005 he is an Assistant Professor at the Department of Information Engineering of the University of Padova, Italy. His research interests include channel access, routing and data distribution algorithms for wireless networks. J¨org Widmer (widmer@docomolab-euro.com) is senior manager at DoCoMo Euro-Labs in Munich, Germany. He leads the Self-Organized Ambient Networking research group, working on several projects in the area of wireless ad-hoc and sensor networks. Before joining DoCoMo, he spent two years as postdoctoral researcher at EPFL, Switzerland, doing research on ultra-wide band networks. Joerg Widmer obtained his M.S. degree and Ph.D. degree in computer science from the University of Mannheim, Germany in 2000 and 2003, respectively. In 1999/2000, he was at the ICSI Center for Internet Research, Berkeley, CA working on equation-based congestion control. His research interests include efficient algorithms for wireless ad-hoc and sensor networks, network coding, and transport protocols. Michele Zorzi (zorzi@ing.unife.it) was born in Venice, Italy, in 1966. He received the Laurea Degree and the Ph.D. in Electrical Engineering from the University of Padova, Italy, in 1990 and 1994, respectively. During the Academic Year 1992/93, he was on leave at the University of California, San Diego (UCSD), attending graduate courses and doing research on multiple access in mobile radio networks. In 1993, he joined the faculty of the Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy. After spending three years with the Center forWireless Communications at UCSD, in 1998 he joined the School of Engineering of the University of Ferrara, Italy, where he became a Professor in 2000. Since November 2003, he has been on the faculty at the Information Engineering Department of the University of Padova. His present research interests include performance evaluation in mobile communications systems, random access in mobile 18 radio networks, ad hoc and sensor networks, and energy constrained communications protocols. Dr. Zorzi is the Editor-In-Chief of the IEEE Wireless Communications Maga- zine, and currently serves on the Editorial Boards of the IEEE Transactions on Communications, the IEEE Transactions on Wireless Communications, the IEEE Transactions on Mobile Computing, the Wiley Journal of Wireless Communications and Mobile Computing and the ACM/URSI/Kluwer Journal of Wireless Networks. He was also guest editor for special issues in the IEEE Personal Communications Magazine (Energy Management in Personal Communications Systems) and the IEEE Journal on Selected Areas in Communications (Multi- media Network Radios).
  • 17. Figure and Table captions Fig. 1 Diagram for in-network aggregation techniques and their relation with different protocol layers. We stress that in general data processing also interacts with the Application, MAC and PHY layers. Fig. 2 A simplified scheme for Directed Diffusion [1]. Fig. 3 A message exchange example in DB-MAC. Fig. 4 LEACH clustering approach. Fig. 5 Examples of aggregation paths over a ring structure. Fig. 6 Example of data gathering regions in Tributary and Delta. Fig. 7 Q-digest example [21]: the complete tree T is derived by a recursive binary splitting of the original (root) interval [1, σ]. The q-digest consists of the non-empty boxes of the data structure in sub-figure (d). Fig. 8 Summary of the basic characteristics of the routing protocols. Fig. 9 Summary of the basic characteristics of the data aggregation functions.
  • 18. Data representations Data aggregation functions Routing protocols Applications MAC and PHY layers Data representations Data aggregation functions Routing protocols Applications MAC and PHY layers Fig. 1. Diagram for in-network aggregation techniques and their relation with different protocol layers. We stress that in general data processing also interacts with the Application, MAC and PHY layers.
  • 19. c b Source Sink (a) Interest dissemination c b a Source Sink (b) Gradients set up c b a Source Sink (c) Data delivery along the reinforced path a c b Source Sink (a) Interest dissemination c b a Source Sink (b) Gradients set up c b a Source Sink (c) Data delivery along the reinforced path a Fig. 2. A simplified scheme for Directed Diffusion [1].
  • 20. S1 S2 R1 R2 Sink direction Pr = 10 Pr = 9 RTS(10) CTS(10) CTS(10) S1 S2 R1 R2 Sink direction Pr = 10 Pr = 9 RTS(10) CTS(10) CTS(10) Fig. 3. A message exchange example in DB-MAC.
  • 22. Sink G I DF CB A R1 R2 R3 R4 H E Sink G I DF CB A R1 R2 R3 R4 H E Fig. 5. Examples of aggregation paths over a ring structure.
  • 23. T1 T4 T2 T3 M2 M1 T5 M3 M4 M5 T1 T4 T2 T3 M2 M1 M M3 M4 M5 Delta region Expanded Delta region SINK SINK T1 T4 T2 T3 M2 M1 T5 M3 M4 M5 T1 T4 T2 T3 M2 M1 M M3 M4 M5 Delta region Expanded Delta region T1 T4 T2 T3 M2 M1 T5 M3 M4 M5 T1 T4 T2 T3 M2 M1 M M3 M4 M5 Delta region Expanded Delta region SINK SINK Fig. 6. Example of data gathering regions in Tributary and Delta.
  • 24. n=15, k=5, σ=8 11 64 111 a b c root 1 4 6 2 2 e f d 4 6 2 2 1 g1 4 6 2 2 (a) (d) (b) (c) Fig. 7. Q-digest example [21]: the complete tree T is derived by a recursive binary splitting of the original (root) interval [1, σ]. The q-digest consists of the non-empty boxes of the data structure in sub-figure (d).
  • 25. AsynchronousAsynchronousPeriodic per hopPeriodic per hopAsynchronousPeriodic per hopAsynchronousPeriodic per hop adjusted Timing strategy NoneNoneLocal route repairs Rotation of the cluster-head, sleeping periods NoneRotation of the leader NoneSleeping periods Energy saving methods MediumHighLowLowHighVery LowMediumLowResilience in case of node mobility MediumHighLowLowHighVery LowMediumLowScalability MediumMediumMediumMediumLowHighHighHigh Overhead to setup/maintain the aggregation structure HighHighMediumLowMediumLowMediumMediumResilience to link failures Tree/Multi-path based, driven by the sink Multi-path based, on-line, distributed Cluster-based, on-line, distributed synchronous Cluster-based, on-line, distributed Completely distributed, asynchronous Chain-based, centralized or distributed Tree-based, on- line, driven by the sink Tree-based, on-line, driven by the sinkAggregation method Tributaries and Deltas [9] Synopsis Diffusion [8] COUGAR [4] [49] LEACH [2] DB-MAC [7] PEGASIS [3] Directed Diffusion [1] TAG [5] AsynchronousAsynchronousPeriodic per hopPeriodic per hopAsynchronousPeriodic per hopAsynchronousPeriodic per hop adjusted Timing strategy NoneNoneLocal route repairs Rotation of the cluster-head, sleeping periods NoneRotation of the leader NoneSleeping periods Energy saving methods MediumHighLowLowHighVery LowMediumLowResilience in case of node mobility MediumHighLowLowHighVery LowMediumLowScalability MediumMediumMediumMediumLowHighHighHigh Overhead to setup/maintain the aggregation structure HighHighMediumLowMediumLowMediumMediumResilience to link failures Tree/Multi-path based, driven by the sink Multi-path based, on-line, distributed Cluster-based, on-line, distributed synchronous Cluster-based, on-line, distributed Completely distributed, asynchronous Chain-based, centralized or distributed Tree-based, on- line, driven by the sink Tree-based, on-line, driven by the sinkAggregation method Tributaries and Deltas [9] Synopsis Diffusion [8] COUGAR [4] [49] LEACH [2] DB-MAC [7] PEGASIS [3] Directed Diffusion [1] TAG [5] Algo. Char. Algo. Char. Fig. 8. Summary of the basic characteristics of the routing protocols.
  • 26. √ (Temporal, Spatial) √ / XX√ (Spatial) √ (Temporal) Correlation awareness LowHighLowMediumHighResilience to losses/failures √X√ / X√XDuplicate sensitive √ / X√ / X√ / X√√Lossy aggregation Q-digest [18] Synopsis Diffusion [8] AIDA [58] DADMA [15] TiNA [13] √ (Temporal, Spatial) √ / XX√ (Spatial) √ (Temporal) Correlation awareness LowHighLowMediumHighResilience to losses/failures √X√ / X√XDuplicate sensitive √ / X√ / X√ / X√√Lossy aggregation Q-digest [18] Synopsis Diffusion [8] AIDA [58] DADMA [15] TiNA [13] Algo. Char. Algo. Char. Fig. 9. Summary of the basic characteristics of the data aggregation functions.
  翻译: