
Adaptive Cleaning for RFID Data Streams


Shawn Jeffery, UC Berkeley
Minos Garofalakis, Intel Research Berkeley
Michael Franklin, UC Berkeley

Presented by: Hamid Haidarian Shahri

Where Are We? Look at the Signs!

Looking at Signs Before Jumping In


S. Chaudhuri, U. Dayal, "An Overview of
Data Warehousing and OLAP Technology,"
SIGMOD Record, 1997.
800+ citations

DW and information integration


Data cleaning term publicized
Identified its importance in integration

Extensive research followed

VLDB 2001

Session R12: DATA QUALITY & CLEANING

Declarative data cleaning: language, model, and algorithms
Helena Galhardas (INRIA Rocquencourt), Daniela Florescu
(Propel), Dennis Shasha (NYU), Eric Simon, and Cristian-Augustin Saita (INRIA Rocquencourt)
Potter's wheel: an interactive data cleaning system
Vijayshankar Raman and Joseph M. Hellerstein
(University of California at Berkeley)
Update propagation strategies for improving the quality of
data on the Web
Alexandros Labrinidis and Nick Roussopoulos (University
of Maryland)

Data Cleaning Previous Work - 2006

Hamid Haidarian Shahri, S.H. Shahri, "Eliminating Duplicates in Information Integration: An Adaptive, Extensible Framework," IEEE Intelligent Systems, Vol. 21, No. 5, 2006.

Putting Things into Context


Data cleaning required after integration
No unified standard across sources
NOW: sensor/hardware errors inevitable; research opportunity
Data modeling (Amol Deshpande)
An important use case is cleaning

VLDB 2006 (three weeks ago)

Research Session 5: Sensor Data (dedicated to cleaning!)

Title: Adaptive Cleaning for RFID Data Streams
Authors: Shawn R. Jeffery, Minos Garofalakis, Michael J. Franklin

Title: A Deferred Cleansing Method for RFID Data Analytics
Authors: Jun Rao, Sangeeta Doraiswamy, Hetal Thakkar, Latha S. Colby

Title: Online Outlier Detection in Sensor Data Using Non-Parametric Models
Authors: Sharmila Subramaniam, Themis Palpanas, Dimitris Papadopoulos, Vana Kalogeraki, Dimitrios Gunopulos

RFID: Radio Frequency IDentification

RFID data is dirty


A simple experiment:
2 RFID-enabled shelves
10 static tags
5 mobile tags

RFID Data Cleaning


RFID data has many dropped readings
Typically, use a smoothing filter to interpolate
SELECT distinct tag_id
FROM RFID_stream [RANGE 5 sec]
GROUP BY tag_id
But, how to set the size of the window?
[Figure: raw readings over time pass through a smoothing filter to produce smoothed output]
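A minimal Python sketch of what the fixed-window query above computes. It assumes the stream arrives as a list of per-epoch sets of tag IDs and that the RANGE is measured in epochs rather than seconds; the function name `smooth_fixed` is hypothetical. A tag is reported at an epoch if it was read at least once in the last `window` epochs.

```python
from collections import deque
from typing import Deque, Iterable, List, Set

def smooth_fixed(epochs: Iterable[Set[str]], window: int) -> List[Set[str]]:
    """Report every tag read at least once in the last `window` epochs."""
    recent: Deque[Set[str]] = deque(maxlen=window)  # the oldest epoch drops out automatically
    output: List[Set[str]] = []
    for readings in epochs:
        recent.append(readings)
        # The union over the window plays the role of SELECT distinct tag_id ... GROUP BY tag_id.
        output.append(set().union(*recent))
    return output

raw = [{"t1"}, set(), {"t1"}, set(), set(), set(), set(), set()]
print(smooth_fixed(raw, window=3))
# [{'t1'}, {'t1'}, {'t1'}, {'t1'}, {'t1'}, set(), set(), set()]
```

The dropped reading at epoch 1 is filled in, but the tag is also reported for two epochs after its last reading. The window size alone decides that trade-off.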

Window Size for RFID Smoothing


[Figure: reality, raw readings, small-window output, and large-window output for "Fido moving" and "Fido resting"]

Need to balance completeness vs. capturing tag movement
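A toy illustration of the trade-off, using a made-up trace (not the data behind the figure). The tag is present for 12 epochs and then leaves. A small window misses epochs where the reader dropped the tag (false negatives). A large window fills those gaps but keeps reporting the tag after it has gone (false positives). The function names are hypothetical.

```python
from typing import List, Tuple

def smooth(reads: List[bool], window: int) -> List[bool]:
    """Report the tag present at epoch t if it was read in any of epochs t-window+1..t."""
    return [any(reads[max(0, t - window + 1): t + 1]) for t in range(len(reads))]

def errors(truth: List[bool], reported: List[bool]) -> Tuple[int, int]:
    """Return (false negatives, false positives) of the reported presence."""
    fn = sum(1 for real, rep in zip(truth, reported) if real and not rep)
    fp = sum(1 for real, rep in zip(truth, reported) if rep and not real)
    return fn, fp

truth = [True] * 12 + [False] * 8  # tag present in epochs 0-11, gone afterwards
reads = [True, False, False, True, False, True, False, False, True, False, True, False] + [False] * 8

for w in (1, 5):
    print(w, errors(truth, smooth(reads, w)))
# 1 (7, 0)
# 5 (0, 3)
```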

Truly Declarative Smoothing


Problem: window size is non-declarative
Application wants a clean stream of data
The window size specifies how to get it
Solution: adapt the window size in response to the data

Itinerary
Introduction: RFID data cleaning
A statistical sampling perspective
SMURF
Per-tag cleaning
Multi-tag cleaning

Ongoing work
Conclusions

A Statistical Sampling Perspective


Key Insight:
RFID data is a random sample of the present tags
Map RFID smoothing to a sampling experiment

RFID's Gory Details


[Figure: an antenna and reader interrogate Tags 1 to 4 over successive read cycles (epochs) E0 to E9; each epoch yields a tag list with columns Epoch, TagID, ReadRate (ReadRate reported by Alien readers; example rates .9, .6, .3)]

RFID Smoothing to Sampling


RFID                 Sampling
Read cycle (epoch)   Sample trial
Reading              Single sample
Smoothing window     Repeated trials
Read rate            Probability of inclusion (p_i)

Now use sampling theory to drive adaptation!
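A small simulation of the mapping above, assuming read cycles are independent and a present tag is read in each one with a constant probability p (the simplifying model the table implies). Each epoch is one Bernoulli trial. A window of epochs is a run of repeated trials, and the fraction of trials with a reading estimates the probability of inclusion. Function names are hypothetical.

```python
import random
from typing import List

def simulate_readings(p: float, epochs: int, seed: int = 0) -> List[bool]:
    """One Bernoulli trial per read cycle: the present tag is read with probability p."""
    rng = random.Random(seed)
    return [rng.random() < p for _ in range(epochs)]

def empirical_read_rate(readings: List[bool]) -> float:
    """Fraction of trials that produced a reading; estimates the inclusion probability p."""
    return sum(readings) / len(readings)

trials = simulate_readings(p=0.6, epochs=10_000, seed=42)
print(empirical_read_rate(trials))  # close to 0.6 for a long run of trials
```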

SMURF
Statistical Smoothing for Unreliable RFID Data
Adapts window based on statistical properties
Mechanisms for per-tag and multi-tag cleaning

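Per-Tag Smoothing: Model and Background
Use a binomial sampling model
[Figure: read rate p_i of tag i (0 to 1) over time in epochs E0 to E9, with its average p_i^avg; the smoothing window spans w_i epochs, i.e., w_i Bernoulli trials; S_i marks the epochs in the window in which tag i was read]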
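A sketch of the binomial model in Python. With w_i epochs in the window and per-epoch read probability p, the number of epochs with a reading, |S_i|, follows Binomial(w_i, p). For p_i^avg, this sketch assumes it is the mean of the tag's per-epoch read rates over the epochs in S_i (epochs where the rate is 0 are treated as "not read"). That convention is an assumption. The function names are hypothetical.

```python
import math
from typing import List

def prob_readings(k: int, w: int, p: float) -> float:
    """P[|S_i| = k]: tag i is read in exactly k of w epochs (binomial model)."""
    return math.comb(w, k) * p ** k * (1 - p) ** (w - k)

def average_read_rate(per_epoch_rates: List[float]) -> float:
    """p_i^avg: mean read rate over the epochs in which tag i was read (rate > 0)."""
    observed = [r for r in per_epoch_rates if r > 0]
    return sum(observed) / len(observed) if observed else 0.0

p_avg = average_read_rate([0.9, 0.0, 0.6, 0.3, 0.0])
print(p_avg)
print([round(prob_readings(k, 5, p_avg), 3) for k in range(6)])
```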

Per-Tag Smoothing: Completeness

If the tag is there, read it with high probability
Want a large window
[Figure: read rate p_i (0 to 1) over time in epochs E0 to E9; a tag read with a low p_i produces sparse readings]
Expand the window

Per-Tag Smoothing: Completeness

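$w_i \ge \frac{1}{p_i^{avg}} \ln\frac{1}{\delta}$
$w_i$: desired window size for tag i
$1/p_i^{avg}$: expected epochs needed to read the tag
$\ln(1/\delta)$: to read it with probability $1 - \delta$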
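Why the bound works: a present tag is missed in all w epochs with probability $(1 - p_i^{avg})^w \le e^{-p_i^{avg} w}$. Choosing $w \ge \ln(1/\delta) / p_i^{avg}$ therefore keeps that probability at or below δ. A minimal sketch of the calculation (function names hypothetical):

```python
import math

def completeness_window(p_avg: float, delta: float) -> int:
    """Smallest integer w with w >= (1 / p_avg) * ln(1 / delta)."""
    if not (0 < p_avg <= 1 and 0 < delta < 1):
        raise ValueError("need 0 < p_avg <= 1 and 0 < delta < 1")
    return math.ceil(math.log(1 / delta) / p_avg)

def miss_probability(p_avg: float, w: int) -> float:
    """P[a present tag is never read in w epochs] = (1 - p_avg)^w."""
    return (1 - p_avg) ** w

for p in (0.9, 0.5, 0.1):
    w = completeness_window(p, delta=0.05)
    print(p, w, miss_probability(p, w))
```

The lower the read rate, the larger the window the completeness requirement asks for: with δ = 0.05, the window grows from 4 epochs at p = 0.9 to 30 epochs at p = 0.1.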

Per-Tag Smoothing: Transitions


Detect transitions as statistically significant changes in the data
[Figure: read rate p_i over epochs E0 to E9; a statistically significant difference in the readings shows that the tag has likely left by this point]
Flag a transition and shrink the window

Per-Tag Smoothing: Transitions


Statistically significant:
$\left|\, |S_i| - w_i p_i^{avg} \right| > 2 \sqrt{w_i p_i^{avg} (1 - p_i^{avg})}$
$|S_i|$: # observed readings; $w_i p_i^{avg}$: # expected readings
Is the difference statistically significant?
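The test above compares the observed reading count with its binomial mean and flags a transition when the gap exceeds two binomial standard deviations. A direct Python transcription of that inequality (function name hypothetical):

```python
import math

def is_transition(observed: int, w: int, p_avg: float) -> bool:
    """True if | |S_i| - w * p_avg | > 2 * sqrt(w * p_avg * (1 - p_avg))."""
    expected = w * p_avg
    std_dev = math.sqrt(w * p_avg * (1 - p_avg))
    return abs(observed - expected) > 2 * std_dev

# 10-epoch window, read rate 0.8: about 8 readings expected, 2 standard deviations is about 2.5
print(is_transition(7, 10, 0.8))  # False: within normal variation
print(is_transition(2, 10, 0.8))  # True: far fewer readings than expected, tag likely left
```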

SMURF in Action
[Figure: SMURF output for the "Fido moving" and "Fido resting" scenarios]

Experiments with real and simulated data show similar results
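A compact sketch of how the per-tag pieces might fit together in one adaptive loop. Several choices here are assumptions, not details taken from the slides:

- the window starts at one epoch and grows by two epochs at a time toward the completeness size (capped by a hypothetical `w_max`);
- the transition test runs on the most recent half of the window;
- a detected transition halves the window.

The function name `adaptive_per_tag` is hypothetical.

```python
import math
from typing import List

def adaptive_per_tag(rates: List[float], delta: float = 0.05, w_max: int = 25) -> List[bool]:
    """Adaptive per-tag smoothing sketch.

    rates[t] is the tag's read rate in epoch t (0 if it was not read).
    Returns one presence decision per epoch.
    """
    w = 1
    output: List[bool] = []
    for t in range(len(rates)):
        window = rates[max(0, t - w + 1): t + 1]
        seen = [r for r in window if r > 0]  # read rates for the epochs in S_i
        output.append(bool(seen))            # present if read at least once in the window
        if not seen:
            continue                         # no readings, so no estimate of p_avg; keep w
        p_avg = sum(seen) / len(seen)
        # Completeness: window needed to read a present tag with probability 1 - delta.
        w_star = min(w_max, math.ceil(math.log(1 / delta) / p_avg))
        # Transition test on the most recent half of the window (assumed sub-window choice).
        half = window[len(window) // 2:]
        observed = sum(1 for r in half if r > 0)
        expected = len(half) * p_avg
        std_dev = math.sqrt(len(half) * p_avg * (1 - p_avg))
        if w > 1 and abs(observed - expected) > 2 * std_dev:
            w = max(1, w // 2)               # transition detected: shrink the window
        elif w_star > w:
            w = min(w_star, w + 2)           # window too small for completeness: grow it

    return output

# Made-up trace: tag present in epochs 0-9 (rate 0.75 when read), absent in epochs 10-19.
rates = [0.75, 0, 0.75, 0.75, 0, 0.75, 0.75, 0.75, 0, 0.75] + [0] * 10
print(adaptive_per_tag(rates))  # True for epochs 0-11, False for epochs 12-19
```

On this trace, the sketch fills every dropped reading while the tag is present. At epoch 11 the transition test fires and halves the window, so the tag is reported absent from epoch 12 on. A fixed 4-epoch window would keep reporting it through epoch 12.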

Multi-tag Cleaning
Some applications only need aggregates
E.g., count of items on each shelf
Don't need to track each tag!

Use statistical mechanisms for both:
Aggregate computation
Window adaptation

Aggregate Computation

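π-estimators (Horvitz-Thompson)
Count: $\hat{N}_w = \sum_{i \in S_w} \frac{1}{1 - (1 - p_i^{avg})^w}$, where $S_w$ is the set of tags read in the window
P[tag i seen in a window of size w]: $1 - (1 - p_i^{avg})^w$
Use small windows to capture movement
Use the estimator to compensate for lost readings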
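A minimal Python sketch of the count estimator. It assumes each tag seen in the window comes with its own $p_i^{avg}$. Each seen tag is weighted by the inverse of its probability of being seen at all, so tags with low read rates count for more. Function and variable names are hypothetical.

```python
from typing import Dict

def inclusion_probability(p_avg: float, w: int) -> float:
    """P[tag seen at least once in a window of w epochs] = 1 - (1 - p_avg)^w."""
    return 1 - (1 - p_avg) ** w

def ht_count(p_avg_by_tag: Dict[str, float], w: int) -> float:
    """Horvitz-Thompson count: each tag seen in the window contributes 1 / pi_i."""
    return sum(1 / inclusion_probability(p, w) for p in p_avg_by_tag.values())

seen_in_window = {"t1": 0.5, "t2": 0.5, "t3": 0.9}
print(ht_count(seen_in_window, w=1))  # small window: large correction for likely-missed tags
print(ht_count(seen_in_window, w=3))  # larger window: weights shrink toward 1
```

This is why small windows become usable. The estimator scales up the few tags seen rather than waiting for every tag to be read.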

Window Adaptation
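Upper bound on window size, similar to per-tag: $w \le \frac{1}{p^{avg}} \ln\frac{1}{\delta}$
Transition detection based on variance within sub-windows
[Figure: count estimate over time (epochs E0 to E9), comparing $\hat{N}_w$ over the full window with $\hat{N}_{w'}$ over a sub-window, using $\mathrm{Var}[\hat{N}_w]$ and $\mathrm{Var}[\hat{N}_{w'}]$]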
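A sketch of a variance-based transition check for counts, under stated assumptions:

- The variance estimate is the textbook one for a π-estimator with independent inclusions: the sum of $(1 - \pi_i)/\pi_i^2$ over the tags seen.
- Flagging a change when the two estimates differ by more than 2 standard deviations mirrors the per-tag test.
- Adding the two variances is a simplification (the sub-window overlaps the full window).

Whether SMURF uses exactly these forms is an assumption. The function names are hypothetical.

```python
import math
from typing import List, Tuple

def ht_estimate(p_avgs: List[float], w: int) -> Tuple[float, float]:
    """Count estimate and variance estimate for the tags seen in a w-epoch window."""
    count = 0.0
    variance = 0.0
    for p in p_avgs:
        pi = 1 - (1 - p) ** w           # inclusion probability of this tag
        count += 1 / pi                 # Horvitz-Thompson weight
        variance += (1 - pi) / pi ** 2  # standard variance estimator, independent inclusions
    return count, variance

def count_transition(full: List[float], w: int, sub: List[float], w_sub: int) -> bool:
    """Flag a change when the full-window and sub-window estimates differ by > 2 std. devs."""
    n_full, var_full = ht_estimate(full, w)
    n_sub, var_sub = ht_estimate(sub, w_sub)
    return abs(n_full - n_sub) > 2 * math.sqrt(var_full + var_sub)

full_window = [0.5] * 10                               # 10 tags seen over w = 4 epochs
print(count_transition(full_window, 4, [0.5] * 8, 2))  # False: consistent count
print(count_transition(full_window, 4, [0.5] * 3, 2))  # True: most items removed
```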

Multi-tag Scenario

Ongoing Work: Spatial Smoothing


With multiple readers, more complicated
Two rooms, two readers per room
[Figure: two rooms covered by readers A, B, C, D]
Reinforcement: A? B? A ∪ B?
Arbitration: A? C?
All are addressed by a statistical framework!

Beyond RFID
Other sensor data
π-estimator for other aggregates
Use SMURF for sensor networks
Other streaming data
Use SMURF in general streaming systems (e.g., TelegraphCQ)
Remove RANGE clause from CQL

Related Work
Commercial RFID middleware
Smoothing filters: need to set smoothing window

RFID-related work
Rao et al., StreamClean: complementary
Intel Seattle, HiFi, ESP: static window size

BBQ, MauveDB
Heavyweight, model-based
SMURF is non-parametric, sampling-based

Statistical filters (digital signal processing & DB)
Non-linear digital filters inspired SMURF design

Conclusions

Current smoothing filters are not adequate
Not declarative!
SMURF: a declarative smoothing filter
Uses statistical sampling to adapt the window size

Thanks!
Questions?
