A General Result For Selecting Balanced Unequal Probability Samples From a Stream
Author(s)
Date issued
August 1, 2019
In
Information Processing Letters
Vol
152
No
105840
From page
1
To page
6
Reviewed by peer
1
Subjects
balanced sampling; Chao method; two phases sampling; stream
Abstract
Probability sampling methods were developed in the framework of survey statistics. Recently, sampling methods have attracted renewed interest as a way to reduce the size of large data sets. A particular application is sampling from a data stream.
The stream is assumed to be so large that the data cannot be stored. When a new unit appears, the decision to keep it or not must be made immediately, without examining all the units that have already appeared in the stream. In this paper, we review the existing methods for sampling with unequal probabilities from a stream. Next, we propose a general result about sampling in several phases from a balanced sample, which enables us to propose several new solutions for sampling and multi-phase sampling from a stream. Several new applications of this general result are developed.
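To illustrate the one-pass constraint described in the abstract, the following is a minimal sketch of sampling with unequal weights from a stream. It is not the method proposed in the paper, nor Chao's method: it uses the weighted reservoir scheme of Efraimidis and Spirakis, in which each unit receives the key u^(1/w) and the n largest keys are kept. This scheme favours units with larger weights but does not, in general, yield inclusion probabilities exactly proportional to the weights. The function name, the `(unit, weight)` input format, and the `rng` parameter are assumptions made for illustration.

```python
import heapq
import random


def weighted_reservoir_sample(stream, n, rng=None):
    """Keep n units from a stream of (unit, weight) pairs in one pass.

    Illustrative sketch only (hypothetical helper, not from the paper).
    Each unit is examined once and either kept or discarded immediately;
    only the current reservoir of at most n units is held in memory.
    """
    rng = rng or random.Random()
    heap = []  # min-heap of (key, arrival_index, unit)
    for index, (unit, weight) in enumerate(stream):
        if weight <= 0:
            raise ValueError("weights must be strictly positive")
        # Larger weights push the key towards 1, so the unit is more
        # likely to stay among the n largest keys.
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, unit)  # index breaks ties without comparing units
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # The new unit replaces the unit with the smallest key.
            heapq.heapreplace(heap, entry)
    return [unit for _, _, unit in heap]


if __name__ == "__main__":
    # Units 0..99, with weights increasing in the unit index.
    units = ((i, 1.0 + i) for i in range(100))
    print(weighted_reservoir_sample(units, 5, random.Random(0)))
```

The reservoir is a heap so that each arriving unit costs at most O(log n) work, and the decision for a unit never requires revisiting earlier units of the stream.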
Publication type
journal article