Here, we use Apache Pig to express the MapReduce job.
The script loads the old data and the new data and sorts the records by collect date. It then keeps only the records collected today and filters out records that were already processed a few days earlier. |
|
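
The steps described above could be sketched in Pig Latin roughly as follows. This is only an illustration, not the original script: the parameter names (`OLD_PATH`, `NEW_PATH`, `OUT_PATH`, `TODAY`), the tab-separated layout, and the three-field schema are all assumptions. It also assumes collect dates are stored as `yyyy-MM-dd` strings, so they compare and sort correctly as text.

```pig
-- Hypothetical schema and parameters; adjust to the real data layout.
old_data = LOAD '$OLD_PATH' USING PigStorage('\t')
           AS (id:chararray, payload:chararray, collect_date:chararray);
new_data = LOAD '$NEW_PATH' USING PigStorage('\t')
           AS (id:chararray, payload:chararray, collect_date:chararray);

-- Combine the previous output with the newly collected records.
all_data = UNION old_data, new_data;

-- Keep only records collected today; records from earlier runs are dropped.
collected_today = FILTER all_data BY collect_date == '$TODAY';

-- Sort last, so the stored output is ordered by collect date.
sorted = ORDER collected_today BY collect_date DESC, id ASC;

STORE sorted INTO '$OUT_PATH' USING PigStorage('\t');
```

The parameters can be supplied on the command line with `pig -param NAME=value`. In this sketch the `ORDER` step comes after the `FILTER`, because Pig only guarantees the ordering of a relation right after it is sorted; filtering first also reduces the amount of data the sort has to shuffle.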