Geomesa BBOX Query is not returning all results

Problem description

I was experimenting with GeoMesa (backed by HBase) BBOX queries on OSM node data, and I found that for a specific region GeoMesa does not return all of the nodes in the bounding box.

For example, I fired 3 queries:

  1. BBOX(-122.0,47.4,-122.01,47.5) - Output has 5,477 Unique Features
  2. BBOX(-122.0,47.5,-122.01,47.6) - Output has 9,879 Unique Features
  3. BBOX(-122.0,47.4,-122.01,47.6) - Output has 13,374 Unique Features

Looking at these bounding boxes, the features from Query 1 plus Query 2 should equal the features from Query 3, but they do not. Worse, the union of the Query 1 and Query 2 results contains features that are not present in the Query 3 results at all.
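One way to pin down the discrepancy is to export the feature IDs returned by each query and diff the sets: every ID in the union of Query 1 and Query 2 should also appear in Query 3. A minimal sketch (the GeoMesa CLI invocation in the comment is illustrative only — catalog/type names and flag spellings vary by version — and the `q*_ids.txt` files here are stand-in data):

```shell
# Illustrative: export the feature IDs for each query with the GeoMesa CLI,
# e.g. something along the lines of:
#   geomesa-hbase export -c <catalog> -f <typeName> -q "<BBOX filter>" ...
# For this sketch, use stand-in ID lists (one feature ID per line):
printf 'a\nb\nc\n' > q1_ids.txt
printf 'c\nd\n' > q2_ids.txt
printf 'a\nb\nd\n' > q3_ids.txt   # 'c' is missing from query 3

# Build the sorted, de-duplicated union of queries 1 and 2, and sort query 3:
sort -u q1_ids.txt q2_ids.txt > q1q2_union.txt
sort -u q3_ids.txt > q3_sorted.txt

# IDs present in (Q1 union Q2) but absent from Q3 are the missing features:
comm -23 q1q2_union.txt q3_sorted.txt
```

With the stand-in data above, this prints `c`, the feature that query 3 lost. Running the same comparison on real exports tells you exactly which features to investigate.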

Below is an image of the points plotted in Kepler. Can anyone help me understand the issue and how to track down its root cause?

Missing points in Query 3.

I am seeing the following exception:

19/09/27 14:57:34 INFO RpcRetryingCaller: Call exception, tries=10, retries=35, started=38583 ms ago, cancelled=false, msg=java.io.FileNotFoundException: File not present on S3
    at com.amazon.ws.emr.hadoop.fs.s3.S3FSInputStream.read(S3FSInputStream.java:133)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:738)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1493)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1770)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1596)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:454)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:269)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:651)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:601)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:302)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:201)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:391)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:224)
    at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2208)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:6112)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:6086)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2841)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2821)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2803)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2797)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2697)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3012)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36613)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2380)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)

Tags: hbase, geomesa

Solution


This looks like an S3 consistency issue. Try running:

emrfs sync -m <your DynamoDB catalog table> s3://<your bucket>/<your hbase root dir>

Then re-run your query. It is common for S3 and the DynamoDB table that backs the S3 consistency model used for HBase (EMRFS consistent view) to fall out of sync. Running this sync command as a cron job helps avoid the problem, or resolves it automatically when it does occur.
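A sketch of such a cron job (the metadata table name, bucket, and path below are placeholders — substitute your own EMRFS consistent-view table and HBase root directory):

```shell
# Example crontab entry: sync the EMRFS metadata with S3 every 15 minutes.
# Placeholders: EmrFSMetadata, my-bucket, hbase-root.
*/15 * * * * emrfs sync -m EmrFSMetadata s3://my-bucket/hbase-root >> /var/log/emrfs-sync.log 2>&1
```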

