hbase - GeoMesa BBOX query is not returning all results
Problem description
I was experimenting with GeoMesa (on HBase) BBOX queries over OSM node data. I found that for a specific region GeoMesa does not return all the nodes in the bounding box.
For example, I ran 3 queries:
- BBOX(-122.0,47.4,-122.01,47.5) - Output has 5,477 Unique Features
- BBOX(-122.0,47.5,-122.01,47.6) - Output has 9,879 Unique Features
- BBOX(-122.0,47.4,-122.01,47.6) - Output has 13,374 Unique Features
Looking at these bounding boxes, I think the features of Query 1 plus those of Query 2 should equal the features of Query 3, but they do not. Worse, the union of Query 1 and Query 2 contains some features that are not present in the Query 3 results at all.
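To pin down which features are missing, comparing feature IDs is more useful than comparing counts. For reference, 5,477 + 9,879 = 15,356, which is 1,982 more than the 13,374 features returned by Query 3. The following is a minimal sketch, assuming the feature IDs of each query result have already been exported to plain Python collections; the function name `compare_bbox_results` and the result keys are hypothetical and not part of GeoMesa.

```python
from typing import Dict, Iterable, Set


def compare_bbox_results(
    q1_ids: Iterable[str],
    q2_ids: Iterable[str],
    q3_ids: Iterable[str],
) -> Dict[str, Set[str]]:
    """Compare feature IDs of two sub-box queries against the enclosing box.

    q1_ids and q2_ids are the IDs returned by the two smaller boxes,
    q3_ids the IDs returned by the box that covers both.
    """
    q1, q2, q3 = set(q1_ids), set(q2_ids), set(q3_ids)
    union = q1 | q2
    return {
        # Returned by both sub-queries, e.g. nodes on the shared edge.
        "overlap": q1 & q2,
        # Returned by a sub-query but absent from the enclosing query.
        "missing_from_q3": union - q3,
        # Returned by the enclosing query but by neither sub-query.
        "only_in_q3": q3 - union,
    }
```

A non-empty `missing_from_q3` set confirms that the larger query dropped features, and those IDs can then be checked against the region servers that logged errors.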
Below is the result plotted in Kepler. Can anyone help me understand what the issue is and how to find its root cause?
I am seeing the exception below:
19/09/27 14:57:34 INFO RpcRetryingCaller: Call exception, tries=10, retries=35, started=38583 ms ago, cancelled=false, msg=java.io.FileNotFoundException: File not present on S3
at com.amazon.ws.emr.hadoop.fs.s3.S3FSInputStream.read(S3FSInputStream.java:133)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:738)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1493)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1770)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1596)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:454)
at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:269)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:651)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:601)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:302)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:201)
at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:391)
at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:224)
at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2208)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:6112)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:6086)
at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2841)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2821)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2803)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2797)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2697)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3012)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36613)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2380)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
Solution
This looks like an S3 consistency issue. Try running:
emrfs sync -m <your DynamoDB catalog table> s3://<your bucket>/<your hbase root dir>
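Then rerun your query. It is quite common for S3 and the DynamoDB table that manages the S3 consistency model for HBase to get out of sync. Running this sync command as a cron job can help avoid the problem, or resolve it automatically when it does occur.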
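One way to schedule the sync is a small wrapper script that cron invokes. The sketch below is only an illustration: it assumes the `emrfs` CLI is on the PATH of the node where it runs, and the function names `build_sync_command` and `run_sync` are hypothetical. The table name and S3 path must be replaced with your own values.

```python
import subprocess
from typing import List


def build_sync_command(metadata_table: str, s3_path: str) -> List[str]:
    """Build the argv for `emrfs sync -m <table> <s3 path>`."""
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must start with s3://")
    return ["emrfs", "sync", "-m", metadata_table, s3_path]


def run_sync(metadata_table: str, s3_path: str) -> int:
    """Run the sync and return its exit code (needs the emrfs CLI)."""
    cmd = build_sync_command(metadata_table, s3_path)
    # check=False: let the caller decide how to handle a non-zero exit.
    return subprocess.run(cmd, check=False).returncode
```

A script that calls `run_sync("<your DynamoDB catalog table>", "s3://<your bucket>/<your hbase root dir>")` can then be registered in crontab at whatever interval suits the write rate of the cluster.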