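Geospatial Information, Vol. 16, No. 7, Jul. 2018
Research on Storage of Geographical Conditions Census Data Based on Hadoop
QI Donglan1, WEI Yongqiang1, XIANG Juan1
(1. Chongqing Institute of Surveying and Mapping, National Administration of Surveying, Mapping and Geoinformation, Chongqing 401120)
Abstract: Geographical conditions census data are diverse in type, complex in structure and very large in volume. Against this background, we studied the key technologies related to Hadoop and used HBase and HDFS to store the data, implementing distributed storage of geographical conditions census data and improving the efficiency, performance and stability of data access.
Key words: geographical conditions census data; Hadoop; distributed storage
CLC number: P208  Document code: B
Article ID: 1672-4623(2018)07-0031-03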
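At present, China is in a period of accelerating urbanization. "Urban diseases" are becoming prominent in Beijing, Shanghai, Shenzhen and other areas, and problems in urban management, traffic congestion, social governance and public services are increasingly acute. Building smart cities with big data, the Internet, the Internet of Things and cloud computing is an important means of improving the capacity of cities for sustainable development. Data are the foundation of smart city development [1], and geographical conditions census data support smart city construction; how to manage these data effectively so that they better serve smart city construction is a problem that must be solved. Hadoop offers high reliability, high scalability, high efficiency and high fault tolerance [2], and is particularly strong at analyzing and processing massive unstructured or semi-structured data, which suggests one approach to storing geographical conditions census data.
1 Key Technologies
Hadoop is an open-source cloud computing framework developed by the Apache open-source organization. It consists of projects such as HDFS, MapReduce, HBase, Hive, ZooKeeper and Chukwa [3], and is used on large sites such as IBM, Amazon and Yahoo.
HDFS is the distributed file system of Hadoop and provides the underlying support for distributed computing and storage [4]. It is suitable for deployment on inexpensive machines, is highly fault tolerant, provides high-throughput data access, and suits applications that work with large data sets. HDFS stores each file as a sequence of blocks on different data nodes so that massive data can be stored; to ensure fault tolerance, it also replicates each block and stores the copies on different data nodes.
HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system built on top of HDFS, also called a distributed database [5]. It supports efficient storage of massive data together with real-time reads, writes and queries. In essence it is a sparse, persistent, multidimensional, sorted map. The table is indexed by row key, column key and timestamp, and all values are stored as uninterpreted strings (byte arrays).
2 Design of Hadoop-Based Storage for Geographical Conditions Census Data
2.1 Logical Structure Design of Geographical Conditions Census Data Storage
Geographical conditions data can be divided into vector data, raster data, tabular data and document data. Vector data include land cover data, geographical conditions feature data and thematic data; raster data include topographic and geomorphic data, remote sensing image data and their metadata; tabular data are the results of statistical analysis of geographical conditions; document data are items such as remote sensing image interpretation samples [6].
Storage of geographical conditions census data in the Hadoop environment has three main parts: (1) the raw data area, which holds the census data as collected, including topographic and geomorphic data, image data, remote sensing image interpretation samples, land cover data and geographical conditions feature data; (2) the data storage management area, which provides storage for data that need database management, stores processed data promptly, and migrates them quickly through a switch to the results storage management area; (3) the results storage management area, which stores published results produced by product generation, data analysis or integration, including thematic data and statistical analysis results.
2.2 Storage Rule Design
Based on the characteristics of the census data and of Hadoop, GeoJSON, a JSON-based format, is used in HBase to describe the spatial geometry of vector data; HDFS and HBase together store raster data, tabular data and document data.
Vector data comprise spatial coordinate information, attribute information and topological information [7]. To improve data access performance and to isolate data faults effectively, vector data are stored by layer and by block, and the layer tables are independent of one another. Based on the characteristics of vector data, an HBase layer table structure was designed in which spatial coordinates are stored in a spatial data column family, attributes in an attribute column family, and topology in a topology column family, as shown in Table 1.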
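The HBase data model described in Section 1 (a sparse, sorted map indexed by row key, column key and timestamp) can be illustrated with a small in-memory sketch. This is not HBase client code: `SparseTable`, `put`, `get` and `scan` are hypothetical names chosen for illustration, and the sample row and values are invented.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory model of an HBase-style table (illustrative only)."""

    def __init__(self):
        # (row key, "family:qualifier") -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        # Values are kept as strings, as the text describes.
        self._cells[(row, column)][timestamp] = str(value)

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._cells.get((row, column), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None  # sparse: an absent cell simply has no entry
        return versions[max(candidates)]

    def scan(self):
        """Yield (row, column, timestamp, value), sorted by row and column, newest first."""
        for (row, column) in sorted(self._cells):
            for ts in sorted(self._cells[(row, column)], reverse=True):
                yield row, column, ts, self._cells[(row, column)][ts]


table = SparseTable()
table.put("river-001", "attr:Name", 1, "Example River")
table.put("river-001", "attr:Width", 1, 5.6)
table.put("river-001", "attr:Width", 2, 6.1)  # a newer version of the same cell

print(table.get("river-001", "attr:Width"))     # newest version
print(table.get("river-001", "attr:Width", 1))  # version as of timestamp 1
print(table.get("river-001", "topo:Left"))      # missing cell
```

Keeping every version under its timestamp is what lets a reader ask for the state of a cell at an earlier time without any extra bookkeeping.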
Received: 2017-11-30.
Funding: National Basic Surveying and Mapping Science, Technology and Standards Program (2017KJ0303, 2017KJ0304).
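Table 1 Structure of the vector data table
Row key: 1
Timestamps: T1, T2, T3, T4, T5, T6, …
Attribute column family
  Items:  Name, Width, FEATID, ElemSTime, ElemETime, AREACODE
  Values: 嘎拉河, 5.6, 15220150630, 513330
Coordinate column family
  Value
Topology column family
  Value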
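One way to picture the layer table of Table 1 is as a function that turns a vector feature into a row whose cells are grouped into attribute, coordinate and topology column families, with the geometry serialized as GeoJSON. The sketch below uses plain Python dictionaries rather than an HBase client. The family prefixes `attr`, `coord` and `topo`, the qualifiers `geometry` and `adjacent`, the helpers `feature_to_row` and `row_to_geometry`, and the sample coordinates and topology are all assumptions made for illustration; only the attribute values are taken from Table 1.

```python
import json


def feature_to_row(row_key, attributes, geometry, topology):
    """Build a {column: value} mapping for one vector feature.

    Columns are named "family:qualifier"; every value is stored as a string.
    """
    row = {}
    for name, value in attributes.items():
        row[f"attr:{name}"] = str(value)
    # The geometry is kept as a compact GeoJSON string in the coordinate family.
    row["coord:geometry"] = json.dumps(geometry, separators=(",", ":"))
    for name, value in topology.items():
        row[f"topo:{name}"] = json.dumps(value)
    return row_key, row


def row_to_geometry(row):
    """Decode the GeoJSON geometry stored in a row."""
    return json.loads(row["coord:geometry"])


# Hypothetical river segment; the coordinates and neighbours are made up.
geometry = {
    "type": "LineString",
    "coordinates": [[100.10, 31.20], [100.15, 31.25], [100.21, 31.27]],
}
key, row = feature_to_row(
    row_key="1",
    attributes={"Name": "嘎拉河", "Width": 5.6, "AREACODE": "513330"},
    geometry=geometry,
    topology={"adjacent": ["2", "7"]},
)

for column in sorted(row):
    print(column, "=", row[column])
```

Because each family has its own column prefix, a query that needs only attributes can skip the (usually much larger) geometry cells.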
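To read the attribute and topology information of a census vector layer quickly, a layer data dictionary table was built. Its row key is the layer name, and it holds the names and data types of the attribute and topology fields of the vector spatial data, as shown in Table 2.
Table 2 Data dictionary table for vector data layers
Row key: layer name
Timestamps: T1, T2, T3, T4, T5, …
Attribute column family
  Items:  Name, Width, FEATID, ElemSTime, ElemETime
  Values: text, short, text, text, text
Topology column family
  Value
Geographical conditions raster data are stored in HDFS in the cluster environment, while the associated metadata are kept in the distributed database HBase. To retrieve raster data quickly and accurately, HBase manages the absolute paths of the raster data: the raster data name is the row key, and the absolute path and descriptive information are column families. Corresponding metadata tables are built separately for the topographic and geomorphic data and for the image data of the census.
Census tabular data and document data are both stored in HDFS in the cluster environment, and path information tables for them are built in HBase with a structure similar to that used for raster data.
2.3 Physical Structure Design
Vector data, namely geographical conditions feature data, thematic data, land cover data, statistical analysis results and metadata, are organized by feature layer in the HBase distributed database. Remote sensing image data, topographic and geomorphic data, remote sensing image interpretation samples and similar data are physically stored and organized as data blocks in the HDFS file system. All data are distributed across the cloud computing environment according to defined rules, as shown in Fig. 1.
Fig. 1 Physical storage structure of geographical conditions census data (diagram labels: vector data held in HBase on a master node and slave nodes 1 to 3; raster, tabular and document data held in HDFS as data blocks 1 to 3 replicated across a master node and slave nodes 1 to 3)
3 Environment Setup and Experimental Analysis
To verify the feasibility of the proposed Hadoop-based storage of geographical conditions census data, a Hadoop cluster was set up, and data ingestion and query tests were run with census data from a survey area.
3.1 Environment Setup
Eight virtual machines were created on one physical server (the host) to form the Hadoop cluster. One serves as the master node, responsible for data management and task decomposition across the cluster, and the other seven serve as slave nodes, responsible for distributed data storage and task execution. The nodes are connected through a virtual network switch, and Hadoop is installed and configured on every virtual machine. The host hardware is listed in Table 3; each virtual machine has 2 GB of memory and a 1 TB disk. The operating system is Windows 7, and the virtualization software is VMware Workstation 10.0.4.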
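The block-based storage that HDFS applies to raster files can be sketched as follows: a file is cut into fixed-size blocks, and each block is copied to several distinct data nodes. The 128 MB block size, the replication factor of 3 and the round-robin placement are assumptions for illustration (real HDFS placement is decided by the NameNode and is rack-aware); `split_into_blocks` and `place_replicas` are hypothetical helper names, and the seven node names mirror the seven slave nodes of the test cluster.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file would be cut into."""
    if file_size_mb <= 0:
        return []
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * int(full) + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


nodes = [f"slave{i}" for i in range(1, 8)]  # seven data nodes
blocks = split_into_blocks(1000)            # a hypothetical 1000 MB image file
placement = place_replicas(len(blocks), nodes)

print(len(blocks), "blocks; last block is", blocks[-1], "MB")
for block, holders in placement.items():
    print(f"block {block}: {holders}")
```

Since every block lives on three different nodes in this sketch, losing any single node leaves at least two copies of each block available.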
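Table 3 Host hardware
Item      Configuration
CPU       4 × Intel Xeon E7-4820, […]-bit direct-connect architecture, eight-core, 2.0 GHz
Memory    […] GB
Disk      10 × 1 TB, 10K rpm SAS 3.5-inch hot-swappable disks
RAID      SAS RAID card with 512 MB cache
Network   5 RJ45 network ports: 4 Gigabit Ethernet ports and 1 Gigabit Ethernet management port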
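3.2 Experimental Analysis
1) Ingestion efficiency test. Remote sensing image data of the survey area were used as test data (volumes of 77 GB, 200 GB, 500 GB and 1 TB), and ingestion was timed on a single node and on the Hadoop cluster. On the single node, ingestion took 12 min, 72 min, 228 min and 18 h respectively; on the Hadoop cluster it took 7 min, 48 min, 110 min and 8 h. The cluster was therefore between 1.5 and 2.25 times faster, or roughly 2 times overall, as shown in Fig. 2.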
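Fig. 2 Comparison of data ingestion time (horizontal axis: data volume, 77 GB, 200 GB, 500 GB and 1 TB; vertical axis: elapsed time/h; series: single node, Hadoop cluster)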
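The reported speedup can be checked directly from the ingestion timings given in the text. The short calculation below converts all times to minutes and divides the single-node time by the cluster time; the variable names are illustrative.

```python
# Ingestion times reported in the text, converted to minutes.
volumes = ["77 GB", "200 GB", "500 GB", "1 TB"]
single_node = [12, 72, 228, 18 * 60]
cluster = [7, 48, 110, 8 * 60]

# Speedup = single-node time / cluster time for each data volume.
speedups = [s / c for s, c in zip(single_node, cluster)]
for volume, ratio in zip(volumes, speedups):
    print(f"{volume}: {ratio:.2f}x")

mean_speedup = sum(speedups) / len(speedups)
print(f"mean: {mean_speedup:.2f}x")
```

The ratios range from 1.5 to 2.25 with a mean of about 1.9, which is consistent with the roughly twofold improvement stated in the text and shows that the gain grows with data volume from 200 GB upward.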
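2) Query efficiency test. Road, rural road and water area data (32 126, 4 [sic] and 168 727 records, with data volumes of 50 MB, 140 MB and 300 MB respectively) were used as test data, and query time was measured on a single node and on the Hadoop cluster. On the single node the queries took 30 s, 105 s and 2 s [sic] respectively; on the Hadoop cluster they took 21 s, 76 s and 149 s, a speedup of about 1.5 times, as shown in Fig. 3.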
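The results show that ingestion and query times grow for both setups as data volume increases, but the single-node setup is consistently slower. With small data volumes the two differ little; once the volume exceeds a certain level, storage in the cloud computing environment is clearly more efficient.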
Fig. 3 Comparison of data query time (horizontal axis: data volume/MB, 50, 140 and 300; vertical axis: elapsed time/min; series: single node, Hadoop cluster)
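4 Conclusions
This paper uses HBase and HDFS to store geographical conditions census data, which effectively addresses the problem of storing these data at scale and improves the efficiency, performance and stability of database access. Future work should further study the storage, management and processing of big data in cloud computing environments, in order to achieve efficient computation and visualization of massive data as well as unified supervision, tiered publication and unified services for important geographic information, so that census results better serve government, enterprises and the public.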
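References
[1] WANG Jingyuan, LI Chao, XIONG Zhang, et al. Survey of data-centric smart city research[J]. Journal of Computer Research and Development, 2014, 51(2): 239-259 (in Chinese)
[2] LI Xiaodong, YE Sishui. Construction of a highly reliable distributed computing platform based on Hadoop[J]. Journal of Beijing Electronic Science and Technology Institute, 2014, 22(2): 25-29 (in Chinese)
[3] SUN Fuquan, ZHANG Dawei, CHENG Xu, et al. Construction of an enterprise private cloud storage platform based on Hadoop[J]. Journal of Liaoning Technical University (Natural Science), 2011, 30(6): 913-916 (in Chinese)
[4] YIN Fang, FENG Min, ZHU Yunqiang, et al. Research on distributed processing of vector spatial data based on open-source Hadoop[J]. Computer Engineering and Applications, 2013, 49(16): 25-29 (in Chinese)
[5] CHEN Donghui, ZENG Le, LIANG Zhongjun, et al. Distributed storage system for meteorological surface minute data based on HBase[J]. Journal of Computer Applications, 2014, 34(9): 2617-2621 (in Chinese)
[6] LEI Ying, BAO Lishang. Database design for the first national geographical conditions census of Gansu Province[J]. Geomatics & Spatial Information Technology, 2016, 32(2): 161-163 (in Chinese)
[7] CHEN Jie, CHU Longxian, XIA Dongliang. A vector data storage and query method supporting parallel processing[J]. Electronic Design Engineering, 2017, 25(10): 31-33 (in Chinese)
First author: QI Donglan, engineer, works mainly on the research, development and application of geographic information systems.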