0%

如何基于pyshp读取和写入shp文件

前言 Introduction

Pyshp库是处理shapefile文件常用的第三方库,可以读取和写入shapefile文件。其中,shapefile文件是Eris公司于上世纪90年代初提出的一种矢量图形格式,用于存储具有地理空间特征的信息。Pyshp库内置了Shapefile类、ShapeRecords类、Reader类和Writer类,主要用来处理shapefile格式文件。

# Pyshp的安装及基本使用

Pyshp库的安装

使用pyshp库,首先需要将其安装到Python环境中。这可以通过pip命令轻松实现,在终端中输入以下命令即可安装:

1
>>> pip install pyshp

Pyshp库的基本使用

使用pyshp库读取shapefile文件格式,需要使用pyshp内置的Shapefile类中的Reader()方法,其文件读取方法如下:

  1. 导入pyshp库
1
>>> import shapefile
  1. 打开shapefile文件

在读入shapefile文件过程中,注意文件的编码方式,常用的文件编码方式为:gbk、utf-8、GB2312和Unicode等。

1
>>> shp = shapefile.Reader('example.shp', encoding="UTF-8")

  1. 读取shapefile文件中的数据
1
2
3
4
5
6
7
8
>>> shape_recs = shp.shapeRecords() # 获取所有的要素及其对应的属性
>>> for shape_rec in shape_recs:
>>> shp_bound = shape_rec.shape # 返回子要素的几何信息
>>> record = shape_rec.record # 返回子要素的属性特征
>>> print(shp)
shapefile Reader
663 shapes (type 'POLYGON')
663 records (44 fields)

也可以从压缩文件或矢量文件的URL进行读取:

1
2
3
4
5
>>> # from a zipped shapefile on website
>>> sf = shapefile.Reader("https://biogeo.ucdavis.edu/data/diva/rrd/NIC_rrd.zip")

>>> # from a shapefile collection of files in a github repository
>>> sf = shapefile.Reader("https://github.com/nvkelso/natural-earth-vector/blob/master/110m_cultural/ne_110m_admin_0_tiny_countries.shp?raw=true")
  1. 读取shapefile的元数据

shapefile有多种要素类型,其信息保存在dbf文件中:

1
2
3
>>> sf = shapefile.Reader("shapefiles/blockgroups.dbf")
>>> sf.shapeType
5

编号对应的形状类型如下,需注意的是编号并不是连续的,有几个尚未使用的保留编号。 - 空=0 - 点 = 1 - 折线 = 3 - 多边形 = 5 - 多点 = 8 - 点 = 11 - 折线Z = 13 - 多边形 = 15 - 多点 = 18 - 点TM = 21 - 折线 = 23 - 多边形 = 25 - 多点 = 28 - 多面片 = 31 - shape类型在shapefile模块中也被定义为常量,以便我们可以更直观地比较类型:

1
2
3
4
5
>>> sf.shapeType == shapefile.POLYGON
True
# 或者简单的方式
>>> sf.shapeTypeName == 'POLYGON'
True
  1. shapefile的角标
1
2
3
4
>>> len(sf)
663
>>> sf.bbox
[-122.515048, 37.652916, -122.327622, 37.863433] # 外接矩形的角标
  1. 属性表信息

要查看Reader对象的属性表和字段信息,可以调用“fields”属性:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
>>> fields = sf.fields

>>> assert fields == [("DeletionFlag", "C", 1, 0), ["AREA", "N", 18, 5],
... ["BKG_KEY", "C", 12, 0], ["POP1990", "N", 9, 0], ["POP90_SQMI", "N", 10, 1],
... ["HOUSEHOLDS", "N", 9, 0],
... ["MALES", "N", 9, 0], ["FEMALES", "N", 9, 0], ["WHITE", "N", 9, 0],
... ["BLACK", "N", 8, 0], ["AMERI_ES", "N", 7, 0], ["ASIAN_PI", "N", 8, 0],
... ["OTHER", "N", 8, 0], ["HISPANIC", "N", 8, 0], ["AGE_UNDER5", "N", 8, 0],
... ["AGE_5_17", "N", 8, 0], ["AGE_18_29", "N", 8, 0], ["AGE_30_49", "N", 8, 0],
... ["AGE_50_64", "N", 8, 0], ["AGE_65_UP", "N", 8, 0],
... ["NEVERMARRY", "N", 8, 0], ["MARRIED", "N", 9, 0], ["SEPARATED", "N", 7, 0],
... ["WIDOWED", "N", 8, 0], ["DIVORCED", "N", 8, 0], ["HSEHLD_1_M", "N", 8, 0],
... ["HSEHLD_1_F", "N", 8, 0], ["MARHH_CHD", "N", 8, 0],
... ["MARHH_NO_C", "N", 8, 0], ["MHH_CHILD", "N", 7, 0],
... ["FHH_CHILD", "N", 7, 0], ["HSE_UNITS", "N", 9, 0], ["VACANT", "N", 7, 0],
... ["OWNER_OCC", "N", 8, 0], ["RENTER_OCC", "N", 8, 0],
... ["MEDIAN_VAL", "N", 7, 0], ["MEDIANRENT", "N", 4, 0],
... ["UNITS_1DET", "N", 8, 0], ["UNITS_1ATT", "N", 7, 0], ["UNITS2", "N", 7, 0],
... ["UNITS3_9", "N", 8, 0], ["UNITS10_49", "N", 8, 0],
... ["UNITS50_UP", "N", 8, 0], ["MOBILEHOME", "N", 7, 0]]

shapefile的第一个字段往往是索引或者不实用的字段,在多数情况下,都忽略第一个字段,以下为获取所有字段名的列表或属性表:

1
2
3
4
5
6
# 字段列表
>>> fieldnames = [f[0] for f in sf.fields[1:]]
# 属性表
>>> records = sf.records()
>>> len(records)
663

  1. 新建shapefile文件

创建 shapefile,首先启动一个新的 Writer 实例,向其传递要保存到的文件路径和名称,并新建一个字段:

1
2
3
>>> w = shapefile.Writer('shapefiles/test/testfile')
>>> w.field('field1', 'C')

类似 python 文件的对象读取 shapefile 一样,也可以向它们写入:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
>>> try:
... from StringIO import StringIO
... except ImportError:
... from io import BytesIO as StringIO
>>> shp = StringIO()
>>> shx = StringIO()
>>> dbf = StringIO()
>>> w = shapefile.Writer(shp=shp, shx=shx, dbf=dbf)
>>> w.field('field1', 'C')
>>> w.record()
>>> w.null()
>>> w.close()

>>> # To read back the files you could call the "StringIO.getvalue()" method later.
>>> assert shp.getvalue()
>>> assert shx.getvalue()
>>> assert dbf.getvalue()

>>> # In fact, you can read directly from them using the Reader
>>> r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
>>> len(r)
1

  1. 添加属性表中的记录

在添加记录之前,必须首先创建字段来定义每个属性中将包含哪些类型的值。有多种不同的字段类型,所有这些类型都支持将 None 值存储为 NULL。文本字段是使用“C”类型创建的,第三个“size”参数可以自定义为文本值的预期长度以节省空间:

1
2
3
4
5
6
7
8
9
10
>>> w = shapefile.Writer('shapefiles/test/dtype')
>>> w.field('TEXT', 'C')
>>> w.field('SHORT_TEXT', 'C', size=5)
>>> w.field('LONG_TEXT', 'C', size=250)
>>> w.null()
>>> w.record('Hello', 'World', 'World'*50)
>>> w.close()

>>> r = shapefile.Reader('shapefiles/test/dtype')
>>> assert r.record(0) == ['Hello', 'World', 'World'*50]
  1. 添加多边形要素 与 LineString 类似,多边形形状由多个多边形组成,并且必须以多边形列表的形式给出。主要区别在于多边形必须至少有 4 个点,并且最后一个点必须与第一个点相同。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    >>> w = shapefile.Writer('shapefiles/test/polygon')
    >>> w.field('name', 'C')

    >>> w.poly([
    ... [[113,24], [112,32], [117,36], [122,37], [118,20]], # poly 1
    ... [[116,29],[116,26],[119,29],[119,32]], # hole 1
    ... [[15,2], [17,6], [22,7]] # poly 2
    ... ])
    >>> w.record('polygon1')

    >>> w.close()

  2. 写入投影信息

.prj 文件或投影文件是一个简单的文本文件,它存储 shapefile 的地图投影和坐标参考系统,以帮助在地图上正确定位几何图形。以下用WGS84的投影坐标系作为案例,更多的投影信息可以上网站上查找。

1
2
3
4
5
6
7
>>> with open("{}.prj".format(filename), "w") as prj:
>>> wkt = 'GEOGCS["WGS 84",'
>>> wkt += 'DATUM["WGS_1984",'
>>> wkt += 'SPHEROID["WGS 84",6378137,298.257223563]]'
>>> wkt += ',PRIMEM["Greenwich",0],'
>>> wkt += 'UNIT["degree",0.0174532925199433]]'
>>> prj.write(wkt)