前言 Introduction

Pyshp库是处理shapefile文件常用的第三方库，可以读取和写入shapefile文件。其中，shapefile文件是Eris公司于上世纪90年代初提出的一种矢量图形格式，用于存储具有地理空间特征的信息。Pyshp库内置了Shapefile类、ShapeRecords类、Reader类和Writer类,主要用来处理shapefile格式文件。

# Pyshp的安装及基本使用

Pyshp库的安装

使用pyshp库，首先需要将其安装到Python环境中。这可以通过pip命令轻松实现，在终端中输入以下命令即可安装：

1	>>> pip install pyshp

Pyshp库的基本使用

使用pyshp库读取shapefile文件格式，需要使用pyshp内置的Shapefile类中的Reader()方法，其文件读取方法如下：

导入pyshp库

1	>>> import shapefile

打开shapefile文件

在读入shapefile文件过程中，注意文件的编码方式，常用的文件编码方式为：gbk、utf-8、GB2312和Unicode等。

1	>>> shp = shapefile.Reader('example.shp', encoding="UTF-8")

读取shapefile文件中的数据

>>> shape_recs = shp.shapeRecords() # 获取所有的要素及其对应的属性
>>> for shape_rec in shape_recs:
>>>    shp_bound = shape_rec.shape # 返回子要素的几何信息
>>>    record = shape_rec.record   # 返回子要素的属性特征
>>> print(shp)
shapefile Reader
    663 shapes (type 'POLYGON')
    663 records (44 fields)

也可以从压缩文件或矢量文件的URL进行读取：

>>> # from a zipped shapefile on website
>>> sf = shapefile.Reader("https://biogeo.ucdavis.edu/data/diva/rrd/NIC_rrd.zip")

>>> # from a shapefile collection of files in a github repository
>>> sf = shapefile.Reader("https://github.com/nvkelso/natural-earth-vector/blob/master/110m_cultural/ne_110m_admin_0_tiny_countries.shp?raw=true")

读取shapefile的元数据

shapefile有多种要素类型，其信息保存在dbf文件中：

1
2
3

>>> sf = shapefile.Reader("shapefiles/blockgroups.dbf")
>>> sf.shapeType
5

编号对应的形状类型如下，需注意的是编号并不是连续的，有几个尚未使用的保留编号。 - 空=0 - 点 = 1 - 折线 = 3 - 多边形 = 5 - 多点 = 8 - 点 = 11 - 折线Z = 13 - 多边形 = 15 - 多点 = 18 - 点TM = 21 - 折线 = 23 - 多边形 = 25 - 多点 = 28 - 多面片 = 31 - shape类型在shapefile模块中也被定义为常量，以便我们可以更直观地比较类型：

>>> sf.shapeType == shapefile.POLYGON
True
# 或者简单的方式
>>> sf.shapeTypeName == 'POLYGON'
True

shapefile的角标

>>> len(sf)
663
>>> sf.bbox
[-122.515048, 37.652916, -122.327622, 37.863433]    # 外接矩形的角标

属性表信息

要查看Reader对象的属性表和字段信息，可以调用“fields”属性：

>>> fields = sf.fields

>>> assert fields == [("DeletionFlag", "C", 1, 0), ["AREA", "N", 18, 5],
... ["BKG_KEY", "C", 12, 0], ["POP1990", "N", 9, 0], ["POP90_SQMI", "N", 10, 1],
... ["HOUSEHOLDS", "N", 9, 0],
... ["MALES", "N", 9, 0], ["FEMALES", "N", 9, 0], ["WHITE", "N", 9, 0],
... ["BLACK", "N", 8, 0], ["AMERI_ES", "N", 7, 0], ["ASIAN_PI", "N", 8, 0],
... ["OTHER", "N", 8, 0], ["HISPANIC", "N", 8, 0], ["AGE_UNDER5", "N", 8, 0],
... ["AGE_5_17", "N", 8, 0], ["AGE_18_29", "N", 8, 0], ["AGE_30_49", "N", 8, 0],
... ["AGE_50_64", "N", 8, 0], ["AGE_65_UP", "N", 8, 0],
... ["NEVERMARRY", "N", 8, 0], ["MARRIED", "N", 9, 0], ["SEPARATED", "N", 7, 0],
... ["WIDOWED", "N", 8, 0], ["DIVORCED", "N", 8, 0], ["HSEHLD_1_M", "N", 8, 0],
... ["HSEHLD_1_F", "N", 8, 0], ["MARHH_CHD", "N", 8, 0],
... ["MARHH_NO_C", "N", 8, 0], ["MHH_CHILD", "N", 7, 0],
... ["FHH_CHILD", "N", 7, 0], ["HSE_UNITS", "N", 9, 0], ["VACANT", "N", 7, 0],
... ["OWNER_OCC", "N", 8, 0], ["RENTER_OCC", "N", 8, 0],
... ["MEDIAN_VAL", "N", 7, 0], ["MEDIANRENT", "N", 4, 0],
... ["UNITS_1DET", "N", 8, 0], ["UNITS_1ATT", "N", 7, 0], ["UNITS2", "N", 7, 0],
... ["UNITS3_9", "N", 8, 0], ["UNITS10_49", "N", 8, 0],
... ["UNITS50_UP", "N", 8, 0], ["MOBILEHOME", "N", 7, 0]]

shapefile的第一个字段往往是索引或者不实用的字段，在多数情况下，都忽略第一个字段，以下为获取所有字段名的列表或属性表：

# 字段列表
>>> fieldnames = [f[0] for f in sf.fields[1:]]
# 属性表
>>> records = sf.records()
>>> len(records)
663

新建shapefile文件

创建 shapefile，首先启动一个新的 Writer 实例，向其传递要保存到的文件路径和名称，并新建一个字段：

1
2
3

>>> w = shapefile.Writer('shapefiles/test/testfile')
>>> w.field('field1', 'C')

类似 python 文件的对象读取 shapefile 一样，也可以向它们写入：

>>> try:
...     from StringIO import StringIO
... except ImportError:
...     from io import BytesIO as StringIO
>>> shp = StringIO()
>>> shx = StringIO()
>>> dbf = StringIO()
>>> w = shapefile.Writer(shp=shp, shx=shx, dbf=dbf)
>>> w.field('field1', 'C')
>>> w.record()
>>> w.null()
>>> w.close()

>>> # To read back the files you could call the "StringIO.getvalue()" method later.
>>> assert shp.getvalue()
>>> assert shx.getvalue()
>>> assert dbf.getvalue()

>>> # In fact, you can read directly from them using the Reader
>>> r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
>>> len(r)
1

添加属性表中的记录

在添加记录之前，必须首先创建字段来定义每个属性中将包含哪些类型的值。有多种不同的字段类型，所有这些类型都支持将 None 值存储为 NULL。文本字段是使用“C”类型创建的，第三个“size”参数可以自定义为文本值的预期长度以节省空间：

>>> w = shapefile.Writer('shapefiles/test/dtype')
>>> w.field('TEXT', 'C')
>>> w.field('SHORT_TEXT', 'C', size=5)
>>> w.field('LONG_TEXT', 'C', size=250)
>>> w.null()
>>> w.record('Hello', 'World', 'World'*50)
>>> w.close()

>>> r = shapefile.Reader('shapefiles/test/dtype')
>>> assert r.record(0) == ['Hello', 'World', 'World'*50]

添加多边形要素与 LineString 类似，多边形形状由多个多边形组成，并且必须以多边形列表的形式给出。主要区别在于多边形必须至少有 4 个点，并且最后一个点必须与第一个点相同。

>>> w = shapefile.Writer('shapefiles/test/polygon')
>>> w.field('name', 'C')

>>> w.poly([
...	        [[113,24], [112,32], [117,36], [122,37], [118,20]], # poly 1
...	        [[116,29],[116,26],[119,29],[119,32]], # hole 1
...         [[15,2], [17,6], [22,7]]  # poly 2
...        ])
>>> w.record('polygon1')

>>> w.close()

写入投影信息

.prj 文件或投影文件是一个简单的文本文件，它存储 shapefile 的地图投影和坐标参考系统，以帮助在地图上正确定位几何图形。以下用WGS84的投影坐标系作为案例，更多的投影信息可以上网站上查找。

>>> with open("{}.prj".format(filename), "w") as prj:
>>>     wkt = 'GEOGCS["WGS 84",'
>>>     wkt += 'DATUM["WGS_1984",'
>>>     wkt += 'SPHEROID["WGS 84",6378137,298.257223563]]'
>>>     wkt += ',PRIMEM["Greenwich",0],'
>>>     wkt += 'UNIT["degree",0.0174532925199433]]'
>>>     prj.write(wkt)

BNU-LQP

如何基于pyshp读取和写入shp文件

前言 Introduction

Pyshp库的安装

Pyshp库的基本使用