nh_wzg
Core Member

Profile:
Joined: Jul 2000
Posts: 3,740, Reputation: 3
Featured posts: 5, Solutions: 10
#5  2021-03-13, 10:27:43

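Quote:
Originally posted by aspirer
I can't read it closely on my phone.
But since Python 3.6 you can format strings directly, like this (the leading f can be uppercase or lowercase):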
str1=f"f4{i}=f4{i} …"
Thanks for the pointer.

After searching, I found this good reference on f-strings:

https://realpython.com/python-f-strings/
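A minimal illustration of the f-string syntax quoted above (Python 3.6+). The variable i and the values below are made up for the example; they are not the elided expression from the quote.

```python
i = 3          # hypothetical loop variable
name = "Qty"   # hypothetical value

lower = f"f4{i}=f4{i}"               # lowercase f prefix
upper = F"f4{i}=f4{i}"               # uppercase F prefix behaves identically
old_style = "f4{0}=f4{0}".format(i)  # the older str.format equivalent, for comparison

# Arbitrary expressions and format specs are allowed inside the braces
padded = f"{name:>5}|{i * 10:04d}"   # right-align to width 5, zero-pad to 4 digits
```

All three of lower, upper and old_style evaluate to the same string; padded shows that the format-spec mini-language works inside f-strings as well.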

=== Examples of extracting tables from Word and converting them to a DataFrame ===
[python] Reading data from a Word document and organizing it into an Excel sheet
https://zhuanlan.zhihu.com/p/149413211

Reading tables from a .docx file with Python to build triples
https://blog.csdn.net/weixin_4439852...ails/114107029

Reading a table at a specified position in a Word document, and its data, with Python
https://www.jb51.net/article/172631.htm

Parsing Word files with Python (Part 2): table
https://www.cnblogs.com/anpengapple/p/8372987.html

Reading table information from a Word file with Python
http://www.siyuanblog.com/?p=2109

Extracting information from a .docx document with Python (text + tables)
https://blog.csdn.net/weixin_4208138...ails/108235948

===
python-docx official documentation
https://python-docx.readthedocs.io/e...api/table.html

Code:
pip install python-docx
python-docx v0.8.10 raised an error on import under Python 3.8.5.

Having problems installing python-docx » the real cause was an incompatible lxml version (resolved with lxml 4.6.1)
https://stackoverflow.com/questions/...ng-python-docx

Installing lxml, libxml2, libxslt for Python 3.5 on Windows 10
https://stackoverflow.com/questions/...-on-windows-10

Unofficial Windows Binaries for Python Extension Packages » find the [lxml] section and download the .whl file that matches your local Python version and OS.
https://www.lfd.uci.edu/~gohlke/pyth...#libxml-python
For example, lxml-4.6.2-cp38-cp38-win32.whl
is the precompiled lxml package for Python 3.8.x on 32-bit Windows.
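The tags in a wheel filename follow the pattern name-version-pythontag-abitag-platformtag.whl (PEP 427), so cp38 means CPython 3.8 and win32 means 32-bit Windows. The sketch below, with hypothetical helper names, splits such a filename and does a rough comparison with the running interpreter. It is a simplification: pip's real compatibility check (via the packaging library) also handles abi3, manylinux and other tag variants.

```python
import sys
import sysconfig


def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version and tag parts."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # an optional build tag makes it 6 parts
        raise ValueError(f"unexpected wheel filename: {filename}")
    return {
        "name": parts[0],
        "version": parts[1],
        # A tag field may hold several dot-separated tags, e.g. "py2.py3"
        "python": parts[-3].split("."),
        "abi": parts[-2].split("."),
        "platform": parts[-1].split("."),
    }


def local_tags():
    """Rough CPython tag and platform tag of the running interpreter."""
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    # PEP 425: platform tag = get_platform() with '-' and '.' replaced by '_'
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


def roughly_matches(filename):
    """Simplified check: exact cpXY / pyX and platform match, or 'any'."""
    info = parse_wheel_filename(filename)
    python_tag, platform_tag = local_tags()
    py_ok = python_tag in info["python"] or f"py{sys.version_info.major}" in info["python"]
    plat_ok = "any" in info["platform"] or platform_tag in info["platform"]
    return py_ok and plat_ok
```

For instance, roughly_matches("lxml-4.6.2-cp38-cp38-win32.whl") is True only on 32-bit Windows running Python 3.8.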

https://github.com/lxml/lxml

Installing a .whl file with pip:
https://pip.pypa.io/en/latest/user_g...ng-from-wheels
Code:
[py -m ]pip install SomePackage-1.0-py2.py3-none-any.whl
pip install lxml-4.6.2-cp38-cp38-win32.whl
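The optional "py -m" part of the command above matters when several Pythons are installed: running pip as a module of a specific interpreter installs into that interpreter. A small sketch of the same idea from inside Python; the function names are hypothetical, and sys.executable is simply the interpreter currently running the script.

```python
import subprocess
import sys


def pip_install_command(wheel_path):
    """Build the argv for installing a wheel with this interpreter's pip.

    Equivalent to "python -m pip install <wheel>" (or "py -m pip ..." on
    Windows), which avoids installing into a different Python by accident.
    """
    return [sys.executable, "-m", "pip", "install", wheel_path]


def install_wheel(wheel_path):
    """Run pip; raises subprocess.CalledProcessError if the install fails."""
    subprocess.run(pip_install_command(wheel_path), check=True)
```

Usage: install_wheel("lxml-4.6.2-cp38-cp38-win32.whl"), run from the Python 3.8 interpreter that will later import docx.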
Then verify in Python:

import docx
import lxml

No error messages anymore.
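The same check can be scripted so that it also reports which versions are installed. A sketch using only the standard library (importlib.metadata needs Python 3.8+); the helper name check_packages is hypothetical. Note that the distribution name used by pip (python-docx) differs from the import name (docx).

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version


def check_packages(packages):
    """Try to import each module and look up its installed version.

    packages maps distribution name (as used by pip) -> import name.
    Returns a dict of distribution name -> version string or error message.
    """
    results = {}
    for dist_name, module_name in packages.items():
        try:
            import_module(module_name)  # fails the same way a plain import would
            results[dist_name] = version(dist_name)
        except (ImportError, PackageNotFoundError) as exc:
            results[dist_name] = f"FAILED: {exc!r}"
    return results
```

Usage: print(check_packages({"python-docx": "docx", "lxml": "lxml"})) should list both versions instead of an import error once the matching lxml wheel is installed.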
====

how to create a dataframe from a table in a word document (.docx) file using pandas
https://stackoverflow.com/questions/...file-using-pan

python -docx to extract table from word docx
https://stackoverflow.com/questions/...from-word-docx

python-docx: Parse a table to Panda Dataframe
https://stackoverflow.com/questions/...anda-dataframe

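The code below is the sample from the python-docx documentation home page. Put an image file named monty-truth2.png in the same directory as the script,
then run it to generate the sample file demo.docx.

Code: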
from docx import Document
from docx.shared import Inches

document = Document()

document.add_heading('Document Title', 0)

p = document.add_paragraph('A plain paragraph having some ')
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

document.add_heading('Heading, level 1', level=1)
document.add_paragraph('Intense quote', style='Intense Quote')

document.add_paragraph(
    'first item in unordered list', style='List Bullet'
)
document.add_paragraph(
    'first item in ordered list', style='List Number'
)

document.add_picture('monty-truth2.png', width=Inches(1.25))

records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam')
)

table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
for qty, id, desc in records:
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = id
    row_cells[2].text = desc

document.add_page_break()

document.save('demo.docx')
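The sample fails at add_picture if monty-truth2.png is missing. If no image is at hand, a plain placeholder PNG can be generated with the standard library alone. This is a hypothetical helper, not part of python-docx; it writes a single-colour 8-bit RGB image without any DPI (pHYs) information.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(chunk_type, data):
    """One PNG chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def make_placeholder_png(width=120, height=80, rgb=(200, 200, 200)):
    """Return the bytes of a solid-colour 8-bit RGB PNG."""
    # IHDR: width, height, bit depth 8, colour type 2 (RGB),
    # compression 0, filter method 0, no interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline starts with filter byte 0 (no filtering), then RGB triples
    scanline = b"\x00" + bytes(rgb) * width
    idat = zlib.compress(scanline * height)
    return PNG_SIGNATURE + _chunk(b"IHDR", ihdr) + _chunk(b"IDAT", idat) + _chunk(b"IEND", b"")


def write_placeholder_png(path, **kwargs):
    with open(path, "wb") as fh:
        fh.write(make_placeholder_png(**kwargs))
```

Usage: call write_placeholder_png("monty-truth2.png") once, then run the official sample unchanged.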
The code below extracts the table data from the generated demo.docx:

Code:
import pandas as pd
from docx import Document

a4 = "demo.docx"
document = Document(a4)

## A fairly complete recipe for reading a table out of a .docx file.
## Requires python-docx 0.8.10 and lxml 4.6.1/4.6.2.
## You need to know which entry of the document.tables list you want;
## len(document.tables) tells you how many tables there are.
## The result is a pandas DataFrame.

table = document.tables[0]  # extract table 0 (the first table)

# Collect the text of every cell, one list per row
tbllist = []
for row in table.rows:
    rowlist = [cell.text for cell in row.cells]
    tbllist.append(rowlist)

# Row 0 of the demo table is the header (Qty / Id / Desc), so use it
# as the column names instead of keeping it as a data row
f5 = pd.DataFrame(tbllist[1:], columns=tbllist[0])
print(f5)
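The commented-out doctbls lines in the original version of this code hint at collecting several tables at once. A sketch of that, assuming every table's first row is a header; the function names are hypothetical, and python-docx and pandas are imported inside docx_tables_to_frames so that the two pure helpers can be reused without them.

```python
def table_to_rows(table):
    """Return the whitespace-stripped cell texts of a table, one list per row.

    Works with any object exposing .rows -> .cells -> .text,
    such as a python-docx Table.
    """
    return [[cell.text.strip() for cell in row.cells] for row in table.rows]


def rows_to_frame_parts(rows, header=True):
    """Split row lists into (columns, data) for pd.DataFrame(data, columns=columns).

    With header=True the first row becomes the column names; otherwise
    columns is None and pandas numbers the columns 0, 1, 2, ...
    """
    if not rows:
        return None, []
    if header:
        return rows[0], rows[1:]
    return None, rows


def docx_tables_to_frames(path, header=True):
    """Read every table in a .docx file into a list of pandas DataFrames."""
    # Imported here so the helpers above can be used without these packages
    import pandas as pd
    from docx import Document

    frames = []
    for table in Document(path).tables:
        columns, data = rows_to_frame_parts(table_to_rows(table), header=header)
        frames.append(pd.DataFrame(data, columns=columns))
    return frames
```

Usage: frames = docx_tables_to_frames("demo.docx"), after which frames[0] holds the Qty/Id/Desc table; tables sharing the same columns can be stacked with pd.concat(frames, ignore_index=True). One caveat: in python-docx 0.8.x a merged cell is returned once for every grid position it covers, so merged cells show up as repeated text.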

Last edited by nh_wzg on 2021-05-25 15:01:21.


Calm, precise and concise: these should be the three basic requirements for every quality member!!!