AWS Textract (OCR) does not detect some cells


Founder

2024-09-27 15:31:24


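You can detect tables with the analyze_document operation of the Textract client in boto3 (the AWS SDK for Python) by passing FeatureTypes=['TABLES']. boto3 has no detect_tables or get_tabular_data function; the cell data has to be rebuilt from the TABLE, CELL, and WORD blocks in the response, which are linked through CHILD relationships.

Below is an example in Python using boto3: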

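import boto3

# Set up AWS credentials and create the Textract client
session = boto3.Session(profile_name='default', region_name='us-east-1')
textract = session.client('textract')

# Analyze the image stored in S3, asking Textract for table structure
response = textract.analyze_document(
    Document={
        'S3Object': {
            'Bucket': 'my-bucket',
            'Name': 'my-image.jpg'
        }
    },
    FeatureTypes=['TABLES']
)

# Relationship Ids are block Ids, not list positions, so index the blocks by Id
blocks_by_id = {block['Id']: block for block in response['Blocks']}


def child_ids(block):
    """Return the Ids listed in a block's CHILD relationships (the key may be absent)."""
    return [child_id
            for relationship in block.get('Relationships', [])
            if relationship['Type'] == 'CHILD'
            for child_id in relationship['Ids']]


def cell_text(cell):
    """CELL blocks carry no 'Text' field; join the text of their WORD children."""
    children = [blocks_by_id[child_id] for child_id in child_ids(cell)]
    return ' '.join(child['Text'] for child in children if child['BlockType'] == 'WORD')


# Extract table data
for table in response['Blocks']:
    if table['BlockType'] == 'TABLE':
        rows = {}
        for child_id in child_ids(table):
            cell = blocks_by_id[child_id]
            if cell['BlockType'] == 'CELL':
                rows.setdefault(cell['RowIndex'], {})[cell['ColumnIndex']] = cell_text(cell)

        # Print the table content; RowIndex and ColumnIndex are 1-based
        column_count = max((max(row) for row in rows.values()), default=0)
        for row_index in sorted(rows):
            row = rows[row_index]
            print([row.get(col_index, '') for col_index in range(1, column_count + 1)])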

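This example calls the analyze_document operation through boto3 to detect tables. The call returns a response whose Blocks list contains every detected block, including the tables. The code then walks that list to find the TABLE blocks and extracts the data of each table cell. Finally, it prints every row of the table.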

Note that Amazon Textract does not guarantee that it will detect every type of table or every cell. Therefore, if your table is not …
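One way to see which cells went undetected is to look for grid positions that no CELL block covers. The sketch below assumes a missed cell shows up as such a gap, which may not hold for every document; `missing_cells` and `sample_with_gap` are hypothetical names, and the sample data is hand-written rather than real Textract output. RowSpan and ColumnSpan are read with a default of 1 so that cells spanning several positions are not reported as gaps.

```python
def missing_cells(blocks):
    """Map each TABLE block Id to the grid positions that no CELL block covers."""
    by_id = {b["Id"]: b for b in blocks}
    report = {}
    for table in blocks:
        if table["BlockType"] != "TABLE":
            continue
        covered = set()
        for rel in table.get("Relationships", []):
            if rel["Type"] != "CHILD":
                continue
            for child_id in rel["Ids"]:
                cell = by_id.get(child_id)
                if cell is None or cell["BlockType"] != "CELL":
                    continue
                row, col = cell["RowIndex"], cell["ColumnIndex"]
                # Mark every position the cell spans as covered.
                for r in range(row, row + cell.get("RowSpan", 1)):
                    for c in range(col, col + cell.get("ColumnSpan", 1)):
                        covered.add((r, c))
        n_rows = max((r for r, _ in covered), default=0)
        n_cols = max((c for _, c in covered), default=0)
        report[table["Id"]] = sorted(
            (r, c)
            for r in range(1, n_rows + 1)
            for c in range(1, n_cols + 1)
            if (r, c) not in covered
        )
    return report


# Hand-written 2x2 table in which position (row 2, column 1) has no CELL block
sample_with_gap = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "RowSpan": 1, "ColumnSpan": 1},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "RowSpan": 1, "ColumnSpan": 1},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
     "RowSpan": 1, "ColumnSpan": 1},
]

print(missing_cells(sample_with_gap))
# → {'t1': [(2, 1)]}
```

Positions reported this way can then be checked against the source image by hand.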
