1. First, install the Textract package of the AWS SDK for .NET. In Visual Studio, use the NuGet Package Manager to install the "AWSSDK.Textract" package.
2. In your C# code, import the required namespaces:
using System;
using System.Collections.Generic;
using Amazon.Textract;
using Amazon.Textract.Model;
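3. Create an AmazonTextractClient object to access AWS Textract. The parameterless constructor takes credentials and region from your default AWS configuration:
AmazonTextractClient client = new AmazonTextractClient();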
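4. Build an AnalyzeDocumentRequest that names the document and the kind of information to extract. Table cells are returned only when the TABLES feature is requested; DetectDocumentText returns only PAGE, LINE, and WORD blocks, so it cannot be used for this step:
AnalyzeDocumentRequest request = new AnalyzeDocumentRequest { Document = new Document { S3Object = new S3Object { Bucket = "my-bucket", Name = "sample-document.pdf" } }, FeatureTypes = new List<string> { "TABLES" } };
5. Call Textract's AnalyzeDocumentAsync method from inside an async method and get the response. This call handles single-page documents; a multi-page PDF needs the asynchronous StartDocumentAnalysis operation instead:
AnalyzeDocumentResponse response = await client.AnalyzeDocumentAsync(request);
6. Extract the information you need from the response. The following example prints the value of every table cell in the document:
foreach (Block block in response.Blocks)
{
    // Table cells have the block type CELL (there is no TABLE_CELL type).
    if (block.BlockType != BlockType.CELL) continue;
    string cellText = "";
    // An empty cell may have no relationships at all.
    foreach (Relationship relationship in block.Relationships ?? new List<Relationship>())
    {
        if (relationship.Type != RelationshipType.CHILD) continue;
        foreach (string id in relationship.Ids)
        {
            Block child = response.Blocks.Find(b => b.Id == id);
            if (child == null) continue;
            if (child.BlockType == BlockType.WORD)
                cellText += child.Text + " ";
            else if (child.BlockType == BlockType.SELECTION_ELEMENT)
                cellText += child.SelectionStatus + " "; // checkboxes carry a status, not text
        }
    }
    Console.WriteLine(cellText.Trim());
}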
With the steps above, you can use the AWS SDK for .NET in C# to read specific information from a document with AWS Textract.
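
The steps above can be combined into one program. The following is a minimal sketch, not a definitive implementation. It assumes the AnalyzeDocument operation with the TABLES feature, because table cells (CELL blocks) are not part of a plain text-detection response. The class name TableCellReader, the helper GetCellText, and the us-east-1 region are illustrative assumptions; the bucket and file names are the placeholders used earlier. It indexes the blocks by Id once instead of searching the list for every child.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Amazon;
using Amazon.Textract;
using Amazon.Textract.Model;

public static class TableCellReader
{
    // Hypothetical helper: joins the text of a cell's CHILD blocks.
    // WORD blocks contribute their text; SELECTION_ELEMENT blocks
    // (checkboxes) contribute their selection status.
    public static string GetCellText(Block cell, IReadOnlyDictionary<string, Block> blocksById)
    {
        var parts = new List<string>();
        // Relationships can be missing on an empty cell.
        foreach (Relationship rel in cell.Relationships ?? new List<Relationship>())
        {
            if (rel.Type != RelationshipType.CHILD) continue;
            foreach (string id in rel.Ids ?? new List<string>())
            {
                if (!blocksById.TryGetValue(id, out Block child)) continue;
                if (child.BlockType == BlockType.WORD)
                    parts.Add(child.Text);
                else if (child.BlockType == BlockType.SELECTION_ELEMENT)
                    parts.Add(child.SelectionStatus?.Value ?? "");
            }
        }
        return string.Join(" ", parts);
    }

    public static async Task Main()
    {
        // Assumption: us-east-1; use the region that holds your bucket.
        var client = new AmazonTextractClient(RegionEndpoint.USEast1);

        var request = new AnalyzeDocumentRequest
        {
            Document = new Document
            {
                S3Object = new S3Object { Bucket = "my-bucket", Name = "sample-document.pdf" }
            },
            // TABLES asks Textract to return TABLE and CELL blocks.
            FeatureTypes = new List<string> { "TABLES" }
        };

        AnalyzeDocumentResponse response = await client.AnalyzeDocumentAsync(request);

        // Index blocks by Id once so each child lookup is a dictionary hit.
        Dictionary<string, Block> blocksById = response.Blocks.ToDictionary(b => b.Id);

        foreach (Block cell in response.Blocks.Where(b => b.BlockType == BlockType.CELL))
        {
            Console.WriteLine($"[{cell.RowIndex},{cell.ColumnIndex}] {GetCellText(cell, blocksById)}");
        }
    }
}
```

RowIndex and ColumnIndex are relative to the table that contains the cell, so a document with several tables will print repeated index pairs unless the cells are first grouped under their parent TABLE block.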