We can serialize a nested structure with a hand-written Schema. Here is the example code:
from avro import schema
from avro import io
from avro import datafile
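# Build the Schema. The inner record is defined inline inside the outer record:
# a named type such as InnerRecord must be defined before it can be referenced
# by name, so parsing an outer schema that only mentions "InnerRecord" fails.
# Note: the legacy avro-python3 package spells this function schema.Parse.
outer_schema = schema.parse("""
{
    "namespace": "example.avro",
    "type": "record",
    "name": "OuterRecord",
    "fields": [
        {
            "name": "inner",
            "type": {
                "type": "record",
                "name": "InnerRecord",
                "fields": [
                    {"name": "value1", "type": "string"},
                    {"name": "value2", "type": "int"}
                ]
            }
        }
    ]
}
""")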
# Build the records as plain dicts; the inner dict nests inside the outer one
inner_record = {"value1": "hello", "value2": 4}
outer_record = {"inner": inner_record}
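# Serialize the data and write it to a file
# (codec="snappy" requires the python-snappy package; codec="null" writes uncompressed)
data_writer = io.DatumWriter(outer_schema)
data_file_writer = datafile.DataFileWriter(
    open("example.avro", "wb"),
    data_writer,
    outer_schema,
    codec="snappy"
)
data_file_writer.append(outer_record)
data_file_writer.close()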
# Read the file back and deserialize the data
data_file_reader = datafile.DataFileReader(
    open("example.avro", "rb"),
    io.DatumReader()
)
for record in data_file_reader:
    print(record)
    # Access the inner fields
    inner_value1 = record["inner"]["value1"]
    inner_value2 = record["inner"]["value2"]
    print(f"inner_value1: {inner_value1}")
    print(f"inner_value2: {inner_value2}")
data_file_reader.close()
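In the code above, we hand-write the Schema for the outer and inner records, serialize with DatumWriter, write the data to a file with DataFileWriter, and read and deserialize it with DataFileReader. The reader takes no Schema argument because an Avro data file stores the writer's Schema in its header.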
Note that each deserialized record is a nested dict, so the inner fields can be accessed with record["inner"]["value1"].
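If you would rather keep the inner and outer definitions separate in source, one option is to hold them as Python dicts and inline the inner definition just before parsing. The sketch below uses only the standard library `json` module; the helper name `build_nested_schema_json` is hypothetical and not part of the Avro library, and the resulting JSON string is meant to be handed to the same schema parser used in the example above.

```python
import json

# Inner record definition, kept as a plain dict so it can be reused.
INNER_SCHEMA = {
    "type": "record",
    "name": "InnerRecord",
    "namespace": "example.avro",
    "fields": [
        {"name": "value1", "type": "string"},
        {"name": "value2", "type": "int"},
    ],
}


def build_nested_schema_json(inner):
    """Hypothetical helper: return outer-schema JSON with `inner` defined inline."""
    outer = {
        "type": "record",
        "name": "OuterRecord",
        "namespace": "example.avro",
        # The field's type is the full inner definition, not just its name.
        "fields": [{"name": "inner", "type": inner}],
    }
    return json.dumps(outer, indent=2)


schema_json = build_nested_schema_json(INNER_SCHEMA)
print(schema_json)
```

Because the inner definition is embedded where it is first used, the parser sees InnerRecord defined before anything refers to it by name.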