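TFRecord is the data storage format officially recommended for TensorFlow models. Data stored in this format is serialized as binary protocol buffers, so it takes up relatively little space.
This post uses the Java API to walk through the basic data structures in TFRecord: BytesList, FloatList, Int64List, Feature, Features, Example, FeatureList, FeatureLists, and SequenceExample.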
Proto definitions of the key data structures
    message BytesList {
      repeated bytes value = 1;
    }
    message FloatList {
      repeated float value = 1 [packed = true];
    }
    message Int64List {
      repeated int64 value = 1 [packed = true];
    }
    message Feature {
      oneof kind {
        BytesList bytes_list = 1;
        FloatList float_list = 2;
        Int64List int64_list = 3;
      }
    };
    message Features {
      map<string, Feature> feature = 1;
    };
    message Example {
      Features features = 1;
    };

    message FeatureList {
      repeated Feature feature = 1;
    };
    message FeatureLists {
      map<string, FeatureList> feature_list = 1;
    };
    message SequenceExample {
      Features context = 1;
      FeatureLists feature_lists = 2;
    };
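The `[packed = true]` option on the numeric lists is part of why the format stays compact: a packed repeated field is written as one tag, one length, and then the values back to back. The following is a minimal sketch of that encoding for an Int64List, written in plain Java with no protobuf dependency. The class and method names (`PackedInt64Sketch`, `encodeVarint`, `encodePackedInt64Field1`) are hypothetical and only illustrate the wire layout; real code would simply call the generated `Int64List` builder.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper class: illustrates the protobuf wire layout only.
class PackedInt64Sketch {

    // Encode one value as a base-128 varint: 7 payload bits per byte,
    // high bit set on every byte except the last.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Encode values as a packed repeated int64 in field number 1,
    // which is how "repeated int64 value = 1 [packed = true]" is laid out.
    static byte[] encodePackedInt64Field1(long[] values) {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        for (long v : values) {
            byte[] encoded = encodeVarint(v);
            payload.write(encoded, 0, encoded.length);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x0A); // tag: (field 1 << 3) | wire type 2 (length-delimited)
        byte[] length = encodeVarint(payload.size());
        out.write(length, 0, length.length);
        byte[] body = payload.toByteArray();
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodePackedInt64Field1(new long[] {1L, 2L, 3L, 4L});
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // prints "0a 04 01 02 03 04"
    }
}
```

Four small values therefore occupy six bytes in total. Note that negative int64 values always take ten bytes as varints, so the saving applies mainly to small non-negative numbers.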
Types to import in the Java code
    import com.google.protobuf.ByteString;
    import org.tensorflow.example.BytesList;
    import org.tensorflow.example.FloatList;
    import org.tensorflow.example.Int64List;
    import org.tensorflow.example.Feature;
    import org.tensorflow.example.Features;
    import org.tensorflow.example.FeatureList;
    import org.tensorflow.example.FeatureLists;
    import org.tensorflow.example.Example;
    import org.tensorflow.example.SequenceExample;
Dependency configuration
    <dependency>
        <groupId>org.tensorflow</groupId>
        <artifactId>proto</artifactId>
        <version>${tensorflow.version}</version>
    </dependency>
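BytesList, FloatList, Int64List
These three types are the basic data structures of TFRecord. Each one wraps a list of a different element type, and the operations it provides all act on that wrapped list.
BytesList wraps a List<ByteString>.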
    BytesList.Builder bytesListBuilder = BytesList.newBuilder();
    BytesList bytesList = bytesListBuilder
            .addValue(ByteString.copyFromUtf8("A"))
            .addValue(ByteString.copyFromUtf8("B"))
            // setValue overwrites the element at index 0, so "A" becomes "C"
            .setValue(0, ByteString.copyFromUtf8("C"))
            .addValue(ByteString.copyFromUtf8("D"))
            .build();
    // The list now holds "C", "B", "D"
    System.out.println(bytesList);
FloatList wraps a List<Float>.
    FloatList.Builder floatListBuilder = FloatList.newBuilder();
    FloatList floatList = floatListBuilder
            .addValue(1F).addValue(2F).addValue(3F)
            // mergeFrom appends the other message's repeated values (4, 5)
            .mergeFrom(FloatList.newBuilder().addValue(4F).addValue(5F).build())
            .build();
    System.out.println(floatList);
Int64List wraps a List<Long>.
    // Requires: import java.util.Arrays;
    Int64List.Builder int64ListBuilder = Int64List.newBuilder();
    Int64List int64List = int64ListBuilder
            // addAllValue appends every element of the given collection
            .addAllValue(Arrays.asList(1L, 2L, 3L, 4L))
            .build();
    System.out.println(int64List);
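Feature, FeatureList
A Feature wraps exactly one of the three types BytesList, Int64List, or FloatList. Wrapping them in a Feature hides the type differences between feature columns.
A FeatureList wraps a List<Feature>. The lists wrapped by the individual Features (BytesList, Int64List, FloatList) may have different lengths.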
    Feature.Builder featureBuilder = Feature.newBuilder();
    // kind is a oneof field, so each setter replaces the previous value;
    // only the FloatList set last survives in the built Feature.
    Feature feature = featureBuilder
            .setBytesList(bytesList)
            .setInt64List(int64List)
            .setFloatList(floatList)
            .build();
    System.out.println(feature);
    FeatureList.Builder featureListBuilder = FeatureList.newBuilder();
    FeatureList featureList = featureListBuilder
            .addFeature(feature)
            .addFeature(Feature.newBuilder().setInt64List(int64List).build())
            .build();
    System.out.println(featureList);
Features, FeatureLists
Features wraps a Map<String, Feature>.
FeatureLists wraps a Map<String, FeatureList>.
Each key in the map is the name of a feature.
    Feature.Builder bytesFeatureBuilder = Feature.newBuilder();
    Feature bytesFeature = bytesFeatureBuilder.setBytesList(bytesList).build();
    Feature.Builder floatFeatureBuilder = Feature.newBuilder();
    Feature floatFeature = floatFeatureBuilder.setFloatList(floatList).build();
    Features.Builder featuresBuilder = Features.newBuilder();
    Features features = featuresBuilder
            .putFeature("bytesFeatureKey", bytesFeature)
            .putFeature("floatFeatureKey", floatFeature)
            .putFeature("emptyFeatureKey", Feature.newBuilder().build())
            .build();
    System.out.println(features);
    // Requires: import java.util.HashMap; import java.util.Map;
    // A plain HashMap is used instead of double-brace initialization,
    // which would create an extra anonymous subclass.
    Map<String, FeatureList> featureListMap = new HashMap<>();
    featureListMap.put("featureListKey", featureList);
    featureListMap.put("emptyFeatureListKey", FeatureList.newBuilder().build());
    FeatureLists.Builder featureListsBuilder = FeatureLists.newBuilder();
    FeatureLists featureLists = featureListsBuilder
            .putAllFeatureList(featureListMap)
            .build();
    System.out.println(featureLists);
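Example, SequenceExample
Example and SequenceExample are the two message types that are ultimately serialized into a TFRecord.
An Example wraps a Features.
A SequenceExample wraps a Features (the context) together with a FeatureLists.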
    Example.Builder exampleBuilder = Example.newBuilder();
    Example example = exampleBuilder
            .setFeatures(features)
            .build();
    System.out.println(example);
    SequenceExample.Builder sequenceExampleBuilder = SequenceExample.newBuilder();
    SequenceExample sequenceExample = sequenceExampleBuilder
            .setContext(features)
            .setFeatureLists(featureLists)
            .build();
    System.out.println(sequenceExample);
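Since Example and SequenceExample are what finally gets serialized, it may help to see how one serialized message could be framed as a record in a TFRecord file. The sketch below uses only the JDK (`java.util.zip.CRC32C`, available from Java 9) and assumes the usual record layout: an 8-byte little-endian length, a masked CRC32C of that length, the payload, and a masked CRC32C of the payload. The names `TfRecordFraming`, `maskedCrc32c`, and `frame` are hypothetical, and the sample payload merely stands in for the bytes that `example.toByteArray()` would return.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32C;

// Hypothetical helper class: a sketch of TFRecord framing, not a full writer.
class TfRecordFraming {
    private static final int MASK_DELTA = 0xa282ead8;

    // CRC32C of the data, rotated right by 15 bits and offset by a constant.
    static int maskedCrc32c(byte[] data) {
        CRC32C crc32c = new CRC32C();
        crc32c.update(data, 0, data.length);
        int crc = (int) crc32c.getValue();
        return ((crc >>> 15) | (crc << 17)) + MASK_DELTA;
    }

    // Wrap one serialized message in the length/checksum framing.
    static byte[] frame(byte[] payload) {
        byte[] length = ByteBuffer.allocate(8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(payload.length)
                .array();
        ByteBuffer record = ByteBuffer.allocate(8 + 4 + payload.length + 4)
                .order(ByteOrder.LITTLE_ENDIAN);
        record.put(length);                    // uint64 length
        record.putInt(maskedCrc32c(length));   // uint32 masked CRC of length
        record.put(payload);                   // serialized message bytes
        record.putInt(maskedCrc32c(payload));  // uint32 masked CRC of payload
        return record.array();
    }

    public static void main(String[] args) {
        // Placeholder bytes; in practice this would be example.toByteArray().
        byte[] payload = {10, 20, 30};
        byte[] record = frame(payload);
        System.out.println(record.length); // prints 19 (16 bytes of framing + 3)
    }
}
```

Writing the framed bytes of each message one after another to a file would then yield a file in this layout. In practice an existing writer from a TensorFlow-related library is the safer choice over hand-rolled framing.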