by Michael Milvich
Here at Anvil we have increasingly run into embedded systems that use the nanopb project (Protocol Buffers for Embedded Systems). Nanopb is a Protocol Buffers implementation with a small code size, targeting memory-restricted systems. It includes options to specify maximum sizes and counts for strings, byte arrays, and repeated fields, which makes deserialization more deterministic and less likely to run into memory exhaustion and other potential security issues. If you are building an embedded system with Protocol Buffers, it is worth checking out.
As reverse engineers, we are always excited when we discover metadata. Metadata helps us by providing meaning to the instructions and data we encounter, and we want to take advantage of it whenever we can. With Protocol Buffers, the encoders and decoders rely on metadata. We can use this metadata to recover the message structures and, when they are used for network communications, the structure of an unknown or proprietary protocol!
When you first encounter this metadata, it can be quite opaque! It looks like a rather meaningless set of numbers.
This is where a Protobuf Decompiler becomes handy; it can parse the metadata and return the original Protobuf definition:
// Decompiled nanopb protobuf
syntax = "proto2";
import "nanopb.proto"; // include from the nanopb project
message Message_100008148 {
    required int32 field_1 = 1;
    required int64 field_2 = 2;
    required uint32 field_3 = 3;
    required uint64 field_4 = 4;
    required sint32 field_5 = 5;
    required sint64 field_6 = 6;
    required bool field_7 = 7;
    required fixed32 field_8 = 8;
....
Some information is lost in the translation as the original Protobuf compiler did not retain the names of the fields; only the necessary bits required to encode and decode messages are included in the metadata. So while we cannot recover the exact original Protobuf definition, we will get back a definition that can encode and decode the same messages.
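To illustrate why field names are not needed to encode or decode a message, here is a minimal Python sketch of how a single varint field is laid out on the wire under the standard Protocol Buffers encoding rules. The helper names (`encode_varint`, `encode_varint_field`, `decode_varint`) are hypothetical and are not part of nanopb or our decompiler; the sketch only handles non-negative values.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one varint-typed field: key first, then the value."""
    wire_type = 0  # wire type 0 is "varint"
    key = (field_number << 3) | wire_type
    return encode_varint(key) + encode_varint(value)


# field_1 (field number 1) set to 150
encoded = encode_varint_field(1, 150)
print(encoded.hex())  # 089601
```

Only the field number and the wire type appear in the encoded bytes. Whether the field was originally called `field_1` or something more descriptive makes no difference to the bytes on the wire, which is why a definition with placeholder names can still encode and decode the same messages.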
There are a number of excellent tools that can analyze the metadata created by Google’s protoc tool and recover the original .proto definition files from compiled binaries.
Not surprisingly, the tools targeting metadata created by protoc are unable to understand nanopb’s metadata. So we wrote our own tool to decompile and extract .proto definition files from compiled nanopb binaries.
Our nanopb-decompiler is an IDA Python script that can recreate .proto files from binaries compiled with the 0.3.x and 0.4.x versions of nanopb.
Features
- Can extract .proto definitions from 0.3.x and 0.4.x versions.
- Recovers nanopb’s options, such as max_size and max_count.
- Recovers default values.
Caveats
- enum - Enumerations are not retained and most likely will be decompiled as basic 32-bit unsigned integers.
- packed - Metadata does not retain the packed option. Repeated fields can still be encoded/decoded, but without the packed optimizations.
- float/fixed32/sfixed32 | double/fixed64/sfixed64 - The nanopb decoder treats these types as 32 bits | 64 bits of raw data. There is not enough metadata to determine whether the data is a signed int, an unsigned int, or a float/double. Our nanopb-decompiler reports these fields as fixed32 | fixed64, and defaults will be shown as unsigned ints.
- pre 0.3.0 - Unlikely to work on versions before 0.3.0.
- alpha quality - We have only tested a limited number of protobufs, especially with all the nanopb options. If you find something that doesn't work, file a bug, or better yet send a pull request. 🙂
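The float/fixed32/sfixed32 caveat above can be seen with a small Python sketch. It assumes the four raw bytes are little-endian, as fixed-width values are on the Protocol Buffers wire; `interpret_fixed32` is a hypothetical helper for illustration, not a function from our decompiler.

```python
import struct


def interpret_fixed32(raw: bytes) -> dict:
    """Show the three readings of the same 4 raw bytes (little-endian assumed)."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    return {
        "fixed32": struct.unpack("<I", raw)[0],   # unsigned 32-bit integer
        "sfixed32": struct.unpack("<i", raw)[0],  # signed 32-bit integer
        "float": struct.unpack("<f", raw)[0],     # IEEE 754 single precision
    }


# The bytes of the float 1.0 are also a perfectly valid integer.
raw = struct.pack("<f", 1.0)
print(raw.hex())               # 0000803f
print(interpret_fixed32(raw))
```

Nothing in the bytes themselves says which reading was intended, so a default that the original author wrote as the float `1.0` would show up in the decompiled output as the unsigned integer 1065353216.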
About the Author
Michael Milvich is a Fellow at Anvil Secure. Prior to joining Anvil, Michael worked as a Senior Principal Consultant at IOActive Inc, and as a Cyber Security Researcher at Idaho National Laboratory (INL). Michael got his start in embedded security hacking SCADA and ICS systems and later broadened his focus to encompass a wide variety of embedded systems across many industries. Michael’s strong technical background, combined with his years of general consulting, has been utilized to assist some of the leading technology companies and most advanced security groups in improving their security posture.