Descriptor
Package for parsing and processing descriptor data.
Module Overview:
parse_file - Parses the descriptors in a file.
create - Creates a new custom descriptor.
create_signing_key - Creates a signing key that can be used for creating descriptors.
Descriptor - Common parent for all descriptor file types.
|- get_path - location of the descriptor on disk if it came from a file
|- get_archive_path - location of the descriptor within the archive it came from
|- get_bytes - similar to str(), but provides our original bytes content
|- get_unrecognized_lines - unparsed descriptor content
+- __str__ - string that the descriptor was made from
stem.descriptor.__init__.DocumentHandler (enum)
Ways in which we can parse a NetworkStatusDocument.
Both ENTRIES and BARE_DOCUMENT have a ‘thin’ document, which doesn’t have a populated routers attribute. This allows for lower memory usage and a shorter upfront runtime. However, if read time and memory aren’t a concern then DOCUMENT can provide you with a fully populated document.
Handlers don’t change the fact that most methods that provide descriptors return an iterator. In the case of DOCUMENT and BARE_DOCUMENT that iterator would have just a single item: the document itself.
A simple way to handle this is to call next() to get the iterator’s one and only value…
import stem.descriptor.remote
from stem.descriptor import DocumentHandler

# iter() makes sure next() receives an iterator, whatever iterable get_consensus() returns
consensus = next(iter(stem.descriptor.remote.get_consensus(
  document_handler = DocumentHandler.BARE_DOCUMENT,
)))
DocumentHandler | Description
ENTRIES | Iterates over the contained RouterStatusEntry. Each has a reference to the bare document it came from (through its document attribute).
DOCUMENT | NetworkStatusDocument with the RouterStatusEntry it contains (through its routers attribute).
BARE_DOCUMENT | NetworkStatusDocument without a reference to its contents (the RouterStatusEntry are unread).
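The three handler modes can be pictured with a toy model in plain Python. This is not stem's code: the `Handler`, `Doc` and `Entry` classes and the `read_status()` function are hypothetical stand-ins for DocumentHandler, NetworkStatusDocument, RouterStatusEntry and the parsing functions, and only mimic the return shapes listed in the table above.

```python
import enum
from typing import Iterator, List, Union


class Handler(enum.Enum):
    # Hypothetical mirror of DocumentHandler's three values.
    ENTRIES = 'ENTRIES'
    DOCUMENT = 'DOCUMENT'
    BARE_DOCUMENT = 'BARE_DOCUMENT'


class Doc:
    """Toy stand-in for NetworkStatusDocument."""

    def __init__(self) -> None:
        self.routers: List['Entry'] = []  # stays empty in a 'thin' document


class Entry:
    """Toy stand-in for RouterStatusEntry."""

    def __init__(self, nickname: str, document: Doc) -> None:
        self.nickname = nickname
        self.document = document  # reference back to the document it came from


def read_status(nicknames: List[str], handler: Handler) -> Iterator[Union[Doc, Entry]]:
    doc = Doc()
    if handler is Handler.BARE_DOCUMENT:
        yield doc  # entries are never read
    elif handler is Handler.DOCUMENT:
        doc.routers = [Entry(name, doc) for name in nicknames]
        yield doc  # one fully populated document
    else:
        for name in nicknames:
            yield Entry(name, doc)  # one item per entry, the document stays thin


entries = list(read_status(['alpha', 'beta'], Handler.ENTRIES))
print(len(entries), entries[0].document.routers)  # 2 []

document = next(read_status(['alpha', 'beta'], Handler.DOCUMENT))
print([entry.nickname for entry in document.routers])  # ['alpha', 'beta']
```

In both document modes the iterator yields exactly one item, which is why next() is enough to retrieve it.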
stem.descriptor.__init__.parse_file(descriptor_file, descriptor_type=None, validate=False, document_handler='ENTRIES', normalize_newlines=None, **kwargs)
Simple function to read the descriptor contents from a file, providing an iterator for its Descriptor contents.
If you don’t provide a descriptor_type argument then this automatically tries to determine the descriptor type based on the following…
The @type annotation on the first line. These are generally only found in the CollecTor archives.
The filename if it matches something from tor’s data directory. For instance, tor’s ‘cached-descriptors’ contains server descriptors.
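The two detection rules above can be sketched as a small function. `guess_descriptor_type()` and its filename mapping are hypothetical, not stem's API; only the 'cached-descriptors' entry comes from the text, and a real mapping would need to cover more of tor's data directory files.

```python
import os
from typing import Optional

# Illustrative subset of tor data directory filenames. Only 'cached-descriptors'
# is taken from the text; other entries would have to be added.
FILENAME_TYPES = {
    'cached-descriptors': 'server-descriptor 1.0',
}


def guess_descriptor_type(first_line: bytes, path: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper, not part of stem's API."""
    line = first_line.decode('utf-8', 'replace').strip()

    # Rule 1: an '@type <name> <version>' annotation, as found in CollecTor archives.
    if line.startswith('@type '):
        return line[len('@type '):].strip()

    # Rule 2: fall back to the filename, if we know where the content came from.
    if path is not None:
        return FILENAME_TYPES.get(os.path.basename(path))

    return None


print(guess_descriptor_type(b'@type server-descriptor 1.0\n'))  # server-descriptor 1.0
print(guess_descriptor_type(b'router example 127.0.0.1 9001 0 0\n',
                            '/var/lib/tor/cached-descriptors'))  # server-descriptor 1.0
```

Checking the annotation first matches the order of the list: an explicit @type line is more reliable than a filename.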
This is a handy function for simple usage, but if you’re reading multiple descriptor files you might want to consider the DescriptorReader.
Descriptor types include the following, including further minor versions (i.e. if we support 1.1 then we also support everything from 1.0 and most things from 1.2, but not 2.0)…
Descriptor Type | Class
server-descriptor 1.0 |
extra-info 1.0 |
microdescriptor 1.0 |
directory 1.0 | unsupported
network-status-2 1.0 |
dir-key-certificate-3 1.0 |
network-status-consensus-3 1.0 |
network-status-vote-3 1.0 |
network-status-microdesc-consensus-3 1.0 |
bridge-network-status 1.0 |
bridge-server-descriptor 1.0 |
bridge-extra-info 1.1 or 1.2 |
torperf 1.0 | unsupported
bridge-pool-assignment 1.0 | unsupported
tordnsel 1.0 |
hidden-service-descriptor 1.0 |
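The minor-version rule stated just before the table amounts to comparing major and minor numbers separately. The sketch below is an assumption about how that rule could be expressed, using a hypothetical `version_support()` helper; it is not taken from stem.

```python
from typing import Tuple


def _parse_version(version: str) -> Tuple[int, int]:
    major, minor = version.split('.')
    return int(major), int(minor)


def version_support(supported: str, found: str) -> str:
    """Hypothetical helper: how well a parser for `supported` handles `found`."""
    s_major, s_minor = _parse_version(supported)
    f_major, f_minor = _parse_version(found)

    if f_major != s_major:
        return 'unsupported'  # a new major version may change the format entirely
    if f_minor <= s_minor:
        return 'full'  # older or equal minor versions are fully understood
    return 'partial'  # newer minor versions: most, but perhaps not all, content


print(version_support('1.1', '1.0'))  # full
print(version_support('1.1', '1.2'))  # partial
print(version_support('1.1', '2.0'))  # unsupported
```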
If you’re using Python 3 then beware that the open() function defaults to text mode. Binary mode is strongly suggested because it’s both faster (by about 33x in my testing) and doesn’t do universal newline translation, which can make us misparse the document.
my_descriptor_file = open(descriptor_path, 'rb')
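The newline issue is easy to observe with the standard library alone. This sketch writes CRLF-terminated content to a temporary file and reads it back both ways; the sample descriptor lines are made up.

```python
import os
import tempfile

raw = b'router example 127.0.0.1 9001 0 0\r\nplatform Tor\r\n'

# Write CRLF-terminated sample content to a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    with open(path, 'rb') as f:  # binary mode: bytes, line endings untouched
        binary_content = f.read()

    with open(path, encoding='ascii') as f:  # text mode: '\r\n' is translated to '\n'
        text_content = f.read()
finally:
    os.remove(path)

print(binary_content == raw)  # True
print('\r' in text_content)   # False
```

Binary mode hands the parser exactly the bytes on disk, so any newline handling stays an explicit choice rather than a side effect of open().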
- Parameters
descriptor_file (str,file,tarfile) – path or opened file with the descriptor contents
descriptor_type (str) – descriptor type, this is guessed if not provided
validate (bool) – checks the validity of the descriptor’s content if True, skips these checks otherwise
document_handler (stem.descriptor.__init__.DocumentHandler) – method in which to parse the NetworkStatusDocument
normalize_newlines (bool) – converts Windows newlines (CRLF) to plain LF newlines; this is the default when reading data directories on Windows
kwargs (dict) – additional arguments for the descriptor constructor
- Returns
iterator for Descriptor instances in the file
- Raises
ValueError if the contents are malformed and validate is True
TypeError if we can’t match the contents of the file to a descriptor type
IOError if unable to read from the descriptor_file
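What normalize_newlines guards against can be shown with a hypothetical `split_lines()` helper (not part of stem), assuming that CRLF endings are rewritten to plain LF before the content is split into lines:

```python
from typing import List


def split_lines(content: bytes, normalize_newlines: bool = False) -> List[bytes]:
    """Hypothetical helper, not part of stem: split descriptor content into lines."""
    if normalize_newlines:
        content = content.replace(b'\r\n', b'\n')  # Windows CRLF -> LF
    return content.split(b'\n')


raw = b'router example 127.0.0.1 9001 0 0\r\nplatform Tor\r\n'

print(split_lines(raw)[1])  # b'platform Tor\r' (a stray carriage return)
print(split_lines(raw, normalize_newlines=True)[1])  # b'platform Tor'
```

Without normalization every line keeps a trailing carriage return, which a keyword/value parser would treat as part of the value.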
class stem.descriptor.__init__.Descriptor(contents, lazy_load=False)
Bases: object
Common parent for all types of descriptors.
classmethod content(attr=None, exclude=(), sign=False)
Creates descriptor content with the given attributes. Mandatory fields are filled with dummy information unless data is supplied. This doesn’t yet create a valid signature.
New in version 1.6.0.
- Parameters
attr (dict) – keyword/value mappings to be included in the descriptor
exclude (list) – mandatory keywords to exclude from the descriptor; this results in an invalid descriptor
sign (bool) – includes cryptographic signatures and digests if True
- Returns
str with the content of a descriptor
- Raises
ImportError if cryptography is unavailable and sign is True
NotImplementedError if not implemented for this descriptor type
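The attr and exclude semantics of content() can be sketched with an imaginary descriptor type. `MANDATORY_DUMMIES` and this standalone `content()` function are hypothetical; stem's real mandatory fields and dummy values differ per descriptor type, and signing is left out entirely.

```python
from typing import Dict, Iterable, Optional

# Hypothetical mandatory keywords and dummy values for an imaginary descriptor type.
MANDATORY_DUMMIES = {
    'router': 'Unnamed 127.0.0.1 9001 0 0',
    'published': '2000-01-01 00:00:00',
}


def content(attr: Optional[Dict[str, str]] = None, exclude: Iterable[str] = ()) -> str:
    """Sketch of content(): dummy values for mandatory fields unless overridden."""
    excluded = set(exclude)
    fields = dict(MANDATORY_DUMMIES)  # start from the dummy data
    fields.update(attr or {})         # caller-supplied values take precedence
    return '\n'.join(
        '%s %s' % (keyword, value)
        for keyword, value in fields.items()
        if keyword not in excluded    # excluding a mandatory field breaks validity
    )


print(content({'platform': 'Tor 0.4.8'}, exclude=['published']))
# router Unnamed 127.0.0.1 9001 0 0
# platform Tor 0.4.8
```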
classmethod create(attr=None, exclude=(), validate=True, sign=False)
Creates a descriptor with the given attributes. Mandatory fields are filled with dummy information unless data is supplied. This doesn’t yet create a valid signature.
New in version 1.6.0.
- Parameters
attr (dict) – keyword/value mappings to be included in the descriptor
exclude (list) – mandatory keywords to exclude from the descriptor; this results in an invalid descriptor
validate (bool) – checks the validity of the descriptor’s content if True, skips these checks otherwise
sign (bool) – includes cryptographic signatures and digests if True
- Returns
Descriptor subclass
- Raises
ValueError if the contents are malformed and validate is True
ImportError if cryptography is unavailable and sign is True
NotImplementedError if not implemented for this descriptor type
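Going by the signatures above, create() adds two things to content(): it returns a descriptor object rather than a str, and it can validate the result. A hypothetical `ToyDescriptor` class (not a stem class, with made-up mandatory fields) shows how the validate flag interacts with exclude:

```python
from typing import Dict, Iterable, Optional

# Hypothetical mandatory fields with dummy values.
DUMMIES = {'router': 'Unnamed 127.0.0.1 9001 0 0', 'published': '2000-01-01 00:00:00'}


class ToyDescriptor:
    """Hypothetical stand-in for a Descriptor subclass, not part of stem."""

    def __init__(self, contents: str, validate: bool = False) -> None:
        # Each line is '<keyword> <value>'.
        self.fields = dict(line.split(' ', 1) for line in contents.splitlines())

        if validate:
            missing = [keyword for keyword in DUMMIES if keyword not in self.fields]
            if missing:
                raise ValueError('missing mandatory fields: %s' % ', '.join(missing))

    @classmethod
    def content(cls, attr: Optional[Dict[str, str]] = None, exclude: Iterable[str] = ()) -> str:
        fields = {**DUMMIES, **(attr or {})}
        return '\n'.join('%s %s' % (k, v) for k, v in fields.items() if k not in exclude)

    @classmethod
    def create(cls, attr: Optional[Dict[str, str]] = None, exclude: Iterable[str] = (),
               validate: bool = True) -> 'ToyDescriptor':
        # Build the text with content(), then parse it into an object.
        return cls(cls.content(attr, exclude), validate=validate)


desc = ToyDescriptor.create({'platform': 'Tor'})
print(desc.fields['platform'])  # Tor

try:
    ToyDescriptor.create(exclude=['published'])
except ValueError as exc:
    print(exc)  # missing mandatory fields: published

lenient = ToyDescriptor.create(exclude=['published'], validate=False)
print(sorted(lenient.fields))  # ['router']
```

Passing validate=False is what lets a test deliberately build an invalid descriptor with exclude.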
get_path()
Provides the absolute path that we loaded this descriptor from.
- Returns
str with the absolute path of the descriptor source
get_archive_path()
If this descriptor came from an archive then provides its path within the archive. This is only set if the descriptor came from a DescriptorReader, and is None if this descriptor didn’t come from an archive.
- Returns
str with the descriptor’s path within the archive
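For tarballs, a "path within the archive" is naturally the tar member name. The sketch below uses only the standard tarfile module; `iter_archive()` is a hypothetical helper, and it assumes DescriptorReader records something like the member name, which the text above does not spell out.

```python
import io
import tarfile
from typing import Iterator, Tuple


def iter_archive(archive_bytes: bytes) -> Iterator[Tuple[str, bytes]]:
    """Hypothetical helper: yields (path within the archive, contents) per file."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode='r:*') as tar:
        for member in tar.getmembers():
            if member.isfile():
                yield member.name, tar.extractfile(member).read()


# Build a small in-memory tarball holding one made-up descriptor file.
data = b'@type server-descriptor 1.0\nrouter example 127.0.0.1 9001 0 0\n'
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode='w') as tar:
    info = tarfile.TarInfo(name='descriptors/cached-descriptors')
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

for archive_path, contents in iter_archive(buffer.getvalue()):
    print(archive_path)  # descriptors/cached-descriptors
```

Under this reading, get_path() would report where the tarball itself lives on disk, while get_archive_path() narrows that down to the member inside it.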
classmethod