python-hl7 is a simple library for parsing messages of Health Level 7
(HL7) version 2.x into Python objects. python-hl7 includes a simple
client that can send HL7 messages to a Minimal Lower Level Protocol (MLLP)
server (mllp_send).
HL7 is a communication protocol and message format for
health care data. It is the de-facto standard for transmitting data
between clinical information systems and between clinical devices.
The version 2.x series, which is typically a pipe-delimited format,
is currently the most widely accepted version of HL7 (there
is an alternative XML-based format).
python-hl7 currently only parses HL7 version 2.x messages into
an easy to access data structure. The library could eventually
also contain the ability to create HL7 v2.x messages.
0.3.0 breaks backwards compatibility by correcting
the indexing of the MSH segment and introducing improved parsing down to
the repetition and sub-component level.
HL7 Messages have a limited number of levels. The top level is a Message.
A Message is composed of a number of Segments (hl7.Segment), which are in
turn composed of Fields (hl7.Field).
Fields can repeat (hl7.Repetition). The content of a field
is either a primitive data type (such as a string) or a composite
data type composed of one or more Components (hl7.Component). Components
are in turn composed of Sub-Components (primitive data types).
The result of parsing is accessed as a tree using Python list conventions.
Note that since the first element of the segment is the segment name,
segments are effectively 1-based in Python as well (because the HL7 spec does
not count the segment name as part of the segment itself).
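As a rough illustration of the list conventions described above, the sketch below builds a comparable nested-list tree with plain `str.split`. It is not the library's parser: `parse_to_lists` and the sample message are hypothetical, and repetitions, sub-components, and the special handling of MSH are ignored.

```python
# Simplified stand-in for a parsed message: segments -> fields -> components.
# This is NOT hl7.parse(); it only mimics the list-style access described above.
SAMPLE = "MSH|^~\\&|SENDER|LAB\rPID|1||12345^^^MRN||DOE^JOHN"


def parse_to_lists(message):
    """Split a message into segments (CR), fields (|) and components (^)."""
    tree = []
    for line in message.split("\r"):
        if line:
            tree.append([field.split("^") for field in line.split("|")])
    return tree


tree = parse_to_lists(SAMPLE)
pid = tree[1]               # second segment, 0-based like any Python list
segment_name = pid[0][0]    # index 0 holds the segment name
patient_id = pid[3][0]      # index 3 is PID field 3, so fields read as 1-based
family_name = pid[5][0]     # first component of PID field 5
```

Because index 0 is taken by the segment name, the list index of a field matches its HL7 field number without any offset.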
Since many types of segments only have a single instance in a message
(e.g. PID or MSH), hl7.Message.segment() provides a convenience
wrapper around hl7.Message.segments() that returns the first matching
hl7.Segment.
python-hl7 features a simple network client, mllp_send, which reads HL7
messages from a file or sys.stdin and posts them to an MLLP server.
mllp_send is a command-line wrapper around
hl7.client.MLLPClient. mllp_send is a useful tool for
testing HL7 interfaces or resending logged messages.
For receiving HL7 messages using the Minimal Lower Level Protocol (MLLP), take a
look at the related twisted-hl7 package.
If you do not want to use Twisted and are looking to rewrite some of twisted-hl7’s
functionality, please reach out to us. It is likely that some of the MLLP
parsing and formatting can be moved into python-hl7, which twisted-hl7 and other
libraries can depend upon.
python-hl7 supports both Python 2.6+ and Python 3.3+. The library primarily
deals in unicode (the str type in Python 3).
Passing a byte string to hl7.parse() requires setting the
encoding parameter if the data uses anything other than UTF-8. hl7.parse()
will always return a data structure containing unicode.
hl7.Message can be forced back into a string using
unicode(message) in Python 2 and str(message) in Python 3.
Returns an instance of hl7.Message that allows
indexed access to the data elements.
A custom hl7.Factory subclass can be passed in to be used when
constructing the message and its components.
Note
HL7 usually contains only ASCII, but can use other character
sets (HL7 Standards Document, Section 1.7.1). As of v2.8, however,
UTF-8 is the preferred character set [1].
python-hl7 works on Python unicode strings. hl7.parse()
accepts a unicode string, or attempts to convert a byte string
into a unicode string using the optional encoding parameter.
encoding defaults to UTF-8, so no work is needed for byte strings
in UTF-8, but for other character sets like ‘cp1252’ or ‘latin1’,
encoding must be set appropriately.
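The decoding step described in this note can be sketched with the standard library alone. This is a Python 3 illustration under stated assumptions: `to_text` is a hypothetical helper, not a python-hl7 function, and it only mirrors the rule that text passes through unchanged while byte strings are decoded with the given encoding (UTF-8 by default).

```python
def to_text(data, encoding="utf-8"):
    """Return text unchanged; decode byte strings with the given encoding."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


# A message fragment stored as cp1252 bytes (hypothetical sample data).
raw = "PID|1||Müller".encode("cp1252")

# Decoding with the matching encoding recovers the original text.
text = to_text(raw, encoding="cp1252")

# Relying on the UTF-8 default for cp1252 bytes fails on the non-ASCII byte.
try:
    to_text(raw)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True
```

The failure case is the reason the encoding has to be stated explicitly for anything that is not UTF-8.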
If key is an integer, __getitem__ acts like a list, returning
the hl7.Segment held at that index:
>>> h[1]
[[u'PID'], ...]
If the key is a string of length 3, __getitem__ acts like a dictionary,
returning all segments whose segment_id is key
(alias of hl7.Message.segments()).
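The key-type dispatch described above can be sketched with a small list subclass. `MessageSketch` is hypothetical and stores each segment as a plain list of strings; raising `KeyError` when nothing matches is an assumption of this sketch, not a statement about the library.

```python
class MessageSketch(list):
    """Hypothetical container: each item is a segment stored as a list of strings."""

    def segments(self, segment_id):
        """Return every segment whose name (index 0) equals segment_id."""
        matches = [seg for seg in self if seg[0] == segment_id]
        if not matches:
            raise KeyError(segment_id)  # assumption: no match is an error
        return matches

    def __getitem__(self, key):
        # A 3-character string is treated as a segment ID (dictionary style).
        if isinstance(key, str) and len(key) == 3:
            return self.segments(key)
        # Anything else (integer or slice) falls back to normal list access.
        return list.__getitem__(self, key)


message = MessageSketch([["MSH"], ["PID", "1"], ["OBX", "1"], ["OBX", "2"]])
second_segment = message[1]      # list-style access by position
obx_segments = message["OBX"]    # dictionary-style access by segment ID
```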
ack_code: one of AA (accept), AR (reject), or AE (error)
(see HL7 Table 0008 - Acknowledgment Code)
message_id: control message ID for the ACK; defaults to a unique generated ID
application: name of the sending application; defaults to the receiving application of the message
facility: name of the sending facility; defaults to the receiving facility of the message
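To make the defaults above concrete, here is a minimal sketch of assembling an acknowledgement from the MSH segment of a received message. `build_ack` is a hypothetical helper, not the library's implementation; it assumes the MSH line is split naively on `|` (so MSH-n sits at index n - 1), uses a bare `ACK` message type, and invents its own scheme for the generated control ID.

```python
import uuid
from datetime import datetime


def build_ack(msh_line, ack_code="AA", message_id=None, application=None, facility=None):
    """Build a minimal ACK (MSH + MSA) for the message whose MSH segment is msh_line."""
    if ack_code not in ("AA", "AR", "AE"):
        raise ValueError("ack_code must be AA, AR or AE")
    f = msh_line.split("|")          # f[0] == "MSH", f[1] == MSH-2, so MSH-n is f[n - 1]
    f += [""] * (12 - len(f))        # pad short segments so the lookups below are safe
    msh = "|".join([
        "MSH", f[1],
        application or f[4],         # sender defaults to the original receiving application
        facility or f[5],            # and to the original receiving facility
        f[2], f[3],                  # original sender becomes the receiver
        datetime.now().strftime("%Y%m%d%H%M%S"), "",
        "ACK",
        message_id or uuid.uuid4().hex[:20],   # assumption: any unique ID will do
        f[10], f[11],                # keep processing ID and version
    ])
    msa = "|".join(["MSA", ack_code, f[9]])    # echo the original control ID (MSH-10)
    return msh + "\r" + msa + "\r"


incoming = "MSH|^~\\&|LAB|HOSP|EHR|CLINIC|20240101||ORU^R01|MSG001|P|2.5"
ack = build_ack(incoming, message_id="ACK1")
```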
If the parse tree is deeper than the specified path, continue
following the first child branch until a leaf of the tree is
encountered and return that value (which could be blank).
Example:
PID.F3.R1.C2 = ‘Sub-Component1’ (assume .SC1)
If the parse tree terminates before the full path is satisfied,
check each of the subsequent paths; if every one is specified
at position 1, then the leaf value reached can be returned as the
result.
Second level of an HL7 message, which represents an HL7 Segment.
Traditionally this is a line of a message that ends with a carriage
return and is separated by pipes. It contains a list of
hl7.Field instances.
Third level of an HL7 message, that traditionally is surrounded
by pipes and separated by carets. It contains a list of strings
or hl7.Repetition instances.
Wraps a byte string, unicode string, or hl7.Message
in an MLLP container and sends the message to the server.
If message is a byte string, we assume it is already encoded properly.
If message is unicode or hl7.Message, it will be encoded
according to hl7.client.MLLPClient.encoding.
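The wrapping step can be sketched as follows. `wrap_mllp` is a hypothetical helper that only builds the framed bytes (start block 0x0B, payload, end block 0x1C, carriage return 0x0D); it does not open a socket and is not the client's actual code.

```python
SB = b"\x0b"  # start block
EB = b"\x1c"  # end block
CR = b"\x0d"  # carriage return


def wrap_mllp(message, encoding="utf-8"):
    """Return message framed as <SB>payload<EB><CR>."""
    if isinstance(message, bytes):
        payload = message                        # assumed to be encoded already
    else:
        payload = str(message).encode(encoding)  # text or message object
    return SB + payload + EB + CR


framed_text = wrap_mllp("MSH|^~\\&|A\r")
framed_bytes = wrap_mllp(b"MSH|^~\\&|A\r")
framed_cp1252 = wrap_mllp("é", encoding="cp1252")
```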
By default, mllp_send expects the FILE or stdin input to be a properly
formatted HL7 message (carriage returns separating segments) wrapped in an MLLP
stream (<SB>message1<EB><CR><SB>message2<EB><CR>...).
However, it is common, especially if the file has been manually edited in
certain text editors, that the ASCII control characters will be lost and the
carriage returns will be replaced with the platform’s default line endings.
In this case, mllp_send provides the --loose option, which attempts
to take something that “looks like HL7” and convert it into a proper HL7
message.
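The kind of clean-up that a loose mode implies can be approximated as below. This is only a sketch of the idea (normalise line endings to carriage returns, then split at each presumed message start `MSH|^~\&|`); `loose_to_messages` is a hypothetical function and is not claimed to match what mllp_send actually does.

```python
import re

# Literal "MSH|^~\&|", the presumed start of every message.
MESSAGE_START = re.compile(r"MSH\|\^~\\&\|")


def loose_to_messages(text):
    """Split loosely formatted text into HL7 messages with CR segment separators."""
    # Normalise Windows and Unix newlines to carriage returns.
    normalised = text.replace("\r\n", "\r").replace("\n", "\r")
    starts = [m.start() for m in MESSAGE_START.finditer(normalised)]
    messages = []
    for begin, end in zip(starts, starts[1:] + [len(normalised)]):
        body = normalised[begin:end].strip("\r")  # drop blank lines between messages
        messages.append(body + "\r")
    return messages


edited_file = "MSH|^~\\&|A\nPID|1\n\nMSH|^~\\&|B\r\nPID|2\r\n"
messages = loose_to_messages(edited_file)
```

Each returned message could then be wrapped in the MLLP start and end blocks before sending.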
A tree has leaf values and nodes. Only the leaves of the tree can have a value.
All data items in the message will be in a leaf node.
After parsing, the data items in the message are in position in the parse tree, but
they remain in their escaped form. To extract a value from the tree you start at the
root of the Segment and specify the details of which field value you want to extract.
The minimum specification is the field number and repeat number. If you are after a
component or sub-component value you also have to specify these values.
If, for instance, you want to read the value “Sub-Component2” from the example HL7
you need to specify: Field 3, Repeat 1, Component 2, Sub-Component 2 (PID.F3.R1.C2.S2).
Reading values from a tree structure in this manner is the only safe way to read data
from a message.
All values should be accessed in this manner. Even if a field is marked as being
non-repeating, a repeat of “1” should be specified, as later version messages
could have a repeating value.
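A path such as PID.F3.R1.C2.S2 can be broken into its parts with a short regular expression. The sketch below handles only the `SEG.Fn.Rn.Cn.Sn` form used in this text; `parse_path` is a hypothetical helper, and the library may accept other spellings that this pattern rejects.

```python
import re

# Segment name, then optional field, repeat, component, sub-component, in order.
PATH_RE = re.compile(
    r"^([A-Z][A-Z0-9]{2})"
    r"(?:\.F(\d+)(?:\.R(\d+)(?:\.C(\d+)(?:\.S(\d+))?)?)?)?$"
)


def parse_path(path):
    """Return (segment, field, repeat, component, subcomponent); missing parts are None."""
    match = PATH_RE.match(path)
    if match is None:
        raise ValueError("Invalid accessor path: %r" % path)
    segment = match.group(1)
    numbers = tuple(int(g) if g is not None else None for g in match.groups()[1:])
    return (segment,) + numbers


full = parse_path("PID.F3.R1.C2.S2")
partial = parse_path("OBX.F6.R1")
```

Parts that are left out come back as `None`, which keeps "not specified" distinct from "position 1".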
To enable backward and forward compatibility there are rules for reading values when the
tree does not match the specification (e.g. PID.F1.R1.C2.S2). The common example of this is
expanding an HL7 “IS” Value into a Coded Value (“CE”). Systems reading an “IS” value would
read the Identifier field of a message with a “CE” value, and systems expecting a “CE” value
would see a Coded Value with only the identifier specified. A common Australian example of
this is the OBX Units field, which was an “IS” value previously and became a “CE” Value
in later versions.
Old Version: “|mmol/l|” New Version: “|mmol/l^^ISO+|”
Systems expecting a simple “IS” value would read “OBX.F6.R1”. For an old message this
yields a value in the tree. For a message with a Coded Value, that tree node would
not have a value, but would have 3 child Components with the “mmol/l” value in the first
of them. To resolve this issue where the tree is deeper than the specified path, the
first node of every child node is traversed until a leaf node is found and that value is
returned.
>>> h['PID.F3.R1.C2']
u'Sub-Component1'
This is a general rule for reading values: If the parse tree is deeper than the specified
path continue following the first child branch until a leaf of the tree is encountered
and return that value (which could be blank).
Systems expecting a Coded Value (“CE”), but reading a message with a simple “IS” value in it
have the opposite problem. They have a deeper specification but have reached a leaf node and
cannot follow the path any further. Reading a “CE” value requires multiple reads for each
sub-component but for the “Identifier” in this example the specification would be “OBX.F6.R1.C1”.
The tree would stop at R1 so C1 would not exist. In this case the unsatisfied path elements
(C1 in this case) can be examined and if every one is position 1 then they can be ignored and
the leaf of the tree that was reached returned. If any of the unsatisfied paths are not in
position 1 then this cannot be done and the result is a blank string.
This is the second rule for reading values: If the parse tree terminates before the full path
is satisfied, check each of the subsequent paths; if every one is specified at position 1,
then the leaf value reached can be returned as the result.
>>> h['PID.F1.R1.C1.S1']
u'Field1'
In the second example, every value that makes up the Coded Value, other than the identifier,
has a component position greater than one. When reading a message with a simple “IS”
value in it, every value other than the identifier would return a blank string.
Following these rules will result in excellent backward and forward compatibility. It is
important to allow the reading of values that do not exist in the parse tree by simply
returning a blank string. The two rules detailed above, along with the full tree specification
for all values being read from a message will eliminate many of the errors seen when
handling earlier and later message versions.
>>> h['PID.F10.R1']
u''
At this point the desired value has either been located, or is absent, in which case a blank
string is returned.
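The two reading rules and the blank-string fallback can be sketched over plain nested lists. Everything here is an assumption made for illustration: `extract` is a hypothetical function, a segment is modelled as a list whose index 0 is the segment name, each field is a list of repeats, and a repeat or component is either a string (leaf) or a list of children. The sample data is hand-built to match the values quoted in this text.

```python
def extract(segment, field, *positions):
    """Read a value by field number plus optional repeat, component, sub-component."""
    if not 1 <= field < len(segment):
        return ""                         # absent field: blank string
    node = segment[field]
    for depth, position in enumerate(positions):
        if isinstance(node, str):
            # Rule 2: the tree ended early; succeed only if the rest are all 1.
            return node if all(p == 1 for p in positions[depth:]) else ""
        if not 1 <= position <= len(node):
            return ""                     # absent repeat/component: blank string
        node = node[position - 1]
    while isinstance(node, list):
        # Rule 1: the tree is deeper than the path; follow the first child.
        node = node[0] if node else ""
    return node


pid = [
    "PID",
    ["Field1"],                                                    # F1: a leaf
    [["Component1", "Component2"]],                                # F2
    [["Component1", ["Sub-Component1", "Sub-Component2"], "Component3"]],  # F3
    ["Repeat1", "Repeat2"],                                        # F4: two repeats
]
old_obx = ["OBX", "", "", "", "", "", ["mmol/l"]]                  # units as "IS"
new_obx = ["OBX", "", "", "", "", "", [["mmol/l", "", "ISO+"]]]    # units as "CE"

deeper_tree = extract(pid, 3, 1, 2)          # rule 1
shorter_tree = extract(pid, 1, 1, 1, 1)      # rule 2
missing = extract(pid, 10, 1)                # blank
```

With this model, a reader written for the old "IS" units (`extract(obx, 6, 1)`) and one written for the "CE" identifier (`extract(obx, 6, 1, 1)`) both get "mmol/l" from either message version.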
HL7 messages are transported using the 7-bit ASCII character set. Only characters between
ASCII 32 and 127 are used. Characters which cannot be transported using this range
of values must be ‘escaped’, that is, replaced by a sequence of characters for transmission.
The library stores values internally in the escaped format. When the message is composed using
‘unicode’, the escaped value must be returned.
When the accessor is used to reference the field, the field is automatically unescaped.
>>> h['PID.F2.R1']
u'|'
The escape/unescape mechanism supports replacing separator characters with their escaped
version and replacing non-ASCII characters with hexadecimal versions.
The escape method returns a ‘str’ object. The unescape method returns a unicode object.
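A minimal sketch of the two substitutions described above, using the standard HL7 v2 escape sequences for the default delimiters (`\F\`, `\S\`, `\T\`, `\R\`, `\E\`) and `\X..\` for hexadecimal data. The functions are hypothetical stand-ins, not the library's methods, and encoding each non-ASCII character as the hex of its UTF-8 bytes is an assumption of this sketch.

```python
import re

TO_ESCAPE = {"|": "\\F\\", "^": "\\S\\", "&": "\\T\\", "~": "\\R\\", "\\": "\\E\\"}
FROM_ESCAPE = {"F": "|", "S": "^", "T": "&", "R": "~", "E": "\\"}
SEQUENCE = re.compile(r"\\(F|S|T|R|E|X(?:[0-9A-Fa-f]{2})+)\\")


def escape(text, encoding="utf-8"):
    """Replace delimiters and non-printable/non-ASCII characters with escape sequences."""
    out = []
    for ch in text:
        if ch in TO_ESCAPE:
            out.append(TO_ESCAPE[ch])
        elif not 32 <= ord(ch) < 127:
            # Assumption: hex of the character's bytes in the chosen encoding.
            out.append("\\X" + ch.encode(encoding).hex().upper() + "\\")
        else:
            out.append(ch)
    return "".join(out)


def unescape(text, encoding="utf-8"):
    """Reverse escape(): turn escape sequences back into the original characters."""
    def replace(match):
        code = match.group(1)
        if code[0] == "X":
            return bytes.fromhex(code[1:]).decode(encoding)
        return FROM_ESCAPE[code]
    return SEQUENCE.sub(replace, text)


escaped = escape("a|b^c")
restored = unescape(escape("x\\y~z&é"))
```

Processing the text character by character means the escape character itself is handled in the same pass, so a literal backslash is never escaped twice.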
HL7 defines a protocol for encoding presentation characters. These include highlighting
and rich text functionality. The API does not currently allow for easy access to the
escape/unescape logic. You must overwrite the message class escape and unescape methods
after parsing the message.
The test suite is located in tests/ and can be run several ways.
It is recommended to run the full tox suite so
that all supported Python versions are tested and the documentation is built
and tested. We provide a Makefile to create a virtualenv, install tox,
and run tox.
0.3.0 breaks backwards compatibility by correcting
the indexing of the MSH segment and introducing improved parsing down to
the repetition and sub-component level.
Changed the numbering of fields in the MSH segment.
This breaks older code.
Parse all the elements of the message (i.e. down to sub-component). The
inclusion of repetitions will break older code.
Message (and Message.segments), Field, Repetition and Component can be
accessed using 1-based indices by using them as a callable.
Added Python 3 support. Python 2.6, 2.7, and 3.3 are officially supported.
hl7.parse() can now decode byte strings, using the encoding
parameter. hl7.client.MLLPClient can now encode unicode input
using the encoding parameter. To support Python 3, unicode is now
the primary string type used inside the library. bytestrings are only
allowed at the edge of the library now, with hl7.parse and sending
via hl7.client.MLLPClient. Refer to Python 2 vs Python 3 and Unicode vs Byte strings.
Testing via tox and Travis CI added. See Contributing.
A massive thanks to Kevin Gill and
Emilien Klein for the initial code submissions
to add the improved parsing, and to
Andrew Wason for rebasing the initial pull
request and providing assistance in the transition.
mllp_send --loose algorithm modified to allow multiple messages per file.
The algorithm now splits messages based upon the presumed start of a message,
which must start with MSH|^~\&|
mllp_send now takes the --loose option, which allows
sending HL7 messages that may not exactly meet the standard (Windows newlines
separating segments instead of carriage returns).
Converted hl7.segment and hl7.segments into methods on
hl7.Message.
Support dict-syntax for getting Segments from a Message (e.g. message['OBX'])
Use unicode throughout python-hl7 since the HL7 spec allows non-ASCII characters.
It is up to the caller of hl7.parse() to convert non-ASCII messages
into unicode.
Refactored from single hl7.py file into the hl7 module.
Copyright (C) 2009-2011 John Paulett (john -at- paulett.org)
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. The name of the author may not be used to endorse or promote
products derived from this software without specific prior
written permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.