Background
For better or for worse (better for me…maybe worse for everyone else), I decided to write my own static site generator to create this blog. One of the things I wanted to emulate was the split YAML/markdown scheme used by Jekyll, e.g.:
---
title: This is my post title!
description: This is a summary description of what the post is all about!
...
# Heading
Article prose.
What about the rest of the stream?
PyYAML makes it easy to load a single YAML doc from a file, or all of the YAML docs in a multi-doc file. However, it doesn’t have any high-level function that lets you parse off just the first YAML doc and leave the rest of the data hanging out there in the stream. (NOTE: You can use Loader.get_data() to load just the first document, if you don’t need to retain the file position for subsequent reads!)
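To make the problem concrete, here is a minimal sketch of both behaviours. The sample bytes and variable names are made up for illustration, and the exact stream position after loading depends on PyYAML's internal read buffering, so treat that part as an observation rather than a guarantee.

```python
import io
import yaml

# Hypothetical sample: a YAML header followed by Markdown.
DATA = b"---\ntitle: Hello\n...\n# Heading\nArticle prose.\n"

# safe_load() expects exactly one document, so the Markdown tail is an error.
try:
    yaml.safe_load(DATA)
    single_doc_ok = True
except yaml.YAMLError:
    single_doc_ok = False

# get_data() constructs just the first document...
stream = io.BytesIO(DATA)
loader = yaml.SafeLoader(stream)
first_doc = loader.get_data()

# ...but the reader has already buffered ahead, so the stream position
# no longer tells us where the YAML header ended.
header_end = DATA.index(b"...\n") + len(b"...\n")
position_after_load = stream.tell()

print(single_doc_ok)
print(first_doc)
print(position_after_load > header_end)
```

So the data comes back fine, but a follow-up read on the same stream would miss some or all of the Markdown.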
The Hack
I got around this using the PyYAML events API.
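For anyone who hasn't used the events API, this small sketch shows the kind of event sequence PyYAML produces for a tiny document (the one-key input is just an illustration). The DocumentEndEvent near the end is the one the hack watches for.

```python
import yaml

# Hypothetical one-key document, just to show the event sequence.
events = list(yaml.parse("title: Hello\n", Loader=yaml.SafeLoader))
event_names = [type(event).__name__ for event in events]

for name in event_names:
    print(name)
```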
yml_topdoc.py
"""Parse document-level config from an input file."""
import yaml


def load_first_yaml(stream):
    """
    Read only the first YAML doc from a multi-doc stream, one or more of
    which may not be YAML.
    """
    loader = yaml.SafeLoader(stream)
    yaml_doc = {}
    parse_events = []
    doc_end = None

    # Parse the YAML doc, event by event, until document end:
    while True:
        parse_event = loader.get_event()
        if isinstance(parse_event, yaml.DocumentEndEvent):
            doc_end = loader.get_mark()
            break
        # A stream with no document never yields a document end event;
        # bail out here instead of looping forever:
        if parse_event is None or isinstance(parse_event, yaml.StreamEndEvent):
            return yaml_doc
        parse_events.append(parse_event)

    # Emit the parsed events as a doc and load:
    if parse_events:
        yaml_doc = yaml.safe_load(yaml.emit(parse_events))

    # Reset the file pointer to just past the document end marker.
    # NOTE: doc_end.index counts characters, so it only matches the byte
    # offset when everything up to the marker is single-byte (e.g. ASCII):
    if doc_end is not None:
        stream.seek(doc_end.index + 1)

    # Return the YAML doc.
    # The stream provided as input now points immediately after the doc end.
    return yaml_doc


if __name__ == '__main__':
    import sys
    import pprint

    with open(sys.argv[1], 'rb') as f:
        yaml_doc = load_first_yaml(f)
        print('\n=== Yaml doc: ===')
        pprint.pprint(yaml_doc, indent=4, compact=False)
        print('\n=== Remainder of file: ===')
        print(f.read())
Example
test.md
---
# YAML configuration data:
field: value
object:
  stuff:
    - 1
    - 2
    - c
...
# Markdown text starts here!

Text.
Test Run
$ python3 ./yml_topdoc.py ./test.md
=== Yaml doc: ===
{'field': 'value', 'object': {'stuff': [1, 2, 'c']}}
=== Remainder of file: ===
b'# Markdown text starts here!\n\nText.\n'
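One caveat with the seek: as far as I can tell, the mark's index counts characters rather than bytes, so a header containing non-ASCII text could leave a binary stream positioned slightly early. If the whole file fits in memory, one way around that is to work on a str and slice it. The sketch below is a variation, not the script above: split_first_yaml is a hypothetical helper, and it swaps the event loop for Loader.get_data() followed by get_mark(), on the assumption that the loader stops parsing right at the end of the first document.

```python
import yaml


def split_first_yaml(text):
    """Return (first_doc, remainder) for a str of YAML followed by other text."""
    loader = yaml.SafeLoader(text)
    try:
        # get_data() constructs only the first document.
        first_doc = loader.get_data()
        # For str input, the mark index is a character offset into `text`.
        end = loader.get_mark().index
    finally:
        loader.dispose()
    # Step over the line break that follows the end marker, if present.
    if text[end:end + 1] == "\n":
        end += 1
    return first_doc, text[end:]


# Hypothetical sample with non-ASCII characters in the YAML header.
sample = "---\ntitle: Café ☕\n...\n# Heading\nArticle prose.\n"
doc, rest = split_first_yaml(sample)
print(doc)
print(repr(rest))
```

The trade-off is that the caller reads and decodes the file up front, e.g. with open(path, encoding='utf-8'), rather than handing over an open binary stream.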