Transitioning from rich text to markdown can be a challenge, especially when dealing with internal links in a CMS like Umbraco. Here's a quick guide to simplify the process, based on my recent experience migrating uMarketingSuite's documentation over to the Umbraco Docs! ✨
Getting Started
To export pages in bulk with as little effort as possible, one of the easiest routes is to create a simple API controller that generates a downloadable markdown file. The code below is updated for Umbraco 13, and may vary slightly if you're using an older (or newer) version.
If you are simply looking for code examples, you can skip the explanations and scroll further down.
Start with a new UmbracoApiController and inject the IContentService and IUmbracoHelperAccessor through its constructor. After that, add a method that returns an IActionResult that we can request through our browser, and give it a parameter of type int so that we can request the markdown for a specific page.
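As a rough sketch of that starting point, assuming an Umbraco 13 project: the controller and action names below are inferred from the URL used later in this post, and the response is only a placeholder until the conversion is added.

```csharp
using Microsoft.AspNetCore.Mvc;
using Umbraco.Cms.Core.Services;
using Umbraco.Cms.Web.Common;
using Umbraco.Cms.Web.Common.Controllers;

// UmbracoApiController routes this as /umbraco/api/download/{action}.
public class DownloadController : UmbracoApiController
{
    private readonly IContentService _contentService;

    // Kept for later: used to look up published content when resolving internal links.
    private readonly IUmbracoHelperAccessor _umbracoHelperAccessor;

    public DownloadController(
        IContentService contentService,
        IUmbracoHelperAccessor umbracoHelperAccessor)
    {
        _contentService = contentService;
        _umbracoHelperAccessor = umbracoHelperAccessor;
    }

    // Requested as /umbraco/api/download/getMarkdown?id=ID_HERE
    [HttpGet]
    public IActionResult GetMarkdown(int id)
    {
        var content = _contentService.GetById(id);
        if (content is null)
        {
            return NotFound($"No content found with id {id}.");
        }

        // Placeholder response; the conversion and download logic replaces this.
        return Ok(content.Name);
    }
}
```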
Within our method, we start by getting the content via the ContentService, and then get the property corresponding to the Rich Text Editor. Depending on your setup you may have multiple fields that you wish to export, or use something like a Block List Editor that contains elements with the editor, all of which are usable! For the current version of Umbraco the property is stored as a JSON string, where the "markup" key contains our actual Rich Text HTML.
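Reading the "markup" key could look roughly like the helper below. This is a minimal sketch: the class name RichTextValue is hypothetical, and the fallback for values that are not JSON (for example, content saved as plain HTML by an older version) is my own assumption.

```csharp
using System.Text.Json;

public static class RichTextValue
{
    // Returns the HTML stored under the "markup" key of the raw property value.
    public static string ExtractMarkup(string? rawValue)
    {
        if (string.IsNullOrWhiteSpace(rawValue))
        {
            return string.Empty;
        }

        // Assumption: a value that does not start with '{' is already plain HTML.
        if (!rawValue.TrimStart().StartsWith("{"))
        {
            return rawValue;
        }

        try
        {
            using var document = JsonDocument.Parse(rawValue);
            var root = document.RootElement;

            if (root.TryGetProperty("markup", out var markup) &&
                markup.ValueKind == JsonValueKind.String)
            {
                return markup.GetString() ?? string.Empty;
            }

            // Valid JSON without a "markup" string: nothing to export.
            return string.Empty;
        }
        catch (JsonException)
        {
            // Not valid JSON after all, so hand the value back untouched.
            return rawValue;
        }
    }
}

// Usage inside the controller action ("bodyText" is a hypothetical property alias):
// var html = RichTextValue.ExtractMarkup(content.GetValue<string>("bodyText"));
```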
Converting & Exporting
To convert our content from the Rich Text Editor HTML format to markdown, we can choose from various libraries to do the work for us. In this example I'm using the open-source library ReverseMarkdown, which contains a converter that accepts an HTML string and returns it in markdown format.
After the conversion, we can set various properties of our Response to make the browser download the converted markdown file, with a file name based on the page id and item name, whenever a user makes a request to the corresponding URL ('/umbraco/api/download/getMarkdown?id=ID_HERE').
With this in place, you end up with a fully working setup that allows you to convert & export your rich text content to markdown format... but there is still one thing that we have forgotten about: internal links to Umbraco content! When referencing pages in Umbraco using the Rich Text Editor, it stores a link to the document GUID, not a usable href. Let's add a method to resolve that!
Umbraco uses the following pattern for local links: '{localLink:umb://document/<GUID>}', which we can detect using a regular expression. Once we have done that, we can extract the referenced GUIDs and try to get the content, and its URL, corresponding to each GUID. If the content doesn't exist or isn't published, we can set a placeholder text instead!
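A minimal sketch of the conversion, assuming the ReverseMarkdown NuGet package is installed. The wrapper class HtmlToMarkdown is a hypothetical name, and the two config options are simply the ones I would start with, not a requirement.

```csharp
using ReverseMarkdown;

public static class HtmlToMarkdown
{
    // Converts Rich Text Editor HTML into a markdown string.
    public static string Convert(string html)
    {
        var config = new Config
        {
            // Emit GitHub-flavoured markdown (fenced code blocks, tables).
            GithubFlavored = true,
            // Drop HTML comments instead of carrying them into the output.
            RemoveComments = true
        };

        var converter = new Converter(config);
        return converter.Convert(html);
    }
}

// Usage: var markdown = HtmlToMarkdown.Convert("<h1>Title</h1><p>Hello <strong>world</strong></p>");
```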
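One way to build the file name is sketched below. The helper name and the lower-case, dash-separated naming scheme are my own assumptions. The usage comment relies on ASP.NET Core's built-in File(...) helper, which sets the Content-Disposition header for you, as an alternative to setting the Response properties by hand.

```csharp
using System.IO;
using System.Linq;

public static class MarkdownDownload
{
    // Builds a file name such as "1234-getting-started.md" from the page id and name.
    public static string BuildFileName(int id, string? name)
    {
        var invalidChars = Path.GetInvalidFileNameChars();
        var source = string.IsNullOrWhiteSpace(name) ? "untitled" : name.Trim();

        // Replace whitespace and characters that are invalid in file names with dashes.
        var safeName = new string(source
            .Select(c => char.IsWhiteSpace(c) || invalidChars.Contains(c)
                ? '-'
                : char.ToLowerInvariant(c))
            .ToArray());

        return $"{id}-{safeName}.md";
    }
}

// Usage at the end of the controller action:
// var fileName = MarkdownDownload.BuildFileName(content.Id, content.Name);
// return File(System.Text.Encoding.UTF8.GetBytes(markdown), "text/markdown", fileName);
```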
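The detection and replacement step can be sketched as follows. The class name, the placeholder text, and the choice to pass the URL lookup in as a delegate are all assumptions on my part. The delegate keeps the regex logic independent of Umbraco, and the usage comment shows how it could be wired to the UmbracoHelper. I would lean towards running it on the HTML before conversion, so the markdown converter only ever sees real URLs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LocalLinkResolver
{
    // Matches {localLink:umb://document/<GUID>}, with or without the leading slash
    // that the Rich Text Editor puts in front of it inside href attributes.
    private static readonly Regex LocalLinkPattern = new(
        @"/?\{localLink:umb://document/(?<guid>[0-9a-fA-F-]{32,36})\}",
        RegexOptions.Compiled);

    // Replaces every local link with the URL returned by resolveUrl,
    // or with the placeholder when the lookup returns null.
    public static string Resolve(
        string input,
        Func<Guid, string?> resolveUrl,
        string placeholder = "#link-not-found")
    {
        return LocalLinkPattern.Replace(input, match =>
        {
            if (!Guid.TryParse(match.Groups["guid"].Value, out var key))
            {
                return placeholder;
            }

            return resolveUrl(key) ?? placeholder;
        });
    }
}

// Usage inside the controller (requires "using Umbraco.Extensions;" for Url()):
// if (_umbracoHelperAccessor.TryGetUmbracoHelper(out var helper))
// {
//     // helper.Content(key) only returns published content, so missing or
//     // unpublished pages fall through to the placeholder.
//     html = LocalLinkResolver.Resolve(html, key => helper.Content(key)?.Url());
// }
```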
Summary
By requesting the content & properties that contain Rich Text Editors, converting the content using one of the various available libraries for markdown conversion, resolving the internal links to publicly accessible URLs, and then returning the content as a downloadable file, we can easily export Umbraco pages to markdown in bulk! 🚀
If you have any further questions, feel free to contact me over at my socials available at the Contact page, and I'd love to hear about either your success stories, or further troubles to help improve this blog! 😄