Markdown: Sanitizier Configuration (#9075)

* Support custom sanitization policy

Allowing the gitea administrator to configure sanitization policy allows
them to couple external renders and custom templates to support more
markup. In particular, the `pandoc` renderer allows generating KaTeX
annotations, wrapping them in `<span>` elements with class `math` and
either `inline` or `display` (depending on whether or not inline or
block mode was requested).

This iteration gives the administrator whitelisting powers; carefully
crafted regexes will thus let through only the desired attributes
necessary to support their custom markup.

Resolves: #9054

Signed-off-by: Alexander Scheel <alexander.m.scheel@gmail.com>

* Document new sanitization configuration

 - Adds basic documentation to app.ini.sample,
 - Adds an example to the Configuration Cheat Sheet, and
 - Adds extended information to External Renderers section.

Signed-off-by: Alexander Scheel <alexander.m.scheel@gmail.com>

* Drop extraneous length check in newMarkupSanitizer(...)

Signed-off-by: Alexander Scheel <alexander.m.scheel@gmail.com>

* Fix plural ELEMENT and ALLOW_ATTR in docs

These were left over from their initial names. Make them singular to
conform with the current expectations.

Signed-off-by: Alexander Scheel <alexander.m.scheel@gmail.com>
This commit is contained in:
Alexander Scheel
2019-12-07 14:49:04 -05:00
committed by techknowlogick
parent cecc31951c
commit ee7df7ba8c
5 changed files with 155 additions and 29 deletions

View File

@ -578,6 +578,24 @@ Two special environment variables are passed to the render command:
- `GITEA_PREFIX_SRC`, which contains the current URL prefix in the `src` path tree. To be used as prefix for links.
- `GITEA_PREFIX_RAW`, which contains the current URL prefix in the `raw` path tree. To be used as prefix for image paths.
Gitea supports customizing the sanitization policy for rendered HTML. The example below will support KaTeX output from pandoc.
```ini
[markup.sanitizer]
; Pandoc renders TeX segments as <span>s with the "math" class, optionally
; with "inline" or "display" classes depending on context.
ELEMENT = span
ALLOW_ATTR = class
REGEXP = ^\s*((math(\s+|$)|inline(\s+|$)|display(\s+|$)))+
```
- `ELEMENT`: The element this policy applies to. Must be non-empty.
- `ALLOW_ATTR`: The attribute this policy allows. Must be non-empty.
- `REGEXP`: A regex to match the contents of the attribute against. Must be present but may be empty for unconditional whitelisting of this attribute.
You may redefine `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` multiple times; each time all three are defined is a single policy entry.
## Time (`time`)
- `FORMAT`: Time format to diplay on UI. i.e. RFC1123 or 2006-01-02 15:04:05

View File

@ -68,4 +68,22 @@ RENDER_COMMAND = rst2html.py
IS_INPUT_FILE = false
```
If your external markup relies on additional classes and attributes on the generated HTML elements, you might need to enable custom sanitizer policies. Gitea uses the [`bluemonday`](https://godoc.org/github.com/microcosm-cc/bluemonday) package as our HTML sanitizier. The example below will support [KaTeX](https://katex.org/) output from [`pandoc`](https://pandoc.org/).
```ini
[markup.sanitizer]
; Pandoc renders TeX segments as <span>s with the "math" class, optionally
; with "inline" or "display" classes depending on context.
ELEMENT = span
ALLOW_ATTR = class
REGEXP = ^\s*((math(\s+|$)|inline(\s+|$)|display(\s+|$)))+
[markup.markdown]
ENABLED = true
FILE_EXTENSIONS = .md,.markdown
RENDER_COMMAND = pandoc -f markdown -t html --katex
```
You may redefine `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` multiple times; each time all three are defined is a single policy entry. All three must be defined, but `REGEXP` may be blank to allow unconditional whitelisting of that attribute.
Once your configuration changes have been made, restart Gitea to have changes take effect.