Move all composer files inside the common directory

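- Applies the development team's decision from March 2022
- When third-party code such as modules is developed with composer, the default
  behavior is to modify the core's composer.json in the parent path and to touch
  the core's vendor directory
- To prevent this, move the core's composer.json and vendor into the common
  directory so that they are not treated as a parent folder of the module path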
Kijin Sung 2022-12-26 16:33:32 +09:00
parent 7b912d21fc
commit 5fff6b6eab
1478 changed files with 2 additions and 2 deletions

View file

@ -0,0 +1,3 @@
github: colinodell
tidelift: "packagist/league/html-to-markdown"
custom: ["https://www.colinodell.com/sponsor", "https://www.paypal.me/colinpodell/10.00"]

View file

@ -0,0 +1,25 @@
name: "📃 Bug Report (Incorrect Markdown)"
description: I'm not getting the Markdown I expect
body:
  - type: input
    id: affected-versions
    attributes:
      label: Version(s) affected
      placeholder: x.y.z
    validations:
      required: true
  - type: textarea
    id: description
    attributes:
      label: Description
      description: A clear and concise description of the problem.
    validations:
      required: true
  - type: textarea
    id: how-to-reproduce
    attributes:
      label: How to reproduce
      description: |
        Provide the HTML input and any other information that would help us reproduce the problem.
    validations:
      required: true

View file

@ -0,0 +1,43 @@
name: "🐛 Bug Report (Other)"
description: Report all other errors and problems
body:
  - type: input
    id: affected-versions
    attributes:
      label: Version(s) affected
      placeholder: x.y.z
    validations:
      required: true
  - type: textarea
    id: description
    attributes:
      label: Description
      description: A clear and concise description of the problem.
    validations:
      required: true
  - type: textarea
    id: how-to-reproduce
    attributes:
      label: How to reproduce
      description: |
        HTML and/or any other information needed to reproduce the problem.
    validations:
      required: true
  - type: textarea
    id: possible-solution
    attributes:
      label: Possible solution
      description: |
        Optional: only if you have suggestions on a fix/reason for the bug
  - type: textarea
    id: additional-context
    attributes:
      label: Additional context
      description: |
        Optional: any other context about the problem: log messages, screenshots, etc.
  - type: textarea
    id: feedback
    attributes:
      label: Did this project help you today? Did it make you happy in any way?
      description: |
        Optional: Sometimes we get tired of reading bug reports and working on complex features, so if you have anything positive to share about how this library might have helped you we'd love to hear it!

View file

@ -0,0 +1,27 @@
name: "🚀 Feature Request"
description: RFC and ideas for new features and improvements
labels:
  - enhancement
body:
  - type: textarea
    id: description
    attributes:
      label: Description
      description: A clear and concise description of the problem.
    validations:
      required: true
  - type: textarea
    id: example
    attributes:
      label: Example
      description: |
        A simple example of the new feature in action (include PHP code, sample HTML/Markdown, etc.)
        If the new feature changes an existing feature, include a simple before/after comparison.
    validations:
      required: true
  - type: textarea
    id: feedback
    attributes:
      label: Did this project help you today? Did it make you happy in any way?
      description: |
        Optional: Sometimes we get tired of reading bug reports and working on complex features, so if you have anything positive to share about how this library might have helped you we'd love to hear it!

View file

@ -0,0 +1,13 @@
# SECURITY POLICY
## Supported Versions
When a new **minor** version (`5.x`) is released, the previous one will continue to receive security and bug fixes for *at least* 3 months.
When a new **major** version is released (`4.0`, `5.0`, etc), the previous one will receive bug fixes for *at least* 3 months and security updates for 6 months after that new release comes out.
(This policy may change in the future and exceptions may be made on a case-by-case basis.)
## Reporting a Vulnerability
If you discover a security vulnerability within this package, please use the [Tidelift security contact form](https://tidelift.com/security) or email Colin O'Dell at <colinodell@gmail.com>. All security vulnerabilities will be promptly addressed. Please do not disclose security-related issues publicly until a fix has been announced.

View file

@ -0,0 +1,18 @@
# Number of days of inactivity before an issue becomes stale
daysUntilStale: 90
# Number of days of inactivity before a stale issue is closed
daysUntilClose: 30
# Issues with these labels will never be considered stale
exemptLabels:
  - pinned
  - on hold
  - security
# Label to use when marking an issue as stale
staleLabel: stale
# Comment to post when marking an issue as stale. Set to `false` to disable
markComment: >
  This issue has been automatically marked as stale because it has not had
  recent activity. It will be closed if no further activity occurs. Thank you
  for your contributions.
# Comment to post when closing a stale issue. Set to `false` to disable
closeComment: false

View file

@ -0,0 +1,104 @@
name: Tests
on:
  push: ~
  pull_request: ~
jobs:
  phpcs:
    name: PHPCS
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: shivammathur/setup-php@v2
        with:
          php-version: 7.2
          extensions: curl, mbstring
          coverage: none
          tools: composer:v2, cs2pr
      - run: composer update --no-progress
      - run: vendor/bin/phpcs -q --report=checkstyle | cs2pr
  phpunit:
    name: PHPUnit on ${{ matrix.php }} ${{ matrix.composer-flags }}
    runs-on: ubuntu-latest
    strategy:
      matrix:
        php: ['7.2', '7.3', '7.4']
        coverage: [true]
        composer-flags: ['']
        include:
          - php: '8.0'
            coverage: false
            composer-flags: '--ignore-platform-req=php'
          - php: '7.2'
            coverage: false
            composer-flags: '--prefer-lowest'
    steps:
      - uses: actions/checkout@v2
      - uses: shivammathur/setup-php@v2
        with:
          php-version: ${{ matrix.php }}
          extensions: curl, mbstring
          coverage: pcov
          tools: composer:v2
      - run: echo "::add-matcher::${{ runner.tool_cache }}/phpunit.json"
      - name: "Use PHPUnit 9.3+ on PHP 8"
        run: composer require --no-update --dev phpunit/phpunit:^9.3
        if: "matrix.php == '8.0'"
      - run: composer update --no-progress ${{ matrix.composer-flags }}
      - run: vendor/bin/phpunit --no-coverage
        if: ${{ !matrix.coverage }}
      - run: vendor/bin/phpunit --coverage-text --coverage-clover=coverage.clover
        if: ${{ matrix.coverage }}
      - run: php vendor/bin/ocular code-coverage:upload --format=php-clover coverage.clover
        if: ${{ matrix.coverage }}
        continue-on-error: true
  phpstan:
    name: PHPStan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: shivammathur/setup-php@v2
        with:
          php-version: 7.2
          extensions: curl, mbstring
          coverage: none
          tools: composer:v2
      - run: composer update --no-progress
      - run: vendor/bin/phpstan analyse --no-progress
  psalm:
    name: Psalm
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: shivammathur/setup-php@v2
        with:
          php-version: 7.2
          extensions: curl, mbstring
          coverage: none
          tools: composer:v2
      - run: composer update --no-progress
      - run: vendor/bin/psalm --no-progress --output-format=github

View file

@ -0,0 +1,357 @@
# Change Log
All notable changes to this project will be documented in this file.
Updates should follow the [Keep a CHANGELOG](http://keepachangelog.com/) principles.
## [Unreleased][unreleased]
## [5.1.0] - 2022-03-02
### Changed
- Changed horizontal rule style (#218, #219)
### Fixed
- Fixed `Element::getValue()` not handling possible nulls
## [5.0.2] - 2021-11-06
### Fixed
- Fixed misplaced comment nodes appearing at the start of the HTML input (#212)
## [5.0.1] - 2021-09-17
### Fixed
- Fixed lists not using the correct amount of indentation (#211)
## [5.0.0] - 2021-03-28
### Added
- Added support for tables (#203)
- This feature is disabled by default - see README for how to enable it
- Added new `strip_placeholder_links` option to strip `<a>` tags without `href` attributes (#196)
- Added new methods to `ElementInterface`:
- `hasParent()`
- `getNextSibling()`
- `getPreviousSibling()`
- `getListItemLevel()`
- Added several parameter and return types across all classes
- Added new `PreConverterInterface` to allow converters to perform any necessary pre-parsing
### Changed
- Supported PHP versions increased to PHP 7.2 - 8.0
- `HtmlConverter::convert()` may now throw a `\RuntimeException` when unexpected `DOMDocument`-related errors occur
### Fixed
- Fixed complex nested lists containing heading and paragraphs (#198)
- Fixed consecutive emphasis producing incorrect markdown (#202)
## [4.10.0] - 2020-06-30
### Added
- Added the ability to disable autolinking with a configuration option (#187, #188)
## [4.9.1] - 2019-12-27
### Fixed
- Fixed issue with HTML entity escaping in text (#184)
## [4.9.0] - 2019-11-02
### Added
- Added new option to preserve comments (#177, #179)
## [4.8.3] - 2019-10-31
### Fixed
- Fixed whitespace preservation around `<code>` tags (#174, #178)
## [4.8.2] - 2019-08-02
### Fixed
- Fixed headers not being placed onto a new line in some cases (#172)
- Fixed handling of links containing spaces (#175)
### Removed
- Removed support for HHVM
## [4.8.1] - 2018-12-24
### Added
- Added support for PHP 7.3
### Fixed
- Fixed paragraphs following tables (#165, #166)
- Fixed incorrect list item escaping (#168, #169)
## [4.8.0] - 2018-09-18
### Added
- Added support for email auto-linking
- Added a new interface (`HtmlConverterInterface`) for the main `HtmlConverter` class
- Added additional test cases (#14)
### Changed
- The `italic_style` option now defaults to `'*'` so that in-word emphasis is handled properly (#75)
### Fixed
- Fixed several issues of `<code>` and `<pre>` tags not converting to blocks or inlines properly (#26, #70, #102, #140, #161, #162)
- Fixed in-word emphasis using underscores as delimiter (#75)
- Fixed character escaping inside of `<div>` elements
- Fixed header edge cases
### Deprecated
- The `bold_style` and `italic_style` options have been deprecated (#75)
## [4.7.0] - 2018-05-19
### Added
- Added `setOptions()` function for chainable calling (#149)
- Added new `list_item_style_alternate` option for converting every-other list with a different character (#155)
### Fixed
- Fixed insufficient newlines after code blocks (#144, #148)
- Fixed trailing spaces not being preserved in link anchors (#157)
- Fixed list-like lines not being escaped inside of lists items (#159)
## [4.6.2]
### Fixed
- Fixed issue with emphasized spaces (#146)
## [4.6.1]
### Fixed
- Fixed conversion of `<pre>` tags (#145)
## [4.6.0]
### Added
- Added support for ordered lists starting at numbers other than 1
### Fixed
- Fixed overly-eager escaping of list-like text (#141)
## [4.5.0]
### Added
- Added configuration option for list item style (#135, #136)
## [4.4.1]
### Fixed
- Fixed autolinking of invalid URLs (#129)
## [4.4.0]
### Added
- Added `hard_break` configuration option (#112, #115)
- The `HtmlConverter` can now be instantiated with an `Environment` (#118)
### Fixed
- Fixed handling of paragraphs in list item elements (#47, #110)
- Fixed phantom spaces when newlines follow `br` elements (#116, #117)
- Fixed link converter not sanitizing inner spaces properly (#119, #120)
## [4.3.1]
### Changed
- Revised the sanitization implementation (#109)
### Fixed
- Fixed tag-like content not being escaped (#67, #109)
- Fixed thematic break-like content not being escaped (#65, #109)
- Fixed codefence-like content not being escaped (#64, #109)
## [4.3.0]
### Added
- Added full support for PHP 7.0 and 7.1
### Changed
- Changed `<pre>` and `<pre><code>` conversions to use backticks instead of indentation (#102)
### Fixed
- Fixed issue where specified code language was not preserved (#70, #102)
- Fixed issue where `<code>` tags nested in `<pre>` was not converted properly (#70, #102)
- Fixed header-like content not being escaped (#76, #105)
- Fixed blockquote-like content not being escaped (#77, #103)
- Fixed ordered list-like content not being escaped (#73, #106)
- Fixed unordered list-like content not being escaped (#71, #107)
## [4.2.2]
### Fixed
- Fixed sanitization bug which sometimes removes desired content (#63, #101)
## [4.2.1]
### Fixed
- Fixed path to autoload.php when used as a library (#98)
- Fixed edge case for tags containing only whitespace (#99)
### Removed
- Removed double HTML entity decoding, as this is not desirable (#60)
## [4.2.0]
### Added
- Added the ability to invoke HtmlConverter objects as functions (#85)
### Fixed
- Fixed improper handling of nested list items (#19 and #84)
- Fixed preceding or trailing spaces within emphasis tags (#83)
## [4.1.1]
### Fixed
- Fixed conversion of empty paragraphs (#78)
- Fixed `preg_replace` so it wouldn't break UTF-8 characters (#79)
## [4.1.0]
### Added
- Added `bin/html-to-markdown` script
### Changed
- Changed default italic character to `_` (#58)
## [4.0.1]
### Fixed
- Added escaping to avoid * and _ in a text being rendered as emphasis (#48)
### Removed
- Removed the demo (#51)
- `.styleci.yml` and `CONTRIBUTING.md` are no longer included in distributions (#50)
## [4.0.0]
This release changes the visibility of several methods/properties. #42 and #43 brought to light that some visibilities were
not ideally set, so this release fixes that. Moving forwards this should reduce the chance of introducing BC-breaking changes.
### Added
- Added new `HtmlConverter::getEnvironment()` method to expose the `Environment` (#42, #43)
### Changed
- Changed `Environment::addConverter()` from `protected` to `public`, enabling custom converters to be added (#42, #43)
- Changed `HtmlConverter::createDOMDocument()` from `protected` to `private`
- Changed `Element::nextCached` from `protected` to `private`
- Made the `Environment` class `final`
## [3.1.1]
### Fixed
- Empty HTML strings now result in empty Markdown documents (#40, #41)
## [3.1.0]
### Added
- Added new `equals` method to `Element` to check for equality
### Changed
- Use Linux line endings consistently instead of platform-specific line endings (#36)
### Fixed
- Cleaned up code style
## [3.0.0]
### Changed
- Changed namespace to `League\HTMLToMarkdown`
- Changed packagist name to `league/html-to-markdown`
- Re-organized code into several separate classes
- `<a>` tags with identical href and inner text are now rendered using angular bracket syntax (#31)
- `<div>` elements are now treated as block-level elements (#33)
## [2.2.2]
### Added
- Added support for PHP 5.6 and HHVM
- Enabled testing against PHP 7 nightlies
- Added this CHANGELOG.md
### Fixed
- Fixed whitespace preservation between inline elements (#9 and #10)
## [2.2.1]
### Fixed
- Preserve placeholder links (#22)
## [2.2.0]
### Added
- Added CircleCI config
### Changed
- `<pre>` blocks are now treated as code elements
### Removed
- Dropped support for PHP 5.2
- Removed incorrect README comment regarding `#text` nodes (#17)
## [2.1.2]
### Added
- Added the ability to blacklist/remove specific node types (#11)
### Changed
- Line breaks are now placed after divs instead of before them
- Newlines inside of link texts are now removed
- Updated the minimum PHPUnit version to 4.*
## [2.1.1]
### Added
- Added options to customize emphasis characters
## [2.1.0]
### Added
- Added option to strip HTML tags without Markdown equivalents
- Added `convert()` method for converter reuse
- Added ability to set options after instance construction
- Documented the required PHP extensions (#4)
### Changed
- ATX style now used for h1 and h2 tags inside blockquotes
### Fixed
- Newlines inside blockquotes are now started with a bracket
- Fixed some incorrect docblocks
- `__toString()` now returns an empty string if input is empty
- Convert head tag if body tag is empty (#7)
- Preserve special characters inside tags without md equivalents (#6)
## [2.0.1]
### Fixed
- Fixed first line indentation for multi-line code blocks
- Fixed consecutive anchors get separating spaces stripped (#3)
## [2.0.0]
### Added
- Initial release
[unreleased]: https://github.com/thephpleague/html-to-markdown/compare/5.1.0...master
[5.1.0]: https://github.com/thephpleague/html-to-markdown/compare/5.0.2...5.1.0
[5.0.2]: https://github.com/thephpleague/html-to-markdown/compare/5.0.1...5.0.2
[5.0.1]: https://github.com/thephpleague/html-to-markdown/compare/5.0.0...5.0.1
[5.0.0]: https://github.com/thephpleague/html-to-markdown/compare/4.10.0...5.0.0
[4.10.0]: https://github.com/thephpleague/html-to-markdown/compare/4.9.1...4.10.0
[4.9.1]: https://github.com/thephpleague/html-to-markdown/compare/4.9.0...4.9.1
[4.9.0]: https://github.com/thephpleague/html-to-markdown/compare/4.8.3...4.9.0
[4.8.3]: https://github.com/thephpleague/html-to-markdown/compare/4.8.2...4.8.3
[4.8.2]: https://github.com/thephpleague/html-to-markdown/compare/4.8.1...4.8.2
[4.8.1]: https://github.com/thephpleague/html-to-markdown/compare/4.8.0...4.8.1
[4.8.0]: https://github.com/thephpleague/html-to-markdown/compare/4.7.0...4.8.0
[4.7.0]: https://github.com/thephpleague/html-to-markdown/compare/4.6.2...4.7.0
[4.6.2]: https://github.com/thephpleague/html-to-markdown/compare/4.6.1...4.6.2
[4.6.1]: https://github.com/thephpleague/html-to-markdown/compare/4.6.0...4.6.1
[4.6.0]: https://github.com/thephpleague/html-to-markdown/compare/4.5.0...4.6.0
[4.5.0]: https://github.com/thephpleague/html-to-markdown/compare/4.4.1...4.5.0
[4.4.1]: https://github.com/thephpleague/html-to-markdown/compare/4.4.0...4.4.1
[4.4.0]: https://github.com/thephpleague/html-to-markdown/compare/4.3.1...4.4.0
[4.3.1]: https://github.com/thephpleague/html-to-markdown/compare/4.3.0...4.3.1
[4.3.0]: https://github.com/thephpleague/html-to-markdown/compare/4.2.2...4.3.0
[4.2.2]: https://github.com/thephpleague/html-to-markdown/compare/4.2.1...4.2.2
[4.2.1]: https://github.com/thephpleague/html-to-markdown/compare/4.2.0...4.2.1
[4.2.0]: https://github.com/thephpleague/html-to-markdown/compare/4.1.1...4.2.0
[4.1.1]: https://github.com/thephpleague/html-to-markdown/compare/4.1.0...4.1.1
[4.1.0]: https://github.com/thephpleague/html-to-markdown/compare/4.0.1...4.1.0
[4.0.1]: https://github.com/thephpleague/html-to-markdown/compare/4.0.0...4.0.1
[4.0.0]: https://github.com/thephpleague/html-to-markdown/compare/3.1.1...4.0.0
[3.1.1]: https://github.com/thephpleague/html-to-markdown/compare/3.1.0...3.1.1
[3.1.0]: https://github.com/thephpleague/html-to-markdown/compare/3.0.0...3.1.0
[3.0.0]: https://github.com/thephpleague/html-to-markdown/compare/2.2.2...3.0.0
[2.2.2]: https://github.com/thephpleague/html-to-markdown/compare/2.2.1...2.2.2
[2.2.1]: https://github.com/thephpleague/html-to-markdown/compare/2.2.0...2.2.1
[2.2.0]: https://github.com/thephpleague/html-to-markdown/compare/2.1.2...2.2.0
[2.1.2]: https://github.com/thephpleague/html-to-markdown/compare/2.1.1...2.1.2
[2.1.1]: https://github.com/thephpleague/html-to-markdown/compare/2.1.0...2.1.1
[2.1.0]: https://github.com/thephpleague/html-to-markdown/compare/2.0.1...2.1.0
[2.0.1]: https://github.com/thephpleague/html-to-markdown/compare/2.0.0...2.0.1
[2.0.0]: https://github.com/thephpleague/html-to-markdown/compare/775f91e...2.0.0

View file

@ -0,0 +1,22 @@
# Contributor Code of Conduct
As contributors and maintainers of this project, and in the interest of fostering an open and welcoming community, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion, or nationality.
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery
* Personal attacks
* Trolling or insulting/derogatory comments
* Public or private harassment
* Publishing others' private information, such as physical or electronic addresses, without explicit permission
* Other unethical or unprofessional conduct.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. By adopting this Code of Conduct, project maintainers commit themselves to fairly and consistently applying these principles to every aspect of managing this project. Project maintainers who do not follow or enforce the Code of Conduct may be permanently removed from the project team.
This code of conduct applies both within project spaces and in public spaces when an individual is representing the project or its community.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers.
This Code of Conduct is adapted from the [Contributor Covenant](http://contributor-covenant.org), version 1.2.0, available at [http://contributor-covenant.org/version/1/2/0/](http://contributor-covenant.org/version/1/2/0/)

View file

@ -0,0 +1,20 @@
The MIT License (MIT)
Copyright (c) 2015 Colin O'Dell; Originally created by Nick Cernis
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

View file

@ -0,0 +1,242 @@
HTML To Markdown for PHP
========================
[![Latest Version](https://img.shields.io/packagist/v/league/html-to-markdown.svg?style=flat-square)](https://packagist.org/packages/league/html-to-markdown)
[![Software License](http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat-square)](LICENSE)
[![Build Status](https://img.shields.io/github/workflow/status/thephpleague/html-to-markdown/Tests/master.svg?style=flat-square)](https://github.com/thephpleague/html-to-markdown/actions?query=workflow%3ATests+branch%3Amaster)
[![Coverage Status](https://img.shields.io/scrutinizer/coverage/g/thephpleague/html-to-markdown.svg?style=flat-square)](https://scrutinizer-ci.com/g/thephpleague/html-to-markdown/code-structure)
[![Quality Score](https://img.shields.io/scrutinizer/g/thephpleague/html-to-markdown.svg?style=flat-square)](https://scrutinizer-ci.com/g/thephpleague/html-to-markdown)
[![Total Downloads](https://img.shields.io/packagist/dt/league/html-to-markdown.svg?style=flat-square)](https://packagist.org/packages/league/html-to-markdown)
Library which converts HTML to [Markdown](http://daringfireball.net/projects/markdown/) for your sanity and convenience.
**Requires**: PHP 7.2+
**Lead Developer**: [@colinodell](http://twitter.com/colinodell)
**Original Author**: [@nickcernis](http://twitter.com/nickcernis)
### Why convert HTML to Markdown?
*"What alchemy is this?"* you mutter. *"I can see why you'd convert [Markdown to HTML](https://github.com/thephpleague/commonmark),"* you continue, already labouring the question somewhat, *"but why go the other way?"*
Typically you would convert HTML to Markdown if:
1. You have an existing HTML document that needs to be edited by people with good taste.
2. You want to store new content in HTML format but edit it as Markdown.
3. You want to convert HTML email to plain text email.
4. You know a guy who's been converting HTML to Markdown for years, and now he can speak Elvish. You'd quite like to be able to speak Elvish.
5. You just really like Markdown.
### How to use it
Require the library by issuing this command:
```bash
composer require league/html-to-markdown
```
Add `require 'vendor/autoload.php';` to the top of your script.
Next, create a new HtmlConverter instance, passing in your valid HTML code to its `convert()` function:
```php
use League\HTMLToMarkdown\HtmlConverter;
$converter = new HtmlConverter();
$html = "<h3>Quick, to the Batpoles!</h3>";
$markdown = $converter->convert($html);
```
The `$markdown` variable now contains the Markdown version of your HTML as a string:
```php
echo $markdown; // ==> ### Quick, to the Batpoles!
```
The included `demo` directory contains an HTML->Markdown conversion form to try out.
### Conversion options
By default, HTML To Markdown preserves HTML tags without Markdown equivalents, like `<span>` and `<div>`.
To strip HTML tags that don't have a Markdown equivalent while preserving the content inside them, set `strip_tags` to true, like this:
```php
$converter = new HtmlConverter(array('strip_tags' => true));
$html = '<span>Turnips!</span>';
$markdown = $converter->convert($html); // $markdown now contains "Turnips!"
```
Or more explicitly, like this:
```php
$converter = new HtmlConverter();
$converter->getConfig()->setOption('strip_tags', true);
$html = '<span>Turnips!</span>';
$markdown = $converter->convert($html); // $markdown now contains "Turnips!"
```
Note that only the tags themselves are stripped, not the content they hold.
To strip tags and their content, pass a space-separated list of tags in `remove_nodes`, like this:
```php
$converter = new HtmlConverter(array('remove_nodes' => 'span div'));
$html = '<span>Turnips!</span><div>Monkeys!</div>';
$markdown = $converter->convert($html); // $markdown now contains ""
```
By default, all comments are stripped from the content. To preserve them, use the `preserve_comments` option, like this:
```php
$converter = new HtmlConverter(array('preserve_comments' => true));
$html = '<span>Turnips!</span><!-- Monkeys! -->';
$markdown = $converter->convert($html); // $markdown now contains "Turnips!<!-- Monkeys! -->"
```
To preserve only specific comments, set `preserve_comments` with an array of strings, like this:
```php
$converter = new HtmlConverter(array('preserve_comments' => array('Eggs!')));
$html = '<span>Turnips!</span><!-- Monkeys! --><!-- Eggs! -->';
$markdown = $converter->convert($html); // $markdown now contains "Turnips!<!-- Eggs! -->"
```
By default, placeholder links are preserved. To strip the placeholder links, use the `strip_placeholder_links` option, like this:
```php
$converter = new HtmlConverter(array('strip_placeholder_links' => true));
$html = '<a>Github</a>';
$markdown = $converter->convert($html); // $markdown now contains "Github"
```
### Style options
By default, bold tags are converted using the asterisk syntax, and italic tags are converted using the underscore syntax. Change these by using the `bold_style` and `italic_style` options.
```php
$converter = new HtmlConverter();
$converter->getConfig()->setOption('italic_style', '*');
$converter->getConfig()->setOption('bold_style', '__');
$html = '<em>Italic</em> and a <strong>bold</strong>';
$markdown = $converter->convert($html); // $markdown now contains "*Italic* and a __bold__"
```
### Line break options
By default, `br` tags are converted to two spaces followed by a newline character as per [traditional Markdown](https://daringfireball.net/projects/markdown/syntax#p). Set `hard_break` to `true` to omit the two spaces, as per GitHub Flavored Markdown (GFM).
```php
$converter = new HtmlConverter();
$html = '<p>test<br>line break</p>';
$converter->getConfig()->setOption('hard_break', true);
$markdown = $converter->convert($html); // $markdown now contains "test\nline break"
$converter->getConfig()->setOption('hard_break', false); // default
$markdown = $converter->convert($html); // $markdown now contains "test  \nline break"
```
### Autolinking options
By default, `a` tags are converted to the easiest possible link syntax, i.e. if no text or title is available, then the `<url>` syntax will be used rather than the full `[url](url)` syntax. Set `use_autolinks` to `false` to change this behavior to always use the full link syntax.
```php
$converter = new HtmlConverter();
$html = '<p><a href="https://thephpleague.com">https://thephpleague.com</a></p>';
$converter->getConfig()->setOption('use_autolinks', true); // default
$markdown = $converter->convert($html); // $markdown now contains "<https://thephpleague.com>"
$converter->getConfig()->setOption('use_autolinks', false);
$markdown = $converter->convert($html); // $markdown now contains "[https://thephpleague.com](https://thephpleague.com)"
```
### Passing custom Environment object
You can pass your own `Environment` object to customize, for example, which converters should be used.
```php
$environment = new Environment(array(
    // your configuration here
));
$environment->addConverter(new HeaderConverter()); // optionally - add converter manually
$converter = new HtmlConverter($environment);
$html = '<h3>Header</h3>
<img src="" />
';
$markdown = $converter->convert($html); // $markdown now contains "### Header" and "<img src="" />"
```
### Table support
Support for Markdown tables is not enabled by default because it is not part of the original Markdown syntax. To use tables add the converter explicitly:
```php
use League\HTMLToMarkdown\HtmlConverter;
use League\HTMLToMarkdown\Converter\TableConverter;
$converter = new HtmlConverter();
$converter->getEnvironment()->addConverter(new TableConverter());
$html = "<table><tr><th>A</th></tr><tr><td>a</td></tr></table>";
$markdown = $converter->convert($html);
```
### Limitations
- Markdown Extra, MultiMarkdown and other variants aren't supported, just Markdown.
### Style notes
- Setext (underlined) headers are the default for H1 and H2. If you prefer the ATX style for H1 and H2 (# Header 1 and ## Header 2), set `header_style` to 'atx' in the options array when you instantiate the object:
`$converter = new HtmlConverter(array('header_style'=>'atx'));`
Headers of H3 priority and lower always use atx style.
- Links and images are referenced inline. Footnote references (where image src and anchor href attributes are listed in the footnotes) are not used.
- Blockquotes aren't line wrapped, which makes the converted Markdown easier to edit.
### Dependencies
HTML To Markdown requires PHP's [xml](http://www.php.net/manual/en/xml.installation.php), [lib-xml](http://www.php.net/manual/en/libxml.installation.php), and [dom](http://www.php.net/manual/en/dom.installation.php) extensions, all of which are enabled by default on most distributions.
Errors such as "Fatal error: Class 'DOMDocument' not found" on distributions such as CentOS that disable PHP's xml extension can be resolved by installing php-xml.
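A quick way to check for this before installing the library is to ask PHP which extensions are loaded. The snippet below is a minimal sketch that uses only PHP's built-in `extension_loaded()`; the helper name `missingExtensions()` is hypothetical and not part of this package.

```php
<?php
/**
 * Hypothetical helper (not part of league/html-to-markdown):
 * returns the names from $required that are not loaded in this PHP build.
 */
function missingExtensions(array $required): array
{
    return array_values(array_filter($required, function (string $name): bool {
        return !extension_loaded($name);
    }));
}

// The three extensions named in the paragraph above.
$missing = missingExtensions(['xml', 'libxml', 'dom']);

if ($missing === []) {
    echo "All required extensions are loaded.\n";
} else {
    echo 'Missing PHP extensions: ' . implode(', ', $missing) . "\n";
}
```

If the script reports `dom` or `xml` as missing, installing the distribution's XML package for PHP (for example php-xml, as noted above) is the likely fix.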
### Contributors
Many thanks to all [contributors](https://github.com/thephpleague/html-to-markdown/graphs/contributors) so far. Further improvements and feature suggestions are very welcome.
### How it works
HTML To Markdown creates a DOMDocument from the supplied HTML, walks through the tree, and converts each node to a text node containing the equivalent markdown, starting from the most deeply nested node and working outwards towards the root node.
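The deepest-first walk described above can be illustrated with plain `DOMDocument` calls. This is a simplified sketch under stated assumptions, not the library's actual implementation: the function `convertNode()` is hypothetical, and it handles only `<em>`, `<strong>` and `<h3>`.

```php
<?php
/**
 * Hypothetical illustration (not the library's code): convert children first,
 * then replace the element itself with a text node holding its Markdown.
 */
function convertNode(DOMNode $node): void
{
    if (!($node instanceof DOMElement)) {
        return; // text nodes and comments are left as they are
    }

    // Copy the live child list to an array, because children are replaced during the loop.
    foreach (iterator_to_array($node->childNodes) as $child) {
        convertNode($child);
    }

    // By now every convertible descendant is already a Markdown text node.
    $inner = $node->textContent;
    switch ($node->nodeName) {
        case 'em':
            $markdown = '*' . $inner . '*';
            break;
        case 'strong':
            $markdown = '**' . $inner . '**';
            break;
        case 'h3':
            $markdown = '### ' . $inner;
            break;
        default:
            return; // elements this sketch does not know are kept untouched
    }

    $replacement = $node->ownerDocument->createTextNode($markdown);
    $node->parentNode->replaceChild($replacement, $node);
}

$document = new DOMDocument();
$document->loadHTML('<h3>Quick, to the <em>Batpoles</em>!</h3>');
convertNode($document->documentElement);

$result = trim($document->documentElement->textContent);
echo $result, "\n";
```

Because the `<em>` element is replaced before its parent `<h3>` is processed, the heading's text already contains the emphasis markers when the heading itself is converted.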
### To-do
- Support for nested lists and lists inside blockquotes.
- Offer an option to preserve tags as HTML if they contain attributes that can't be represented with Markdown (e.g. `style`).
### Trying to convert Markdown to HTML?
Use one of these great libraries:
- [league/commonmark](https://github.com/thephpleague/commonmark) (recommended)
- [cebe/markdown](https://github.com/cebe/markdown)
- [PHP Markdown](https://michelf.ca/projects/php-markdown/)
- [Parsedown](https://github.com/erusev/parsedown)
No guarantees about the Elvish, though.

View file

@ -0,0 +1,108 @@
#!/usr/bin/env php
<?php
requireAutoloader();
ini_set('display_errors', 'stderr');
foreach ($argv as $i => $arg) {
    if ($i === 0) {
        continue;
    }
    if (substr($arg, 0, 1) === '-') {
        switch ($arg) {
            case '-h':
            case '--help':
                echo getHelpText();
                exit(0);
            default:
                fail('Unknown option: ' . $arg);
        }
    } else {
        $src = $argv[1];
    }
}
if (isset($src)) {
    if (!file_exists($src)) {
        fail('File not found: ' . $src);
    }
    $html = file_get_contents($src);
} else {
    $stdin = fopen('php://stdin', 'r');
    stream_set_blocking($stdin, false);
    $html = stream_get_contents($stdin);
    fclose($stdin);
    if (empty($html)) {
        fail(getHelpText());
    }
}
$converter = new League\HTMLToMarkdown\HtmlConverter();
echo $converter->convert($html);
/**
* Get help and usage info
*
* @return string
*/
function getHelpText()
{
return <<<HELP
HTML To Markdown
Usage: html-to-markdown [OPTIONS] [FILE]
-h, --help Shows help and usage information
If no file is given, input will be read from STDIN
Examples:
Converting a file named document.html:
html-to-markdown document.html
Converting a file and saving its output:
html-to-markdown document.html > output.md
Converting from STDIN:
echo -e '<h1>Hello World!</h1>' | html-to-markdown
Converting from STDIN and saving the output:
echo -e '<h1>Hello World!</h1>' | html-to-markdown > output.md
HELP;
}
/**
* @param string $message Error message
*/
function fail($message)
{
fwrite(STDERR, $message . "\n");
exit(1);
}
function requireAutoloader()
{
$autoloadPaths = array(
// Local package usage
__DIR__ . '/../vendor/autoload.php',
// Package was included as a library
__DIR__ . '/../../../autoload.php',
);
foreach ($autoloadPaths as $path) {
if (file_exists($path)) {
require_once $path;
break;
}
}
}

View file

@ -0,0 +1,56 @@
{
"name": "league/html-to-markdown",
"type": "library",
"description": "An HTML-to-markdown conversion helper for PHP",
"keywords": ["markdown", "html"],
"homepage": "https://github.com/thephpleague/html-to-markdown",
"license": "MIT",
"authors": [
{
"name": "Colin O'Dell",
"email": "colinodell@gmail.com",
"homepage": "https://www.colinodell.com",
"role": "Lead Developer"
},
{
"name": "Nick Cernis",
"email": "nick@cern.is",
"homepage": "http://modernnerd.net",
"role": "Original Author"
}
],
"autoload": {
"psr-4": {
"League\\HTMLToMarkdown\\": "src/"
}
},
"autoload-dev": {
"psr-4": {
"League\\HTMLToMarkdown\\Test\\": "tests"
}
},
"require": {
"php": "^7.2.5 || ^8.0",
"ext-dom": "*",
"ext-xml": "*"
},
"require-dev": {
"mikehaertl/php-shellcommand": "^1.1.0",
"phpstan/phpstan": "^0.12.99",
"phpunit/phpunit": "^8.5 || ^9.2",
"scrutinizer/ocular": "^1.6",
"unleashedtech/php-coding-standard": "^2.7",
"vimeo/psalm": "^4.22"
},
"bin": ["bin/html-to-markdown"],
"extra": {
"branch-alias": {
"dev-master": "5.2-dev"
}
},
"config": {
"allow-plugins": {
"dealerdirect/phpcodesniffer-composer-installer": true
}
}
}

View file

@ -0,0 +1,27 @@
<?xml version="1.0"?>
<ruleset>
<arg name="basepath" value="."/>
<arg name="extensions" value="php"/>
<arg name="parallel" value="80"/>
<arg name="cache" value=".phpcs-cache"/>
<arg name="colors"/>
<!-- Ignore warnings, show progress of the run and show sniff names -->
<arg value="nps"/>
<!-- Directories to be checked -->
<file>src</file>
<file>tests</file>
<!-- Include full Unleashed Coding Standard -->
<rule ref="Unleashed"/>
<rule ref="SlevomatCodingStandard.Commenting.ForbiddenAnnotations.AnnotationForbidden">
<exclude-pattern>src/HtmlConverter*\.php</exclude-pattern>
</rule>
<rule ref="SlevomatCodingStandard.Commenting.DocCommentSpacing.IncorrectOrderOfAnnotationsGroup">
<exclude-pattern>src/HtmlConverter*\.php</exclude-pattern>
</rule>
</ruleset>

View file

@ -0,0 +1,4 @@
parameters:
    level: max
    paths:
        - src

View file

@ -0,0 +1,15 @@
<?xml version="1.0"?>
<psalm
errorLevel="3"
resolveFromConfigFile="true"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="https://getpsalm.org/schema/config"
xsi:schemaLocation="https://getpsalm.org/schema/config vendor/vimeo/psalm/config.xsd"
>
<projectFiles>
<directory name="src" />
<ignoreFiles>
<directory name="vendor" />
</ignoreFiles>
</projectFiles>
</psalm>

View file

@ -0,0 +1,80 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown;
class Configuration
{
/** @var array<string, mixed> */
protected $config;
/**
* @param array<string, mixed> $config
*/
public function __construct(array $config = [])
{
$this->config = $config;
$this->checkForDeprecatedOptions($config);
}
/**
* @param array<string, mixed> $config
*/
public function merge(array $config = []): void
{
$this->checkForDeprecatedOptions($config);
$this->config = \array_replace_recursive($this->config, $config);
}
/**
* @param array<string, mixed> $config
*/
public function replace(array $config = []): void
{
$this->checkForDeprecatedOptions($config);
$this->config = $config;
}
/**
* @param mixed $value
*/
public function setOption(string $key, $value): void
{
$this->checkForDeprecatedOptions([$key => $value]);
$this->config[$key] = $value;
}
/**
* @param mixed|null $default
*
* @return mixed|null
*/
public function getOption(?string $key = null, $default = null)
{
if ($key === null) {
return $this->config;
}
if (! isset($this->config[$key])) {
return $default;
}
return $this->config[$key];
}
/**
* @param array<string, mixed> $config
*/
private function checkForDeprecatedOptions(array $config): void
{
foreach ($config as $key => $value) {
if ($key === 'bold_style' && $value !== '**') {
@\trigger_error('Customizing the bold_style option is deprecated and may be removed in the next major version', E_USER_DEPRECATED);
} elseif ($key === 'italic_style' && $value !== '*') {
@\trigger_error('Customizing the italic_style option is deprecated and may be removed in the next major version', E_USER_DEPRECATED);
}
}
}
}

View file

@ -0,0 +1,10 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown;
interface ConfigurationAwareInterface
{
public function setConfig(Configuration $config): void;
}

View file

@ -0,0 +1,42 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\ElementInterface;
class BlockquoteConverter implements ConverterInterface
{
public function convert(ElementInterface $element): string
{
// Contents should have already been converted to Markdown by this point,
// so we just need to add '>' symbols to each line.
$markdown = '';
$quoteContent = \trim($element->getValue());
$lines = \preg_split('/\r\n|\r|\n/', $quoteContent);
\assert(\is_array($lines));
$totalLines = \count($lines);
foreach ($lines as $i => $line) {
$markdown .= '> ' . $line . "\n";
if ($i + 1 === $totalLines) {
$markdown .= "\n";
}
}
return $markdown;
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['blockquote'];
}
}

View file

@ -0,0 +1,68 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\ElementInterface;
class CodeConverter implements ConverterInterface
{
public function convert(ElementInterface $element): string
{
$language = '';
// Checking for language class on the code block
$classes = $element->getAttribute('class');
if ($classes) {
// Since tags can have more than one class, we need to find the one that starts with 'language-'
$classes = \explode(' ', $classes);
foreach ($classes as $class) {
if (\strpos($class, 'language-') !== false) {
// Found one, save it as the selected language and stop looping over the classes.
$language = \str_replace('language-', '', $class);
break;
}
}
}
$markdown = '';
$code = \html_entity_decode($element->getChildrenAsString());
// In order to remove the code tags we need to search for them and, in the case of the opening tag
// use a regular expression to find the tag and the other attributes it might have
$code = \preg_replace('/<code\b[^>]*>/', '', $code);
\assert($code !== null);
$code = \str_replace('</code>', '', $code);
// Checking if it's a code block or span
if ($this->shouldBeBlock($element, $code)) {
// Code block detected, newlines will be added in parent
$markdown .= '```' . $language . "\n" . $code . "\n" . '```';
} else {
// One line of code: wrap it in single backticks and remove any newlines
$markdown .= '`' . \preg_replace('/\r\n|\r|\n/', '', $code) . '`';
}
return $markdown;
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['code'];
}
private function shouldBeBlock(ElementInterface $element, string $code): bool
{
$parent = $element->getParent();
if ($parent !== null && $parent->getTagName() === 'pre') {
return true;
}
return \preg_match('/[^\s]` `/', $code) === 1;
}
}

View file

@ -0,0 +1,53 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\Configuration;
use League\HTMLToMarkdown\ConfigurationAwareInterface;
use League\HTMLToMarkdown\ElementInterface;
class CommentConverter implements ConverterInterface, ConfigurationAwareInterface
{
/** @var Configuration */
protected $config;
public function setConfig(Configuration $config): void
{
$this->config = $config;
}
public function convert(ElementInterface $element): string
{
if ($this->shouldPreserve($element)) {
return '<!--' . $element->getValue() . '-->';
}
return '';
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['#comment'];
}
private function shouldPreserve(ElementInterface $element): bool
{
$preserve = $this->config->getOption('preserve_comments');
if ($preserve === true) {
return true;
}
if (\is_array($preserve)) {
$value = \trim($element->getValue());
return \in_array($value, $preserve, true);
}
return false;
}
}

View file

@ -0,0 +1,17 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\ElementInterface;
interface ConverterInterface
{
public function convert(ElementInterface $element): string;
/**
* @return string[]
*/
public function getSupportedTags(): array;
}

View file

@ -0,0 +1,49 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\Configuration;
use League\HTMLToMarkdown\ConfigurationAwareInterface;
use League\HTMLToMarkdown\ElementInterface;
class DefaultConverter implements ConverterInterface, ConfigurationAwareInterface
{
public const DEFAULT_CONVERTER = '_default';
/** @var Configuration */
protected $config;
public function setConfig(Configuration $config): void
{
$this->config = $config;
}
public function convert(ElementInterface $element): string
{
// If strip_tags is false (the default), preserve tags that don't have Markdown equivalents,
// such as <span> nodes on their own. C14N() canonicalizes the node to a string.
// See: http://www.php.net/manual/en/domnode.c14n.php
if ($this->config->getOption('strip_tags', false)) {
return $element->getValue();
}
$markdown = \html_entity_decode($element->getChildrenAsString());
// Tables are only handled here if TableConverter is not used
if ($element->getTagName() === 'table') {
$markdown .= "\n\n";
}
return $markdown;
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return [self::DEFAULT_CONVERTER];
}
}

View file

@ -0,0 +1,37 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\Configuration;
use League\HTMLToMarkdown\ConfigurationAwareInterface;
use League\HTMLToMarkdown\ElementInterface;
class DivConverter implements ConverterInterface, ConfigurationAwareInterface
{
/** @var Configuration */
protected $config;
public function setConfig(Configuration $config): void
{
$this->config = $config;
}
public function convert(ElementInterface $element): string
{
if ($this->config->getOption('strip_tags', false)) {
return $element->getValue() . "\n\n";
}
return \html_entity_decode($element->getChildrenAsString());
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['div'];
}
}

View file

@ -0,0 +1,72 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\Configuration;
use League\HTMLToMarkdown\ConfigurationAwareInterface;
use League\HTMLToMarkdown\ElementInterface;
class EmphasisConverter implements ConverterInterface, ConfigurationAwareInterface
{
/** @var Configuration */
protected $config;
protected function getNormTag(?ElementInterface $element): string
{
if ($element !== null && ! $element->isText()) {
$tag = $element->getTagName();
if ($tag === 'i' || $tag === 'em') {
return 'em';
}
if ($tag === 'b' || $tag === 'strong') {
return 'strong';
}
}
return '';
}
public function setConfig(Configuration $config): void
{
$this->config = $config;
}
public function convert(ElementInterface $element): string
{
$tag = $this->getNormTag($element);
$value = $element->getValue();
if (! \trim($value)) {
return $value;
}
if ($tag === 'em') {
$style = $this->config->getOption('italic_style');
} else {
$style = $this->config->getOption('bold_style');
}
$prefix = \ltrim($value) !== $value ? ' ' : '';
$suffix = \rtrim($value) !== $value ? ' ' : '';
/* If this node is immediately preceded or followed by one of the same type don't emit
* the start or end $style, respectively. This prevents <em>foo</em><em>bar</em> from
* being converted to *foo**bar* which is incorrect. We want *foobar* instead.
*/
$preStyle = $this->getNormTag($element->getPreviousSibling()) === $tag ? '' : $style;
$postStyle = $this->getNormTag($element->getNextSibling()) === $tag ? '' : $style;
return $prefix . $preStyle . \trim($value) . $postStyle . $suffix;
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['em', 'i', 'strong', 'b'];
}
}

View file

@ -0,0 +1,48 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\Configuration;
use League\HTMLToMarkdown\ConfigurationAwareInterface;
use League\HTMLToMarkdown\ElementInterface;
class HardBreakConverter implements ConverterInterface, ConfigurationAwareInterface
{
/** @var Configuration */
protected $config;
public function setConfig(Configuration $config): void
{
$this->config = $config;
}
public function convert(ElementInterface $element): string
{
$return = $this->config->getOption('hard_break') ? "\n" : " \n";
$next = $element->getNext();
if ($next) {
$nextValue = $next->getValue();
if ($nextValue) {
if (\in_array(\substr($nextValue, 0, 2), ['- ', '* ', '+ '], true)) {
$parent = $element->getParent();
if ($parent && $parent->getTagName() === 'li') {
$return .= '\\';
}
}
}
}
return $return;
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['br'];
}
}

View file

@ -0,0 +1,62 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\Configuration;
use League\HTMLToMarkdown\ConfigurationAwareInterface;
use League\HTMLToMarkdown\ElementInterface;
class HeaderConverter implements ConverterInterface, ConfigurationAwareInterface
{
public const STYLE_ATX = 'atx';
public const STYLE_SETEXT = 'setext';
/** @var Configuration */
protected $config;
public function setConfig(Configuration $config): void
{
$this->config = $config;
}
public function convert(ElementInterface $element): string
{
$level = (int) \substr($element->getTagName(), 1, 1);
$style = $this->config->getOption('header_style', self::STYLE_SETEXT);
if (\strlen($element->getValue()) === 0) {
return "\n";
}
if (($level === 1 || $level === 2) && ! $element->isDescendantOf('blockquote') && $style === self::STYLE_SETEXT) {
return $this->createSetextHeader($level, $element->getValue());
}
return $this->createAtxHeader($level, $element->getValue());
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['h1', 'h2', 'h3', 'h4', 'h5', 'h6'];
}
private function createSetextHeader(int $level, string $content): string
{
$length = \function_exists('mb_strlen') ? \mb_strlen($content, 'utf-8') : \strlen($content);
$underline = $level === 1 ? '=' : '-';
return $content . "\n" . \str_repeat($underline, $length) . "\n\n";
}
private function createAtxHeader(int $level, string $content): string
{
$prefix = \str_repeat('#', $level) . ' ';
return $prefix . $content . "\n\n";
}
}

View file

@ -0,0 +1,23 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\ElementInterface;
class HorizontalRuleConverter implements ConverterInterface
{
public function convert(ElementInterface $element): string
{
return "---\n\n";
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['hr'];
}
}

View file

@ -0,0 +1,32 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\ElementInterface;
class ImageConverter implements ConverterInterface
{
public function convert(ElementInterface $element): string
{
$src = $element->getAttribute('src');
$alt = $element->getAttribute('alt');
$title = $element->getAttribute('title');
if ($title !== '') {
// No newlines added. <img> should be in a block-level element.
return '![' . $alt . '](' . $src . ' "' . $title . '")';
}
return '![' . $alt . '](' . $src . ')';
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['img'];
}
}

View file

@ -0,0 +1,77 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\Configuration;
use League\HTMLToMarkdown\ConfigurationAwareInterface;
use League\HTMLToMarkdown\ElementInterface;
class LinkConverter implements ConverterInterface, ConfigurationAwareInterface
{
/** @var Configuration */
protected $config;
public function setConfig(Configuration $config): void
{
$this->config = $config;
}
public function convert(ElementInterface $element): string
{
$href = $element->getAttribute('href');
$title = $element->getAttribute('title');
$text = \trim($element->getValue(), "\t\n\r\0\x0B");
if ($title !== '') {
$markdown = '[' . $text . '](' . $href . ' "' . $title . '")';
} elseif ($href === $text && $this->isValidAutolink($href)) {
$markdown = '<' . $href . '>';
} elseif ($href === 'mailto:' . $text && $this->isValidEmail($text)) {
$markdown = '<' . $text . '>';
} else {
if (\stristr($href, ' ')) {
$href = '<' . $href . '>';
}
$markdown = '[' . $text . '](' . $href . ')';
}
if (! $href) {
if ($this->shouldStrip()) {
$markdown = $text;
} else {
$markdown = \html_entity_decode($element->getChildrenAsString());
}
}
return $markdown;
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['a'];
}
private function isValidAutolink(string $href): bool
{
$useAutolinks = $this->config->getOption('use_autolinks');
return $useAutolinks && (\preg_match('/^[A-Za-z][A-Za-z0-9.+-]{1,31}:[^<>\x00-\x20]*/i', $href) === 1);
}
private function isValidEmail(string $email): bool
{
// Email validation is messy business, but this should cover most cases
return \filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}
private function shouldStrip(): bool
{
return $this->config->getOption('strip_placeholder_links') ?? false;
}
}

View file

@ -0,0 +1,23 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\ElementInterface;
class ListBlockConverter implements ConverterInterface
{
public function convert(ElementInterface $element): string
{
return $element->getValue() . "\n";
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['ol', 'ul'];
}
}

View file

@ -0,0 +1,70 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\Configuration;
use League\HTMLToMarkdown\ConfigurationAwareInterface;
use League\HTMLToMarkdown\ElementInterface;
class ListItemConverter implements ConverterInterface, ConfigurationAwareInterface
{
/** @var Configuration */
protected $config;
/** @var string|null */
protected $listItemStyle;
public function setConfig(Configuration $config): void
{
$this->config = $config;
}
public function convert(ElementInterface $element): string
{
// If parent is an ol, use numbers, otherwise, use dashes
$listType = ($parent = $element->getParent()) ? $parent->getTagName() : 'ul';
// Add spaces to start for nested list items
$level = $element->getListItemLevel();
$value = \trim(\implode("\n" . ' ', \explode("\n", \trim($element->getValue()))));
// If list item is the first in a nested list, add a newline before it
$prefix = '';
if ($level > 0 && $element->getSiblingPosition() === 1) {
$prefix = "\n";
}
if ($listType === 'ul') {
$listItemStyle = $this->config->getOption('list_item_style', '-');
$listItemStyleAlternate = $this->config->getOption('list_item_style_alternate');
if (! isset($this->listItemStyle)) {
$this->listItemStyle = $listItemStyleAlternate ?: $listItemStyle;
}
if ($listItemStyleAlternate && $level === 0 && $element->getSiblingPosition() === 1) {
$this->listItemStyle = $this->listItemStyle === $listItemStyle ? $listItemStyleAlternate : $listItemStyle;
}
return $prefix . $this->listItemStyle . ' ' . $value . "\n";
}
if ($listType === 'ol' && ($parent = $element->getParent()) && ($start = \intval($parent->getAttribute('start')))) {
$number = $start + $element->getSiblingPosition() - 1;
} else {
$number = $element->getSiblingPosition();
}
return $prefix . $number . '. ' . $value . "\n";
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['li'];
}
}

View file

@ -0,0 +1,108 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\ElementInterface;
class ParagraphConverter implements ConverterInterface
{
public function convert(ElementInterface $element): string
{
$value = $element->getValue();
$markdown = '';
$lines = \preg_split('/\r\n|\r|\n/', $value);
\assert($lines !== false);
foreach ($lines as $line) {
/*
* Some special characters need to be escaped based on the position that they appear
* The following function will deal with those special cases.
*/
$markdown .= $this->escapeSpecialCharacters($line);
$markdown .= "\n";
}
return \trim($markdown) !== '' ? \rtrim($markdown) . "\n\n" : '';
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['p'];
}
private function escapeSpecialCharacters(string $line): string
{
$line = $this->escapeFirstCharacters($line);
$line = $this->escapeOtherCharacters($line);
$line = $this->escapeOtherCharactersRegex($line);
return $line;
}
private function escapeFirstCharacters(string $line): string
{
$escapable = [
'>',
'- ',
'+ ',
'--',
'~~~',
'---',
'- - -',
];
foreach ($escapable as $i) {
if (\strpos(\ltrim($line), $i) === 0) {
// Found a character that must be escaped, adding a backslash before
return '\\' . \ltrim($line);
}
}
return $line;
}
private function escapeOtherCharacters(string $line): string
{
$escapable = [
'<!--',
];
foreach ($escapable as $i) {
if (($pos = \strpos($line, $i)) === false) {
continue;
}
// Found an escapable character, escaping it
$line = \substr_replace($line, '\\', $pos, 0);
}
return $line;
}
private function escapeOtherCharactersRegex(string $line): string
{
$regExs = [
// Match numbers ending on ')' or '.' that are at the beginning of the line.
// They will be escaped if immediately followed by a space or newline.
'/^[0-9]+(?=(\)|\.)( |$))/',
];
foreach ($regExs as $i) {
if (! \preg_match($i, $line, $match)) {
continue;
}
// Matched an escapable character, adding a backslash on the string before the offending character
$line = \substr_replace($line, '\\', \strlen($match[0]), 0);
}
return $line;
}
}

View file

@ -0,0 +1,56 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\ElementInterface;
class PreformattedConverter implements ConverterInterface
{
public function convert(ElementInterface $element): string
{
$preContent = \html_entity_decode($element->getChildrenAsString());
$preContent = \str_replace(['<pre>', '</pre>'], '', $preContent);
/*
* Checking for the code tag.
* Usually pre tags are used along with code tags. This conditional will check for already converted code tags,
* which use backticks, and if those backticks are at the beginning and at the end of the string it means
* there's no more information to convert.
*/
$firstBacktick = \strpos(\trim($preContent), '`');
$lastBacktick = \strrpos(\trim($preContent), '`');
if ($firstBacktick === 0 && $lastBacktick === \strlen(\trim($preContent)) - 1) {
return $preContent . "\n\n";
}
// If the execution reaches this point it means it's just a pre tag, with no code tag nested
// Empty lines are a special case
if ($preContent === '') {
return "```\n```\n\n";
}
// Normalizing new lines
$preContent = \preg_replace('/\r\n|\r|\n/', "\n", $preContent);
\assert(\is_string($preContent));
// Ensure there's a newline at the end
if (\strrpos($preContent, "\n") !== \strlen($preContent) - \strlen("\n")) {
$preContent .= "\n";
}
// Use three backticks
return "```\n" . $preContent . "```\n\n";
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['pre'];
}
}

View file

@ -0,0 +1,113 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\Configuration;
use League\HTMLToMarkdown\ConfigurationAwareInterface;
use League\HTMLToMarkdown\ElementInterface;
use League\HTMLToMarkdown\PreConverterInterface;
class TableConverter implements ConverterInterface, PreConverterInterface, ConfigurationAwareInterface
{
/** @var Configuration */
protected $config;
public function setConfig(Configuration $config): void
{
$this->config = $config;
}
/** @var array<string, string> */
private static $alignments = [
'left' => ':--',
'right' => '--:',
'center' => ':-:',
];
/** @var array<int, string>|null */
private $columnAlignments = [];
/** @var string|null */
private $caption = null;
public function preConvert(ElementInterface $element): void
{
$tag = $element->getTagName();
// Only table cells and caption are allowed to contain content.
// Remove all text between other table elements.
if ($tag === 'th' || $tag === 'td' || $tag === 'caption') {
return;
}
foreach ($element->getChildren() as $child) {
if ($child->isText()) {
$child->setFinalMarkdown('');
}
}
}
public function convert(ElementInterface $element): string
{
$value = $element->getValue();
switch ($element->getTagName()) {
case 'table':
$this->columnAlignments = [];
if ($this->caption) {
$side = $this->config->getOption('table_caption_side');
if ($side === 'top') {
$value = $this->caption . "\n" . $value;
} elseif ($side === 'bottom') {
$value .= $this->caption;
}
$this->caption = null;
}
return $value . "\n";
case 'caption':
$this->caption = \trim($value);
return '';
case 'tr':
$value .= "|\n";
if ($this->columnAlignments !== null) {
$value .= '|' . \implode('|', $this->columnAlignments) . "|\n";
$this->columnAlignments = null;
}
return $value;
case 'th':
case 'td':
if ($this->columnAlignments !== null) {
$align = $element->getAttribute('align');
$this->columnAlignments[] = self::$alignments[$align] ?? '---';
}
$value = \str_replace("\n", ' ', $value);
$value = \str_replace('|', $this->config->getOption('table_pipe_escape') ?? '\|', $value);
return '| ' . \trim($value) . ' ';
case 'thead':
case 'tbody':
case 'tfoot':
case 'colgroup':
case 'col':
return $value;
default:
return '';
}
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['table', 'tr', 'th', 'td', 'thead', 'tbody', 'tfoot', 'colgroup', 'col', 'caption'];
}
}

View file

@ -0,0 +1,48 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown\Converter;
use League\HTMLToMarkdown\ElementInterface;
class TextConverter implements ConverterInterface
{
public function convert(ElementInterface $element): string
{
$markdown = $element->getValue();
// Remove leftover \n at the beginning of the line
$markdown = \ltrim($markdown, "\n");
// Replace sequences of invisible characters with spaces
$markdown = \preg_replace('~\s+~u', ' ', $markdown);
\assert(\is_string($markdown));
// Escape the following characters: '*', '_', '[', ']' and '\'
if (($parent = $element->getParent()) && $parent->getTagName() !== 'div') {
$markdown = \preg_replace('~([*_\\[\\]\\\\])~u', '\\\\$1', $markdown);
\assert(\is_string($markdown));
}
$markdown = \preg_replace('~^#~u', '\\\\#', $markdown);
\assert(\is_string($markdown));
if ($markdown === ' ') {
$next = $element->getNext();
if (! $next || $next->isBlock()) {
$markdown = '';
}
}
return \htmlspecialchars($markdown, ENT_NOQUOTES, 'UTF-8');
}
/**
* @return string[]
*/
public function getSupportedTags(): array
{
return ['#text'];
}
}

View file

@ -0,0 +1,235 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown;
class Element implements ElementInterface
{
/** @var \DOMNode */
protected $node;
/** @var ElementInterface|null */
private $nextCached;
/** @var \DOMNode|null */
private $previousSiblingCached;
public function __construct(\DOMNode $node)
{
$this->node = $node;
$this->previousSiblingCached = $this->node->previousSibling;
}
public function isBlock(): bool
{
switch ($this->getTagName()) {
case 'blockquote':
case 'body':
case 'div':
case 'h1':
case 'h2':
case 'h3':
case 'h4':
case 'h5':
case 'h6':
case 'hr':
case 'html':
case 'li':
case 'p':
case 'ol':
case 'ul':
return true;
default:
return false;
}
}
public function isText(): bool
{
return $this->getTagName() === '#text';
}
public function isWhitespace(): bool
{
return $this->getTagName() === '#text' && \trim($this->getValue()) === '';
}
public function getTagName(): string
{
return $this->node->nodeName;
}
public function getValue(): string
{
return $this->node->nodeValue ?? '';
}
public function hasParent(): bool
{
return $this->node->parentNode !== null;
}
public function getParent(): ?ElementInterface
{
return $this->node->parentNode ? new self($this->node->parentNode) : null;
}
public function getNextSibling(): ?ElementInterface
{
return $this->node->nextSibling !== null ? new self($this->node->nextSibling) : null;
}
public function getPreviousSibling(): ?ElementInterface
{
return $this->previousSiblingCached !== null ? new self($this->previousSiblingCached) : null;
}
public function hasChildren(): bool
{
return $this->node->hasChildNodes();
}
/**
* @return ElementInterface[]
*/
public function getChildren(): array
{
$ret = [];
foreach ($this->node->childNodes as $node) {
$ret[] = new self($node);
}
return $ret;
}
public function getNext(): ?ElementInterface
{
if ($this->nextCached === null) {
$nextNode = $this->getNextNode($this->node);
if ($nextNode !== null) {
$this->nextCached = new self($nextNode);
}
}
return $this->nextCached;
}
private function getNextNode(\DomNode $node, bool $checkChildren = true): ?\DomNode
{
if ($checkChildren && $node->firstChild) {
return $node->firstChild;
}
if ($node->nextSibling) {
return $node->nextSibling;
}
if ($node->parentNode) {
return $this->getNextNode($node->parentNode, false);
}
return null;
}
/**
* @param string[]|string $tagNames
*/
public function isDescendantOf($tagNames): bool
{
if (! \is_array($tagNames)) {
$tagNames = [$tagNames];
}
for ($p = $this->node->parentNode; $p !== false; $p = $p->parentNode) {
if ($p === null) {
return false;
}
if (\in_array($p->nodeName, $tagNames, true)) {
return true;
}
}
return false;
}
public function setFinalMarkdown(string $markdown): void
{
if ($this->node->ownerDocument === null) {
throw new \RuntimeException('Unowned node');
}
if ($this->node->parentNode === null) {
throw new \RuntimeException('Cannot setFinalMarkdown() on a node without a parent');
}
$markdownNode = $this->node->ownerDocument->createTextNode($markdown);
$this->node->parentNode->replaceChild($markdownNode, $this->node);
}
public function getChildrenAsString(): string
{
return $this->node->C14N();
}
public function getSiblingPosition(): int
{
$position = 0;
$parent = $this->getParent();
if ($parent === null) {
return $position;
}
// Loop through all nodes and find the given $node
foreach ($parent->getChildren() as $currentNode) {
if (! $currentNode->isWhitespace()) {
$position++;
}
// TODO: Need a less-buggy way of comparing these
// Perhaps we can somehow ensure that we always have the exact same object and use === instead?
if ($this->equals($currentNode)) {
break;
}
}
return $position;
}
public function getListItemLevel(): int
{
$level = 0;
$parent = $this->getParent();
while ($parent !== null && $parent->hasParent()) {
if ($parent->getTagName() === 'li') {
$level++;
}
$parent = $parent->getParent();
}
return $level;
}
public function getAttribute(string $name): string
{
if ($this->node instanceof \DOMElement) {
return $this->node->getAttribute($name);
}
return '';
}
public function equals(ElementInterface $element): bool
{
if ($element instanceof self) {
return $element->node === $this->node;
}
return false;
}
}
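
Note on `getNextNode()` above: it walks the DOM in document order (first child, then next sibling, then the nearest ancestor's next sibling). A minimal standalone sketch of that order follows, assuming the DOM extension is available; `nextInDocumentOrder()` is a hypothetical helper written for illustration and is not part of the library.

```php
<?php
declare(strict_types=1);

// Hypothetical standalone helper mirroring the traversal order of getNextNode().
function nextInDocumentOrder(\DOMNode $node, bool $checkChildren = true): ?\DOMNode
{
    // 1. Descend into the first child, unless we are climbing back up.
    if ($checkChildren && $node->firstChild !== null) {
        return $node->firstChild;
    }
    // 2. Otherwise move sideways to the next sibling.
    if ($node->nextSibling !== null) {
        return $node->nextSibling;
    }
    // 3. Otherwise climb to the parent and look for its next sibling,
    //    without descending into the parent's children again.
    if ($node->parentNode !== null) {
        return nextInDocumentOrder($node->parentNode, false);
    }
    return null;
}

$document = new \DOMDocument();
$document->loadXML('<ul><li><b>one</b></li><li>two</li></ul>');

// Collect node names in the order the helper visits them.
$names = [];
for ($node = $document->documentElement; $node !== null; $node = nextInDocumentOrder($node)) {
    $names[] = $node->nodeName;
}
echo implode(' ', $names), "\n";
```

The `$checkChildren = false` argument on the way up is what keeps the walk from re-entering a subtree it has already finished.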

View file

@ -0,0 +1,50 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown;
interface ElementInterface
{
public function isBlock(): bool;
public function isText(): bool;
public function isWhitespace(): bool;
public function getTagName(): string;
public function getValue(): string;
public function hasParent(): bool;
public function getParent(): ?ElementInterface;
public function getNextSibling(): ?ElementInterface;
public function getPreviousSibling(): ?ElementInterface;
/**
* @param string|string[] $tagNames
*/
public function isDescendantOf($tagNames): bool;
public function hasChildren(): bool;
/**
* @return ElementInterface[]
*/
public function getChildren(): array;
public function getNext(): ?ElementInterface;
public function getSiblingPosition(): int;
public function getChildrenAsString(): string;
public function setFinalMarkdown(string $markdown): void;
public function getListItemLevel(): int;
public function getAttribute(string $name): string;
}

View file

@ -0,0 +1,92 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown;
use League\HTMLToMarkdown\Converter\BlockquoteConverter;
use League\HTMLToMarkdown\Converter\CodeConverter;
use League\HTMLToMarkdown\Converter\CommentConverter;
use League\HTMLToMarkdown\Converter\ConverterInterface;
use League\HTMLToMarkdown\Converter\DefaultConverter;
use League\HTMLToMarkdown\Converter\DivConverter;
use League\HTMLToMarkdown\Converter\EmphasisConverter;
use League\HTMLToMarkdown\Converter\HardBreakConverter;
use League\HTMLToMarkdown\Converter\HeaderConverter;
use League\HTMLToMarkdown\Converter\HorizontalRuleConverter;
use League\HTMLToMarkdown\Converter\ImageConverter;
use League\HTMLToMarkdown\Converter\LinkConverter;
use League\HTMLToMarkdown\Converter\ListBlockConverter;
use League\HTMLToMarkdown\Converter\ListItemConverter;
use League\HTMLToMarkdown\Converter\ParagraphConverter;
use League\HTMLToMarkdown\Converter\PreformattedConverter;
use League\HTMLToMarkdown\Converter\TextConverter;
final class Environment
{
/** @var Configuration */
protected $config;
/** @var ConverterInterface[] */
protected $converters = [];
/**
* @param array<string, mixed> $config
*/
public function __construct(array $config = [])
{
$this->config = new Configuration($config);
$this->addConverter(new DefaultConverter());
}
public function getConfig(): Configuration
{
return $this->config;
}
public function addConverter(ConverterInterface $converter): void
{
if ($converter instanceof ConfigurationAwareInterface) {
$converter->setConfig($this->config);
}
foreach ($converter->getSupportedTags() as $tag) {
$this->converters[$tag] = $converter;
}
}
public function getConverterByTag(string $tag): ConverterInterface
{
if (isset($this->converters[$tag])) {
return $this->converters[$tag];
}
return $this->converters[DefaultConverter::DEFAULT_CONVERTER];
}
/**
* @param array<string, mixed> $config
*/
public static function createDefaultEnvironment(array $config = []): Environment
{
$environment = new static($config);
$environment->addConverter(new BlockquoteConverter());
$environment->addConverter(new CodeConverter());
$environment->addConverter(new CommentConverter());
$environment->addConverter(new DivConverter());
$environment->addConverter(new EmphasisConverter());
$environment->addConverter(new HardBreakConverter());
$environment->addConverter(new HeaderConverter());
$environment->addConverter(new HorizontalRuleConverter());
$environment->addConverter(new ImageConverter());
$environment->addConverter(new LinkConverter());
$environment->addConverter(new ListBlockConverter());
$environment->addConverter(new ListItemConverter());
$environment->addConverter(new ParagraphConverter());
$environment->addConverter(new PreformattedConverter());
$environment->addConverter(new TextConverter());
return $environment;
}
}
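
Note on `addConverter()` and `getConverterByTag()` above: the environment is a tag-to-converter map with a fallback entry, and a later registration for a tag replaces an earlier one. The sketch below illustrates that lookup pattern in isolation; `TagRegistry`, its `FALLBACK` key, and the closures standing in for converter objects are all hypothetical and not part of the library.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the converter registry; closures replace ConverterInterface objects.
final class TagRegistry
{
    public const FALLBACK = '_fallback'; // assumed key name, chosen for this sketch only

    /** @var array<string, callable> */
    private array $handlers = [];

    public function __construct(callable $fallback)
    {
        $this->handlers[self::FALLBACK] = $fallback;
    }

    /** @param string[] $tags */
    public function add(array $tags, callable $handler): void
    {
        foreach ($tags as $tag) {
            // A later registration for the same tag overrides the earlier one.
            $this->handlers[$tag] = $handler;
        }
    }

    public function get(string $tag): callable
    {
        // Unknown tags fall through to the fallback handler.
        return $this->handlers[$tag] ?? $this->handlers[self::FALLBACK];
    }
}

$registry = new TagRegistry(static fn (string $text): string => $text);
$registry->add(['em', 'i'], static fn (string $text): string => '*' . $text . '*');
$registry->add(['strong', 'b'], static fn (string $text): string => '**' . $text . '**');

$emphasis = $registry->get('i')('hi');   // handled by the em/i closure
$unknown = $registry->get('span')('hi'); // no handler registered, fallback is used
```

Registering one handler under several tags keeps the lookup a single array access, which is presumably why the environment stores converters per tag rather than per converter.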

View file

@ -0,0 +1,277 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown;
/**
* A helper class to convert HTML to Markdown.
*
* @author Colin O'Dell <colinodell@gmail.com>
* @author Nick Cernis <nick@cern.is>
*
* @link https://github.com/thephpleague/html-to-markdown/ Latest version on GitHub.
*
* @license http://www.opensource.org/licenses/mit-license.php MIT
*/
class HtmlConverter implements HtmlConverterInterface
{
/** @var Environment */
protected $environment;
/**
* Constructor
*
* @param Environment|array<string, mixed> $options Environment object or configuration options
*/
public function __construct($options = [])
{
if ($options instanceof Environment) {
$this->environment = $options;
} elseif (\is_array($options)) {
$defaults = [
'header_style' => 'setext', // Set to 'atx' to output H1 and H2 headers as # Header1 and ## Header2
'suppress_errors' => true, // Set to false to show warnings when loading malformed HTML
'strip_tags' => false, // Set to true to strip tags that don't have markdown equivalents. N.B. Strips tags, not their content. Useful to clean MS Word HTML output.
'strip_placeholder_links' => false, // Set to true to remove <a> that doesn't have href.
'bold_style' => '**', // DEPRECATED: Set to '__' if you prefer the underscore style
'italic_style' => '*', // DEPRECATED: Set to '_' if you prefer the underscore style
'remove_nodes' => '', // space-separated list of dom nodes that should be removed. example: 'meta style script'
'hard_break' => false, // Set to true to turn <br> into `\n` instead of `  \n` (two trailing spaces)
'list_item_style' => '-', // Set the default character for each <li> in a <ul>. Can be '-', '*', or '+'
'preserve_comments' => false, // Set to true to preserve comments, or set to an array of strings to preserve specific comments
'use_autolinks' => true, // Set to true to use simple link syntax if possible. Will always use []() if set to false
'table_pipe_escape' => '\|', // Replacement string for pipe characters inside markdown table cells
'table_caption_side' => 'top', // Set to 'top' or 'bottom' to show <caption> content before or after table, null to suppress
];
$this->environment = Environment::createDefaultEnvironment($defaults);
$this->environment->getConfig()->merge($options);
}
}
public function getEnvironment(): Environment
{
return $this->environment;
}
public function getConfig(): Configuration
{
return $this->environment->getConfig();
}
/**
* Convert
*
* @see HtmlConverter::convert
*
* @return string The Markdown version of the html
*/
public function __invoke(string $html): string
{
return $this->convert($html);
}
/**
* Convert
*
* Loads the HTML into a DOMDocument, converts the tree via convertChildren(), and sanitizes the result
*
* @return string The Markdown version of the html
*
* @throws \InvalidArgumentException|\RuntimeException
*/
public function convert(string $html): string
{
if (\trim($html) === '') {
return '';
}
$document = $this->createDOMDocument($html);
// Work on the entire DOM tree (including head and body)
if (! ($root = $document->getElementsByTagName('html')->item(0))) {
throw new \InvalidArgumentException('Invalid HTML was provided');
}
$rootElement = new Element($root);
$this->convertChildren($rootElement);
// Store the now-modified DOMDocument as a string
$markdown = $document->saveHTML();
if ($markdown === false) {
throw new \RuntimeException('Unknown error occurred during HTML to Markdown conversion');
}
return $this->sanitize($markdown);
}
private function createDOMDocument(string $html): \DOMDocument
{
$document = new \DOMDocument();
if ($this->getConfig()->getOption('suppress_errors')) {
// Suppress conversion errors (from http://bit.ly/pCCRSX)
\libxml_use_internal_errors(true);
}
// Hack to load utf-8 HTML (from http://bit.ly/pVDyCt)
$document->loadHTML('<?xml encoding="UTF-8">' . $html);
$document->encoding = 'UTF-8';
$this->replaceMisplacedComments($document);
if ($this->getConfig()->getOption('suppress_errors')) {
\libxml_clear_errors();
}
return $document;
}
/**
* Finds any comment nodes outside <html> element and moves them into <body>.
*
* @see https://github.com/thephpleague/html-to-markdown/issues/212
* @see https://3v4l.org/7bC33
*/
private function replaceMisplacedComments(\DOMDocument $document): void
{
// Find any comment nodes at the root of the document.
$misplacedComments = (new \DOMXPath($document))->query('/comment()');
if ($misplacedComments === false) {
return;
}
$body = $document->getElementsByTagName('body')->item(0);
if ($body === null) {
return;
}
// Loop over comment nodes in reverse so we put them inside <body> in
// their original order.
for ($index = $misplacedComments->length - 1; $index >= 0; $index--) {
if ($body->firstChild === null) {
$body->insertBefore($misplacedComments[$index]);
} else {
$body->insertBefore($misplacedComments[$index], $body->firstChild);
}
}
}
/**
* Convert Children
*
* Recursive function to drill into the DOM and convert each node into Markdown from the inside out.
*
* Finds the children of each node and converts them to #text nodes containing their Markdown equivalent,
* starting with the innermost element and working up to the outermost element.
*/
private function convertChildren(ElementInterface $element): void
{
// Don't convert HTML code inside <code> and <pre> blocks to Markdown - that should stay as HTML
// except if the current node is a code tag, which needs to be converted by the CodeConverter.
if ($element->isDescendantOf(['pre', 'code']) && $element->getTagName() !== 'code') {
return;
}
// Give converter a chance to inspect/modify the DOM before children are converted
$converter = $this->environment->getConverterByTag($element->getTagName());
if ($converter instanceof PreConverterInterface) {
$converter->preConvert($element);
}
// If the node has children, convert those to Markdown first
if ($element->hasChildren()) {
foreach ($element->getChildren() as $child) {
$this->convertChildren($child);
}
}
// Now that child nodes have been converted, convert the original node
$markdown = $this->convertToMarkdown($element);
// Create a DOM text node containing the Markdown equivalent of the original node
// Replace the old node, e.g. '<h3>Title</h3>', with the new text node, e.g. '### Title'
$element->setFinalMarkdown($markdown);
}
/**
* Convert to Markdown
*
* Converts an individual node into a #text node containing a string of its Markdown equivalent.
*
* Example: An <h3> node with text content of 'Title' becomes a text node with content of '### Title'
*
* @return string The converted HTML as Markdown
*/
protected function convertToMarkdown(ElementInterface $element): string
{
$tag = $element->getTagName();
// Strip nodes named in remove_nodes
$tagsToRemove = \explode(' ', $this->getConfig()->getOption('remove_nodes') ?? '');
if (\in_array($tag, $tagsToRemove, true)) {
return '';
}
$converter = $this->environment->getConverterByTag($tag);
return $converter->convert($element);
}
protected function sanitize(string $markdown): string
{
$markdown = \html_entity_decode($markdown, ENT_QUOTES, 'UTF-8');
$markdown = \preg_replace('/<!DOCTYPE [^>]+>/', '', $markdown); // Strip doctype declaration
\assert($markdown !== null);
$markdown = \trim($markdown); // Remove whitespace at the beginning and end of the HTML
/*
* Removing unwanted tags. Tags should be added to the array in the order they are expected.
* XML, html and body opening tags should be in that order. Same case with closing tags
*/
$unwanted = ['<?xml encoding="UTF-8">', '<html>', '</html>', '<body>', '</body>', '<head>', '</head>', '&#xD;'];
foreach ($unwanted as $tag) {
if (\strpos($tag, '/') === false) {
// Opening tags
if (\strpos($markdown, $tag) === 0) {
$markdown = \substr($markdown, \strlen($tag));
}
} else {
// Closing tags
if (\strpos($markdown, $tag) === \strlen($markdown) - \strlen($tag)) {
$markdown = \substr($markdown, 0, -\strlen($tag));
}
}
}
return \trim($markdown, "\n\r\0\x0B");
}
/**
* Pass a series of key-value pairs in an array; these will be passed
* through the config and set.
* The advantage of this is that it can allow for static use (e.g. in Laravel).
* An example being:
*
* HtmlConverter::setOptions(['strip_tags' => true])->convert('<h1>test</h1>');
*
* @param array<string, mixed> $options
*
* @return $this
*/
public function setOptions(array $options)
{
$config = $this->getConfig();
foreach ($options as $key => $option) {
$config->setOption($key, $option);
}
return $this;
}
}
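
Note on `convertChildren()` above: the conversion runs from the inside out, replacing each element with a text node that holds its Markdown once its children have been handled. The following is a heavily reduced sketch of that idea using only the DOM extension; `toMarkdown()` is a hypothetical function that handles just `<h3>` and `<em>`, and it does not reproduce the library's converters, configuration, or sanitizing step.

```php
<?php
declare(strict_types=1);

// Hypothetical, heavily reduced version of the inside-out pass: only <h3> and <em> are handled.
function toMarkdown(\DOMNode $node): void
{
    if ($node->hasChildNodes()) {
        // Copy the child list first: replacing nodes while iterating the live
        // DOMNodeList can cause entries to be skipped.
        foreach (iterator_to_array($node->childNodes) as $child) {
            toMarkdown($child);
        }
    }

    if ($node->parentNode === null || $node->ownerDocument === null) {
        return; // nothing to replace this node in
    }

    // By now every handled descendant is already a plain text node.
    $inner = $node->textContent;

    switch ($node->nodeName) {
        case 'h3':
            $markdown = '### ' . $inner . "\n\n";
            break;
        case 'em':
            $markdown = '*' . $inner . '*';
            break;
        default:
            return; // leave text nodes and unhandled tags untouched
    }

    $textNode = $node->ownerDocument->createTextNode($markdown);
    $node->parentNode->replaceChild($textNode, $node);
}

$document = new \DOMDocument();
$document->loadXML('<div><h3>A <em>short</em> title</h3></div>');
toMarkdown($document->documentElement);

$result = $document->documentElement->textContent;
echo $result;
```

Because the `<em>` is rewritten before its parent `<h3>`, the heading only has to read its own text content to pick up the already-converted emphasis.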

View file

@ -0,0 +1,26 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown;
/**
* Interface for an HTML-to-Markdown converter.
*
* @author Colin O'Dell <colinodell@gmail.com>
*
* @link https://github.com/thephpleague/html-to-markdown/ Latest version on GitHub.
*
* @license http://www.opensource.org/licenses/mit-license.php MIT
*/
interface HtmlConverterInterface
{
/**
* Convert the given $html to Markdown
*
* @return string The Markdown version of the html
*
* @throws \InvalidArgumentException
*/
public function convert(string $html): string;
}

View file

@ -0,0 +1,10 @@
<?php
declare(strict_types=1);
namespace League\HTMLToMarkdown;
interface PreConverterInterface
{
public function preConvert(ElementInterface $element): void;
}