Information for build python-html-text-0.6.2-1.fc41
ID | 343319 | |||||||
---|---|---|---|---|---|---|---|---|
Package Name | python-html-text | |||||||
Version | 0.6.2 | |||||||
Release | 1.fc41 | |||||||
Epoch | ||||||||
Summary | Extract text from HTML | |||||||
Description | How is html_text different from .xpath('//text()') from LXML or .get_text() from Beautiful Soup? - Text extracted with html_text does not contain inline styles, javascript, comments and other text that is not normally visible to users; - html_text normalizes whitespace, but in a way smarter than .xpath('normalize-space()'), adding spaces around inline elements (which are often used as block elements in html markup), and trying to avoid adding extra spaces for punctuation; - html-text can add newlines (e.g. after headers or paragraphs), so that the output text looks more like how it is rendered in browsers. | |||||||
Built by | davidlt | |||||||
State | complete | |||||||
Volume | DEFAULT | |||||||
Started | Tue, 12 Nov 2024 07:53:13 UTC | |||||||
Completed | Tue, 12 Nov 2024 07:53:13 UTC | |||||||
Tags | ||||||||
RPMs | ||||||||
Changelog | * Fri Oct 18 2024 Benson Muite <benson_muite@emailplus.org> - 0.6.2-1 - Initial packaging |
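
The behaviour listed in the Description row (dropping scripts, styles and comments, normalizing whitespace, and breaking lines after headers and paragraphs) can be illustrated with a minimal sketch that uses only the Python standard library. This is not the packaged library's implementation or API: the names `VisibleTextParser`, `visible_text`, `INVISIBLE_TAGS` and `BLOCK_TAGS` are hypothetical, the tag sets are assumptions chosen for the example, and the smarter punctuation handling mentioned above is not attempted.

```python
import re
from html.parser import HTMLParser

# Assumed tag sets for illustration only; the real package may differ.
INVISIBLE_TAGS = {"script", "style"}
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br"}


class VisibleTextParser(HTMLParser):
    """Collect text a reader would see, skipping script/style content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.hidden_depth = 0  # > 0 while inside an invisible element

    def handle_starttag(self, tag, attrs):
        if tag in INVISIBLE_TAGS:
            self.hidden_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")  # line break before block content

    def handle_endtag(self, tag):
        if tag in INVISIBLE_TAGS:
            self.hidden_depth = max(0, self.hidden_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")  # line break after block content

    def handle_data(self, data):
        if self.hidden_depth == 0:
            # Collapse runs of whitespace (including source newlines).
            self.parts.append(re.sub(r"\s+", " ", data))

    # HTML comments reach handle_comment, which does nothing by default,
    # so they never appear in the output.


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.parts).split("\n")
    cleaned = (re.sub(r" +", " ", line).strip() for line in lines)
    return "\n".join(line for line in cleaned if line)


sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Title</h1><!-- hidden --><p>Hello   <b>world</b>!</p>"
    "<script>var x = 1;</script></body></html>"
)
print(visible_text(sample))
```

For the sample input, the stylesheet, the comment and the script are dropped, and the header and paragraph end up on separate lines. Inline elements such as `<b>` are left in the text flow without inserted spaces, which is a simplification relative to the behaviour described above.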