Information for RPM python3-html-text-0.6.2-1.fc41.noarch.rpm
ID | 1473644 | ||||||||
---|---|---|---|---|---|---|---|---|---|
Name | python3-html-text | ||||||||
Version | 0.6.2 | ||||||||
Release | 1.fc41 | ||||||||
Epoch | |||||||||
Arch | noarch | ||||||||
Summary | Extract text from HTML | ||||||||
Description | How is html_text different from .xpath('//text()') from lxml or .get_text() from Beautiful Soup? - Text extracted with html_text does not contain inline styles, JavaScript, comments, or other text that is not normally visible to users; - html_text normalizes whitespace, but more intelligently than .xpath('normalize-space()'), adding spaces around inline elements (which are often used as block elements in HTML markup) and trying to avoid adding extra spaces around punctuation; - html_text can add newlines (e.g. after headers or paragraphs), so that the output text looks more like how it is rendered in browsers. | ||||||||
Build Time | 2024-10-25 03:31:48 GMT | ||||||||
Size | 19.28 KB | ||||||||
Payload Hash | ecf2c194e8e7aba96fc29a8e664795a9 | ||||||||
License | MIT | ||||||||
Provides | |||||||||
Obsoletes | No Obsoletes | ||||||||
Conflicts | No Conflicts | ||||||||
Requires | |||||||||
Recommends | No Recommends | ||||||||
Suggests | No Suggests | ||||||||
Supplements | No Supplements | ||||||||
Enhances | No Enhances | ||||||||
Files | |||||||||
Component of | No Buildroots |
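
The Description above names several behaviours: dropping text that is not visible to users, normalizing whitespace, and adding newlines after headers or paragraphs. The following is a minimal standard-library sketch of those three ideas only. It is not the package's implementation; `VisibleTextParser`, `visible_text`, and the two tag sets are hypothetical names chosen for illustration, and the sketch does not attempt the spacing logic for inline elements and punctuation.

```python
import re
from html.parser import HTMLParser

# Assumption: illustrative tag sets, not the lists used by html_text itself.
HIDDEN_TAGS = {"script", "style", "noscript", "head"}
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br", "tr"}


class VisibleTextParser(HTMLParser):
    """Collect normally visible text, breaking lines at block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._hidden_depth = 0  # > 0 while inside a tag from HIDDEN_TAGS

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1
        elif tag in BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS:
            self._hidden_depth = max(0, self._hidden_depth - 1)
        elif tag in BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Comments never arrive here: HTMLParser routes them to
        # handle_comment, which does nothing by default.
        if self._hidden_depth == 0:
            # Collapse source whitespace (including newlines) to one space.
            self._chunks.append(re.sub(r"\s+", " ", data))

    def text(self):
        lines = "".join(self._chunks).split("\n")
        cleaned = (re.sub(r" +", " ", line).strip() for line in lines)
        return "\n".join(line for line in cleaned if line)


def visible_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = (
        "<html><head><title>t</title><style>p{color:red}</style></head>"
        "<body><h1>Title</h1><!-- note --><p>Hello   <b>world</b>!</p>"
        "<script>var x = 1;</script></body></html>"
    )
    print(visible_text(sample))
```

With the packaged library installed, the equivalent call is `html_text.extract_text(html)`, which also covers the inline-element and punctuation spacing that this sketch leaves out.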