Information for RPM python-html-text-0.6.2-1.fc41.src.rpm
ID | 1473643
---|---
Name | python-html-text
Version | 0.6.2
Release | 1.fc41
Epoch |
Arch | src
Summary | Extract text from HTML
Description | How is html_text different from .xpath('//text()') from LXML or .get_text() from Beautiful Soup? - Text extracted with html_text does not contain inline styles, JavaScript, comments and other text that is not normally visible to users; - html_text normalizes whitespace, but in a smarter way than .xpath('normalize-space()'), adding spaces around inline elements (which are often used as block elements in HTML markup) and trying to avoid adding extra spaces before punctuation; - html_text can add newlines (e.g. after headers or paragraphs), so that the output text looks more like how it is rendered in browsers.
Build Time | 2024-10-25 03:31:48 GMT
Size | 73.27 KB
a35d76505236a2d99df3c9ce27e91560 |
License | MIT
Provides |
Obsoletes | No Obsoletes
Conflicts | No Conflicts
Requires |
Recommends | No Recommends
Suggests | No Suggests
Supplements | No Supplements
Enhances | No Enhances
Files |
Component of | No Buildroots
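
The three behaviours listed in the Description (dropping invisible text, whitespace normalization with spaces around inline elements, and newlines after block elements) can be illustrated with a minimal standard-library sketch. This is not the html_text implementation: the class `VisibleTextParser`, the function `visible_text`, and the tag sets `HIDDEN_TAGS` and `BLOCK_TAGS` are hypothetical names chosen for illustration, and the tag lists are an assumed small subset.

```python
import re
from html.parser import HTMLParser

# Assumption: a small illustrative subset of tags, not html_text's real lists.
HIDDEN_TAGS = {"script", "style"}                       # content never shown to users
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "li", "br"}  # rendered on their own line


class VisibleTextParser(HTMLParser):
    """Collects user-visible text; comments are dropped by the default handler."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.hidden_depth = 0
        self.chunks = []

    def _boundary(self, tag):
        # Block tags become line breaks, every other tag becomes a space.
        self.chunks.append("\n" if tag in BLOCK_TAGS else " ")

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1
        elif self.hidden_depth == 0:
            self._boundary(tag)

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS:
            self.hidden_depth = max(0, self.hidden_depth - 1)
        elif self.hidden_depth == 0:
            self._boundary(tag)

    def handle_data(self, data):
        if self.hidden_depth == 0:
            # Collapse source whitespace so "\n" only ever marks a block boundary.
            self.chunks.append(re.sub(r"\s+", " ", data))


def visible_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks)
    lines = [re.sub(r"\s+", " ", line).strip() for line in raw.split("\n")]
    text = "\n".join(line for line in lines if line)
    # Remove the extra space that inline-tag boundaries leave before punctuation.
    return re.sub(r" +([,.;:!?])", r"\1", text)


sample = (
    "<h1>Title</h1><p>Hello <b>world</b>, again.</p>"
    "<script>var x = 1;</script><!-- note --><style>p{}</style>"
)
print(visible_text(sample))
```

With the package itself installed, the upstream entry point is `html_text.extract_text(html)`; the sketch above only mirrors the ideas in the Description and makes no claim about matching that function's exact output.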