Parsing: none of the Beautiful Soup parsers finds all matches (Python)
I am trying to do a simple parse of an HTML file that contains unit test results in its body:
    import re
    import urllib2
    from bs4 import BeautifulSoup
    url = urllib2.urlopen('file:/randomstuff/results.txt').read()
    soup = BeautifulSoup(url, 'lxml')
    save = soup.body.findAll(text=re.compile("failed"))
The best I can get is 1 instance of the text (when there are closer to 50), and only with lxml and html5lib. The other parsers find none. Is there any way I can work around the broken HTML?
An example of the body looks like this:
********* finished testing of logleveltypetest *********
********* start testing of apploggerconfigtest *********
config: using qtest library 4.8.1, qt 4.8.1
pass : inittestcase
pass : testsetfromenvironment
pass : cleanuptestcase
totals: 3 passed, 0 failed, 0 skipped
The HTML looks like this:
    <html>
    <head></head>
    <body>
    <pre style="word-wrap: break-word; white-space: pre-wrap;">
    "common unit test results" ... ...
    </pre>
    </body>
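One possible explanation, offered as a sketch rather than a confirmed diagnosis: `findAll(text=...)` returns the text nodes that match, not the individual occurrences inside them. If the whole report sits in a single `<pre>` element, it is one text node, so at most one result comes back no matter how many times "failed" appears. A workaround is to pull the text out and search it line by line. The helper names `lines_matching` and `failed_lines_from_html` below are made up for illustration, and `html.parser` is only an assumed default parser.

```python
import re


def lines_matching(text, pattern):
    """Return every line of `text` that contains a match for `pattern`."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


def failed_lines_from_html(html, parser="html.parser"):
    """Extract the <pre> text with Beautiful Soup, then search it line by line."""
    # Imported here so lines_matching() stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, parser)
    pre = soup.find("pre")
    # Fall back to the whole document text if no <pre> tag is found.
    text = pre.get_text() if pre is not None else soup.get_text()
    return lines_matching(text, r"failed")


if __name__ == "__main__":
    # Sample lines taken from the report excerpt in the question.
    sample = (
        "pass : inittestcase\n"
        "pass : testsetfromenvironment\n"
        "pass : cleanuptestcase\n"
        "totals: 3 passed, 0 failed, 0 skipped\n"
    )
    for line in lines_matching(sample, r"failed"):
        print(line)
```

Note that the pattern `failed` also matches totals lines that report `0 failed`. To list only suites with real failures, a stricter pattern such as `[1-9]\d* failed` could be passed instead.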