Parsing: none of the BeautifulSoup parsers find all matches (Python)


I'm trying to do some simple parsing of an HTML file that contains unit test results in its body:

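    import re
    import urllib2  # Python 2; the Python 3 equivalent is urllib.request
    from bs4 import BeautifulSoup
    url = urllib2.urlopen('file:/randomstuff/results.txt').read()
    soup = BeautifulSoup(url, 'lxml')
    save = soup.body.findAll(text=re.compile("failed"))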

The best I can get out of it is 1 instance of the text (when there are closer to 50), using lxml or html5lib. The other parsers find none. Is there any way I can work around broken HTML?

An example of the body is this:

    ********* Finished testing of logleveltypetest *********
    ********* Start testing of apploggerconfigtest *********
    Config: Using QTest library 4.8.1, Qt 4.8.1
    PASS : initTestCase
    PASS : testsetfromenvironment
    PASS : cleanupTestCase
    Totals: 3 passed, 0 failed, 0 skipped
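Because the results are plain text lines, one option for the counting step is a regular expression over the text itself rather than HTML matching. Note that in this format every class's totals line contains the word "failed", even when the count is 0, so matching "failed" counts test classes rather than failures. A minimal sketch, assuming each test class ends with a totals line shaped like the one above; `TOTALS_RE` and `parse_totals` are hypothetical names:

```python
import re
from typing import List, Tuple

# Matches lines such as "totals: 3 passed, 0 failed, 0 skipped".
# IGNORECASE makes the match work regardless of capitalization;
# MULTILINE lets ^ anchor at the start of every line in the report.
TOTALS_RE = re.compile(
    r"^totals:\s*(\d+)\s+passed,\s*(\d+)\s+failed,\s*(\d+)\s+skipped",
    re.IGNORECASE | re.MULTILINE,
)


def parse_totals(report: str) -> List[Tuple[int, ...]]:
    """Return a (passed, failed, skipped) tuple for every totals line found."""
    return [tuple(int(n) for n in m.groups()) for m in TOTALS_RE.finditer(report)]


sample = """********* start testing of apploggerconfigtest *********
config: using qtest library 4.8.1, qt 4.8.1
pass : inittestcase
pass : testsetfromenvironment
pass : cleanuptestcase
totals: 3 passed, 0 failed, 0 skipped
"""

totals = parse_totals(sample)
print(totals)                        # [(3, 0, 0)]
print(sum(f for _, f, _ in totals))  # 0 actual failures
```

Summing the middle column gives the number of failed tests, which is usually what "instances of failed" is meant to capture.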

The HTML looks like this:

    <html>
      <head></head>
      <body>
        <pre style="word-wrap: break-word; white-space: pre-wrap;">
          "common unit test results"
          ...
          ...
        </pre>
      </body>
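One likely explanation, offered as a sketch rather than a confirmed diagnosis: with this layout the whole report is a single text node inside `<pre>`. `find_all(string=...)` (spelled `findAll(text=...)` in older code) returns the text nodes that match the pattern, not each occurrence of it, so one node containing "failed" many times comes back as one result. The snippet uses a small hypothetical three-line report and Beautiful Soup's built-in `html.parser` backend; splitting the `<pre>` text into lines gives one hit per matching line.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample: the whole report sits in one <pre>, as in the question.
html = """<html><head></head><body>
<pre style="white-space: pre-wrap;">Totals: 3 passed, 0 failed, 0 skipped
Totals: 5 passed, 1 failed, 0 skipped
Totals: 2 passed, 0 failed, 0 skipped
</pre></body></html>"""

soup = BeautifulSoup(html, "html.parser")

# Matches text nodes, not occurrences: the <pre> holds one node, so one result.
nodes = soup.body.find_all(string=re.compile("failed"))

# Work on the <pre> text line by line to get one match per line instead.
pre_text = soup.body.pre.get_text()
failed_lines = [line for line in pre_text.splitlines() if "failed" in line]

print(len(nodes))         # 1
print(len(failed_lines))  # 3
```

The same line-splitting idea applies whichever parser builds the tree, since the `<pre>` content is one string in each case.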

