parsing - None of the parsers are finding all matches with Beautiful Soup in Python

- January 15, 2013

I am trying to do some simple parsing of an HTML file that contains unit test results in its body.

import re
import urllib2
from bs4 import BeautifulSoup

url = urllib2.urlopen('file:/randomstuff/results.txt').read()
soup = BeautifulSoup(url, 'lxml')
save = soup.body.findAll(text=re.compile("failed"))

The best I can get out of it is 1 instance of the text (when there are closer to 50), using lxml and html5lib. The other parsers find none. Is there any way I can work around the broken HTML?

An example of the body is this:

********* finished testing of logleveltypetest *********
********* start testing of apploggerconfigtest *********
config: using qtest library 4.8.1, qt 4.8.1
pass : inittestcase
pass : testsetfromenvironment
pass : cleanuptestcase
totals: 3 passed, 0 failed, 0 skipped

The HTML looks like this:

<html>
   <head></head>
   <body>
   <pre style="word-wrap: break-word; white-space: pre-wrap;">
      "common unit test results"
      ...
      ...
   </pre>
 </body>
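
One possible reading of the single match, offered as an assumption rather than a confirmed diagnosis: all of the results sit inside one <pre> element, so the parser sees them as a single text node, and a text search returns that node once no matter how many lines in it contain "failed". The sketch below counts matching lines instead of matching nodes. It is written for Python 3 and uses only the standard library's html.parser rather than Beautiful Soup; the names PreTextCollector and lines_matching and the sample string are hypothetical, made up for illustration.

```python
import re
from html.parser import HTMLParser


class PreTextCollector(HTMLParser):
    """Collect the raw text found inside <pre> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # greater than 0 while inside a <pre>
        self.chunks = []  # text pieces seen inside <pre>

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.chunks.append(data)


def lines_matching(html, pattern):
    """Return every line inside <pre> that matches the given regex."""
    parser = PreTextCollector()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    regex = re.compile(pattern)
    # Split the single block of text into lines and test each line separately.
    return [line.strip() for line in text.splitlines() if regex.search(line)]


# Made-up sample modelled on the excerpt above; the second block is invented.
sample = """<html><head></head><body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
********* start testing of apploggerconfigtest *********
pass : inittestcase
totals: 3 passed, 0 failed, 0 skipped
********* start testing of otherhypotheticaltest *********
totals: 1 passed, 2 failed, 0 skipped
</pre></body>"""

matches = lines_matching(sample, r"failed")
for line in matches:
    print(line)
```

The same line-splitting idea should carry over to Beautiful Soup by taking the text of the <pre> element and filtering its lines, but that variant is not shown or tested here.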
