python – How can I print the content of each <span> tag that follows a <strong> tag with BeautifulSoup?

Exception or error:

I am trying to scrape every strong tag together with the span tag that follows it. The strong tags print correctly, but I can’t get the span that follows each one to print. Here is my code:

import bs4 as bs
from urllib.request import urlopen, Request
import urllib


    #all strong tags
    strong_tags = soup.find_all('strong')
    for element in strong_tags:
        element.extract()
        print(element.text)

and the output I get:

severity:  
ID: 
File Name: 
Version: 
Family: 
Published: 
Dependencies: 
Risk Factor: 
Required KB Items: 

The content of each span tag should appear after the corresponding colon, but I can’t get it to show up. Here is part of the HTML I am scraping:

<div class="col-md-4 plugin-single__sidebar">
<h4 class="u-m-t-2">Plugin Details</h4>
<div>
    <p>
        <strong>Severity
            <!-- -->: 
        </strong>
        <span>Critical</span>
    </p>
</div>
<div>
    <p>
        <strong>ID
            <!-- -->: 
        </strong>
        <span>14612</span>
    </p>
</div>
<div>
    <p>
        <strong>File Name
            <!-- -->: 
        </strong>
        <span>aix_IY40501.nasl</span>
    </p>
</div>

How to solve:

Try this. It uses SimplifiedDoc from the simplified_scrapy library instead of BeautifulSoup:

from simplified_scrapy import SimplifiedDoc
html = '''
<div class="col-md-4 plugin-single__sidebar">
<h4 class="u-m-t-2">Plugin Details</h4>
<div>
    <p>
        <strong>Severity
            <!-- -->: 
        </strong>
        <span>Critical</span>
    </p>
</div>
<div>
    <p>
        <strong>ID
            <!-- -->: 
        </strong>
        <span>14612</span>
    </p>
</div>
<div>
    <p>
        <strong>File Name
            <!-- -->: 
        </strong>
        <span>aix_IY40501.nasl</span>
    </p>
</div>
'''
doc = SimplifiedDoc(html)
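# First method: select the element that follows each <strong> in one call
spans = doc.selects('strong>next()')
print(spans)
# Second method: loop over the <strong> tags and read the element after each one
strongs = doc.selects('strong')
for strong in strongs:
    span = strong.next
    print(strong.text, span.text)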

Result:

[{'tag': 'span', 'html': 'Critical'}, {'tag': 'span', 'html': '14612'}, {'tag': 'span', 'html': 'aix_IY40501.nasl'}]
Severity : Critical
ID : 14612
File Name : aix_IY40501.nasl
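
Since the question asks about BeautifulSoup specifically, here is a minimal sketch that stays with bs4, assuming each span is a sibling of its strong tag inside the same p, as in the HTML posted above. The helper name strong_span_pairs is made up for illustration. The important change from the question's code is dropping element.extract(): extract() detaches the strong tag from the tree, and a detached tag no longer has any siblings to look up.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<div class="col-md-4 plugin-single__sidebar">
<h4 class="u-m-t-2">Plugin Details</h4>
<div>
    <p>
        <strong>Severity
            <!-- -->: 
        </strong>
        <span>Critical</span>
    </p>
</div>
<div>
    <p>
        <strong>ID
            <!-- -->: 
        </strong>
        <span>14612</span>
    </p>
</div>
<div>
    <p>
        <strong>File Name
            <!-- -->: 
        </strong>
        <span>aix_IY40501.nasl</span>
    </p>
</div>
</div>
"""


def strong_span_pairs(markup):
    """Return (label, value) tuples for each <strong> and its following <span>.

    Hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    pairs = []
    for strong in soup.find_all("strong"):
        # No strong.extract() here: the tag must stay in the tree
        # so that its siblings can still be found.
        span = strong.find_next_sibling("span")
        if span is None:
            continue  # this <strong> has no following <span> sibling
        # Collapse the newlines and indentation inside <strong>,
        # then drop the trailing colon.
        label = " ".join(strong.get_text().split()).rstrip(":").strip()
        pairs.append((label, span.get_text(strip=True)))
    return pairs


pairs = strong_span_pairs(html)
for label, value in pairs:
    print(f"{label}: {value}")
```

Note that find_next_sibling("span") returns the first later sibling that is a span, not necessarily the element directly next to the strong tag. For this markup the two are the same.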
