python 3.x – Wrapping a tag around a string in HTML using BeautifulSoup

Exception or error:

I have an HTML file similar to the one below:

<html>
<head>
<title>Sample</title>
</head>
<body>
<p> A new model of Apple iPhone is being launched next week. If the model turns out to be a success then the market value of <i>Apple</i> will reach sky-high </p>
</body>
</html>

In the HTML above, the word Apple appears twice in the text. I want to mark the second occurrence as an annotation by wrapping a tag around the word, but only if it is not already enclosed in such an annotation tag.

For example, if the second occurrence already looks like this:

<i><span class="annotate">Apple</span></i>

then I want to skip it and leave the document unchanged.

I tried the find_all(text="Apple") method in BeautifulSoup, but it only returns a match when the entire string inside a tag equals the given text. I also tried treating the whole HTML document as a raw string, but then I could not check whether the word is already enclosed in a span tag.
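The difference between an exact-text search and a regex search can be seen in a small sketch. It uses `string=`, which recent BeautifulSoup versions accept as the current name of the `text=` argument, and the built-in `html.parser` so that lxml is not required:

```python
import re
from bs4 import BeautifulSoup

html = ("<p> A new model of Apple iPhone is being launched next week. "
        "The market value of <i>Apple</i> will reach sky-high </p>")
soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the whole text node, so only the text inside <i> matches.
exact = soup.find_all(string="Apple")

# A compiled regex is applied with .search(), so any text node that merely
# contains the word matches, including the long sentence directly inside <p>.
partial = soup.find_all(string=re.compile(r"\bApple\b"))

print(len(exact), len(partial))  # 1 2
```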

How to solve:

The code below finds every text node that contains Apple and prints the tag that encloses it:

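from bs4 import BeautifulSoup
import re

html = ("<html><head><title>Sample</title></head><body>"
        "<p> A new model of Apple iPhone is being launched next week. If the model turns out to be a success "
        "then the market value of <i>Apple</i> will reach sky-high </p>"
        "<i><span class='annotate'>Apple</span></i></body></html>")
soup = BeautifulSoup(html, features="lxml")

# string= with a compiled regex matches every text node that contains "Apple"
# (text= is the older name for the same argument)
for elem in soup(string=re.compile("Apple")):
    # elem is the matching text node; its parent is the tag that holds it
    print(elem.parent)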

Result:

<p> A new model of Apple iPhone is being launched next week. If the model turns out to be a success then the market value of <i>Apple</i> will reach sky-high </p>
<i>Apple</i>
<span class="annotate">Apple</span>

You can then take each of these tags, check whether the occurrence is already wrapped in an annotation tag, and add one if necessary.
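A minimal sketch of that last step follows. It assumes that "second occurrence" means the second whole-word match of the word, counted across all text nodes in document order, and that "already annotated" means the match sits somewhere inside a `<span class="annotate">`. The helper name `annotate_nth` and its parameters are hypothetical, chosen for illustration, and the built-in `html.parser` is used so it runs without lxml:

```python
import re
from bs4 import BeautifulSoup

HTML = ("<html><head><title>Sample</title></head><body>"
        "<p> A new model of Apple iPhone is being launched next week. If the model turns out "
        "to be a success then the market value of <i>Apple</i> will reach sky-high </p>"
        "</body></html>")


def annotate_nth(soup, word, n=2, css_class="annotate"):
    """Hypothetical helper: wrap the n-th whole-word match of `word` in
    <span class=css_class>. Returns True if a span was added, False if that
    match is already annotated or there are fewer than n matches."""
    pattern = re.compile(r"\b%s\b" % re.escape(word))
    seen = 0
    # Text nodes containing the word, in document order.
    for text in soup.find_all(string=pattern):
        # A single text node may contain the word more than once.
        for match in pattern.finditer(text):
            seen += 1
            if seen != n:
                continue
            # Skip if any enclosing tag is already an annotation span.
            if text.find_parent("span", class_=css_class) is not None:
                return False
            before = text[:match.start()]
            after = text[match.end():]
            span = soup.new_tag("span")
            span["class"] = css_class
            span.string = match.group(0)
            # Swap the text node for the span, then restore the surrounding text.
            text.replace_with(span)
            if before:
                span.insert_before(before)
            if after:
                span.insert_after(after)
            return True
    return False


soup = BeautifulSoup(HTML, "html.parser")
changed = annotate_nth(soup, "Apple", n=2)
print(changed)          # True
print(soup.find("i"))   # <i><span class="annotate">Apple</span></i>
```

Because the check walks up the tree with `find_parent`, a match nested deeper, such as `<span class="annotate"><b>Apple</b></span>`, is also treated as already annotated, and running the function again on its own output leaves the document unchanged. Note that `find_all(string=...)` also scans text in tags like `<title>`, which may or may not be wanted.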
