Python web scraping: how to get only the body HTML - ThrowExceptions

Exception or error:

Hey, I am trying to implement a program that can get URLs from the HTML of a website, but I only want the URLs from the body. Basically, I want to avoid ads and menus on the website and only get links to websites that are embedded in the actual article. Does anyone know a good way of isolating the body HTML from the rest of the HTML without hardcoding how the body is designated for each website?

How to solve:

Scraping only specific parts of the HTML is straightforward: for the most part, you can choose which elements of the page you want. Say you only want <div id="example">example</div>; you can tell your scraper to pick up only that div and ignore everything else. Please check out this tutorial, which walks through an example:
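As a rough sketch of "pick up only that div", the code below collects the href of every link inside the element with id="example" and ignores links in menus or ads elsewhere on the page. It uses only the standard library's html.parser so it runs without extra packages; the tutorial linked below uses Beautiful Soup instead. The class name LinksInElement, the helper links_in_element and the sample page are hypothetical, and the id has to be known for each site.

```python
from html.parser import HTMLParser


class LinksInElement(HTMLParser):
    """Collect href values of <a> tags inside one element chosen by tag name and id."""

    def __init__(self, tag="div", element_id="example"):
        super().__init__()
        self.tag = tag
        self.element_id = element_id
        self.depth = 0  # > 0 while the parser is inside the target element
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # nested element with the same tag name
            if tag == "a" and attrs.get("href"):
                self.links.append(attrs["href"])
        elif tag == self.tag and attrs.get("id") == self.element_id:
            self.depth = 1  # entered the target element

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1  # reaching 0 means we left the target element


def links_in_element(html, tag="div", element_id="example"):
    parser = LinksInElement(tag, element_id)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page: only the links inside <div id="example"> are wanted.
page = """
<html><body>
<nav><a href="/home">Home</a></nav>
<div id="example">
  <p>See <a href="https://example.org/a">this</a>.</p>
  <div class="inner"><a href="https://example.org/b">that</a></div>
</div>
<aside><a href="https://ads.example.com">Ad</a></aside>
</body></html>
"""

print(links_in_element(page))  # ['https://example.org/a', 'https://example.org/b']
```

Nested divs are counted so the parser knows when the outer target div actually closes; the navigation and ad links fall outside it and are skipped.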

https://realpython.com/beautiful-soup-web-scraper-python/
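Selecting an element by id still requires knowing the id for each site. A possible heuristic that avoids per-site selectors, offered here only as an assumption and not as a reliable solution, is to prefer the HTML5 <article> element, then <main>, and fall back to <body> when neither is present. Many sites do not mark up their content this way, so results will vary. The names PREFERRED, RegionLinks and content_links are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical priority order: HTML5 content elements first, <body> last.
PREFERRED = ("article", "main", "body")


class RegionLinks(HTMLParser):
    """Record which links fall inside each candidate container element."""

    def __init__(self):
        super().__init__()
        self.open = {tag: 0 for tag in PREFERRED}   # nesting count per container
        self.links = {tag: [] for tag in PREFERRED}  # hrefs found inside each container
        self.seen = set()                            # containers present in the page
        self.all_links = []                          # every href, used as a last resort

    def handle_starttag(self, tag, attrs):
        if tag in self.open:
            self.open[tag] += 1
            self.seen.add(tag)
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.all_links.append(href)
                for container, count in self.open.items():
                    if count:
                        self.links[container].append(href)

    def handle_endtag(self, tag):
        if tag in self.open and self.open[tag]:
            self.open[tag] -= 1


def content_links(html):
    parser = RegionLinks()
    parser.feed(html)
    parser.close()
    for tag in PREFERRED:
        if tag in parser.seen:
            return parser.links[tag]
    return parser.all_links  # no container element found: return every link


page = """
<html><body>
<header><a href="/menu">Menu</a></header>
<article>
  <p>Read <a href="https://example.org/source">the source</a>.</p>
</article>
<aside><a href="https://ads.example.com">Ad</a></aside>
</body></html>
"""

print(content_links(page))  # ['https://example.org/source']
```

A page without <article> or <main> degrades to all links in <body>, so menus and ads would come back in that case; a stricter filter would need extra, site-independent rules of your own.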
