Finding tags by CSS class with BeautifulSoup in Python

إياد أحمد · 20 November 2021

I have an HTML document and need to find and extract tags from it by their CSS class. For example, given the following document:

HTML Document:
<html>
<head>
    <title> Hsoub Academy </title>
</head>
<body>
    <div class="ext" >Extract this tag</div>
</body>
</html>

How can I do this?

Ali Haidar Ahmad · 20 November 2021

Follow these steps:
First, import the bs4 library. Second, obtain the HTML document you need. Third, parse the content into a BeautifulSoup object.
Fourth, search by CSS class. This is done with the keyword argument class_ (plain "class" cannot be used because class is a reserved word in Python, so passing it as a keyword argument raises a syntax error).
Fifth, call find_all with the class_ keyword argument to find every tag that has the given CSS class (use find instead if you only want the first matching tag). Finally, print the extracted tags.

from bs4 import BeautifulSoup
# The HTML document
document = """
			<html>
			<head>
				<title> Hsoub Academy </title>
			</head>
			<body>
				<div class="ext" >Extract this tag</div>
			</body>
			</html>
			"""
# Define a function that extracts the tag
def find_tags_from_class(htmlDoc):
	# Parse the HTML content
	soup = BeautifulSoup(htmlDoc, "html.parser")
	# Search by CSS class (find returns only the first match)
	res = soup.find("div", class_="ext")
	# Print it
	print(res)
# Call the function
find_tags_from_class(document)
# Output:
# <div class="ext">Extract this tag</div>
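
As a complement to the example above, here is a minimal sketch of two other ways to express the same search: an `attrs` dictionary, where "class" is just a string key and needs no underscore, and a CSS selector passed to `select_one`. The variable names are illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup

html_doc = '<html><body><div class="ext">Extract this tag</div></body></html>'
soup = BeautifulSoup(html_doc, "html.parser")

# 1) Keyword argument with the trailing underscore
by_keyword = soup.find("div", class_="ext")

# 2) attrs dictionary: "class" is an ordinary string key here
by_attrs = soup.find("div", attrs={"class": "ext"})

# 3) CSS selector: "div.ext" means a <div> that has the class "ext"
by_selector = soup.select_one("div.ext")

# All three lookups point at the same element in the parsed tree
print(by_keyword is by_attrs and by_attrs is by_selector)  # True
print(by_keyword.get_text())  # Extract this tag
```

Which form to use is mostly a matter of taste; the selector form tends to read more naturally once the query combines several conditions.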

Another example, this time using find_all to find all matching tags:

# Import Module
from bs4 import BeautifulSoup
document = """
			<html>
			<head>
				<title> Hsoub Academy </title>
			</head>
			<body>
				<table>
				<tr>
					<td class = "table-row"> t1 </td>
					<td class = "table-row"> t2 </td>
					<td class = "table-row"> t3 </td>
				</tr>
				</table>
			</body>
			</html>
			"""
def find_tags_from_class(htmlDoc):
	soup = BeautifulSoup(htmlDoc, "html.parser")
	res = soup.find_all("td", class_= "table-row")
	for row in res:
		print(row)
find_tags_from_class(document)
# Output:
"""
<td class="table-row"> t1 </td>
<td class="table-row"> t2 </td>
<td class="table-row"> t3 </td>
"""

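If you need the text inside the matched cells rather than the tags themselves, the same find_all call can feed a list comprehension. The following is a minimal sketch; the helper name `texts_by_class` and the extra "other" cell are assumptions added for illustration.

```python
from bs4 import BeautifulSoup

table_html = """
<table>
  <tr>
    <td class="table-row"> t1 </td>
    <td class="table-row"> t2 </td>
    <td class="other"> skip </td>
    <td class="table-row"> t3 </td>
  </tr>
</table>
"""

def texts_by_class(html_text, tag_name, css_class):
    """Return the stripped text of every tag_name element that has css_class."""
    soup = BeautifulSoup(html_text, "html.parser")
    matches = soup.find_all(tag_name, class_=css_class)
    # get_text(strip=True) removes the surrounding whitespace of each cell
    return [tag.get_text(strip=True) for tag in matches]

cell_texts = texts_by_class(table_html, "td", "table-row")
print(cell_texts)  # ['t1', 't2', 't3']
```

Returning a list instead of printing inside the function makes the result easier to reuse or test.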
Another example, searching for tags by CSS class on a live website:

from bs4 import BeautifulSoup
import requests
# Set the URL
URL = "https://academy.hsoub.com/"
HTML_DOC = requests.get(URL)
# Function to find tags
def find_tags_from_class(html):
	soup = BeautifulSoup(html.content, "html5lib")
	div = soup.find("div", class_= "body")
	print(div)
find_tags_from_class(HTML_DOC)
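
When working with a live page, the request can fail and the class may not be present at all, so it can help to separate downloading from parsing. The sketch below assumes hypothetical helper names (`fetch_html`, `find_first_by_class`) and uses the built-in "html.parser" so that html5lib does not have to be installed; the parsing step is shown on an inline sample so it runs without network access.

```python
import requests
from bs4 import BeautifulSoup

def fetch_html(url, timeout=10):
    """Download a page and return its HTML text; raises on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text

def find_first_by_class(html_text, tag_name, css_class):
    """Return the first matching tag, or None if nothing matches."""
    soup = BeautifulSoup(html_text, "html.parser")
    return soup.find(tag_name, class_=css_class)

# Offline demonstration on a small sample
sample = '<div class="body">content</div>'
match = find_first_by_class(sample, "div", "body")
if match is None:
    print("No matching tag found")
else:
    print(match.get_text())

# With network access, the same helper can be fed a downloaded page:
# page = fetch_html("https://academy.hsoub.com/")
# print(find_first_by_class(page, "div", "body"))
```

Checking for None matters because find returns None when no tag matches, and printing or indexing that result blindly can hide the fact that the class name was wrong.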

Edited on 20 November 2021 by Ali Haidar Ahmad

Ahmed Sharshar · 26 November 2021

You can also call find_all without a tag name, so that it returns every tag with the given class regardless of its type:

First, if you already have the HTML document, you can use the following example:

# Import the library
from bs4 import BeautifulSoup

markup = """

<!DOCTYPE>
<html>
<head><title>Example</title></head>
	<body>
		<div class="first"> Div with Class first
		</div>
		<p class="first"> Para with Class first
		</p>

		<div class="second"> Div with Class second
		</div>
		<span class="first"> Span with Class first
		</span>
	</body>
</html>
"""

# Parse the markup into a BeautifulSoup object
soup = BeautifulSoup(markup, 'html.parser')

# Print the name of every tag that has the class "first"
for i in soup.find_all(class_="first"):
	print(i.name)
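
One detail worth knowing when searching by class: an element can carry several classes, and class_ matches if any one of them equals the value you pass. The sketch below uses made-up markup to show this, and uses a CSS selector with select for the case where an element must have both classes.

```python
from bs4 import BeautifulSoup

snippet = """
<p class="first highlight">both classes</p>
<p class="first">only first</p>
<p class="highlight">only highlight</p>
"""
soup = BeautifulSoup(snippet, "html.parser")

# The class attribute is parsed into a list of class names
first_p_classes = soup.p["class"]

# class_ matches when ANY of the element's classes equals "first"
any_first = [p.get_text() for p in soup.find_all("p", class_="first")]

# A CSS selector can require both classes at once
both = [p.get_text() for p in soup.select("p.first.highlight")]

print(first_p_classes)  # ['first', 'highlight']
print(any_first)        # ['both classes', 'only first']
print(both)             # ['both classes']
```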

If you have the page URL instead, the approach is almost the same:

# Import the libraries
from bs4 import BeautifulSoup
import requests

# The URL
URL = "https://academy.hsoub.com/questions/18275-%D8%A5%D9%8A%D8%AC%D8%A7%D8%AF-%D8%A7%D9%84%D9%88%D8%B3%D9%88%D9%85-tags-%D9%85%D9%86-%D8%AE%D9%84%D8%A7%D9%84-%D9%81%D8%A6%D8%A9-css-%D8%A8%D8%A7%D8%B3%D8%AA%D8%AE%D8%AF%D8%A7%D9%85-beautifulsoup-%D9%81%D9%8A-%D8%A8%D8%A7%D9%8A%D8%AB%D9%88%D9%86/"
html = requests.get(URL)

# Parse the response into a BeautifulSoup object (html5lib must be installed)
soup = BeautifulSoup(html.content, "html5lib")

# Print the names of the matching tags
for i in soup.find_all(class_="article--container_content"):
	print(i.name)
