Extracting an attribute's value from tags with BeautifulSoup in Python

21 November 2021

I have the following site:
https://www.imdb.com/title/tt5648202/
I am trying to extract the values of the class attribute on the main tag in the page. How can I get them?

21 November 2021

You can do it as follows:

# Import Beautiful Soup
from bs4 import BeautifulSoup
htmlDoc = '''
    <html>
        <h2 class="first second third"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    '''
# Parse the document
soup = BeautifulSoup(htmlDoc, "lxml")
# Get the tag
tag = soup.h2
# Get the value of the attribute we want
attribute = tag['class']
# Print it
print(attribute)
# ['first', 'second', 'third']
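
As a side note on why the result above is a list: Beautiful Soup treats class as a multi-valued attribute. Here is a small sketch of that behaviour, assuming the built-in "html.parser" instead of lxml (so no extra install is needed) and an id attribute that I added to the sample only for illustration:

```python
from bs4 import BeautifulSoup

# Sample markup; the id attribute is made up for this illustration
html_doc = '<h2 class="first second third" id="top"> Heading 1 </h2><h1> Heading 2 </h1>'
soup = BeautifulSoup(html_doc, "html.parser")

h2 = soup.h2
h1 = soup.h1

# class is multi-valued, so it comes back as a list of strings
class_list = h2["class"]
# Join the list to get the attribute back as one string
class_string = " ".join(class_list)
# Single-valued attributes such as id come back as a plain string
h2_id = h2["id"]
# h1 has no class: h1["class"] would raise KeyError, .get() returns a default
h1_classes = h1.get("class", [])
# .attrs is a dict holding every attribute of the tag
h2_attrs = h2.attrs

print(class_list, class_string, h2_id, h1_classes, h2_attrs)
```

If a tag might not carry the attribute at all, .get() is usually the safer choice than square brackets.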

If you have more than one tag of the same type, use find_all:

# Import Beautiful Soup
from bs4 import BeautifulSoup
htmlDoc = '''
    <html>
        <h2 class="v0"> Heading 1 </h2>
        <h2 class="v1"> Heading 2 </h2>
        <h2 class="v2"> Heading 3 </h2>
        <h1> Heading 2 </h1>
    </html>
    '''
# Parse the document
soup = BeautifulSoup(htmlDoc, "lxml")
# Get every h2 tag
tags = soup.find_all('h2')
for tag in tags:
    # Read the class attribute of each tag and print it
    attribute = tag['class']
    print(attribute)
"""
['v0']
['v1']
['v2']
"""

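One thing the loop above does not cover is a tag without a class attribute, where tag['class'] raises KeyError. A possible sketch of two ways around that, assuming the built-in "html.parser" and a sample document I changed so that one h2 has no class:

```python
from bs4 import BeautifulSoup

# Sample markup (modified for illustration): the second h2 has no class
html_doc = """
<html>
  <h2 class="v0"> Heading 1 </h2>
  <h2> Heading without a class </h2>
  <h2 class="v2 extra"> Heading 3 </h2>
</html>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# Option 1: keep every h2 and fall back to an empty list when class is missing
classes_per_tag = [tag.get("class", []) for tag in soup.find_all("h2")]

# Option 2: let find_all skip tags that have no class attribute at all
only_with_class = [tag["class"] for tag in soup.find_all("h2", class_=True)]

print(classes_per_tag)
print(only_with_class)
```

Which option fits depends on whether you need a result for every tag or only for the ones that actually have the attribute.
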
So, for your example, you can do it as follows:

# Import the modules
from bs4 import BeautifulSoup
import requests
# Set the URL
url = "https://www.imdb.com/title/tt5648202/"
# Send a GET request
page = requests.get(url)
# Parse the page with BeautifulSoup (using the lxml parser)
soup = BeautifulSoup(page.content, "lxml")
# Get all tags named main
tags = soup.find_all('main')
# Go through them one by one
for tag in tags:
    # Get the value of the attribute we want (None if the tag has no class)
    attribute = tag.get('class')
    # Print it
    print(attribute)
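
To try the same logic without hitting the network, the parsing step can be pulled into a small function. This is only a sketch: extract_tag_classes is a name I made up, the sample markup and its class names are invented stand-ins (not IMDb's real markup), and it uses the built-in "html.parser" rather than lxml.

```python
from bs4 import BeautifulSoup


def extract_tag_classes(html, tag_name="main"):
    """Return the class list of every <tag_name> tag found in html.

    html may be a str or bytes; tags without a class yield an empty list.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get("class", []) for tag in soup.find_all(tag_name)]


# Invented sample page standing in for a downloaded document
sample_html = (
    '<html><body>'
    '<main class="page-wrapper page-wrapper--base" role="main"><p>content</p></main>'
    '</body></html>'
)

main_classes = extract_tag_classes(sample_html)
print(main_classes)
```

With the request from the answer above, you would pass page.content to the function in place of sample_html.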

26 November 2021

In addition to the methods above, if your data is in an XML file you can read it and find the values easily with find_all, as follows:

from bs4 import BeautifulSoup

# Read the XML file into a string
with open('conf//test1.xml', 'r') as xmlFile:
    xmlData = xmlFile.read()

# html.parser lowercases tag names, hence the lowercase name passed to find_all
xmlSoup = BeautifulSoup(xmlData, 'html.parser')

repElemList = xmlSoup.find_all('repeatingelement')

for repElem in repElemList:
    print("Processing repElem...")
    # .get() returns None when the attribute is missing
    repElemID = repElem.get('id')
    repElemName = repElem.get('name')

    print("Attribute id = %s" % repElemID)
    print("Attribute name = %s" % repElemName)

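Since the contents of test1.xml are not shown, here is a self-contained sketch of the same idea with an inline XML string. The element names, ids and names below are invented for illustration. It also shows why the search uses a lowercase tag name: "html.parser" lowercases tag names while parsing.

```python
from bs4 import BeautifulSoup

# Invented XML standing in for the file contents
xml_data = """
<root>
  <repeatingElement id="11" name="Joe">first</repeatingElement>
  <repeatingElement id="12" name="Mary">second</repeatingElement>
</root>
"""

xml_soup = BeautifulSoup(xml_data, "html.parser")

# Tag names were lowercased by html.parser, so search with the lowercase form
elements = xml_soup.find_all("repeatingelement")

# Collect (id, name) pairs; .get() returns None for a missing attribute
pairs = [(el.get("id"), el.get("name")) for el in elements]
missing_attribute = elements[0].get("no-such-attribute")

print(pairs)
print(missing_attribute)
```

If the case of the tag names matters, Beautiful Soup's "xml" parser (which requires lxml) keeps it unchanged.
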
If instead you want the value of one specific element, you can also use find_all to fetch the matching elements, as follows:

input_tag = soup.find_all(attrs={"name" : "stainfo"})

Then pick the element you want from those returned:

output = input_tag[0]['value']

Or use find to fetch a single element and then read its value:

input_tag = soup.find(attrs={"name": "stainfo"})
output = input_tag['value']
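
Putting the two snippets together into something runnable, here is a rough sketch. The form markup and its values are invented for illustration, and the parser is the built-in "html.parser". It also guards against the case where nothing matches, since find returns None and find_all returns an empty list in that case.

```python
from bs4 import BeautifulSoup

# Invented form standing in for the real page
html_doc = """
<form>
  <input type="hidden" name="stainfo" value="abc123">
  <input type="text" name="username" value="guest">
</form>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# find_all returns a list of every tag whose name attribute is "stainfo"
matches = soup.find_all(attrs={"name": "stainfo"})
first_value = matches[0]["value"] if matches else None

# find returns only the first match, or None when nothing matches
input_tag = soup.find(attrs={"name": "stainfo"})
output = input_tag["value"] if input_tag is not None else None

# A search that matches nothing
no_match = soup.find(attrs={"name": "no-such-field"})

print(first_value, output, no_match)
```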

Question by إياد أحمد. 2 answers, by Ali Haidar Ahmad and Ahmed Sharshar.
