إياد أحمد, posted 21 November 2021

I am trying to collect information from several websites and want to use bs4 to extract paragraphs from them. How can I do that?

For example, I have the following web page: https://undergrad.cs.umd.edu/what-computer-science

I want to extract this paragraph.
Ali Haidar Ahmad, posted 21 November 2021

You can do it as follows:

    # Import the modules
    from bs4 import BeautifulSoup
    import requests

    # Set the URL
    url = "https://undergrad.cs.umd.edu/what-computer-science"

    # Send a GET request and fetch the page
    page = requests.get(url)

    # Parse the page with BeautifulSoup, using the lxml parser
    soup = BeautifulSoup(page.content, "lxml")

    # Extract every paragraph and print it
    for para in soup.find_all("p"):
        print(para.get_text())

Output:

    Computer Science is the study of computers and computational systems. Unlike electrical and computer engineers, computer scientists deal mostly with software and software systems; this includes their theory, design, development, and application. Principal areas of study within Computer Science include artificial intelligence, computer systems and networks, security, database systems, human computer interaction, vision and graphics, numerical analysis, programming languages, software engineering, bioinformatics and theory of computing. Although knowing how to program is essential to the study of computer science, it is only one element of the field. Computer scientists design and analyze algorithms to solve programs and study the performance of computer hardware and software. The problems that computer scientists encounter range from the abstract-- determining what problems can be solved with computers and the complexity of the algorithms that solve them – to the tangible – designing applications that perform well on handheld devices, that are easy to use, and that uphold security measures. Graduates of University of Maryland’s Computer Science Department are lifetime learners; they are able to adapt quickly with this challenging field.
    Contact Our Office
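The answer above prints every paragraph on the page, while the question asks for one particular paragraph. A minimal sketch of one way to narrow the result is to filter the paragraph texts by a keyword. The function name `extract_paragraphs`, its `keyword` parameter, and the sample markup are assumptions made for illustration, not part of the original answer, and `html.parser` is used here only so the sketch runs without installing lxml.

```python
from bs4 import BeautifulSoup


def extract_paragraphs(html, keyword=None):
    """Return the text of each non-empty <p>, optionally filtered by keyword."""
    # html.parser ships with Python; "lxml" also works if it is installed.
    soup = BeautifulSoup(html, "html.parser")
    texts = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    texts = [t for t in texts if t]  # drop empty paragraphs
    if keyword is not None:
        # Case-insensitive substring match on the paragraph text.
        texts = [t for t in texts if keyword.lower() in t.lower()]
    return texts


# Hypothetical sample markup standing in for a downloaded page.
sample_html = """
<html><body>
  <p>Computer Science is the study of computers and computational systems.</p>
  <p></p>
  <p>Contact Our Office</p>
</body></html>
"""

for text in extract_paragraphs(sample_html, keyword="study of computers"):
    print(text)
```

For a live page, the same function could be given the downloaded markup, for example `extract_paragraphs(requests.get(url, timeout=10).text, keyword="Computer Science")`. If the target paragraph has a distinctive class or id in the page source, passing that to `find_all` or using `soup.select` would likely be more reliable than matching on text.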
Ahmed Sharshar, posted 26 November 2021

Besides lxml, you can also use Python's built-in html.parser as the parser, with urllib.request as an alternative to the requests library, as follows:

    # Import the libraries
    import urllib.request
    from bs4 import BeautifulSoup

    # The page to read
    url = "https://undergrad.cs.umd.edu/what-computer-science"

    # Fetch the page from the site
    html = urllib.request.urlopen(url)

    # Parse the HTML with BeautifulSoup
    htmlParse = BeautifulSoup(html, 'html.parser')

    # Get every paragraph and print it
    for para in htmlParse.find_all("p"):
        print(para.get_text())

It returns the following:

    Computer Science is the study of computers and computational systems. Unlike electrical and computer engineers, computer scientists deal mostly with software and software systems; this includes their theory, design, development, and application. Principal areas of study within Computer Science include artificial intelligence, computer systems and networks, security, database systems, human computer interaction, vision and graphics, numerical analysis, programming languages, software engineering, bioinformatics and theory of computing. Although knowing how to program is essential to the study of computer science, it is only one element of the field. Computer scientists design and analyze algorithms to solve programs and study the performance of computer hardware and software. The problems that computer scientists encounter range from the abstract-- determining what problems can be solved with computers and the complexity of the algorithms that solve them – to the tangible – designing applications that perform well on handheld devices, that are easy to use, and that uphold security measures. Graduates of University of Maryland’s Computer Science Department are lifetime learners; they are able to adapt quickly with this challenging field.
    Contact Our Office
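When scraping several sites, as the question describes, some requests will probably fail or hang. The following is a rough sketch, not the answerer's code, of wrapping the urllib.request approach with a timeout and basic error handling. The helper names `fetch_html` and `paragraphs_from` and the 10-second default are assumptions chosen for illustration.

```python
import urllib.request

from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError as exc:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        print(f"Could not fetch {url}: {exc}")
        return None


def paragraphs_from(url):
    """Return the non-empty paragraph texts at url, or an empty list on failure."""
    html = fetch_html(url)
    if html is None:
        return []
    soup = BeautifulSoup(html, "html.parser")
    texts = (p.get_text(strip=True) for p in soup.find_all("p"))
    return [t for t in texts if t]
```

A call such as `paragraphs_from("https://undergrad.cs.umd.edu/what-computer-science")` would then return a list of strings, and a loop over several URLs can continue past any page that cannot be downloaded.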