How can we find grammatical tags, i.e. do Part of Speech (POS) tagging, using NLTK in Python?

إياد أحمد · 10 December 2021

I am working on an NLP algorithm and need a way to classify each word by its type (proper noun, noun, plural noun, verb, etc.). Does NLTK have a function or method for this?
Suppose I have the following text:

"Life is like riding a bicycle. To keep your balance, you must keep moving."

How can I determine the grammatical category of each word in it? For example, "keep" is a verb and "must" is a modal...
I also want to remove the stop words.

Ali Haidar Ahmad · 10 December 2021

Yes, NLTK supports this directly through its pos_tag function. Its default English tagger labels each part of speech with a Penn Treebank code. These are the main codes:

VB    verb, base form (take)
VBD   verb, past tense (took)
VBG   verb, gerund/present participle (taking)
VBN   verb, past participle (taken)
VBP   verb, non-3rd person singular present (take)
VBZ   verb, 3rd person singular present (takes)
WDT   wh-determiner (which)
WP    wh-pronoun (who, what)
WP$   possessive wh-pronoun (whose)
WRB   wh-adverb (where, when)
CC    coordinating conjunction
CD    cardinal number
DT    determiner
EX    existential "there" (as in "there is", i.e. "there exists")
FW    foreign word
IN    preposition/subordinating conjunction
JJ    adjective (big)
JJR   adjective, comparative (bigger)
JJS   adjective, superlative (biggest)
LS    list item marker ("1)")
MD    modal (could, will)
NN    noun, singular (desk)
NNS   noun, plural (desks)
NNP   proper noun, singular (Harrison)
NNPS  proper noun, plural (Americans)
PDT   predeterminer ("all" in "all the kids")
POS   possessive ending ("'s" in "parent's")
PRP   personal pronoun (I, he, she)
PRP$  possessive pronoun (my, his, hers)
RB    adverb (very, silently)
RBR   adverb, comparative (better)
RBS   adverb, superlative (best)
RP    particle ("up" in "give up")
TO    "to" (as in "to go to the store")
UH    interjection (errrrrrrrm)
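
The codes above can be kept in a small lookup table so that tagger output reads as plain English. The following is only a minimal sketch: `TAG_DESCRIPTIONS` and `describe` are hypothetical names introduced here for illustration (they are not part of NLTK), and the table covers just a few of the tags listed above.

```python
# Hypothetical helper (not part of NLTK): map Penn Treebank codes to
# readable descriptions. Only a subset of the tags listed above is included.
TAG_DESCRIPTIONS = {
    "NN": "noun, singular",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "VB": "verb, base form",
    "VBG": "verb, gerund/present participle",
    "MD": "modal",
    "IN": "preposition/subordinating conjunction",
    "TO": "to",
}


def describe(tagged):
    """Return (word, tag, description) triples for (word, tag) pairs.

    Tags missing from the table (punctuation, for example) get "unknown".
    """
    return [(word, tag, TAG_DESCRIPTIONS.get(tag, "unknown")) for word, tag in tagged]


# Sample pairs written by hand, in the shape that nltk.pos_tag returns.
sample = [("must", "MD"), ("keep", "VB"), ("moving", "NN"), (".", ".")]
for word, tag, description in describe(sample):
    print(f"{word:8} {tag:4} {description}")
```

NLTK also ships its own tag documentation: `nltk.help.upenn_tagset('MD')` prints the description of a tag, provided the corresponding tag-set data has been downloaded.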

Now, to find the POS tags in your text, or in any text, the following code is enough:

# Import the modules
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

# One-time setup: the tokenizer, stop-word, and tagger data must be fetched with
# nltk.download() before the first run (for example 'punkt', 'stopwords', and
# 'averaged_perceptron_tagger'; the exact resource names vary by NLTK version)
# Stop words (the list is all lowercase)
stop_words = set(stopwords.words('english'))
# The text to analyse
txt = "Life is like riding a bicycle. To keep your balance, you must keep moving."
# Split the text into sentences
sent = sent_tokenize(txt)  # the text above contains two sentences
# Go through each sentence and extract its POS tags
for sentence in sent:
    # word_tokenize splits the sentence into words and punctuation marks
    wordsList = word_tokenize(sentence)
    # Remove the stop words (case-sensitive match, so the capitalised "To" is kept)
    wordsList = [w for w in wordsList if w not in stop_words]
    # pos_tag takes the list of words and assigns a tag to each one
    POS = nltk.pos_tag(wordsList)
    print(POS)

Output:

[('Life', 'NNP'), ('like', 'IN'), ('riding', 'VBG'), ('bicycle', 'NN'), ('.', '.')]
[('To', 'TO'), ('keep', 'VB'), ('balance', 'NN'), (',', ','), ('must', 'MD'), ('keep', 'VB'), ('moving', 'NN'), ('.', '.')]
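
In the output above, "To" survived the stop-word filter because the NLTK stop-word list is lowercase and the comparison is case-sensitive. Removing words before tagging also takes away context that the tagger relies on. One possible variation, sketched below, tags the full text first and filters afterwards, comparing in lowercase. `filter_tagged` and `tag_then_filter` are hypothetical names, and the sketch assumes the tokenizer, tagger, and stop-word data have already been downloaded with `nltk.download`.

```python
def filter_tagged(tagged, stop_words):
    """Drop (word, tag) pairs whose lowercased word is a stop word."""
    return [(word, tag) for word, tag in tagged if word.lower() not in stop_words]


def tag_then_filter(text):
    """Tag every token of `text`, then remove the stop words.

    NLTK is imported here so that filter_tagged stays usable on its own.
    """
    import nltk
    from nltk.corpus import stopwords

    stop_words = set(stopwords.words("english"))
    tagged = nltk.pos_tag(nltk.word_tokenize(text))  # tag with full context
    return filter_tagged(tagged, stop_words)


# Demonstration on hand-written pairs, so it runs without any NLTK data.
demo = [("To", "TO"), ("keep", "VB"), ("your", "PRP$"), ("balance", "NN")]
print(filter_tagged(demo, {"to", "your"}))
# [('keep', 'VB'), ('balance', 'NN')]
```

With the data installed, calling `tag_then_filter` on the text can stand in for the loop above. The tags it produces may differ from the output shown, since the tagger then sees the complete sentences.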

Ahmed Sharshar · 10 December 2021

As Ali explained, NLTK makes it easy to find POS tags, especially for English. The following example walks through the process in more detail: first the sentence is split into words, then each word is tagged with its part of speech, and finally related words are grouped together into chunks:

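# Import the required names
from nltk import pos_tag
from nltk import RegexpParser

# Split the sentence into words
text = "learn php from hsoub and make study easy".split()
print("After Split:", text)

# Tag each word with its part of speech
tokens_tag = pos_tag(text)
print("After Token:", tokens_tag)

# Chunk grammar: zero or more nouns, then past-tense verbs, then adjectives,
# then an optional coordinating conjunction
patterns = """mychunk:{<NN.?>*<VBD.?>*<JJ.?>*<CC>?}"""
chunker = RegexpParser(patterns)
print("After Regex:", chunker)

# Final output: group the tagged words into chunks
output = chunker.parse(tokens_tag)
print("After Chunking", output)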

The output looks like this:

After Split: ['learn', 'php', 'from', 'hsoub', 'and', 'make', 'study', 'easy']
After Token: [('learn', 'JJ'), ('php', 'NN'), ('from', 'IN'), ('hsoub', 'NN'), ('and', 'CC'), ('make', 'VB'), ('study', 'NN'), ('easy', 'JJ')]
After Regex: chunk.RegexpParser with 1 stages:
RegexpChunkParser with 1 rules:
       <ChunkRule: '<NN.?>*<VBD.?>*<JJ.?>*<CC>?'>
After Chunking (S
  (mychunk learn/JJ)
  (mychunk php/NN)
  from/IN
  (mychunk hsoub/NN and/CC)
  make/VB
  (mychunk study/NN easy/JJ))
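
The result of `chunker.parse` is an `nltk.Tree`, so the grouped words can be pulled out programmatically rather than read off the printout. The sketch below rebuilds the tree shown above by hand (so it needs no tagger data) and collects the words of every `mychunk` subtree. `chunk_words` is a hypothetical helper name introduced for illustration.

```python
from nltk import Tree

# The tree from the output above, rebuilt by hand; leaves are (word, tag) pairs.
tree = Tree("S", [
    Tree("mychunk", [("learn", "JJ")]),
    Tree("mychunk", [("php", "NN")]),
    ("from", "IN"),
    Tree("mychunk", [("hsoub", "NN"), ("and", "CC")]),
    ("make", "VB"),
    Tree("mychunk", [("study", "NN"), ("easy", "JJ")]),
])


def chunk_words(tree, label):
    """Return one list of words per subtree carrying the given label."""
    return [
        [word for word, _tag in subtree.leaves()]
        for subtree in tree.subtrees(filter=lambda t: t.label() == label)
    ]


print(chunk_words(tree, "mychunk"))
# [['learn'], ['php'], ['hsoub', 'and'], ['study', 'easy']]
```

Words outside any chunk, such as "from" and "make" here, are left out because they are direct leaves of the root rather than members of a labelled subtree.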
