Question

I installed the NLTK library and then followed a tutorial from a website to install NLTK Data, but I get this error:

import nltk
nltk.download()
"""
AttributeError: 'module' object has no attribute 'download'
"""

So how can I install the NLTK models and data?
 

To download a specific dataset or model, use the nltk.download function. For example, if you need the "Punkt" model in order to use the sentence tokenizer, you can download it like this:

import nltk
nltk.download('punkt')
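
If the call above still raises the AttributeError from the question, one common cause (an assumption here, since the question does not show the project layout) is a local file named nltk.py that shadows the installed library. A minimal sketch to check which file Python actually imports; the helper names find_module_origin and looks_shadowed are made up for this example:

```python
import importlib.util
from pathlib import Path


def find_module_origin(name):
    """Return the file Python would import for `name`, or None if nothing is found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def looks_shadowed(origin):
    """Heuristic only: an installed package resolves to .../nltk/__init__.py,
    while a stray script resolves to a single file such as ./nltk.py."""
    if origin is None:
        return False
    return Path(origin).name != "__init__.py"


if __name__ == "__main__":
    origin = find_module_origin("nltk")
    print("nltk resolves to:", origin)
    if looks_shadowed(origin):
        print("This looks like a local file shadowing the library; try renaming it.")
```

If the printed path points into your own project folder rather than site-packages, renaming that file (and deleting any leftover nltk.pyc next to it) is usually enough.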

If you are not sure which data or model you need, you can pass "popular" to download the most commonly used models and datasets:

import nltk
nltk.download('popular')
"""
[nltk_data] Downloading collection 'popular'
[nltk_data]    | 
[nltk_data]    | Downloading package cmudict to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/cmudict.zip.
[nltk_data]    | Downloading package gazetteers to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/gazetteers.zip.
[nltk_data]    | Downloading package genesis to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/genesis.zip.
[nltk_data]    | Downloading package gutenberg to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/gutenberg.zip.
[nltk_data]    | Downloading package inaugural to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/inaugural.zip.
[nltk_data]    | Downloading package movie_reviews to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    |   Unzipping corpora/movie_reviews.zip.
[nltk_data]    | Downloading package names to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/names.zip.
[nltk_data]    | Downloading package shakespeare to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/shakespeare.zip.
[nltk_data]    | Downloading package stopwords to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/stopwords.zip.
[nltk_data]    | Downloading package treebank to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/treebank.zip.
[nltk_data]    | Downloading package twitter_samples to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    |   Unzipping corpora/twitter_samples.zip.
[nltk_data]    | Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/omw-1.4.zip.
[nltk_data]    | Downloading package omw to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/omw.zip.
[nltk_data]    | Downloading package wordnet to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/wordnet.zip.
[nltk_data]    | Downloading package wordnet31 to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/wordnet31.zip.
[nltk_data]    | Downloading package wordnet_ic to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/wordnet_ic.zip.
[nltk_data]    | Downloading package words to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/words.zip.
[nltk_data]    | Downloading package maxent_ne_chunker to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    |   Unzipping chunkers/maxent_ne_chunker.zip.
[nltk_data]    | Downloading package punkt to /root/nltk_data...
[nltk_data]    |   Unzipping tokenizers/punkt.zip.
[nltk_data]    | Downloading package snowball_data to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    | Downloading package averaged_perceptron_tagger to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    |   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data]    | 
[nltk_data]  Done downloading collection popular

True
"""

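This collection includes the following packages (the list depends on the version of the NLTK data index, which is why the log above also shows omw-1.4 and wordnet31):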

<collection id="popular" name="Popular packages">
      <item ref="cmudict" />
      <item ref="gazetteers" />
      <item ref="genesis" />
      <item ref="gutenberg" />
      <item ref="inaugural" />
      <item ref="movie_reviews" />
      <item ref="names" />
      <item ref="shakespeare" />
      <item ref="stopwords" />
      <item ref="treebank" />
      <item ref="twitter_samples" />
      <item ref="omw" />
      <item ref="wordnet" />
      <item ref="wordnet_ic" />
      <item ref="words" />
      <item ref="maxent_ne_chunker" />
      <item ref="punkt" />
      <item ref="snowball_data" />
      <item ref="averaged_perceptron_tagger" />
    </collection>
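
The collection definition above is plain XML, so the package ids can be read out with the standard library if you want to inspect them or download them one at a time. A small sketch using the exact snippet shown above; the function name package_ids is made up for this example:

```python
import xml.etree.ElementTree as ET

# The "popular" collection exactly as listed in this answer.
POPULAR_XML = """
<collection id="popular" name="Popular packages">
  <item ref="cmudict" />
  <item ref="gazetteers" />
  <item ref="genesis" />
  <item ref="gutenberg" />
  <item ref="inaugural" />
  <item ref="movie_reviews" />
  <item ref="names" />
  <item ref="shakespeare" />
  <item ref="stopwords" />
  <item ref="treebank" />
  <item ref="twitter_samples" />
  <item ref="omw" />
  <item ref="wordnet" />
  <item ref="wordnet_ic" />
  <item ref="words" />
  <item ref="maxent_ne_chunker" />
  <item ref="punkt" />
  <item ref="snowball_data" />
  <item ref="averaged_perceptron_tagger" />
</collection>
"""


def package_ids(collection_xml):
    """Return the `ref` attribute of every <item> in a collection element."""
    root = ET.fromstring(collection_xml.strip())
    return [item.get("ref") for item in root.iter("item")]


ids = package_ids(POPULAR_XML)
print(len(ids), "packages:", ", ".join(ids))
```

Each of these ids can be passed on its own to nltk.download, for example nltk.download('stopwords').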

If you want to download all of the models and data:

import nltk
nltk.download('all')


If you run into an error like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/l/alvas/git/nltk/nltk/tokenize/__init__.py", line 128, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "/Users//alvas/git/nltk/nltk/tokenize/__init__.py", line 94, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/Users/alvas/git/nltk/nltk/data.py", line 820, in load
    opened_resource = _open(resource_url)
  File "/Users/alvas/git/nltk/nltk/data.py", line 938, in _open
    return find(path_, path + ['']).open()
  File "/Users/alvas/git/nltk/nltk/data.py", line 659, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/Users/alvas/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

This means that you are trying to use a model or dataset that is not installed, so you have to download it first. The message also tells you which resource is missing; in this example it asks you to install the punkt model:

import nltk
nltk.download('punkt')
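
To avoid hitting this LookupError at run time, a script can check for the resource first and download it only when it is missing. A minimal sketch, assuming the usual nltk.data.find and nltk.download functions; the helper name ensure_resource and its find/download parameters are made up here so the logic can be tried without network access:

```python
def ensure_resource(resource_path, package_id, find=None, download=None):
    """Download `package_id` only if `resource_path` is not installed yet.

    `find` and `download` default to nltk.data.find and nltk.download.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily so stand-ins can be used in tests
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)  # raises LookupError when the resource is missing
        return False
    except LookupError:
        download(package_id)
        return True
```

For the error above this would be called as ensure_resource('tokenizers/punkt', 'punkt'). Always use the resource name that the error message reports, since it can differ between NLTK releases.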

If you run into problems while downloading large datasets, such as this error:

import nltk
nltk.download('all', halt_on_error=False)
"""
[nltk_data]    | Downloading package panlex_lite to
[nltk_data]    |     /Users/Harshil/nltk_data...
[nltk_data]    |   Unzipping corpora/panlex_lite.zip.
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    nltk.download('all', halt_on_error = False)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 664, in download
    for msg in self.incr_download(info_or_id, download_dir, force):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 543, in incr_download
    for msg in self.incr_download(info.children, download_dir, force):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 529, in incr_download
    for msg in self._download_list(info_or_id, download_dir, force):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 572, in _download_list
    for msg in self.incr_download(item, download_dir, force):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 549, in incr_download
    for msg in self._download_package(info, download_dir, force):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 638, in _download_package
    for msg in _unzip_iter(filepath, zipdir, verbose=False):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 2039, in _unzip_iter
    outfile.write(contents)
OSError: [Errno 22] Invalid argument
"""

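Delete the partially downloaded panlex_lite package, mark it as installed so that the downloader skips it, and then download again: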

$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python
import nltk
dler = nltk.downloader.Downloader()
dler._update_index()  # refresh the package index (private method)
dler._status_cache['panlex_lite'] = 'installed'  # make the downloader skip panlex_lite
dler.download('popular')
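
The workaround above relies on private attributes of the downloader, which may change between NLTK versions. An alternative sketch is to download packages one by one and collect the ones that fail, so a single broken package does not abort the whole run. The helper name download_each and its download parameter are made up for this example; by default it calls nltk.download, which returns True on success:

```python
def download_each(package_ids, download=None):
    """Try to download every package id; return the list of ids that failed.

    `download` defaults to nltk.download and is a parameter only so the
    loop can be exercised without network access.
    """
    if download is None:
        import nltk  # imported lazily so a stand-in can be used in tests
        download = nltk.download
    failed = []
    for package_id in package_ids:
        try:
            ok = download(package_id)
        except OSError as exc:  # e.g. "[Errno 22] Invalid argument" while unzipping
            print(f"{package_id}: {exc}")
            ok = False
        if not ok:
            failed.append(package_id)
    return failed
```

For example, download_each(['punkt', 'stopwords', 'wordnet']) returns an empty list when everything was installed, and the failed ids otherwise, which you can then retry or skip.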

 
