python - Downloading remote gz files that reside in tree-like directories does not work
I have been scratching my head for more than 2 days and still cannot figure out how to do the following! I want to download the GEO data sets from ftp://ftp.ncbi.nlm.nih.gov and, for each data set, see whether it contains the keywords I am interested in. I was able to manually download one of the data sets and checked the file for my desired keywords. However, since the number of data sets is huge, I cannot do this manually. I want to write a program to do it for me. As a first step, I tried to see if I can download them. The structure is as follows:
host -> /geo/ -> datasets/ -> GDS1nnn/ ... all the way through GDS6nnn. Each of them contains more than 600 directories, ordered by number, e.g. GDS1001. Now, in each of these directories: ---> soft. Inside this folder there are 2 files, named like this: folder name (GDS1001) + _full.soft.gz
This is the file I think I need to download, to see if the keywords I am looking for are inside it.
Here is my code:
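The directory layout described above can be turned into a remote path mechanically. A minimal sketch, assuming the range folder is the dataset ID with its last three digits replaced by "nnn" (as in GDS1001 living under GDS1nnn); the helper name `soft_gz_path` is hypothetical and not part of the original code:

```python
def soft_gz_path(dataset_id):
    """Build the remote path of the _full.soft.gz file for one dataset.

    Assumes the layout /geo/datasets/<range>/<id>/soft/, where <range> is
    the dataset ID with its last three digits replaced by "nnn".
    """
    range_dir = dataset_id[:-3] + "nnn"
    return "/geo/datasets/{0}/{1}/soft/{1}_full.soft.gz".format(range_dir, dataset_id)


print(soft_gz_path("GDS1001"))
# /geo/datasets/GDS1nnn/GDS1001/soft/GDS1001_full.soft.gz
```

The helper keeps whatever letter casing the ID is given in, so it should be fed the names exactly as the server lists them.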
import os
from ftplib import FTP

ftp = FTP('ftp.ncbi.nlm.nih.gov')  # remember: provide the host name, not the complete address!
ftp.login()
#ftp.retrlines('LIST')
ftp.cwd("/geo/datasets/GDS1nnn/")
ftp.retrlines('LIST')
filenames = ftp.nlst()
count = len(filenames)
curr = 0
print("found {} files".format(count))
for filename in filenames:
    first_path = filename + "/soft/"
    second_path = first_path + filename + "_full.soft.gz"
    #print(second_path)
    local_filename = os.path.join(r'full path folder created')
    file = open(local_filename, 'wb')
    ftp.retrbinary('RETR ' + second_path, file.write)
    file.close()
ftp.quit()
Output:
    file = open(local_filename, 'wb')
PermissionError: [Errno 13] Permission denied: 'full path folder created'
However, I have both read and write permission on the folder. Any help would be appreciated.
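A note on the traceback: open() is being handed the folder itself rather than a file inside it, and a directory cannot be opened for writing whatever its permissions are. On Windows this typically surfaces as PermissionError (errno 13); on Linux it is usually IsADirectoryError. A minimal sketch that reproduces this without FTP, using a temporary folder as a stand-in for the real download folder (the file name is only an example):

```python
import os
import tempfile

folder = tempfile.mkdtemp()  # stand-in for the real download folder

try:
    open(folder, "wb")  # same mistake: the path names a directory
    opened_directory = True
except OSError as exc:  # PermissionError and IsADirectoryError are both OSError subclasses
    opened_directory = False
    print(type(exc).__name__)

# Joining the folder with a file name yields a path that can be opened.
local_filename = os.path.join(folder, "GDS1001_full.soft.gz")
with open(local_filename, "wb") as fh:
    fh.write(b"")
print(os.path.isfile(local_filename))
```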
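The following code shows how you can create a folder for each dataset and save the content into that folder, so that open() receives the path of a file rather than the path of a directory.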
import os
from ftplib import FTP

output_dir = 'output'  # placeholder: path to the folder called "output"

ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login()
#ftp.retrlines('LIST')
ftp.cwd("/geo/datasets/GDS1nnn/")
ftp.retrlines('LIST')
filenames = ftp.nlst()

for filename in filenames:
    # Remote path relative to the current directory,
    # e.g. GDS1001/soft/GDS1001_full.soft.gz
    first_path = filename + "/soft/"
    second_path = first_path + filename + "_full.soft.gz"
    print(second_path)

    # Create <output_dir>/<dataset>/soft/ locally, mirroring the remote layout.
    local_dir = os.path.join(output_dir, filename, "soft")
    os.makedirs(local_dir, exist_ok=True)

    # Open a file inside that folder (not the folder itself) and download into it.
    local_filename = os.path.join(local_dir, filename + "_full.soft.gz")
    with open(local_filename, 'wb') as file:
        ftp.retrbinary('RETR ' + second_path, file.write)

ftp.quit()
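Once a file has been saved, the keyword check described in the question can be done without unpacking the archive to disk. A minimal sketch, assuming the .soft.gz file is gzip-compressed text; the function name `find_keywords`, the sample file and the sample keywords are hypothetical and only for illustration:

```python
import gzip
import os
import tempfile


def find_keywords(gz_path, keywords):
    """Return the set of keywords (lowercased) that occur in a gzip-compressed text file."""
    wanted = {k.lower() for k in keywords}
    found = set()
    # errors="replace" keeps the scan going if a byte is not valid UTF-8.
    with gzip.open(gz_path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            lowered = line.lower()
            found.update(k for k in wanted if k in lowered)
            if found == wanted:
                break  # every keyword seen, no need to read further
    return found


# Demo on a tiny stand-in file; a real run would pass a downloaded *_full.soft.gz path.
sample_path = os.path.join(tempfile.mkdtemp(), "sample_full.soft.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as fh:
    fh.write("!dataset_title = Example heart tissue study\n")

print(find_keywords(sample_path, ["heart", "liver"]))  # {'heart'}
```

Reading line by line keeps memory use low for large files, and the match is a plain case-insensitive substring test, so it would need tightening if whole-word matches are required.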