amazon web services - How to clean up S3 files that are used by AWS Firehose after loading the files?
AWS Firehose uses S3 as intermediate storage before the data is copied into Redshift. Once the data has been transferred to Redshift, how can I clean up those S3 files automatically when the load succeeds?

I deleted the files manually, and Firehose went into an error state complaining that the files had been deleted; I had to delete and recreate the Firehose delivery stream to resume loading.

Would an S3 lifecycle rule that deletes the files after 7 days work? Or is there an automated way for Firehose to delete the files that were successfully moved to Redshift?
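For reference, the kind of 7-day rule I have in mind would be an S3 lifecycle configuration set roughly like this with the AWS CLI (the bucket name and prefix below are made up):

aws s3api put-bucket-lifecycle-configuration \
    --bucket my-firehose-bucket \
    --lifecycle-configuration '{
        "Rules": [{
            "ID": "expire-firehose-staging",
            "Filter": {"Prefix": "firehose/"},
            "Status": "Enabled",
            "Expiration": {"Days": 7}
        }]
    }'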
Once you're done loading your destination table, execute something similar to the snippet below (it's from a typical shell script):

# If the staged file for the table you just loaded still exists on S3, remove it.
aws s3 ls $aws_bucket/$table_name.txt.gz
if [ "$?" = "0" ]; then
    aws s3 rm $aws_bucket/$table_name.txt.gz
fi

This checks whether the file for the table you've loaded exists on S3 and, if so, deletes it. You can execute it as part of a cron job.
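For example, a nightly crontab entry could run it (the script path and schedule below are placeholders):

# Run the S3 staging cleanup script every night at 02:00 (hypothetical path)
0 2 * * * /opt/etl/cleanup_s3_staging.sh >> /var/log/cleanup_s3_staging.log 2>&1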
If your ETL/ELT is not recursive, you can put the snippet towards the end of your script, so it deletes the file on S3 right after the table has been populated. However, before that part executes, make sure the target table has actually been populated.
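A minimal sketch of that non-recursive case, assuming the load is a Redshift COPY issued via psql and that the bucket, table, cluster, and IAM role names below are placeholders:

#!/bin/bash
# Hypothetical values -- replace with your own bucket, table, cluster, and IAM role.
aws_bucket="s3://my-firehose-bucket"
table_name="events"

# Load the staged file into Redshift; stop here if the COPY fails.
psql "host=my-cluster.abc123.us-east-1.redshift.amazonaws.com port=5439 dbname=analytics user=loader" \
     -c "COPY $table_name FROM '$aws_bucket/$table_name.txt.gz' IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy' GZIP;" \
     || exit 1

# The target table is populated, so the staged file can go.
if aws s3 ls "$aws_bucket/$table_name.txt.gz"; then
    aws s3 rm "$aws_bucket/$table_name.txt.gz"
fi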
If your ETL/ELT is recursive, you may instead put the check-and-remove snippet somewhere at the beginning of the script, so it removes the files created during the previous run. That way each file is retained until the next run, which should be preferred: the file acts as a backup in case the last load fails (or in case you need the flat file from the last load for some other purpose).
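A sketch of that recursive variant, with the same check moved to the top of the script (variable names are again placeholders):

#!/bin/bash
aws_bucket="s3://my-firehose-bucket"   # hypothetical bucket
table_name="events"                    # hypothetical table

# Remove the file left over from the previous (successful) run before starting this one,
# so it stays available as a backup until now.
if aws s3 ls "$aws_bucket/$table_name.txt.gz"; then
    aws s3 rm "$aws_bucket/$table_name.txt.gz"
fi

# ... rest of the run: wait for Firehose to deliver the new file, COPY it into Redshift, etc.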