Goal:
This article shows example Java code for using a Spark job to upload files to AWS S3 with Server-Side Encryption (SSE) enabled.
Env:
MapR 5.1 with Hadoop 2.7.0 (ships with aws-java-sdk-1.7.4.jar)
Spark 1.5.2
Solution:
1. Download my source code from GitHub
git clone git@github.com:viadea/Spark_Upload_S3.git
Please note that in AWS SDK 1.7.4, the SSE feature is enabled through the method "setServerSideEncryption" of the Java class "ObjectMetadata":
objectMetadata.setServerSideEncryption("AES256");
In later versions of the AWS SDK, say 1.7.15, this method was replaced by the method "setSSEAlgorithm":
objectMetadata.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION);
So please make sure you call the method that matches your AWS SDK version; otherwise the job may fail at runtime with a "NoSuchMethodError".
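If the same jar has to run against clusters that ship different SDK versions, one possible workaround is to look up the setter by reflection instead of calling it directly. The sketch below is only an illustration and is not part of the Spark_Upload_S3 project: the class name SseCompat and the two stub classes are hypothetical stand-ins, so that the example compiles without the AWS SDK on the classpath. In a real job you would pass in the SDK's own ObjectMetadata instance.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Sketch: enable SSE with whichever setter the object actually provides. */
class SseCompat {

    /** Hypothetical stand-in for ObjectMetadata in an older SDK such as 1.7.4. */
    public static class LegacyMetadata {
        public String sse;
        public void setServerSideEncryption(String algorithm) { this.sse = algorithm; }
    }

    /** Hypothetical stand-in for ObjectMetadata in a later SDK. */
    public static class NewerMetadata {
        public String sse;
        public void setSSEAlgorithm(String algorithm) { this.sse = algorithm; }
    }

    /**
     * Invokes the first SSE setter found on the given object with "AES256"
     * and returns the name of the method that was called.
     */
    static String enableAes256(Object metadata) {
        String[] candidates = {"setSSEAlgorithm", "setServerSideEncryption"};
        for (String name : candidates) {
            try {
                Method setter = metadata.getClass().getMethod(name, String.class);
                setter.invoke(metadata, "AES256");
                return name;
            } catch (NoSuchMethodException e) {
                // This class does not have this setter; try the next candidate.
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("Could not call " + name, e);
            }
        }
        throw new IllegalStateException(
                "No known SSE setter on " + metadata.getClass().getName());
    }
}
```

Calling SseCompat.enableAes256(objectMetadata) before building the upload request would then work with either method name. The trade-off is that a mismatch is no longer caught by the compiler, so pinning the SDK version in pom.xml is still the simpler option when only one cluster is involved.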
2. Compile using Maven
mvn clean package
Please note that pom.xml declares AWS Java SDK 1.7.4 as the dependency because Hadoop 2.7.0 ships with the same version, aws-java-sdk-1.7.4.jar.
This keeps the libraries used by the Spark application in sync with the Hadoop cluster:
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <version>1.7.4</version>
</dependency>
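To double check which jar a class is really loaded from at runtime, a small helper like the one below may be useful. It is a minimal sketch using only the JDK: the class name ClasspathCheck and its return strings are my own illustration and are not part of the Spark_Upload_S3 project. It looks the class up by name, so it compiles even when the AWS SDK is absent.

```java
import java.security.CodeSource;

/** Sketch: report where a class is loaded from, looked up by name. */
class ClasspathCheck {

    static String describe(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class<?> cls = Class.forName(
                    className, false, ClasspathCheck.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "loaded, but no code source is reported";
            }
            // Typically a file: URL that points at the jar containing the class.
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }
}
```

For example, logging ClasspathCheck.describe("com.amazonaws.services.s3.model.ObjectMetadata") from the job should show a location that names the aws-java-sdk jar in use, assuming the class loader that loaded the helper can also see the SDK.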
3. Run the Spark job
/opt/mapr/spark/spark-1.5.2/bin/spark-submit \
  --class example.uploads3.UploadS3 \
  --master yarn \
  /mapr/my2.cluster.com/github/Spark_Upload_S3/target/spark_upload_s3-1.0.jar \
  /user/mapr/input/data.txt
This sample job uploads data.txt to the S3 bucket named "haos3" with the key name "test/byspark.txt".
4. Confirm that the file is SSE encrypted.
On the AWS S3 web page, click "Properties" for this file; SSE should be shown as enabled with the "AES-256" algorithm.
Reference:
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/ObjectMetadata.html