Hadoop File Already Exists Exception: org.apache.hadoop.mapred.FileAlreadyExistsException

The aim of this article is to make developers aware of an issue they may face while developing a MapReduce application. The error above, org.apache.hadoop.mapred.FileAlreadyExistsException, is one of the most basic exceptions that beginners run into while writing their first MapReduce program.

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/home/facebook/crawler-output already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:269)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:142)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at com.wagh.wordcountjob.WordCount.main(WordCount.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


Let's start from scratch - Why do we get this exception?

To run a MapReduce job, you use a command similar to the following:

$ hadoop jar {name_of_the_jar_file.jar} {fully_qualified_main_class_name} {hdfs_file_path_on_which_you_want_to_perform_map_reduce} {output_directory_path}

Example:

hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output


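Pay attention to {output_directory_path}, i.e. /home/facebook/crawler-output. If this directory already exists in HDFS, Hadoop throws org.apache.hadoop.mapred.FileAlreadyExistsException. As the stack trace shows, the check is made in FileOutputFormat.checkOutputSpecs while the job is being submitted, before any map or reduce task runs, so that the results of an earlier run are not overwritten.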
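The behaviour can be pictured with a small local-filesystem analogy. This is only a sketch and not Hadoop's own code: it uses java.nio.file instead of HDFS, and the names OutputDirCheck, checkOutputDir and OutputAlreadyExistsException are hypothetical, chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Local-filesystem analogy of the output directory check (not Hadoop code).
final class OutputDirCheck {

    // Hypothetical stand-in for Hadoop's FileAlreadyExistsException.
    static final class OutputAlreadyExistsException extends IOException {
        OutputAlreadyExistsException(String message) {
            super(message);
        }
    }

    // Refuses to continue when the output directory is already present.
    static void checkOutputDir(Path outputDir) throws OutputAlreadyExistsException {
        if (Files.exists(outputDir)) {
            throw new OutputAlreadyExistsException(
                "Output directory " + outputDir + " already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        Path parent = Files.createTempDirectory("crawler-demo");
        Path output = parent.resolve("crawler-output");

        checkOutputDir(output);         // first run: the directory does not exist yet
        Files.createDirectory(output);  // simulate the job writing its results

        try {
            checkOutputDir(output);     // second run against the same path
        } catch (OutputAlreadyExistsException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The first call passes because the directory is missing; the second call is rejected, which mirrors what happens when the same hadoop jar command is run twice with the same output path.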

Solution:

Always specify an output directory that does not yet exist in HDFS. Hadoop creates the directory for you at run time, so you do not need to create it yourself.

The command from the example above can be rerun with a new output directory name:

hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output-1

The output directory crawler-output-1 is then created at run time by Hadoop.
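Instead of picking a new suffix by hand for every run, the output path can be generated. The following is a minimal sketch, assuming a timestamp suffix is acceptable for your directory names; the class OutputPaths and the method uniqueOutputPath are hypothetical helpers, not part of Hadoop. Two runs started within the same second would still get the same name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that derives a fresh output directory name per run.
final class OutputPaths {

    // Second-level timestamp, e.g. 20240102-030405.
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    // Appends the timestamp to the base path, e.g. crawler-output-20240102-030405.
    static String uniqueOutputPath(String basePath, LocalDateTime now) {
        return basePath + "-" + now.format(STAMP);
    }

    public static void main(String[] args) {
        // The base path defaults to the directory used in this article.
        String base = args.length > 0 ? args[0] : "/home/facebook/crawler-output";
        System.out.println(uniqueOutputPath(base, LocalDateTime.now()));
    }
}
```

The printed path can then be passed as the last argument of the hadoop jar command in place of a hand-numbered name such as crawler-output-1.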