Thursday 28 July 2016

Changing the Log4j level with `hadoop jar` in MapReduce jobs

Source: http://goo.gl/v7P1x2

One of the most common questions I come across when trying to help debug MapReduce jobs is: "How do I change the Log4j level for my job?" Many times, a user has a JAR with a class that implements Tool that they invoke using the hadoop jar command. The desire is to change the log level without changing any code or global configuration files:
    hadoop jar MyApplication.jar com.myorg.application.ApplicationJob <args ...>
There is a lot of misinformation on this topic, because the way to do it has changed drastically since the 0.20.x and 1.x Apache Hadoop days. Most posts point you to some solution involving environment variables or passing Java opts to the mappers/reducers. In practice, there is a very straightforward solution.
To change the Mapper Log4j level, set `mapreduce.map.log.level`. To change the Reducer Log4j level, set `mapreduce.reduce.log.level`. If for some reason you need to change the Log4j level on the MapReduce ApplicationMaster (e.g. to debug InputSplit generation), set `yarn.app.mapreduce.am.log.level`. This is the proper way for the Apache Hadoop 2.x release line. These options do not let you set a Log4j level for a specific class or package; that would require your application to provide its own custom logging setup.
It's important to remember that, for a class that implements Tool and is launched through ToolRunner, you can define configuration properties (which will appear in your job via the Hadoop Configuration) on the `hadoop jar` command line:
    hadoop jar <jarfile> <classname> [-Dkey=value ...] [arg, ...]
The `-Dkey=value` section can be used to define the Log4j configuration properties when you launch the job.
For example, to set the DEBUG Log4j level on Mappers:
    hadoop jar MyApplication.jar com.myorg.application.ApplicationJob -Dmapreduce.map.log.level=DEBUG <args ...>
To set the WARN Log4j level on Reducers:
    hadoop jar MyApplication.jar com.myorg.application.ApplicationJob -Dmapreduce.reduce.log.level=WARN <args ...>
To set the DEBUG Log4j level on the MapReduce Application Master:
    hadoop jar MyApplication.jar com.myorg.application.ApplicationJob -Dyarn.app.mapreduce.am.log.level=DEBUG <args ...>
And, of course, these options can be combined in a single command:
    hadoop jar MyApplication.jar com.myorg.application.ApplicationJob -Dmapreduce.map.log.level=DEBUG -Dmapreduce.reduce.log.level=DEBUG -Dyarn.app.mapreduce.am.log.level=DEBUG <args ...>
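
If you find yourself typing these flags often, a small wrapper script can assemble them for you. The following is only a sketch under stated assumptions: the helper name `log_level_opts` and the environment variables `MAP_LOG_LEVEL`, `REDUCE_LOG_LEVEL` and `AM_LOG_LEVEL` are made up for illustration and are not part of Hadoop. Only the three property names come from the text above. The script prints the command instead of running it, so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): prints the -D options for the
# three log-level properties discussed above.
# $1 = Mapper level, $2 = Reducer level, $3 = ApplicationMaster level
log_level_opts() {
  printf '%s %s %s\n' \
    "-Dmapreduce.map.log.level=$1" \
    "-Dmapreduce.reduce.log.level=$2" \
    "-Dyarn.app.mapreduce.am.log.level=$3"
}

# Levels are read from environment variables (assumed names), falling back to INFO.
opts=$(log_level_opts "${MAP_LOG_LEVEL:-INFO}" "${REDUCE_LOG_LEVEL:-INFO}" "${AM_LOG_LEVEL:-INFO}")

# Print the command rather than executing it, so the sketch runs without a cluster.
# $opts is intentionally unquoted so it splits into three separate arguments,
# and it is placed before the application arguments, as in the syntax shown earlier.
echo hadoop jar MyApplication.jar com.myorg.application.ApplicationJob $opts "$@"
```

Saved as, say, `run-job.sh` (another assumed name), running `MAP_LOG_LEVEL=DEBUG sh run-job.sh` should print the `hadoop jar` line with the Mapper level set to DEBUG and the other two left at INFO. Replace `echo` with `exec` once the printed command looks right.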
