NUTCH-3167 Upgrade to Hadoop 3.5.0 #911

Open
lewismc wants to merge 7 commits into apache:master from lewismc:NUTCH-3167

lewismc (Member) commented Apr 21, 2026

PR for NUTCH-3167

  • Upgraded Hadoop client stack to 3.5.0
  • Introduces a single hadoop.version property (used in ivy/ivy.xml and Hadoop Javadoc links)
  • Makes Java 17 the minimum runtime and default bytecode level (javac.version=17 in default.properties)
  • Documentation and build.xml comments are updated to match that strategy.
  • CI on master-build.yml is aligned with the new baseline:
    • Temurin JDK 17 and 21 on Ubuntu for build, javadoc, and RAT; builds use -Djavac.version=17 with bytecode verification for major version 61
    • the old Java 11 runtime verification job is replaced by runtime-smoke on JDK 17 and 21.
    • Tests keep a 17/21 × Ubuntu/macOS matrix
    • related workflow tweaks appear in junit-report.yml and sonarcloud.yml (artifact paths / naming consistent with the build job).
  • Code changes reduce reliance on deprecated Hadoop APIs where files were touched:
    • move from legacy mapred types to mapreduce where appropriate
    • refresh SegmentReader sequence-file access
    • adjust FetcherOutputFormat exception typing
    • use findCounter in crawl test helpers
    • ReducerContextWrapper is refactored to use a Mockito mock of Reducer.Context instead of an anonymous subclass that forced overrides of deprecated JobContext methods (large net deletion in that test utility)
  • ivy/mvn.template is updated for compiler 17 in line with the new default
  • Formatting in ivy.xml is updated from tabs to spaces, and trailing whitespace is removed, to comply with Yetus checks

Once GitHub CI passes I'll trigger the Nutch-Smoke-Test-Single-Node-Hadoop-Cluster.
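
For readers unfamiliar with the "bytecode verification for major version 61" step mentioned in the CI bullet above, here is a minimal sketch of what such a check can look like. It is not the script used in master-build.yml; the class name `ClassFileVersion` and its methods are hypothetical. It relies only on the class-file layout (a 4-byte magic number `0xCAFEBABE`, then a 2-byte minor and a 2-byte major version) and on the fact that Java 17 corresponds to major version 61.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reads the class-file header and reports the major version. */
final class ClassFileVersion {

    /** Class-file major version that corresponds to Java 17. */
    static final int JAVA_17_MAJOR = 61;

    /** Returns the major version stored in bytes 6-7 of a class file. */
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("Too short to be a class file");
        }
        // ByteBuffer is big-endian by default, which matches the class-file format.
        ByteBuffer header = ByteBuffer.wrap(classBytes);
        if (header.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Bad magic number, not a class file");
        }
        header.getShort(); // minor version, not needed for this check
        return header.getShort() & 0xFFFF;
    }

    /** True if the class file targets exactly the Java 17 bytecode level. */
    static boolean isJava17Bytecode(byte[] classBytes) {
        return majorVersion(classBytes) == JAVA_17_MAJOR;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileVersion <path-to-.class-file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Path.of(args[0]));
        System.out.println(args[0] + ": major version " + majorVersion(bytes));
    }
}
```

Pointing `main` at any compiled class from the build output prints its major version; a build made with -Djavac.version=17 would be expected to report 61.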

lewismc changed the title from "Nutch 3167" to "NUTCH-3167 Upgrade to Hadoop 3.5.0" Apr 21, 2026
lewismc requested a review from sebastian-nagel April 21, 2026 23:54
lewismc self-assigned this Apr 21, 2026
lewismc (Member, Author) commented Apr 21, 2026

Result of Nutch-Smoke-Test-Single-Node-Hadoop-Cluster build #50: the smoke test failed in the sitemap step, as follows:


2026-04-22 00:09:44,324 INFO crawl.DeduplicationJob: Deduplication finished, elapsed: 94020 ms
--
  |   |   | Skipping indexing ...
  |   |   | Wed Apr 22 00:09:44 UTC 2026 : Finished loop with 1 iterations
  |   |   | + echo https://www.sitemaps.org/sitemap.xml
  |   |   | + hadoop fs -copyFromLocal -f sitemaps.txt crawl/seeds/
  |   |   | + nutch sitemap crawl/crawldb -sitemapUrls crawl/seeds/sitemaps.txt
  |   |   | + /home/jenkins/jenkins-agent/workspace/Nutch/Nutch-Smoke-Test-Single-Node-Hadoop-Cluster/nutch/runtime/deploy/bin/nutch sitemap crawl/crawldb -sitemapUrls crawl/seeds/sitemaps.txt
  |   |   | 2026-04-22 00:09:51,890 INFO util.SitemapProcessor: SitemapProcessor: sitemap urls dir: crawl/seeds/sitemaps.txt
  |   |   | 2026-04-22 00:09:51,893 INFO util.SitemapProcessor: SitemapProcessor: starting
  |   |   | 2026-04-22 00:09:53,699 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
  |   |   | 2026-04-22 00:09:54,367 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/jenkins/.staging/job_1776816079197_0010
  |   |   | 2026-04-22 00:09:58,112 INFO input.FileInputFormat: Total input files to process : 1
  |   |   | 2026-04-22 00:09:58,190 INFO input.FileInputFormat: Total input files to process : 2
  |   |   | 2026-04-22 00:09:59,865 INFO mapreduce.JobSubmitter: number of splits:3
  |   |   | 2026-04-22 00:10:01,240 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1776816079197_0010
  |   |   | 2026-04-22 00:10:01,241 INFO mapreduce.JobSubmitter: Executing with tokens: []
  |   |   | 2026-04-22 00:10:01,494 INFO conf.Configuration: resource-types.xml not found
  |   |   | 2026-04-22 00:10:01,495 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
  |   |   | 2026-04-22 00:10:01,576 INFO impl.YarnClientImpl: Submitted application application_1776816079197_0010
  |   |   | 2026-04-22 00:10:01,618 INFO mapreduce.Job: The url to track the job: http://asf924.gq1.ygridcore.net:8088/proxy/application_1776816079197_0010/
  |   |   | 2026-04-22 00:10:01,618 INFO mapreduce.Job: Running job: job_1776816079197_0010
  |   |   | 2026-04-22 00:10:22,922 INFO mapreduce.Job: Job job_1776816079197_0010 running in uber mode : false
  |   |   | 2026-04-22 00:10:22,927 INFO mapreduce.Job:  map 0% reduce 0%
  |   |   | 2026-04-22 00:10:28,035 INFO mapreduce.Job: Task Id : attempt_1776816079197_0010_m_000000_0, Status : FAILED
  |   |   | Error: java.lang.ClassNotFoundException: org.apache.commons.jexl3.JexlContext
  |   |   | at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  |   |   | at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  |   |   | at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  |   |   | at java.base/java.lang.Class.forName0(Native Method)
  |   |   | at java.base/java.lang.Class.forName(Class.java:467)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2661)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2626)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2722)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2749)
  |   |   | at org.apache.hadoop.mapred.JobConf.getOutputValueClass(JobConf.java:1108)
  |   |   | at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:859)
  |   |   | at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1032)
  |   |   | at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:410)
  |   |   | at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:712)
  |   |   | at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
  |   |   | at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
  |   |   | at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
  |   |   | at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
  |   |   | at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
  |   |   | at org.apache.hadoop.security.authentication.util.SubjectUtil.doAs(SubjectUtil.java:328)
  |   |   | at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1958)
  |   |   | at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
  |   |   |  
  |   |   | 2026-04-22 00:10:28,054 INFO mapreduce.Job: Task Id : attempt_1776816079197_0010_m_000001_0, Status : FAILED
  |   |   | Error: java.lang.ClassNotFoundException: org.apache.commons.jexl3.JexlContext
  |   |   | at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  |   |   | at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  |   |   | at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  |   |   | at java.base/java.lang.Class.forName0(Native Method)
  |   |   | at java.base/java.lang.Class.forName(Class.java:467)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2661)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2626)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2722)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2749)
  |   |   | at org.apache.hadoop.mapred.JobConf.getOutputValueClass(JobConf.java:1108)
  |   |   | at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:859)
  |   |   | at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1032)
  |   |   | at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:410)
  |   |   | at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:712)
  |   |   | at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
  |   |   | at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
  |   |   | at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
  |   |   | at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
  |   |   | at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
  |   |   | at org.apache.hadoop.security.authentication.util.SubjectUtil.doAs(SubjectUtil.java:328)
  |   |   | at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1958)
  |   |   | at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
  |   |   |  
  |   |   | 2026-04-22 00:10:33,118 INFO mapreduce.Job: Task Id : attempt_1776816079197_0010_m_000002_0, Status : FAILED
  |   |   | Error: java.lang.ClassNotFoundException: org.apache.commons.jexl3.JexlContext
  |   |   | at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  |   |   | at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  |   |   | at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  |   |   | at java.base/java.lang.Class.forName0(Native Method)
  |   |   | at java.base/java.lang.Class.forName(Class.java:467)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2661)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2626)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2722)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2749)
  |   |   | at org.apache.hadoop.mapred.JobConf.getOutputValueClass(JobConf.java:1108)
  |   |   | at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:859)
  |   |   | at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1032)
  |   |   | at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:410)
  |   |   | at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:712)
  |   |   | at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
  |   |   | at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
  |   |   | at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
  |   |   | at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
  |   |   | at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
  |   |   | at org.apache.hadoop.security.authentication.util.SubjectUtil.doAs(SubjectUtil.java:328)
  |   |   | at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1958)
  |   |   | at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
  |   |   |  
  |   |   | 2026-04-22 00:10:34,132 INFO mapreduce.Job: Task Id : attempt_1776816079197_0010_m_000000_1, Status : FAILED
  |   |   | Error: java.lang.ClassNotFoundException: org.apache.commons.jexl3.JexlContext
  |   |   | at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  |   |   | at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  |   |   | at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  |   |   | at java.base/java.lang.Class.forName0(Native Method)
  |   |   | at java.base/java.lang.Class.forName(Class.java:467)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2661)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2626)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2722)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2749)
  |   |   | at org.apache.hadoop.mapred.JobConf.getOutputValueClass(JobConf.java:1108)
  |   |   | at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:859)
  |   |   | at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1032)
  |   |   | at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:410)
  |   |   | at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:712)
  |   |   | at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
  |   |   | at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
  |   |   | at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
  |   |   | at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
  |   |   | at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
  |   |   | at org.apache.hadoop.security.authentication.util.SubjectUtil.doAs(SubjectUtil.java:328)
  |   |   | at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1958)
  |   |   | at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
  |   |   |  
  |   |   | 2026-04-22 00:10:38,197 INFO mapreduce.Job: Task Id : attempt_1776816079197_0010_m_000001_1, Status : FAILED
  |   |   | Error: java.lang.ClassNotFoundException: org.apache.commons.jexl3.JexlContext
  |   |   | at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  |   |   | at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  |   |   | at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  |   |   | at java.base/java.lang.Class.forName0(Native Method)
  |   |   | at java.base/java.lang.Class.forName(Class.java:467)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2661)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2626)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2722)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2749)
  |   |   | at org.apache.hadoop.mapred.JobConf.getOutputValueClass(JobConf.java:1108)
  |   |   | at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:859)
  |   |   | at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1032)
  |   |   | at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:410)
  |   |   | at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:712)
  |   |   | at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
  |   |   | at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
  |   |   | at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
  |   |   | at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
  |   |   | at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
  |   |   | at org.apache.hadoop.security.authentication.util.SubjectUtil.doAs(SubjectUtil.java:328)
  |   |   | at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1958)
  |   |   | at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
  |   |   |  
  |   |   | 2026-04-22 00:10:40,217 INFO mapreduce.Job: Task Id : attempt_1776816079197_0010_m_000002_1, Status : FAILED
  |   |   | Error: java.lang.ClassNotFoundException: org.apache.commons.jexl3.JexlContext
  |   |   | at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  |   |   | at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  |   |   | at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  |   |   | at java.base/java.lang.Class.forName0(Native Method)
  |   |   | at java.base/java.lang.Class.forName(Class.java:467)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2661)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2626)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2722)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2749)
  |   |   | at org.apache.hadoop.mapred.JobConf.getOutputValueClass(JobConf.java:1108)
  |   |   | at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:859)
  |   |   | at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1032)
  |   |   | at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:410)
  |   |   | at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:712)
  |   |   | at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
  |   |   | at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
  |   |   | at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
  |   |   | at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
  |   |   | at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
  |   |   | at org.apache.hadoop.security.authentication.util.SubjectUtil.doAs(SubjectUtil.java:328)
  |   |   | at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1958)
  |   |   | at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
  |   |   |  
  |   |   | 2026-04-22 00:10:42,238 INFO mapreduce.Job: Task Id : attempt_1776816079197_0010_m_000000_2, Status : FAILED
  |   |   | Error: java.lang.ClassNotFoundException: org.apache.commons.jexl3.JexlContext
  |   |   | at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  |   |   | at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  |   |   | at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  |   |   | at java.base/java.lang.Class.forName0(Native Method)
  |   |   | at java.base/java.lang.Class.forName(Class.java:467)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2661)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2626)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2722)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2749)
  |   |   | at org.apache.hadoop.mapred.JobConf.getOutputValueClass(JobConf.java:1108)
  |   |   | at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:859)
  |   |   | at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1032)
  |   |   | at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:410)
  |   |   | at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:712)
  |   |   | at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
  |   |   | at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
  |   |   | at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
  |   |   | at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
  |   |   | at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
  |   |   | at org.apache.hadoop.security.authentication.util.SubjectUtil.doAs(SubjectUtil.java:328)
  |   |   | at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1958)
  |   |   | at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
  |   |   |  
  |   |   | 2026-04-22 00:10:44,257 INFO mapreduce.Job: Task Id : attempt_1776816079197_0010_m_000001_2, Status : FAILED
  |   |   | Error: java.lang.ClassNotFoundException: org.apache.commons.jexl3.JexlContext
  |   |   | at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  |   |   | at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  |   |   | at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  |   |   | at java.base/java.lang.Class.forName0(Native Method)
  |   |   | at java.base/java.lang.Class.forName(Class.java:467)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2661)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2626)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2722)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2749)
  |   |   | at org.apache.hadoop.mapred.JobConf.getOutputValueClass(JobConf.java:1108)
  |   |   | at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:859)
  |   |   | at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1032)
  |   |   | at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:410)
  |   |   | at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:712)
  |   |   | at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
  |   |   | at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
  |   |   | at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
  |   |   | at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
  |   |   | at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
  |   |   | at org.apache.hadoop.security.authentication.util.SubjectUtil.doAs(SubjectUtil.java:328)
  |   |   | at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1958)
  |   |   | at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
  |   |   |  
  |   |   | 2026-04-22 00:10:46,277 INFO mapreduce.Job: Task Id : attempt_1776816079197_0010_m_000002_2, Status : FAILED
  |   |   | Error: java.lang.ClassNotFoundException: org.apache.commons.jexl3.JexlContext
  |   |   | at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  |   |   | at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  |   |   | at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  |   |   | at java.base/java.lang.Class.forName0(Native Method)
  |   |   | at java.base/java.lang.Class.forName(Class.java:467)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2661)
  |   |   | at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2626)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2722)
  |   |   | at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2749)
  |   |   | at org.apache.hadoop.mapred.JobConf.getOutputValueClass(JobConf.java:1108)
  |   |   | at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:859)
  |   |   | at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1032)
  |   |   | at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:410)
  |   |   | at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:712)
  |   |   | at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
  |   |   | at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
  |   |   | at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
  |   |   | at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
  |   |   | at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
  |   |   | at org.apache.hadoop.security.authentication.util.SubjectUtil.doAs(SubjectUtil.java:328)
  |   |   | at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1958)
  |   |   | at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
  |   |   |  
  |   |   | 2026-04-22 00:10:50,312 INFO mapreduce.Job:  map 100% reduce 100%
  |   |   | 2026-04-22 00:10:51,333 INFO mapreduce.Job: Job job_1776816079197_0010 failed with state FAILED due to: Task failed task_1776816079197_0010_m_000000
  |   |   | Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0
  |   |   |  
  |   |   | 2026-04-22 00:10:51,464 INFO mapreduce.Job: Counters: 14
  |   |   | Job Counters
  |   |   | Failed map tasks=10
  |   |   | Killed map tasks=2
  |   |   | Killed reduce tasks=1
  |   |   | Launched map tasks=11
  |   |   | Other local map tasks=8
  |   |   | Data-local map tasks=3
  |   |   | Total time spent by all maps in occupied slots (ms)=124431
  |   |   | Total time spent by all reduces in occupied slots (ms)=0
  |   |   | Total time spent by all map tasks (ms)=41477
  |   |   | Total vcore-milliseconds taken by all map tasks=41477
  |   |   | Total megabyte-milliseconds taken by all map tasks=127417344
  |   |   | Map-Reduce Framework
  |   |   | CPU time spent (ms)=0
  |   |   | Physical memory (bytes) snapshot=0
  |   |   | Virtual memory (bytes) snapshot=0
  |   |   | 2026-04-22 00:10:51,474 ERROR util.SitemapProcessor: SitemapProcessor job did not succeed, job id: job_1776816079197_0010, job status: FAILED, reason: Task failed task_1776816079197_0010_m_000000
  |   |   | Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0
  |   |   |  
  |   |   | 2026-04-22 00:10:51,517 ERROR util.SitemapProcessor: SitemapProcessor: java.lang.RuntimeException: SitemapProcessor job did not succeed, job id: job_1776816079197_0010, job status: FAILED, reason: Task failed task_1776816079197_0010_m_000000
  |   |   | Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0
  |   |   |  
  |   |   | at org.apache.nutch.util.SitemapProcessor.sitemap(SitemapProcessor.java:498)
  |   |   | at org.apache.nutch.util.SitemapProcessor.run(SitemapProcessor.java:599)
  |   |   | at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
  |   |   | at org.apache.nutch.util.SitemapProcessor.main(SitemapProcessor.java:533)
  |   |   | at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  |   |   | at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  |   |   | at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  |   |   | at java.base/java.lang.reflect.Method.invoke(Method.java:569)
  |   |   | at org.apache.hadoop.util.RunJar.run(RunJar.java:333)
  |   |   | at org.apache.hadoop.util.RunJar.main(RunJar.java:254)


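We may have a classpath issue for JEXL that needs to be addressed: the map task attempts above fail with ClassNotFoundException for org.apache.commons.jexl3.JexlContext. I'll investigate.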
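
One way to narrow down a ClassNotFoundException like the one in the log is to ask a JVM directly whether it can resolve the class, and from which jar. The sketch below is only a diagnostic aid under stated assumptions: the class name `ClasspathProbe` and its `locate` method are hypothetical, and it inspects the classpath of the JVM it runs in. The failing lookup in the log happens inside the YARN task JVM (`YarnChild`), so the probe would need to run with that same classpath to be conclusive.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic: reports where a class is loaded from, or that it is missing. */
final class ClasspathProbe {

    /** Resolves className with the given loader and describes the outcome. */
    static String locate(String className, ClassLoader loader) {
        try {
            // initialize=false: resolve the class without running its static initializers.
            Class<?> type = Class.forName(className, false, loader);
            CodeSource source = type.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (for example java.lang.String)
            // have no code source.
            String origin = (source == null)
                ? "no code source (bootstrap class)"
                : String.valueOf(source.getLocation());
            return className + " -> " + origin;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> NOT FOUND (" + e + ")";
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the failing stack trace.
        String name = args.length > 0 ? args[0] : "org.apache.commons.jexl3.JexlContext";
        System.out.println(locate(name, Thread.currentThread().getContextClassLoader()));
    }
}
```

If the class resolves, the printed location shows which jar supplies it; if it prints NOT FOUND, the jar is absent from that JVM's classpath.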

lewismc (Member, Author) commented Apr 22, 2026

The junit-report and sonarcloud downstream (dependent) GitHub CI workflows fail due to restrictions imposed by ASF Infra. Although they are somewhat tangential, I will update this PR with the fixes so that we have full visibility into the CI checks. These checks will keep failing until we merge this branch into master.
