Build #3,866

License check with RAT and Checkstyle

Build #3866 was successful. Changes by albertshau <ashau@google.com>

Build result summary

Details

Completed
Duration: 4 minutes
Labels: None
Agent: bamboo-agent10
Revision: 4494b2ac00f01e681898352d20eec26f0ff61b45
Successful since: #3770

Code commits

Author: albertshau <ashau@google.com>
Commit: 4494b2ac00f01e681898352d20eec26f0ff61b45
Message: Merge pull request #10121 from caskdata/feature/CDAP-13246-provisioner-failure-handling
CDAP-13246 provisioner failure handling

Author: Albert Shau <albert@cask.co>
Commit: 4e7e05a90c82b704c9c7edac788c1dfad6a36cba
Message: CDAP-13246 provisioner failure handling
Adding logic to handle failures during provisioning.

When a RetryableProvisionException is thrown, the method will
be retried up to a time limit. The time limit is hardcoded today,
but will be configurable per profile later.
Also added handling for scenarios where a cluster is requested to
be created, but when polling for status, the cluster returns a
non-running status. In these scenarios, the cluster is usually
deleted, then the create is retried.
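
As a rough illustration of the time-limited retry described above, here is a minimal Java sketch. It is not the code from this commit: the RetryableProvisionException class below is a stand-in for the one named in the message, and TimeLimitedRetry, its call method, and the fixed delay between attempts are all hypothetical names and choices made for this example.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Stand-in for the RetryableProvisionException named in the commit message.
// The real class lives in CDAP; this one exists only to make the sketch compile.
class RetryableProvisionException extends Exception {
  RetryableProvisionException(String message) {
    super(message);
  }
}

// Hypothetical helper: retries a task until it succeeds or a time limit passes.
final class TimeLimitedRetry {
  private TimeLimitedRetry() { }

  static <T> T call(Callable<T> task, Duration timeLimit, Duration delay) throws Exception {
    long deadline = System.nanoTime() + timeLimit.toNanos();
    while (true) {
      try {
        return task.call();
      } catch (RetryableProvisionException e) {
        // Only retryable failures are caught; any other exception propagates at once.
        // Once the time limit has passed, give up and rethrow the last failure.
        if (System.nanoTime() - deadline >= 0) {
          throw e;
        }
        Thread.sleep(delay.toMillis());
      }
    }
  }
}
```

A caller would wrap a single provisioning step, for example the cluster create call or the status poll, in TimeLimitedRetry.call and pass the time limit, which per the message above is hardcoded for now.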

Also added logic to pick up tasks that were being executed while
CDAP was shut down. Each task will store state about what it is
about to do. If CDAP is shut down in the middle of a task, when
it comes back up, it will scan the state store and re-create
tasks that were in progress.
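
The recovery idea above can be sketched as follows. This is an assumption-laden illustration, not the commit's implementation: TaskStateStore, TaskRecovery, and the step names are hypothetical, and the store here is an in-memory map, whereas a real one would have to be persistent to survive a shutdown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical state store. In-memory only so the sketch is self-contained.
final class TaskStateStore {
  private final Map<String, String> inProgress = new ConcurrentHashMap<>();

  // Record what a task is about to do, before it does it.
  void recordStart(String taskId, String step) {
    inProgress.put(taskId, step);
  }

  // Clear the record once the task has finished.
  void recordDone(String taskId) {
    inProgress.remove(taskId);
  }

  // Return a sorted snapshot of every task that was started but never finished.
  Map<String, String> scan() {
    return new TreeMap<>(inProgress);
  }
}

// Hypothetical startup hook: re-creates a task for each unfinished entry.
final class TaskRecovery {
  private TaskRecovery() { }

  static List<String> resume(TaskStateStore store) {
    List<String> resumed = new ArrayList<>();
    for (Map.Entry<String, String> entry : store.scan().entrySet()) {
      // A real implementation would schedule the task here; the sketch only lists it.
      resumed.add(entry.getKey() + ":" + entry.getValue());
    }
    return resumed;
  }
}
```

Writing the state before acting is what makes the scan meaningful: any entry still present at startup belongs to a task that was interrupted.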

Also cleaned up the state transition logic by moving it out of
AppMetadataStore and into the respective ProgramRunStatus and
ProgramRunClusterStatus enums.
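
To show what keeping transition rules inside an enum can look like, here is a minimal sketch. SketchRunStatus, its constants, and its transition table are hypothetical and do not reproduce the actual values or rules of ProgramRunStatus or ProgramRunClusterStatus. The only rule taken from this commit message is that a run cannot go from pending straight to completed.

```java
// Hypothetical enum; the states and allowed transitions are illustrative only.
enum SketchRunStatus {
  PENDING, STARTING, RUNNING, COMPLETED, FAILED;

  // Each state decides which states may follow it, so callers need no separate table.
  boolean canTransitionTo(SketchRunStatus next) {
    switch (this) {
      case PENDING:
        return next == STARTING || next == FAILED;
      case STARTING:
        return next == RUNNING || next == FAILED;
      case RUNNING:
        return next == COMPLETED || next == FAILED;
      default:
        // COMPLETED and FAILED are terminal in this sketch.
        return false;
    }
  }
}
```

With the rules on the enum, a store can reject an invalid update with a single canTransitionTo check, and a unit test can assert directly that PENDING.canTransitionTo(COMPLETED) is false.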

Fixing a bad test that relied on an invalid state transition, and
adding a test to make sure we can't go from pending to completed.

JIRA issues

Issue: CDAP-13246 (Unknown Issue Type)
Description: Could not obtain issue details from JIRA