albertshau <ashau@google.com>: Author Summary
Build | Completed | Code commits | Tests |
---|---|---|---|
CDAP › RUT › #1343 | 1 week ago | | 1 of 2408 failed |
IT › UPD2 › #767 | 1 week ago | CDAP-17559 remove Spark adaptive execution default in app. Adaptive execution is meant for Spark3, but it also affects Spark2. Since it hasn't been extensively tested with Spark2 pipelines, this removes the default in the app and relies on the provisioner to set it depending on the Spark version. For Dataproc, the images with Spark3 already default the setting to true. Merged via pull request #13047 from cdapio/feature/CDAP-17559-spark3-adaptive-default. | Testless build |
CDAP › DUT › #3157 | 1 week ago | CDAP-17559 remove Spark adaptive execution default in app. Adaptive execution is meant for Spark3, but it also affects Spark2. Since it hasn't been extensively tested with Spark2 pipelines, this removes the default in the app and relies on the provisioner to set it depending on the Spark version. For Dataproc, the images with Spark3 already default the setting to true. Merged via pull request #13047 from cdapio/feature/CDAP-17559-spark3-adaptive-default. | 1 of 2015 failed |
CDAP › URUT › #1251 | 1 week ago | CDAP-17559 remove Spark adaptive execution default in app. Adaptive execution is meant for Spark3, but it also affects Spark2. Since it hasn't been extensively tested with Spark2 pipelines, this removes the default in the app and relies on the provisioner to set it depending on the Spark version. For Dataproc, the images with Spark3 already default the setting to true. Merged via pull request #13047 from cdapio/feature/CDAP-17559-spark3-adaptive-default. | Testless build |
CDAP › UDUT › #1241 | 1 week ago | CDAP-17559 remove Spark adaptive execution default in app. Adaptive execution is meant for Spark3, but it also affects Spark2. Since it hasn't been extensively tested with Spark2 pipelines, this removes the default in the app and relies on the provisioner to set it depending on the Spark version. For Dataproc, the images with Spark3 already default the setting to true. Merged via pull request #13047 from cdapio/feature/CDAP-17559-spark3-adaptive-default. | Testless build |
CDAP › DRC › #5230 | 1 week ago | CDAP-17559 remove Spark adaptive execution default in app. Adaptive execution is meant for Spark3, but it also affects Spark2. Since it hasn't been extensively tested with Spark2 pipelines, this removes the default in the app and relies on the provisioner to set it depending on the Spark version. For Dataproc, the images with Spark3 already default the setting to true. Merged via pull request #13047 from cdapio/feature/CDAP-17559-spark3-adaptive-default. | Testless build |
CDAP › BPP › #1505 | 1 week ago | CDAP-17559 remove Spark adaptive execution default in app. Adaptive execution is meant for Spark3, but it also affects Spark2. Since it hasn't been extensively tested with Spark2 pipelines, this removes the default in the app and relies on the provisioner to set it depending on the Spark version. For Dataproc, the images with Spark3 already default the setting to true. Merged via pull request #13047 from cdapio/feature/CDAP-17559-spark3-adaptive-default. | Testless build |
IT › ITM › #241 | 2 weeks ago | Update docs. Merged via pull request #1043 from yeweidaniel/integration-doc2. | 45 of 45 failed |
IT › UPD2 › #745 | 4 weeks ago | CDAP-16812 update some dataproc info blurbs. Updated descriptions and labels for the two different service accounts to make it clearer what they do. Also updated the init actions description to say the scripts should be on GCS, with a placeholder example. Merged via pull request #12940 from cdapio/feature/CDAP-16812-update-dataproc-info. | Testless build |
CDAP › DUT › #3132 | 4 weeks ago | CDAP-16812 update some dataproc info blurbs. Updated descriptions and labels for the two different service accounts to make it clearer what they do. Also updated the init actions description to say the scripts should be on GCS, with a placeholder example. Merged via pull request #12940 from cdapio/feature/CDAP-16812-update-dataproc-info. | 2873 passed |
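
The CDAP-17559 entries above describe leaving the adaptive execution default to the provisioner, which decides based on the Spark version. A minimal sketch of that decision, assuming a hypothetical helper `adaptive_execution_defaults` (not CDAP's actual provisioner code); the only real name used is the Spark SQL property `spark.sql.adaptive.enabled`:

```python
from typing import Dict


def adaptive_execution_defaults(spark_version: str, user_props: Dict[str, str]) -> Dict[str, str]:
    """Return Spark properties, defaulting adaptive execution only on Spark 3+.

    Hypothetical helper for illustration; not part of CDAP.
    """
    props = dict(user_props)  # copy so the caller's properties are not mutated
    major = int(spark_version.split(".")[0])
    if major >= 3:
        # setdefault keeps any value the user set explicitly
        props.setdefault("spark.sql.adaptive.enabled", "true")
    return props


spark2 = adaptive_execution_defaults("2.4.8", {})
spark3 = adaptive_execution_defaults("3.1.2", {})
```

With this shape, a Spark2 cluster gets no adaptive execution setting at all, and an explicit user value always wins over the default.
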
Build | Completed | Code commits | Tests |
---|---|---|---|
IT › ITM › #241 | 2 weeks ago | Update docs. Merged via pull request #1043 from yeweidaniel/integration-doc2. | 45 of 45 failed |
CDAP › DUT › #3124 | 1 month ago | CDAP-17059 validate no actions in the middle of a pipeline. Merged via pull request #12899 from cdapio/feature/CDAP-17059-fail-middle-action-pipeline. | 1 of 1993 failed |
CDAP › RUT › #1306 | 1 month ago | CDAP-17059 validate no actions in the middle of a pipeline. Merged via pull request #12899 from cdapio/feature/CDAP-17059-fail-middle-action-pipeline. | 1 of 2394 failed |
HYP › BAD › #339 | 1 month ago | CDAP-17249 add docs for regex path. Merged via pull request #1268 from cdapio/feature/CDAP-17249-regex-path-docs. | 955 passed |
IT › ITN › #600 | 1 month ago | Update docs. Merged via pull request #1043 from yeweidaniel/integration-doc2. | 6 of 540 failed |
CDAP › RUT › #1290 | 1 month ago | CDAP-17425 expose max preview records to sources. Merged via pull request #12874 from cdapio/feature/CDAP-17425-expose-preview-info. | 1 of 1169 failed |
CDAP › DUT › #3061 | 4 months ago | CDAP-17237 fix pipeline hconf clearing. Fixed a bug where the Hadoop conf was cleared before sink-specific properties were added. The fix ensures that cluster-specific defaults are included in the conf instead of being wiped. Merged via pull request #12763 from cdapio/bugfix/CDAP-17237-fix-pipeline-hconf. | 2356 passed |
CDAP › DUT › #3022 | 5 months ago | CDAP-17078 consolidate stages within a group. Changed the SparkPipelineRunner to use a CombinerDag to group sinks and their preceding transforms together. These grouped stages are treated similarly to a single sink: flatMapToPair() is called on the input RDD to transform it into a PairRDD, then save() is called to write the RDD out. This capability is off by default, but can be turned on by setting a runtime argument. Instead of flatMapToPair() calling just the sink's transform method, a new MultiSinkFunction class is used to direct incoming records to the correct logical branches of the pipeline. This requires that each input be tagged with which stage it came from (stage and port), as well as its type (output or error). To do this, refactored the SparkPipelineRunner a bit to maintain the RDD<RecordInfo> for each stage rather than RDD<StructuredRecord>, as the RecordInfo class contains that extra information. Also added a MultiOutputFormat that takes the output of the MultiSinkFunction and delegates writes to the correct underlying OutputFormat. Since the OutputFormat lives in the pipeline app, this approach means CDAP datasets cannot be combined. This caused a problem with dataset lineage, since it is implemented by wrapping OutputFormats into a hidden ExternalDataset class in CDAP. Instead of doing this indirect wrapping, changed the SparkSinkFactory class to explicitly register lineage through direct calls instead of hiding it under several layers of abstraction. Merged via pull request #12558 from cdapio/feature/CDAP-17078-spark-stage-consolidation. | 1 of 1969 failed |
CDAP › DUT › #3003 | 5 months ago | CDAP-17078 refactored MapReduceTransformExecutorFactory. Refactored the transform executor factory used in MapReduce so that much of the logic can be re-used in Spark as well. Merged via pull request #12494 from cdapio/feature/CDAP-17078-transform-executor-refactor. | 1 of 1965 failed |
CDAP › RUT › #1147 | 7 months ago | CDAP-16943 use byte[] instead of ByteBuffer for record conversion. When converting from a Spark Row to a CDAP StructuredRecord, use a byte[] for byte fields instead of a ByteBuffer. This is because downstream plugins are less likely to have issues dealing with byte[] and because ByteBuffer is not serializable, which can cause issues in certain Spark pipelines. Merged via pull request #12325 from cdapio/feature/CDAP-16943-dataframes-bytearr. CDAP-16955 add metrics for records into an autojoin. Added a no-op map to auto-joiner input to count records in for the stage, similar to what is done for normal joiners. Enhanced autojoin unit tests to check values of records.in and records.out metrics. Merged via pull request #12311 from cdapio/bugfix/CDAP-16955-autojoin-records-in. | 1 of 1974 failed |
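
The CDAP-17078 stage consolidation entry above describes tagging each record with its originating stage and type, then routing it to the correct logical branch and sink. A rough sketch of that routing idea in plain Python, under stated assumptions: `RecordInfo` and `multi_sink_function` only borrow names from the commit message, `run_group` and the branch layout are hypothetical, and none of this is CDAP's Spark code.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, NamedTuple, Tuple


class RecordInfo(NamedTuple):
    """A record tagged with where it came from (illustrative stand-in)."""
    from_stage: str   # stage that emitted the record
    record_type: str  # "output" or "error"
    value: Any


# A branch is (sink name, transform applied before the sink); hypothetical layout.
Branch = Tuple[str, Callable[[Any], Iterable[Any]]]


def multi_sink_function(record: RecordInfo,
                        branches: Dict[Tuple[str, str], List[Branch]]) -> Iterator[Tuple[str, Any]]:
    """Route one tagged record to every branch fed by its stage and type."""
    key = (record.from_stage, record.record_type)
    for sink_name, transform in branches.get(key, []):
        for out in transform(record.value):
            yield sink_name, out


def run_group(records: Iterable[RecordInfo],
              branches: Dict[Tuple[str, str], List[Branch]]) -> Dict[str, List[Any]]:
    """Make a single pass over the input and collect what each sink would receive."""
    written: Dict[str, List[Any]] = defaultdict(list)
    for record in records:
        for sink_name, value in multi_sink_function(record, branches):
            written[sink_name].append(value)  # stands in for the delegated write
    return dict(written)


branches = {
    ("parser", "output"): [("sink_a", lambda v: [v.upper()]), ("sink_b", lambda v: [v, v])],
    ("parser", "error"): [("error_sink", lambda v: [v])],
}
records = [
    RecordInfo("parser", "output", "x"),
    RecordInfo("parser", "error", "bad"),
    RecordInfo("other", "output", "ignored"),
]
result = run_group(records, branches)
```

The point of the sketch is that the input is traversed once for the whole group, rather than once per sink.
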
Build | Completed | Code commits | Tests |
---|---|---|---|
HYP › BAD › #341 | 1 month ago | PLUGIN-102 fix backwards incompatibility from format fix. Keep the same method signature for protected method. Merged via pull request #1277 from cdapio/feature/PLUGIN-102-fix-file-sink-compatibility. | 958 passed |
HYP › WT › #355 | 1 month ago | PLUGIN-464 fix flatten to be a no-op on empty lists. Fixed a bug where input rows with an empty list would get filtered out by the flatten directive. Merged via pull request #462 from data-integrations/feature/PLUGIN-464-flatten-fix. | 396 passed |
CDAP › DUT › #3126 | 1 month ago | CDAP-16527 include field name in casting errors for records. The logical type utility methods now include the field name when the type is not as expected. This is only required because the builder does not verify that a valid type is being set for a field. If/when that verification is added, this logic can be removed. CDAP-16527 fix decimal getter and check messages in tests. Merged via pull request #12898 from cdapio/feature/CDAP-16527-include-field-name-is-classcast. | 2873 passed |
CDAP › RUT › #1292 | 1 month ago | CDAP-17428 default stage consolidation to true. Merged via pull request #12876 from cdapio/feature/CDAP-17428-consolidation-default-on. CDAP-17408 fix to honor partitions set by aggregators. Merged via pull request #12877 from cdapio/feature/CDAP-17408-fix-aggregator-partitions. | 2847 passed |
CDAP › RUT › #1234 | 4 months ago | CDAP-17232 wait longer for program state to reduce flakiness. Increase the time to wait for program state in gateway tests from 10 seconds to 30 seconds to reduce test flakiness on slow machines. Merged via pull request #12700 from cdapio/bugfix/CDAP-17232-wait-longer-for-programs. | 2839 passed |
CDAP › DUT › #3028 | 5 months ago | CDAP-17078 consolidate multiple outputs from same sink. Some sinks have multiple outputs, which can cause a lot of data recomputation. Added logic to consolidate these outputs. Merged via pull request #12594 from cdapio/feature/CDAP-17078-consolidate-multi-output-sinks. | 2827 passed |
CDAP › DUT › #3010 | 5 months ago | CDAP-17078 add a combiner dag to consolidate nodes. Merged via pull request #12483 from cdapio/feature/CDAP-17078-dag-consolidation-logic. | 2805 passed |
CDAP › RUT › #1172 | 6 months ago | CDAP-17024 set spark sql case sensitivity. Set Spark SQL to be case sensitive so that autojoins behave in the same way as batchjoiners. Merged via pull request #12412 from cdapio/bugfix/CDAP-17024-autojoin-case-sensitive. | 2788 passed |
CDAP › RUT › #1160 | 6 months ago | CDAP-17000 increase spark network timeout by default. Merged via pull request #12376 from cdapio/feature/CDAP-17000-spark-network-timeout. | 2787 passed |
CDAP › DUT › #2967 | 6 months ago | CDAP-16935 partition dataframes before join. Partition dataframes right before the join using the same partitioning as the join would, except using the number of partitions specified by the plugin instead of a global number defined by the Spark conf. Merged via pull request #12361 from cdapio/feature/CDAP-16935-autojoin-set-partitions. | 2787 passed |
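
The PLUGIN-464 entry in the table above fixes flatten so that rows with an empty list pass through instead of being dropped. A minimal sketch of that behavior, assuming a hypothetical `flatten` helper over plain dictionaries (this is not the Wrangler directive's actual implementation):

```python
from typing import Any, Dict, List


def flatten(row: Dict[str, Any], column: str) -> List[Dict[str, Any]]:
    """Expand a list-valued column into one row per element.

    Hypothetical helper: rows whose column is missing, not a list,
    or an empty list are returned unchanged (a no-op), not filtered out.
    """
    values = row.get(column)
    if not isinstance(values, list) or len(values) == 0:
        return [row]
    # One output row per list element, with the other columns copied over
    return [{**row, column: value} for value in values]


expanded = flatten({"id": 1, "tags": ["a", "b"]}, "tags")
unchanged = flatten({"id": 2, "tags": []}, "tags")
```

A naive version that only iterates over the list elements would emit zero rows for an empty list, which is the filtering behavior the fix removes.
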