albertshau <ashau@google.com>: Author Summary

Builds triggered by albertshau <ashau@google.com>

Builds triggered by an author are those builds that contain changes committed by the author.
1406
387 (28%)
1019 (72%)

Breakages and fixes

Broken means the build failed but the previous build was successful.
Fixed means the build was successful but the previous build had failed.
Broken: 121 (9% of all builds triggered)
Fixed: 117 (8% of all builds triggered)
Net: -4
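The Broken and Fixed definitions above amount to comparing each build with the one before it. The following is a minimal sketch of that classification for a single job's history, assuming outcomes are given in chronological order as booleans (True = successful); the function name is hypothetical and this is not the report generator's actual code. Net is computed as fixed minus broken, which matches 117 - 121 = -4.

```python
from typing import Sequence, Tuple


def count_breakages_and_fixes(results: Sequence[bool]) -> Tuple[int, int, int]:
    """Hypothetical helper: classify consecutive builds of one job.

    results: build outcomes in chronological order (True = successful).
    Returns (broken, fixed, net), where net = fixed - broken.
    """
    broken = fixed = 0
    for previous, current in zip(results, results[1:]):
        if previous and not current:
            broken += 1  # failed right after a successful build
        elif not previous and current:
            fixed += 1  # succeeded right after a failed build
    return broken, fixed, fixed - broken
```

For example, the history success, failure, failure, success, failure yields two breakages and one fix, for a net of -1.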
Build Completed Code commits Tests
CDAP › RUT › #1343 1 week ago
CDAP-17559 remove Spark adaptive execution default in app
Adaptive execution is meant for Spark3, but it also affects Spark2.
Since it hasn't been extensively tested with Spark2 pipelines,
this removes the default from the app and relies on the provisioner
to set it depending on the Spark version.

For Dataproc, the images with Spark3 already default the setting
to true.
Merge pull request #13047 from cdapio/feature/CDAP-17559-spark3-adaptive-default
CDAP-17559 remove Spark adaptive execution default in app
1 of 2408 failed
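CDAP-17559 moves the adaptive execution decision from the app to the provisioner, keyed on the Spark version. The sketch below shows one way such a version-dependent default could look. It assumes the setting in question is the Spark SQL property spark.sql.adaptive.enabled; the helper name and the policy of leaving the key unset on Spark 2 are assumptions for illustration, not CDAP's actual provisioner code.

```python
from typing import Dict

# Spark SQL property that toggles adaptive query execution.
ADAPTIVE_KEY = "spark.sql.adaptive.enabled"


def adaptive_execution_conf(spark_version: str, app_conf: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical provisioner-side helper (not CDAP's actual code).

    The app no longer supplies a default, so the setting comes either from an
    explicit user value or from the Spark major version of the cluster.
    """
    conf = dict(app_conf)  # copy so the caller's dict is never mutated
    if ADAPTIVE_KEY in conf:
        return conf  # an explicit user setting always wins
    major = int(spark_version.split(".")[0])
    if major >= 3:
        conf[ADAPTIVE_KEY] = "true"
    # For Spark 2 the key is left unset, so Spark's own default applies.
    return conf
```

Setting the key explicitly for Spark 3 is redundant, but harmless, on Dataproc images that already default it to true.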
IT › UPD2 › #767 1 week ago
CDAP-17559 remove Spark adaptive execution default in app
Adaptive execution is meant for Spark3, but it also affects Spark2.
Since it hasn't been extensively tested with Spark2 pipelines,
this removes the default from the app and relies on the provisioner
to set it depending on the Spark version.

For Dataproc, the images with Spark3 already default the setting
to true.
Merge pull request #13047 from cdapio/feature/CDAP-17559-spark3-adaptive-default
CDAP-17559 remove Spark adaptive execution default in app
Testless build
CDAP › DUT › #3157 1 week ago
CDAP-17559 remove Spark adaptive execution default in app
Adaptive execution is meant for Spark3, but it also affects Spark2.
Since it hasn't been extensively tested with Spark2 pipelines,
this removes the default from the app and relies on the provisioner
to set it depending on the Spark version.

For Dataproc, the images with Spark3 already default the setting
to true.
Merge pull request #13047 from cdapio/feature/CDAP-17559-spark3-adaptive-default
CDAP-17559 remove Spark adaptive execution default in app
1 of 2015 failed
CDAP › URUT › #1251 1 week ago
CDAP-17559 remove Spark adaptive execution default in app
Adaptive execution is meant for Spark3, but it also affects Spark2.
Since it hasn't been extensively tested with Spark2 pipelines,
this removes the default from the app and relies on the provisioner
to set it depending on the Spark version.

For Dataproc, the images with Spark3 already default the setting
to true.
Merge pull request #13047 from cdapio/feature/CDAP-17559-spark3-adaptive-default
CDAP-17559 remove Spark adaptive execution default in app
Testless build
CDAP › UDUT › #1241 1 week ago
Merge pull request #13047 from cdapio/feature/CDAP-17559-spark3-adaptive-default
CDAP-17559 remove Spark adaptive execution default in app
CDAP-17559 remove Spark adaptive execution default in app
Adaptive execution is meant for Spark3, but it also affects Spark2.
Since it hasn't been extensively tested with Spark2 pipelines,
this removes the default from the app and relies on the provisioner
to set it depending on the Spark version.

For Dataproc, the images with Spark3 already default the setting
to true.
Testless build
CDAP › DRC › #5230 1 week ago
CDAP-17559 remove Spark adaptive execution default in app
Adaptive execution is meant for Spark3, but it also affects Spark2.
Since it hasn't been extensively tested with Spark2 pipelines,
this removes the default from the app and relies on the provisioner
to set it depending on the Spark version.

For Dataproc, the images with Spark3 already default the setting
to true.
Merge pull request #13047 from cdapio/feature/CDAP-17559-spark3-adaptive-default
CDAP-17559 remove Spark adaptive execution default in app
Testless build
CDAP › BPP › #1505 1 week ago
Merge pull request #13047 from cdapio/feature/CDAP-17559-spark3-adaptive-default
CDAP-17559 remove Spark adaptive execution default in app
CDAP-17559 remove Spark adaptive execution default in app
Adaptive execution is meant for Spark3, but it also affects Spark2.
Since it hasn't been extensively tested with Spark2 pipelines,
this removes the default from the app and relies on the provisioner
to set it depending on the Spark version.

For Dataproc, the images with Spark3 already default the setting
to true.
Testless build
IT › ITM › #241 2 weeks ago
Merge pull request #1043 from yeweidaniel/integration-doc2
Update docs
45 of 45 failed
IT › UPD2 › #745 4 weeks ago
Merge pull request #12940 from cdapio/feature/CDAP-16812-update-dataproc-info
CDAP-16812 update some dataproc info blurbs
CDAP-16812 update some dataproc info blurbs
Updated descriptions and labels for the two different service
accounts to clarify what each one does. Also updated the
init actions description to say the scripts should be on GCS,
with a placeholder example.
Testless build
CDAP › DUT › #3132 4 weeks ago
Merge pull request #12940 from cdapio/feature/CDAP-16812-update-dataproc-info
CDAP-16812 update some dataproc info blurbs
CDAP-16812 update some dataproc info blurbs
Updated descriptions and labels for the two different service
accounts to clarify what each one does. Also updated the
init actions description to say the scripts should be on GCS,
with a placeholder example.
2873 passed
Build Completed Code commits Tests
IT › ITM › #241 2 weeks ago
Merge pull request #1043 from yeweidaniel/integration-doc2
Update docs
45 of 45 failed
CDAP › DUT › #3124 1 month ago
Merge pull request #12899 from cdapio/feature/CDAP-17059-fail-middle-action-pipeline
CDAP-17059 validate no actions in the middle of a pipeline
CDAP-17059 validate no actions in the middle of a pipeline
1 of 1993 failed
CDAP › RUT › #1306 1 month ago
CDAP-17059 validate no actions in the middle of a pipeline
Merge pull request #12899 from cdapio/feature/CDAP-17059-fail-middle-action-pipeline
CDAP-17059 validate no actions in the middle of a pipeline
1 of 2394 failed
HYP › BAD › #339 1 month ago
Merge pull request #1268 from cdapio/feature/CDAP-17249-regex-path-docs
CDAP-17249 add docs for regex path
CDAP-17249 add docs for regex path
955 passed
IT › ITN › #600 1 month ago
Merge pull request #1043 from yeweidaniel/integration-doc2
Update docs
6 of 540 failed
CDAP › RUT › #1290 1 month ago
CDAP-17425 expose max preview records to sources
Merge pull request #12874 from cdapio/feature/CDAP-17425-expose-preview-info
CDAP-17425 expose max preview records to sources
1 of 1169 failed
CDAP › DUT › #3061 4 months ago
Merge pull request #12763 from cdapio/bugfix/CDAP-17237-fix-pipeline-hconf
CDAP-17237 fix pipeline hconf clearing
CDAP-17237 fix pipeline hconf clearing
Fixed a bug where the Hadoop conf was cleared before adding sink
specific properties. This ensures that cluster specific defaults
are correctly included in the conf instead of being wiped.
2356 passed
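The CDAP-17237 fix is about ordering: sink-specific properties should be layered on top of the cluster's defaults rather than replacing them. The sketch below models a Hadoop configuration as a plain Python dict to contrast the buggy and fixed behavior; both function names are hypothetical and the property values are made up for illustration.

```python
from typing import Dict


def build_sink_conf_buggy(cluster_defaults: Dict[str, str], sink_props: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical reproduction of the bug: defaults are cleared first."""
    conf = dict(cluster_defaults)
    conf.clear()  # bug: wipes cluster-specific defaults before sink properties are added
    conf.update(sink_props)
    return conf


def build_sink_conf_fixed(cluster_defaults: Dict[str, str], sink_props: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical fixed version: keep defaults, then add sink properties."""
    conf = dict(cluster_defaults)  # cluster-specific defaults are preserved
    conf.update(sink_props)  # sink-specific properties override on key collisions
    return conf
```

With the fixed version, a cluster default such as fs.defaultFS survives alongside the sink's output directory setting.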
CDAP › DUT › #3022 5 months ago
Merge pull request #12558 from cdapio/feature/CDAP-17078-spark-stage-consolidation
CDAP-17078 consolidate stages within a group
CDAP-17078 consolidate stages within a group
Changed the SparkPipelineRunner to use a CombinerDag to group
sinks and their preceding transforms together. These grouped
stages are treated similarly to how a single sink is treated,
with flatMapToPair() called on the input RDD to transform it
into a PairRDD, and save() then called to write the RDD out.
This capability is off by default, but can be turned on by
setting a runtime argument.

Instead of flatMapToPair() calling just the sink's transform
method, a new MultiSinkFunction class is used to direct incoming
records to the correct logical branches of the pipeline.
This requires that each input be tagged with the stage it
came from (stage and port), as well as its type (output or error).
In order to do this, refactored the SparkPipelineRunner a bit
to maintain the RDD<RecordInfo> for each stage rather than
RDD<StructuredRecord>, as the RecordInfo class contains that
extra information.

Also added a MultiOutputFormat that will take the output of the
MultiSinkFunction and delegate writes to the correct underlying
OutputFormat. Since the OutputFormat lives in the pipeline
app, this approach means CDAP datasets cannot be combined.
This caused a problem with dataset lineage, since it is
implemented by wrapping OutputFormats into a hidden
ExternalDataset class in CDAP. Instead of doing this indirect
wrapping, changed the SparkSinkFactory class to explicitly
register lineage through direct calls instead of hiding it
under several layers of abstraction.
1 of 1969 failed
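The CDAP-17078 description above routes tagged records (RecordInfo carries stage, port, and output-or-error type) through a MultiSinkFunction that emits pairs keyed by sink, and a MultiOutputFormat that hands each pair to the right writer. The following is a simplified, single-process Python sketch of that routing idea. The branch table, transform chains, and writer callbacks are assumptions for illustration; it is not CDAP's Java implementation and does not use Spark.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class RecordInfo:
    stage: str  # stage the record came from
    port: Optional[str]  # output port, if the stage has several
    kind: str  # "output" or "error"
    value: Any


# Hypothetical wiring of one consolidated group: a chain of transforms
# (each returning zero or more records) that ends at a named sink.
Branch = Tuple[List[Callable[[Any], Iterable[Any]]], str]


def multi_sink_function(records: Iterable[RecordInfo],
                        branches: Dict[Tuple[str, Optional[str], str], List[Branch]],
                        ) -> Iterator[Tuple[str, Any]]:
    """Emit (sink_name, record) pairs, in the spirit of flatMapToPair()."""
    for info in records:
        for transforms, sink in branches.get((info.stage, info.port, info.kind), []):
            values = [info.value]
            for transform in transforms:
                values = [out for v in values for out in transform(v)]
            for v in values:
                yield sink, v


def multi_output_write(pairs: Iterable[Tuple[str, Any]],
                       writers: Dict[str, Callable[[Any], None]]) -> None:
    """Delegate each pair to the writer for its sink, like a MultiOutputFormat."""
    for sink, value in pairs:
        writers[sink](value)
```

A transform that returns an empty list acts as a filter, so a single pass over the input can feed several logical branches without recomputing it.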
CDAP › DUT › #3003 5 months ago
Merge pull request #12494 from cdapio/feature/CDAP-17078-transform-executor-refactor
CDAP-17078 refactored MapReduceTransformExecutorFactory
CDAP-17078 refactored MapReduceTransformExecutorFactory
Refactored the transform executor factory used in mapreduce so
that much of the logic can be re-used in Spark as well.
1 of 1965 failed
CDAP › RUT › #1147 7 months ago
CDAP-16943 use byte[] instead of ByteBuffer for record conversion
When converting from a Spark Row to a CDAP StructuredRecord,
use a byte[] for byte fields instead of a ByteBuffer.
This is because downstream plugins are less likely to have
issues dealing with byte[] and because ByteBuffer is not
serializable, which can cause issues in certain Spark pipelines.
CDAP-16955 add metrics for records into an autojoin
Added a no-op map to auto-joiner input to count records in for
the stage, similar to what is done for normal joiners.
Enhanced autojoin unit tests to check values of
records.in and records.out metrics
Merge pull request #12311 from cdapio/bugfix/CDAP-16955-autojoin-records-in
CDAP-16955 add metrics for records into an autojoin
Merge pull request #12325 from cdapio/feature/CDAP-16943-dataframes-bytearr
CDAP-16943 use byte[] instead of ByteBuffer for record conversion
1 of 1974 failed
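CDAP-16955 counts records entering an auto-joiner by passing each input through a no-op map. The sketch below illustrates the idea with a generator that yields every record unchanged while incrementing a counter, feeding a toy nested-loop inner join. The metric key format (stage name prefixed to records.in and records.out) and both function names are assumptions for illustration.

```python
from collections import Counter
from typing import Any, Callable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def count_records_in(records: Iterable[T], metrics: Counter, stage: str) -> Iterator[T]:
    """No-op map: yields every record unchanged, counting it as records.in."""
    for record in records:
        metrics[f"{stage}.records.in"] += 1
        yield record


def auto_join(left: Iterable[Any], right: Iterable[Any], key: Callable[[Any], Any],
              metrics: Counter, stage: str) -> List[Tuple[Any, Any]]:
    """Toy inner join whose inputs pass through the counting map first."""
    right_rows = list(count_records_in(right, metrics, stage))
    out = []
    for lrow in count_records_in(left, metrics, stage):
        for rrow in right_rows:
            if key(lrow) == key(rrow):
                out.append((lrow, rrow))
    metrics[f"{stage}.records.out"] += len(out)
    return out
```

Joining two left rows with three right rows where one key matches twice gives records.in = 5 and records.out = 2.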
Build Completed Code commits Tests
HYP › BAD › #341 1 month ago
Merge pull request #1277 from cdapio/feature/PLUGIN-102-fix-file-sink-compatibility
PLUGIN-102 fix backwards incompatibility from format fix
PLUGIN-102 fix backwards incompatibility from format fix
Keep the same method signature for the protected method.
958 passed
HYP › WT › #355 1 month ago
PLUGIN-464 fix flatten to be a no-op on empty lists
Fixed a bug where input rows with an empty list would get
filtered out by the flatten directive.
Merge pull request #462 from data-integrations/feature/PLUGIN-464-flatten-fix
PLUGIN-464 fix flatten to be a no-op on empty lists
396 passed
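PLUGIN-464 makes flatten a no-op on empty lists, so those rows are no longer dropped. The sketch below shows single-column flatten semantics with that fix, using dict rows. It is a simplification under stated assumptions: the real Wrangler directive can flatten several columns at once, and passing non-list values through unchanged is an assumption here.

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def flatten(rows: Iterable[Row], column: str) -> Iterator[Row]:
    """Hypothetical single-column flatten: one output row per list element."""
    for row in rows:
        values = row.get(column)
        if not isinstance(values, list) or not values:
            # Empty list (or non-list value): pass the row through unchanged
            # instead of filtering it out.
            yield row
            continue
        for value in values:
            flat = dict(row)
            flat[column] = value
            yield flat
```

A row whose list is empty now appears once in the output, unchanged.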
CDAP › DUT › #3126 1 month ago
CDAP-16527 include field name in casting errors for records
For logical type utility methods, include the field name in the error
when the type is not as expected.

This is only required because the builder does not verify that
a valid type is being set for a field. If/when that verification
is added, this logic can be removed.
Merge pull request #12898 from cdapio/feature/CDAP-16527-include-field-name-is-classcast
CDAP-16527 include field name in casting errors for records
CDAP-16527 fix decimal getter and check messages in tests
2873 passed
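CDAP-16527 names the offending field when a record value has an unexpected type. The following is a minimal Python analogue of such a typed getter, assuming records are plain dicts; get_typed is hypothetical and stands in for CDAP's Java logical-type utility methods, which would otherwise surface a bare ClassCastException.

```python
from decimal import Decimal
from typing import Any, Dict


def get_typed(record: Dict[str, Any], field: str, expected: type) -> Any:
    """Hypothetical getter that reports which field had the wrong type."""
    value = record.get(field)
    if value is not None and not isinstance(value, expected):
        raise TypeError(
            f"Field '{field}' is expected to be of type {expected.__name__}, "
            f"but is of type {type(value).__name__}")
    return value
```

Calling get_typed({"price": 1.5}, "price", Decimal) raises a TypeError that mentions price, float, and Decimal.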
CDAP › RUT › #1292 1 month ago
Merge pull request #12876 from cdapio/feature/CDAP-17428-consolidation-default-on
CDAP-17428 default stage consolidation to true
Merge pull request #12877 from cdapio/feature/CDAP-17408-fix-aggregator-partitions
CDAP-17408 fix to honor partitions set by aggregators
CDAP-17428 default stage consolidation to true
CDAP-17408 fix to honor partitions set by aggregators
2847 passed
CDAP › RUT › #1234 4 months ago
CDAP-17232 wait longer for program state to reduce flakiness
Increase the time to wait for program state in gateway tests from
10 seconds to 30 seconds to reduce test flakiness on slow machines.
Merge pull request #12700 from cdapio/bugfix/CDAP-17232-wait-longer-for-programs
CDAP-17232 wait longer for program state to reduce flakiness
2839 passed
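CDAP-17232 raises the wait for program state in gateway tests from 10 to 30 seconds. A wait like this is typically a poll-until-timeout loop. The sketch below is a generic version under that assumption; wait_for_state and the state strings are illustrative and not the actual CDAP test helper.

```python
import time
from typing import Callable


def wait_for_state(get_state: Callable[[], str], expected: str,
                   timeout_s: float = 30.0, poll_s: float = 0.5) -> None:
    """Poll get_state() until it returns expected or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == expected:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"state is {state}, expected {expected} after {timeout_s}s")
        time.sleep(poll_s)
```

A longer timeout only slows down the failing case; a test whose program reaches the state quickly still returns on the first matching poll.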
CDAP › DUT › #3028 5 months ago
Merge pull request #12594 from cdapio/feature/CDAP-17078-consolidate-multi-output-sinks
CDAP-17078 consolidate multiple outputs from same sink
CDAP-17078 consolidate multiple outputs from same sink
Some sinks have multiple outputs, which can cause a lot of
data recomputation. Added logic to consolidate these outputs.
2827 passed
CDAP › DUT › #3010 5 months ago
CDAP-17078 add a combiner dag to consolidate nodes
Merge pull request #12483 from cdapio/feature/CDAP-17078-dag-consolidation-logic
CDAP-17078 add a combiner dag to consolidate nodes
2805 passed
CDAP › RUT › #1172 6 months ago
CDAP-17024 set spark sql case sensitivity
Set spark sql to be case sensitive so that autojoins behave in
the same way as batchjoiners.
Merge pull request #12412 from cdapio/bugfix/CDAP-17024-autojoin-case-sensitive
CDAP-17024 set spark sql case sensitivity
2788 passed
CDAP › RUT › #1160 6 months ago
CDAP-17000 increase spark network timeout by default
Merge pull request #12376 from cdapio/feature/CDAP-17000-spark-network-timeout
CDAP-17000 increase spark network timeout by default
2787 passed
CDAP › DUT › #2967 6 months ago
CDAP-16935 partition dataframes before join
Partition dataframes right before the join, using the same
partitioning as the join would, except using the number of
partitions specified by the plugin instead of a global number
defined by the spark conf.
Merge pull request #12361 from cdapio/feature/CDAP-16935-autojoin-set-partitions
CDAP-16935 partition dataframes before join
2787 passed
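CDAP-16935 repartitions both join inputs with the join's own partitioning, but with a plugin-chosen partition count instead of the global Spark setting. The pure-Python sketch below shows why this works: if both sides are hash-partitioned on the join key with the same function and count, matching keys land in the same partition index, so each partition pair can be joined independently. Python's built-in hash stands in for Spark's partitioner, and the function names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def hash_partition(rows: List[Row], keys: Tuple[str, ...], num_partitions: int) -> List[List[Row]]:
    """Assign each row to a partition by hashing its join-key values."""
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        k = tuple(row[c] for c in keys)
        parts[hash(k) % num_partitions].append(row)
    return parts


def partitioned_join(left: List[Row], right: List[Row], keys: Tuple[str, ...],
                     num_partitions: int) -> List[Row]:
    """Inner join performed partition-by-partition after co-partitioning."""
    lparts = hash_partition(left, keys, num_partitions)
    rparts = hash_partition(right, keys, num_partitions)
    out: List[Row] = []
    for lp, rp in zip(lparts, rparts):  # same index means same key range
        for lrow in lp:
            for rrow in rp:
                if all(lrow[c] == rrow[c] for c in keys):
                    out.append({**lrow, **rrow})
    return out
```

Because num_partitions is a parameter, a plugin can size it for its own data rather than inheriting one global number.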