VTK/SoftwareQuality/ContinuousBuildTestFailures

Latest revision as of 20:13, 15 February 2013

The VTK continuous build can be a powerful tool to catch defects before the nightly builds. The original purpose of the continuous build was to notify developers, via email, if their changes introduced new defects (compile errors, warnings, failing tests). Recently, the continuous build has become less effective because it has so many recurring test failures.

Since October 2012, the number of recurring test failures for the continuous build has varied between 7 and 11. This effort seeks to reduce VTK Continuous recurring test failures to 0 and keep them at 0.

This experiment uses the DMAIC methodology of the Six Sigma management process to "Define", "Measure", "Analyze", "Improve" and "Control" to resolve these issues.

The basic methodology (from Wikipedia) consists of the following five steps:

  • Define process goals that are consistent with customer demands and VTK's strategy.
  • Measure key aspects of the current process and collect relevant data.
  • Analyze the data to verify cause-and-effect relationships. Determine what the relationships are, and attempt to ensure that all factors have been considered.
  • Improve or optimize the process.
  • Control to ensure that any deviations from target are corrected before they result in defects. Set up pilot runs to establish software quality, move on to production, set up control mechanisms and continuously monitor the process.

Define

Keep the number of VTK Continuous recurring test failures (defects) at 0. When defects are above 0, developers find it difficult to notice that their changes introduce new defects.

Measure

As of February 1, 2013, there were 11 defects on the one VTK Continuous build:

  • vtkChartsCoreCxx-TestColorTransferFunction
  • vtkChartsCoreCxx-TestLinePlot3D
  • vtkChartsCoreCxx-TestMultipleRenderers
  • vtkCommonCoreTcl-otherPrint
  • vtkCommonCoreTcl-TestEmptyInput
  • vtkCommonCoreTcl-TestSetGet
  • vtkFiltersHyperTreeCxx-TestHyperTreeGridTernary3DCut
  • vtkIOExportCxx-TestStackedPlotGL2PS-VerifyRasterizedPNG
  • vtkIOMovieCxx-TestOggTheoraWriter
  • vtkIOSQLCxx-SQLiteTableReadWrite
  • vtkRenderingCoreCxx-TestSplitViewportStereoHorizontal
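
One way to arrive at a list like the one above is to intersect the failing-test sets of several consecutive builds. The following is only a minimal sketch under stated assumptions: the page does not define "recurring" precisely, so the rule used here (a test that fails in every build examined) and the function name `RecurringFailures` are hypothetical and are not part of VTK or CDash.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the tests that failed in every one of the
// given builds. Treating "failed in every build examined" as "recurring"
// is an assumption made for illustration only.
std::set<std::string> RecurringFailures(
  const std::vector<std::set<std::string>>& failuresPerBuild)
{
  if (failuresPerBuild.empty())
  {
    return {};
  }
  // Start from the first build's failures and keep only the tests that
  // also fail in each later build.
  std::set<std::string> recurring = failuresPerBuild.front();
  for (std::size_t i = 1; i < failuresPerBuild.size(); ++i)
  {
    std::set<std::string> stillFailing;
    for (const std::string& name : recurring)
    {
      if (failuresPerBuild[i].count(name) > 0)
      {
        stillFailing.insert(name);
      }
    }
    recurring.swap(stillFailing);
  }
  return recurring;
}
```

Called with one set of failing test names per build, the function returns only the names present in all of them, so a test that failed once and then passed drops out of the count.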

Analyze

Initial analysis revealed:

  • vtkChartsCoreCxx-TestColorTransferFunction
    • Image regression
  • vtkChartsCoreCxx-TestLinePlot3D
    • Image regression
  • vtkChartsCoreCxx-TestMultipleRenderers
    • Invalid test results
  • vtkCommonCoreTcl-otherPrint
    • Crashing
  • vtkCommonCoreTcl-TestEmptyInput
    • Crashing
  • vtkCommonCoreTcl-TestSetGet
    • Crashing
  • vtkFiltersHyperTreeCxx-TestHyperTreeGridTernary3DCut
    • Image regression
  • vtkIOExportCxx-TestStackedPlotGL2PS-VerifyRasterizedPNG
    • Image regression
  • vtkIOMovieCxx-TestOggTheoraWriter
    • Timeout. Not clear why this test times out on this platform.
  • vtkIOSQLCxx-SQLiteTableReadWrite
    • File regression
  • vtkRenderingCoreCxx-TestSplitViewportStereoHorizontal
    • Failing virtually everywhere

Improve

A number of Gerrit reviews resolved the following issues.

  • vtkChartsCoreCxx-TestColorTransferFunction
    • COMP: Adjust tolerance for TestColorTransferFunction (http://review.source.kitware.com/#/c/9765/)
      TestColorTransferFunction has been failing for almost 4 months on some Windows platforms. Increase the error tolerance to pass the test.
  • vtkChartsCoreCxx-TestLinePlot3D
    • Added additional baseline.
  • vtkChartsCoreCxx-TestMultipleRenderers
    • Fix TestMultipleRenderers on Dash3's continuous (http://review.source.kitware.com/#/c/9795/)
      Apparently this test has been failing on dash3 since October. This problem is ultimately due to the fact that we've enabled the depth buffer within vtkOpenGLContextDevice3D::Begin(). This commit changes this behavior, so we now enable the depth buffer at the beginning of each 3D context "Draw" method, and then disable it at the end.
      While investigating this problem, I realized that we copied code from vtkOpenGLContextDevice2D and pasted it into the new 3D device class. It turns out that this is unnecessary, so this commit removes the copied & pasted code from vtkOpenGLContextDevice3D.
  • vtkCommonCoreTcl-otherPrint
    • COMP: Uninitialized memory reads (http://review.source.kitware.com/#/c/9635/)
      Running the otherPrint test with valgrind uncovered several uninitialized memory read defects.
  • vtkCommonCoreTcl-TestEmptyInput
    • BUG: AMR Readers crash TestEmptyInput test (http://review.source.kitware.com/#/c/9650/)
      The AMR readers are crashing the TestEmptyInput test. This patch exempts those readers from the test since they do not have a SetInputData method. Other readers have been exempted in the past.
  • vtkCommonCoreTcl-TestSetGet
    • BUG: TestSetGet crashes (http://review.source.kitware.com/#/c/9709/)
      TestSetGet tries to test the singleton vtkTextRenderer. This causes segfaults on some systems. This patch exempts vtkTextRenderer from the test.
  • vtkFiltersHyperTreeCxx-TestHyperTreeGridTernary3DCut
    • COMP: Regression test failures on some platforms (http://review.source.kitware.com/#/c/9655/)
      Some Windows and Linux systems fail the regression tests by minor amounts. This patch specifies a larger image tolerance.
      In particular, the VTK continuous build has a recurring regression failure for one of these tests. With the increased tolerance, the test passes.
      Tolerances were set so that the tests pass on a Fedora/Mesa system and were verified on a Fedora/OpenGL system.
  • vtkIOExportCxx-TestStackedPlotGL2PS-VerifyRasterizedPNG
    • Added additional baseline.
      Note from David Lonie: Interesting...the same test in another build on the same machine passes, using the same ghostscript executable to produce the image:
      http://open.cdash.org/testDetails.php?test=161292428&build=2810949
      Only difference is static vs shared. The image produced by the failing test is valid, ghostscript is just adding a couple extra pixels to the width/height. I've added a new baseline so this should be good to go on the next build.
  • vtkIOMovieCxx-TestOggTheoraWriter
    • Fix uninitialized variable occurrences in oggtheora (http://review.source.kitware.com/#/c/9776/)
  • vtkIOSQLCxx-SQLiteTableReadWrite
    • COMP: ASCII file compare fails on Windows vs Linux (http://review.source.kitware.com/#/c/9722/)
      The ASCII file compare failed when the test was run on Windows machines. This patch provides an ASCII compare function that ignores cr/lf and lf differences. If this new compare function proves useful, it may be included in the Test utilities.
  • vtkRenderingCoreCxx-TestSplitViewportStereoHorizontal
    • Turn off multisamples so antialiasing doesn't cause test failures (http://review.source.kitware.com/#/c/9747)
      TestSplitViewportStereoHorizontal is failing on several platforms.
      Some Linux platforms failed because their graphics cards don't do antialiasing. This commit turns off antialiasing on ALL platforms, provides a new non-antialiased baseline, and increases the error threshold.
      All Windows machines failed because SetStereoRender was called before the StereoType was set. This commit should fix that, because it calls SetStereoType before SetStereoRender.
      All OS X machines still fail because the ViewAngle is, for some unknown reason, not interpreted properly for this test on that platform.
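
The SQLiteTableReadWrite fix relies on comparing ASCII output while ignoring cr/lf versus lf line endings. The sketch below illustrates that idea only; it is not the function from the actual patch. The names `StripTrailingCR` and `SameTextIgnoringLineEndings` are hypothetical, and the comparison works on in-memory strings rather than on files, which is an assumption made to keep the example self-contained.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not from the actual patch): drop one trailing '\r'
// so that a line read from a cr/lf file matches the same line from an lf file.
static std::string StripTrailingCR(std::string line)
{
  if (!line.empty() && line.back() == '\r')
  {
    line.pop_back();
  }
  return line;
}

// Returns true when both texts contain the same sequence of lines,
// treating cr/lf and lf line endings as equivalent. A missing newline
// after the last line is also ignored.
bool SameTextIgnoringLineEndings(const std::string& a, const std::string& b)
{
  std::istringstream streamA(a);
  std::istringstream streamB(b);
  std::string lineA;
  std::string lineB;
  while (true)
  {
    const bool gotA = static_cast<bool>(std::getline(streamA, lineA));
    const bool gotB = static_cast<bool>(std::getline(streamB, lineB));
    if (!gotA || !gotB)
    {
      // Equal only if both inputs ran out of lines at the same time.
      return gotA == gotB;
    }
    if (StripTrailingCR(lineA) != StripTrailingCR(lineB))
    {
      return false;
    }
  }
}
```

With this approach, output written on Windows and a baseline written on Linux compare equal when they differ only in line endings, while any difference in the line contents or in the number of lines still fails the comparison.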

Control

As of February 15, 2013, the continuous build has 0 defects!

Once defects are reduced to 0, developer diligence is needed to keep them at 0. The burden is on the Gerrit reviewers.