
[Sticky] INTEGRATION SPEED: improvements in version 1.055 and the influence of CPU cores, Memory and HDD on performance


Mabula-Admin

(@mabula-admin)

Universe Admin

Joined: 7 years ago

Posts: 4366

Topic starter October 12, 2017 14:47

Hi all,

Several APP users indicated that APP was really slow with integration when they tried to integrate stacks of several hundred frames (APP versions up to and including 1.054).

In version 1.055 (soon to be released), I have improved integration speed significantly using 2 upgrades in the integration engine.

First, reading from the file mapper on the hard disk (which contains all data/frames/layers of the integration) now happens simultaneously with calculating the pixel stacks. This immediately increases speed by a factor of 1.5-1.75x, depending on hardware configuration.
Furthermore, because integration in the old integration engine was strongly limited by hard drive performance, APP now greatly reduces the number of IO calls needed for integration by dynamically adjusting the read and write buffers for the file mapper. APP used to have fixed read/write buffers of only 8 KiloBytes. Depending on the amount of system memory, these buffers can now become as large as 1024 KiloBytes (1 MegaByte). If you integrate hundreds of frames, this improvement alone increases integration speed by a factor of 2-10x on both SSD and conventional SATA/PATA drives.
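To get a feel for why larger buffers matter, the number of IO calls needed to stream a stack once can be estimated as the stack size divided by the buffer size. The following is only a rough sketch, assuming every call transfers exactly one full buffer; the function name is hypothetical and this is not APP's actual file mapper code. The frame dimensions (4928 x 3264) are taken from the FITS headers further down in this post.

```python
KIB = 1024


def io_calls(total_bytes: int, buffer_bytes: int) -> int:
    """Calls needed to move total_bytes if each call moves one full buffer."""
    return (total_bytes + buffer_bytes - 1) // buffer_bytes  # integer ceiling


# 100 frames of 4928 x 3264 pixels, 3 channels, 32 bits (4 bytes) per value
stack_bytes = 100 * 4928 * 3264 * 3 * 4

old_calls = io_calls(stack_bytes, 8 * KIB)     # fixed 8 KiloByte buffer
new_calls = io_calls(stack_bytes, 1024 * KIB)  # 1 MegaByte buffer

print("calls with 8 KiB buffer:   ", old_calls)
print("calls with 1024 KiB buffer:", new_calls)
print("reduction factor:          ", round(old_calls / new_calls))
```

Under these assumptions the larger buffer needs roughly 128 times fewer calls. The real gain in wall-clock time is smaller than that, since it also depends on drive latency and throughput.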

These two upgrades combined give a speed increase of 2-20x, depending largely on the hardware, which is a huge performance increase. The actual speed increase depends on many factors, like the number and speed of CPU cores, the amount of memory available, hard disk latency, and the actual maximum read/write speeds of your hard disk.
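The general idea behind the two upgrades (reading with a large buffer while the calculation is already running) can be illustrated with a simple producer/consumer pattern. This is a minimal sketch in Python, not APP's implementation: the function names, the queue depth of 4 and the use of `sum()` as a stand-in for the pixel-stack calculation are all assumptions made for illustration.

```python
import os
import queue
import tempfile
import threading


def reader(path, buffer_bytes, out_queue):
    """Producer: read the file in large chunks and hand them to the consumer."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(buffer_bytes)
            if not chunk:
                break
            out_queue.put(chunk)
    out_queue.put(None)  # sentinel: no more data


def process_overlapped(path, buffer_bytes=1024 * 1024):
    """Consumer: work on each chunk while the reader fetches the next one."""
    chunks = queue.Queue(maxsize=4)  # bounded, so memory use stays limited
    thread = threading.Thread(target=reader, args=(path, buffer_bytes, chunks))
    thread.start()
    total = 0
    n_chunks = 0
    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        total += sum(chunk)  # stand-in for the real pixel-stack calculation
        n_chunks += 1
    thread.join()
    return total, n_chunks


# Small demonstration on a temporary file of a bit more than 2 MegaBytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x01" * (2 * 1024 * 1024 + 10))
    demo_path = tmp.name
demo_total, demo_chunks = process_overlapped(demo_path)
os.remove(demo_path)
print(demo_total, demo_chunks)
```

The bounded queue lets the reader run a few buffers ahead of the calculation without loading the whole stack into memory.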

Since numbers and graphs illustrate the differences better, I have been running lots of tests with different settings.

First I present the results with the old integration module, before showing the results with the new integration module.
To compare the speed of the old and new integration modules, I have integrated 100 16-MegaPixel RGB frames from a Nikon D5100 camera. In bytes, this means integrating 18GB worth of data. If Multi-Band Blending (MBB) is used, this is 1.5 x 18GB = 27GB worth of data that needs to be integrated.
Finally I will show the results of integrating 400 frames with the new integration module on both an SSD and a conventional hard drive.

I will show the final conclusions here as well, because it's a lot of information to digest and not everyone will be interested in all the test results:

Final conclusions:

the new integration module gives a very nice speed increase in integration, especially for APP users who integrate on conventional hard drives.
the old integration module took 142 minutes to integrate 100 frames, and probably would have taken about 9-10 hours to integrate 400 frames.
the new integration module can integrate 400 frames of 16 MegaPixels in 3 RGB channels on a conventional drive in only 27 minutes. This is a speed increase of 20x with only 4GB of RAM allocated to APP.
more CPU cores will increase integration speed.
more RAM allocated to APP should help as well. Although the effect of more RAM wasn't tested, with more RAM APP can make the read/write IO buffers larger, and that's one of the 2 improvements in the new integration module.

Clearly, this is a huge improvement. Taking the whole process into account, from frame loading, calibration, star analysis, registration and normalization to integration, APP should now process stacks of several hundred frames within a couple of hours on a conventional drive.

Details of the test PC:

ASUS PRIME X399-A

AMD Ryzen Threadripper 1950X, 16 cores / 32 threads, running at 3.8 GHz

32GB DDR4 2400MHz quad channel

Harddrives used in testing:

SSD : M2 Samsung SSD 960 EVO

Conventional drive : SATA-600 Western Digital WDC WD20EARS-00MVWB0

Graphics card is not shown since the GPU is not used yet in Astro Pixel Processor (but first testing with GPU enabled modules will start soon)

Operating System: Windows 10 professional

Details of testing:

Since this is very new hardware with lots of CPU cores and lots of memory, I will be running APP with only 4 CPU threads enabled and 4GB of memory in all tests (except the last test, which illustrates the influence of more CPU cores). This way, CPU power and memory usage in the tests will be much more comparable to the hardware of the average APP user's PC.

In all tests:

the frame loading times reported actually cover loading the frames, applying the registration parameters (so data interpolation with Lanczos-3) and applying the normalization parameters.
I use the reference composition mode to ensure that the field of view of the integration is exactly 16 MegaPixels in 3 color channels. This equates to 184 MegaBytes of data per frame, since the integration is done in 32-bit depth.
I use Lanczos-3 with no under-/overshoot enabled.

So 100 frames * 184 MegaBytes = 18GB of data. With MBB enabled this is 27GB of data that needs to be processed.

So 400 frames * 184 MegaBytes = 72GB of data. With MBB enabled this is 108GB of data that needs to be processed.
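The data sizes above can be reproduced with a few lines of arithmetic. This is just a sketch of the calculation: the function names are made up for illustration, the frame dimensions (4928 x 3264) come from the FITS headers further down in this post, and the factor of 1.5 for MBB is the one stated above.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def frame_bytes(width, height, channels=3, bytes_per_value=4):
    """Size of one registered frame at 32-bit depth."""
    return width * height * channels * bytes_per_value


def stack_gib(n_frames, per_frame_bytes, mbb=False):
    """Total data to integrate; MBB adds 50% according to the post."""
    factor = 1.5 if mbb else 1.0
    return n_frames * per_frame_bytes * factor / GIB


per_frame = frame_bytes(4928, 3264)
print("per frame:", round(per_frame / MIB), "MB")
for n in (100, 400):
    print(n, "frames:",
          round(stack_gib(n, per_frame)), "GB without MBB,",
          round(stack_gib(n, per_frame, mbb=True)), "GB with MBB")
```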

All graphs shown display memory usage over time. You'll see separate regions in the graphs: the first region is always frame loading, then LNC if enabled, then the actual integration, and the final peak is always the analysis of the integration result for location, scale, noise and SNR.

All test duration times are reported in minutes for easy comparison.

Test 1: OLD integration module

Integrate 100 frames

no MBB

no LNC

average integration

no outlier rejection

using the SATA 600 conventional drive:

frame loading: 12 min

actual integration: 2min ! completely from cache !

100f 4GB 4Th noMBB noLNC noOR conventional ref LZ3

using the M2 SSD drive:

frame loading: 12 min

actual integration: 2 min

100f 4GB 4Th noMBB noLNC noOR SSD ref LZ3

Test 2: OLD integration module

Integrate 100 frames

no MBB

no LNC

average integration

outlier rejection: winsorized 2 iterations with kappa 3

using the SATA 600 conventional drive:

frame loading: 12 min

actual integration: 6 min ! integration completely from cache, no IO reads at all during integration !

100f 4GB 4Th noMBB noLNC Winsor2x3 conventional ref LZ3

using the M2 SSD drive:

frame loading: 12 min

actual integration: 6 min

100f 4GB 4Th noMBB noLNC Winsor2x3 SSD ref LZ3
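For readers unfamiliar with the setting "winsorized 2 iterations with kappa 3", the sketch below shows one generic way to winsorize a single pixel stack before averaging. It is not APP's algorithm, which is not described in this post. In particular, estimating sigma from the median absolute deviation (MAD) and clamping values to median +/- kappa * sigma are assumptions chosen for illustration.

```python
import statistics


def winsorized_average(values, kappa=3.0, iterations=2):
    """Average one pixel stack after clamping outliers to kappa-sigma limits.

    Values beyond median +/- kappa * sigma are replaced by the nearest limit
    (winsorized) rather than removed. Sigma is estimated from the MAD, which
    is an assumption of this sketch.
    """
    data = list(values)
    for _ in range(iterations):
        center = statistics.median(data)
        mad = statistics.median(abs(v - center) for v in data)
        sigma = 1.4826 * mad  # MAD scaled to match sigma for Gaussian noise
        if sigma == 0:
            break
        low, high = center - kappa * sigma, center + kappa * sigma
        data = [min(max(v, low), high) for v in data]
    return statistics.fmean(data)


# One pixel across 9 frames; the last frame was hit by e.g. a satellite trail.
stack = [9.8, 9.9, 10.0, 10.1, 10.2, 9.9, 10.1, 10.0, 50.0]
print("plain average:     ", round(statistics.fmean(stack), 2))
print("winsorized average:", round(winsorized_average(stack), 2))
```

In this toy stack the plain average is pulled far away from 10 by the single outlier, while the winsorized average stays close to it.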

Test 3: OLD integration module

Integrate 100 frames

MBB 10%

LNC 1x 4th degree

average integration

outlier rejection: winsorized 2 iterations with kappa 3

using the SATA 600 conventional drive:

frame loading: 13 min

LNC: 10 min

actual integration: 142 min = 2 hours 22 min ! REALLY SLOW !

100f 4GB 4Th MBB10 LNC1x4th Winsor2x3 conventional ref LZ3

using the M2 SSD drive:

frame loading: 13 min

LNC: 7 min

actual integration: 9 min

100f 4GB 4Th MBB10 LNC1x4th Winsor2x3 SSD ref LZ3

Summary of the old integration module's performance in these three tests:

If the stack/integration size is less than the amount of system memory minus the amount of memory the Operating System needs for its regular work, then integration will be done using memory caching, provided the Operating System's kernel supports this (most do). This gives fast integration, probably limited only by CPU power. This was the case for tests 1 and 2, where integration on the SSD and the conventional hard drive was equally fast. This memory caching will only really come into play if you have at least 16GB of memory in your system, and will only benefit large stacks if you have much more memory installed. So for a stack of 100GB you would need to have 128GB of memory installed to have the integration done using OS memory caching.
Test 3 wasn't performed using the memory cache, and in this case the slow integration speed on the conventional hard drive reveals itself. Actual integration on the conventional drive took 142 minutes, or 2 hours and 22 minutes. On the SSD it took only 9 minutes. This slow integration was reported by several users.
Frame loading takes the same time on the different hard drives, so hard disk IO doesn't have much influence here. In this case NEF frames were used, and NEF read speed is mainly limited by the NEF raw conversion itself.
LNC is a process that happens between frame loading and the actual integration, and it clearly benefits from a faster hard drive.
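The rule of thumb from the first point of this summary can be written as a one-line check. This is only a sketch: the function name is hypothetical, and the reserve of 8GB (memory assumed to be needed by the Operating System and other running software) is an assumption picked for illustration, not a measured value.

```python
def fits_in_os_cache(stack_gb, system_gb, reserved_gb):
    """True if the stack is smaller than the memory left after the reserve."""
    return stack_gb < system_gb - reserved_gb


RESERVED_GB = 8  # assumption for illustration only, not a measured value

# The test PC has 32GB of memory.
print("tests 1 and 2 (18GB stack):    ", fits_in_os_cache(18, 32, RESERVED_GB))
print("test 3 with MBB (27GB stack):  ", fits_in_os_cache(27, 32, RESERVED_GB))
print("100GB stack with 128GB memory: ", fits_in_os_cache(100, 128, RESERVED_GB))
```

With this assumed reserve, the check agrees with what the tests showed: the 18GB stacks were integrated from cache, the 27GB stack was not.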

The following test is a repeat of test 3, chosen because that test wasn't using OS memory caching. It shows the behaviour of the new integration module with the 2 upgrades mentioned.

Test 3: NEW integration module

Integrate 100 frames

MBB 10%

LNC 1x 4th degree

average integration

outlier rejection: winsorized 2 iterations with kappa 3

using the SATA 600 conventional drive:

frame loading: 13 min

LNC: 9 min

actual integration: 7 min ! (old integration module took 142 minutes) IO on the hard drive is used optimally now.

100f 4GB 4Th MBB10 LNC1x4th Winsor2x3 conventional ref LZ3 NEW

using the M2 SSD drive:

frame loading: 13 min

LNC: 6 min

actual integration: 6 min

100f 4GB 4Th MBB10 LNC1x4th Winsor2x3 SSD ref LZ3 NEW

First conclusions:

The new integration module improved speed, especially on the conventional drive:

speed improvement on the conventional hard drive: from 142 minutes to only 7 minutes... That is a factor of 20 faster !
speed improvement on the SSD drive: from 9 minutes to 6 minutes... A modest speed increase for already fast integration.

The integration speeds on the conventional drive and the SSD are almost the same now, which must mean that integration speed is now much more limited by CPU power, which I'll demonstrate in test 5. Remember, in these tests I have used only 4 CPU threads and 4GB of memory.
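The speed-up factors follow directly from the durations reported for test 3. As a small sketch (the helper name is made up), the same numbers also give the speed-up of the whole run, so frame loading + LNC + integration, which is lower because frame loading did not change:

```python
def speedup(old_minutes, new_minutes):
    """How many times faster the new duration is compared to the old one."""
    return old_minutes / new_minutes


# Test 3, actual integration only
print("conventional drive:", round(speedup(142, 7), 1))
print("SSD:               ", round(speedup(9, 6), 1))

# Test 3, frame loading + LNC + actual integration
print("conventional drive, whole run:", round(speedup(13 + 10 + 142, 13 + 9 + 7), 1))
print("SSD, whole run:               ", round(speedup(13 + 7 + 9, 13 + 6 + 6), 1))
```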

The next test uses the new integration module, and we are now going to integrate 400 frames instead of 100. We want to check that 4 times as many frames take 4 times as long, to make sure the application isn't slowing down for whatever reason. This is a repeat of test 2, but with 400 frames.

Test 4: NEW integration module

Integrate 400 frames

no MBB

no LNC

average integration

outlier rejection: winsorized 2 iterations with kappa 3

using the SATA 600 conventional drive:

frame loading: 45 min

actual integration: 27 min

using the M2 SSD drive:

frame loading: 45 min

actual integration: 26 min

400f 4GB 4Th noMBB noLNC Winsor2x3 conventional ref LZ3 NEW

Conclusion: SSD and conventional drive have nearly identical integration speed in this case with the new integration module. So it's clearly limited by other factors; one of them must be CPU power in this case. I show only 1 graph, because they are more or less identical. Compared to test 2, we also see that frame loading and the actual integration scale roughly linearly between 100 and 400 frames: frame loading duration increased from 12 minutes to 45 minutes, and integration duration increased from 6 to 26 minutes.
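How close this is to linear scaling can be checked by dividing the measured time ratio by the frame-count ratio; a value of 1.0 means perfectly linear. A small sketch with a made-up helper name, using the durations from test 2 (100 frames) and test 4 (400 frames):

```python
def scaling_ratio(t_small, t_large, n_small=100, n_large=400):
    """Measured time ratio divided by the frame-count ratio (1.0 = linear)."""
    return (t_large / t_small) / (n_large / n_small)


print("frame loading:", round(scaling_ratio(12, 45), 2))
print("integration:  ", round(scaling_ratio(6, 26), 2))
```

Both values are within about 10% of 1.0, keeping in mind that the durations are only reported in whole minutes.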

Test 5 is a repeat of test 4, but now with all 32 CPU threads enabled to check for dependence on CPU power, only shown for the SSD drive:

Test 5: NEW integration module with 32 CPU threads enabled in APP

Integrate 400 frames

no MBB

no LNC

average integration

outlier rejection: winsorized 2 iterations with kappa 3

using the M2 SSD drive:

frame loading: 43 min

actual integration: 9 min, compared to 26 minutes using only 4 CPU threads

400f 4GB 4Th noMBB noLNC Winsor2x3 SSD ref LZ3 NEW 32cores

The additional graph on the left shows CPU usage, showing that all 32 CPU threads are used extensively by APP during integration.

Test 5 confirms that integration speed using the new integration module is now limited by CPU power.

Final conclusions:

the new integration module gives a very nice speed increase in integration, especially for APP users who integrate on conventional hard drives.
the old integration module took 142 minutes to integrate 100 frames, and probably would have taken about 9-10 hours to integrate 400 frames.
the new integration module can integrate 400 frames of 16 MegaPixels in 3 RGB channels on a conventional drive in only 27 minutes. This is a speed increase of 20x with only 4GB of RAM allocated to APP.
more CPU cores will increase integration speed.
more RAM allocated to APP should help as well. Although the effect of more RAM wasn't tested, with more RAM APP can make the read/write IO buffers larger, and that's one of the 2 improvements in the new integration module.

Additional information on integration: one of the main purposes of integrating our images is to reduce noise in the resulting integration and thereby increase the Signal to Noise ratio.

This particular dataset was made using a Nikon D5100 BCF mod with a 50mm Nikkor objective on a simple tripod. Single exposures were only 2.5 seconds at ISO 3200, with 2 seconds between exposures.

To illustrate how the noise drops with integration, I show 3 images:

single exposure of 2.5 seconds
integration of 100 exposures, giving a total integration time of 250 seconds
integration of 400 exposures, giving a total integration time of 1000 seconds

To properly compare the frames, all 3 were corrected for gradients and background calibrated first. Then the single image and the 2 integrations were registered and normalized for location and scale/dispersion (multiple-scale) with each other and then saved. Shown are screenshots of APP showing a zoomed-in part of the registered-normalized frames with the histogram visible. (The histogram that APP shows is only of the data that is visible in the image viewer window, so in this case it is the histogram of the zoomed-in data.)

The normalization ensures that we can properly compare the images using the exact same stretch parameters. Because I normalized for scale/dispersion, I basically normalized for noise. Noise is highly correlated with the dispersion of the data here, because we only have stars and sky background. If there had been clear nebulosity, the correlation with noise would have been much lower. So the 3 images shown have similar dispersion/noise and are thus compared visually for the amount of signal that shows up due to integrating more frames.

Single frame:

100 frame integration:

400 frame integration:

We can clearly see that 1 frame compared to 100 frames gives a dramatic increase in signal shown. 400 frames relative to 100 frames does show an increase but not as dramatic.

Theoretically, the maximal noise drop is the square root of the number of images used in the integration.
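This square-root rule can be checked with a small simulation. The sketch below is an idealized case and not related to APP's own noise analysis: it assumes pure Gaussian noise that is independent between frames, with no signal, and a plain average. The function name and parameters are made up for illustration.

```python
import math
import random
import statistics


def noise_drop(n_frames, n_pixels=2000, sigma=1.0, seed=1):
    """Ratio of single-frame noise to the noise left after averaging n_frames."""
    rng = random.Random(seed)
    averaged = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n_frames))
        for _ in range(n_pixels)
    ]
    return sigma / statistics.pstdev(averaged)


for n in (100, 400):
    print(n, "frames: theory", round(math.sqrt(n), 1),
          "simulated", round(noise_drop(n), 1))
```

The simulated values land close to 10 and 20, with a small scatter because only 2000 pixels are simulated.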

So 100 frames would give a maximal noise drop of 10. APP analyses each integration; the results can be found in the FITS header:

SIMPLE =                    T / Java FITS: Thu Oct 12 14:40:46 CEST 2017
    BITPIX =                  -32 / bits per data value
    NAXIS   =                    3 / number of axes
    NAXIS1 =                 4928 / size of the n'th axis
    NAXIS2 =                 3264 / size of the n'th axis
    NAXIS3 =                    3 / size of the n'th axis
    EXTEND =                    T / Extensions are permitted
    BSCALE =                  1.0 / scale factor
    BZERO   =                  0.0 / no offset
    DATE    = '2017-10-12T13:04:00' / creation date of stack
    SOFTWARE= 'Astro Pixel Processor by Aries Productions' / software
    VERSION = '1.054   '           / Astro Pixel Processor version
    STACK   = 'stack   '           / stack of lights
    CFAIMAGE= 'no      '           / Color Filter Array pattern
    EXPTIME =                254.0 / exposure time (s)
    LOK-1   = ' 1,76E-02'          / lokation of channel 1
    LOK-2   = ' 1,76E-02'          / lokation of channel 2
    LOK-3   = ' 1,78E-02'          / lokation of channel 3
    SCALE-1 = ' 8,62E-04'          / dispersion of channel 1
    SCALE-2 = ' 6,92E-04'          / dispersion of channel 2
    SCALE-3 = ' 8,21E-04'          / dispersion of channel 3
    NOISE-1 = ' 3,45E-04'          / noise level of channel 1
    NOISE-2 = ' 2,99E-04'          / noise level of channel 2
    NOISE-3 = ' 4,80E-04'          / noise level of channel 3
    SNR-1   = ' 2,70E+00'          / Signal to Noise Ratio of channel 1
    SNR-2   = ' 2,66E+00'          / Signal to Noise Ratio of channel 2
    SNR-3   = ' 2,13E+00'          / Signal to Noise Ratio of channel 3
    medNR-1 = ' 9,61E+00'          / median noise reduction, channel 1
    medNR-2 = ' 9,10E+00'          / median noise reduction, channel 2
    medNR-3 = ' 8,95E+00'          / median noise reduction, channel 3
    medENR-1= ' 1,95E+00'          / effective median noise reduction, channel 1
    medENR-2= ' 1,95E+00'          / effective median noise reduction, channel 2
    medENR-3= ' 2,47E+00'          / effective median noise reduction, channel 3
    refNR-1 = ' 9,67E+00'          / reference noise reduction, channel 1
    refNR-2 = ' 9,10E+00'          / reference noise reduction, channel 2
    refNR-3 = ' 8,92E+00'          / reference noise reduction, channel 3
    refENR-1= ' 1,97E+00'          / effective reference noise reduction, channel 1
    refENR-2= ' 1,95E+00'          / effective reference noise reduction, channel 2
    refENR-3= ' 2,48E+00'          / effective reference noise reduction, channel 3

refNR-x shows a noise drop of about 9 for all channels, so that's nearly perfect.

400 frames would give a maximal noise drop of 20, which is twice as high as for 100 frames:

SIMPLE =                    T / Java FITS: Fri Oct 13 14:08:33 CEST 2017
    BITPIX =                  -32 / bits per data value
    NAXIS   =                    3 / number of axes
    NAXIS1 =                 4928 / size of the n'th axis
    NAXIS2 =                 3264 / size of the n'th axis
    NAXIS3 =                    3 / size of the n'th axis
    EXTEND =                    T / Extensions are permitted
    BSCALE =                  1.0 / scale factor
    BZERO   =                  0.0 / no offset
    DATE    = '2017-10-13T19:02:48' / creation date of stack
    SOFTWARE= 'Astro Pixel Processor by Aries Productions' / software
    VERSION = '1.054   '           / Astro Pixel Processor version
    STACK   = 'stack   '           / stack of lights
    CFAIMAGE= 'no      '           / Color Filter Array pattern
    EXPTIME =               1014.0 / exposure time (s)
    LOK-1   = ' 1,77E-02'          / lokation of channel 1
    LOK-2   = ' 1,77E-02'          / lokation of channel 2
    LOK-3   = ' 1,78E-02'          / lokation of channel 3
    SCALE-1 = ' 8,05E-04'          / dispersion of channel 1
    SCALE-2 = ' 6,77E-04'          / dispersion of channel 2
    SCALE-3 = ' 5,98E-04'          / dispersion of channel 3
    NOISE-1 = ' 1,88E-04'          / noise level of channel 1
    NOISE-2 = ' 1,68E-04'          / noise level of channel 2
    NOISE-3 = ' 2,64E-04'          / noise level of channel 3
    SNR-1   = ' 4,86E+00'          / Signal to Noise Ratio of channel 1
    SNR-2   = ' 4,58E+00'          / Signal to Noise Ratio of channel 2
    SNR-3   = ' 3,29E+00'          / Signal to Noise Ratio of channel 3
    medNR-1 = ' 1,78E+01'          / median noise reduction, channel 1
    medNR-2 = ' 1,64E+01'          / median noise reduction, channel 2
    medNR-3 = ' 1,64E+01'          / median noise reduction, channel 3
    medENR-1= ' 3,37E+00'          / effective median noise reduction, channel 1
    medENR-2= ' 3,43E+00'          / effective median noise reduction, channel 2
    medENR-3= ' 3,29E+00'          / effective median noise reduction, channel 3
    refNR-1 = ' 1,78E+01'          / reference noise reduction, channel 1
    refNR-2 = ' 1,64E+01'          / reference noise reduction, channel 2
    refNR-3 = ' 1,65E+01'          / reference noise reduction, channel 3
    refENR-1= ' 3,36E+00'          / effective reference noise reduction, channel 1
    refENR-2= ' 3,43E+00'          / effective reference noise reduction, channel 2
    refENR-3= ' 3,32E+00'          / effective reference noise reduction, channel 3

refNR-x shows a noise drop of about 16-18, so almost reaching 20 😉
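To compare the reported values with the theoretical limit, the refNR values from the two FITS headers above can be parsed and divided by the square root of the frame count. A small sketch; the helper name is made up, and the only thing to watch out for is that the header values are written with a decimal comma.

```python
import math


def parse_app_float(text):
    """Convert a header value such as ' 9,67E+00' (decimal comma) to a float."""
    return float(text.strip().strip("'").strip().replace(",", "."))


# refNR-1..3 as written in the two FITS headers above
ref_nr = {
    100: [" 9,67E+00", " 9,10E+00", " 8,92E+00"],
    400: [" 1,78E+01", " 1,64E+01", " 1,65E+01"],
}

for n_frames, raw_values in ref_nr.items():
    limit = math.sqrt(n_frames)
    for channel, text in enumerate(raw_values, start=1):
        value = parse_app_float(text)
        print(f"{n_frames} frames, channel {channel}: "
              f"{value:.2f} of max {limit:.0f} ({value / limit:.0%})")
```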

Finally, here are 2 JPGs of a single frame and the 400 frame integration:

St avg 1014.0s WSC 2 3.0 x 1.0 LZ3 NS ref eq add sc BWMV nor AA RL noMBB cbg reg norm St


Mabula-Admin

(@mabula-admin)

Universe Admin

Joined: 7 years ago

Posts: 4366

Topic starter October 14, 2017 14:20

Let me know if all of this is clear and if there are any questions.

Kind regards,

Mabula


xsnrg

(@xsnrg)

Red Giant

Joined: 7 years ago

Posts: 35

October 14, 2017 15:11

Very much looking forward to 1.055. Nice work!


Mabula-Admin

(@mabula-admin)

Universe Admin

Joined: 7 years ago

Posts: 4366

Topic starter October 17, 2017 13:29

Thanks !
