

Integration speed seems slow

14 Posts
4 Users
9 Reactions
6,435 Views
(@lead_weight)
Red Giant
Joined: 9 years ago
Posts: 34
Topic starter  

I've been extremely happy with the quality of my integrations using APP. All the setup in steps 1-5 processes within minutes. But when I get to integration for a stack of 300 images, it can take overnight. I have a 6-core 3.5 GHz Mac Pro. I see CPU usage at less than 10%. Some of the early processes like Analyze Stars use 100% CPU. Comparable projects in PixInsight with 300 stacked images take about 2 hours on my computer.



   
(@lead_weight)
Red Giant
Joined: 9 years ago
Posts: 34
Topic starter  

I'm not sure what changed, but I just processed 130 images today, and it only took an hour. CPU performance chart attached. 

The parts prior to the spike are creating the masters. The spike at 90% CPU usage is star registration, and the big chunk past that is integration of all calibrated images (about 75% CPU usage), with the last bit being 50% CPU for pixel integration of the final image. This seemed reasonable in terms of overall time spent. I'm not sure how you would go about using more multithreading to saturate the CPUs.

Screen Shot 2017 10 07 at 2.13.51 PM


   
(@mabula-admin)
Universe Admin
Joined: 9 years ago
Posts: 5318
 
Posted by: Lead_weight

I've been extremely happy with the quality of my integrations using APP. All the setup in steps 1-5 processes within minutes. But when I get to integration for a stack of 300 images, it can take overnight. I have a 6-core 3.5 GHz Mac Pro. I see CPU usage at less than 10%. Some of the early processes like Analyze Stars use 100% CPU. Comparable projects in PixInsight with 300 stacked images take about 2 hours on my computer.

Hi Lead_weight,

Thank you for bringing this to my attention.

First of all, given a fixed number of frames to integrate and a fixed number of pixels per frame, integration speed depends on several factors:

  • Hard disk speed. Integration as it is implemented in APP is quite IO intensive, so a faster drive with low latency will make a lot of difference.
  • The use of an outlier rejection filter in integration. Integration is fastest without outlier rejection. Sigma and Winsorized rejection are a bit slower. Linear fit rejection is currently really slow with a lot of frames (and I don't recommend using it, because APP has LNC), although it is on my priority list for a speed upgrade.
  • LNC can be time consuming in itself if you use a lot of iterations. All layers/frames of the stack are iteratively adjusted for illumination differences.
  • The number and speed of the CPUs have a lot of influence, of course.
  • Choosing average or median integration has little influence.
  • Loading of the frames is slowest with Lanczos interpolation (but sharpest) and fastest with nearest neighbour interpolation (but worst quality).

So depending on your hardware configuration and the settings that you used, this could explain the long integration time that you experienced.
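
To give a feel for why a rejection filter adds work per pixel, here is a minimal Python sketch of a sigma-clipped average for one pixel stack. It is only an illustration and not APP's code: the function name `sigma_clip_mean`, the `kappa` threshold and the iteration limit are all assumptions, and measuring distance from the median is just one common choice.

```python
import statistics


def sigma_clip_mean(values, kappa=3.0, max_iters=5):
    """Average one pixel stack after iteratively rejecting outliers.

    Hypothetical helper for illustration only; not APP's implementation.
    Returns (mean of the kept values, number of rejected values).
    """
    kept = list(values)
    for _ in range(max_iters):
        if len(kept) < 3:
            break  # too few samples left to estimate a spread
        center = statistics.median(kept)
        spread = statistics.pstdev(kept)
        if spread == 0:
            break  # all remaining values are identical
        survivors = [v for v in kept if abs(v - center) <= kappa * spread]
        if len(survivors) == len(kept):
            break  # nothing rejected in this pass, so we are done
        kept = survivors
    return statistics.fmean(kept), len(values) - len(kept)


# One pixel position across 10 frames; the last frame has a satellite trail.
stack = [100, 101, 99, 102, 98, 100, 101, 99, 100, 5000]
mean, rejected = sigma_clip_mean(stack)
print(mean, rejected)
```

Every pass recomputes the statistics over the whole stack, so the cost grows with both the number of frames and the number of passes, and this is repeated for every pixel.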

Do you recall the settings?

And does the Mac Pro have a fast SSD?

Kind regards,

Mabula



   
(@mabula-admin)
Universe Admin
Joined: 9 years ago
Posts: 5318
 
Posted by: Lead_weight

I'm not sure what changed, but I just processed 130 images today, and it only took an hour. CPU performance chart attached. 

The parts prior to the spike are creating the masters. The spike at 90% CPU usage is star registration, and the big chunk past that is integration of all calibrated images (about 75% CPU usage), with the last bit being 50% CPU for pixel integration of the final image. This seemed reasonable in terms of overall time spent. I'm not sure how you would go about using more multithreading to saturate the CPUs.

Screen Shot 2017 10 07 at 2.13.51 PM

Hi Lead_weight,

Maybe it's a change in the mentioned settings, like the outlier rejection filter?

I am aware that I can probably optimise integration speed by doing the extensive IO and the integration calculations at the same time. Currently they are separated, which is why CPU utilisation seems low: APP is using all cores, just not all of the time. For some hardware configurations it might also be useful to give the user the option to choose the integration buffer size. With a bigger buffer, the number of hard disk IO steps is reduced, which I think could help with conventional (read: non-SSD) hard drives.

I'll have a thorough look tomorrow at my code to check if we can improve substantially here 😉

Mabula

 



   
(@lead_weight)
Red Giant
Joined: 9 years ago
Posts: 34
Topic starter  

For outlier rejection I used "winsor sigma clip" as the only option outside of the defaults. I'm using an SSD with a read/write speed of around 950 MB/s.

I'm starting to think I might have enabled multi-band blending when the processing took overnight. This last image integration which only took an hour did not have that option checked.



   
(@mabula-admin)
Universe Admin
Joined: 9 years ago
Posts: 5318
 

Thank you Lead_weight.

Yes, Multi-Band Blending will take longer (I forgot it in my summary, I realize). APP needs to write and read additional data per pixel for the Multi-Band Blending function to work.

I will check, however, if I can make the integration more efficient with regard to system resources.

Kind regards,

Mabula



   
(@mabula-admin)
Universe Admin
Joined: 9 years ago
Posts: 5318
 

Hi Lead_weight and everyone else,

GOOD NEWS !

The last couple of days, I have been working on the integration engine to see if I can speed up integration times. Several users reported very long integration times when they were stacking 100-400 frames. 

I have managed to speed up the integration time significantly for both SSDs and conventional hard drives.

I have made the integration module more efficient in its resource use by letting the IO reading of the pixel stacks from the file mapper and the pixel stack calculations run at the same time. This gives an immediate speed-up by a factor of about 1.5-1.7x, depending on several factors.
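
The idea of overlapping IO and calculation can be sketched as a small producer/consumer loop in Python. This is a sketch under assumptions and not the actual APP implementation: `integrate_overlapped`, `chunk_size`, `depth` and the `process` callback are hypothetical names, and an in-memory stream stands in for the file mapper.

```python
import io
import queue
import threading


def integrate_overlapped(stream, chunk_size, process, depth=4):
    """Read chunks on a background thread while this thread computes.

    Hypothetical illustration: `process` is any per-chunk calculation.
    """
    # Bounded queue so the reader cannot run arbitrarily far ahead.
    chunks = queue.Queue(maxsize=depth)

    def reader():
        while True:
            data = stream.read(chunk_size)
            chunks.put(data)  # an empty bytes object marks end of stream
            if not data:
                break

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()

    results = []
    while True:
        data = chunks.get()
        if not data:
            break
        results.append(process(data))  # runs while the reader fetches more
    worker.join()
    return results


# Stand-in for pixel data: ten bytes, processed in chunks of four.
print(integrate_overlapped(io.BytesIO(bytes(range(10))), 4, sum))
```

With the two steps separated, the cores sit idle while data is read. Here the next chunk is already being fetched while the current one is processed, which is the kind of overlap described above.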

Besides that, increasing the read/write IO buffer sizes has a strong positive effect on integration speed for both SSDs and conventional drives. By default, the read/write buffers were fixed at 8 kilobytes. Increasing this to 256 kilobytes will really speed up integration 😉
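
A simple way to see the effect of the buffer size is to count how many read calls are needed for the same amount of data. The Python sketch below assumes that 8 kilobytes means 8 × 1024 bytes and uses a hypothetical helper `count_reads` on an in-memory stream. A real drive adds latency to every request, which this sketch does not measure.

```python
import io


def count_reads(stream, buffer_size):
    """Count the read calls needed to consume a stream (hypothetical helper)."""
    reads = 0
    while stream.read(buffer_size):
        reads += 1
    return reads


payload = bytes(1024 * 1024)  # 1 MiB of dummy pixel data

small = count_reads(io.BytesIO(payload), 8 * 1024)
large = count_reads(io.BytesIO(payload), 256 * 1024)
print(small, large)
```

For 1 MiB of data this gives 128 reads with the 8 KiB buffer and 4 reads with the 256 KiB buffer, so 32 times fewer requests go to the drive.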

An example: integrating 100 RGB frames of 20 megapixels with Multi-Band Blending enabled, using 8 CPU cores at 3.4 GHz and 8 GB of memory assigned to APP, with average integration and no outlier rejection, on a conventional SATA 600 hard drive.

With the old implementation: frame loading took 16 minutes and the actual integration time was almost 2 hours. This is consistent with times reported by other APP users.

With the new implementation: frame loading took 16 minutes, and the actual integration took 21 minutes with 64 kB read/write buffers and only 14 minutes with 256 kB buffers 😉

Running the new implementation on an SSD that can read/write at speeds of 3000/2000 MB/s reduces the actual integration time to only 2 minutes!
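
For the hard drive timings in this example, the ratios can be worked out with a few lines of Python. Treating "almost 2 hours" as 120 minutes is an assumption, so the speed-up figures are upper bounds, and `speedup` is just a helper name for this sketch.

```python
# Timings in minutes from the conventional hard drive example.
LOADING = 16
OLD_INTEGRATION = 120  # "almost 2 hours", rounded up (assumption)
NEW_64KB = 21
NEW_256KB = 14


def speedup(old, new):
    """Ratio of old to new run time."""
    return old / new


print(f"64 kB buffers:  {speedup(OLD_INTEGRATION, NEW_64KB):.1f}x")
print(f"256 kB buffers: {speedup(OLD_INTEGRATION, NEW_256KB):.1f}x")

# Including frame loading, which did not change.
old_total = LOADING + OLD_INTEGRATION
new_total = LOADING + NEW_256KB
print(f"whole run:      {speedup(old_total, new_total):.1f}x")
```

This prints roughly 5.7x and 8.6x for the integration step alone, and about 4.5x for the whole run, because the 16 minutes of frame loading now make up more than half of the total.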

I will probably open a separate topic on this with some graphs, showing the differences.

Very good news I think!

Kind regards,

Mabula



   
(@lead_weight)
Red Giant
Joined: 9 years ago
Posts: 34
Topic starter  

Wow, that's a serious improvement! Very exciting.



   
 Tim
(@tim)
Red Giant
Joined: 9 years ago
Posts: 47
 

This is excellent, Mabula! Is this something that will be in the next update?



   
(@mabula-admin)
Universe Admin
Joined: 9 years ago
Posts: 5318
 

Yes, very big improvement indeed !

I am currently finalizing implementation of this, so yes it will be in the next release of APP 😉

Depending on the amount of memory in the system, the number of frames to be stacked and their size in bytes, APP will automatically use the biggest practical read and write buffer size. So the user won't have to bother with setting buffer sizes, and this will ensure the best integration performance.

Once done, I will run several tests monitoring time spent and computer resources used and I will publish these with graphs 😉 That will probably be tomorrow I think...

Mabula



   
(@xsnrg)
Red Giant
Joined: 9 years ago
Posts: 33
 

That is a great improvement Mabula, congratulations!

I do wonder, as many of these high-power CPU machines also have high-powered GPUs, if some of the heavy calculations could also be offloaded to these floating point monsters if they are so equipped?



   
(@mabula-admin)
Universe Admin
Joined: 9 years ago
Posts: 5318
 

Very very good idea xsnrg 😉

GPU support for heavy calculations will no doubt come as well in future versions. The required software libraries are already included in APP (so research has already been done on which to use) and I will start testing soon, probably with improved stretching capabilities for the processing sliders on the right side.

For simple calculations, GPUs won't be that effective; they really excel at difficult calculations. However, I think that the number of calculation units in current graphics cards dwarfs the number of CPU cores available in most systems... 😉 Current graphics cards have over 1000 calculation units...

I am planning an implementation in certain modules that will use the GPU and/or CPU if a GPU is available, and only the CPU otherwise. So with a switch in the CFG menu, you will be able to turn GPU support on or off. I think that will be very useful.

Mabula

 



   
(@xsnrg)
Red Giant
Joined: 9 years ago
Posts: 33
 

Having an NVIDIA 10 series card in my system, I would be happy to test things for you when you are ready! This is very exciting, indeed.



   
(@mabula-admin)
Universe Admin
Joined: 9 years ago
Posts: 5318
 

I have just installed an NVIDIA 10 series card myself, exactly for testing purposes (DUAL GTX 1060 6GB).

If GPU support greatly improves the speed of certain functions, then gaming rigs with multiple video cards will probably be very nice systems to run APP on 😉

I am quite anxious to start testing actually, but I need to finish some other work first for APP's next release...

Mabula



   