I've been extremely happy with the quality of my integrations using APP. All the setup, steps 1-5, processes within minutes. But when I get to integration for a stack of 300 images, it can take overnight. I have a 6-core 3.5 GHz Mac Pro. I see CPU usage at less than 10%. Some of the early processes, like Analyze Stars, use 100% CPU. Comparable projects in PixInsight with 300 stacked images take about 2 hours on my computer.
I'm not sure what changed, but I just processed 130 images today, and it only took an hour. CPU performance chart attached.
The parts prior to the spike are creating the masters. The spike at 90% CPU usage is star registration, and the big chunk past that is integration of all calibrated images (about 75% CPU usage, with the last bit being 50% CPU for pixel integration for the final image). This seemed reasonable in terms of overall time spent. Not sure how you go about utilizing more multithreading to saturate the CPUs.
Hi Lead_weight,
Thank you for bringing this to my attention.
First of all, integration speed is dependent on several factors, given a fixed number of frames to integrate and a fixed number of pixels per frame. The factors are:
Hard disk speed: integration as it is implemented in APP is quite IO intensive. A faster drive with low latency will make a lot of difference.
The use of an outlier rejection filter in integration: without outlier rejection, integration speed is the highest. Sigma and Winsorized rejection are a bit slower. Linear fit rejection is currently really slow with a lot of frames (and I don't recommend using it, because APP has LNC); it is on my priority list for a speed upgrade though.
LNC can be time consuming in itself if you use a lot of iterations. All layers/frames of the stack are iteratively adjusted for illumination differences.
The number and speed of CPUs have a lot of influence, of course.
The choice between average and median integration has little influence.
The loading of the frames is slowest with Lanczos interpolation (but sharpest) and fastest with nearest neighbour interpolation (but worst).
So depending on your hardware configuration and the settings that you used, this could explain the long integration time that you experienced.
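To illustrate why outlier rejection costs extra time compared with a plain average: each pixel stack needs additional passes to estimate the spread and drop outliers before averaging. The sketch below is a generic kappa-sigma clip on a single pixel stack, not APP's actual implementation; the function name, the `kappa` value and the sample numbers are assumptions made for illustration.

```python
from statistics import mean, median, pstdev


def sigma_clip_mean(stack, kappa=2.0, max_iters=5):
    """Average one pixel stack after iteratively rejecting outliers.

    Generic kappa-sigma clipping (illustrative, not APP's code): values
    further than kappa * sigma from the median are dropped, and the
    process repeats until nothing more is rejected.
    """
    values = list(stack)
    for _ in range(max_iters):
        center = median(values)
        sigma = pstdev(values)
        kept = [v for v in values if abs(v - center) <= kappa * sigma]
        if not kept or len(kept) == len(values):
            break  # nothing rejected (or everything would be): stop
        values = kept
    return mean(values)


# One pixel seen in 8 frames; the last frame has a satellite trail.
stack = [100, 101, 99, 100, 102, 98, 100, 5000]
plain = mean(stack)               # pulled far upwards by the outlier
clipped = sigma_clip_mean(stack)  # outlier rejected before averaging
```

The plain average does one pass over the stack, while the clipped version does at least two passes plus a sort for the median, and that is repeated for every pixel of the image.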
Hi Lead_weight,
Maybe it's a change in the mentioned settings, like the outlier rejection filter?
I am aware that I can probably optimise integration speed by doing the extensive IO and the integration calculations at the same time. Currently they are separated, which is why CPU utilisation seems low: it is using all cores, just not all of the time. And for some hardware configurations it might be useful to give the user the option to choose the integration buffer size. With a bigger buffer, fewer hard disk IO steps are needed, which could help with conventional (read: non-SSD) hard drives, I think.
I'll have a thorough look tomorrow at my code to check if we can improve substantially here 😉
For outlier rejection I used "winsor sigma clip" as the only option outside of the defaults. I'm using an SSD with a read/write speed of around 950 MB/s.
I'm starting to think I might have enabled multi-band blending when the processing took overnight. This last image integration which only took an hour did not have that option checked.
Yes, Multi-Band Blending will take longer (I realize I forgot it in my summary). APP needs to write and read additional data per pixel for the Multi-Band Blending function to work.
I will check however if I can make the integration more efficient with regards to system resources.
The last couple of days, I have been working on the integration engine to see if I can speed up integration times. Several users reported very long integration times when they were stacking 100-400 frames.
I have managed to speed up integration significantly for both SSDs and conventional hard drives.
I have made the integration module more efficient in its resource use, so that the IO reading of the pixel stacks from the file mapper and the pixel stack calculations now happen at the same time. This gives an immediate speed-up by a factor of about 1.5-1.7x, depending on several factors.
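The general idea of overlapping reads and calculations can be sketched as a producer/consumer pipeline. This is only a minimal Python illustration of the technique, not APP's code: the function names, the queue size and the use of in-memory lists in place of real disk reads are all assumptions.

```python
import queue
import threading


def read_stacks(chunks, out_queue):
    """Producer: stands in for reading pixel stacks from disk."""
    for chunk in chunks:
        out_queue.put(chunk)   # blocks while the queue is full
    out_queue.put(None)        # sentinel: no more data


def integrate_overlapped(chunks, max_pending=4):
    """Consumer: averages each pixel stack while the reader keeps loading."""
    pending = queue.Queue(maxsize=max_pending)
    reader = threading.Thread(target=read_stacks, args=(chunks, pending))
    reader.start()
    results = []
    while True:
        stack = pending.get()
        if stack is None:
            break
        results.append(sum(stack) / len(stack))  # plain average integration
    reader.join()
    return results


averages = integrate_overlapped([[1, 2, 3], [10, 20, 30]])
```

The bounded queue keeps memory use predictable: the reader runs ahead of the calculations by at most `max_pending` stacks, so the disk and the CPU are both kept busy instead of waiting for each other.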
Besides that, increasing the read/write IO buffer sizes has a strong positive effect on integration speed for both SSDs and conventional drives. By default, the read/write buffers were fixed at 8 kilobytes. Increasing this to 256 kilobytes will really speed up integration 😉
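A rough way to see the effect of the buffer size is to count how many read calls are needed for the same amount of data. The sketch below is a hypothetical Python demonstration on a 1 MiB temporary file, not a measurement of APP; the helper names and the file size are assumptions.

```python
import os
import tempfile

KIB = 1024


def count_reads(path, buffer_size):
    """Read a whole file in buffer_size chunks and count the read calls."""
    reads = 0
    # buffering=0 disables Python's own buffer, so each read() call
    # maps to one request to the operating system.
    with open(path, "rb", buffering=0) as f:
        while f.read(buffer_size):
            reads += 1
    return reads


def demo(total_bytes=1024 * KIB):
    """Return the read counts for 8 KiB and 256 KiB buffers."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(total_bytes))
        path = tmp.name
    try:
        return count_reads(path, 8 * KIB), count_reads(path, 256 * KIB)
    finally:
        os.remove(path)


small_buffer_reads, large_buffer_reads = demo()
```

For the same file, the 256 KiB buffer needs 32 times fewer read requests than the 8 KiB buffer. On a conventional drive each request can involve seek latency, which is why fewer, larger requests tend to help.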
An example: integrating 100 RGB frames of 20 megapixels with Multi-Band Blending enabled, using 8 CPU cores at 3.4 GHz and 8 GB of memory assigned to APP. Average integration without outlier rejection, on a conventional SATA 600 hard drive.
With the old implementation: frame loading took 16 minutes and the actual integration took almost 2 hours, consistent with times reported by other APP users.
With the new implementation: frame loading took 16 minutes, and the actual integration took 21 minutes with 64 kB read/write buffers, and only 14 minutes with 256 kB buffers 😉
Running the new implementation on an SSD that can read/write at 3000/2000 MB/s reduces the actual integration time to only 2 minutes!
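The speed-up factors implied by these timings can be worked out with a few lines of Python. Note the assumption: "almost 2 hours" is rounded up to 120 minutes here, so the factors are slight overestimates. The variable and function names are just for illustration.

```python
OLD_INTEGRATION_MIN = 120  # "almost 2 hours", rounded up (assumption)
LOADING_MIN = 16           # frame loading time on the hard drive, unchanged

new_integration_min = {
    "HDD, 64 kB buffers": 21,
    "HDD, 256 kB buffers": 14,
    "SSD (3000/2000 MB/s)": 2,
}


def speedups(old_min, new_timings):
    """Speed-up factor of the integration step for each configuration."""
    return {label: round(old_min / t, 1) for label, t in new_timings.items()}


integration_speedup = speedups(OLD_INTEGRATION_MIN, new_integration_min)

# Total time including frame loading, for the hard drive with 256 kB buffers.
total_old = LOADING_MIN + OLD_INTEGRATION_MIN
total_new = LOADING_MIN + new_integration_min["HDD, 256 kB buffers"]
total_speedup = round(total_old / total_new, 1)
```

On the hard drive this works out to roughly 5.7x with 64 kB buffers and 8.6x with 256 kB buffers for the integration step alone. Because frame loading stays at 16 minutes, the whole run only gets about 4.5x faster with 256 kB buffers, so loading becomes the larger share of the total time.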
I will probably open a separate topic on this with some graphs, showing the differences.
I am currently finalizing implementation of this, so yes it will be in the next release of APP 😉
Depending on the amount of memory in the system, and on the number of frames to be stacked and their size in bytes, APP will automatically use the biggest practical read and write buffer size. So the user won't be bothered with setting buffer sizes, and this will ensure the best integration performance.
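How such an automatic choice might look is sketched below. This is purely a hypothetical heuristic, not APP's actual rule: it assumes one buffer per frame, a fixed memory budget for buffers, and power-of-two sizes between 8 kB and 256 kB, and it ignores the byte size of the frames, which the real logic also takes into account.

```python
KIB = 1024
MIB = 1024 * KIB


def pick_buffer_size(memory_budget_bytes, frame_count,
                     min_size=8 * KIB, max_size=256 * KIB):
    """Pick the largest buffer so that one buffer per frame fits the budget.

    Hypothetical heuristic: start at max_size and halve until the total
    (buffer size * frame count) fits, never going below min_size.
    """
    size = max_size
    while size > min_size and size * frame_count > memory_budget_bytes:
        size //= 2
    return size


roomy = pick_buffer_size(64 * MIB, frame_count=100)   # plenty of memory
tight = pick_buffer_size(16 * MIB, frame_count=400)   # many frames, less memory
```

With plenty of memory the largest buffer is used; with many frames and a small budget the buffer shrinks, but never below the old 8 kB default.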
Once done, I will run several tests monitoring time spent and computer resources used and I will publish these with graphs 😉 That will probably be tomorrow I think...
That is a great improvement Mabula, congratulations!
I do wonder, as many of these high-power CPU machines also have high-powered GPUs, if some of the heavy calculations could also be offloaded to these floating point monsters if they are so equipped?
GPU support for heavy calculations will no doubt come as well in future versions. The required software libraries are already included in APP (so research has already been done on which to use) and I will start testing soon, probably with improved stretching capabilities for the processing sliders on the right side.
For simple calculations, GPUs won't be that effective; they really excel at difficult calculations. However, I think that the number of calculation units of current graphics cards dwarfs the number of CPU cores available in most systems... 😉 Current graphics cards have over 1000 calculation units...
I am planning an implementation in certain modules that will use GPU and/or CPU if a GPU is available, and only CPU otherwise. So with a switch in the CFG menu, you will be able to turn GPU support on/off. I think that will be very useful.
I have just installed an Nvidia 10-series card myself, exactly for testing purposes (DUAL GTX 1060 6GB).
If GPU support greatly improves the speed of certain functions, then gaming rigs with multiple video cards will probably be very nice systems to run APP 😉
I am quite anxious to start testing actually, but I need to finish some other work first for APP's next release...