Mar 28 2026 APP 2.0.0-beta40 will be released in 7 days.
It took a long time to finish the work on this release, and it will bring a major performance boost of 30-50% over 2.0.0-beta39 from calibration to integration. We extensively optimized many critical parts of APP, and everything has been tested to guarantee that the optimizations are correct. Drizzle and image resampling, for instance, are much faster: those modules have been completely rewritten. Memory usage is much lower. LNC 2.0 will be released, which works much better and faster than LNC in its current state. And more: everything will be added to the release notes in the coming weeks...
Update on the 2.0.0 release & the full manual
We are getting close to the 2.0.0 stable release and the full manual. The manual will soon become available on the website and also in PDF format. Both versions will be identical and, once released, will follow the APP release cycle and thus stay up to date with the latest APP version.
Once 2.0.0 is released, the price for APP will increase. Owner's license holders will not need to pay an upgrade fee to use 2.0.0, and neither will Renter's license holders.
Since I replaced my Xeon 6538N (32 cores, 64 threads) with an 8558P (48 cores, 96 threads), I have noticed that APP only sees #CPU 48 instead of 96. With the 6538N it always showed #CPU 64.
Windows and other apps see 48 cores, 96 threads, so I wonder what is going on.
Simon
My 64 core Epyc only shows 64 threads even though it has 64 cores/128 threads. I think the 64 threads is a limitation.
I suspect this is down to the way Windows reports the logical cores once you exceed 64 in total. Another image processing application does something very similar, and it seems to come down to the way the application asks Windows how many "threads" are available.
Even after installing WMIC in a Windows 11 24H2 system (Since it is no longer part of the OS), it still only reports #CPU 48 instead of #CPU 96. Same system with Ubuntu reports #CPU 96
How is APP getting this, is it using the GetActiveProcessorCount Win32 API function with the ALL_PROCESSOR_GROUPS argument?
Would be good if the APP team could reply on this one 🙂
I have just verified from a 32C system (6538N) same system board, memory etc that APP on Windows sees #CPU 64
So this clearly seems to be an issue with APP on Windows when you go above 32 physical cores. As a test I could select one of the SPP profiles, which would limit the 8558P CPU to 32 cores, but I don't feel this is necessary.
Hi @stastro & @imnewhere
I will check these CPU issues. It is very likely not a limitation of our code but of the development platform, GraalVM JDK 21, which uses an Oracle Java JDK 21 call to get the number of CPUs and use them. So maybe the Windows version of the platform does not yet fully support that CPU. I suspect a newer version of the platform will.
I have made a note on my issue list to verify with you whether the issue is gone in the next release, which will have a more up-to-date development platform. If it is not fixed then, we will of course need to dig a little deeper to have this solved.
Mabula
APP does not use more than one NUMA group of processors in Windows.
On my HP server there is a BIOS setting especially for Apps that are not programmed to make use of NUMA grouped cores (like APP), so that these Apps can see and use all the available cores.
Since I am only messing around with old hardware, not new stuff, I do not know if there is anything like this in the BIOS of modern/new mainboards.
This is not necessarily related, as my Xeon system is single-socket, so NUMA is not playing a part in this at all.
@stastro If there are more than 64 logical cores, then Windows will group these cores, as it cannot handle more than 64 in one group.
If there is more than one physical processor, then it creates one NUMA group per CPU.
Windows has done this since 2008. This was Microsoft's "workaround" for the new multicore CPUs, and it has stayed like this until today.
@walsc Yes you are correct, to a degree. Windows will group "Physical Cores" into one group, and "Logical Cores" into another group. Up to a maximum count of 64. I have another system with 32 Cores (64 threads) and will validate how Windows treats that as well.
Just booted the 32 Core Platform, and Windows groups all Physical and Logical cores into a single group, so this goes some way to explaining why APP only sees # CPU 48 on a 48 core (96 thread) system, and #CPU 64 on a 32 Core (64 thread) system. @mabula-admin something to look at perhaps?
Mabula mentioned that maybe his development platform might cause this problem. I hope he finds some time to look into this.
But in the BIOS of my HP-machine I can choose how NUMA will be handled in the OS. If I set it to "clustered" then APP only reports 28 CPUs, if I set it to "flat" then APP reports 56 CPUs.
Yes, 2x 14-core Xeons.
I also tested older ones - 2x 6-core and 2x 8-core. No problems with thread-count in APP with these.
Ok then your statement about configuring NUMA makes sense, NUMA is becoming less and less relevant on modern architectures now due to the increasing number of QPI links in conjunction with faster memory. However, it will always still be faster for a CPU to access the memory channels it has direct access to rather than using a QPI link for the memory channels belonging to the other CPU.
When "Clustered" is enabled, any application that is not NUMA aware, will only run on a single CPU and memory will be accessed from that CPU only, hence why APP sees half the cores when you have it set to enabled. NUMA aware applications manage the core affinity and memory relationship.
In a flat configuration, you are essentially defining a single NUMA node, and both CPUs can access all memory.
Yes, right. I think there is not really much speed to gain avoiding linked QPI memory access because of the growing overhead and copying/verifying.
Essentially we have to conclude that APP is not a NUMA-aware application in its current state.
Hi Walter @walsc & Simon @stastro,
I still suspect the issue is not caused by our code, but either by our development platform or perhaps BIOS settings and how Windows will deal with that. The issue is still on my list as well.
How is APP getting this, is it using the GetActiveProcessorCount Win32 API function with the ALL_PROCESSOR_GROUPS argument?
No we use the Java JDK calls:
public final static OperatingSystemMXBean osBean = (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
public final static int noProcAvailable = osBean.getAvailableProcessors();
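For anyone who wants to see what a JVM reports on their own machine, here is a minimal sketch. It assumes a standard JDK rather than APP's own build, and the class name `ProcessorCount` is made up for illustration. It prints the value from the same `OperatingSystemMXBean` call quoted above next to `Runtime.availableProcessors()`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper class for this thread, not part of APP.
class ProcessorCount {

    // Same JDK call as quoted above, via the management bean.
    static int viaMxBean() {
        OperatingSystemMXBean osBean = ManagementFactory.getOperatingSystemMXBean();
        return osBean.getAvailableProcessors();
    }

    // The more commonly used equivalent on the Runtime object.
    static int viaRuntime() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("OperatingSystemMXBean: " + viaMxBean());
        System.out.println("Runtime:               " + viaRuntime());
    }
}
```

Running this on the same machine under Windows and under Linux should show whether the difference comes from the JVM itself rather than from APP.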
The latest version APP 2.0.0-beta30 has a newer development platform compared to beta29. We now use Oracle GraalVM 23.0.1 instead of 21.0.2 in beta29. Can you please test this issue with beta30 and let me know if things are okay now or not?
Thanks,
Mabula
Hi Walter @walsc & Simon @stastro,
I see the following now, maybe this will solve it, another JVM flag for the APP startup:
-XX:+UseNUMA does no harm in these cases.
HotSpot JVM automatically turns off the flag when run on a single node (the source).
I will investigate more and will add the flag for a test version for you okay?
Mabula
@mabula-admin It does not rectify the issue, still only sees 48 Threads on my 48 Core CPU under Windows, but 96 Threads under Linux
On my 32-core system it shows 64 threads, so this looks related to the way Windows groups CPUs into blocks of at most 64. A 32-core CPU with HT enabled (64 threads) fits in a single group of 64. Once you go past 64 threads in total, as on my 48-core system for example, Windows has to create two groups of 48, so it could well be related to this.
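The arithmetic behind this can be sketched in a few lines. This is only a toy model, not Windows or APP code: it assumes that logical processors are split evenly over the smallest number of groups of at most 64, and that an application which is not group-aware sees only one group. Real Windows grouping also follows the NUMA topology, so treat the even split as an assumption. The class and method names are made up for illustration.

```java
// Hypothetical model of Windows processor groups, for illustration only.
class ProcessorGroupModel {

    // Windows limit on logical processors per processor group.
    static final int MAX_PER_GROUP = 64;

    // Smallest number of groups that can hold all logical processors.
    static int groupCount(int logicalProcessors) {
        if (logicalProcessors < 1) {
            throw new IllegalArgumentException("need at least one logical processor");
        }
        return (logicalProcessors + MAX_PER_GROUP - 1) / MAX_PER_GROUP;
    }

    // Assumption: processors are split evenly over the groups, and an
    // application that is not group-aware sees a single group.
    static int visibleInOneGroup(int logicalProcessors) {
        int groups = groupCount(logicalProcessors);
        return (logicalProcessors + groups - 1) / groups;
    }

    public static void main(String[] args) {
        for (int n : new int[] {64, 96, 128}) {
            System.out.println(n + " logical -> " + groupCount(n)
                    + " group(s), " + visibleInOneGroup(n) + " visible");
        }
    }
}
```

For 64, 96 and 128 logical processors this model gives 64, 48 and 64 visible processors, which matches the #CPU values reported in this thread for the 32-core Xeon, the 48-core Xeon and the 64-core Epyc.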
Regards
Simon
Hi Simon @stastro,
Thank you very much for your quick reply.
I have found the following additional info for our development platform, especially on limitations on Windows:
https://bugs.openjdk.org/browse/JDK-8046153
https://bugs.openjdk.org/browse/JDK-8244065
So, I should probably try to make a Windows test version for you with the following flags for the application startup:
-XX:+UseNUMA
-XX:+ForceNUMA
-XX:+UseNUMAInterleaving
I will try to make the test version in the coming days 😉
Mabula
@mabula-admin If you made those changes in the Beta31-Test that's available, then the issue still remains:
Hi Simon @stastro,
I did not yet make those changes for NUMA/reported and used threads in 2.0.0-beta31-test.
The beta31-test version has the changes as described at the moment in the release notes for beta31, which I included below for your convenience. And it has the fix for the crash on windows as reported here by Walter @walsc : https://www.astropixelprocessor.com/community/main-forum/how-to-speed-up-registration-process-on-large-mosaics/paged/2/#post-31937 . Walter confirmed yesterday that this issue is fixed !
Regarding NUMA, I will try to release another test version with these NUMA arguments for the JVM startup as soon as possible for you and Walter to test. I will try to have it ready for you at the end of the day 😉 I will notify you once you can download and test. I need to test that these arguments do not hurt performance or cause problems on any of the platforms, and I need to study in detail what these arguments actually do. From what I understand now, they are clearly needed on the Windows platform to use NUMA and thus all the cores in your system.
Astro Pixel Processor 2.0.0-beta31 release notes - work in progress...
- FIXED External OpenGL image viewer on 2nd monitor
We have fixed several bugs related to the external OpenGL image viewer window. Most importantly, when the external OpenGL image Viewer window is on a 2nd monitor, after restarting APP, the window would be recreated on the same place on the 2nd monitor but the application could hang while starting up, due to an OpenGL initialization issue. This issue is fixed, so you can now use the external OpenGL image viewer fine on external or not-primary monitors. The bug fix has another upside as well. The actual drawing performance of the image data is much faster now depending on the graphical chip and the operating System.
- IMPROVED External OpenGL image viewer performance
Due to the above bug fix, the OpenGL initialization for the external OpenGL image viewer has been improved, which greatly improves the drawing performance depending on the graphics hardware and the operating system. On my test system with an Nvidia RTX 2080 video card and Windows 10, the performance is greatly enhanced: zooming in and out on the image with the mouse wheel is very fast now in this external OpenGL window.
- FIXED release resources of OpenGL and external image viewer windows at application shutdown
This was an older bug on our list which is solved now. When the application is closed, the external image viewer windows are properly disposed and OpenGL is explicitly shut down to make sure that all those resources are released. It is implemented in a shutdown hook, so all resources are released even in the event of an application crash.
- FIXED FOCUS issues with multi window setup
As reported in this post https://www.astropixelprocessor.com/community/rfcs-request-for-changes/multi-monitor-support-image-window-on-2nd-monitor/#post-31710 there were focus issues with the external image viewer windows relative to the main application window or the console panel window. If the external image viewer was in front of the main application window and fully covered it, you could not get the main window to become visible. This is now fixed for all situations where APP application windows are in front of each other. Each APP window behind another APP window can be moved to the front and into focus.
The NUMA options to start APP are not the solution, it seems. They only affect memory usage across different CPU nodes and will not enable more than 64 logical cores.
For macOS and Linux we have no issue here, right? The issue is using more than 64 logical cores on Windows.
So I think I am now getting to the source of the issue. Please read this:
https://bugs.openjdk.org/browse/JDK-8338083
"Windows processor groups support a maximum of 64 logical processors per processor group. OpenJDK currently uses the GetSystemInfo API to determine the number of processors on a Windows machine. However, GetSystemInfo only returns the number of processors in the current processor group, which is at most 64. Therefore, OpenJDK cannot currently use over 64 processors on Windows."
So in the latest version of the java JDK and GraalVM which we use, we have a solution for the problem with the startup jvm flag of UseAllWindowsProcessorGroups
But it will only work on Windows versions that "automatically schedule threads across all processor groups".
A quick google for this indicates:
"In order for applications to automatically take advantage of all the processors in a machine with more than 64 processors, starting with Windows 11 and Windows Server 2022 the OS has changed to make processes and their threads span all processors in the system, across all processor groups, by default"
So you need to have Windows 11 installed or a Server version since 2022. Do you meet these requirements?
I will now build a version with this flag and will let you know when you can download it 😉
Mabula
APP 2.0.0-beta31-test2 is now building. Once finished, it will be uploaded to the release server automatically. In about 15 minutes you will be able to download this version from the downloads section. Look for the folder 2.0.0-beta31-test2.
Let me know if all is okay now with the number of CPU threads on Windows and please note, Windows 11 or Windows Server (from 2022) is required for this to work.
Mabula
Please try this Windows version to see if you can use more than 64 CPU threads. Note that Windows 11 or Windows Server 2022 (or later) is required.
Mabula
@mabula-admin I can confirm, this works 🙂
And to reply back to your other questions, Linux is not affected, and I am using Windows 11 24H2. Thank you for digging into this and finding a resolution
That is great news ! Did you also see in processing that you actually have 2x processing power now with 96 threads available? I assume you did and this works fully as expected now?
Regarding NUMA, I will enable the UseNUMA flag for all platforms in the beta31 release, and we will need to see if it is actually used on some systems. Maybe on your Linux systems it will make things faster, but from what I read I doubt it will on Windows. There is also a chance that UseNUMA does nothing, not even on Linux, because it seems to need the parallel garbage collector in the JVM, which we for good reasons don't use at the moment. It seems clear that enabling the flag cannot harm the application, because the JVM can detect from the hardware and JVM setup whether it can be used or not. If it can't, it will be disabled automatically.
Mabula
@mabula-admin Doing some testing right now, I have enabled 86 / 96 threads but I have observed something that I was not expecting.
In my current image processing run I have noticed:
1. Analysis only consumes around 33% of total CPU
2. Align never exceeds 50% of CPU
3. Normalize seems to run at between 15-20% of CPU
I used Perfmon to look at this, but I also loaded up Task Manager's performance graph and asked it to show me all "Logical Cores", and it would appear that it is only using 50% of the CPU:
So maybe this goes deeper than just being able to see all the cores?
Hi Simon @stastro,
Thank you very much. After further reading, I think the NUMA flags are probably relevant now. We want APP to use all NUMA nodes, so it can use all cores and their memory freely within the whole application.
We can enable the cores with the UseAllWindowsProcessorGroups flag.
Then NUMA is needed to be able to use the memory efficiently, with the UseNUMA flag.
Finally, I think the memory needs to be interleaved between the NUMA nodes with the UseNUMAInterleaving flag. This last flag is especially needed on the Windows platform when the Java process uses all NUMA nodes.
I will send you another test version that uses all 3 of the flags 😉 will build it now and will let you know once it is available
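To check which of these flags a given JVM was actually started with, something like the sketch below could be used. It assumes the flags are passed on the JVM command line in the usual `-XX:+Name` form; how the APP launcher passes them is not stated in this thread, and the class name `JvmFlagCheck` is made up for illustration. It only reports what was passed at startup, not whether the JVM later disabled a flag on its own.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class for this thread, not part of APP.
class JvmFlagCheck {

    // True if the argument list contains the flag in its enabled form, -XX:+<name>.
    static boolean isEnabled(List<String> jvmArgs, String name) {
        return jvmArgs.contains("-XX:+" + name);
    }

    public static void main(String[] args) {
        // Arguments the running JVM was started with (excludes program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String flag : List.of("UseAllWindowsProcessorGroups", "UseNUMA", "UseNUMAInterleaving")) {
            System.out.println(flag + " passed at startup: " + isEnabled(jvmArgs, flag));
        }
    }
}
```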
Mabula













