r.ShaderCompiler.DistributedMinBatchSize

#Overview

name: r.ShaderCompiler.DistributedMinBatchSize

This variable is created as a Console Variable (cvar).

It is referenced in 40 C++ source files.

#Usage in the C++ source code

The purpose of r.ShaderCompiler.DistributedMinBatchSize is to set the minimum number of shader compile jobs required before work is handed to a distributed shader compilation controller. Batches smaller than this threshold are compiled locally instead of going through the distributed compilation system.

Key points about this setting variable:

  1. It is used by the shader compilation system, specifically for distributed compilation.

  2. The Unreal Engine shader compiler subsystem relies on this variable to determine when to use distributed compilation vs local compilation.

  3. The value is set via console variable, with a default provided in the code.

  4. It is bound to the C++ variable MinBatchSize in the DistributedShaderCompilerVariables namespace; the deprecated CVar r.XGEShaderCompile.MinBatchSize is bound to the same variable, so setting either name changes the same value.

  5. Developers should be aware that setting this too low could result in inefficient use of the distributed compilation system for very small shader batches.

  6. Best practices would be to tune this value based on the specific project and build environment to balance compilation speed and resource utilization.

The identifier MinBatchSize also appears elsewhere in the engine as local variables and function parameters that are unrelated to this CVar, including:

  1. The ParallelFor family in Async/ParallelFor.h, where it sets the minimum number of iterations assigned to each worker batch.

  2. The asset compiling managers (static mesh, skinned asset, animation sequence, Nanite displaced mesh, sound wave), where processing is deferred until at least MinBatchSize assets are queued.

  3. The USD importer, Chaos physics, plugin discovery, and renderer passes (decals, ray tracing), which pass a minimum batch size through to ParallelFor.

When using MinBatchSize in these contexts, developers should consider the per-item cost of the work: batches that are too small add task-scheduling overhead, while batches that are too large limit parallelism.

In general, this family of parameters allows fine-tuning of parallel processing behavior across multiple engine subsystems. Careful adjustment can improve performance, but changes should be verified with profiling.
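
For the shader CVar itself, the value can be changed from the in-editor console (for example: r.ShaderCompiler.DistributedMinBatchSize 100), from ConsoleVariables.ini, or from code. A minimal C++ sketch; the value 100 is purely illustrative:

	#include "HAL/IConsoleManager.h"

	// Look up the CVar by name; FindConsoleVariable returns nullptr if it is not registered.
	if (IConsoleVariable* CVar = IConsoleManager::Get().FindConsoleVariable(
			TEXT("r.ShaderCompiler.DistributedMinBatchSize")))
	{
		const int32 Current = CVar->GetInt(); // engine default is 50
		CVar->Set(100);                       // require at least 100 pending jobs before distributing
	}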

#References in C++ code

#Callsites

This variable is referenced in the following C++ source code:

#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/ShaderCompiler/ShaderCompilerDistributed.cpp:17

Scope (from outer to inner):

file
namespace    DistributedShaderCompilerVariables

Source code excerpt:


	FAutoConsoleVariableRef CVarDistributedMinBatchSize(
		TEXT("r.ShaderCompiler.DistributedMinBatchSize"),
		MinBatchSize,
		TEXT("Minimum number of shaders to compile with a distributed controller.\n")
		TEXT("Smaller number of shaders will compile locally."),
		ECVF_Default);

	static int32 GDistributedControllerTimeout = 15 * 60;

#Associated Variable and Callsites

This console variable is bound to the C++ variable MinBatchSize (in the DistributedShaderCompilerVariables namespace); they share the same value. The excerpts below show that binding, along with the other places in the engine where the identifier MinBatchSize appears; most of those are unrelated locals and parameters that control parallel batch sizes.

#Loc: <Workspace>/Engine/Plugins/Experimental/NaniteDisplacedMesh/Source/NaniteDisplacedMesh/Private/NaniteDisplacedMeshCompiler.cpp:387

Scope (from outer to inner):

file
function     void FNaniteDisplacedMeshCompilingManager::ProcessNaniteDisplacedMeshes

Source code excerpt:

void FNaniteDisplacedMeshCompilingManager::Reschedule()
{
	// TODO Prioritize nanite displaced mesh that are nearest to the viewport
}

void FNaniteDisplacedMeshCompilingManager::ProcessNaniteDisplacedMeshes(bool bLimitExecutionTime, int32 MinBatchSize)
{
	using namespace NaniteDisplacedMeshCompilingManagerImpl;
	TRACE_CPUPROFILER_EVENT_SCOPE(FNaniteDisplacedMeshCompilingManager::ProcessNaniteDisplacedMeshes);
	const int32 NumRemainingMeshes = GetNumRemainingAssets();
	// Spread out the load over multiple frames but if too many meshes, convergence is more important than frame time
	const int32 MaxMeshUpdatesPerFrame = bLimitExecutionTime ? FMath::Max(64, NumRemainingMeshes / 10) : INT32_MAX;

	FObjectCacheContextScope ObjectCacheScope;
	if (NumRemainingMeshes && NumRemainingMeshes >= MinBatchSize)
	{
		TSet<UNaniteDisplacedMesh*> NaniteDisplacedMeshesToProcess;
		for (TWeakObjectPtr<UNaniteDisplacedMesh>& NaniteDisplacedMesh : RegisteredNaniteDisplacedMesh)
		{
			if (NaniteDisplacedMesh.IsValid())
			{
				NaniteDisplacedMeshesToProcess.Add(NaniteDisplacedMesh.Get());
			}
		}

		{

#Loc: <Workspace>/Engine/Plugins/Experimental/NaniteDisplacedMesh/Source/NaniteDisplacedMesh/Private/NaniteDisplacedMeshCompiler.h:77

Scope (from outer to inner):

file
class        class FNaniteDisplacedMeshCompilingManager : public IAssetCompilingManager, public FGCObject

Source code excerpt:


	TUniquePtr<FAsyncCompilationNotification> Notification;

	void FinishCompilationsForGame();
	void Reschedule();
	void ProcessNaniteDisplacedMeshes(bool bLimitExecutionTime, int32 MinBatchSize = 1);
	void UpdateCompilationNotification();

	void PostCompilation(TArrayView<UNaniteDisplacedMesh* const> InNaniteDisplacedMeshes);
	void PostCompilation(UNaniteDisplacedMesh* InNaniteDisplacedMesh);

	void OnPostReachabilityAnalysis();
	FDelegateHandle PostReachabilityAnalysisHandle;
};

#endif // WITH_EDITOR

#Loc: <Workspace>/Engine/Plugins/Importers/USDImporter/Source/USDSchemas/Private/USDInfoCache.cpp:669

Scope (from outer to inner):

file
namespace    UE::USDInfoCacheImpl::Private
function     void RecursivePropagateVertexAndMaterialSlotCounts

Source code excerpt:

		ChildSubtreeVertexCounts.SetNumUninitialized(NumChildren);

		TArray<TArray<UsdUtils::FUsdPrimMaterialSlot>> ChildSubtreeMaterialSlots;
		ChildSubtreeMaterialSlots.SetNum(NumChildren);

		const int32 MinBatchSize = 1;

		ParallelFor(
			TEXT("RecursivePropagateVertexAndMaterialSlotCounts"),
			Prims.Num(),
			MinBatchSize,
			[&](int32 Index)
			{
				RecursivePropagateVertexAndMaterialSlotCounts(
					Prims[Index],
					Context,
					MaterialPurposeToken,
					Impl,
					Registry,
					InOutSubtreeToMaterialSlots,
					InOutPointInstancerPaths,
					ChildSubtreeVertexCounts[Index],

#Loc: <Workspace>/Engine/Plugins/Importers/USDImporter/Source/USDSchemas/Private/USDInfoCache.cpp:1010

Scope (from outer to inner):

file
namespace    UE::USDInfoCacheImpl::Private
function     void RecursiveQueryCollapsesChildren

Source code excerpt:

		for (pxr::UsdPrim Child : PrimChildren)
		{
			Prims.Emplace(Child);
		}

		const int32 MinBatchSize = 1;

		ParallelFor(
			TEXT("RecursiveQueryCollapsesChildren"),
			Prims.Num(),
			MinBatchSize,
			[&](int32 Index)
			{
				RecursiveQueryCollapsesChildren(Prims[Index], Context, Impl, Registry, *AssetCollapsedRootOverride, *ComponentCollapsedRootOverride);
			}
		);

		{
			FWriteScopeLock ScopeLock(Impl.InfoMapLock);

			UE::UsdInfoCache::Private::FUsdPrimInfo& Info = Impl.InfoMap.FindOrAdd(UE::FSdfPath{UsdPrimPath});

#Loc: <Workspace>/Engine/Plugins/Importers/USDImporter/Source/USDSchemas/Private/USDInfoCache.cpp:1243

Scope (from outer to inner):

file
namespace    UE::USDInfoCacheImpl::Private
function     void RecursiveCheckForGeometryCache

Source code excerpt:

		Depths.SetNum(Prims.Num());

		TArray<EGeometryCachePrimState> States;
		States.SetNum(Prims.Num());

		const int32 MinBatchSize = 1;
		ParallelFor(
			TEXT("RecursiveCheckForGeometryCache"),
			Prims.Num(),
			MinBatchSize,
			[&Prims, &Context, &Impl, bIsInsideSkelRoot, &Depths, &States](int32 Index)
			{
				RecursiveCheckForGeometryCache(
					Prims[Index],
					Context,
					Impl,
					bIsInsideSkelRoot || Prims[Index].IsA<pxr::UsdSkelRoot>(),
					Depths[Index],
					States[Index]
				);
			}

#Loc: <Workspace>/Engine/Plugins/Importers/USDImporter/Source/USDUtilities/Private/USDSkeletalDataConversion.cpp:691

Scope (from outer to inner):

file
namespace    UsdToUnrealImpl
function     void CreateMorphTargets

Source code excerpt:

					}
				}
			}
		}

		const int32 MinBatchSize = 1;

		ParallelFor(
			TEXT("CreateMorphTarget"),
			MorphTargetJobs.Num(),
			MinBatchSize,
			[&MorphTargetJobs, &OrigIndexToBuiltIndicesPerLOD, &TempMeshBundlesPerLOD, ImportedResource](int32 Index)
			{
				TRACE_CPUPROFILER_EVENT_SCOPE(USDSkeletalDataConversion::CreateMorphTargetJob);

				FMorphTargetJob& Job = MorphTargetJobs[Index];
				if (!Job.BlendShape || !Job.MorphTarget)
				{
					return;
				}

				for (int32 LODIndex : Job.BlendShape->LODIndicesThatUseThis)

#Loc: <Workspace>/Engine/Source/Runtime/Core/Private/GenericPlatform/GenericPlatformFile.cpp:625

Scope (from outer to inner):

file
function     bool IPlatformFile::IterateDirectoryRecursively

Source code excerpt:

	};

	TArray<FString> DirectoriesToVisit;
	DirectoriesToVisit.Add(Directory);

	constexpr int32 MinBatchSize = 1;
	const EParallelForFlags ParallelForFlags = FTaskGraphInterface::IsRunning() && Visitor.IsThreadSafe()
		? EParallelForFlags::Unbalanced : EParallelForFlags::ForceSingleThread;
	std::atomic<bool> bResult{true};
	TArray<TArray<FString>> DirectoriesToVisitNext;
	while (bResult && DirectoriesToVisit.Num() > 0)
	{
		ParallelForWithTaskContext(TEXT("IterateDirectoryRecursively.PF"),
			DirectoriesToVisitNext,
			DirectoriesToVisit.Num(),
			MinBatchSize,
			[this, &Visitor, &DirectoriesToVisit, &bResult](TArray<FString>& Directories, int32 Index)
			{
				FRecurse Recurse(Visitor, Directories);
				if (bResult.load(std::memory_order_relaxed) && !IterateDirectory(*DirectoriesToVisit[Index], Recurse))
				{
					bResult.store(false, std::memory_order_relaxed);
				}
			},
			ParallelForFlags);
		DirectoriesToVisit.Reset(Algo::TransformAccumulate(DirectoriesToVisitNext, &TArray<FString>::Num, 0));
		for (TArray<FString>& Directories : DirectoriesToVisitNext)
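
Callers typically reach this code through IPlatformFile. A minimal usage sketch, assuming the functor-based visitor overload; the directory path is illustrative:

	#include "HAL/PlatformFileManager.h"

	IPlatformFile& PlatformFile = FPlatformFileManager::Get().GetPlatformFile();
	TArray<FString> FoundFiles;
	PlatformFile.IterateDirectoryRecursively(TEXT("D:/Content"),
		[&FoundFiles](const TCHAR* FilenameOrDirectory, bool bIsDirectory)
		{
			if (!bIsDirectory)
			{
				FoundFiles.Add(FilenameOrDirectory);
			}
			return true; // keep iterating
		});

Note that a stateful lambda like this is not flagged as thread-safe, so per the excerpt above the traversal falls back to ForceSingleThread; only thread-safe visitors take the parallel (Unbalanced) path with MinBatchSize = 1.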

#Loc: <Workspace>/Engine/Source/Runtime/Core/Private/GenericPlatform/GenericPlatformFile.cpp:635

Scope (from outer to inner):

file
function     bool IPlatformFile::IterateDirectoryRecursively

Source code excerpt:

			DirectoriesToVisitNext,
			DirectoriesToVisit.Num(),
			MinBatchSize,
			[this, &Visitor, &DirectoriesToVisit, &bResult](TArray<FString>& Directories, int32 Index)
			{
				FRecurse Recurse(Visitor, Directories);
				if (bResult.load(std::memory_order_relaxed) && !IterateDirectory(*DirectoriesToVisit[Index], Recurse))
				{
					bResult.store(false, std::memory_order_relaxed);

#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:77

Scope (from outer to inner):

file
namespace    ParallelForImpl
function     inline int32 GetNumberOfThreadTasks

Source code excerpt:

	inline void CallBody(const FunctionType& Body, const TArrayView<TYPE_OF_NULLPTR>&, int32, int32 Index)
	{
		Body(Index);
	}

	inline int32 GetNumberOfThreadTasks(int32 Num, int32 MinBatchSize, EParallelForFlags Flags)
	{
		int32 NumThreadTasks = 0;
		const bool bIsMultithread = FApp::ShouldUseThreadingForPerformance() || FForkProcessHelper::IsForkedMultithreadInstance();
		if (Num > 1 && (Flags & EParallelForFlags::ForceSingleThread) == EParallelForFlags::None && bIsMultithread)
		{
			NumThreadTasks = FMath::Min(int32(LowLevelTasks::FScheduler::Get().GetNumWorkers()), (Num + (MinBatchSize/2))/MinBatchSize);
		}

		if (!LowLevelTasks::FScheduler::Get().IsWorkerThread())
		{
			NumThreadTasks++; //named threads help with the work
		}

		// don't go wider than number of cores
		NumThreadTasks = FMath::Min(NumThreadTasks, FPlatformMisc::NumberOfCoresIncludingHyperthreads());

		return FMath::Max(NumThreadTasks, 1);
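
The rounding above makes the candidate task count equal to Num / MinBatchSize rounded to the nearest integer, clamped by the number of scheduler workers and logical cores. Illustrative numbers:

	// Num = 1000 items, MinBatchSize = 64, 16 scheduler workers:
	//   (1000 + 32) / 64 = 16 candidate tasks -> min(16, 16) = 16 tasks
	// Num = 100 items, MinBatchSize = 64:
	//   (100 + 32) / 64 = 2 candidate tasks -> two batches of ~50 iterations
	// One extra task is added when called from a named (non-worker) thread,
	// and the result is clamped to the core count and raised to at least 1.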

#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:101

Scope (from outer to inner):

file
namespace    ParallelForImpl

Source code excerpt:


	/** 
		*	General purpose parallel for that uses the taskgraph
		*	@param DebugName; Debugname and Profiling TraceTag
		*	@param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
		*	@param MinBatchSize; Minimum size a Batch should have
		*	@param Body; Function to call from multiple threads
		*	@param CurrentThreadWorkToDoBeforeHelping; The work is performed on the main thread before it starts helping with the ParallelFor proper
		*	@param Flags; Used to customize the behavior of the ParallelFor if needed.
		*   @param Contexts; Optional per thread contexts to accumulate data concurrently.
		*	Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
	**/
	template<typename BodyType, typename PreWorkType, typename ContextType>
	inline void ParallelForInternal(const TCHAR* DebugName, int32 Num, int32 MinBatchSize, BodyType Body, PreWorkType CurrentThreadWorkToDoBeforeHelping, EParallelForFlags Flags, const TArrayView<ContextType>& Contexts)
	{
		if (Num == 0)
		{
			// Contract is that prework should always be called even when number of tasks is 0.
			// We omit the trace scope here to avoid noise when the prework is empty since this amounts to just calling a function anyway with nothing specific to parallelfor itself.
			CurrentThreadWorkToDoBeforeHelping();
			return;
		}

		SCOPE_CYCLE_COUNTER(STAT_ParallelFor);
		TRACE_CPUPROFILER_EVENT_SCOPE(ParallelFor);
		check(Num >= 0);

		int32 NumWorkers = GetNumberOfThreadTasks(Num, MinBatchSize, Flags);

		if (!Contexts.IsEmpty())
		{
			// Use at most as many workers as there are contexts when task contexts are used.
			NumWorkers = FMath::Min(NumWorkers, Contexts.Num());
		}

		//single threaded mode
		if (NumWorkers <= 1)
		{
			// do the prework

#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:123

Scope (from outer to inner):

file
namespace    ParallelForImpl
function     inline void ParallelForInternal

Source code excerpt:

		check(Num >= 0);

		int32 NumWorkers = GetNumberOfThreadTasks(Num, MinBatchSize, Flags);

		if (!Contexts.IsEmpty())
		{
			// Use at most as many workers as there are contexts when task contexts are used.
			NumWorkers = FMath::Min(NumWorkers, Contexts.Num());
		}

#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:482

Scope: file

Source code excerpt:


/**
	*	General purpose parallel for that uses the taskgraph
	*   @param DebugName; ProfilingScope and Debugname
	*	@param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
	*   @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers 
	*	@param Body; Function to call from multiple threads
	*	@param bForceSingleThread; Mostly used for testing, if true, run single threaded instead.
	*	Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
template<typename FunctionType>
inline void ParallelForTemplate(const TCHAR* DebugName, int32 Num, int32 MinBatchSize, const FunctionType& Body, EParallelForFlags Flags = EParallelForFlags::None)
{
	ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Body, [](){}, Flags, TArrayView<TYPE_OF_NULLPTR>());
}

/** 
	*	General purpose parallel for that uses the taskgraph for unbalanced tasks
	*	Offers better work distribution among threads at the cost of a little bit more synchronization.
	*	This should be used for tasks with highly variable computational time.
	*
	*	@param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
	*	@param Body; Function to call from multiple threads
	*	@param Flags; Used to customize the behavior of the ParallelFor if needed.
	*	Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.

#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:515

Scope: file

Source code excerpt:

	*	Offers better work distribution among threads at the cost of a little bit more synchronization.
	*	This should be used for tasks with highly variable computational time.
	*
	*   @param DebugName; ProfilingScope and Debugname
	*	@param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
	*   @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers 
	*	@param Body; Function to call from multiple threads
	*	@param Flags; Used to customize the behavior of the ParallelFor if needed.
	*	Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
inline void ParallelFor(const TCHAR* DebugName, int32 Num, int32 MinBatchSize, TFunctionRef<void(int32)> Body, EParallelForFlags Flags = EParallelForFlags::None)
{
	ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Body, [](){}, Flags, TArrayView<TYPE_OF_NULLPTR>());
}
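
A minimal usage sketch of this overload; names and sizes are illustrative:

	#include "Async/ParallelFor.h"

	TArray<float> Values;
	Values.AddZeroed(10000);

	// Square each element, with batches of at least 256 iterations per task.
	ParallelFor(TEXT("SquareValues"), Values.Num(), /*MinBatchSize*/ 256,
		[&Values](int32 Index)
		{
			Values[Index] = Values[Index] * Values[Index];
		});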

/** 
	*	General purpose parallel for that uses the taskgraph
	*	@param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
	*	@param Body; Function to call from multiple threads
	*	@param CurrentThreadWorkToDoBeforeHelping; The work is performed on the main thread before it starts helping with the ParallelFor proper
	*	@param bForceSingleThread; Mostly used for testing, if true, run single threaded instead.
	*	Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
inline void ParallelForWithPreWork(int32 Num, TFunctionRef<void(int32)> Body, TFunctionRef<void()> CurrentThreadWorkToDoBeforeHelping, bool bForceSingleThread, bool bPumpRenderingThread = false)
{
	ParallelForImpl::ParallelForInternal(TEXT("ParallelFor Task"), Num, 1, Body, CurrentThreadWorkToDoBeforeHelping,
		(bForceSingleThread ? EParallelForFlags::ForceSingleThread : EParallelForFlags::None) |
		(bPumpRenderingThread ? EParallelForFlags::PumpRenderingThread : EParallelForFlags::None), TArrayView<TYPE_OF_NULLPTR>());

#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:557

Scope: file

Source code excerpt:

	*   @param DebugName; ProfilingScope and Debugname
	*	@param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
	*   @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers 
	*	@param Body; Function to call from multiple threads
	*	@param CurrentThreadWorkToDoBeforeHelping; The work is performed on the main thread before it starts helping with the ParallelFor proper
	*	@param Flags; Used to customize the behavior of the ParallelFor if needed.
	*	Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
inline void ParallelForWithPreWork(const TCHAR* DebugName, int32 Num, int32 MinBatchSize, TFunctionRef<void(int32)> Body, TFunctionRef<void()> CurrentThreadWorkToDoBeforeHelping, EParallelForFlags Flags = EParallelForFlags::None)
{
	ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Body, CurrentThreadWorkToDoBeforeHelping, Flags, TArrayView<TYPE_OF_NULLPTR>());
}

/** 
 * General purpose parallel for that uses the taskgraph
 * @param DebugName; ProfilingScope and DebugName
 * @param OutContexts; Array that will hold the user-defined, task-level context objects (allocated per parallel task)
 * @param Num; number of calls of Body; Body(0), Body(1), ..., Body(Num - 1)
 * @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers 
 * @param ContextConstructor; Function to call to initialize each task context allocated for the operation
 * @param Body; Function to call from multiple threads
 * @param CurrentThreadWorkToDoBeforeHelping; The work is performed on the main thread before it starts helping with the ParallelFor proper
 * @param Flags; Used to customize the behavior of the ParallelFor if needed.
 * Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
 */
template <typename ContextType, typename ContextAllocatorType, typename ContextConstructorType, typename BodyType, typename PreWorkType>
inline void ParallelForWithPreWorkWithTaskContext(
	const TCHAR* DebugName,
	TArray<ContextType, ContextAllocatorType>& OutContexts,
	int32 Num,
	int32 MinBatchSize,
	ContextConstructorType&& ContextConstructor,
	BodyType&& Body,
	PreWorkType&& CurrentThreadWorkToDoBeforeHelping,
	EParallelForFlags Flags = EParallelForFlags::None)
{
	if (Num > 0)
	{
		const int32 NumContexts = ParallelForImpl::GetNumberOfThreadTasks(Num, MinBatchSize, Flags);
		OutContexts.Reset(NumContexts);
		for (int32 ContextIndex = 0; ContextIndex < NumContexts; ++ContextIndex)
		{
			OutContexts.Emplace(ContextConstructor(ContextIndex, NumContexts));
		}
		ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Forward<BodyType>(Body), Forward<PreWorkType>(CurrentThreadWorkToDoBeforeHelping), Flags, TArrayView<ContextType>(OutContexts));
	}
}

/** 
 * General purpose parallel for that uses the taskgraph
 * @param DebugName; ProfilingScope and DebugName
 * @param OutContexts; Array that will hold the user-defined, task-level context objects (allocated per parallel task)
 * @param Num; number of calls of Body; Body(0), Body(1), ..., Body(Num - 1)
 * @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers 
 * @param Body; Function to call from multiple threads
 * @param CurrentThreadWorkToDoBeforeHelping; The work is performed on the main thread before it starts helping with the ParallelFor proper
 * @param Flags; Used to customize the behavior of the ParallelFor if needed.
 * Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
 */
template <typename ContextType, typename ContextAllocatorType, typename BodyType, typename PreWorkType>
inline void ParallelForWithPreWorkWithTaskContext(
	const TCHAR* DebugName,
	TArray<ContextType, ContextAllocatorType>& OutContexts,
	int32 Num,
	int32 MinBatchSize,
	BodyType&& Body,
	PreWorkType&& CurrentThreadWorkToDoBeforeHelping,
	EParallelForFlags Flags = EParallelForFlags::None)
{
	if (Num > 0)
	{
		const int32 NumContexts = ParallelForImpl::GetNumberOfThreadTasks(Num, MinBatchSize, Flags);
		OutContexts.Reset();
		OutContexts.AddDefaulted(NumContexts);
		ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Forward<BodyType>(Body), Forward<PreWorkType>(CurrentThreadWorkToDoBeforeHelping), Flags, TArrayView<ContextType>(OutContexts));
	}
}

/** 
 * General purpose parallel for that uses the taskgraph
 * @param DebugName; ProfilingScope and DebugName
 * @param Contexts; User-privided array of user-defined task-level context objects
 * @param Num; number of calls of Body; Body(0), Body(1), ..., Body(Num - 1)
 * @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers 
 * @param Body; Function to call from multiple threads
 * @param CurrentThreadWorkToDoBeforeHelping; The work is performed on the main thread before it starts helping with the ParallelFor proper
 * @param Flags; Used to customize the behavior of the ParallelFor if needed.
 * Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
 */
template <typename ContextType, typename BodyType, typename PreWorkType>
inline void ParallelForWithPreWorkWithExistingTaskContext(
	const TCHAR* DebugName,
	TArrayView<ContextType> Contexts,
	int32 Num,
	int32 MinBatchSize,
	BodyType&& Body,
	PreWorkType&& CurrentThreadWorkToDoBeforeHelping,
	EParallelForFlags Flags = EParallelForFlags::None)
{
	ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Forward<BodyType>(Body), Forward<PreWorkType>(CurrentThreadWorkToDoBeforeHelping), Flags, Contexts);
}

/** 
	*	General purpose parallel for that uses the taskgraph. This variant constructs for the caller a user-defined context
	* 	object for each task that may get spawned to do work, and passes it on to the loop body to give it a task-local
	*   "workspace" that can be mutated without need for synchronization primitives. For this variant, the user provides a

#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:734

Scope: file

Source code excerpt:

	*   @param DebugName; ProfilingScope and Debugname
	*	@param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
	*   @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers 
	* 	@param ContextConstructor; Function to call to initialize each task context allocated for the operation
	*	@param Body; Function to call from multiple threads
	*	@param Flags; Used to customize the behavior of the ParallelFor if needed.
	*	Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
template <typename ContextType, typename ContextAllocatorType, typename ContextConstructorType, typename FunctionType>
inline void ParallelForWithTaskContext(const TCHAR* DebugName, TArray<ContextType, ContextAllocatorType>& OutContexts, int32 Num, int32 MinBatchSize, const ContextConstructorType& ContextConstructor, const FunctionType& Body, EParallelForFlags Flags = EParallelForFlags::None)
{
	if (Num > 0)
	{
		const int32 NumContexts = ParallelForImpl::GetNumberOfThreadTasks(Num, MinBatchSize, Flags);
		OutContexts.Reset();
		OutContexts.AddUninitialized(NumContexts);
		for (int32 ContextIndex = 0; ContextIndex < NumContexts; ++ContextIndex)
		{
			new(&OutContexts[ContextIndex]) ContextType(ContextConstructor(ContextIndex, NumContexts));
		}
		ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Body, [](){}, Flags, TArrayView<ContextType>(OutContexts));
	}
}

/** 
	*	General purpose parallel for that uses the taskgraph. This variant constructs for the caller a user-defined context
	* 	object for each task that may get spawned to do work, and passes it on to the loop body to give it a task-local

#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:763

Scope: file

Source code excerpt:

	*   @param DebugName; ProfilingScope and Debugname
	*	@param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
	*   @param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers 
	*	@param Body; Function to call from multiple threads
	*	@param Flags; Used to customize the behavior of the ParallelFor if needed.
	*	Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
template <typename ContextType, typename ContextAllocatorType, typename FunctionType>
inline void ParallelForWithTaskContext(const TCHAR* DebugName, TArray<ContextType, ContextAllocatorType>& OutContexts, int32 Num, int32 MinBatchSize, const FunctionType& Body, EParallelForFlags Flags = EParallelForFlags::None)
{
	if (Num > 0)
	{
		const int32 NumContexts = ParallelForImpl::GetNumberOfThreadTasks(Num, MinBatchSize, Flags);
		OutContexts.Reset();
		OutContexts.AddDefaulted(NumContexts);
		ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Body, [](){}, Flags, TArrayView<ContextType>(OutContexts));
	}
}
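
A usage sketch for this context variant: each spawned task receives its own accumulator, so the body needs no atomics, and the partial results are reduced afterwards. Names and sizes are illustrative:

	#include "Async/ParallelFor.h"

	TArray<int32> Data;
	Data.Init(1, 100000);

	TArray<int64> PartialSums; // one zero-initialized context per spawned task
	ParallelForWithTaskContext(TEXT("SumData"), PartialSums, Data.Num(), /*MinBatchSize*/ 1024,
		[&Data](int64& Sum, int32 Index)
		{
			Sum += Data[Index]; // mutate the task-local context without locking
		});

	int64 Total = 0;
	for (int64 Partial : PartialSums)
	{
		Total += Partial;
	}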

/**
*	General purpose parallel for that uses the taskgraph. This variant takes an array of user-defined context
*	objects for each task that may get spawned to do work (one task per context at most), and passes them to

#Loc: <Workspace>/Engine/Source/Runtime/Core/Public/Async/ParallelFor.h:786

Scope: file

Source code excerpt:

*	@param Contexts; User-privided array of user-defined task-level context objects
*	@param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
*	@param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers 
*	@param Body; Function to call from multiple threads
*	@param Flags; Used to customize the behavior of the ParallelFor if needed.
*	Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
template <typename ContextType, typename FunctionType>
inline void ParallelForWithExistingTaskContext(TArrayView<ContextType> Contexts, int32 Num, int32 MinBatchSize, const FunctionType& Body, EParallelForFlags Flags = EParallelForFlags::None)
{
	ParallelForImpl::ParallelForInternal(TEXT("ParallelFor Task"), Num, MinBatchSize, Body, [](){}, Flags, Contexts);
}

/**
*	General purpose parallel for that uses the taskgraph. This variant takes an array of user-defined context
*	objects for each task that may get spawned to do work (one task per context at most), and passes them to
*	the loop body to give it a task-local "workspace" that can be mutated without need for synchronization primitives.
*	@param DebugName; ProfilingScope and Debugname
*	@param Contexts; User-privided array of user-defined task-level context objects
*	@param Num; number of calls of Body; Body(0), Body(1)....Body(Num - 1)
*	@param MinBatchSize; Minimum Size of a Batch (will only launch DivUp(Num, MinBatchSize) Workers 
*	@param Body; Function to call from multiple threads
*	@param Flags; Used to customize the behavior of the ParallelFor if needed.
*	Notes: Please add stats around to calls to parallel for and within your lambda as appropriate. Do not clog the task graph with long running tasks or tasks that block.
**/
template <typename ContextType, typename FunctionType>
inline void ParallelForWithExistingTaskContext(const TCHAR* DebugName, TArrayView<ContextType> Contexts, int32 Num, int32 MinBatchSize, const FunctionType& Body, EParallelForFlags Flags = EParallelForFlags::None)
{
	ParallelForImpl::ParallelForInternal(DebugName, Num, MinBatchSize, Body, [](){}, Flags, Contexts);
}

#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/Animation/AnimationSequenceCompiler.cpp:230

Scope (from outer to inner):

file
namespace    UE::Anim
function     void FAnimSequenceCompilingManager::ProcessAnimSequences

Source code excerpt:

		ProcessAnimSequences(bLimitExecutionTime);
		
		UpdateCompilationNotification();
	}

	void FAnimSequenceCompilingManager::ProcessAnimSequences(bool bLimitExecutionTime, int32 MinBatchSize)
	{
		const int32 NumRemaining = GetNumRemainingAssets();
		
		const int32 MaxToProcess = bLimitExecutionTime ? FMath::Max(64, NumRemaining / 10) : INT32_MAX;
		
		FObjectCacheContextScope ObjectCacheScope;
		if (NumRemaining && NumRemaining >= MinBatchSize)
		{
			TSet<UAnimSequence*> SequencesToProcess;
			for (TWeakObjectPtr<UAnimSequence>& AnimSequence : RegisteredAnimSequences)
			{
				if (AnimSequence.IsValid())
				{
					SequencesToProcess.Add(AnimSequence.Get());
				}
			}

			{

#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/Animation/AnimationSequenceCompiler.h:36

Scope (from outer to inner):

file
namespace    UE::Anim
class        class FAnimSequenceCompilingManager : public IAssetCompilingManager

Source code excerpt:

		void FinishCompilation(TArrayView<UAnimSequence* const> InAnimSequences);
		void FinishCompilation(TArrayView<USkeleton* const> InSkeletons);

	protected:
		virtual void ProcessAsyncTasks(bool bLimitExecutionTime = false) override;
		void ProcessAnimSequences(bool bLimitExecutionTime, int32 MinBatchSize = 1);

		void PostCompilation(TArrayView<UAnimSequence* const> InAnimSequences);
		void ApplyCompilation(UAnimSequence* InAnimSequence);

		void UpdateCompilationNotification();

		void OnPostReachabilityAnalysis();
	private:
		friend class FAssetCompilingManager;
	
		TSet<TWeakObjectPtr<UAnimSequence>> RegisteredAnimSequences;

#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/ShaderCompiler/ShaderCompilerDistributed.cpp:6

Scope (from outer to inner):

file
namespace    DistributedShaderCompilerVariables

Source code excerpt:

#include "ShaderCompiler.h"

namespace DistributedShaderCompilerVariables
{
	//TODO: Remove the XGE doublet
	int32 MinBatchSize = 50;
	FAutoConsoleVariableRef CVarXGEShaderCompileMinBatchSize(
        TEXT("r.XGEShaderCompile.MinBatchSize"),
        MinBatchSize,
        TEXT("This CVar is deprecated, please use r.ShaderCompiler.DistributedMinBatchSize"),
        ECVF_Default);

	FAutoConsoleVariableRef CVarDistributedMinBatchSize(
		TEXT("r.ShaderCompiler.DistributedMinBatchSize"),
		MinBatchSize,
		TEXT("Minimum number of shaders to compile with a distributed controller.\n")
		TEXT("Smaller number of shaders will compile locally."),
		ECVF_Default);

	static int32 GDistributedControllerTimeout = 15 * 60;
	static FAutoConsoleVariableRef CVarDistributedControllerTimeout(
		TEXT("r.ShaderCompiler.DistributedControllerTimeout"),
		GDistributedControllerTimeout,
		TEXT("Maximum number of seconds we expect to pass between getting distributed controller complete a task (this is used to detect problems with the distribution controllers).")
	);
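
The same FAutoConsoleVariableRef pattern can be reused to expose a project-side batch size; a minimal sketch with hypothetical names:

	// Hypothetical project CVar mirroring the registration pattern above.
	static int32 GMyMinBatchSize = 50;
	static FAutoConsoleVariableRef CVarMyMinBatchSize(
		TEXT("r.MyProject.MinBatchSize"),
		GMyMinBatchSize,
		TEXT("Minimum number of items per batch (illustrative)."),
		ECVF_Default);

Because both r.XGEShaderCompile.MinBatchSize and r.ShaderCompiler.DistributedMinBatchSize reference the same MinBatchSize variable, setting either CVar changes the same underlying value; the XGE name is kept only as a deprecated alias.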

#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/ShaderCompiler/ShaderCompilerDistributed.cpp:172

Scope (from outer to inner):

file
function     int32 FShaderCompileDistributedThreadRunnable_Interface::CompilingLoop

Source code excerpt:

			// Grab as many jobs from the job queue as we can
			const EShaderCompileJobPriority Priority = (EShaderCompileJobPriority)PriorityIndex;
			const int32 MinBatchSize = (Priority == EShaderCompileJobPriority::Low) ? 1 : DistributedShaderCompilerVariables::MinBatchSize;
			const int32 NumJobs = Manager->AllJobs.GetPendingJobs(EShaderCompilerWorkerType::Distributed, Priority, MinBatchSize, INT32_MAX, PendingJobs);
			if (NumJobs > 0)
			{
				UE_LOG(LogShaderCompilers, Verbose, TEXT("Started %d 'Distributed' shader compile jobs with '%s' priority"),
					NumJobs,
					ShaderCompileJobPriorityToString((EShaderCompileJobPriority)PriorityIndex));
			}
			if (PendingJobs.Num() >= DistributedShaderCompilerVariables::MinBatchSize)
			{
				break;
			}
		}
	}
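
With the default value of 50, the effect of the per-priority minimum is roughly the following; the numbers are illustrative:

	// Priority == Low  -> MinBatchSize = 1: even a single low-priority job may be
	//                     handed to the distributed controller.
	// Priority != Low  -> MinBatchSize = 50: jobs wait in the queue until at least
	//                     50 are pending before being distributed.
	// The surrounding loop stops pulling further priorities once PendingJobs.Num()
	// reaches DistributedShaderCompilerVariables::MinBatchSize.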

#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/ShaderCompiler/ShaderCompilerDistributed.cpp:197

Scope (from outer to inner):

file
function     int32 FShaderCompileDistributedThreadRunnable_Interface::CompilingLoop

Source code excerpt:

		// Increase the batch size when more jobs are queued/in flight.

		// Build farm is much more prone to pool oversubscription, so make sure the jobs are submitted in batches of at least MinBatchSize
		int MinJobsPerBatch = GIsBuildMachine ? DistributedShaderCompilerVariables::MinBatchSize : 1;

		// Just to provide typical numbers: the number of total jobs is usually in tens of thousands at most, oftentimes in low thousands. Thus JobsPerBatch when calculated as a log2 rarely reaches the value of 16,
		// and that seems to be a sweet spot: lowering it does not result in faster completion, while increasing the number of jobs per batch slows it down.
		const uint32 JobsPerBatch = FMath::Max(MinJobsPerBatch, FMath::FloorToInt(FMath::LogX(2.f, PendingJobs.Num() + NumDispatchedJobs)));
		UE_LOG(LogShaderCompilers, Log, TEXT("Current jobs: %d, Batch size: %d, Num Already Dispatched: %d"), PendingJobs.Num(), JobsPerBatch, NumDispatchedJobs);
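
Plugging representative numbers into the formula above (illustrative):

	// JobsPerBatch = max(MinJobsPerBatch, FloorToInt(log2(PendingJobs + NumDispatchedJobs)))
	// e.g. 4096 queued plus in-flight jobs -> log2(4096) = 12 -> batches of 12 on a
	// developer machine (MinJobsPerBatch = 1), but batches of at least 50 on a build
	// machine, where MinJobsPerBatch defaults to DistributedMinBatchSize = 50.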

#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/SkinnedAssetCompiler.cpp:280

Scope (from outer to inner):

file
function     void FSkinnedAssetCompilingManager::AddSkinnedAssets

Source code excerpt:

	TRACE_CPUPROFILER_EVENT_SCOPE(FSkinnedAssetCompilingManager::AddSkinnedAssets)
	check(IsInGameThread());

	// Wait until we gather enough mesh to process
	// to amortize the cost of scanning components
	//ProcessSkinnedAssets(32 /* MinBatchSize */);

	for (USkinnedAsset* SkinnedAsset : InSkinnedAssets)
	{
		check(SkinnedAsset->AsyncTask != nullptr);
		RegisteredSkinnedAsset.Emplace(SkinnedAsset);
	}

	UpdateCompilationNotification();
}

void FSkinnedAssetCompilingManager::FinishCompilation(TArrayView<USkinnedAsset* const> InSkinnedAssets)

#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/SkinnedAssetCompiler.cpp:407

Scope (from outer to inner):

file
function     void FSkinnedAssetCompilingManager::ProcessSkinnedAssets

Source code excerpt:

void FSkinnedAssetCompilingManager::Reschedule()
{

}

void FSkinnedAssetCompilingManager::ProcessSkinnedAssets(bool bLimitExecutionTime, int32 MinBatchSize)
{
	using namespace SkinnedAssetCompilingManagerImpl;
	TRACE_CPUPROFILER_EVENT_SCOPE(FSkinnedAssetCompilingManager::ProcessSkinnedAssets);
	const int32 NumRemainingMeshes = GetNumRemainingJobs();
	// Spread out the load over multiple frames but if too many meshes, convergence is more important than frame time
	const int32 MaxMeshUpdatesPerFrame = bLimitExecutionTime ? FMath::Max(64, NumRemainingMeshes / 10) : INT32_MAX;

	FObjectCacheContextScope ObjectCacheScope;
	if (NumRemainingMeshes && NumRemainingMeshes >= MinBatchSize)
	{
		TSet<USkinnedAsset*> SkinnedAssetsToProcess;
		for (TWeakObjectPtr<USkinnedAsset>& SkinnedAsset : RegisteredSkinnedAsset)
		{
			if (SkinnedAsset.IsValid())
			{
				SkinnedAssetsToProcess.Add(SkinnedAsset.Get());
			}
		}

		{

#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/StaticMeshCompiler.cpp:344

Scope (from outer to inner):

file
function     void FStaticMeshCompilingManager::AddStaticMeshes

Source code excerpt:

	TRACE_CPUPROFILER_EVENT_SCOPE(FStaticMeshCompilingManager::AddStaticMeshes)
	check(IsInGameThread());

	// Wait until we gather enough mesh to process
	// to amortize the cost of scanning components
	//ProcessStaticMeshes(32 /* MinBatchSize */);

	for (UStaticMesh* StaticMesh : InStaticMeshes)
	{
		check(StaticMesh->AsyncTask != nullptr);
		RegisteredStaticMesh.Emplace(StaticMesh);
	}

	TRACE_COUNTER_SET(QueuedStaticMeshCompilation, GetNumRemainingMeshes());
}

void FStaticMeshCompilingManager::FinishCompilation(TArrayView<UStaticMesh* const> InStaticMeshes)

#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/StaticMeshCompiler.cpp:696

Scope (from outer to inner):

file
function     void FStaticMeshCompilingManager::ProcessStaticMeshes

Source code excerpt:

			}
		}
	}
}

void FStaticMeshCompilingManager::ProcessStaticMeshes(bool bLimitExecutionTime, int32 MinBatchSize)
{
	using namespace StaticMeshCompilingManagerImpl;
	LLM_SCOPE(ELLMTag::StaticMesh);
	TRACE_CPUPROFILER_EVENT_SCOPE(FStaticMeshCompilingManager::ProcessStaticMeshes);
	const int32 NumRemainingMeshes = GetNumRemainingMeshes();
	// Spread out the load over multiple frames but if too many meshes, convergence is more important than frame time
	const int32 MaxMeshUpdatesPerFrame = bLimitExecutionTime ? FMath::Max(64, NumRemainingMeshes / 10) : INT32_MAX;

	FObjectCacheContextScope ObjectCacheScope;
	if (NumRemainingMeshes && NumRemainingMeshes >= MinBatchSize)
	{
		TSet<UStaticMesh*> StaticMeshesToProcess;
		for (TWeakObjectPtr<UStaticMesh>& StaticMesh : RegisteredStaticMesh)
		{
			if (StaticMesh.IsValid())
			{
				StaticMeshesToProcess.Add(StaticMesh.Get());
			}
		}

		{

#Loc: <Workspace>/Engine/Source/Runtime/Engine/Private/StaticMeshCompiler.cpp:706

Scope (from outer to inner):

file
function     void FStaticMeshCompilingManager::ProcessStaticMeshes

Source code excerpt:


	FObjectCacheContextScope ObjectCacheScope;
	if (NumRemainingMeshes && NumRemainingMeshes >= MinBatchSize)
	{
		TSet<UStaticMesh*> StaticMeshesToProcess;
		for (TWeakObjectPtr<UStaticMesh>& StaticMesh : RegisteredStaticMesh)
		{
			if (StaticMesh.IsValid())
			{

#Loc: <Workspace>/Engine/Source/Runtime/Engine/Public/SkinnedAssetCompiler.h:87

Scope (from outer to inner):

file
class        class FSkinnedAssetCompilingManager : IAssetCompilingManager

Source code excerpt:

	bool bHasShutdown = false;
	TSet<TWeakObjectPtr<USkinnedAsset>> RegisteredSkinnedAsset;
	TUniquePtr<FAsyncCompilationNotification> Notification;
	void FinishCompilationsForGame();
	void Reschedule();
	void ProcessSkinnedAssets(bool bLimitExecutionTime, int32 MinBatchSize = 1);
	void UpdateCompilationNotification();

	void PostCompilation(USkinnedAsset* SkinnedAsset);
	void PostCompilation(TArrayView<USkinnedAsset* const> InSkinnedAssets);

	void OnPostReachabilityAnalysis();
	FDelegateHandle PostReachabilityAnalysisHandle;
	
	void OnPreGarbageCollect();
	FDelegateHandle PreGarbageCollectHandle;
};

#Loc: <Workspace>/Engine/Source/Runtime/Engine/Public/SoundWaveCompiler.h:89

Scope (from outer to inner):

file
class        class FSoundWaveCompilingManager : IAssetCompilingManager

Source code excerpt:

	void FinishCompilationForObjects(TArrayView<UObject* const> InObjects) override;

	void UpdateCompilationNotification();
	void PostCompilation(TArrayView<USoundWave* const> InCompiledSoundWaves);
	void PostCompilation(USoundWave* SoundWave);
	void ProcessSoundWaves(bool bLimitExecutionTime, int32 MinBatchSize = 1);
	TArray<USoundWave*> GatherPendingSoundWaves();

	/** Notification for the amount of pending sound wave compilations */
	TUniquePtr<FAsyncCompilationNotification> Notification;
};

#endif

#if UE_ENABLE_INCLUDE_ORDER_DEPRECATED_IN_5_4
#include "CoreMinimal.h"
#include "AssetCompilingManager.h"

#Loc: <Workspace>/Engine/Source/Runtime/Engine/Public/StaticMeshCompiler.h:87

Scope (from outer to inner):

file
class        class FStaticMeshCompilingManager : IAssetCompilingManager

Source code excerpt:

	TSet<TWeakObjectPtr<UStaticMesh>> RegisteredStaticMesh;
	TUniquePtr<FAsyncCompilationNotification> Notification;

	void FinishCompilationsForGame();
	void Reschedule();
	void ProcessStaticMeshes(bool bLimitExecutionTime, int32 MinBatchSize = 1);
	void UpdateCompilationNotification();

	void PostCompilation(TArrayView<UStaticMesh* const> InStaticMeshes);
	void PostCompilation(UStaticMesh* StaticMesh);

	void OnPostReachabilityAnalysis();
	FDelegateHandle PostReachabilityAnalysisHandle;
};

#endif // #if WITH_EDITOR

#Loc: <Workspace>/Engine/Source/Runtime/Experimental/Chaos/Private/Chaos/Framework/Parallel.cpp:73

Scope (from outer to inner):

file
function     void Chaos::PhysicsParallelFor

Source code excerpt:

		InCallable(Idx);
	};

	const bool bSingleThreaded = !!GSingleThreadedPhysics || bDisablePhysicsParallelFor || bForceSingleThreaded;
	const EParallelForFlags Flags = (bSingleThreaded ? EParallelForFlags::ForceSingleThread : EParallelForFlags::None);
	const int32 MinBatchSize = ((MaxNumWorkers > 0) && (InNum > MaxNumWorkers)) ? FMath::DivideAndRoundUp(InNum, MaxNumWorkers) : 1;

	ParallelFor(TEXT("PhysicsParallelFor"), InNum, MinBatchSize, PassThrough, Flags);
	//::ParallelFor(InNum, PassThrough, !!GSingleThreadedPhysics || bDisablePhysicsParallelFor || bForceSingleThreaded);
}

void Chaos::PhysicsParallelForRange(int32 InNum, TFunctionRef<void(int32, int32)> InCallable, const int32 InMinBatchSize, bool bForceSingleThreaded)
{
	TRACE_CPUPROFILER_EVENT_SCOPE(Chaos_PhysicsParallelFor);
	using namespace Chaos;

	// Passthrough for now, except with global flag to disable parallel
#if PHYSICS_THREAD_CONTEXT
	const bool bIsInPhysicsSimContext = IsInPhysicsThreadContext();

#Loc: <Workspace>/Engine/Source/Runtime/Experimental/Chaos/Private/Chaos/Framework/Parallel.cpp:103

Scope (from outer to inner):

file
function     void Chaos::PhysicsParallelForRange

Source code excerpt:

	}
	NumWorkers = FMath::Min(NumWorkers, InNum);
	NumWorkers = FMath::Min(NumWorkers, MaxNumWorkers);
	check(NumWorkers > 0);
	int32 BatchSize = FMath::DivideAndRoundUp<int32>(InNum, NumWorkers);
	int32 MinBatchSize = FMath::Max(InMinBatchSize, MinRangeBatchSize);
	// @todo(mlentine): Find a better batch size in this case
	if (InNum < MinBatchSize)
	{
		NumWorkers = 1;
		BatchSize = InNum;
	}
	else
	{
		while (BatchSize < MinBatchSize && NumWorkers > 1)
		{
			NumWorkers /= 2;
			BatchSize = FMath::DivideAndRoundUp<int32>(InNum, NumWorkers);
		}
	}
	TArray<int32> RangeIndex;
	RangeIndex.Add(0);
	for (int32 i = 1; i <= NumWorkers; i++)
	{
		int32 PrevEnd = RangeIndex[i - 1];
		int32 NextEnd = FMath::Min(BatchSize + RangeIndex[i - 1], InNum);
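
To make the worker-halving loop above concrete (illustrative numbers):

	// InNum = 100 items, InMinBatchSize = 30, NumWorkers initially 8:
	//   BatchSize = DivideAndRoundUp(100, 8) = 13 -> below 30, halve workers
	//   NumWorkers = 4 -> BatchSize = 25          -> still below 30, halve again
	//   NumWorkers = 2 -> BatchSize = 50          -> at least 30, stop
	// Result: two ranges of 50 items each.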

#Loc: <Workspace>/Engine/Source/Runtime/Experimental/Chaos/Private/Chaos/Framework/Parallel.cpp:170

Scope (from outer to inner):

file
function     void Chaos::PhysicsParallelForWithContext

Source code excerpt:

		InCallable(Idx, ContextIndex);
	};

	const bool bSingleThreaded = !!GSingleThreadedPhysics || bDisablePhysicsParallelFor || bForceSingleThreaded;
	const EParallelForFlags Flags = bSingleThreaded ? (EParallelForFlags::ForceSingleThread) : (EParallelForFlags::None);
	const int32 MinBatchSize = ((MaxNumWorkers > 0) && (InNum > MaxNumWorkers)) ? FMath::DivideAndRoundUp(InNum, MaxNumWorkers) : 1;

	// Unfortunately ParallelForWithTaskContext takes an array of context objects - we don't use it and in our case
	// it ends up being an array where array[index] = index.
	// The reason we don't need it is that our ContextCreator returns the context index we want to use on a given
	// worker thread, and this is passed to the user function. The user function can just captures its array of
	// contexts and use the context indeex to get its context from it.
	TArray<int32, TInlineAllocator<16>> Contexts;

	::ParallelForWithTaskContext(TEXT("PhysicsParallelForWithContext"), Contexts, InNum, MinBatchSize, InContextCreator, PassThrough, Flags);
}


//class FRecursiveDivideTask
//{
//	TFuture<void> ThisFuture;
//	TFunctionRef<void(int32)> Callable;
//
//	int32 Begin;
//	int32 End;
//

#Loc: <Workspace>/Engine/Source/Runtime/Experimental/Chaos/Public/Chaos/Framework/Parallel.h:3

Scope (from outer to inner):

file
namespace    Chaos

Source code excerpt:


#include "Templates/Function.h"

namespace Chaos
{
	void CHAOS_API PhysicsParallelForRange(int32 InNum, TFunctionRef<void(int32, int32)> InCallable, const int32 MinBatchSize, bool bForceSingleThreaded = false);
	void CHAOS_API PhysicsParallelFor(int32 InNum, TFunctionRef<void(int32)> InCallable, bool bForceSingleThreaded = false);
	void CHAOS_API InnerPhysicsParallelForRange(int32 InNum, TFunctionRef<void(int32, int32)> InCallable, const int32 MinBatchSize, bool bForceSingleThreaded = false);
	void CHAOS_API InnerPhysicsParallelFor(int32 InNum, TFunctionRef<void(int32)> InCallable, bool bForceSingleThreaded = false);
	void CHAOS_API PhysicsParallelForWithContext(int32 InNum, TFunctionRef<int32 (int32, int32)> InContextCreator, TFunctionRef<void(int32, int32)> InCallable, bool bForceSingleThreaded = false);
	//void CHAOS_API PhysicsParallelFor_RecursiveDivide(int32 InNum, TFunctionRef<void(int32)> InCallable, bool bForceSingleThreaded = false);


	CHAOS_API extern int32 MaxNumWorkers;
	CHAOS_API extern int32 SmallBatchSize;
	CHAOS_API extern int32 LargeBatchSize;
#if UE_BUILD_SHIPPING
	const bool bDisablePhysicsParallelFor = false;
	const bool bDisableParticleParallelFor = false;

#Loc: <Workspace>/Engine/Source/Runtime/Projects/Private/PluginManager.cpp:1131

Scope (from outer to inner):

file
function     void FPluginManager::FindPluginsInDirectory

Source code excerpt:

	TArray<FString> DirectoriesToVisit;
	DirectoriesToVisit.Add(PluginsDirectory);

	FScopedSlowTask SlowTask(1000.f); // Pick an arbitrary amount of work that is resiliant to some floating point multiplication & division

	constexpr int32 MinBatchSize = 1;
	TArray<TArray<FString>> DirectoriesToVisitNext;
	FRWLock FoundFilesLock;
	while (DirectoriesToVisit.Num() > 0)
	{
		const float TotalWorkRemaining = SlowTask.TotalAmountOfWork - SlowTask.CompletedWork - SlowTask.CurrentFrameScope;
		SlowTask.EnterProgressFrame(TotalWorkRemaining);
		const int32 UnitsOfWorkTodoThisLoop = DirectoriesToVisit.Num();

		ParallelForWithTaskContext(TEXT("FindPluginsInDirectory.PF"),
			DirectoriesToVisitNext,
			DirectoriesToVisit.Num(),
			MinBatchSize,
			[&FoundFilesLock, &FileNames, &DirectoriesToVisit, &PlatformFile](TArray<FString>& OutDirectoriesToVisitNext, int32 Index)
			{
				// Track where we start pushing sub-directories to because we might want to discard them (if we end up finding a .uplugin).
				// Because of how `ParallelForWithTaskContext()` works, this array may already be populated from another execution,
				// so we have to be targeted about what we clear from the array.
				const int32 StartingDirIndex = OutDirectoriesToVisitNext.Num();

				FFindPluginsInDirectory_Visitor Visitor(OutDirectoriesToVisitNext); // This visitor writes directly to `OutDirectoriesToVisitNext`, which is why we have to manage its contents
				PlatformFile.IterateDirectory(*DirectoriesToVisit[Index], Visitor);
				if (!Visitor.FoundPluginFile.IsEmpty())
				{

#Loc: <Workspace>/Engine/Source/Runtime/Projects/Private/PluginManager.cpp:1143

Scope (from outer to inner):

file
function     void FPluginManager::FindPluginsInDirectory

Source code excerpt:

			DirectoriesToVisitNext,
			DirectoriesToVisit.Num(),
			MinBatchSize,
			[&FoundFilesLock, &FileNames, &DirectoriesToVisit, &PlatformFile](TArray<FString>& OutDirectoriesToVisitNext, int32 Index)
			{
				// Track where we start pushing sub-directories to because we might want to discard them (if we end up finding a .uplugin).
				// Because of how `ParallelForWithTaskContext()` works, this array may already be populated from another execution,
				// so we have to be targeted about what we clear from the array.
				const int32 StartingDirIndex = OutDirectoriesToVisitNext.Num();

#Loc: <Workspace>/Engine/Source/Runtime/Renderer/Private/DecalRenderingShared.cpp:356

Scope (from outer to inner):

file
function     FTransientDecalRenderDataList BuildVisibleDecalList

Source code excerpt:

			{
				TChunkedArray<FTransientDecalRenderData> VisibleDecals;
			};

			TArray<FVisibleDecalListContext> Contexts;
			const int32 MinBatchSize = 128;
			ParallelForWithTaskContext(
				TEXT("BuildVisibleDecalList_Parallel"),
				Contexts,
				Decals.Num(),
				MinBatchSize,
				[Decals, &View, ShaderPlatform, bIsPerspectiveProjection, FadeMultiplier](FVisibleDecalListContext& Context, int32 ItemIndex)
				{
					FTaskTagScope TaskTagScope(ETaskTag::EParallelRenderingThread);

					const FDeferredDecalProxy* DecalProxy = Decals[ItemIndex];

					FTransientDecalRenderData Data;

					if (ProcessDecal(DecalProxy, View, FadeMultiplier, ShaderPlatform, bIsPerspectiveProjection, Data))
					{
						Context.VisibleDecals.AddElement(MoveTemp(Data));

#Loc: <Workspace>/Engine/Source/Runtime/Renderer/Private/RayTracing/RayTracing.cpp:293

Scope (from outer to inner):

file
namespace    RayTracing
function     void GatherRelevantPrimitives

Source code excerpt:

				TChunkedArray<Nanite::CoarseMeshStreamingHandle> UsedCoarseMeshStreamingHandles;
				TChunkedArray<FPrimitiveSceneInfo*> DirtyCachedRayTracingPrimitives;
			};

			TArray<FGatherRelevantPrimitivesContext> Contexts;
			const int32 MinBatchSize = 128;
			ParallelForWithTaskContext(
				TEXT("GatherRayTracingRelevantPrimitives_Parallel"),
				Contexts,
				Scene.PrimitiveSceneProxies.Num(),
				MinBatchSize,
				[&Scene, &View, bGameView](FGatherRelevantPrimitivesContext& Context, int32 PrimitiveIndex)
			{
				// Get primitive visibility state from culling
				if (!View.PrimitiveRayTracingVisibilityMap[PrimitiveIndex])
				{
					return;
				}

				const ERayTracingPrimitiveFlags Flags = Scene.PrimitiveRayTracingFlags[PrimitiveIndex];

				check(!EnumHasAnyFlags(Flags, ERayTracingPrimitiveFlags::Exclude));

#Loc: <Workspace>/Engine/Source/Runtime/Renderer/Private/RayTracing/RayTracing.cpp:1302

Scope (from outer to inner):

file
namespace    RayTracing
function     bool GatherWorldInstancesForView
function     void DoTask

Source code excerpt:

					const uint32 BaseCachedStaticInstanceIndex = RayTracingScene.AddInstancesUninitialized(NumCachedStaticSceneInstances);

					const uint32 BaseCachedVisibleMeshCommandsIndex = VisibleRayTracingMeshCommands.AddUninitialized(NumCachedStaticVisibleMeshCommands);
					TArrayView<FVisibleRayTracingMeshCommand> CachedStaticVisibleRayTracingMeshCommands = TArrayView<FVisibleRayTracingMeshCommand>(VisibleRayTracingMeshCommands.GetData() + BaseCachedVisibleMeshCommandsIndex, NumCachedStaticVisibleMeshCommands);

					const int32 MinBatchSize = 128;
					ParallelFor(
						TEXT("RayTracingScene_AddCachedStaticInstances_ParallelFor"),
						RelevantCachedStaticPrimitives.Num(),
						MinBatchSize,
						[this, BaseCachedStaticInstanceIndex, CachedStaticVisibleRayTracingMeshCommands](int32 Index)
					{
						const FRelevantPrimitive& RelevantPrimitive = RelevantCachedStaticPrimitives[Index];
						const int32 PrimitiveIndex = RelevantPrimitive.PrimitiveIndex;
						FPrimitiveSceneInfo* SceneInfo = Scene.Primitives[PrimitiveIndex];
						FPrimitiveSceneProxy* SceneProxy = Scene.PrimitiveSceneProxies[PrimitiveIndex];
						ERayTracingPrimitiveFlags Flags = Scene.PrimitiveRayTracingFlags[PrimitiveIndex];
						const FPersistentPrimitiveIndex PersistentPrimitiveIndex = RelevantPrimitive.PersistentPrimitiveIndex;

						check(EnumHasAnyFlags(Flags, ERayTracingPrimitiveFlags::CacheInstances));